2022 - Automatic Machine Learning For Evolving Data Streams
by
Lotachukwu Ibe
Abstract
Keywords: AutoML, Stream learning, Concept drift, Adaptive Machine Learning, Online
Regression
Dissertation Declaration
I agree that, should the University wish to retain it for reference purposes, a copy of my dissertation
may be held by Bournemouth University normally for a period of 3 academic years. I understand
that once the retention period has expired my dissertation will be destroyed.
Confidentiality
I confirm that this dissertation does not contain information of a commercial or confidential nature
or include personal information other than that which would normally be in the public domain
unless the relevant permissions have been obtained. Any information which identifies a particular
individual's religious or political beliefs, information relating to their health, ethnicity, criminal
history, or sex life has been anonymised unless permission has been granted for its publication
from the person to whom it relates.
Copyright
I agree that this dissertation may be made available as the result of a request for information under
the Freedom of Information Act.
Date: 06/05/2022
This dissertation and the project that it is based on are my own work, except where stated, in
accordance with University regulations.
Date: 06/05/2022
Acknowledgments
I would like to thank my supervisor Rashid Bakirov for his continuous guidance and support through
every phase of this dissertation.
In addition, I am thankful to the academic staff in the Faculty of Science and Technology at
Bournemouth University who have contributed towards my learnings during my MSc. Program.
Finally, I am thankful to my family, who have supported me throughout the MSc. and have always
been there my whole life.
TABLE OF CONTENTS
1 INTRODUCTION _____________________________________________________________________________ 1
1.1 Background ____________________________________________________________________________ 1
1.2 Problem Definition _______________________________________________________________________ 2
1.3 Aims and objectives ______________________________________________________________________ 3
1.4 Original Contributions ____________________________________________________________________ 3
1.5 Organisation of the dissertation ____________________________________________________________ 3
3 METHODOLOGY ____________________________________________________________________________ 12
3.1 Overview _____________________________________________________________________________ 12
3.2 Search Space Design ____________________________________________________________________ 13
3.3 Combined Algorithm Selection and Hyperparameter Optimization ________________________________ 15
3.4 Online AutoML Model ___________________________________________________________________ 16
3.5 Evaluation Protocol _____________________________________________________________________ 20
5 RESULTS __________________________________________________________________________________ 26
5.1 Experiments with real-world data __________________________________________________________ 26
5.2 Experiments with synthetic data ___________________________________________________________ 30
5.3 Pipeline Analysis _______________________________________________________________________ 34
5.4 Experiments on Ensemble Adaptation_______________________________________________________ 35
5.5 Effect of Search Algorithm ________________________________________________________________ 37
5.6 Effect of Online Pre-processing Algorithms ___________________________________________________ 38
6 DISCUSSION _______________________________________________________________________________ 42
7 CRITICAL EVALUATION_______________________________________________________________________ 43
8 CONCLUSION ______________________________________________________________________________ 44
REFERENCES ___________________________________________________________________________________ 46
APPENDIX A____________________________________________________________________________________ 51
APPENDIX B ____________________________________________________________________________________ 52
APPENDIX C ____________________________________________________________________________________ 53
APPENDIX D ___________________________________________________________________________________ 56
1 INTRODUCTION
This chapter provides an overview of the MSc dissertation titled “Automatic Machine Learning for
evolving data streams”. The general background and motivation for the project is given in Section
1.1. In Section 1.2, we clearly define the problem addressed in this work. The aim, research
questions and objectives are stated in Section 1.3, followed by the original contributions of this work
in Section 1.4. A summary of the organisation of this dissertation is provided in Section 1.5.
1.1 Background
Automated machine learning (AutoML) systems have emerged in response to the growing demand for
machine learning domain expertise in recent years. These systems give users without technical
know-how the ability to quickly build machine learning solutions and enable domain experts to
optimize their tasks. AutoML systems achieve this by automating the steps involved in data
processing, feature extraction, model selection and hyperparameter tuning. Although they can
achieve state-of-the-art solutions on supervised learning tasks, current AutoML systems are
somewhat constrained in their applications. They assume that the data is static, that is, all the training
data is available at once, and that the data distribution in the training set is the same as the
distribution used for prediction. However, real world data often arrives in batches, and the data
distribution evolves over time. Hence, to successfully integrate AutoML systems to industry
applications, it is desirable to have AutoML systems that can operate in an online learning setting.
A major challenge machine learning algorithms face in an online learning setting is concept
drift. This phenomenon occurs when the target concept changes over time. To mitigate this, online
learning algorithms which can adapt and cope with a changing concept have been developed.
However, in the presence of concept drift, the hyperparameters of these online algorithms may
require retuning.
The proposed OAML system is evaluated on well-known concept drift datasets for regression
and classification and compared against the original OAML framework as well as baseline adaptive
learners. Our findings indicate that the Extended OAML system can achieve competitive results on
regression data streams with varying drift complexities while also performing better than the original
OAML system on benchmark data streams for classification tasks. We find that AutoML systems
benefit from online pre-processing algorithms and that ensemble methods can be improved via
weighted ensemble voting and sound model replacement techniques.
Originally defined by Thornton et al. (2013), AutoML aims to automatically construct full machine
learning pipelines to minimize loss on a given metric. For the scenario where a model is trained on
$D_{train} = (x_1, y_1), \ldots, (x_n, y_n)$, the goal is to automatically produce the set of predictions
$\hat{y}_{n+1}, \ldots, \hat{y}_{n+m}$ for a test set $D_{test} = (x_{n+1}, y_{n+1}), \ldots, (x_{n+m}, y_{n+m})$
(which possesses the same distribution as $D_{train}$) that minimize the loss function $\mathcal{L}(\cdot, \cdot)$,
given a limited resource budget $b$, where $i = 1, \ldots, n+m$; $n, m \in \mathbb{N}^{+}$;
$x_i \in \mathbb{R}^{d}$ represents features of $d$ dimensions; and $y_i \in Y$. The loss function is given as:

$$\frac{1}{m}\sum_{j=1}^{m} \mathcal{L}(\hat{y}_{n+j}, y_{n+j})$$  (Eq. 1)
Furthermore, the AutoML problem can be restricted to a Combined Algorithm Selection and
Hyperparameter Optimization (CASH) problem. CASH (Thornton et al. 2013) is defined as the
search over a set of machine learning predictors and transformers $A = \{A^{(1)}, \ldots, A^{(R)}\}$ with their
associated hyperparameter spaces $\Lambda^{(1)}, \ldots, \Lambda^{(R)}$ for an optimal combination
$A^{*}_{\lambda^{*}}$ that optimizes the given evaluation metric of the system over $k$ validation folds of $D$:

$$A^{*}_{\lambda^{*}} = \underset{A^{(j)} \in A,\; \lambda \in \Lambda^{(j)}}{\operatorname{argmin}} \; \frac{1}{k}\sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda}, \{X_{train}^{(i)}, y_{train}^{(i)}\}, \{X_{valid}^{(i)}, y_{valid}^{(i)}\}\right)$$  (Eq. 2)
For the online learning setting, the AutoML problem can be formulated like the scenario presented
above with the following modifications:
1. The data is assumed to be infinitely long
2. Due to this, data is streamed with a temporal order (Celik & Vanschoren, 2021)
3. Data must be processed in the order in which it arrives
4. Memory usage is restricted, hence all the data cannot be stored
5. Evaluation should be done in a prequential manner (more in Section 3.5)
Hence, the objective for the online AutoML problem becomes finding the pipeline $A^{*}_{\lambda,t}$ at each time
step $t$. $A^{*}_{\lambda,t}$ makes predictions and is evaluated on $\{X_t, y_t\}$ and is then trained on the last seen
batch $\{X_{t-1}, y_{t-1}\}$:

$$A^{*}_{\lambda,t} = \underset{A^{(j)} \in A,\; \lambda \in \Lambda^{(j)}}{\operatorname{argmin}} \; \frac{1}{k}\sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda,t}, \{X_{t-1}^{(i)}, y_{t-1}^{(i)}\}, \{X_{t}^{(i)}, y_{t}^{(i)}\}\right)$$  (Eq. 3)
The distribution of the target variable may also change in an online learning setting for
supervised learning tasks; thus, it is important to detect this and adapt the learner to this new change.
The aim of this thesis is to extend the capabilities of the OAML framework with automated online
pre-processing algorithms, multiple adaptive mechanisms for online adaptation and an end-to-end
automatic online pipeline for regression tasks. This work is aimed at providing relevant results to the
following research questions:
• Research Question 1: How can online AutoML techniques be designed to address
regression problems?
• Research Question 2: How does the adaptation strategy for ensemble methods in online
AutoML influence the predictive performance?
An overview of the problem addressed and the aims and objectives of this dissertation have been discussed
in this chapter. Chapter 2 focuses on the related work and explores the existing literature on the research
area. The methodologies and procedures undertaken to achieve the aims of this project
are discussed in Chapter 3. The experimental setup and design are presented in Chapter 4. Chapter
5 presents and analyses the findings from the experiments. Chapter 6 provides a summary of the results.
A critical evaluation of this work is done in Chapter 7. The conclusion and future recommendations
are provided in Chapter 8.
2 LITERATURE REVIEW
This chapter provides a background for this thesis. We start with a review of relevant AutoML
approaches and frameworks, followed by an analysis of literature on online learning and adaptation
mechanisms. We then discuss existing approaches for AutoML on streaming data that are integral
to the new system proposed in this dissertation.
2.1 AutoML
To solve the AutoML problem (as a CASH problem) on a given dataset, a machine learning solution
is automatically constructed from a search space of potential combinations of prediction algorithms
and associated hyperparameters. Early works addressed this problem from a Bayesian Optimization
(BO) perspective (Feurer, et al., 2015). In this approach a probabilistic surrogate model is selected
for modelling the objective function. The search is guided by measuring the output obtained from
evaluating the objective function at different points. Although BO has seen success in offline
applications, for example, Auto-Weka (Thornton, et al., 2013), another stream of research in this
area focuses on Meta Learning (Muñoz, et al., 2017), in which a series of meta-features that capture
the properties of the data are extracted and used to infer model performance based on past
experience with similar data (without any model training). Reinforcement learning (RL) is a relatively
new field in machine learning, but it has also seen applications in AutoML (Zoph & Le, 2017). In
RL approaches to the problem, hyperparameter optimization is formulated as a policy to learn and
solved using RL techniques (Baker, et al., 2016). Grid Search is another technique for finding the
optimal configuration in a given search space. By creating a grid of possible configurations
and searching through it in a distributed way, the H2O.ai framework takes advantage of
parallelization to speed up grid search for AutoML problems (H2O.ai, 2017).
Another effective way of addressing the AutoML problem is to learn a distribution over
hyperparameters and continuously update it to improve the search; this is achieved with
Evolutionary Algorithms (EA). Inspired by the process of natural selection, this set of algorithms
make use of phenomena like mutation and cross-over to iteratively improve solutions to a problem.
Examples of this approach can be seen in the python AutoML frameworks; Tree-based Pipeline
Optimization Tool (TPOT) (Olson, et al., 2016) and Genetic Automated Machine Learning (GAMA),
(Gijsbers & Vanschoren, 2020). Although TPOT and GAMA make use of EA (particularly genetic
algorithms) to optimize machine learning pipelines, GAMA’s implementation uses asynchronous
evolution, which has the potential to speed up the search process. The original OAML framework
(Celik, et al., 2022) makes use of GAMA for pipeline optimization because of its effectiveness in
online adaptation (Celik & Vanschoren, 2021). Thus, this work which extends OAML also makes use
of GAMA. A summary of open-source AutoML frameworks are provided in Table 1.
2.2 Online Learning
Online learning generally refers to machine learning techniques applied to a data stream. In this
setting, data is received on an instance incremental or batch incremental basis. Unlike the offline
setting, stream learning algorithms have memory restrictions, hence the whole training data is not
stored. Rather, this class of algorithms depends on adaptation to manage changing concepts across
the stream.
Many prediction problems need a model that continuously receives data and thus cannot realistically
work in an offline setting with historical or static data. This is the case in big data applications, for
example, credit scoring where the goodness of credit is revealed after some time (Dal Pozzolo, et
al., 2015), or stock market prediction where the prices of stocks are revealed after prediction has
been made (Hazan & Seshadhri, 2009). In these environments, the data distribution is subject to
change over time, leading to a phenomenon known as concept drift (Schlimmer & Granger, 1986);
(Gama, et al., 2014). Concept drift may occur when the marginal input probability distribution $\rho(x)$ changes
(virtual concept drift) or the conditional probability distribution $\rho(y|x)$ changes (real concept drift).
According to literature (Tsymbal, et al., 2008); (Žliobaitė, 2010), these two kinds of concept drift can
be treated as the same since a model update is required once a drift is detected. Concept drift can
also be grouped as follows (Gama, et al., 2014):
• Sudden/Abrupt drift: Sudden change of concept at a certain time 𝜏
• Incremental drift: Slow change passing through intermediate concepts between the initial
concept before the drift and the final concept after the drift is finished
• Gradual drift: Gradual replacement of one concept with another over a period $\tau_2 - \tau_1$
• Reoccurring concept: Concepts which have been replaced, reappear at a later stage in the
stream. As can be seen in seasonal processes
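These categories can be made concrete with a small simulation. The sketch below (function names and parameter values are illustrative, not taken from the experiments in this work) generates one-dimensional streams exhibiting abrupt and gradual drift:

```python
import random

def abrupt_drift_stream(n, tau, mean_a=0.0, mean_b=3.0, seed=42):
    """Stream whose generating mean changes suddenly at time tau (abrupt drift)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_a if t < tau else mean_b, 1.0) for t in range(n)]

def gradual_drift_stream(n, tau_1, tau_2, mean_a=0.0, mean_b=3.0, seed=42):
    """Between tau_1 and tau_2 the new concept gradually replaces the old one:
    each sample is drawn from concept B with probability rising from 0 to 1."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        if t < tau_1:
            p_b = 0.0
        elif t >= tau_2:
            p_b = 1.0
        else:
            p_b = (t - tau_1) / (tau_2 - tau_1)  # share of the new concept
        mean = mean_b if rng.random() < p_b else mean_a
        stream.append(rng.gauss(mean, 1.0))
    return stream
```

A reoccurring concept could be simulated the same way by switching back to `mean_a` at a later point in the stream.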
When a concept drift occurs, the predictions made by models become less accurate with time and
hence the need to detect the presence of this drift and adapt the model to it. A simple way of
detecting drift is the Page Hinkley test (Page, 1954), which computes the mean of an input variable
(observed data or prediction accuracy) up to the current moment. As soon as the variable differs
significantly from its historical average, a change is flagged. More recent approaches to detecting
drift adopt a sliding window approach, in which relevant statistics are computed over the window.
Adaptive Windowing (ADWIN) (Bifet & Gavaldà, 2007) relies on two detection windows and analyses
the average of some statistic; as soon as the two windows become distinct (for the observed statistic), a
flag is triggered. Drift Detection Method (DDM) (Gama, et al., 2004) is a sliding window method that
works on the premise that a learner’s error rate will decrease as the number of analysed samples
increases for a constant distribution. A warning zone or flag is triggered as soon as the algorithm
detects an increase in the error rate. Early Drift Detection Method (EDDM) (Baena-Garcia, et al.,
2006) aims to improve upon the detection rate of concept drift in DDM by keeping track of the
average distance between two errors instead of only the error rate.
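As an illustration of the idea behind the Page Hinkley test, a minimal implementation is sketched below (the parameter values `delta` and `lam` are illustrative defaults, not the settings used in this dissertation):

```python
class PageHinkley:
    """Minimal Page-Hinkley test for detecting an upward shift in the mean of a
    monitored variable (e.g. a model's error): delta is the tolerated magnitude
    of change, lam the detection threshold."""

    def __init__(self, delta=0.005, lam=10.0):
        self.delta = delta
        self.lam = lam
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_min = 0.0  # running minimum M_t

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n        # incremental mean
        self.cum += x - self.mean - self.delta       # accumulate deviation
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lam  # True => drift flagged
```

Feeding it a stable error signal keeps the cumulative deviation near its minimum; a sustained rise in the signal quickly pushes the gap past `lam` and flags a change.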
It is essential for models to adapt when a drift is detected. In this work, we make use of the definition
of adaptation by Bakirov (2017), as the process of updating a model’s training data coverage,
structure, and parameters to improve its predictive accuracy in response to changes in the data
stream. Algorithms may adapt by changing their structure (e.g., the Concept-Adapting Very Fast Decision
Tree (Hulten, et al., 2001)), their data coverage (e.g., K-Nearest Neighbours) or their model
parameters.
In this work we make use of parameter adaptation: model hyperparameters are updated
in our ensemble techniques (more in Section 3.4.1).
2.2.4 Ensemble methods of Adaptation
Originally proposed in 1969 for forecasting on stationary airline passenger data (Bates & Granger,
1969), ensemble methods have become a mainstream approach for prediction in offline and online
settings. An ensemble (in the context of machine learning) is a combination of multiple models for
prediction (Ruta & Gabrys, 2000). Research shows that these models can have a higher prediction
accuracy (Freund & Schapire, 1997); (Dietterich, 2000) and better generalisability (Breiman, 1996)
when compared with single “experts” on stationary data. Recent studies have also shown that the
same is true in a streaming setting (Kadlec & Gabrys, 2011); (Shao & Tian, 2015).
This method of adaptation involves changing the combination weights of experts in an ensemble. As
seen in (Kuncheva, 2004), combination can be done by using the global weights of experts (known
as fusion) or by selecting a single learner (known as selection). The special case where all experts
have a weight of 0 except one can also be termed selection. Mathematically, consider a set of $I$
experts $S = \{s_1, \ldots, s_I\}$ with weight vector $\boldsymbol{\omega} = \{\omega_1, \ldots, \omega_I\}$ and
predictions $\hat{\boldsymbol{y}} = \{\hat{y}_1, \ldots, \hat{y}_I\}$, $\forall i = 1 \ldots I$. For a
classification problem with labels $C = \{c_1, \ldots, c_J\}$, $\forall j = 1 \ldots J$, the aggregated weight
of the experts that predicted $c_j$ is $z_j = \sum_{i=1}^{I} \omega_i a_{i,j}$, and the final prediction is given as:

$$\hat{y} = \underset{c_j}{\operatorname{argmax}}\ z_j$$  (Eq. 4)
For regression, the ensemble prediction is the weighted sum of the experts, given as:

$$\hat{y} = \frac{\sum_{i=1}^{I} \omega_i \hat{y}_i}{\sum_{i=1}^{I} \omega_i}$$  (Eq. 5)
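Equations 4 and 5 translate directly into code. A minimal sketch of both fusion rules (the function names are ours):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Classification fusion (Eq. 4): sum the weights of the experts voting for
    each label and return the label with the largest aggregated weight."""
    z = defaultdict(float)
    for y_hat, w in zip(predictions, weights):
        z[y_hat] += w
    return max(z, key=z.get)

def weighted_average(predictions, weights):
    """Regression fusion (Eq. 5): weight-normalised sum of the expert outputs."""
    return sum(w * y for y, w in zip(predictions, weights)) / sum(weights)
```

Selection is the special case where one expert carries all the weight, so both functions simply return that expert's prediction.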
Ensembles can also be adapted by changing their structure. In this method of adaptation, a base
learner is added or removed. Addition of experts to the ensemble can be done to bring new learning
concepts to the model or dispose of old data (Bakirov, 2017). This method of adaptation is suitable
when other methods of adaptation (e.g., weight combination) do not produce satisfactory results.
Addition or removal of experts can be done on a regular basis (e.g., according to a schedule) or
according to a defined trigger mechanism (e.g., when a condition is satisfied). When the latter occurs,
it typically signifies that there is a possible ongoing change in the data. New experts may be
introduced after every misclassification while reducing the weights of underperforming experts; (Kolter
& Maloof, 2005) showed that this method of adaptation can yield good results. However, to address
the problem of increased complexity of the ensemble that could arise from this approach, (Kolter &
Maloof, 2007) proposed adding an expert at every $\Omega$-th misclassified instance. (Kolter & Maloof, 2005)
propose adding a new expert for the regression case if $|\hat{y}_t - y_t| > \zeta$, where $\hat{y}_t, y_t$ are the
predicted and real values respectively at time $t$, and $\zeta$ is the predefined threshold for triggering the
addition of an expert. Slight variations of these methods of expert addition are used in this
dissertation. Other (trigger) methods of adding experts to ensembles are discussed in (Bakirov &
Gabrys, 2013); (Bakirov, 2017). Expert addition according to a defined schedule or fixed interval,
regardless of the performance of the model, is also another popular strategy (Scholz & Klinkenberg,
2007); (Elwell & Polikar, 2011); (Gomes & Araújo, 2015a).
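The trigger-based addition and oldest-first removal strategies described above can be sketched as follows for the regression case. The factory `make_expert`, the threshold `zeta` and the size cap are illustrative assumptions rather than the exact mechanisms of any cited method:

```python
class TriggerEnsemble:
    """Sketch of trigger-based ensemble adaptation for regression: a new expert
    is added whenever the ensemble error exceeds a threshold zeta, and the
    oldest expert is removed once a maximum size is reached. `make_expert` is a
    hypothetical factory returning objects with predict(x) and learn(x, y)."""

    def __init__(self, make_expert, zeta=1.0, max_experts=5):
        self.make_expert = make_expert
        self.zeta = zeta
        self.max_experts = max_experts
        self.experts = [make_expert()]

    def predict(self, x):
        # Unweighted average of the experts, for simplicity
        return sum(e.predict(x) for e in self.experts) / len(self.experts)

    def learn(self, x, y):
        if abs(self.predict(x) - y) > self.zeta:  # trigger condition fired
            self.experts.append(self.make_expert())
            if len(self.experts) > self.max_experts:
                self.experts.pop(0)               # drop the oldest expert
        for e in self.experts:
            e.learn(x, y)
```

A scheduled strategy would instead add an expert every fixed number of samples, regardless of the error.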
Experts are often removed from an ensemble due to changes in the data, unsatisfactory
performance (Gomes & Araújo, 2015b), their age (Hazan & Seshadhri, 2009), or when an old expert
is replaced by a new expert due to poor performance (Street & Kim, 2001). The different strategies
for adapting ensembles are illustrated in Figure 2.
The application of AutoML in online settings has gained attention amongst the machine learning
research community. This was evidenced in the Neural Information Processing Systems (NIPS) 2018
AutoML Challenge (Codalab, 2018), formulated as a Lifelong AutoML problem. (Madrid, et al., 2018)
propose a solution to the challenge by extending the Autosklearn library (Feurer, et al., 2015) with a
Fast-Hoeffding Drift Detection Method (FHDDM). In the proposed solution, the model is either
improved or replaced when a drift is detected. The best results from this approach are obtained by
training a fresh model on the entire dataset, which may not be possible in a big data application (due
to memory limitations) and does not satisfy the problem defined in Section 1.2. Another prominent
solution to the AutoML challenge is an adaptive self-optimized end-to-end machine learning pipeline
proposed by (Wilson, et al., 2020). Their solution relies on boosted Decision Trees, with automated
hyperparameter tuning. Unlike the previous solution, it lacks an explicit algorithm for drift detection,
it instead depends on the implicit drift detection of the LightGBM library (Ke, et al., 2017) to address
change. Although this solution produced very good results, it is somewhat limiting in the context of
the CASH or AutoML problem and lacks flexibility for other use cases.
Beyond the NIPS 2018 AutoML challenge, (Bakirov, et al., 2018); (Bakirov, et al., 2021)
propose the automation of the selection of an adaptation strategy for a given stream learning
algorithm whenever a drift occurs. These methods of automated adaptation are however only
applicable to a single model. (Wu, et al., 2021) propose the ChaCha (Champion-Challenger)
algorithm built on FLAML (Wang, et al., 2019) for automatically finding hyperparameters in an online
setting by considering only one base learner at a time. Hence it does not fully address the CASH
problem defined in Section 1.2.
(Celik & Vanschoren, 2021) investigate the performance of several AutoML methods under
concept drift and propose several strategies for their adaptation when a drift occurs. This is done by
integrating multiple adaptation strategies with well-known AutoML techniques. This work led to the
development of the OAML framework (Celik, et al., 2022), an adaptive AutoML framework for online
learning. This technique addresses the AutoML / CASH problem (with memory constraints) by
automatically searching for optimal pipelines (inclusive of pre-processing steps) that use online
learning algorithms to adapt to concept drift. Furthermore, it is fitted with an explicit drift detection
algorithm (EDDM) that triggers an adaptation (redesign or retuning of pipelines) when certain drift
conditions are met. While the results corroborate that AutoML in an online setting can recover from
concept drifts and yield competitive results, the framework is limited to only classification tasks,
makes use of offline pre-processors (which are not applicable in an online setting) and does not
explore different methods of ensemble adaptation. The extended OAML system proposed in this
dissertation explores multiple ensemble adaptation mechanisms, includes online pre-processors and
is (at the time of writing) the first to propose an AutoML system capable of creating full pipelines for
both regression and classification tasks.
A key part of any machine learning system is its evaluation methodology. Evaluation of a learning
system serves two purposes: to assess the hypothesis inside the learning system and to estimate
how applicable the learning system is to a given problem. In an online setting however, the data
stream is unbounded, and classic methods for evaluating machine learning systems such as cross-
validation and train-test split are not applicable (Gama, et al., 2009). To address this problem, two
common methods of evaluating an online learning model presented in literature are described below:
1. Periodic Holdout: This technique involves holding out an independent test set at scheduled
intervals for testing the online model. The test data (of a predefined size) is held out from the
data stream and renewed after a given time frame or number of data instances. Hence, the data used
in the test set is never used for training.
2. Prequential Selection (Dawid, 1984): In this method, the online model makes a prediction
on each data sample (testing) before training. Hence, each data sample has two functions:
testing the online model and then training the online model. Before training, predictions are
made based on the attribute-values of data instances. Afterwards, the prequential-error is
computed according to a loss function between the observed value and predicted value and
its metrics are updated. Unlike the periodic holdout technique, all data samples are used for
training, and no additional memory is allocated for a holdout set.
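The prequential procedure can be written as a short test-then-train loop. A minimal sketch, assuming the online model exposes `predict` and `learn` methods (illustrative names):

```python
def prequential_mae(model, stream):
    """Prequential (test-then-train) evaluation: each sample is first used to
    test the model, then to train it. Returns the mean absolute error.
    `model` is assumed to expose predict(x) and learn(x, y)."""
    total, n = 0.0, 0
    for x, y in stream:
        total += abs(model.predict(x) - y)  # test first...
        model.learn(x, y)                   # ...then train on the same sample
        n += 1
    return total / n if n else 0.0
```

Every sample contributes to both evaluation and training, and no holdout buffer is ever stored, which is what makes the scheme suitable for unbounded streams.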
3 METHODOLOGY
The nature of the problem addressed in this project can be divided into two main groups: algorithm
design (to solve the CASH problem) and software development (for the implementation of the online
AutoML framework). Thus, the methodology used in this dissertation differs from a traditional data
science project where already existing algorithms are chosen to address a specific business problem
on a given dataset. This dissertation is instead focused on the development of a novel framework
for addressing the online AutoML problem capable of handling concept drift for all supervised
learning tasks. The proposed solution is built on the OAML framework (Celik, et al., 2022), and
includes techniques derived from an extensive literature survey. The justification for adopting the OAML
framework is that it is the only available framework (at the time of writing this dissertation) that
implements an automated system for online learning that fully addresses the CASH problem defined
in Section 1.2 (although limited to classification tasks alone). Furthermore, it is an Open-Source
framework and provides flexibility for modification and extension, hence making it suitable for
achieving the goals of this dissertation.
The complexity, originality and relevance of this dissertation are evidenced by the growing
research interest in the fields of AutoML (Chen, et al., 2022) and stream learning (Bakirov, et al.,
2021). Whilst the use of AutoML in an online setting is explored in recent literature for classification
tasks (Celik, et al., 2022); (Wu, et al., 2021); (Wilson, et al., 2020), the application of online AutoML
for regression problems remains unexplored. Hence, one of the key contributions of this project is to
fill the knowledge gap and contribute a fully Open-Source framework for online AutoML for all
supervised learning tasks.
3.1 Overview
The OAML framework (Celik, et al., 2022) is chosen as a suitable framework (justification in Section 3)
and extended with a regression search space (Section 3.2),
sound ensemble adaptive mechanisms, online pre-processing techniques and techniques for
handling online regression tasks. The goal of the search phase is to construct an end-to-end machine
learning pipeline comprising one or more estimators, including pre-processing and/or prediction
steps with their associated hyperparameters. This combined algorithm selection and
hyperparameter optimization step is achieved via the use of selected optimization algorithms described
in Section 3.3. The single best pipeline found is assigned as the online model for prediction. Data is
streamed in an incremental one-by-one fashion and when a drift is detected, the online model adapts
to this change using a pre-defined strategy: by automatically starting a new search for better
pipelines. We implement both the scheduled and trigger-based adaptation strategies for handling
concept drift in the data stream (Section 3.4).
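The loop described above (initial search, one-by-one prediction, drift-triggered re-search) can be summarised schematically. All names here (`search`, `detector`, `batch_size`) are placeholders for the actual OAML components rather than the framework's API:

```python
def online_automl(stream, search, detector, batch_size):
    """Schematic OAML loop: run an AutoML search on an initial batch, then
    predict/learn one sample at a time and re-run the search on the last seen
    batch whenever the drift detector fires."""
    stream = iter(stream)
    batch = [next(stream) for _ in range(batch_size)]  # initial batch
    model = search(batch)                              # AutoML stage
    errors = []
    for x, y in stream:                                # online learning stage
        y_hat = model.predict(x)
        errors.append(abs(y_hat - y))                  # test-then-train
        model.learn(x, y)
        batch = batch[1:] + [(x, y)]                   # sliding window of data
        if detector(y_hat, y):                         # drift detected:
            model = search(batch)                      # redesign the pipeline
    return model, errors
```

In the actual system the search is budgeted, the detector is a stateful algorithm such as EDDM, and a scheduled re-search can fire even when no drift is flagged.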
The new system is evaluated on industry standard datasets for stream learning and compared with
existing baselines (Section 5). The obtained results are critically evaluated against the research
questions defined in Section 1.3.
3.2 Search Space Design
The search space for the proposed OAML system comprises a large set of online pre-processors,
online regressors, online classifiers and online ensemble methods. These algorithms have all been
implemented in River (Montiel, et al., 2020), an online machine learning library written in Python. The River
implementations of these algorithms have been chosen because the library supports incremental learning
(suitable for streaming data) and includes inherent adaptive methods (for handling drift). For
classification tasks, we maintain the same classifier algorithms included in the search space of the
original OAML framework (Celik, et al., 2022), since they showed very good results in handling
concept drift.
The regression algorithms included in the search space comprise simple regressors and
adaptive tree/ensemble methods that are capable of handling concept drift. The selection of
algorithms includes the Adaptive Random Forest Regressor (Gomes, et al., 2018), Hoeffding Adaptive
Tree Regressor (Bifet & Gavaldà, 2009), Oza Bagging Regressor (Oza & Russell, 2005) and Exponentially
Weighted Average Regressor (Kivinen & Warmuth, 1997). The experts used in the ensemble methods
are online versions of Linear Regression and the Hoeffding Tree Regressor. Furthermore, we have also
included a simple version of the Linear Regressor as an independent learner in the search space, since
the literature shows good results can be obtained by switching to simple methods when a drift occurs
(Baier, et al., 2020).
Although the search space of the original OAML framework includes online pre-processors
for data normalization and scaling, such as the Adaptive Standard Scaler and Robust Scaler, amongst others
from the River machine learning library, these pre-processors were never implemented into
the AutoML system (and hence are absent from the final AutoML pipelines). In this work, we maintain
the same pre-processors for data scaling and normalization and implement them into the AutoML
system, making them available for selection by the AutoML pipeline. In addition to this, we extend
the search space with online versions of pre-processing algorithms for missing value imputation and
categorical variable encoding. The selected algorithms include the Previous Imputer and One Hot
Encoder.
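The behaviour of such online pre-processors can be illustrated with simplified stand-ins (these are sketches of the idea, not the River implementations):

```python
class OnlineOneHot:
    """Online one-hot encoder sketch: each categorical value becomes a binary
    feature key on the fly, so no upfront list of categories is needed; absent
    keys are implicitly zero (a sparse representation)."""

    def transform_one(self, x):
        # x: dict of feature -> categorical value
        return {f"{k}_{v}": 1 for k, v in x.items()}

class PreviousImputer:
    """Previous-value imputer sketch: a missing feature (None) is replaced with
    the last value observed for that feature (None if never observed)."""

    def __init__(self):
        self.last = {}

    def transform_one(self, x):
        out = {}
        for k, v in x.items():
            if v is None:
                v = self.last.get(k)  # fall back to the last seen value
            else:
                self.last[k] = v      # remember the latest observed value
            out[k] = v
        return out
```

Both operate one sample at a time with bounded state, which is what makes them usable inside a streaming pipeline.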
Full descriptions and details of all algorithms included in the search space for pre-processing,
regression and classification are available in River (Montiel, et al., 2020).
[Table: Model, Hyperparameters, Search range for the classification algorithms]
3.3 Combined Algorithm Selection and Hyperparameter Optimization
The choice of an optimization algorithm for solving the CASH problem is a key part of any AutoML
system. While the GAMA library allows for the implementation of custom search algorithms, this work
makes use of the default search algorithms provided by the library due to their reported speed and
effectiveness (Gijsbers & Vanschoren, 2020).
The following search algorithms (implemented in GAMA) are available for solving the CASH problem.
Note that for the classification task the objective function is framed as a maximization problem (e.g.,
maximization of accuracy), while regression is framed as a minimization problem (e.g., minimization
of the root mean squared error).
Random Search (RS): In this search method, a machine learning pipeline is sampled at random
from the search space and evaluated. This naive optimization technique is effective because it does
not make any assumptions about the structure of the objective function and allows non-intuitive
combinations of hyperparameters to be discovered.
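Random search over a CASH-style search space can be sketched as follows. The algorithm names, hyperparameter grids and loss function below are illustrative stand-ins, not the GAMA API; only the minimization framing matches the regression setting described above.

```python
import random

# Toy CASH search space: algorithm names and hyperparameter grids (illustrative).
SEARCH_SPACE = {
    "hoeffding_tree": {"grace_period": [50, 100, 200], "delta": [1e-7, 1e-5]},
    "linear_regression": {"l2": [0.0, 0.01, 0.1]},
}

def sample_pipeline(rng):
    """Sample one (algorithm, hyperparameters) configuration uniformly at random."""
    algo = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values) for name, values in SEARCH_SPACE[algo].items()}
    return algo, params

def random_search(evaluate, n_iter=20, seed=42):
    """Keep the sampled configuration with the lowest loss (regression setting)."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(n_iter):
        candidate = sample_pipeline(rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss

# Stand-in objective: pretends smaller grace periods and stronger l2 lower the error.
def fake_loss(candidate):
    algo, params = candidate
    return params.get("grace_period", 100) * 0.01 + params.get("l2", 0.05)

best, loss = random_search(fake_loss)
```

In the real system the objective would be the prequential score of the candidate pipeline on the initial batch, evaluated within the time budget.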
An overview of the proposed OAML system is shown in Figure 1. The system is divided into two
stages: the AutoML stage (left in Figure 1) and the online learning stage (right in Figure 1). The
system is initialised with an initial batch of b_init samples from a stream of data X, y, and a supervised
learning task (classification or regression). For the assigned learning task, the search algorithm S
trains and evaluates machine learning pipeline configurations from the search space with a metric
M_p over b_init within a time budget t_max. It is worth noting that different search spaces are used for
regression and classification tasks (as shown in Figure 4).
The best machine learning pipeline, P*_0, found at time t_max is trained, fitted to b_init and set as
the current online model A_0. In the online learning stage, the data is streamed on an incremental,
one-by-one basis; that is, P*_0 makes a prediction ŷ_t for the data features X_t. After the prediction,
the true value of the target, y_t, becomes available, the online model is evaluated and the metric M_o
is updated. To check for concept drift, the explicit drift detector, D_drift, is updated with the values (ŷ_t, y_t).
When a drift is detected, the system activates the AutoML phase to search for new pipelines
with the last seen batch of b_init samples from the data stream. A new AutoML pipeline search may
also be triggered by a scheduled model update scheme if the pipeline does not change after k
iterations.
In this adaptation strategy, the online model is replaced with a new model when a trigger point is
reached at time t. At the start of the stream and while (ŷ_t, y_t) does not trigger the drift detector, the best-
found pipeline P*_0 from the search phase is incrementally trained on (X_t, y_t). When a trigger point is
reached, the AutoML phase is restarted to search for a new pipeline P*_t over the last sliding window
((X_{t−b_sw}, y_{t−b_sw}), …, (X_t, y_t)). If the performance of P*_t is better than that of the current pipeline P*_0 on the evaluation
metric, then P*_0 is discarded and the online model is set to P*_t; otherwise, P*_0 remains the active online
model and the stream learning phase resumes. This global replacement strategy is suitable for data
with frequent and sudden concept drift. Furthermore, this strategy has low memory requirements,
since only a controlled volume of data is stored at a given time. Hence, it is suitable for applications
with limited memory resources. A drawback of this approach, however, is that old models which may
carry useful information for future data instances are completely discarded. The pseudocode for this
step is provided in Algorithm 1.
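The Basic strategy can be sketched as a prequential loop with global replacement. This is a simplified illustration rather than the Algorithm 1 pseudocode: the `MeanModel` pipeline and the threshold detector are stand-ins for the AutoML search phase and a real detector such as ADWIN.

```python
from collections import deque

def basic_strategy(stream, automl_search, drift_detector, window_size=200):
    """Prequential loop with global model replacement on a trigger.

    `automl_search(batch)` stands in for the AutoML search phase (it returns a
    fitted pipeline) and `drift_detector(error)` for e.g. ADWIN (it returns
    True when a change is signalled).
    """
    window = deque(maxlen=window_size)  # last b_sw samples kept for retraining
    model, errors = None, []
    for x, y in stream:
        window.append((x, y))
        if model is None:                       # initial batch: first search
            model = automl_search(list(window))
            continue
        y_hat = model.predict_one(x)            # test ...
        err = abs(y - y_hat)
        errors.append(err)
        if drift_detector(err):                 # drift: search again on the
            model = automl_search(list(window)) # last window, replace globally
        else:
            model.learn_one(x, y)               # ... then train incrementally
    return model, errors

class MeanModel:
    """Stand-in pipeline: predicts a (slowly adapting) running mean."""
    def __init__(self, batch):
        ys = [y for _, y in batch]
        self.mean = sum(ys) / len(ys)
    def predict_one(self, x):
        return self.mean
    def learn_one(self, x, y):
        self.mean += 0.1 * (y - self.mean)

# Abrupt drift at t = 50: the target jumps from 1.0 to 10.0.
stream = [({}, 1.0)] * 50 + [({}, 10.0)] * 50
model, errors = basic_strategy(stream, MeanModel, drift_detector=lambda e: e > 5)
```

The key property illustrated is the global replacement: once the trigger fires, the old model is discarded entirely and a freshly searched model fitted on the last window takes over.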
The Ensemble strategy follows a similar procedure to the Basic strategy except for its adaptation.
Instead of a global replacement scheme for pipelines, a set of n pipelines (referred to as experts) with
the highest prediction accuracy is maintained, E = {P*_i}. The extended OAML system introduces three
modes of adaptation for the Ensemble strategy.
1. Unweighted Ensemble: In this mode of adaptation, each expert in E is assigned a weight
equal to 1. Thus, expert votes are weighted equally.
2. Dynamic Weighted Majority (for classification tasks only): Each expert in the ensemble is
initialised with a weight of 1. When misclassifications are made by experts, their weights are
adjusted by a constant β, where 0 < β < 1. All expert weights are normalised after each
prediction to limit the prevalence of freshly introduced experts in the ensemble. Expert
predictions are aggregated according to the weight of each expert in the ensemble.
3. Additive Expert (AddExp)
a. AddExp for Discrete Classes (AddExp.D): Experts are initialised with a weight of 1
(when the ensemble is empty) or with the sum of the expert weights in the ensemble times a
constant γ. After a prediction is made, experts that predicted incorrectly have their
weights reduced by a multiplicative constant β, and all expert weights are normalised.
b. AddExp for Continuous Classes (AddExp.C): The overall operation here is similar to
AddExp.D; however, it is suitable only for regression tasks. AddExp.C predicts the
sum of all expert predictions, weighted by their relative expert weights. Weights are
updated according to ω_{t+1,i} = ω_{t,i} β^{|ŷ_{t,i} − y_t|}.
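The AddExp.C behaviour can be sketched as follows. This is a simplified illustration: β and γ follow the description above, while the pruning and weight-normalisation details of the full algorithm are omitted, and the `Constant` experts exist only to exercise the update rule.

```python
class AddExpC:
    """Additive-expert ensemble for regression (AddExp.C sketch).

    Simplified illustration: predictions are combined by relative weight and
    weights shrink by beta ** |expert error|; pruning/normalisation omitted.
    """

    def __init__(self, beta=0.5, gamma=0.1):
        self.beta, self.gamma = beta, gamma
        self.experts, self.weights = [], []

    def add_expert(self, expert):
        # New experts enter with weight gamma * (total ensemble weight), or 1 if empty.
        weight = self.gamma * sum(self.weights) if self.weights else 1.0
        self.experts.append(expert)
        self.weights.append(weight)

    def predict_one(self, x):
        total = sum(self.weights)
        return sum(w * e.predict_one(x) for w, e in zip(self.weights, self.experts)) / total

    def learn_one(self, x, y):
        # Multiplicative update: w_{t+1,i} = w_{t,i} * beta ** |y_hat_{t,i} - y_t|
        self.weights = [
            w * self.beta ** abs(e.predict_one(x) - y)
            for w, e in zip(self.weights, self.experts)
        ]
        for e in self.experts:
            e.learn_one(x, y)


class Constant:
    """Fixed-prediction expert used only to exercise the ensemble."""
    def __init__(self, c): self.c = c
    def predict_one(self, x): return self.c
    def learn_one(self, x, y): pass


ens = AddExpC()
ens.add_expert(Constant(0.0))  # poor expert when the target is 1.0
ens.add_expert(Constant(1.0))  # good expert
for _ in range(10):
    ens.learn_one({}, 1.0)     # the poor expert's weight decays as 0.5**t
```

After a few updates the poor expert's weight has decayed multiplicatively, so the ensemble prediction is dominated by the accurate expert even though it entered with the small initial weight γ.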
For all three modes of adaptation, when the OAML system is triggered at time t (due to a drift or
schedule), a new pipeline search begins and the best-found pipeline P*_t is compared with the backup
ensemble, E_t. If the predictive performance of E_t is better than that of P*_t over the last sliding window,
the current online model P*_{t−1} is replaced with E_t. Furthermore, instead of discarding P*_t, it is added to the
ensemble.
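This trigger-time choice between the new pipeline and the backup ensemble can be sketched as below; the function name, the toy scorer and the string-valued pipelines are illustrative stand-ins for the OAML components.

```python
def on_trigger(new_pipeline, ensemble, window, score, max_size=10):
    """Pick the active online model after a triggered pipeline search.

    `ensemble` is a list of experts and `score(model, window)` a prequential
    loss over the last sliding window (lower is better); both are stand-ins.
    """
    if score(ensemble, window) <= score(new_pipeline, window):
        # Backup ensemble wins: it becomes the online model, and the freshly
        # found pipeline joins it as an expert instead of being discarded.
        if len(ensemble) < max_size:
            ensemble.append(new_pipeline)
        return ensemble
    return new_pipeline

# Toy scorer: an ensemble scores the mean of its experts' fixed losses.
losses = {"old_expert": 0.3, "new_pipe": 0.6}
def score(model, window):
    if isinstance(model, list):
        return sum(losses[m] for m in model) / len(model)
    return losses[model]

active = on_trigger("new_pipe", ["old_expert"], window=None, score=score)
```

Note how, unlike the Basic strategy, the losing pipeline is not thrown away: it is retained as a new expert in the ensemble.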
When the ensemble size limit n is reached, the new model is added to the ensemble by
replacing the worst-performing model. It is important to note that the original OAML system replaces
the oldest model first; however, we adopt our approach with the rationale that the oldest expert may
have a better predictive performance than some newly added experts (e.g., when a bad expert is
added to the ensemble). The Ensemble strategy addresses the shortcomings of the Basic strategy
and helps to retain models that may have a good predictive performance on future data instances.
The pseudocode of the ensemble mode is shown in Algorithm 2.
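The worst-expert replacement rule described above can be sketched as follows; the pipeline names and score values are illustrative, and the `(pipeline, score)` pair representation is an assumption of this sketch.

```python
def add_to_ensemble(experts, new_expert, max_size):
    """Add a pipeline to the ensemble; when full, replace the worst expert.

    `experts` holds (pipeline, prequential_score) pairs where lower scores are
    better (e.g. RMSE).  The original OAML drops the oldest expert instead;
    this sketch implements the worst-expert variant described above.
    """
    if len(experts) < max_size:
        experts.append(new_expert)
    else:
        worst = max(range(len(experts)), key=lambda i: experts[i][1])
        experts[worst] = new_expert
    return experts

ensemble = [("pipe_a", 0.40), ("pipe_b", 0.90), ("pipe_c", 0.55)]
add_to_ensemble(ensemble, ("pipe_new", 0.30), max_size=3)  # "pipe_b" is evicted
```

Under the oldest-expert rule, `pipe_a` would have been evicted instead, even though it is currently the best-performing expert in the ensemble.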
A well-designed Online AutoML system should improve with experience from observed data and be
able to generate compact and approximate representations of its observed values. In this work, we
make use of the Prequential Evaluation technique (Section 2.4) to evaluate the effectiveness of our
proposed system. The interleaved test-then-train method is chosen over the Holdout technique since it
makes use of all the data available in the stream, thereby avoiding test selection bias. Furthermore,
it does not depend on the choice of a test set and provides good error estimates in the presence of
concept drift (Gama, et al., 2009). Mathematically, this is calculated as:
P_e(i) = (1/i) Σ_{k=1}^{i} L(y_k, ŷ_k) = (1/i) Σ_{k=1}^{i} e_k    (Eq. 6)
In our OAML system, each observation in the data stream consists of attribute–target pairs (X, y)
and the following steps are performed in the prequential evaluation process:
1. Make a prediction, ŷ_k, for a single observation of X
2. Compute the prediction loss given the observed target y_k
3. Update the model with the observation (X_k, y_k)
4. Proceed to the next observation
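These four steps can be sketched as a short loop; the trivial `RunningMean` model exists only to exercise the loop and is not part of the OAML system.

```python
def prequential_mae(stream, model):
    """Interleaved test-then-train: predict, score, then learn from each sample."""
    total, history = 0.0, []
    for i, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict_one(x)   # 1. predict for a single observation
        total += abs(y - y_hat)        # 2. accumulate the prediction loss
        model.learn_one(x, y)          # 3. update the model with (x, y)
        history.append(total / i)      # running prequential error P_e(i)
    return history

class RunningMean:
    """Trivial incremental model used only to exercise the loop."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def predict_one(self, x):
        return self.mean
    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

history = prequential_mae([({}, 2.0)] * 5, RunningMean())
```

Because every sample is tested before it is used for training, the running average in `history` is exactly the prequential error P_e(i) of Eq. 6 with an absolute-error loss.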
Our OAML system allows users flexibility in the choice of a loss metric. Loss in regression tasks
can be computed using the Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE),
while the loss in classification tasks can be computed using the F1, Accuracy, Precision or Recall
scores.
The proposed system is evaluated on 12 benchmark data streams for regression and classification
tasks that are well known in the online learning literature. For each prediction task (classification and
regression), we make use of 3 real-world data streams from industrial processes, which are volatile
and susceptible to high and abrupt drift, and 3 artificial data streams. A description of each data
stream is given in Table 3.
SEA Mixed (Street & Kim, 2001): contains data generated from a Streaming Ensemble Algorithm
(SEA). Abrupt concept drift is added as in the SEA Abrupt data.
Hyperplane (Hulten, et al., 2001): data generated from a Rotating Hyperplane Algorithm (ROA).
By determining the orientation of a point on a rotating hyperplane, a binary classification problem
is formed. Change of concept is introduced by changing the weights and reversing the direction
of the hyperplane.
Catalyst activation (Strackeljan, 2006): a real-world condition-based simulation of highly volatile
catalyst activation in a multi-tube reactor. Flows, concentrations, and temperatures obtained from
14 sensor measurements determine catalyst activity. The data spans 1 year.
Sulfur Recovery (Fortuna, et al., 2003): data from a sulfur recovery unit. Gas and air flow
measurements determine the SO2 output in the recovery unit.
Debutaniser Column (Fortuna, et al., 2005): collected from a debutaniser column; temperature,
pressure, and flows determine the concentration of the gas at the output of the system.
Bank (Akujuobi & Zhang, 2017): a simulation of how customers choose their banks. The target is
the fraction of customers who leave the bank due to full queues.
4 EXPERIMENT DESIGN
In this chapter, we set up several experiments to evaluate our OAML system on data streams with
various kinds of drift, analyse the results and contrast them with the original OAML system (Celik, et al.,
2022). This is done with reproducibility in mind to aid future research work on online automatic
machine learning. The code, data streams and results are open source and publicly available on
GitHub.
For the empirical evaluation of our OAML system we used benchmark data from concept drift data
streams. Concept drift data streams are good for the evaluation of online learning systems because
they comprise a dynamic data distribution which is typical in an online setting. Table 4 shows a
summary of the characteristics of each data stream.
Similar to the original OAML system (Celik, et al., 2022), our system can be configured to run with
different user-defined settings. These settings may vary with the specific application of the system;
the default values used for experimentation are provided below.
1. Initial batch size (b_init) is set to 5000 samples for classification tasks and 500 for regression tasks.
Classification data streams have significantly more data samples (Table 4) than regression
data streams, hence the difference in b_init.
2. Sliding window size (b_sw) is set to 2000 samples for classification tasks and 200 for
regression tasks.
3. AutoML search budget (t_max) is set to 60 seconds for all prediction tasks. This value has
been chosen for the experimental setup only and should be increased for better performance
in a real-world setting.
4. Performance metric (M_p) is prequential accuracy for classification tasks and prequential
RMSE for regression tasks.
5. Online metric (M_o) is prequential accuracy for classification tasks and prequential RMSE
for regression tasks.
6. Search algorithm (OAML_search) can be configured to use ASHA, RS or AEO. While
experiments are conducted with AEO, the effect of the choice of a search algorithm on our
OAML system is also compared.
7. Ensemble adaptation mechanism can also be configured to use DWM, AddExp or
Unweighted Voting. While experiments are conducted with DWM, the effect of the choice of
an adaptation mechanism on our OAML system is also compared.
8. Drift detection is set to EDDM for classification tasks, due to its ability to detect high and
abrupt drift, and to ADWIN for regression tasks, since EDDM cannot be applied to a continuous
target.
9. Alternative detector (alt_trigger) is set to 50000 samples for classification tasks and 5000
samples for regression tasks.
4.3 Baselines
The performance of our OAML system is compared with competitive alternative techniques in online
learning and AutoML. For online learning baselines, we include Leverage Bagging algorithms (Bifet
& Gavaldà, 2007; Bifet & Ricard, 2009), which have been shown to outperform other stream learning
algorithms in the literature, and Hoeffding Adaptive Trees (Hulten, et al., 2001) as a competitive non-
ensemble algorithm. ChaCha (Wu, et al., 2021) and the original OAML (Celik, et al., 2022) are (at
the time of writing) the only freely available AutoML algorithms capable of handling concept drift,
although they are limited to classification tasks only. An overview of the baselines is presented in
Table 5.
5 RESULTS
The results of the designed experiments (Section 4) are discussed in this chapter. First, we compare
the different adaptation strategies against each other, the baseline techniques discussed in the previous
section, and the original OAML framework. These experiments are performed on real-world and
artificial data streams for regression and classification tasks. Next, the pipelines (and their individual
components) generated by our OAML system are analysed. In addition to this, we explore the
Ensemble Adaptation strategy: effect of the ensemble size, which adaptation mechanism performs
best and the best performing model replacement strategy. The effect of search algorithms on the
predictive performance of our OAML system is also comparatively analysed. Finally, we investigate
the effect of Online Pre-processing on our OAML system. It is important to note that each experiment
was run multiple times and the average scores are plotted below.
To determine how practical our OAML system is, it is important to evaluate it on real-world data. We
do this for the regression and classification data streams listed in Table 4.
For the catalyst activation data stream (Figure 6), the goal is to predict catalyst activity (a continuous
variable) inside a reactor. High drift may often occur due to complicated chemical processes such as
cooling and catalyst decay. The Hoeffding Adaptive Tree (HAT) Regressor performed best amongst the
other state-of-the-art online algorithms (Table 5) in this study, furthermore since this is the first
This improvement is more pronounced in the sulfur (Figure 8) stream. Both OAML strategies perform
significantly better than the baseline and are affected less by the sudden drift after ~4,000 samples.
Next, we evaluate the performance of our OAML system on classification problems. We also include
other competitive Online AutoML approaches, such as the original OAML (Celik, et al., 2022) and
ChaCha (Wu, et al., 2021). On classification data streams, Leverage Bagging performed the best
amongst the chosen stream learning algorithms used in this study and is thus used as a benchmark
for comparison. The Electricity data stream (Harries, 1999) is heavily autocorrelated, which is beneficial
to the Leverage Bagging algorithm, hence its high performance (Figure 9).
The variations in the behaviour of the learning algorithms above are caused by the numerous
kinds of drift present in Electricity. With Extended OAML, the trigger point is reached frequently
leading to several re-trainings. Extended OAML performs the best amongst the AutoML online
methods here. Further, the Basic method performs better than the Ensemble method for both
Extended OAML and the Original OAML (Celik, et al., 2022). Frequent re-training of OAML could
lead to the addition of experts with low predictive performance (given the limited time budget used
in this experiment), hence the lower performance of Ensemble methods. Although DWM performs
better than AddExp throughout the stream, AddExp recovers faster from the sudden drift after 20,000
samples. ChaCha suffers the most from concept drift and does not recover well. In general, Extended
OAML methods perform better than the original OAML for this stream.
The New Airlines (Figure 10) data stream contains cyclical and frequent drifting patterns. The
immediate drop in performance of the Leverage Bagging algorithm indicates the need for adaptation.
OAML, Extended OAML and ChaCha recover quickly from the concept changes and exhibit similar
adaptation schemes. The Basic and Ensemble methods follow the same pattern as in the Electricity
data stream, with DWM out-performing AddExp. We also observe that Extended OAML improves
on the performance achieved by OAML.
Figure 10: Prequential evaluation (Accuracy) for New Airlines data stream
On the Run or Walk data stream, drift is more gradual, with some occasional abrupt
changes. All the adaptation algorithms recover well and relatively quickly on this data stream, except
for the original OAML – Ensemble method. The sudden drops in performance around ~60,000 and
~80,000 samples can be attributed to the inclusion of bad experts during adaptation. As with the other
data streams, Extended OAML performs better than OAML here.
Synthetic datasets are important in evaluating the performance of online learning systems since the
presence of concept drift and its properties can only be ascertained here. We repeat the same steps
from the previous experiments on real-world data for regression and classification problems, using the
same algorithms and techniques.
5.2.1 Regression
The Friedman data stream (Breiman, et al., 1984) simulates different electrical states in an
alternating circuit; hence both gradual and incremental drift patterns exist. Extended OAML – Basic
and Ensemble have roughly the same behaviour in this stream and handle concept drift better than
the Leverage Bagging algorithm. These methods equally suffer from concept changes but recover
faster than the HAT Regressor. This further demonstrates the capabilities of our system in dynamic
regression applications.
There are fewer concept drifts in the Bank and 2D Planes data streams, allowing all
adaptation and online techniques to cope well. For the Bank data stream (Figure 13), Extended OAML
– Basic starts off poorly with a high RMSE but recovers well further down the stream. The HAT
regressor maintains a consistent performance but performs the worst amongst all three algorithms.
Extended OAML – Ensemble improves in predictive performance for most of the stream and
performs the best overall. This can be a result of good experts being added to the ensemble when a
trigger point is reached.
The Online HAT Regressor performs poorly on the 2D Planes data stream (Figure 14), while
both Extended OAML methods are indistinguishable from each other here.
Overall, the Extended OAML techniques perform very well on the synthetic benchmark
datasets and in all cases offer significant improvements on online learning algorithms.
The hyperplane data stream (Figure 15) is symbolic of a real-world scenario with time-changing
concepts. It consists mainly of high gradual drift introduced by the smooth rotation of a hyperplane
over time. As with the real-world data streams, the Extended OAML – Basic approach produces
better results than Ensemble methods with DWM performing better than AddExp. An interesting
phenomenon observed in Figure 15 is the speedy recovery of the AddExp technique after
experiencing a concept drift. Although AddExp and DWM have similar approaches for penalising
bad experts, AddExp strongly favours new experts in ensemble voting, hence we can attribute this
phenomenon to the addition of a new expert with a strong prediction accuracy. The Leverage
Bagging algorithm performs poorly here in comparison to the other techniques.
The SEA Mixed data stream (Figure 16) comprises both abrupt and gradual drift patterns. It
simulates a real-world dynamic environment where different kinds of drift persist. ChaCha, Extended
OAML – Basic, and Extended OAML – Ensemble (DWM) all perform very well and similarly here.
The sudden increase in performance of the original OAML – Ensemble is indicative of the addition
of a good expert during adaptation. Leverage bagging produces a flat curve and is unable to improve
on this data stream. For this data stream, we observe that our Extended OAML system out-performs
the original OAML system. SEA Abrupt (Figure 17) simulates a real-world high-drift scenario by
alternating between several classification functions. The results are similar to SEA Mixed; however,
the original OAML techniques offer competitive results here. The increase in performance of the
Ensemble techniques of the original OAML over the Extended version on this data stream could be
due to the equal voting technique it employs. It would be interesting to draw a comparison of
voting strategies and how they affect the performance of Online Automatic Machine Learning; this is
further discussed in Section 5.4.
Figure 16: Prequential evaluation (Accuracy) for SEA mixed data stream
Figure 17: Prequential evaluation (Accuracy) for SEA Abrupt data stream
The pipelines constructed by the Extended OAML system are discussed in this section. For the
Ensemble method, we examine whether the active online model is the Ensemble algorithm, or a
single pipeline built by the OAML Search. The system logically updates the active online model
between these two predictors (see Algorithm 2 for more details) depending on their prequential
performance. Figure 18 shows model updates for the Friedman and Sulfur data streams. Before an
ensemble is constructed, the system begins with the single best-found pipeline from the search
phase, hence all lines start as orange.
For the Friedman data stream, the active online model frequently switches from the Ensemble to the
AutoML optimizer. This change can be attributed to the incremental and gradual drift concepts that
persist in this data stream. After about 1,200 samples we observe a recovery from concept drift by
the newly activated Ensemble. The Sulfur data stream quickly activates the ensemble algorithm
and then goes through a short period of alternating between both models. The frequent changes are
caused by the presence of high drift in the given window, triggering the drift adaptation mechanism.
Next, we analyse the actual pipelines generated by the Basic method. Figure 19 shows the
model updates made for the Catalyst Activation stream and the Debutaniser stream. Drift points are
represented by pink markers, while different models are represented by dotted lines of different
colours. Gradual change of concept in the Catalyst Activation process triggers the adaptation of the
OAML system. The model starts out with a Hoeffding Adaptive Tree Regressor and is replaced by
an Adaptive Random Forest Tree Regressor after the first trigger point is reached. Note that
retraining here was triggered by the alternative detector (Section 3.4.1) and not by concept drift.
When the drift detector is triggered, the Leverage Bagging Regressor replaces the Adaptive Random
Forest Regressor, and remains active until the end of the stream. In the Debutaniser process, Figure
19 – (b), the Leverage Bagging Regressor replaces the Hoeffding Adaptive Tree Regressor initially,
however after a sudden change of concept the drift detector triggers the OAML search for new
pipelines to handle this change. Shortly afterwards another trigger point is reached, and the stream
ends with an Adaptive Random Forest Regressor.
It is worth noting that during the retraining process, Extended OAML may select a new pipeline containing the
current active online algorithm (but with different hyperparameters). In the case where a drift is
detected but the current online model remains active, it indicates that the newly constructed pipeline
performed worse on the current sliding window, and thus the active online model is unchanged.
In this section we focus on investigating the effects of various parameters on the performance of
Extended OAML – Ensemble. We start by varying the size of the ensemble, then we compare various
expert replacement strategies in the ensemble, finally we compare three popular voting strategies
and report the results. Experiments here are focused only on regression tasks, however, future work
should be done to investigate the effect of these properties on classification problems. We run tests
here with one real-world data stream and one synthetic data stream. Further, experiments were
performed with a sliding window size, b_sw, of 200 and a search budget, t_max, of 40 seconds for simplicity.
The experiments done in Section 5.1 and Section 5.2 used a fixed ensemble size of 10 experts. Here
we seek to find if increasing or reducing the number of experts in the ensemble offers any significant
improvement or reduction in performance. Figure 20-(a) shows the prequential performance on the
Catalyst Activation data for an Ensemble of 20, 10 and 3 experts. Although the ensemble behaves
similarly in all cases, the results show that the number of experts is proportional to the prequential
performance of the ensemble. This relationship is further demonstrated in Figure 20-(b) on the
Friedman data stream. This behaviour is expected and shows that in general, the search phase of
Extended OAML adds good experts to the ensemble. If the reverse case were true, then we would
expect a decrease in prequential performance as the ensemble size increases.
Although a simple experimental setup was used here, the results from an ensemble size of
3 experts are comparable to those of an ensemble of 20 experts; hence the former is suitable for
applications with limited computational resources.
Figure 20: Effect of Ensemble size: (a) Catalyst Activation data (b) Friedman data
One of the ways in which our work differs from the original OAML system is the method used for
expert replacement. Extended OAML - Ensemble uses a worst expert replacement method as
against the oldest expert replacement technique used in OAML – Ensemble (discussed in Section
3.4.1). Thus, we investigate the effects of this using an ensemble size of 10 experts.
By using the worst expert replacement approach, a lower RMSE score is achieved on the Catalyst
Activation data stream (Figure 21-(a)) in comparison to the oldest expert replacement approach.
The ensemble recovers faster from concept drift as a result. Figure 21-(b) shows a
more interesting pattern: both replacement techniques have a similar initial behaviour, and the worst
expert replacement technique handles the gradual drift slightly worse than the oldest expert replacement
technique; however, the former manages abrupt drift much better and recovers quickly from it.
Both replacement techniques are capable of handling different kinds of concept drift,
however, the worst expert replacement approach appears to be superior in abrupt drift environments.
Figure 21: Effect of Expert replacement: (a) Catalyst Activation data (b) Friedman data
Another significant contribution of this work is the implementation of multiple weight combination
approaches. We extend the unweighted voting approach used in OAML with Additive Expert and
Dynamic Weighted Majority voting. In this section, we compare Additive Experts and equal voting
for regression data streams (since DWM is not suitable for regression). We run the experiments with
the same setup described in Section 5.4; however, since the goal here is to find the weight
combination approach that produces the best results, we modify the online model selection
algorithm to always select the ensemble for prediction. On the regression
benchmarks, AddExp and equal voting behave similarly. While unweighted voting performs better
on the Catalyst Activation data (Figure 22-(a)), the AddExp technique gives better results for the
synthetic Friedman data stream. This suggests that both methods are equally effective for handling
concept drift; hence the equal voting approach is favourable due to its simplicity.
To evaluate the effectiveness of available search algorithms in re-optimizing pipelines after a drift
occurs, we run experiments with random search “RandomSearch()” and asynchronous evolutionary
algorithm “AsyncEA()” on regression data streams. The original OAML framework (Celik, et al.,
2022) reports the effect of these algorithms on classification streams. We maintain the same
parameters used in the previous experiments and vary the search algorithm. In addition, we assume
that the effect of the search algorithm is the same for Extended OAML – Basic and Extended OAML
– Ensemble; hence we only run experiments for the simpler Basic method. Each search algorithm
was run multiple times and the aggregate results are shown in Figure 23 below.
Figure 22: Effect of weight combination: (a) Catalyst Activation (b) Friedman
For the Catalyst Activation data stream, random search (red line) performs better on average
than the evolutionary algorithm. This could be because random search can quickly find good
pipelines when a drift occurs. The same pattern is seen in the Friedman stream, where various drift
types of different magnitudes are present. The random search adapts better for incremental drift
processes (in the later stages of the stream). Despite the asynchronous implementation of the evolutionary
algorithm, the time budget of 40 seconds used in these experiments appears to be insufficient for the
evolutionary algorithm to produce better results than random search.
In general, both search algorithm implementations construct good pipelines and achieve
good results on the benchmark data streams.
The final modification to the original OAML system is the inclusion of online pre-processing
algorithms. Thus, in this section we investigate the effect of this change on the system. To achieve
this, we maintain all other changes to the original OAML system, exclude the online pre-processing
steps, and compare this with the full Extended OAML system. As expected, Figure 24 – Figure
29 show that online automatic pre-processing algorithms improve the performance of AutoML systems
in online environments.
Figure 23: Effect of Search Algorithm: (a) Catalyst Activation (b) Friedman
For Bank (Figure 24) and Catalyst Activation (Figure 27), the effect of the pre-processing step is not
minimal. On the Friedman stream (Figure 25), the system without pre-processing suffers more from
the incremental change in concept and recovers from it slowly. This improvement is even more
pronounced in the 2D Planes (Figure 26) data stream. Both systems behave very similarly there in the
presence of concept drift; the difference in performance may be attributed to the data encoding,
which helps improve the predictive performance of the model.
Finally, for the Debutaniser process (Figure 28) and the Sulfur process (Figure 29), the
Extended OAML system with pre-processing initially performs better than the Extended OAML
system without pre-processing. However, when a sudden drift occurs, the difference in performance
increases and the system with online pre-processing recovers faster from the sudden drift. Overall,
the addition of pre-processing to the OAML system offers improvements in predictive performance
as well as in handling different kinds of drift patterns.
In this section, we thoroughly evaluated the performance of our OAML system under varying drift
conditions and reported the results which show that our system can produce competitive results
across all benchmarks.
6 DISCUSSION
The experiments performed indicate that automated machine learning can be applied to regression
problems in different kinds of drifting environments. Although the performance of the AutoML system
may suffer due to this change, the inclusion of an explicit drift detection system and sound adaptation
mechanisms help it to recover quickly from this change. In most cases, the proposed system
outperforms competitive stream learning algorithms.
For classification problems, our system consistently improves on the original OAML system
across all benchmark datasets. From our experiments, this improvement can be attributed to the
online pre-processing algorithms used. Although the original OAML system includes algorithms for
pre-processing, the pipelines constructed by it did not include them. Extended OAML constructs
pipelines with automatic pre-processing algorithms and includes pre-processing steps to deal with null
values and encode categorical variables.
Both learning methods of Extended OAML perform well on the benchmark datasets, and the
Ensemble approach also outperforms the version used in the original OAML system. Our results
indicate that worst-expert replacement, applied when the ensemble becomes full, produces better
results than the oldest-expert replacement technique used in the original system. Examination of
the various weight re-combination strategies shows that equal voting produces similar and, in some
cases, better results than AddExp for regression problems, while for classification streams DWM
performs better than AddExp. Increasing the ensemble size may offer improvements to the system,
but the gains are marginal compared to a simpler system.
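The two replacement policies compared above are easy to state precisely; the following sketch (with illustrative names, not the actual Extended OAML code) shows both, assuming each expert carries a weight and experts are appended in arrival order.

```python
def replace_worst(ensemble, new_expert):
    """Worst-expert replacement (as in Extended OAML, sketched):
    evict the expert with the lowest weight."""
    worst = min(range(len(ensemble)), key=lambda i: ensemble[i]["weight"])
    ensemble[worst] = new_expert

def replace_oldest(ensemble, new_expert):
    """Oldest-expert replacement (as in the original OAML, sketched):
    evict position 0, assuming experts are appended in arrival order."""
    ensemble.pop(0)
    ensemble.append(new_expert)
```

On an ensemble [a: 0.9, b: 0.1, c: 0.5], worst-expert replacement evicts b (lowest weight), whereas oldest-expert replacement would evict a regardless of its high weight.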
The Basic Extended OAML approach generally performed better than the Ensemble
approach. This is contrary to the behaviour of the original OAML system (Celik, et al., 2022). The
change in behaviour can be attributed to online pre-processing and the weight re-combination
strategies: while online pre-processing improves both systems, AddExp and DWM may sometimes
perform worse than equal voting.
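The contrast between these strategies can be made concrete with a toy sketch (hypothetical helper names; beta is DWM's weight-decay parameter, commonly 0.5): equal voting simply averages the experts' regression outputs, while a DWM-style update for classification multiplicatively down-weights experts that predicted incorrectly before taking a weighted vote.

```python
def equal_vote(predictions):
    """Equal voting for regression: unweighted mean of the experts."""
    return sum(predictions) / len(predictions)

def dwm_update(weights, predictions, y_true, beta=0.5):
    """DWM-style update (sketch): multiply the weight of every expert
    whose prediction was wrong by beta, then normalise."""
    weights = [w * (beta if p != y_true else 1.0)
               for w, p in zip(weights, predictions)]
    total = sum(weights)
    return [w / total for w in weights]

def weighted_vote(weights, predictions):
    """Weighted majority vote over the predicted class labels."""
    scores = {}
    for w, p in zip(weights, predictions):
        scores[p] = scores.get(p, 0.0) + w
    return max(scores, key=scores.get)
```

After one DWM update, a repeatedly wrong expert contributes little to the vote, which helps under drift but can also hurt when past errors are not indicative of future performance, consistent with equal voting sometimes winning.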
Finally, although both the random search and the evolutionary approach find good pipelines,
the pipelines produced by the random search algorithm performed better in our experiments than
those of the evolutionary algorithm. This could be due to the limited time budget (40 seconds) used
in all experiments.
7 CRITICAL EVALUATION
The objectives of this dissertation were achieved by extending the capabilities of the OAML
framework with automated online pre-processing algorithms, multiple adaptive mechanisms for
online adaptation, and an end-to-end automatic online pipeline for regression tasks.
Objective 1: To design a new search space of online pre-processors and regressors in the OAML
system.
o We designed a new search space of online pre-processors and online regression
algorithms. Further, we adapted the search algorithm for regression tasks to enable it to
find and construct good machine learning pipelines. (Section 3.2)
Objective 2: To implement new drift detection algorithms for detecting concept drift in regression
o We implemented ADWIN in our new system. From literature and our experiments,
ADWIN offers the best drift detection for regression tasks. ADWIN was implemented in
our system using scikit-multiflow. (Section 3.4)
Objective 3: To implement the Additive Expert (AddExp.) method and Dynamic Weighted Majority
(DWM) method for backup ensemble adaptation in the presence of concept drift
o We implemented AddExp Continuous (regression) and AddExp Discrete (classification)
and DWM (classification) for backup ensembles. Further, we implemented new model
replacement techniques for Ensembles. (Section 3.4.1)
Objective 4: To evaluate the performance of the extended OAML framework for regression and
classification tasks and compare against adaptive learners and the original OAML framework
o We evaluated our OAML systems on benchmark real-world and synthetic data streams
for regression and classification tasks and compared the results against the original
OAML as a baseline and other state-of-the-art techniques in the field. (Section 5)
Further, the results of this dissertation have provided answers to the following research questions.
1. Research Question 1: How can online AutoML techniques be designed to address
regression problems? (Section 3)
2. Research Question 2: How does the adaptation strategy for ensemble methods on online
AutoML influence the predictive performance? (Section 5.4)
8 CONCLUSION
We developed a new system that applies existing automated machine learning techniques to
changing environments for the two main supervised machine learning tasks: classification and
regression. This was done by extending OAML to regression environments. The new system
automatically constructs full machine learning pipelines of online pre-processors and predictors.
We include an explicit ADWIN drift detector for regression tasks, which allows the system to
detect a change in concept and automatically re-optimize its pipeline to adapt to the new concept.
Our system includes the Dynamic Weighted Majority and Additive Expert techniques for adapting to
different kinds of drift. Furthermore, we substitute oldest-expert replacement with the worst-expert
replacement technique in the Ensemble methods.
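The adaptive-windowing idea behind ADWIN can be illustrated with a deliberately simplified sketch: keep a window of recent errors, compare the means of its older and newer halves, and reset when they diverge. The real ADWIN uses a Hoeffding-style statistical bound and examines many cut points, so the fixed split and threshold below are illustrative only.

```python
from collections import deque

def mean(xs):
    return sum(xs) / len(xs)

def drift_detected(window, threshold=0.5):
    """Crude two-window check in the spirit of ADWIN: split the recent
    error window in half and flag drift when the sub-window means
    diverge. ADWIN proper uses a variance-based bound and variable cut
    points; this fixed split is only an illustration."""
    half = len(window) // 2
    if half < 5:
        return False
    items = list(window)
    return abs(mean(items[half:]) - mean(items[:half])) > threshold

# Detect-then-adapt loop: on detection the window is cleared; a real
# system would also re-optimize its pipeline at this point.
window = deque(maxlen=50)
detections = 0
for err in [0.1] * 20 + [2.0] * 20:   # sudden drift in the error signal
    window.append(err)
    if drift_detected(window):
        detections += 1
        window.clear()                # forget the old concept
```

On this stream the sudden jump in error is flagged a few instances after it occurs, after which the cleared window tracks the new concept without further alarms.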
The performance of our system is evaluated in a prequential manner on real-world and
synthetic data streams for regression and classification tasks. Our results show that the proposed
system offers competitive results and a significant improvement over the original OAML system. We
explore which components of the new system are responsible for these changes and find that online
pre-processing and the ensemble replacement technique contribute the most. The results also show
that the Extended OAML system can produce competitive results in environments with limited time
and memory.
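The prequential protocol itself is compact: each instance is used first for testing, then for training, so every example is scored before the model has seen it. A minimal sketch, with a trivial running-mean regressor standing in for the actual Extended OAML pipelines:

```python
import math

class RunningMeanRegressor:
    """Trivial online model: predicts the mean of the targets seen so far."""
    def __init__(self):
        self.total, self.n = 0.0, 0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.total += y
        self.n += 1

def prequential_rmse(stream, model):
    """Test-then-train: each (x, y) is used for evaluation before training."""
    sq_err, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)   # 1. test
        sq_err += (y - y_pred) ** 2     # 2. score
        n += 1
        model.learn_one(x, y)           # 3. train
    return math.sqrt(sq_err / n)
```

On the stream [({}, 1.0)] * 3 the first prediction is 0.0 and the later ones 1.0, giving an RMSE of sqrt(1/3), which illustrates how prequential error penalises the cold start that pipeline re-optimization must recover from.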
Overall, the results indicate that this work has the potential to advance AutoML research.
9 FUTURE WORK
Given the trend in big data and automation of machine learning tasks, future work should be done
to develop online AutoML systems that include automatic feature selection and automatic feature
engineering. Our system can serve as a baseline and be further extended to include new
transformation and prediction steps.
Furthermore, our results indicate that ChaCha is capable of tuning hyper-parameters
optimally, thus future work should be done to incorporate this feature into our OAML system.
REFERENCES
Bifet, A. and Gavaldà, R., 2007. Learning from Time-Changing Data with Adaptive Windowing. Proceedings of the 2007 SIAM International Conference on Data Mining.
Baier, L., Hofmann, M., Kühl, N., Mohr, M. and Satzger, G., 2020. Handling Concept Drifts in
Regression Problems - the Error Intersection Approach, Wirtschaftsinformatik.
Baker, B., Gupta, O., Naik, N. and Raskar, R., 2017. Designing Neural Network Architectures
using Reinforcement Learning. ArXiv, abs/1611.02167.
Bakirov, R., 2017. Multiple adaptive mechanisms for predictive models on streaming
data [http://eprints.bournemouth.ac.uk/29443/]. Doctoral thesis, Bournemouth University.
Bakirov, R., Fay, D. and Gabrys, B., 2021. Automated adaptation strategies for stream
learning. Machine Learning, 110 (6), 1429-1462.
Bakirov, R. and Gabrys, B., 2013. Investigation of Expert Addition Criteria for Dynamically
Changing Online Ensemble Classifiers with Multiple Adaptive Mechanisms, 9th Artificial
Intelligence Applications and Innovations (AIAI) (Vol. AICT-412, pp. 646-656). Paphos,
Greece: Springer.
Bates, J. M. and Granger, C. W. J., 1969. The Combination of Forecasts. Journal of the
Operational Research Society, 20, 451-468.
Bifet, A. and Gavaldà, R., 2009. Adaptive Learning from Evolving Data Streams (pp. 249-260).
Berlin, Heidelberg: Springer Berlin Heidelberg.
Bifet, A., Holmes, G., Pfahringer, B., Read, J., Kranen, P., Kremer, H., Jansen, T. and Seidl, T.,
2011. MOA: A Real-Time Analytics Open Source Framework (pp. 617-620). Berlin,
Heidelberg: Springer Berlin Heidelberg.
Celik, B., Singh, P. and Vanschoren, J., 2022. Online AutoML: An adaptive AutoML framework for
online learning. ArXiv, abs/2201.09750.
Celik, B. and Vanschoren, J., 2021. Adaptation Strategies for Automated Machine Learning on
Evolving Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3067-
3078.
Chen, B., Zhao, X., Wang, Y., Fan, W., Guo, H. and Tang, R., 2022. Automated Machine Learning
for Deep Recommender Systems: A Survey. ArXiv, abs/2204.01390.
Codalab, 2018. AutoML3:: AutoML for Lifelong Machine Learning [online]. Available from:
https://competitions.codalab.org/competitions/19836 [Accessed 08/05/2022].
Dawid, A. P., 1984. Present Position and Potential Developments: Some Personal Views: Statistical
Theory: The Prequential Approach. Journal of the Royal Statistical Society. Series
A (General), 147 (2), 278-292.
Dietterich, T. G., 2000. Ensemble Methods in Machine Learning (pp. 1-15). Berlin, Heidelberg:
Springer Berlin Heidelberg.
Fortuna, L., Graziani, S. and Xibilia, M. G., 2005. Soft sensors for product quality monitoring in
debutanizer distillation columns. Control Engineering Practice, 13, 499-508.
Gama, J., Medas, P., Castillo, G. and Rodrigues, P., 2004. Learning with Drift Detection (pp. 286-
295). Berlin, Heidelberg: Springer Berlin Heidelberg.
Gama, J., Sebastião, R. and Rodrigues, P. P., 2009. Issues in evaluation of stream learning
algorithms. Proceedings of the 15th ACM SIGKDD international conference on Knowledge
discovery and data mining, Paris, France. Association for Computing Machinery. 329–338.
Available from: https://doi.org/10.1145/1557019.1557060
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M. and Bouchachia, A., 2014. A survey on concept
drift adaptation. ACM Comput. Surv., 46 (4), Article 44.
Gijsbers, P. J. A. and Vanschoren, J., 2020. GAMA: a General Automated Machine learning
Assistant. ArXiv, abs/2007.04911.
Gomes, H. M., Barddal, J. P., Ferreira, L. E. B. and Bifet, A., 2018. Adaptive random forests for
data stream regression, ESANN.
Harries, M., 1999. SPLICE-2 Comparative Evaluation: Electricity Pricing. Technical report, The
University of New South Wales.
Hazan, E. and Seshadhri, C., 2009. Efficient learning algorithms for changing
environments. Proceedings of the 26th Annual International Conference on Machine
Learning, Montreal, Quebec, Canada. Association for Computing Machinery. 393–400.
Available from: https://doi.org/10.1145/1553374.1553425
Hulten, G., Spencer, L. and Domingos, P., 2001. Mining time-changing data streams. Proceedings
of the seventh ACM SIGKDD international conference on Knowledge discovery and data
mining, San Francisco, California. Association for Computing Machinery. 97–106.
Available from: https://doi.org/10.1145/502512.502529
Kivinen, J. and Warmuth, M. K., 1997. Exponentiated Gradient versus Gradient Descent for
Linear Predictors. Information and Computation, 132 (1), 1-63.
Kolter, J. Z. and Maloof, M. A., 2005. Using additive expert ensembles to cope with concept
drift. Proceedings of the 22nd international conference on Machine learning, Bonn,
Germany. Association for Computing Machinery. 449–456. Available from:
https://doi.org/10.1145/1102351.1102408
Kolter, J. Z. and Maloof, M. A., 2007. Dynamic Weighted Majority: An Ensemble Method for
Drifting Concepts. J. Mach. Learn. Res., 8, 2755–2790.
Kuncheva, L. I., 2004. Combining pattern classifiers : methods and algorithms. Hoboken, NJ: J.
Wiley.
LeDell, E. and Poirier, S., 2020. H2O AutoML: Scalable Automatic Machine Learning. 7th ICML
Workshop on Automated Machine Learning (AutoML).
Fortuna, L., Rizzo, A., Sinatra, M. and Xibilia, M. G., 2003. Soft Analysers for a Sulphur Recovery Unit. Control
Engineering Practice, 11 (12), 1491-1500.
Madrid, J. G., Escalante, H. J., Morales, E. F., Tu, W.-W., Yu, Y., Sun-Hosoya, L., Guyon, I. and
Sebag, M., 2019. Towards AutoML in the presence of Drift: first
results. CoRR, abs/1907.10772.
Duarte, M. F. and Hu, Y. H., 2004. Vehicle classification in distributed sensor networks. Journal of Parallel
and Distributed Computing, 64 (7), 826-838.
Montiel, J., Halford, M., Mastelini, S. M., Bolmier, G., Sourty, R., Vaysse, R., Zouitine, A., Gomes,
H. M., Read, J., Abdessalem, T. and Bifet, A., 2021. River: machine learning for streaming
data in Python. J. Mach. Learn. Res., 22, 110:111-110:118.
Muñoz, M. A., Villanova, L., Baatar, D. and Smith-Miles, K., 2018. Instance spaces for machine
learning classification. Machine Learning, 107 (1), 109-147.
Olson, R. S., Bartley, N., Urbanowicz, R. J. and Moore, J. H., 2016. Evaluation of a Tree-based
Pipeline Optimization Tool for Automating Data Science. Proceedings of the Genetic and
Evolutionary Computation Conference 2016.
Oza, N. C. and Russell, S. J., 2005. Online bagging and boosting. 2005 IEEE International
Conference on Systems, Man and Cybernetics, 3, 2340-2345 Vol. 2343.
Kadlec, P., Grbić, R. and Gabrys, B., 2011. Review of adaptation mechanisms for data-driven
soft sensors. Computers & Chemical Engineering, 35 (1), 1-24.
Pozzolo, A. D., Boracchi, G., Caelen, O., Alippi, C. and Bontempi, G., 2015. Credit card fraud
detection and concept-drift adaptation with delayed supervised information. 2015
International Joint Conference on Neural Networks (IJCNN), 1-8.
Ruta, D. and Gabrys, B., 2000. An Overview of Classifier Fusion Methods. Computing and Information
Systems, 7 (1), 1-10.
Schlimmer, J. C. and Granger, R. H., 1986. Beyond incremental processing: tracking concept
drift. Proceedings of the Fifth AAAI National Conference on Artificial Intelligence,
Philadelphia, Pennsylvania. AAAI Press. 502–507.
Scott, E. O. and De Jong, K. A., 2015. Understanding Simple Asynchronous Evolutionary Algorithms
(pp. 85-98). Association for Computing Machinery.
Shawi, R. E., Maher, M. and Sakr, S., 2019. Automated Machine Learning: State-of-The-Art and
Open Challenges. CoRR, abs/1906.02287.
Silver, D. L., Yang, Q. and Li, L., 2013. Lifelong machine learning systems: Beyond learning
algorithms. in AAAI Spring Symposium Series, 2013.
Soares, S. G. and Araújo, R., 2015. A dynamic and on-line ensemble regression for changing
environments. Expert Syst. Appl., 42 (6), 2935–2948.
Sobhani, P. and Beigy, H., 2011. New Drift Detection Method for Data Streams (pp. 88-97). Berlin,
Heidelberg: Springer Berlin Heidelberg.
Street, W. N. and Kim, Y., 2001. A streaming ensemble algorithm (SEA) for large-scale
classification. Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining, San Francisco, California. Association for
Computing Machinery. 377–382. Available from: https://doi.org/10.1145/502512.502568
Soares, S. G. and Araújo, R., 2015. An on-line weighted ensemble of regressor models to handle concept
drifts. Engineering Applications of Artificial Intelligence, 37, 392-406.
Thompson, W. R., 1933. On the Likelihood that One Unknown Probability Exceeds Another in
View of the Evidence of Two Samples. Biometrika, 25 (3/4), 285-294.
Thornton, C., Hutter, F., Hoos, H. H. and Leyton-Brown, K., 2013. Auto-WEKA: combined
selection and hyperparameter optimization of classification algorithms. Proceedings of the
19th ACM SIGKDD international conference on Knowledge discovery and data mining,
Chicago, Illinois, USA. Association for Computing Machinery. 847–855. Available from:
https://doi.org/10.1145/2487575.2487629
Tsymbal, A., Pechenizkiy, M., Cunningham, P. and Puuronen, S., 2008. Dynamic integration of
classifiers for handling concept drift. Inf. Fusion, 9 (1), 56–68.
Wang, C., Wu, Q., Weimer, M. and Zhu, E., 2019. FLAML: A Fast and Lightweight AutoML
Library.
Shao, W. and Tian, X., 2015. Adaptive soft sensor for quality prediction of chemical
processes based on selective ensemble of local partial least squares models. Chemical
Engineering Research and Design, 95, 113-132.
Wilson, J., Meher, A. K., Bindu, B. V., Chaudhury, S., Lall, B., Sharma, M. and Pareek, V., 2020.
Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High
Cardinality Data Streams Under Concept Drift (pp. 317-335). Cham: Springer International
Publishing.
Wu, Q., Wang, C., Langford, J., Mineiro, P. and Rossi, M., 2021. ChaCha for Online AutoML.
2021 International Conference on Machine Learning (ICML 2021), July. Available from:
https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/
Zoph, B. and Le, Q. V., 2017. Neural Architecture Search with Reinforcement
Learning. ArXiv, abs/1611.01578.
APPENDIX A
The necessary files for installation are provided in the folder “Extended_OAML.zip”. Unzip the
folder and perform the following steps. The unzipped folder consists of the following:
o gama
o river
1. We recommend creating a Python virtual environment before installation.
2. To install gama, change directory to the gama folder and run:
pip install -r requirements.txt
3. To install river, change directory to the river folder and run:
python setup.py install
4. It is important that you install river and gama according to steps 2 and 3 above, not only to
prevent version errors but also because we have modified these libraries for our system.
APPENDIX B
After installation and setup, the file tree of the OAML system resembles the structure below.
── gama
│ ├── ci_scripts
│ ├── data
│ ├── data_streams
│ ├── docs
│ ├── examples
│ ├── gama
│ ├── gama.egg-info
│ ├── oaml_paper
│ ├── tests
│ ├── wandb
│ ├── Extended_basic.py
│ ├── Extended_ensemble.py
│ ├── LICENSE
│ ├── OAML-UserGuide
│ ├── OAML_baseline_LB.py
│ ├── README.md
│ ├── code_of_conduct.md
│ ├── codecov.yml
│ ├── mypy.ini
│ └── setup.py
We focus only on the relevant files and folders for this dissertation.
- data_streams (folder): contains all data sets and python scripts for connecting them to the
OAML system
- gama (folder): contains scripts, files, folders for configuring the AutoML search phase. This
is the actual Gama AutoML library
- oaml_paper (folder): contains relevant files from the original OAML system
- Extended_basic.py (script): Python script for running the Basic Extended OAML
- Extended_ensemble.py (script): Python script for running the Ensemble Extended OAML
- OAML_baseline.py (script): Python script for testing baseline algorithms
APPENDIX C
User Guide
Extended OAML has been developed to be easily executed from the command line (terminal). Users
are not required to interact with the Python scripts. The Basic and Ensemble methods of Extended
OAML are run with the same command line arguments (parameters). The requirements to install our
framework are listed in Appendix D.
A sample run of the Basic Extended OAML can be started as follows:
python Extended_basic.py 'data_streams/debutaniser.arff' 500 100 rmse rmse 80
random False
#0 Python Script
- This can be either Extended_basic.py or Extended_ensemble.py. It tells the computer
which Extended OAML approach to run
#8 Live-Plot (Boolean)
- Whether or not to create a live plot with Wandb (Weights & Biases).
- If True
o Registration required on https://wandb.ai/site
o Set entity to your wandb username
o Set project name as desired
o [True, False]
APPENDIX D
Machine Setup
All experiments were run locally. The specifications of the local machine used to run experiments
are given below:
1. Local Machine: MacBook Pro 2021
2. CPU: Apple M1
3. GPU: Apple M1