
FACULTY OF SCIENCE & TECHNOLOGY

MSc Data Science and Artificial Intelligence


May 2022

Automatic Machine Learning for evolving data streams

by

Lotachukwu Ibe

Faculty of Science & Technology


Department of Computing and Informatics
Individual Masters Project

Abstract

Automated Machine Learning (AutoML) for prediction on non-stationary streaming data is a


developing area with existing research limited to classification tasks alone. However, many real-
world machine learning tasks need to be formulated as regression problems (e.g., taxi demand,
sales forecast, etc.), and it is yet to be shown whether current approaches for AutoML in a streaming
scenario can be effectively applied to predict a continuous target. This work therefore aims to
develop an AutoML solution for all supervised machine learning tasks in an online setting where the
data distribution is continuously changing. We achieve this by extending the Online Automated
Machine Learning (OAML) framework, a Python framework for automatic machine learning on
classification problems in online environments, with a configuration space of online
pre-processing algorithms and regressors. The proposed system
includes sound adaptive mechanisms for stream learning that allow it to cope with data drift on all
supervised learning tasks (classification and regression).
Literature on ensemble adaptation techniques suggests that weight re-combination of base
learners in an ensemble may increase its performance in an online environment. For this reason, we
implement an ensemble method in the extended OAML system that includes weight re-combination
techniques such as dynamic weighted majority, equal voting, and additive expert. We evaluate our
system on real-world and synthetic benchmark datasets (classification and regression) for concept
drift detection and report experimental results of its performance.
The results show that our novel system can handle various kinds of concept drift of varying
magnitudes in an online setting. Our OAML system provides a significant improvement on the
original OAML system across all classification benchmarks and in most cases outperforms state-of-
the-art online algorithms. Our results also indicate that online automated machine learning can
improve on competitive stream learning algorithms, and thus the proposed system can serve as a
baseline for online AutoML on regression tasks.
Finally, we find that the adaptation capabilities of an online AutoML system can be
significantly improved by including online pre-processing algorithms.

Keywords: AutoML, Stream learning, Concept drift, Adaptive Machine Learning, Online
Regression

Dissertation Declaration

I agree that, should the University wish to retain it for reference purposes, a copy of my dissertation
may be held by Bournemouth University normally for a period of 3 academic years. I understand
that once the retention period has expired my dissertation will be destroyed.

Confidentiality

I confirm that this dissertation does not contain information of a commercial or confidential nature
or include personal information other than that which would normally be in the public domain
unless the relevant permissions have been obtained. Any information which identifies a particular
individual's religious or political beliefs, information relating to their health, ethnicity, criminal
history, or sex life has been anonymised unless permission has been granted for its publication
from the person to whom it relates.

Copyright

The copyright for this dissertation remains with me.

Requests for Information

I agree that this dissertation may be made available as the result of a request for information under
the Freedom of Information Act.

Signed: Lotachukwu Ibe

Name: Lotachukwu Ibe

Date: 06/05/2022

Programme: MSc. Data Science and Artificial Intelligence



Original Work Declaration

This dissertation and the project that it is based on are my own work, except where stated, in
accordance with University regulations.

Signed: Lotachukwu Ibe

Name: Lotachukwu Ibe

Date: 06/05/2022

Acknowledgments

I would like to thank my supervisor Rashid Bakirov for his continuous guidance and support through
every phase of this dissertation.

In addition, I am thankful to the academic staff in the Faculty of Science and Technology at
Bournemouth University who have contributed towards my learnings during my MSc. Program.

Finally, I am thankful to my family, who have supported me throughout the MSc. and have always
been there my whole life.

TABLE OF CONTENTS

1 INTRODUCTION _____________________________________________________________________________ 1
1.1 Background ____________________________________________________________________________ 1
1.2 Problem Definition _______________________________________________________________________ 2
1.3 Aims and objectives ______________________________________________________________________ 3
1.4 Original Contributions ____________________________________________________________________ 3
1.5 Organisation of the dissertation ____________________________________________________________ 3

2 LITERATURE REVIEW _________________________________________________________________________ 5


2.1 AutoML _______________________________________________________________________________ 5
2.2 Online Learning and Adaptation on Streaming Data ____________________________________________ 6
2.3 AutoML for Online Learning ______________________________________________________________ 10
2.4 Evaluation Techniques ___________________________________________________________________ 11

3 METHODOLOGY ____________________________________________________________________________ 12
3.1 Overview _____________________________________________________________________________ 12
3.2 Search Space Design ____________________________________________________________________ 13
3.3 Combined Algorithm Selection and Hyperparameter Optimization ________________________________ 15
3.4 Online AutoML Model ___________________________________________________________________ 16
3.5 Evaluation Protocol _____________________________________________________________________ 20

4 EXPERIMENT DESIGN ________________________________________________________________________ 23


4.1 Data Streams __________________________________________________________________________ 23
4.2 Configuration of OAML System ____________________________________________________________ 24
4.3 Baselines _____________________________________________________________________________ 24

5 RESULTS __________________________________________________________________________________ 26
5.1 Experiments with real-world data __________________________________________________________ 26
5.2 Experiments with synthetic data ___________________________________________________________ 30
5.3 Pipeline Analysis _______________________________________________________________________ 34
5.4 Experiments on Ensemble Adaptation_______________________________________________________ 35
5.5 Effect of Search Algorithm ________________________________________________ 37
5.6 Effect of Online Pre-processing Algorithms ___________________________________ 38

6 DISCUSSION _______________________________________________________________________________ 42

7 CRITICAL EVALUATION_______________________________________________________________________ 43

8 CONCLUSION ______________________________________________________________________________ 44

9 FUTURE WORK _____________________________________________________________________________ 45

REFERENCES ___________________________________________________________________________________ 46

APPENDIX A____________________________________________________________________________________ 51

APPENDIX B ____________________________________________________________________________________ 52

APPENDIX C ____________________________________________________________________________________ 53

APPENDIX D ___________________________________________________________________________________ 56

LIST OF FIGURES

Figure 1: Types of Concept Drift (Gama, et al., 2014) ....................................................................... 7


Figure 2: Strategies for ensemble adaptation (Bakirov, 2017) .......................................................... 9
Figure 3: Online Evaluation: (a) Periodic Holdout (b) Prequential Selection ................................... 11
Figure 4: Extended OAML System .................................................................................................. 16
Figure 5: Prequential Evaluation...................................................................................................... 20
Figure 6: Prequential evaluation (RMSE) for Catalyst Activation data stream ................................ 26
Figure 7: Prequential evaluation (RMSE) for Debutaniser data stream........................................... 27
Figure 8: Prequential evaluation (RMSE) for Sulfur data stream ..................................................... 27
Figure 9: Prequential evaluation (Accuracy) for Electricity data stream .......................................... 28
Figure 10: Prequential evaluation (Accuracy) for New Airlines data stream.................................... 29
Figure 11: Prequential evaluation (Accuracy) for Run or Walk data stream ..................................... 29
Figure 12: Prequential evaluation (RMSE) for Friedman data stream ............................................. 30
Figure 13: Prequential evaluation (RMSE) for Bank data stream .................................................... 31
Figure 14: Prequential evaluation (RMSE) for 2D Planes ............................................................... 31
Figure 15: Prequential evaluation (Accuracy) for Hyperplane data stream ..................................... 32
Figure 16: Prequential evaluation (Accuracy) for SEA mixed data stream ...................................... 33
Figure 17: Prequential evaluation (Accuracy) for SEA Abrupt data stream ..................................... 33
Figure 18: Online model update for Extended OAML – Ensemble .................................................. 34
Figure 19: Online model update for Extended OAML – Basic ......................................................... 35
Figure 20: Effect of Ensemble size: (a) Catalyst Activation data (b) Friedman data ....................... 36
Figure 21: Effect of Expert replacement: (a) Catalyst Activation data (b) Friedman data ................ 37
Figure 22: Effect of weight combination (a) Catalyst Activation (b) Friedman ................................. 38
Figure 23: Effect of Search Algorithm: (a) Catalyst Activation (b) Friedman ................................... 39
Figure 24: Effect of Online Pre-processing on Bank data stream .................................................... 39
Figure 25: Effect of Online Pre-processing on Friedman data stream ............................................. 40
Figure 26: Effect of Online Pre-processing on 2D Planes data stream ........................................... 40
Figure 27: Effect of Online Pre-processing on Catalyst Activation data stream .............................. 40
Figure 28: Effect of Online Pre-processing on Debutaniser data stream ........................................ 41
Figure 29: Effect of Online Pre-processing on Sulfur data stream .................................................. 41

1 INTRODUCTION

This chapter provides an overview of the MSc dissertation titled “Automatic Machine Learning for
evolving data streams”. The general background and motivation for the project are given in Section
1.1. In Section 1.2, we clearly define the problem addressed in this work. The aim, research
questions and objectives are stated in Section 1.3, followed by the original contributions of this work
in Section 1.4. A summary of the organisation of this dissertation is provided in Section 1.5.

1.1 Background

Automated machine learning systems have benefited from the growing demand for machine learning
domain expertise in recent years. These systems give users without technical know-how the ability
to quickly build machine learning solutions and enable domain experts to automate or
optimize their tasks. AutoML systems achieve this by automating the steps involved in data
processing, feature extraction, model selection and hyperparameter tuning. Although they can
achieve state-of-the-art solutions on supervised learning tasks, current AutoML systems are
somewhat constrained in their applications. They assume that the data is static, that is, all the training
data is available at once, and that the data distribution in the training set is the same as the
distribution used for prediction. However, real world data often arrives in batches, and the data
distribution evolves over time. Hence, to successfully integrate AutoML systems into industry
applications, it is desirable to have AutoML systems that can operate in an online learning setting.
A major challenge machine learning algorithms face in an online learning setting is concept
drift. This phenomenon occurs when the target concept changes over time. To mitigate this, online
learning algorithms which can adapt and cope with a changing concept have been developed.
However, in the presence of concept drift, the hyperparameters of these online algorithms may
require retuning.
The proposed OAML system is evaluated on well-known concept drift datasets for regression
and classification and compared against the original OAML framework as well as baseline adaptive
learners. Our findings indicate that the Extended OAML system can achieve competitive results on
regression data streams with varying drift complexities while also performing better than the original
OAML system on benchmark data streams for classification tasks. We find that AutoML systems
benefit from online pre-processing algorithms and that ensemble methods can be improved via
weighted ensemble voting and sound model replacement techniques.

1.2 Problem Definition

Originally defined by Thornton, et al., (2013), AutoML aims to automatically construct full machine
learning pipelines to minimize loss on a given metric. For the scenario where a model is trained on
$D_{train} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, the goal is to automatically produce the set of predictions
$\hat{y}_{n+1}, \ldots, \hat{y}_{n+m}$ for a test set $D_{test} = \{(x_{n+1}, y_{n+1}), \ldots, (x_{n+m}, y_{n+m})\}$ (which possesses the
same distribution as $D_{train}$) that minimizes the loss function $\mathcal{L}(\cdot, \cdot)$, given a limited resource
budget $b$, where $i = 1, \ldots, n+m$, $n, m \in \mathbb{N}^{+}$, $x_i \in \mathbb{R}^{d}$ represents features of $d$ dimensions and
$y_i \in Y$. The loss function is given as:

$$\frac{1}{m} \sum_{j=1}^{m} \mathcal{L}(\hat{y}_{n+j}, y_{n+j}) \qquad \text{(Eq. 1)}$$

Furthermore, the AutoML problem can be restricted to a Combined Algorithm Selection and
Hyperparameter Optimization (CASH) problem. CASH (Thornton, et al., 2013) is defined as the
search over a set of machine learning predictors and transformers $A = \{A^{(1)}, \ldots, A^{(R)}\}$ with their
associated hyperparameter spaces $\Lambda^{(1)}, \ldots, \Lambda^{(R)}$ for an optimal combination $A^{*}_{\lambda}$ that optimizes
the given evaluation metric of the system over $k$ subsets of $D$:

$$A^{*}_{\lambda} = \underset{A^{(j)} \in A,\ \lambda \in \Lambda^{(j)}}{\arg\min}\ \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda}, \{X_{train}^{(i)}, y_{train}^{(i)}\}, \{X_{valid}^{(i)}, y_{valid}^{(i)}\}\right) \qquad \text{(Eq. 2)}$$

For the online learning setting, the AutoML problem can be formulated like the scenario presented
above with the following modifications.
1. The data is assumed to be infinitely long
2. Due to this, data is streamed with a temporal order (Celik & Vanschoren, 2021)
3. Data instances must be processed in the order in which they arrive
4. Memory usage is restricted, hence all the data cannot be stored
5. Evaluation should be done in a prequential manner (more in Section 3.5)
Hence the objective for the online AutoML problem becomes finding the pipeline $A^{*}_{\lambda,t}$ at each
time step $t$. $A^{*}_{\lambda,t}$ makes predictions and is evaluated on $\{X_t, y_t\}$ and is then trained on the last
seen batch $\{X_{t-1}, y_{t-1}\}$:

$$A^{*}_{\lambda,t} = \underset{A^{(j)} \in A,\ \lambda \in \Lambda^{(j)}}{\arg\min}\ \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda,t}, \{X_{train_{t-1}}^{(i)}, y_{train_{t-1}}^{(i)}\}, \{X_{train_t}^{(i)}, y_{train_t}^{(i)}\}\right) \qquad \text{(Eq. 3)}$$

The distribution of the target variable may also change in an online learning setting for
supervised learning tasks; thus, it is important to detect this and adapt the learner to this new change.

1.3 Aims and objectives

The aim of this thesis is to extend the capabilities of the OAML framework with automated online
pre-processing algorithms, multiple adaptive mechanisms for online adaptation and an end-to-end
automatic online pipeline for regression tasks. This work is aimed at providing relevant results to the
following research questions:
• Research Question 1: How can online AutoML techniques be designed to address
regression problems?
• Research Question 2: How does the adaptation strategy for ensemble methods on online
AutoML influence the predictive performance?

Based on the research aim, the specific objectives are:


1. To design a new search space of online pre-processors and regressors in the OAML system.
2. To implement new drift detection algorithms for detecting concept drift in regression
3. To implement the Additive Expert (AddExp.) method and Dynamic Weighted Majority (DWM)
method for backup ensemble adaptation in the presence of concept drift.
4. To evaluate the performance of the extended OAML framework for regression and
classification tasks and compare against adaptive learners and the original OAML framework

1.4 Original Contributions

The major contributions of this dissertation are listed below:


• Development of an Online Adaptive AutoML framework for all supervised learning
tasks: An online adaptive AutoML framework in Python for classification and regression
tasks in an incremental learning setting
• Investigation of the performance of Adaptive Mechanisms in Online AutoML:
Performing empirical experiments on standard concept drift datasets for online regression
and classification, we report the effect of Adaptive Mechanisms in an online AutoML setting

1.5 Organisation of the dissertation

An overview of the problem addressed and the aims and objectives of this dissertation has been
provided in this chapter. Chapter 2 focuses on the related work and explores the existing literature
on the research area. The methodologies and procedures undertaken to achieve the aims of this
project are discussed in Chapter 3. The experimental setup and design are presented in Chapter 4.
Chapter 5 presents and analyses the findings from the experiments. Chapter 6 provides a summary of the results.
A critical evaluation of this work is done in Chapter 7. The conclusion and future recommendations
are provided in Chapters 8 and 9 respectively.

2 LITERATURE REVIEW

This chapter provides a background for this thesis. We start with a review of relevant AutoML
approaches and frameworks, followed by an analysis of literature on online learning and adaptation
mechanisms. We then discuss existing approaches for AutoML on streaming data that are integral
to the new system proposed in this dissertation.

2.1 AutoML

To solve the AutoML problem (as a CASH problem) on a given dataset, a machine learning solution
is automatically constructed from a search space of potential combinations of prediction algorithms
and associated hyperparameters. Early works addressed this problem from a Bayesian Optimization
(BO) perspective (Feurer, et al., 2015). In this approach a probabilistic surrogate model is selected
for modelling the objective function. The search is guided by the outputs obtained from evaluating
the objective function at different points. Although BO has seen success in offline
applications, for example, Auto-Weka (Thornton, et al., 2013), another stream of research in this
area focuses on Meta Learning (Muñoz, et al., 2017), in which a series of meta-features that capture
the properties of the data are extracted and used to infer model performance based on past
experience with similar data (without any model-training). Reinforcement learning (RL) is a relatively
new field in machine learning, but it has also seen applications in AutoML (Zoph & Le, 2017). In
RL approaches to the problem, hyperparameter optimization is formulated as a policy to learn and
solved using RL techniques (Baker, et al., 2016). Grid Search is another technique for finding the
optimal configuration for a given search space. By randomly creating a grid of possible configurations
and searching through them in a distributed way, the H2O.ai framework takes advantage of
parallelization to speed up grid search for AutoML problems (H2O.ai, 2017).
Another effective way of addressing the AutoML problem is to learn a distribution over
hyperparameters and continuously update it to improve the search; this is achieved with
Evolutionary Algorithms (EA). Inspired by the process of natural selection, these algorithms
make use of phenomena like mutation and cross-over to iteratively improve solutions to a problem.
Examples of this approach can be seen in the python AutoML frameworks; Tree-based Pipeline
Optimization Tool (TPOT) (Olson, et al., 2016) and Genetic Automated Machine Learning (GAMA),
(Gijsbers & Vanschoren, 2020). Although TPOT and GAMA make use of EA (particularly genetic
algorithms) to optimize machine learning pipelines, GAMA’s implementation uses asynchronous
evolution, which has the potential to speed up the search process. The original OAML framework
(Celik, et al., 2022) makes use of GAMA for pipeline optimization because of its effectiveness in
online adaptation (Celik & Vanschoren, 2021). Thus, this work which extends OAML also makes use
of GAMA. A summary of open-source AutoML frameworks is provided in Table 1.

AutoML Library Search Strategy


Auto-Weka Bayesian Optimization
H2O.ai Grid Search
TPOT Genetic Programming
Auto-sklearn Sequential Model Based Algorithm Configuration
GAMA Random Search, Genetic Programming, Asynchronous Halving

Table 1: Current AutoML Libraries

2.2 Online Learning and Adaptation on Streaming Data

Online learning generally refers to machine learning techniques applied to a data stream. In this
setting, data is received on an instance incremental or batch incremental basis. Unlike the offline
setting, stream learning algorithms have memory restrictions, hence the whole training data is not
stored. Rather, this class of algorithms depends on adaptation to manage changing concepts across
the stream.

2.2.1 Concept Drift

Many prediction problems need a model that continuously receives data and thus cannot realistically
work in an offline setting with historical or static data. This is the case in big data applications, for
example, credit scoring where the goodness of credit is revealed after some time (Dal Pozzolo, et
al., 2015), or stock market prediction where the prices of stocks are revealed after prediction has
been made (Hazan & Seshadhri, 2009). In these environments, the data distribution is subject to
change over time, leading to a phenomenon known as concept drift (Schlimmer & Granger, 1986);
(Gama, et al., 2014). Concept drift can occur when the input distribution p(x) changes
(virtual concept drift) or the conditional probability distribution p(y|x) changes (real concept drift).
According to literature (Tsymbal, et al., 2008); (Žliobaitė, 2010), these two kinds of concept drift can
be treated as the same since a model update is required once a drift is detected. Concept drift can
also be grouped as follows (Gama, et al., 2014):
• Sudden/Abrupt drift: Sudden change of concept at a certain time 𝜏
• Incremental drift: Slow change involving intermediate changes before the start of the drift to
the final concept after the drift is finished
• Gradual drift: Gradual replacement of a concept with another over a period τ_2 − τ_1
• Reoccurring concept: Concepts which have been replaced, reappear at a later stage in the
stream. As can be seen in seasonal processes

Figure 1: Types of Concept Drift (Gama, et al., 2014)

2.2.2 Drift Detection

When a concept drift occurs, the predictions made by models become less accurate with time and
hence the need to detect the presence of this drift and adapt the model to it. A simple way of
detecting drift is the Page Hinkley test (Page, 1954), which computes the mean of an input variable
(observed data or prediction accuracy) up to the current moment. As soon as the variable differs
significantly from its historical average, a change is flagged. More recent approaches to detecting
drift adopt a sliding window approach, in which relevant statistics are computed over the window.
Adaptive Windowing (ADWIN) (Bifet & Gavaldà, 2007) maintains two detection windows and analyses
the average of the monitored statistic in each; as soon as the two windows become significantly distinct (for the observed statistic), a
flag is triggered. Drift Detection Method (DDM) (Gama, et al., 2004) is a sliding window method that
works on the premise that a learner’s error rate will decrease as the number of analysed samples
increases for a constant distribution. A warning zone or flag is triggered as soon as the algorithm
detects an increase in the error rate. Early Drift Detection Method (EDDM) (Baena-Garcia, et al.,
2006) aims to improve upon the detection rate of concept drift in DDM by keeping track of the
average distance between two errors instead of only the error rate.

2.2.3 Adaptation mechanisms under concept drift

It is essential for models to adapt when a drift is detected. In this work, we make use of the definition
of adaptation by Bakirov (2017), as the process of updating a model’s training data coverage,
structure, and parameters to improve its predictive accuracy in response to changes in the data
stream. Algorithms may adapt by changing their structure (e.g., the Concept-Adapting Very Fast Decision
Tree (Hulten, et al., 2001)), their data coverage (e.g., K-Nearest Neighbours), or their model
parameters.
In this work we make use of parameter adaptation: model hyperparameters are updated
in our ensemble techniques (see Section 3.4).
2.2.4 Ensemble methods of Adaptation

Originally proposed in 1969 for forecasting on stationary airline passenger data (Bates & Granger,
1969), ensemble methods have become a mainstream approach for prediction in offline and online
settings. An ensemble (in the context of machine learning) is a combination of multiple models for
prediction (Ruta & Gabrys, 2000). Research shows that these models can have a higher prediction
accuracy (Freund & Schapire, 1997); (Dietterich, 2000) and better generalisability (Breiman, 1996)
when compared with single “experts” on stationary data. Recent studies have also shown that the
same is true in a streaming setting (Kadlec & Gabrys, 2011); (Shao & Tian, 2015).

An important aspect of ensemble adaptation in an online setting is the re-combination of


weights of the base learners (Elwell & Polikar, 2011). This adaptation is often done based on each
learner’s prediction accuracy. Strategies for adapting an ensemble include adaptation via weight
combination and via the addition/removal of experts.

2.2.4.1 Adaptation via combination of weights

This method of adaptation involves changing the combination weights of experts in an ensemble. As
seen in (Kuncheva, 2004), combination can be done by using the global weights of experts (known
as fusion) or by selecting a single learner (known as selection). The special case where all experts
have a weight of 0 except one can also be termed selection. Mathematically, consider a set of $I$ experts
$S = \{s_1, \ldots, s_I\}$ with weights $\boldsymbol{\omega} = \{\omega_1, \ldots, \omega_I\}$ and predictions $\hat{\boldsymbol{y}} = \{\hat{y}_1, \ldots, \hat{y}_I\}$, $i = 1, \ldots, I$. For a
classification problem with labels $C = \{c_1, \ldots, c_J\}$, $j = 1, \ldots, J$, the aggregated weight of the experts that
predicted $c_j$ is $z_j = \sum_{i=1}^{I} \omega_i a_{ij}$ (where $a_{ij} = 1$ if expert $i$ predicted $c_j$ and $0$ otherwise), and the final prediction is given as:

$$\hat{y} = \underset{c_j}{\arg\max}\ z_j \qquad \text{(Eq. 4)}$$

For regression, the ensemble prediction is the weighted sum of the expert predictions, given as:

$$\hat{y} = \frac{\sum_{i=1}^{I} \omega_i \hat{y}_i}{\sum_{i=1}^{I} \omega_i} \qquad \text{(Eq. 5)}$$

The special case where the weights $\omega_1 = \omega_2 = \cdots = \omega_I$ is known as un-weighted combination.
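As a concrete illustration of Eq. 4 and Eq. 5, the following minimal Python functions (written for illustration only; they are not part of any library or of the OAML codebase) aggregate expert predictions by their combination weights.

from collections import defaultdict

def weighted_vote(predictions, weights):
    # Eq. 4: aggregate classifier votes by expert weight and return the
    # label with the largest total weight.
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

def weighted_average(predictions, weights):
    # Eq. 5: weighted mean of expert predictions for regression.
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Example with three experts and unequal weights.
print(weighted_vote(["a", "b", "a"], [0.2, 0.5, 0.4]))     # "a" (0.6 vs 0.5)
print(weighted_average([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75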

2.2.4.2 Adaptation via adding and removing predictors

Ensembles can also be adapted by changing their structure. In this method of adaptation, a base
learner is added or removed. Addition of experts to the ensemble can be done to bring new learning
concepts to the model or dispose of old data (Bakirov, 2017). This method of adaptation is suitable
when other methods of adaptation (e.g., weight combination) do not produce satisfactory results.
Addition or removal of experts can be done on a regular basis (e.g., according to a schedule) or
according to a defined trigger mechanism (e.g., when a condition is satisfied). When the latter occurs,
it typically signifies that there is a possible ongoing change in the data. (Kolter & Maloof, 2005)
showed that introducing a new expert after every misclassification while reducing the weights of
underperforming experts can yield good results. However, to address the problem of increased
ensemble complexity that can arise from this approach, (Kolter & Maloof, 2007) proposed adding an
expert only at every Ω-th misclassified instance. For the regression case, (Kolter & Maloof, 2005)
propose adding a new expert if $|\hat{y}_t - y_t| > \zeta$, where $\hat{y}_t$ and $y_t$ are the predicted and real values
at time $t$, and $\zeta$ is a predefined threshold for triggering the addition of an expert. Slight variations
of these methods of expert addition are used in this dissertation. Other (trigger-based) methods of
adding experts to ensembles are discussed in (Bakirov & Gabrys, 2013); (Bakirov, 2017). Expert
addition according to a defined schedule or fixed interval, regardless of the performance of the
model, is another popular strategy (Scholz & Klinkenberg, 2007); (Elwell & Polikar, 2011); (Gomes & Araújo, 2015a).
Experts are often removed from an ensemble due to changes in the data, unsatisfactory
performance (Gomes & Araújo, 2015b), their age (Hazan & Seshadhri, 2009), or when an old expert
is substituted for a new expert due to poor performance (Street & Kim, 2001). The different strategies
for adapting ensembles are illustrated in Figure 2.

Figure 2: Strategies for ensemble adaptation (Bakirov, 2017)
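The trigger-based addition and removal of experts described above can be sketched in a few lines of Python. The function below is an illustrative simplification written for this dissertation (all names are hypothetical), not the adaptation code of OAML; new experts are simply given an initial weight of 1.

def maybe_add_expert(ensemble, weights, y_pred, y_true, zeta, make_expert, max_size):
    # Trigger-based adaptation (regression): add a fresh expert when the
    # ensemble error exceeds the threshold zeta, dropping the expert with
    # the lowest weight once the size limit is reached.
    if abs(y_pred - y_true) > zeta:           # trigger condition
        if len(ensemble) >= max_size:         # prune the weakest expert first
            worst = min(range(len(ensemble)), key=lambda i: weights[i])
            ensemble.pop(worst)
            weights.pop(worst)
        ensemble.append(make_expert())        # bring in a model for the new concept
        weights.append(1.0)                   # simplification: start at weight 1
    return ensemble, weights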



2.3 AutoML for Online Learning

The application of AutoML in online settings has gained attention amongst the machine learning
research community. This was evidenced in Neural Information Processing Systems (NIPS) 2018
AutoML Challenge (Codalab, 2018) formulated as a Lifelong AutoML problem. (Madrid, et al., 2018)
propose a solution to the challenge by extending the Autosklearn library (Feurer, et al., 2015) with a
Fast-Hoeffding Drift Detection Method (FHDDM). In the proposed solution, the model is either
improved or replaced when a drift is detected. The best results from this approach are obtained by
training a fresh model on the entire dataset, which may not be possible in a big data application (due
to memory limitation) and does not satisfy the problem defined by Section 1.2 Another prominent
solution to the AutoML challenge is an adaptive self-optimized end-to-end machine learning pipeline
proposed by (Wilson, et al., 2020). Their solution relies on boosted Decision Trees, with automated
hyperparameter tuning. Unlike the previous solution, it lacks an explicit algorithm for drift detection,
it instead depends on the implicit drift detection of LightGBT library (Ke, et al., 2017) to address
change. Although, this solution produced very good results, it is somewhat limiting in the context of
the CASH or AutoML problem and lacks flexibility for other use cases.
Beyond the NIPS 2018 AutoML challenge, (Bakirov, et al., 2018); (Bakirov, et al., 2021)
propose the automation of the selection of an adaptation strategy for a given stream learning
algorithm whenever a drift occurs. These methods of automated adaptation are however only
applicable to a single model. (Wu, et al., 2021) propose the ChaCha (Champion-Challenger)
algorithm built on FLAML (Wang, et al., 2019) for automatically finding hyperparameters in an online
setting by considering only one base learner at a time. Hence it does not fully address the CASH
problem defined in Section 1.2.
(Celik & Vanschoren, 2021) investigate the performance of several AutoML methods under
concept drift and propose several strategies for their adaptation when a drift occurs. This is done by
integrating multiple adaptation strategies with well-known AutoML techniques. This work led to the
development of the OAML framework (Celik, et al., 2022), an adaptive AutoML framework for online
learning. This technique addresses the AutoML / CASH problem (with memory constraints) by
automatically searching for optimal pipelines (inclusive of pre-processing steps) that use online
learning algorithms to adapt to concept drift. Furthermore, it is fitted with an explicit drift detection
algorithm (EDDM) that triggers an adaptation (redesign or retuning of pipelines) when certain drift
conditions are met. While the results corroborate that AutoML in an online setting can recover from
concept drifts and yield competitive results, the framework is limited to only classification tasks,
makes use of offline pre-processors (which are not applicable in an online setting) and does not
explore different methods of ensemble adaptation. The extended OAML system proposed in this
dissertation explores multiple ensemble adaptation mechanisms, includes online pre-processors and
is (at the time of writing) the first to propose an AutoML system capable of creating full pipelines for
both regression and classification tasks.

2.4 Evaluation Techniques

A key part of any machine learning system is its evaluation methodology. Evaluation of a learning
system serves two purposes: to assess the hypothesis inside the learning system and to estimate
how applicable the learning system is to a given problem. In an online setting however, the data
stream is unbounded, and classic methods for evaluating machine learning systems such as cross-
validation and train-test split are not applicable (Gama, et al., 2009). To address this problem, two
common methods of evaluating an online learning model presented in literature are described below:
1. Periodic Holdout: This technique involves holding out an independent test set at scheduled
intervals for testing the online model. The test data (of a predefined size) is held out from the
data stream and renewed after a given time frame or data instances. Hence, the data used
in the test set is never used for training.
2. Prequential Selection (Dawid, 1984): In this method, the online model makes a prediction
on each data sample (testing) before training. Hence, each data sample has two functions:
testing the online model and then training the online model. Before training, predictions are
made based on the attribute-values of data instances. Afterwards, the prequential-error is
computed according to a loss function between the observed value and predicted value and
its metrics are updated. Unlike the periodic holdout technique, all data samples are used for
training, and no additional memory is allocated for a holdout set.

Figure 3: Online Evaluation: (a) Periodic Holdout (b) Prequential Selection
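For reference, river ships a helper that runs exactly this interleaved test-then-train procedure over a stream. A minimal usage sketch is shown below (class and function names follow the river documentation at the time of writing and may differ slightly between versions).

from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

# Interleaved test-then-train (prequential) evaluation over an entire stream.
evaluate.progressive_val_score(
    dataset=datasets.Phishing(),     # a small binary-classification stream
    model=model,
    metric=metrics.Accuracy(),
    print_every=500,                 # report the running accuracy periodically
)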



3 METHODOLOGY

The problem addressed in this project can be divided into two main parts: algorithm
design (to solve the CASH problem) and software development (for the implementation of the online
AutoML framework). Thus, the methodology used in this dissertation differs from a traditional data
science project where already existing algorithms are chosen to address a specific business problem
on a given dataset. This dissertation is instead focused on the development of a novel framework
for addressing the online AutoML problem capable of handling concept drift for all supervised
learning tasks. The proposed solution is built on the OAML framework (Celik, et al., 2022), and
includes techniques derived from extensive literature survey. The justification for adopting OAML
framework is that it is the only available framework (at current time of writing this dissertation) that
implements an automated system for online learning that fully addresses the CASH problem defined
in Section 1.2 (although limited to classification tasks alone). Furthermore, it is an Open-Source
framework and provides flexibility for modification and extension, hence making it suitable for
achieving the goals of this dissertation.
The complexity, originality and relevance of this dissertation are evidenced by the growing
research interest in the fields of AutoML (Chen, et al., 2022) and stream learning (Bakirov, et al.,
2021). Whilst the use of AutoML in an online setting is explored in recent literature for classification
tasks (Celik, et al., 2022); (Wu, et al., 2021); (Wilson, et al., 2020), the application of online AutoML
for regression problems remains unexplored. Hence, one of the key contributions of this project is to
fill the knowledge gap and contribute a fully Open-Source framework for online AutoML for all
supervised learning tasks.

3.1 Overview

The OAML framework (Celik, et al., 2022) is chosen as a suitable base framework (justification given
at the start of this chapter) and extended with a regression search space (Section 3.2),
sound ensemble adaptive mechanisms, online pre-processing techniques and techniques for
handling online regression tasks. The goal of the search phase is to construct an end-to-end machine
learning pipeline comprising one or more estimators, including pre-processing and/or prediction
steps, with their associated hyperparameters. This combined algorithm selection and
hyperparameter optimization step is achieved via the optimization algorithms described
in Section 3.3. The single best pipeline found is assigned as the online model for prediction. Data is
streamed in an incremental one-by-one fashion and when a drift is detected, the online model adapts
to this change using a pre-defined strategy: by automatically starting a new search for better
pipelines. We implement both the scheduled and the trigger-based adaptation strategies for handling
concept drift in the data stream (Section 3.4).
The new system is evaluated on industry standard datasets for stream learning and compared with
existing baselines (Section 5). The obtained results are critically evaluated against the research
questions defined in Section 1.3.

3.2 Search Space Design

The search space for the proposed OAML system comprises a large set of online pre-processors,
online regressors, online classifiers and online ensemble methods. These algorithms are all
implemented in River (Montiel, et al., 2020), an online machine learning library written in Python. River
implementations of these algorithms have been chosen because the library supports incremental learning
(suitable for streaming data) and includes inherent adaptive methods (for handling drift). For
classification tasks, we maintain the same classifier algorithms included in the search space of the
original OAML framework (Celik, et al., 2022), since they showed very good results in handling
concept drift.
The regression algorithms included in the search space comprise simple regressors and
adaptive tree/ensemble methods that are capable of handling concept drift. The selection of
algorithms includes the Adaptive Random Forest Regressor (Gomes, et al., 2018), Hoeffding Adaptive
Tree Regressor (Bifet & Gavaldà, 2009), Oza Bagging Regressor (Oza & Russell, 2005) and Exponentially
Weighted Average Regressor (Kivinen & Warmuth, 1997). The experts used in the ensemble methods
are online versions of Linear Regression and the Hoeffding Tree Regressor. Furthermore, we have also
included a simple Linear Regressor as an independent learner in the search space, since the
literature shows that good results can be obtained by switching to simple methods when a drift occurs
(Baier, et al., 2020).
Although the search space for the original OAML framework includes online pre-processors
for data normalization and scaling, such as the Adaptive Standard Scaler and Robust Scaler from the
river machine learning library, these pre-processors were not actually implemented into the AutoML
system (hence they are absent from the final AutoML pipelines). In this work, we maintain
the same pre-processors for data scaling and normalization and implement them into the AutoML
system, making them available for selection by the AutoML Pipeline. In addition to this, we extend
the search phase with online versions of pre-processing algorithms for missing value imputation and
categorical variable encoding. The selected algorithms include Previous Imputer and One Hot
Encoder.
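As an illustration of the kind of pipeline this extended search space can express, the sketch below chains river's online imputer, scaler and an adaptive tree regressor. It is a hand-written example, not a pipeline generated by the system, and the class names follow current river releases (they may differ between versions).

from river import compose, preprocessing, tree

pipeline = compose.Pipeline(
    preprocessing.PreviousImputer(),          # online missing-value imputation
    preprocessing.AdaptiveStandardScaler(),   # online scaling
    tree.HoeffdingAdaptiveTreeRegressor(grace_period=200),
)

x, y = {"temperature": 21.5, "pressure": 1.2}, 0.73
y_pred = pipeline.predict_one(x)   # predict before the label is revealed
pipeline.learn_one(x, y)           # then update the pipeline incrementally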
Full descriptions and details of all algorithms included in the search space for pre-processing,
regression and classification are available in river (Montiel, et al., 2020). For classification algorithms
we maintain the same hyperparameters included in the original OAML framework (Celik, et al.,
2022). Hyperparameters for regression and pre-processing algorithms have been chosen based on
experimentation done in the development of this work and are provided in Table 2.

Model | Hyperparameter | Search range
Linear Regressor | l2 | {0, 1.00E-5, 1.00E-7}
Adaptive Random Forest Regressor | n_models | [15, 40]
 | max_features | {0.2, 0.5, 0.7, 0.12, 1.0, sqrt, log2, None}
 | aggregation_method | {mean, median}
 | lambda_value | [2, 10]
 | grace_period | [50, 350]
 | split_confidence | {1.00E-1, 1.00E-4, 1.00E-5, 1.00E-9}
 | tie_threshold | {0.01, 0.02, 0.08}
 | leaf_prediction | {mean, model, adaptive}
Oza Bagging Regressor | model | {LinearRegression, HoeffdingTreeRegression}
 | model_selector_decay | {0.2, 0.4, 0.7}
 | n_models | [1, 20]
Hoeffding Adaptive Tree Regressor | grace_period | [50, 350]
 | split_confidence | {1.00E-2, 1.00E-4, 1.00E-7, 1.00E-9}
 | tie_threshold | {0.01, 0.02, 0.08}
 | leaf_prediction | {mean, model, adaptive}
 | model_selector_decay | {0.2, 0.4, 0.7}
 | min_sample_split | {3, 5, 7, 10}
 | bootstrap_sampling | {True, False}
 | drift_window_threshold | [100, 500]
 | adwin_confidence | {2.00E-2, 2.00E-3, 2.00E-4}
EWA Regressor | models | {LinearRegression, HoeffdingTreeRegression}
 | learning_rate | {0.01, 0.1, 0.3, 0.5}

Table 2: Designed Search Space (regression) for extended OAML

3.3 Combined Algorithm Selection and Hyperparameter Optimization

The choice of an optimization algorithm for solving the CASH problem is a key part of any AutoML
system. While the GAMA library allows for the implementation of custom search algorithms, this work
makes use of the default search algorithms provided by the library due to their reported speed and
effectiveness (Gijsbers & Vanschoren, 2020).

3.3.1 Optimization Algorithms

The following search algorithms (implemented in GAMA) are available for solving the CASH problem.
Note that for the classification task the objective function is framed as a maximization problem (e.g.,
maximization of accuracy), while regression is framed as a minimization problem (e.g., minimization
of the root mean squared error).

Random Search (RS): In this search method, a machine learning pipeline is sampled at random
from the search space and evaluated. This naive optimization technique is effective because it does
not make any assumption about the structure of the objective function and allows for non-intuitive
combination of hyperparameters to be discovered.

Asynchronous Evolutionary Optimization (AEO): Evolutionary algorithms are becoming


mainstream for tuning the parameters of large machine learning models (Scott & De Jong, 2015).
They are heuristic search algorithms based on the principle of natural selection that can find globally
optimal solutions to complex optimization problems. In the proposed OAML system, evolutionary
search works by continuously evolving a population of machine learning pipelines, creating new
pipelines from the fittest in the population. The EA implementation used in the proposed
system via the GAMA library is parallelized using an asynchronous approach to reduce the amount
of wall-clock time it takes to find a solution.

Asynchronous Successive Halving (ASHA): Successive halving is a multi-armed bandit algorithm


(Thompson, 1993) to perform principled early stopping. The optimization begins by first assigning a
resource budget to a set of candidates (machine learning pipelines) in the search space. Each
candidate is evaluated (on the data), and the best half are promoted to the next phase. The budget
for each candidate is then doubled and the previous step is repeated until the single best candidate
remains. Like the EA implementation in GAMA, successive halving is parallelized using an
asynchronous approach.
The proposed OAML system grants flexibility in the choice of optimization algorithm, and the
performance of these algorithms is discussed and critically evaluated in Chapter 5.
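For illustration, the snippet below sketches how a GAMA estimator can be constructed with each of these search methods. It assumes the search-method classes documented for recent GAMA releases (AsyncEA, RandomSearch, AsynchronousSuccessiveHalving); exact import paths and argument names may vary between versions.

from gama import GamaRegressor
from gama.search_methods import AsyncEA, AsynchronousSuccessiveHalving, RandomSearch

# Evolutionary (asynchronous EA) search within a 60-second budget; swapping in
# RandomSearch() or AsynchronousSuccessiveHalving() changes the CASH solver.
automl = GamaRegressor(search=AsyncEA(), max_total_time=60, store="nothing")
# automl.fit(X_batch, y_batch)   # returns after the budget with the best pipeline found
# y_pred = automl.predict(X_new)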

3.4 Online AutoML Model

An overview of the proposed OAML system is shown in Figure 4. The system is divided into two
stages: the AutoML stage (left in Figure 4) and the online learning stage (right in Figure 4). The
system is initialised with an initial batch of b_i samples from a stream of data (X, y) and a supervised
learning task (classification or regression). For the assigned learning task, the search algorithm S
trains and evaluates machine learning pipeline configurations from the search space with a metric
M_p over b_i within a time budget t_max. It is worth noting that different search spaces are used for
regression and classification tasks (as shown in Figure 4).
The best machine learning pipeline, P*_0, found within t_max is trained and fitted to b_i and set as
the current online model A_0. In the online learning stage, the data is streamed on an incremental
one-by-one basis, that is, P*_0 makes a prediction ŷ_t for data features X_t. After prediction, the true
value of the target, y_t, becomes available; the online model is evaluated and the metric M_o is updated.
To check for concept drift, the explicit drift detector, D_ED, is updated with the values (ŷ_t, y_t).
When a drift is detected, the system activates the AutoML phase to search for new pipelines
with the last seen window of b_s samples from the data stream. A new AutoML pipeline search may
also be triggered by a scheduled model update scheme if the pipeline does not change after alt_train
iterations.

Figure 4: Extended OAML System


The extended OAML system has two adaptation strategies to update the online model after a trigger
point is reached. These are the Basic and Ensemble strategies. The Basic strategy is similar in
function to the original OAML system, while the Ensemble strategy is fitted with multiple adaptive
mechanisms. Unlike the original OAML system, we leave out the Model Store strategy because our
results show that there is no increase in performance with the additional complexity that this strategy
introduces.

3.4.1 Basic Adaptation Strategy

In this adaptation strategy, the online model is replaced with a new model when a trigger point is
reached at time t. At the start of the stream, and while (ŷ_t, y_t) does not trigger the drift detector, the best-
found pipeline P*_0 from the search phase is incrementally trained on (X_t, y_t). When a trigger point is
reached, the AutoML phase is restarted to search for a new pipeline P*_t with the last sliding window
((X_{t−b_s}, y_{t−b_s}), …, (X_t, y_t)). If the performance of P*_t is better than that of the current pipeline P*_0 on the evaluation
metric, then P*_0 is discarded and the online model is set to P*_t; else P*_0 remains as the active online
model and the stream learning phase resumes. This global replacement strategy is suitable for data
with frequent and sudden concept drift. Furthermore, this strategy has low memory requirements,
since only a controlled volume of data is stored at a given time. Hence, it is suitable for applications
with limited memory resources. A drawback to this approach, however, is that old models which may
carry useful information for future data instances are completely discarded. The pseudocode for this
step is provided in Algorithm 1.

Algorithm 1: Extended OAML - Basic

Inputs: b_i, b_s, M_p, t_max, S(), (X, y), n, where b_i ≥ b_s, alt_train

Initialization: task τ, OAML_search ⇐ S(), D_ED ⇐ DriftDetector()
1   P*_0 ⇐ argmin_{t_max} OAML_search(X_window, y_window)
2   LastTrain = i
3
4   i ⇐ b_i + 1
5   t ⇐ 0
6   A_0 ⇐ P*_0
7   while X_i ∈ X do
8       predict ŷ_i ⇐ A_0(X_i)
9       evaluate M_o(ŷ_i, y_i)
10      train A_0(X_i, y_i)
11      update D_ED with (ŷ_i, y_i)
12      if D_ED in drift zone ∨ i − LastTrain ≥ alt_train then
13          t ⇐ t + 1
14          X_window = (X_{i−b_s}, …, X_i), y_window = (y_{i−b_s}, …, y_i)
15          P*_t ⇐ argmin_{t_max} OAML_search(X_window, y_window), LastTrain = i
16          M^A_t ⇐ A_0(X_window, y_window), M^{P*}_t ⇐ P*_t(X_window, y_window)
17          if M^A_t ≤ M^{P*}_t then
18              A_0 ⇐ A_0
19          else
20              A_0 ⇐ P*_t

3.4.2 Ensemble Adaptation Strategy

The Ensemble strategy follows a similar procedure to the Basic strategy except for its adaptation.
Instead of a global replacement scheme for pipelines, the set of the n pipelines (referred to as experts) with
the highest prediction accuracy is maintained as an ensemble E = {P*_i}. The extended OAML system introduces three
modes of adaptation for the Ensemble strategy; a short code sketch of the weight updates follows this list.
1. Unweighted Ensemble: In this mode of adaptation, each expert in E is assigned a weight
equal to 1. Thus, expert votes are weighted equally.
2. Dynamic Weighted Majority (for classification tasks only): Each expert in the ensemble is
initialised with a weight of 1. When an expert misclassifies an instance, its weight is multiplied
by a constant β, where 0 < β < 1. All expert weights are normalised after each prediction to
limit the prevalence of freshly introduced experts in the ensemble. Expert predictions are
aggregated according to the weight of each expert in the ensemble.
3. Additive Expert (AddExp)
a. AddExp for Discrete Classes (AddExp.D): Experts are initialised with a weight of 1
(when the ensemble is empty) or with the sum of the existing expert weights times a
constant γ. After a prediction is made, experts that predicted incorrectly have their
weights reduced by a multiplicative constant β and all expert weights are normalised.
b. AddExp for Continuous Classes (AddExp.C): The overall operation is similar to
AddExp.D; however, it is suitable only for regression tasks. AddExp.C predicts the
sum of all expert predictions, weighted by their relative expert weights. Weights are
updated according to $\omega_{t+1,i} = \omega_{t,i}\,\beta^{|\hat{y}_{t,i} - y_t|}$.
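The following minimal Python functions illustrate the DWM and AddExp.C weight updates described above (an illustrative simplification, not the framework's implementation).

def dwm_update(weights, predictions, y_true, beta=0.5):
    # Dynamic Weighted Majority: scale down the weight of every expert that
    # misclassified the instance, then normalise the weights.
    weights = [w * (beta if p != y_true else 1.0) for w, p in zip(weights, predictions)]
    total = sum(weights)
    return [w / total for w in weights]

def addexp_c_update(weights, predictions, y_true, beta=0.5):
    # AddExp.C: w_{t+1,i} = w_{t,i} * beta ** |y_hat_{t,i} - y_t| (regression).
    return [w * beta ** abs(p - y_true) for w, p in zip(weights, predictions)]

# Example: the second expert is far off target, so its weight shrinks the most.
print(addexp_c_update([1.0, 1.0], predictions=[0.9, 3.0], y_true=1.0))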

For all three modes of adaptation, when the OAML system is triggered at time t (due to a drift or
schedule), a new pipeline search begins and the best-found pipeline P*_t is compared with the backup
ensemble, E_t. If the predictive performance of E_t is better than that of P*_t over the last sliding window, the current
online model P*_{t−1} is replaced with E_t. Furthermore, instead of being discarded, P*_t is added to the
ensemble.
When the ensemble size limit n is reached, the new model is added to the ensemble by
replacing the worst-performing model. It is important to note that the original OAML system replaces
the oldest model first; we depart from this approach with the rationale that the oldest expert may
have a better predictive performance than some newly added experts (e.g., when a bad expert is
added to the ensemble). The Ensemble strategy addresses the shortcomings of the Basic strategy
and helps to retain models that may have a good predictive performance on future data instances.
The pseudocode of the ensemble mode is shown in Algorithm 2.

Algorithm 2: Extended OAML - Ensemble

Inputs: b_i, b_s, M_p, t_max, S(), (X, y), n, where b_i ≥ b_s, alt_train

Initialization: task τ, OAML_search ⇐ S(), E_0 = {}, D_ED ⇐ DriftDetector()
1   P*_0 ⇐ argmin_{t_max} OAML_search(X_window, y_window)
2   LastTrain = i
3   Append P*_0 to E_0
4   i ⇐ b_i + 1
5   t ⇐ 0
6   A_0 ⇐ P*_0
7   while X_i ∈ X do
8       predict ŷ_i ⇐ A_0(X_i)
9       evaluate M_o(ŷ_i, y_i)
10      train A_0(X_i, y_i)
11      update D_ED with (ŷ_i, y_i)
12      if D_ED in drift zone ∨ i − LastTrain ≥ alt_train then
13          t ⇐ t + 1
14          X_window = (X_{i−b_s}, …, X_i), y_window = (y_{i−b_s}, …, y_i)
15          P*_t ⇐ argmin_{t_max} OAML_search(X_window, y_window), LastTrain = i
16          M^E_t ⇐ E_t(X_window, y_window), M^{P*}_t ⇐ P*_t(X_window, y_window)
17          if M^E_t ≤ M^{P*}_t then
18              A_0 ⇐ E_t
19          else
20              A_0 ⇐ P*_t
21          append P*_t to E_t
22          if |E_t| ≥ n then
23              remove worst P* from E_t

3.5 Evaluation Protocol

A well-designed Online AutoML system should improve with experience from observed data and be
able to generate compact and approximate representations of its observed values. In this work, we
make use of the Prequential Evaluation technique (Section 2.4) to evaluate the effectiveness of our
proposed system. The interleaved test-then-train method is chosen over the holdout technique since it
makes use of all the data available in the stream, therefore avoiding test selection bias. Furthermore,
it does not depend on the test set selection and provides good error estimates in the presence of
concept drift (Gama, et al., 2009). Mathematically, this is calculated as:
$$P_e(i) = \frac{1}{i} \sum_{n=1}^{i} \mathcal{L}(y_n, \hat{y}_n) = \frac{1}{i} \sum_{n=1}^{i} e_n \qquad \text{(Eq. 6)}$$

where $\mathcal{L}$, $y_n$ and $\hat{y}_n$ are the same as in Eq. 1.

Figure 5: Prequential Evaluation

In our OAML system, each observation in the data stream consists of an attributes-target pair (X, y)
and the following steps are performed in the prequential evaluation process:
1. Make a prediction, ŷ_n, for a single observation of X
2. Compute the prediction loss given the observed target y_n
3. Update the model with the observation (x_n, y_n)
4. Proceed to the next observation

Our OAML system gives users flexibility in the choice of a loss metric. Loss in regression tasks
can be computed using the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE),
while the loss in classification tasks can be computed using the F1, Accuracy, Precision and Recall
scores.
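Steps 1-4 above correspond to a simple test-then-train loop. A minimal sketch using river and a prequential RMSE metric is shown below (illustrative only, not the evaluation code of the OAML system itself).

from river import datasets, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LinearRegression()
metric = metrics.RMSE()

for x, y in datasets.TrumpApproval():      # any regression stream can be plugged in here
    y_pred = model.predict_one(x)          # step 1: predict before seeing the label
    metric.update(y, y_pred)               # step 2: update the prequential loss
    model.learn_one(x, y)                  # step 3: train on the same observation
                                           # step 4: move on to the next observation
print(metric)                              # running (prequential) RMSE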

3.5.1 Evaluation Data Streams

The proposed system is evaluated on 12 benchmark data streams for regression and
classification tasks that are well known in the online learning literature. For each prediction task (classification and
regression), we make use of 3 real-world data streams from industrial processes, which are volatile
and susceptible to high and abrupt drift, and 3 artificial data streams. A description of each data
stream is given in Table 3.

Data Stream | Description
Electricity (Harries, 1999) | A time series collected from the Australian New South Wales electricity market. Prices are volatile and vary with the demand and supply of the market. Price changes occur at five-minute intervals throughout the data stream. The class label indicates the direction of the price change relative to a moving average.
Airlines (Bifet, et al., 2011) | Time series flight data with schedule information, which can change daily or weekly. The class label indicates whether the flight arrived on time.
Run or Walk Information | Sensor data collected from smart devices. The prediction task is to detect whether a person is running or walking. Drift is introduced to the stream by this change of activity.
SEA Abrupt (Street & Kim, 2001) | Data generated from a Streaming Ensemble Algorithm (SEA). Abruptly changing concepts are added to the stream by switching between one of four possible classification functions.
SEA Mixed (Street & Kim, 2001) | Contains data generated from a Streaming Ensemble Algorithm (SEA). Abrupt concept drift is added as in the SEA Abrupt data.
Hyperplane (Hulten, et al., 2001) | Data generated from a Rotating Hyperplane Algorithm (ROA). By determining the orientation of a point relative to a rotating hyperplane, a binary classification problem is formed. Changes of concept are introduced by changing the weights and reversing the direction of the hyperplane.
Catalyst activation (Strackeljan, 2006) | A real-world condition-based simulation of highly volatile catalyst activation in a multi-tube reactor. Flows, concentrations and temperatures obtained from 14 sensor measurements determine catalyst activity. The data spans one year.
Sulfur Recovery (Fortuna, et al., 2003) | Data from a sulfur recovery unit. Gas and air flow measurements determine the SO2 output of the recovery unit.
Debutaniser Column (Fortuna, et al., 2005) | Collected from a debutaniser column; temperature, pressure and flow measurements determine the concentration of the gas at the output of the system.
Bank (Akujuobi & Zhang, 2017) | Simulation of how customers choose their banks. The target is the fraction of customers who leave the bank due to full queues.
Friedman (Breiman, et al., 1984) | Simulates impedance and phase changes that occur in an alternating electric circuit using 4 variables.
2D Planes (Breiman, 1996) | Similar to the Friedman dataset, an artificially generated dataset commonly used to test concept drift.

Table 3: Data Streams used in Evaluation Protocol



4 EXPERIMENT DESIGN

In this chapter, we setup several experiments to evaluate our OAML system on data streams with
various kinds of drift, analyse the results and contrast with the original OAML system (Celik, et al.,
2022). This is done with reproducibility in mind to aid future research work on online automatic
machine learning. The code, data streams and results are open source and publicly available on
GitHub.

4.1 Data Streams

For the empirical evaluation of our OAML system we used benchmark data from concept drift data
streams. Concept drift data streams are good for the evaluation of online learning systems because
they comprise a dynamic data distribution which is typical in an online setting. Table 4 shows a
summary of the characteristics of each data stream.

Data Stream Type Task Samples

Electricity (Harries, 1999) Real Classification 45,312


Airlines (Bifet, et al., 2011) Real Classification 539,383
Run or Walk Information Real Classification 88,588
SEA Abrupt (Street & Kim, 2001) Synthetic Classification 150,000
SEA Mixed (Street & Kim, 2001) Synthetic Classification 150,000
Hyperplane (Hulten, et al., 2001) Synthetic Classification 500,000
Catalyst activation (Strackeljan,
2006) Real Regression 5,867
Sulfur Recovery (Fortuna, et al.,
2003) Real Regression 10,081
Debutaniser Column (Fortuna, et al.,
2005) Real Regression 2,394
Bank (Akujuobi & Zhang, 2017) Synthetic Regression 8,192
2D Planes (Breiman, et al., 1984) Synthetic Regression 3,500
Friedman (Friedman, 1991) Synthetic Regression 5,670

Table 4: Overview of data streams used in the evaluation protocol



4.2 Configuration of OAML System

Similar to the original OAML system (Celik, et al., 2022), our system can be configured with
different user-defined settings. These settings may vary with the specific application of the system;
the default values used for experimentation are provided below.
1. Initial batch size is set to 5000 samples for classification tasks and 500 for regression tasks. Classification data streams have significantly more data samples (Table 4) than regression data streams, hence the difference.
2. Sliding window size is set to 2000 samples for classification tasks and 200 for regression tasks.
3. AutoML search budget is set to 60 seconds for all prediction tasks. This value has been chosen for the experimental setup only and should be increased for better performance in a real-world setting.
4. Performance metric is prequential accuracy for classification tasks and prequential RMSE for regression tasks.
5. Online metric is prequential accuracy for classification tasks and prequential RMSE for regression tasks.
6. Search algorithm can be configured to use ASHA, random search (RS), or the asynchronous evolutionary algorithm (AEO). While experiments are conducted with AEO, the effect of the choice of search algorithm on our OAML system is also compared.
7. Ensemble adaptation mechanism can be configured to use DWM, AddExp, or unweighted voting. While experiments are conducted with DWM, the effect of the choice of adaptation mechanism on our OAML system is also compared.
8. Drift detection is set to EDDM for classification tasks, due to its ability to detect high and abrupt drift, and ADWIN for regression tasks, since EDDM cannot be applied to a continuous target (a usage sketch is given after this list).
9. Alternative detector is set to 50000 samples for classification tasks and 5000 samples for regression tasks.
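As a rough illustration of the drift-detection setting (item 8 above), the sketch below monitors a stream of per-sample losses with ADWIN from scikit-multiflow, the implementation used in our system. The error sequence is synthetic and only simulates an abrupt drift, and the reaction to a detected change is indicated by a print statement rather than the actual OAML re-optimisation trigger.

import numpy as np
from skmultiflow.drift_detection import ADWIN

rng = np.random.default_rng(0)
# hypothetical per-sample losses: low error, then an abrupt jump simulating concept drift
errors = np.concatenate([rng.normal(0.1, 0.02, 1000), rng.normal(0.9, 0.02, 1000)])

adwin = ADWIN(delta=0.002)                # default confidence parameter
for t, err in enumerate(errors):
    adwin.add_element(err)
    if adwin.detected_change():
        print(f"Drift detected at sample {t}; trigger pipeline re-optimisation")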

4.3 Baselines

The performance of our OAML system is compared with competitive alternative techniques in online
learning and AutoML. For online learning baselines, we include Leverage Bagging algorithms (Bifet
& Gavaldà, 2007; Bifet & Ricard, 2009), which have been shown to outperform other stream learning
algorithms in the literature, and Hoeffding Adaptive Trees (Hulten, et al., 2001) as a competitive non-
ensemble algorithm. ChaCha (Wu, et al., 2021) and the original OAML (Celik, et al., 2022) are (at
the time of writing) the only freely available AutoML algorithms capable of handling concept drift,
although they are limited to classification tasks only. An overview of the baselines is presented in
Table 5.

Baseline Task AutoML

Leverage Bagging Classifier (Bifet & Gavaldà, 2007) Classification No


Hoeffding Adaptive Tree (Hulten, et al., 2001) Classification No
ChaCha (Wu, et al., 2021) Classification Yes
Original OAML (Celik, et al., 2022) Classification Yes
Adaptive Random Forest Regressor (Gomes, et al., 2018) Regression No
Leverage Bagging Regressor (Bifet & Ricard, 2009) Regression No
Hoeffding Adaptive Tree Regressor (Hulten, et al., 2001) Regression No

Table 5: Overview of Baselines
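For reference, the online baselines in Table 5 correspond to standard incremental learners from the river library. The snippet below sketches how they might be instantiated; class locations vary between river releases, and the Leverage Bagging regressor is omitted because it may not be available in an unmodified river installation.

from river import ensemble, tree

# Classification baselines from Table 5
leverage_bagging = ensemble.LeveragingBaggingClassifier(model=tree.HoeffdingTreeClassifier())
hat_classifier = tree.HoeffdingAdaptiveTreeClassifier()

# Regression baselines from Table 5
hat_regressor = tree.HoeffdingAdaptiveTreeRegressor()
try:
    arf_regressor = ensemble.AdaptiveRandomForestRegressor()   # older river releases
except AttributeError:
    from river import forest
    arf_regressor = forest.ARFRegressor()                      # newer river releases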



5 RESULTS

The results of the designed experiments (Section 4) are discussed in this Chapter. First, we compare
the different adaptation strategies against each other, baseline techniques discussed in the previous
section and the original OAML framework. These experiments are performed on real-world and
artificial data streams for regression and classification tasks. Next, the pipelines (and their individual
components) generated by our OAML system are analysed. In addition to this, we explore the
Ensemble Adaptation strategy: effect of the ensemble size, which adaptation mechanism performs
best and the best performing model replacement strategy. The effect of search algorithms on the
predictive performance of our OAML system is also comparatively analysed. Finally, we investigate
the effect of Online Pre-processing on our OAML system. It is important to note that each experiment
was run multiple times and the average scores are plotted below.

5.1 Experiments with real-world data

To determine how practical our OAML system is, it is important to evaluate it on real-world data. We
do this for the regression and classification data streams listed in Table 4.

5.1.1 Regression experiments

For the catalyst activation data stream (Figure 6), the goal is to predict catalyst activity (continuous
variable) inside a reactor. High drift may often occur due to complicated chemical processes such as
cooling and catalyst decay. The Hoeffding Adaptive Tree (HAT) Regressor performed best amongst
the state-of-the-art online algorithms (Table 5) in this study; furthermore, since this is the first
publicly available work that applies online AutoML to regression problems, no other baselines are
used in our experiments. Throughout most of the catalyst activation stream, HAT performs better
than the machine learning pipelines found by OAML. The abrupt drop in performance after ~3,800
samples is likely due to a sudden change of concept in the catalyst activation process. However,
both Extended OAML strategies find new pipelines (Section 5.3) to handle this drift, which is indicative
of the ability of Extended OAML to manage dynamic environments. Although the Basic strategy
achieves a marginally better prequential performance, both the Basic and Ensemble strategies
handle drift similarly.

Figure 6: Prequential evaluation (RMSE) for Catalyst Activation data stream
Extended OAML improves on the baseline performance for the Debutaniser data stream (Figure
7) with both strategies. The sudden change in performance can be attributed to concept drift in the
stream. While both strategies are affected by this change, they recover from this change faster than
the HAT regressor.

Figure 7: Prequential evaluation (RMSE) for Debutaniser data stream

This improvement is more pronounced in the sulfur (Figure 8) stream. Both OAML strategies perform
significantly better than the baseline and are affected less by the sudden drift after ~4,000 samples.

Figure 8: Prequential evaluation (RMSE) for Sulfur data stream


Overall, Extended OAML can generate pipelines capable of coping well with different kinds of drifts
in real-world dynamic environments for regression problems.

5.1.2 Classification experiments

Next, we evaluate the performance of our OAML system on classification problems. We also include
other competitive Online AutoML approaches, such as the original OAML (Celik, et al., 2022) and
ChaCha (Wu, et al., 2021). On classification data streams, Leverage Bagging performed the best
amongst the chosen stream learning algorithms used in this study and is thus used as a benchmark
for comparison. The Electricity data stream (Harries, 1999) is heavily autocorrelated, which is beneficial
to the Leverage Bagging algorithm, hence its high performance (Figure 9).

Figure 9: Prequential evaluation (Accuracy) for Electricity data stream

The variations in the behaviour of the learning algorithms above are caused by the numerous
kinds of drift present in Electricity. With Extended OAML, the trigger point is reached frequently
leading to several re-trainings. Extended OAML performs the best amongst the AutoML online
methods here. Further, the Basic method performs better than the Ensemble method for both
Extended OAML and the Original OAML (Celik, et al., 2022). Frequent re-training of OAML could
lead to the addition of experts with low predictive performance (given the limited time budget used
in this experiment), hence the lower performance of Ensemble methods. Although DWM performs
better than AddExp throughout the stream, AddExp recovers faster from the sudden drift after 20,000
samples. ChaCha suffers the most from concept drift and does not recover well. In general, Extended
OAML methods perform better than the original OAML for this stream.
The New Airlines (Figure 10) data stream contains cyclical and frequent drifting patterns. The
immediate drop in performance of the Leverage Bagging algorithm indicates the need for adaptation.
OAML, Extended OAML and ChaCha recover quickly from the concept changes and exhibit similar
adaptation schemes. The Basic and Ensemble methods follow the same pattern as in the Electricity
data stream, with DWM out-performing AddExp. We also observe that Extended OAML improves
on the performance achieved by OAML.

Figure 10: Prequential evaluation (Accuracy) for New Airlines data stream

On the Run or Walk data stream, drift is more gradual, with some occasional abrupt
changes. All the adaptation algorithms recover well and relatively quickly on this data stream, except
for the original OAML – Ensemble method. The sudden drops in performance around ~60,000 and
~80,000 samples can be attributed to the inclusion of bad experts during adaptation. Like the other data streams,
Extended OAML performs better than OAML on this data stream.

Figure 11 Prequential evaluation (Accuracy) for Run or Walk data stream


Extended OAML manages different kinds of concept drift from real data sources very well,
outperforming competitive state-of-the-art online algorithms in most cases and performing better than the original
OAML in all cases.

5.2 Experiments with synthetic data

Synthetic datasets are important in evaluating the performance of online learning systems, since only
with synthetic data can the presence of concept drift and its properties be known exactly. We repeat the same steps
from the previous experiments on real-world data for regression and classification problems, using the
same algorithms and techniques.

5.2.1 Regression

The Friedman data stream (Breiman, et al., 1984) simulates different electrical states in an
alternating circuit. Hence both gradual and incremental drift patterns exist. Extended OAML – Basic
and Ensemble have roughly the same behaviour in this stream and handle concept drift better than
the Leverage Bagging algorithm. These methods equally suffer from concept changes but recover
faster than the HAT Regressor. This further demonstrates the capabilities of our system in dynamic
regression applications.

Figure 12: Prequential evaluation (RMSE) for Friedman data stream

There are fewer concept drifts in the Bank and 2D Planes data streams, allowing all
adaptation and online techniques to cope well. For the Bank data stream (Figure 13), Extended OAML
– Basic starts off poorly with a high RMSE but recovers well further down the stream. HAT
regressor maintains a consistent performance but performs the worst amongst all three algorithms.
Extended OAML – Ensemble improves in predictive performance for most of the stream and
performs the best overall. This can be attributed to good experts being added to the ensemble when a
trigger point is reached.
The Online HAT Regressor performs poorly on the 2D Planes data stream (Figure 14), while
both Extended OAML methods are indistinguishable from each other here.

Figure 13: Prequential evaluation (RMSE) for Bank data stream

Overall, the Extended OAML techniques perform very well on the synthetic benchmark
datasets and in all cases offer significant improvements on online learning algorithms.

Figure 14: Prequential evaluation (RMSE) for 2D Planes


5.2.2 Classification

The Hyperplane data stream (Figure 15) is representative of a real-world scenario with time-changing
concepts. It consists mainly of high gradual drift introduced by the smooth rotation of a hyperplane
over time. As with the real-world data streams, the Extended OAML – Basic approach produces
better results than Ensemble methods with DWM performing better than AddExp. An interesting
phenomenon observed in (Figure 15) is the speedy recovery of the AddExp technique after
experiencing a concept drift. Although AddExp and DWM have similar approaches for penalising
bad experts, AddExp strongly favours new experts in ensemble voting, hence we can attribute this
phenomenon to the addition of a new expert with a strong prediction accuracy. The Leverage
Bagging algorithm performs poorly here in comparison to the other techniques.

Figure 15: Prequential evaluation (Accuracy) for Hyperplane data stream

The SEA Mixed data stream (Figure 16) comprises both abrupt and gradual drift patterns. It
simulates a real-world dynamic environment where different kinds of drift persist. ChaCha, Extended
OAML – Basic, and Extended OAML – Ensemble (DWM) all perform very well and similarly here.
The sudden increase in performance of the original OAML – Ensemble is indicative of the addition
of a good expert during adaptation. Leverage bagging produces a flat curve and is unable to improve
on this data stream. For this data stream, we observe that our Extended OAML system out-performs
the original OAML system. SEA Abrupt (Figure 17) simulates a real-world high drift scenario by
alternating between several classification functions. The results are similar to SEA Mixed, however
the original OAML techniques offer competitive results here. The increase in performance of the
Ensemble techniques of the original OAML over the Extended version on this data stream could be
due to the equal voting technique it employs. It would be interesting to compare voting strategies
and how they affect the performance of online automatic machine learning; this is discussed further
in Section 5.4.

Figure 16: Prequential evaluation (Accuracy) for SEA mixed data stream

Figure 17: Prequential evaluation (Accuracy) for SEA Abrupt data stream

We have thoroughly investigated the performance of our Extended OAML system on
benchmark datasets for regression and classification problems against state-of-the-art algorithms in
the field. Overall, the results show that the Extended OAML system offers a significant improvement
in performance when compared with the original OAML system. Furthermore, this new system can
produce competitive results that are often better than the baseline algorithms.

5.3 Pipeline Analysis

The pipelines constructed by the Extended OAML system are discussed in this section. For the
Ensemble method, we examine whether the active online model is the Ensemble algorithm, or a
single pipeline built by the OAML Search. The system logically updates the active online model
between these two predictors (see Algorithm 2 for more details) depending on their prequential
performance. Figure 18 shows model updates for the Friedman and Sulfur data streams. Before an
ensemble is constructed, the system begins with the single best-found pipeline from the search
phase, hence all lines start as orange.
(a) (b)

Figure 18: Online model update for Extended OAML – Ensemble
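The decision rule behind these switches is simple: whichever predictor currently has the better prequential score on the sliding window becomes the active online model. The function below is a simplified sketch of this idea (it is not the exact implementation of Algorithm 2), assuming the two scores have already been computed over the current window.

def select_active_model(pipeline, ensemble, pipeline_score, ensemble_score, lower_is_better=True):
    """Return whichever predictor currently has the better prequential score."""
    if lower_is_better:                    # e.g. prequential RMSE for regression
        return pipeline if pipeline_score <= ensemble_score else ensemble
    return pipeline if pipeline_score >= ensemble_score else ensemble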

For the Friedman data stream, the active online model frequently switches from the Ensemble to the
AutoML optimizer. This change can be attributed to the incremental and gradual drift concepts that
persist in this data stream. After about 1,200 samples we observe a recovery from concept drift by
the newly activated Ensemble. The Sulfur data stream activates the ensemble algorithm quickly
and then goes through a short period of alternating between both models. The frequent changes are
caused by the presence of high drift in the given window, triggering the drift adaptation mechanism.
Next, we analyse the actual pipelines generated by the Basic method. Figure 19 shows the
model updates made for the Catalyst Activation stream and the Debutaniser stream. Drift points are
represented by pink markers, while different models are represented by dotted lines of different
colours. Gradual change of concept in the Catalyst Activation process triggers the adaptation of the
OAML system. The model starts out with a Hoeffding Adaptive Tree Regressor and is replaced by
an Adaptive Random Forest Regressor after the first trigger point is reached. Note that
retraining here was triggered by the alternative detector (3.4.1) and not because of concept drift.
When the drift detector is triggered, the Leverage Bagging Regressor replaces the Adaptive Random
Forest Regressor, and remains active until the end of the stream. In the Debutaniser process, Figure
19 – (b), the Leverage Bagging Regressor replaces the Hoeffding Adaptive Tree Regressor initially,
however after a sudden change of concept the drift detector triggers the OAML search for new
pipelines to handle this change. Shortly afterwards another trigger point is reached, and the stream
ends with an Adaptive Random Forest Regressor.

(a) (b)

Figure 19: Online model update for Extended OAML – Basic

It is worth noting that, during the retraining process, Extended OAML may select a new pipeline with the
current active online algorithm (but with different hyperparameters). In the case where a drift is
detected and the current online model remains active, it indicates that the newly constructed pipeline
performed worse on the current sliding window and thus the active online model remains unchanged.

5.4 Experiments on Ensemble Adaptation

In this section we focus on investigating the effects of various parameters on the performance of
Extended OAML – Ensemble. We start by varying the size of the ensemble, then we compare various
expert replacement strategies in the ensemble, finally we compare three popular voting strategies
and report the results. Experiments here are focused only on regression tasks, however, future work
should be done to investigate the effect of these properties on classification problems. We run tests
here with one real-world data stream and one synthetic data stream. Further, for simplicity, experiments were
performed with a sliding window size of 200 samples and a search budget of 40 seconds.

5.4.1 Ensemble size

The experiments done in Section 5.1 and Section 5.2 used a fixed ensemble size of 10 experts. Here
we seek to find if increasing or reducing the number of experts in the ensemble offers any significant
improvement or reduction in performance. Figure 20-(a) shows the prequential performance on the
Catalyst Activation data for an Ensemble of 20, 10 and 3 experts. Although the ensemble behaves
similarly in all cases, the results show that prequential performance improves as the number of experts
increases. This relationship is further demonstrated in Figure 20-(b) on the
Friedman data stream. This behaviour is expected and shows that in general, the search phase of
Extended OAML adds good experts to the ensemble. If the reverse case were true, then we would
expect a decrease in prequential performance as the ensemble size increases.
Although a simple experimental setup was used here, the results from an ensemble of
3 experts are comparable to those of an ensemble of 20 experts; hence the former is suitable for applications
with limited computational resources.

(a) (b)

Figure 20: Effect of Ensemble size: (a) Catalyst Activation data (b) Friedman data

5.4.2 Expert Replacement

One of the ways in which our work differs from the original OAML system is the method used for
expert replacement. Extended OAML – Ensemble uses a worst expert replacement method as
opposed to the oldest expert replacement technique used in OAML – Ensemble (discussed in Section
3.4.1). Thus, we investigate the effects of this using an ensemble size of 10 experts.
By using the worst expert replacement approach, a lower RMSE score on the Catalyst
Activation data stream (Figure 21-(a)) is achieved in comparison to the oldest expert replacement
approach. The ensemble also recovers faster from concept drift as a result. Figure 21-(b) shows a
more interesting pattern: both replacement techniques have similar initial behaviour, and the worst expert
replacement technique handles the gradual drift slightly worse than the oldest expert replacement
technique; however, the former manages abrupt drift much better and recovers quickly from it.
Both replacement techniques are capable of handling different kinds of concept drift,
however, the worst expert replacement approach appears to be superior in abrupt drift environments.
(a) (b)

Figure 21: Effect of Expert replacement: (a) Catalyst Activation data (b) Friedman data
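The two replacement policies compared above differ only in which ensemble member is evicted when a newly found pipeline is added to a full ensemble. The functions below are a simplified sketch of this difference (not the exact implementation), assuming the ensemble is held as a plain list of experts with an associated list of performance weights.

def replace_oldest(experts, weights, new_expert, init_weight=1.0):
    """Oldest-expert replacement (original OAML): evict the member that was added first."""
    experts.pop(0)
    weights.pop(0)
    experts.append(new_expert)
    weights.append(init_weight)

def replace_worst(experts, weights, new_expert, init_weight=1.0):
    """Worst-expert replacement (Extended OAML): evict the member with the lowest weight."""
    worst = min(range(len(weights)), key=weights.__getitem__)
    experts.pop(worst)
    weights.pop(worst)
    experts.append(new_expert)
    weights.append(init_weight)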

5.4.3 Weight Combination Mechanism

Another significant contribution of this work is the implementation of multiple weight combination
approaches. We extend the unweighted voting approach used in OAML with Additive Expert and
Dynamic Weighted Majority voting. In this section, we compare Additive Experts and equal voting
for regression data streams (since DWM is not suitable for regression). We run the experiments with
the same setup described in Section 5.4; however, since the goal here is to find the weight
combination approach that produces the best results, we modify the online model selection
algorithm to always select the ensemble for prediction. On the regression
benchmarks, AddExp and equal voting behave similarly. While unweighted voting performs better
on the Catalyst Activation data (Figure 22-(a)), the AddExp technique gives better results for the
synthetic Friedman data stream. This suggests that both methods are equally effective for handling
concept drift, hence the equal voting approach is favourable due to its simplicity.
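To make the distinction concrete, the listing below is a simplified sketch of the two combination rules for a regression ensemble, loosely following the continuous Additive Expert update of Kolter and Maloof (2005). Expert addition and pruning are omitted, and predictions is assumed to be the list of per-expert outputs for the current observation.

import numpy as np

def weighted_prediction(predictions, weights):
    """AddExp-style output: weight-normalised average of the expert predictions."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, predictions) / w.sum())

def addexp_weight_update(predictions, weights, y_true, beta=0.5):
    """Multiplicative decay: each expert is penalised in proportion to its absolute error."""
    return [w * beta ** abs(p - y_true) for p, w in zip(predictions, weights)]

def equal_vote_prediction(predictions):
    """Unweighted (equal) voting: a plain average of the expert predictions."""
    return float(np.mean(predictions))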

5.5 Effect of Search Algorithm

To evaluate the effectiveness of available search algorithms in re-optimizing pipelines after a drift
occurs, we run experiments with random search “RandomSearch()” and asynchronous evolutionary
algorithm “AsyncEA()” on regression data streams. The original OAML framework (Celik, et al.,
2022) reports the effect of these algorithms on classification streams. We maintain the same
parameters used in previous experiments and vary the search algorithm. In addition, we assume
that the effect of the search algorithm is the same for Extended OAML – Basic and Extended OAML
– Ensemble, hence we only run experiments for the simpler Basic method. Each search algorithm
was run multiple times, and the aggregate results are shown in Figure 23.

(a) (b)

Figure 22: Effect of weight combination (a) Catalyst Activation (b) Friedman

For the Catalyst Activation data stream, random search (red line) performs better on average
than the evolutionary algorithm. This could be because random search can quickly find good
pipelines when a drift occurs. The same pattern is seen in the Friedman stream, where various drift
types of different magnitudes are present. The random search adapts better for incremental drift
process (later stages of the stream). Despite the asynchronous implementation of the evolutionary
algorithm, the time budget of 40 seconds used in these experiments appears to be insufficient for the
evolutionary algorithm to produce better results than random search.
In general, both search algorithm implementations construct good pipelines and achieve
good results on the benchmark data streams.
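For reference, both search strategies are the ones exposed by the underlying GAMA library. The sketch below shows how a search method might be selected when constructing the AutoML optimiser; the argument names follow GAMA's public API but may differ slightly between versions, and the scoring string is an assumption rather than the exact metric name used in our scripts.

from gama import GamaRegressor
from gama.search_methods import AsyncEA, RandomSearch

automl = GamaRegressor(
    max_total_time=40,                      # search budget in seconds, as in these experiments
    scoring="neg_mean_squared_error",       # assumed sklearn-style metric name
    search=RandomSearch(),                  # swap for AsyncEA() to use the evolutionary search
)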

5.6 Effect of Online Pre-processing Algorithms

The final modification to the original OAML system is the inclusion of online pre-processing
algorithms. Thus, in this section we investigate the effect of this change on the system. To achieve
this, we maintain all other changes to the original OAML system, exclude only the online pre-
processing steps, and compare this variant with the full Extended OAML system. As expected, Figures
24 to 29 show that online automatic pre-processing algorithms improve the performance of AutoML systems
in online environments.
(a) (b)

Figure 23: Effect of Search Algorithm: (a) Catalyst Activation (b) Friedman

For Bank (Figure 24) and Catalyst Activation (Figure 27), the effect of the pre-processing step is not
minimal. On the Friedman stream (Figure 25), the system without pre-processing suffers more from
the incremental change in concept and recovers from it slowly. This improvement is pronounced
further in the 2D Planes (Figure 26) data stream. Both systems behave very similarly here in the
presence of concept drift; the difference in performance may be attributed to data encoding,
which helps improve the predictive performance of the model.

Figure 24: Effect of Online Pre-processing on Bank data stream



Figure 25: Effect of Online Pre-processing on Friedman data stream

Figure 26: Effect of Online Pre-processing on 2D Planes data stream

Figure 27: Effect of Online Pre-processing on Catalyst Activation data stream

Finally, for the Debutaniser process (Figure 28) and the Sulfur process (Figure 29), the
Extended OAML system with pre-processing initially performs better than the Extended OAML
system without pre-processing. However, when a sudden drift occurs, the difference in performance
increases and the system with online pre-processing recovers faster from the sudden drift. Overall,
the addition of pre-processing to the OAML system offers improvements in predictive performance
as well as in handling different kinds of drift patterns.

Figure 28: Effect of Online Pre-processing on Debutaniser data stream

Figure 29: Effect of Online Pre-processing on Sulfur data stream

In this chapter, we thoroughly evaluated the performance of our OAML system under varying drift
conditions and reported results which show that it produces competitive results
across all benchmarks.

6 DISCUSSION

The experiments performed indicate that automated machine learning can be applied to regression
problems in different kinds of drifting environments. Although the performance of the AutoML system
may suffer when concept drift occurs, the inclusion of an explicit drift detection system and sound adaptation
mechanisms helps it recover quickly. In most cases, the proposed system
outperforms competitive stream learning algorithms.
For classification problems, our system consistently improves on the original OAML system
across all benchmark datasets. From our experiments, this improvement can be attributed to the
online-pre-processing algorithms used. Although the original OAML system includes algorithms for
pre-processing, the pipelines constructed by it did not include them. Extended OAML constructs
pipelines of automatic pre-processing algorithms and includes pre-processing steps to deal with null
values and encode categorical variables.
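As an illustration of the kind of pipeline this refers to, the sketch below composes river's online imputation, encoding, and scaling steps in front of a placeholder regressor. The feature names are hypothetical; in Extended OAML such pipelines are assembled automatically during the search phase rather than written by hand.

from river import compose, linear_model, preprocessing, stats

# numeric features: impute missing values online, then scale incrementally
num = (
    compose.Select("flow", "temperature")
    | preprocessing.StatImputer(("flow", stats.Mean()), ("temperature", stats.Mean()))
    | preprocessing.StandardScaler()
)
# categorical features: one-hot encode on the fly
cat = compose.Select("valve") | preprocessing.OneHotEncoder()

pipeline = (num + cat) | linear_model.LinearRegression()

for x, y in [
    ({"flow": 4.1, "temperature": 78.0, "valve": "open"}, 3.2),
    ({"flow": None, "temperature": 80.5, "valve": "closed"}, 2.9),   # missing flow is imputed
]:
    pipeline.predict_one(x)
    pipeline.learn_one(x, y)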
Both methods for learning with Extended OAML perform quite well on the benchmark
datasets. The Ensemble approach also outperforms the version used in the original OAML system.
Our results indicate that using worst expert replacement when the ensemble becomes full
produces better results than the oldest expert replacement technique used in the original system.
Examination of the various weight re-combination strategies shows that equal voting produces similar
and, in some cases, better results than AddExp for regression problems, while for classification
streams DWM performs better than AddExp. Increasing the ensemble size may offer improvements
to the system, however, the results of doing this are marginal compared to a simpler system.
The Basic Extended OAML approach performed better in general than the Ensemble
approach. This is contrary to the behaviour of the original OAML system (Celik, et al., 2022). This
change in behaviour can be attributed to online pre-processing and weight re-combination strategies.
While online pre-processing improves both systems, AddExp and DWM may sometimes perform
worse than equal voting.
Finally, although both the random search and the evolutionary approach find good pipelines, the pipelines
produced by the random search algorithm in our experiments performed better than those of the evolutionary
algorithm. This could be because of the limited time budget (40 seconds) used in all experiments.

7 CRITICAL EVALUATION

The objectives of this dissertation were achieved by extending the capabilities of the OAML framework
with automated online pre-processing algorithms, multiple mechanisms for online
adaptation, and an end-to-end automatic online pipeline for regression tasks.

Objective 1: To design a new search space of online pre-processors and regressors in the OAML
system.
o We designed a new search space of online pre-processors and online regression
algorithms. Further, we adapted the search algorithm for regression tasks to enable it to find
and construct good machine learning pipelines. (Section 3.2)

Objective 2: To implement new drift detection algorithms for detecting concept drift in regression
o We implemented ADWIN in our new system. From literature and our experiments,
ADWIN offers the best drift detection for regression tasks. ADWIN was implemented in
our system using scikit-multiflow. (Section 3.4)

Objective 3: To implement the Additive Expert (AddExp.) method and Dynamic Weighted Majority
(DWM) method for backup ensemble adaptation in the presence of concept drift
o We implemented AddExp Continuous (regression) and AddExp Discrete (classification)
and DWM (classification) for backup ensembles. Further, we implemented new model
replacement techniques for Ensembles. (Section 3.4.1)

Objective 4: To evaluate the performance of the extended OAML framework for regression and
classification tasks and compare against adaptive learners and the original OAML framework
o We evaluated our OAML systems on benchmark real-world and synthetic data streams
for regression and classification tasks and compared the results against the original
OAML as a baseline and other state-of-the-art techniques in the field. (Section 5)

Further, the results of this dissertation have provided answers to the following research questions.
1. Research Question 1: How can online AutoML techniques be designed to address
regression problems? (Section 3)
2. Research Question 2: How does the adaptation strategy for ensemble methods on online
AutoML influence the predictive performance? (Section 5.4)

8 CONCLUSION

We developed a new system that applies existing automated machine learning techniques to
changing environments for the two main supervised machine learning tasks, classification, and
regression. This was done by extending OAML for regression environments. The new system
automatically constructs full machine learning pipelines of online pre-processors and predictors.
We include an explicit ADWIN drift detector for regression tasks, which allows the system to detect a
change in concept and automatically re-optimize its pipeline to adapt. Our
system includes Dynamic Weighted Majority and Additive Expert techniques for adapting to different
kinds of drift. Furthermore, we substitute the oldest expert replacement with the worst expert
replacement technique in Ensemble methods.
The performance of our system is evaluated in a prequential manner on real-world and
synthetic data streams for regression and classification tasks. Our results show that our proposed
system offers competitive results and is a significant improvement on the original OAML system. We
explore which components of the new system are responsible for these changes and find that online
pre-processing and the ensemble replacement technique contribute the most. The results also show
that the Extended OAML system can produce competitive results in environments with limited time and
memory.
Overall, the results indicate that this work has the potential to advance AutoML research.

9 FUTURE WORK

Given the trend in big data and automation of machine learning tasks, future work should be done
to develop online AutoML systems that include automatic feature selection and automatic feature
engineering. Our system can serve as a baseline and be further extended to include new
transformation and prediction steps.
Furthermore, our results indicate that ChaCha is capable of tuning hyper-parameters
optimally, thus future work should be done to incorporate this feature into our OAML system.

REFERENCES

Akujuobi, U. and Zhang, X., 2017. Delve: A Dataset-Driven Scholarly Search and Analysis
System. SIGKDD Explor. Newsl., 19 (2), 36-46.

Bifet, A. and Gavaldà, R., 2007. Learning from Time-Changing Data with Adaptive Windowing.

Baier, L., Hofmann, M., Kühl, N., Mohr, M. and Satzger, G., 2020. Handling Concept Drifts in
Regression Problems - the Error Intersection Approach, Wirtschaftsinformatik.

Baker, B., Gupta, O., Naik, N. and Raskar, R., 2017. Designing Neural Network Architectures
using Reinforcement Learning. ArXiv, abs/1611.02167.

Bakirov, R., 2017. Multiple adaptive mechanisms for predictive models on streaming
data [http://eprints.bournemouth.ac.uk/29443/]. Doctorate, Bournemouth University.

Bakirov, R., Fay, D. and Gabrys, B., 2021. Automated adaptation strategies for stream
learning. Machine Learning, 110 (6), 1429-1462.

Bakirov, R. and Gabrys, B., 2013. Investigation of Expert Addition Criteria for Dynamically
Changing Online Ensemble Classifiers with Multiple Adaptive Mechanisms, 9th Artificial
Intelligence Applications and Innovations (AIAI) (Vol. AICT-412, pp. 646-656). Paphos,
Greece: Springer.

Bates, J. M. and Granger, C. W. J., 1969. The Combination of Forecasts. Journal of the
Operational Research Society, 20, 451-468.

Bifet, A. and Gavaldà, R., 2009. Adaptive Learning from Evolving Data Streams (pp. 249-260).
Berlin, Heidelberg: Springer Berlin Heidelberg.

Bifet, A., Holmes, G., Pfahringer, B., Read, J., Kranen, P., Kremer, H., Jansen, T. and Seidl, T.,
2011. MOA: A Real-Time Analytics Open Source Framework (pp. 617-620). Berlin,
Heidelberg: Springer Berlin Heidelberg.

Breiman, L., 1996. Bagging Predictors. Machine Learning, 24 (2), 123-140.

Celik, B., Singh, P. and Vanschoren, J., 2022. Online AutoML: An adaptive AutoML framework for
online learning. ArXiv, abs/2201.09750.

Celik, B. and Vanschoren, J., 2021. Adaptation Strategies for Automated Machine Learning on
Evolving Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3067-
3078.

Chen, B., Zhao, X., Wang, Y., Fan, W., Guo, H. and Tang, R., 2022. Automated Machine Learning
for Deep Recommender Systems: A Survey. ArXiv,abs/2204.01390.

Codalab, 2018. AutoML3:: AutoML for Lifelong Machine Learning [online]. Available from:
https://competitions.codalab.org/competitions/19836 [Accessed 08/05/2022].

Dawid, P., 1984. Present Position and Potential Developments: Some Personal Views: Statistical
Theory: The Prequential Approach. Journal of the Royal Statistical Society. Series
A (General), 147 (2), 278-292.
Dietterich, T. G., 2000. Ensemble Methods in Machine Learning (pp. 1-15). Berlin, Heidelberg:
Springer Berlin Heidelberg.

Feurer, M., Klein, A., Eggensperger, K., Springenberg, J. T., Blum, M. and Hutter, F., 2015. Efficient
and Robust Automated Machine Learning. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M.
and Garnett, R., eds. Advances in Neural Information Processing Systems 28. Available from:
https://proceedings.neurips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf

Fortuna, L., Graziani, S. and Xibilia, M. G., 2005. Soft sensors for product quality monitoring in
debutanizer distillation columns. Control Engineering Practice, 13, 499-508.

Gama, J., Medas, P., Castillo, G. and Rodrigues, P., 2004. Learning with Drift Detection (pp. 286-
295). Berlin, Heidelberg: Springer Berlin Heidelberg.

Gama, J., Sebastião, R. and Rodrigues, P. P., 2009. Issues in evaluation of stream learning
algorithms. Proceedings of the 15th ACM SIGKDD international conference on Knowledge
discovery and data mining, Paris, France. Association for Computing Machinery. 329–338.
Available from: https://doi.org/10.1145/1557019.1557060 [Accessed

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M. and Bouchachia, A., 2014. A survey on concept
drift adaptation. ACM Comput. Surv., 46 (4), Article 44.

Gijsbers, P. J. A. and Vanschoren, J., 2020. GAMA: a General Automated Machine learning
Assistant. ArXiv, abs/2007.04911.

Gomes, H. M., Barddal, J. P., Ferreira, L. E. B. and Bifet, A., 2018. Adaptive random forests for
data stream regression, ESANN.

Harries, M., 1999. SPLICE-2 Comparative Evaluation: Electricity Pricing. Technical report,
The University of New South Wales.

Hazan, E. and Seshadhri, C., 2009. Efficient learning algorithms for changing
environments. Proceedings of the 26th Annual International Conference on Machine
Learning, Montreal, Quebec, Canada. Association for Computing Machinery. 393–400.
Available from: https://doi.org/10.1145/1553374.1553425 [Accessed

Hulten, G., Spencer, L. and Domingos, P., 2001. Mining time-changing data streams. Proceedings
of the seventh ACM SIGKDD international conference on Knowledge discovery and data
mining, San Francisco, California. Association for Computing Machinery. 97–106.
Available from: https://doi.org/10.1145/502512.502529 [Accessed

Imbrea, A.-I., 2021. Automated Machine Learning Techniques for Data


Streams. CoRR, abs/2106.07317.

Kivinen, J. and Warmuth, M. K., 1997. Exponentiated Gradient versus Gradient Descent for
Linear Predictors. Information and Computation, 132 (1), 1-63.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.-Y., 2017. LightGBM:
A Highly Efficient Gradient Boosting Decision Tree. In: Guyon, I., von Luxburg, U., Bengio, S.,
Wallach, H., Fergus, R., Vishwanathan, S. and Garnett, R., eds. Advances in Neural Information
Processing Systems 30. Available from:
https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf

Kolter, J. Z. and Maloof, M. A., 2005. Using additive expert ensembles to cope with concept
drift. Proceedings of the 22nd international conference on Machine learning, Bonn,
Germany. Association for Computing Machinery. 449–456. Available from:
https://doi.org/10.1145/1102351.1102408 [Accessed

Kolter, J. Z. and Maloof, M. A., 2007. Dynamic Weighted Majority: An Ensemble Method for
Drifting Concepts. J. Mach. Learn. Res., 8, 2755–2790.

Kuncheva, L. I., 2004. Combining pattern classifiers : methods and algorithms. Hoboken, NJ: J.
Wiley.

LeDell, E. and Poirier, S., H2O AutoML: Scalable Automatic Machine Learning. 7th ICML
Workshop on Automated Machine Learning (AutoML).

Fortuna, L., Rizzo, A., Sinatra, M. and Xibilia, M. G., 2003. Soft Analysers for a Sulphur Recovery Unit.
Control Engineering Practice, 11 (12), 1491-1500.

Madrid, J. G., Escalante, H. J., Morales, E. F., Tu, W.-W., Yu, Y., Sun-Hosoya, L., Guyon, I. and
Sebag, M., 2019. Towards AutoML in the presence of Drift: first
results. CoRR, abs/1907.10772.

Duarte, M. F. and Hu, Y. H., 2004. Vehicle classification in distributed sensor networks. Journal of Parallel
and Distributed Computing, 64 (7), 826-838.

Montiel, J., Halford, M., Mastelini, S. M., Bolmier, G., Sourty, R., Vaysse, R., Zouitine, A., Gomes,
H. M., Read, J., Abdessalem, T. and Bifet, A., 2021. River: machine learning for streaming
data in Python. J. Mach. Learn. Res., 22, 110:111-110:118.

Muñoz, M. A., Villanova, L., Baatar, D. and Smith-Miles, K., 2018. Instance spaces for machine
learning classification. Machine Learning, 107 (1), 109-147.

Olson, R. S., Bartley, N., Urbanowicz, R. J. and Moore, J. H., 2016. Evaluation of a Tree-based
Pipeline Optimization Tool for Automating Data Science. Proceedings of the Genetic and
Evolutionary Computation Conference 2016.

Oza, N. C. and Russell, S. J., 2005. Online bagging and boosting. 2005 IEEE International
Conference on Systems, Man and Cybernetics, 3, 2340-2345 Vol. 2343.

Page, E. S., 1954. Continuous Inspection Schemes. Biometrika, 41 (1/2), 100-115.

Kadlec, P., Grbić, R. and Gabrys, B., 2011. Review of adaptation mechanisms for data-
driven soft sensors. Computers & Chemical Engineering, 35 (1), 1-24.

Pozzolo, A. D., Boracchi, G., Caelen, O., Alippi, C. and Bontempi, G., 2015a. Credit card fraud
detection and concept-drift adaptation with delayed supervised information. 2015
International Joint Conference on Neural Networks (IJCNN), 1-8.

Pozzolo, A. D., Boracchi, G., Caelen, O., Alippi, C. and Bontempi, G., 2015b. Credit card fraud
detection and concept-drift adaptation with delayed supervised information. 2015
International Joint Conference on Neural Networks (IJCNN), 1-8.

Ruta, D. and Gabrys, B., An Overview of Classifier Fusion Methods. Computing and Information
Systems, 7 (1), 1-10.

Schlimmer, J. C. and Granger, R. H., 1986. Beyond incremental processing: tracking concept
drift. Proceedings of the Fifth AAAI National Conference on Artificial Intelligence,
Philadelphia, Pennsylvania. AAAI Press. 502–507.
Scott, E. O. and De Jong, K. A., Understanding Simple Asynchronous Evolutionary Algorithms
(pp. 85-98): Association for Computing Machinery.

Shawi, R. E., Maher, M. and Sakr, S., 2019. Automated Machine Learning: State-of-The-Art and
Open Challenges. CoRR, abs/1906.02287.

Silver, D. L., Yang, Q. and Li, L., 2013. Lifelong machine learning systems: Beyond learning
algorithms. in AAAI Spring Symposium Series, 2013.

Soares, S. G. and Araújo, R., 2015. A dynamic and on-line ensemble regression for changing
environments. Expert Syst. Appl., 42 (6), 2935–2948.

Sobhani, P. and Beigy, H., 2011. New Drift Detection Method for Data Streams (pp. 88-97). Berlin,
Heidelberg: Springer Berlin Heidelberg.

Street, W. N. and Kim, Y., 2001. A streaming ensemble algorithm (SEA) for large-scale
classification. Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining, San Francisco, California. Association for
Computing Machinery. 377–382. Available from: https://doi.org/10.1145/502512.502568
[Accessed

Soares, S. G. and Araújo, R., 2015. An on-line weighted ensemble of regressor models to handle concept
drifts. Engineering Applications of Artificial Intelligence, 37, 392-406.

Thompson, W. R., 1933. On the Likelihood that One Unknown Probability Exceeds Another in
View of the Evidence of Two Samples. Biometrika, 25 (3/4), 285-294.

Thornton, C., Hutter, F., Hoos, H. H. and Leyton-Brown, K., 2013. Auto-WEKA: combined
selection and hyperparameter optimization of classification algorithms. Proceedings of the
19th ACM SIGKDD international conference on Knowledge discovery and data mining,
Chicago, Illinois, USA. Association for Computing Machinery. 847–855. Available from:
https://doi.org/10.1145/2487575.2487629 [Accessed

Tsymbal, A., Pechenizkiy, M., Cunningham, P. and Puuronen, S., 2008. Dynamic integration of
classifiers for handling concept drift. Inf. Fusion, 9 (1), 56–68.

Žliobaitė, I., 2010. Learning under Concept Drift: an Overview.

Wang, C., Wu, Q., Weimer, M. and Zhu, E., 2019. FLAML: A Fast and Lightweight AutoML
Library.

Shao, W. and Tian, X., 2015. Adaptive soft sensor for quality prediction of chemical
processes based on selective ensemble of local partial least squares models. Chemical
Engineering Research and Design, 95, 113-132.

Wilson, J., Meher, A. K., Bindu, B. V., Chaudhury, S., Lall, B., Sharma, M. and Pareek, V., 2020a.
Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High
Cardinality Data Streams Under Concept Drift (pp. 317-335). Cham: Springer International
Publishing.

Wilson, J., Meher, A. K., Bindu, B. V., Chaudhury, S., Lall, B., Sharma, M. and Pareek, V., 2020b.
Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High
Cardinality Data Streams Under Concept Drift (pp. 317-335). Cham: Springer International
Publishing.
Wu, Q., Wang, C., Langford, J., Mineiro, P. and Rossi, M., 2021. ChaCha for Online AutoML. 2021 International
Conference on Machine Learning (ICML 2021), July. Available from:
https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/

Zoph, B. and Le, Q. V., 2017. Neural Architecture Search with Reinforcement
Learning. ArXiv, abs/1611.01578.

APPENDIX A

Installation and Setup

The necessary files for installation are provided in the folder “Extended_OAML.zip”. Unzip the
folder and perform the following steps. The unzipped folder consists of the following
o gama
o river
1. We recommend creating a python virtual environment before installation
2. To install gama, change directory to the gama folder and run the following
pip install -r requirements.txt
3. To install river, change directory to the river folder and run
python setup.py install
4. It is important that you install river and gama according to steps 2 and 3 above, both to prevent
version errors and because we have modified these libraries for our system

APPENDIX B

Navigating the File System

After installation and setup, the file tree of the OAML system resembles the structure below.

── gama
│ ├── ci_scripts
│ ├── data
│ ├── data_streams
│ ├── docs
│ ├── examples
│ ├── gama
│ ├── gama.egg-info
│ ├── oaml_paper
│ ├── tests
│ ├── wandb
│ ├── Extended_basic.py
│ ├── Extended_ensemble.py
│ ├── LICENSE
│ ├── OAML-UserGuide
│ ├── OAML_baseline_LB.py
│ ├── README.md
│ ├── code_of_conduct.md
│ ├── codecov.yml
│ ├── mypy.ini
│ └── setup.py

We focus only on the relevant files and folders for this dissertation.
- data_streams (folder): contains all data sets and python scripts for connecting them to the
OAML system
- gama (folder): contains scripts, files, folders for configuring the AutoML search phase. This
is the actual Gama AutoML library
- oaml_paper (folder): contains relevant files from the original OAML system
- Extended_basic.py (script): Python script for running the basic Extended OAML
- Extended_ensemble.py (script): Python script for running the ensemble Extended OAML
- OAML_baseline_LB.py (script): Python script for testing baseline algorithms

Original contributions of this work can be seen in gama/gama, Extended_basic.py and
Extended_ensemble.py

APPENDIX C

User Guide

Extended OAML has been developed to be easily executed from the command line (terminal). Users
are not required to interact with the Python scripts directly. The Basic and Ensemble methods of Extended OAML
are run with the same command line arguments (parameters). The requirements to install our
framework are listed in Appendix D.

After installation, change directory to the “gama” folder


~ cd gama

A sample run for the Extended OAML basic can be run as follows:
python Extended_basic.py 'data_streams/debutaniser.arff' 500 100 rmse rmse 80
random False

#0 Python Script
- This could either be Extended_basic.py or Extended_ensemble.py. It determines
which Extended OAML approach to run

#1 Data Stream (str)


- Path to the data stream for stream learning. Data streams must be .arff files. If you would
like to include a new data stream, add it to the gama/data_streams folder
and include its path in gama/data_stream/data_utils.py.

#2 Batch size (int)


- Initial number of data instances used in OAML search

#3 Sliding window size (int)


- Number of data instances used in re-training OAML after a trigger point

#4 Gama evaluation metric (str)


- Evaluation metric used in optimizing the search phase
[acc, f1, roc_auc, b_acc] for classification
[rmse, mse] for regression
- If the metric does not match the task (e.g., mse should only be used for regression
tasks), the OAML system throws an error

#5 River evaluation metric (str)


- Evaluation metric for the online learning algorithm
[acc, f1, roc_auc, b_acc] for classification
[rmse, mse] for regression
- If the metric does not match the task (e.g., mse should only be used for regression
tasks), the OAML system throws an error

#6 Time budget (int)


- Time budget (in seconds) for the search algorithm

#7 Search Algorithm (str)


- Optimisation algorithm used in the pipeline search
[random, evol]

#8 Live-Plot (Boolean)
- Whether to create a live plot with Weights & Biases (wandb): [True, False]
- If True:
o Registration required on https://wandb.ai/site
o Set entity to your wandb username
o Set project name as desired
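For completeness, a classification run of the Ensemble method with the default experimental settings might look as follows; the electricity file name is an assumption and should be replaced with whichever .arff file is present in gama/data_streams.

python Extended_ensemble.py 'data_streams/electricity.arff' 5000 2000 acc acc 60 evol False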

After proper installation, the system should run as in the sample invocations above.

APPENDIX D

Machine Setup

All experiments were run locally. The specifications of the local machine used to run experiments
are given below:
1. Local Machine: MacBook Pro 2021
2. CPU: Apple M1
3. GPU: Apple M1
