Enhancing Stock Price Prediction Models by using Concept Drift Detectors

Charlton Sammut

Supervisor(s): Dr. Charlie Abela and Dr. Vince Vella

Faculty of ICT
University of Malta

April 2021

Submitted in partial fulfillment of the requirements for the degree of B.Sc. ICT in
Artificial Intelligence (Hons.)
Faculty of Information and Communication Technology
FACULTY/INSTITUTE/CENTRE/SCHOOL: Faculty of Information and Communication Technology

DECLARATIONS BY UNDERGRADUATE STUDENTS

Student's I.D. /Code: 0490200L
Student's Name & Surname: Charlton Sammut
Course: Bachelor of Science in Information Technology (Honours) (Artificial Intelligence)
Title of Long Essay/Dissertation: Enhancing Stock Price Prediction Models by using Concept Drift Detectors
Word Count: 9500

(a) Authenticity of Long Essay/Dissertation

I hereby declare that I am the legitimate author of this Long Essay/Dissertation and that it is my
original work.

No portion of this work has been submitted in support of an application for another degree or
qualification of this or any other university or institution of higher education.

I hold the University of Malta harmless against any third party claims with regard to copyright
violation, breach of confidentiality, defamation and any other third party right infringement.

(b) Research Code of Practice and Ethics Review Procedures

I declare that I have abided by the University’s Research Ethics Review Procedures.

Signature of Student: ______________________
Name of Student (in Caps): CHARLTON SAMMUT
Date: 28/06/2021

Abstract:

Due to recent advances in the field of machine learning, a considerable amount of research has been conducted on applying machine learning models to the stock market. As the efficient market hypothesis suggests, new information is rapidly absorbed into asset prices; the market is thus constantly fluctuating and, due to its dynamic nature, certain underlying concepts start to change over time. This phenomenon is known as concept drift. When concept drift occurs, the performance of machine learning models tends to suffer, sometimes drastically. This decline in performance occurs because the data distributions that were used to train the model are no longer in-line with the current data distribution.

This dissertation contributes four retraining methods that help mitigate the problem of concept drift. A state-of-the-art Adv-ALSTM model is used together with a HDDMA concept drift detector. Every time the HDDMA detector detects a concept drift, the model undergoes one of the four retraining methods.

In the evaluation, the results of the vanilla model are compared to the results of the
models that are fitted with a concept drift detector. The conducted experiments highlight
the effectiveness of each of the proposed retraining methods, as well as how each of the
methods mitigates the negative effects of concept drift in different ways. The best observed
results were a 2.5% increase in accuracy and a 135.38% increase in MCC when compared
to the vanilla model. These results validate the effectiveness of the proposed retraining
methods, and highlight how important it is for a machine learning model to address concept
drift.
Acknowledgements:

I would like to thank and express my gratitude to my supervisors, Dr. Charlie Abela and Dr. Vincent Vella, who guided me throughout my work and shared their knowledge and expertise in their respective fields. This dissertation would not have been possible without them. I would also like to thank my family and Petra for their continuous support throughout my academic journey thus far. Lastly, I would like to thank my close friends, along with the JEF Executive Board, for making my University experience a memorable one.
Contents

1 Introduction
  1.1 Problem Definition
  1.2 Motivation
  1.3 Aims and Objectives
  1.4 Summary of Findings
  1.5 Document Structure

2 Background and Literature Review
  2.1 The Stock Market
  2.2 Concept Drift
  2.3 Handling Concept Drift
    2.3.1 Concept Drift Frameworks
    2.3.2 Concept Drift Detectors
    2.3.3 Error rate-based drift detectors
  2.4 Machine Learning for Stock Price Prediction

3 Methodology
  3.1 Replicating the machine learning model
  3.2 Selected dataset
  3.3 Selecting the concept drift detector
  3.4 Attaching the concept drift detector to the model
    3.4.1 Method 1: Handling recurring distributions
    3.4.2 Method 2: Periodic training every n days
    3.4.3 Method 3: Forgetting irrelevant data
    3.4.4 Method 4: Training on similar distributions
    3.4.5 Evaluation Plan

4 Evaluation and Results
  4.1 O1 - Selecting and replicating a machine learning model
    4.1.1 Experiment 1 - Replicating the chosen Adv-ALSTM model
  4.2 O2: Selecting the concept drift detector
    4.2.1 Experiment 2 - Testing the concept drift detectors on the datasets
    4.2.2 Experiment 3 - Testing the concept drift detectors for type II errors
    4.2.3 Experiment 4 - Finding the most optimal parameters for HDDMA
  4.3 O3: Attaching the concept drift detector to the model
    4.3.1 Experiment 5 - Evaluation and results of Method 1
    4.3.2 Experiment 6 - Evaluation and results of Method 2
    4.3.3 Experiment 7 - Evaluation and results of Method 3
    4.3.4 Experiment 8 - Evaluation and results of Method 4
    4.3.5 Summary of the generated results

5 Conclusions
  5.1 Revisiting aims and objectives
  5.2 Future Work
  5.3 Final Remarks

A: Results Generated

List of Figures

1 Types of concept drift
2 Flowcharts of the four proposed retraining methods
3 HDDMA tested on the Google stock
4 Method 2 - ACC results
5 Method 2 - MCC results
6 Method 3 - ACC results
7 Method 3 - MCC results
8 Method 4 - ACC results
9 Method 4 - MCC results

List of Tables

1 Ten different events that caused a concept drift in the stock market
2 KDD17 Replicated Results (RR) compared to Original Results (OR)
3 StockNet Replicated Results (RR) compared to Original Results (OR)
4 The different detectors tested on both datasets
5 The different detectors tested for type II errors
6 HDDMA tested on the StockNet dataset
7 HDDMA tested on the KDD17 dataset
8 Method 1 results compared to vanilla model's results
9 Method 1 & 2 results compared to vanilla model's result
10 Method 3 results compared to vanilla model's result
11 Method 4 results compared to vanilla model's result
12 Best results generated by each method
13 Method 2 results
14 Method 3 results
15 Method 4 results

1 Introduction
Time series prediction is the process of forecasting future values from a temporally ordered sequence of data [1]. It is used in multiple real-world domains, including economic forecasting [2], sales forecasting [3] and network load forecasting [4]. This
dissertation aims to investigate time series prediction in the finance domain, specifically in
the stock market domain. This dissertation also tackles a particularly challenging problem
related to time series prediction, that of concept drift [1, 5, 6, 7].

1.1 Problem Definition


Applying time series prediction to the stock market is no trivial task. This is especially true if one assumes, as many experts do, that the efficient market hypothesis holds [8]. The efficient market hypothesis states that the current asset price reflects all available information. This implies that any new information that enters the market is immediately used, hence making it very difficult to predict future prices based on old prices. Due to the market's dynamic and fluctuating nature, certain underlying concepts start to change over time. This phenomenon is classified as concept drift [1, 5, 6, 7].
When concept drift occurs, machine learning models tend to give substantially less accurate results [1, 5, 6, 7]. This happens because the concepts of past training data are no longer in-line with the current data's concepts. This was clearly seen in the 1,000,000 model test, where 1,000,000 different Support Vector Machine (SVM) models were trained to forecast the market direction of ten different stocks [9]. All ten stocks are found in the Standard and Poor's 500 (S&P 500) index, a popular index that is considered by many investors and experts to be the best overall measurement of the current state of the American stock market [10]. The models were trained and tested on a number of factors, such as the closing price of the ten different stocks from November 1993 till December 2008 (15 years worth of data). The best models had an accuracy of over 78%. The researchers then proceeded to test their best models on the next two years of out-of-sample data (January 2009 - January 2011). The models' accuracy dropped to 51% [9]. This occurred because the concepts that were present in the training data were no longer in-line with the concepts present in the out-of-sample data.
The problem of concept drift in real-world domains is very challenging, as the concepts that are present can be hidden or unknown. These concepts tend to be susceptible to change, and the cause for this change can also be hidden or unknown [7]. Another difficulty that the problem of concept drift presents is that there are multiple types of concept drift, with each one presenting its own unique set of challenges (refer to Section 2.2). Furthermore, one needs to be able to differentiate between concept drift and noise, as it is very common for the two to be confused [6, 7, 11].

1.2 Motivation
A large number of machine learning techniques are used as analytical tools by traders in order to make predictions on different financial instruments [12]. A model with an accuracy of around 56% can be considered satisfactory [13]. However, an accuracy of 56% is not much better than a random guess (50% chance of being correct), and even then, such an accuracy is very difficult to achieve, let alone maintain [13, 14]. Furthermore, past research has shown that an investor needs an accuracy of at least 66% to make more profit than if they simply flipped a coin [15]. Due to these issues, investors do not rely solely on machine learning models. With this in mind, one of the main motivators behind this dissertation was to see how close we can get to the 66% accuracy threshold by using state-of-the-art technologies.
Another motivator behind this dissertation was to investigate how well state-of-the-art technologies handle the problem of concept drift, as it is a problem that negatively affects many machine learning models in different fields of interest.

1.3 Aims and Objectives


The main research question that this dissertation aims to answer is: “Can a machine
learning model that has been trained on a financial dataset get better results if it undergoes
a retraining process every time a concept drift has been detected?” This research question
will be answered by addressing the following three objectives:

• Objective 1 (O1): Replicate a state-of-the-art machine learning model that has been
previously applied on a dataset containing stock market data.

• Objective 2 (O2): Compare and contrast state-of-the-art concept drift detectors in


order to find the most suitable concept drift detector.

• Objective 3 (O3): Use the selected concept drift detector together with the selected
machine learning model and observe if together they give better results when com-
pared to the vanilla machine learning model.

In O1 a number of machine learning models that have previously given good results in
the field of stock movement prediction will be examined. We intend to replicate a machine
learning model that is both feasible to replicate and gives the best results on a variety
of stocks. It is going to be ensured that the model is replicated with a high degree of
precision. This will be done by comparing the results generated by the replicated model
with the results of the original model. In O2 we intend to test and evaluate various state-
of-the-art concept drift detectors based on a set of criteria, such as the number of concept
drifts detected on time and the number of concept drifts which were detected late. Then
in O3, the most suitable concept drift detector will be integrated with the model chosen
from O1. Each time the concept drift detector detects a concept drift, the model will
undergo a retraining process. During the evaluation, the results of the vanilla model will
be compared to the results of the model that was fitted with a concept drift detector. If
the model with the concept drift detector gives better results than the vanilla model, it
would indeed suggest that machine learning models that have been trained on a financial
dataset would benefit from having a concept drift detector attached to them.

1.4 Summary of Findings


From the research conducted, it was concluded that the best machine learning model to be used is an Adversarial Attentive Long Short-Term Memory (Adv-ALSTM) neural network [14]. As discussed in Experiment 1 (Section 4.1.1), the model was successfully replicated. In addition, as seen in Experiments 2 through 4 (Section 4.2), it was concluded that the most suitable concept drift detector is the Hoeffding's inequality based Drift Detection Method with moving Average-test (HDDMA) concept drift detector [16]. From additional experimentation, it was also observed that the larger the percentage of in-line data that the model has been trained on, the better the results obtained. In Experiment 5, the amount of data known by the model that was not in-line with the current distribution was increased. This led to a drop in performance. On the other hand, in Experiments 6-8 the percentage of data known by the model that was in-line with the current distribution was gradually increased. With each gradual increase, an increase in performance was observed. The highest Accuracy (ACC) score achieved was 54.18, an improvement of 2.50%, and the highest Matthews Correlation Coefficient (MCC) score achieved was 0.1224, an improvement of 135.38%.

1.5 Document Structure
The remainder of this dissertation is organised as follows. Section 2 discusses the stock market and explains different machine learning approaches that have previously been used for stock price prediction. It also defines concept drift and discusses a variety of past literature related to handling concept drift. Section 3 details how the experiments used to address O1, O2 and O3 are going to be conducted. Section 4 presents and discusses the results generated from these experiments. Lastly, Section 5 outlines what future work can be conducted, followed by a conclusion.

2 Background and Literature Review
2.1 The Stock Market
The stock market, also referred to as the equity market [12], is essentially an organized and regulated financial market where tradable financial assets, referred to as securities, can be purchased. Some examples of securities include stocks, bonds and shares [17]. The market has previously been described as a random walk model [18, 19]. If this were the case, then it would be practically impossible to make any sort of accurate prediction based on previous prices. It would also mean that the most accurate possible prediction for any future price is the current price. There is, however, a considerable amount of evidence to suggest that this is not the case [20].
It is known that the stock market experiences regime shifts, also referred to as regime drifts or regime changes [21, 22]. A market regime reflects the current state of a given stock or group of stocks. A regime shift occurs when a stock stops following its previously observed pattern; for instance, it suddenly increases or decreases in price. A multitude of reasons can cause a market regime shift, such as inflation, economic growth and a change in the political environment [12, 21]. In recent years, a lot of research and effort has gone into classifying, detecting and even forecasting regime shifts [22]. In order for one to make good predictions on stock price movement, regime shifts need to be accounted for. This dissertation addresses the issue of regime shifts from a data-driven perspective, by addressing the problem of concept drift [21, 22, 23]. The problem of making predictions on stock information has been extensively studied by market analysts and investors, as well as researchers from various disciplines. This is due both to the difficulty of the problem and to the potential financial gain that comes with solving it. A stock has a number of key features, these being [12, 14, 24]:
• Open Price: The price of a stock at the start of a given trading day.
• Close Price: The price of a stock at the end of a given trading day.
• Adjusted Close Price: The price of a stock at the end of a given trading day, adjusted to account for corporate actions such as dividends and stock splits. It is often considered the true price of the stock.
• High: The highest price that a stock was selling for on a given trading day.
• Low: The lowest price that a stock was selling for on a given trading day.
• Volume: The number of shares of a stock traded during a given trading day.

One can make predictions on one or more of these features; however, models that make predictions on either the close price or the adjusted close price tend to be the most popular [12]. Predicting the exact price of a stock is believed to be infeasible [14]; however, predicting the movement of a stock, that is, whether the price of a stock will increase or decrease, is a much more achievable goal. Hence, this dissertation mainly focuses on models that make predictions on stock price movement.

2.2 Concept Drift


The problem of concept drift was first proposed in 1986 by J. Schlimmer et al. [25]. They argued that if the noise in a given dataset becomes abundant, it no longer remains noise but rather becomes an integral part of the data [25, 26]. Since then, concept drift has been labelled differently across research, for example as concept shift [27] and dataset shift [28]. While there is more than one definition of concept drift, it can be formally written as follows:
Given a time period [0, t] and a set S = {d_0, d_1, ..., d_t}, where d_α is a data instance at timestamp α and S follows a distribution f_{0,t}, concept drift is said to occur at timestamp j if f_{i,j−1} ≠ f_{j,k}, where i < j < k and k ≤ t [11, 26, 29]. From this definition of concept drift, a hypothesis can be made of what the ideal effective learner should look like. It should be noted that the term 'learner' refers to some general machine learning algorithm or model. The ideal learner should be able to quickly adapt to concept drift, while still being robust to noise [7, 26]. It should also be able to save data distributions so that, should these distributions reappear, their data can be used later by the learner [7].
There are four main types of concept drift [7, 26]. To better understand each type
of concept drift, an example is going to be given based on those models that attempt to
predict the daily weather [30], as depicted in Figure 1.

• Gradual: This kind of concept drift occurs when a new data distribution gradually
replaces the current data distribution. This normally happens over a prolonged
period of time. At the start of this concept drift, the new data distribution can be
easily confused for noise, as the two distributions will be fluctuating, until eventually
the new distribution overrides the current one (example: a change in season).

• Incremental: This kind of concept drift occurs when one data distribution incre-
mentally changes to another over a period of time (example: global warming).

• Recurring: This kind of concept drift follows a cyclic pattern, where the data
distribution shifts between a selected set of data distributions (example: the four
seasons of the year).

• Sudden: Sudden concept drift, also referred to as instantaneous or abrupt concept drift, changes the data distribution very rapidly, over a short period of time (example: heat waves).

Figure 1: Types of concept drift

2.3 Handling Concept Drift


2.3.1 Concept Drift Frameworks

A number of different techniques and frameworks have been developed to try and tackle
concept drift. Most of these developed frameworks can be clustered into one of the following
categories [5, 7]:

• Instance selection: These techniques train the model on selected instances/batches of data that follow a similar data distribution to the current distribution, while also training the model on recent batches of data. These techniques work on the assumption that recent data follows the current distribution, hence they are susceptible to sudden concept drift. An example of such an algorithm is the window resize algorithm for batch data [31].

• Ensemble learning: These techniques maintain a set of data descriptions, and through some defined heuristic, the most similar data distribution is selected to be trained on by the model. Some techniques remove distributions that are deemed not to be useful in order to save computational resources. An example of this is the conceptual clustering and prediction framework [32].

• Instance weighting: These techniques concurrently use multiple data distributions, but give more weight to those distributions that are more similar to the current distribution. When implementing such a technique, one needs to be careful of overfitting, due to the large number of data distributions that could be present. An example of such an algorithm is concept drift 3 (CD3) [33].

2.3.2 Concept Drift Detectors

All of the techniques mentioned in Section 2.3.1 rely on concept drift detectors to identify
when a concept drift has occurred. A concept drift detector quantifies concept drift by
identifying points in the data stream where a change in data distribution occurs [26]. In
general, most concept drift detectors tend to follow these four stages [26]:

1. Data Retrieval Stage: In this stage, chunks of data are collected from the data
stream. The size of each collected chunk depends on the concept drift detector.

2. Data Modeling Stage: In this stage, the collected data is modeled/converted into a structure that is more easily consumed by the concept drift detector. Not only does this make it easier to infer certain key features from the data, but it also makes reading the data faster [34]. During this stage, any redundant data is discarded.

3. Test Statistics Calculation Stage: This stage measures the amount (if any) of dissimilarity that is present in the data. How this dissimilarity is measured depends on the concept drift detector, as each detector makes use of different statistical tests, with each test considering different aspects of the data. This stage tends to be the most challenging, as it is still up for debate how to define an accurate and robust dissimilarity measurement [26].

4. Hypothesis Test Stage: This stage makes use of the previously obtained dissimilarity score and determines the drift confidence. The drift confidence determines whether the dissimilarity that is present in the data is caused by concept drift, noise or random sample selection bias [26, 35]. This is accomplished by making use of a hypothesis test that evaluates the statistical significance of the value calculated in the third stage. Some hypothesis tests that have been used in past literature include: the permutation test [35], bootstrapping [36] and Hoeffding's inequality-based bound identification [16]. (A minimal code skeleton of these four stages is sketched after this list.)
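To make the four stages concrete, the following is a toy Python skeleton organised around them. Every stage here is a stand-in of ours (a fixed-size window, a mean summary, a fixed reference and threshold); a real detector such as DDM or HDDMA replaces each stage with its own statistic and hypothesis test.

    class FourStageDetector:
        """Toy skeleton of the four-stage detector structure described above."""

        def __init__(self):
            self.window = []

        def retrieve(self, value):
            # Stage 1 - data retrieval: collect a chunk from the stream
            self.window.append(value)
            return self.window[-30:]

        def model_data(self, chunk):
            # Stage 2 - data modeling: summarise the chunk, discarding the rest
            return sum(chunk) / len(chunk)

        def statistic(self, summary):
            # Stage 3 - test statistic: dissimilarity against a reference value
            return abs(summary - 0.5)

        def hypothesis_test(self, score):
            # Stage 4 - hypothesis test: is the dissimilarity significant?
            return score > 0.2

        def add_element(self, value):
            chunk = self.retrieve(value)
            return self.hypothesis_test(self.statistic(self.model_data(chunk)))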

Modern concept drift detectors have added a variety of features to the above framework in order to achieve more accurate predictions. Given the vast number of concept drift detectors that are available, it would be infeasible to list all of them; however, they can be clustered into the following three groups [26]:
Error rate-based drift detection methods: These methods, also referred to as
sequential analysis drift detection methods [37], are a group of concept drift detectors that
keep track of the predictions made by a base model. These predictions are compared to
the real observed values, and using these values, the model’s error rate is calculated. This
process is repeated with each new data point. If the error rate is proven to have changed
by a statistically significant amount, the concept drift detector raises a flag to indicate it
has detected a concept drift [26, 37]. This group of concept drift detectors is the most
popular group as they tend to give the best results [26, 37, 38]. Some popular examples
of concept drift detectors that belong to this group include the Drift Detection Method
(DDM) [39] and the Page-Hinkley Test (PHT) [40] concept drift detectors.
Data Distribution-based Drift Detection: Concept drift detectors that fall under
this group of detectors make use of a distance metric to quantify the level of dissimilarity
between the previously observed distributions and the current distribution. If this calcu-
lated dissimilarity value is proven to be statistically significant, the detector raises a flag to
indicate that concept drift has occurred. These detectors come with a unique advantage,
as they generate certain key information about the drift, such as accurately finding the
time when the drift started to occur. However, these algorithms tend to be computationally expensive [26]. Popular examples of concept drift detectors that belong to this group include: Principal Component Analysis based Change Detection framework (PCA-CD) [41] and Statistical Change Detection for multidimensional data (SCD) [42].
Multiple Hypothesis Test Drift Detection: This group of concept drift detectors applies multiple hypothesis/statistical tests. For the detector to identify a concept drift, all of these hypothesis tests need to evaluate to true. Some detectors run the tests in parallel, while others run them sequentially, meaning that for test t_n to start, test t_{n-1} needs to evaluate to true. This group tends to not give good results when compared to the other two groups [26, 38]. Some examples of concept drift detectors that belong to this group include: Just-In-Time adaptive classifiers (JIT) [43] and Hierarchical Change-Detection Tests (HCDTs) [44].
This dissertation will focus on the most popular and accurate concept drift detectors
and will therefore provide further details about detectors from the first group.

2.3.3 Error rate-based drift detectors

One of the most cited and used concept drift detectors is the DDM detector [26, 38, 39], proposed by Gama et al. in 2004 [39]. DDM was one of the first concept drift detection algorithms to implement an error rate-based drift detection approach [26, 39], and while its performance is perhaps not as good as that of more modern concept drift detectors [38], it is definitely a pioneer in the field of detecting concept drift. Not only did it inspire a multitude of other drift detection methods, such as Dynamic Extreme Learning Machine (DELM) [45], Early Drift Detection Method (EDDM) [46], Hoeffding's inequality based Drift Detection Method (HDDM) [16] and Learning with Local Drift Detection (LLDD) [47], it is also the first concept drift detector that is able to raise both warning and drift flags [26]. Prior to DDM, concept drift detectors only raised a flag to indicate that a concept drift had occurred. With the addition of warning flags, concept drift detectors could now raise a warning flag to indicate that a concept drift will probably occur soon, and a drift flag to indicate that a concept drift has just occurred. This allows models to take a proactive approach rather than only a reactive one.
DDM works by first checking the online error rate. If this error rate increases by a statistically significant amount that is greater than or equal to the warning level threshold, DDM raises a warning flag and starts to use the incoming data points to train a new model. If the error rate continues to increase, by an amount that is greater than or equal to the drift level threshold, then a drift flag is raised and the old model is replaced by the new model. This new model is then used to make future predictions, since the data it was trained on is more in-line with the current distribution.
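As an illustration of this decision rule, the following is a minimal Python sketch of DDM's warning and drift thresholds, assuming a stream of 0/1 prediction errors; the class name and the incremental bookkeeping are simplifications of ours, not a mirror of any particular library implementation.

    import math

    class SimpleDDM:
        """Sketch of DDM's thresholds: warning at p + s >= p_min + 2*s_min,
        drift at p + s >= p_min + 3*s_min (Gama et al., 2004)."""

        def __init__(self):
            self.n = 0                  # number of predictions seen so far
            self.p = 1.0                # running error rate
            self.p_min = float("inf")   # error rate at the best point seen
            self.s_min = float("inf")   # standard deviation at that point

        def add_element(self, error):
            """error: 1 if the base model misclassified this instance, else 0."""
            self.n += 1
            self.p += (error - self.p) / self.n          # incremental mean update
            s = math.sqrt(self.p * (1 - self.p) / self.n)
            if self.p + s < self.p_min + self.s_min:     # record the best point
                self.p_min, self.s_min = self.p, s
            if self.p + s >= self.p_min + 3 * self.s_min:
                return "drift"
            if self.p + s >= self.p_min + 2 * self.s_min:
                return "warning"
            return "stable"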
Around two years after the release of the DDM detector, the EDDM concept drift detector was proposed [46]. This detector quickly began to pick up in popularity [38]. When compared to DDM, it was able to detect gradual concept drift with better accuracy, while still having a good accuracy score on sudden concept drift [38, 46]. EDDM was proposed by García et al., who argued that when no concept drift is present, the distance between any two consecutive misclassifications should be larger than the distance between two consecutive misclassifications when concept drift is present. EDDM uses this hypothesis to detect concept drift. If the distance between these misclassifications starts to decrease by a statistically significant amount, then warning and drift flags are raised. The significance of the decrease in distance depends on certain threshold values that can be changed through the concept drift detector's parameters [46].
Another popular concept drift detector that has been extensively used in past literature
is the PHT concept drift detector [38, 40]. This concept drift detector works by measuring
and keeping track of the distance between the current data distribution and a normal
distribution. This distance is calculated using the PHT [48]. If this distance changes by a
statistically significant amount, then the concept drift detector raises a drift flag [40].
In 2018, a survey was conducted by Barros and Santos [38], whereby 14 popular state-
of-the-art concept drift detectors were tested on a large number of datasets. From the
survey it emerged that the HDDMA [16] detector gave the best results [38].
To understand how HDDMA works, the detector on which it is based, HDDM, first needs to be explained [16]. HDDM makes use of three different states: the STABLE state, which implies that there is currently no concept drift; the WARNING state, which implies that the probability of a concept drift occurring soon is high; and the DRIFT state, which implies that the probability that a concept drift has just occurred is 99% or more. When the WARNING state is entered, the detector raises a warning flag, and similarly, when the DRIFT state is entered, the detector raises a drift flag. The detector always starts in the STABLE state, and for it to transition to the WARNING state, the following condition needs to hold [16]:
P(µ − 2σ ≤ c ≤ µ + 2σ) ≈ α_W    (1)

where:

• µ = Expected value, calculated using Hoeffding's inequality [49]

• σ = Standard deviation

• c = Current observed value

• α_W ∈ (0, 1] = Warning confidence parameter (default = 0.95)

Equation 1 determines if the current observed value c is statistically far from the
expected value µ, and if it is, then the detector transitions to the WARNING state. Once
Equation 1 no longer evaluates to true, the detector transitions from the WARNING state
back to the STABLE state. For the detector to transition from the STABLE or WARNING
state to the DRIFT state the following equation needs to evaluate to true [16]:

P(µ − 3σ ≤ c ≤ µ + 3σ) ≈ α_D    (2)

where:

• α_D ∈ (0, 1] = Drift confidence parameter (default = 0.99)

Equation 2 works very similarly to Equation 1, in the sense that it checks whether c is statistically far from the expected value µ. If Equation 2 holds, then the detector transitions to the DRIFT state, regardless of which state it is currently in.
Due to the nature of how HDDM detects concept drift, it tends not to detect sudden concept drift very quickly [16, 38]. This is not ideal, as models need to react as quickly as possible to sudden concept drift. To combat this, HDDMA enhances HDDM by applying the bounding moving Averages Test (A-Test) [16]. This is done by keeping track of the average value of the current distribution; if this average value suddenly changes by a statistically significant amount, the probability of the detector changing to the WARNING or DRIFT state is increased [16, 38].

Another variant of HDDM has also been developed, namely Hoeffding's inequality based Drift Detection Method with Weighted moving averages (HDDMW). This detector works very similarly to HDDMA, but rather than applying the A-Test to HDDM, it applies the Weighted moving averages Test (W-Test). The W-Test also keeps track of the average value of the current distribution, but it gives more weight (importance) to recent data [16]; hence, the older the data, the less weight it is given. While HDDMW has been used in past literature, HDDMA tends to give better results in a variety of real-world situations [38].

2.4 Machine Learning for Stock Price Prediction
A survey conducted in 2020 investigated and compared a large number of machine learning
techniques that are used by traders as analytical tools in order to make predictions on
different financial instruments [12]. Some of these techniques include k-means clustering,
neural networks and genetic algorithms. In the survey, these various machine learning
models were applied to a number of datasets containing data about different financial
instruments. One common element that was observed from the results generated is that
neural networks gave good results throughout [12].
In O1, a suitable machine learning model needs to be selected and replicated. To complete this objective, a number of state-of-the-art neural networks are analysed against a number of criteria. The first criterion is to ensure that, when compared to other models, the chosen machine learning model generates good results. Secondly, the model should resemble as closely as possible the ideal effective learner defined in Section 2.2. Another important criterion is to ensure that the chosen model can be replicated with a high degree of precision. This is to ensure that the results generated by the chosen model fitted with a concept drift detector can be fairly compared with past literature.
Elman Neural Network (ENN): ENNs are a special type of Recurrent Neural Network (RNN) [50, 51]. ENNs use the output generated by their hidden layer as feedback. This is done with the addition of a context/recurrent layer, which allows ENNs to learn the current data distribution better than other neural network models [50]. Despite being originally proposed in 1990, the ENN is still used in a variety of domains, as it still gives good results even when compared to more modern models [52]. In 2021, grey wolf optimization was applied to an ENN by S. Kumar [52]. This optimised ENN model was then applied to a financial dataset. The ENN managed to outperform all of the other previously proposed models, and achieved results that were not seen in past literature. With this being said, the dataset only contained eight different stocks, and the ENN model had a relatively high standard deviation, meaning it could not make consistent predictions across all eight stocks.
Long Short-Term Memory (LSTM): An LSTM neural network [53] is a type of RNN
that is capable of learning long-term dependencies. This is accomplished by replacing the
traditionally used artificial neurons in the hidden layer with memory cells. These memory
cells are capable of remembering data which is deemed important for a long period of
time. Through this, the model is capable of connecting past data to present data, hence partially solving the problem of long-term dependencies [53]. LSTM neural networks are one of the most successful RNN architectures designed, and they have
previously given good results in the problem of stock price movement prediction [14, 54].
Due to their great success, efforts have been made to try and improve upon LSTMs.
Attentive Long Short-Term Memory (ALSTM): As seen in [14, 55], an ALSTM
neural network further builds upon the previously mentioned LSTM neural network. This
is mainly accomplished by the addition of a temporal attention layer. This layer compresses
data at different time-steps into an overall representation with adaptive weights. The aim
of this layer is to use multiple compressed representations of data at the same time, giving
more importance to those with a higher weighting. As can be seen in [14], an ALSTM model outperforms its LSTM counterpart.
Adversarial Attentive Long Short-Term Memory (Adv-ALSTM): Through
the use of adversarial training, the Adv-ALSTM neural network further enhances the
previously discussed ALSTM neural network [56, 57]. Adversarial training is the process of
adding malicious data to the training data. This malicious data, referred to as adversarial examples, is generated by systematically copying and then transforming parts of the training data. The transformation that occurs is classified as adversarial perturbation [14]. Through this method, not only does the model have more data to work with, but it also becomes more robust to noise. As seen in [14], Adv-ALSTM substantially outperforms both its LSTM and ALSTM counterparts.
Since the Adv-ALSTM model gave the best results when compared to the LSTM and ALSTM models [14], and taking into account that the ENN model was only applied to eight different stocks [50] whereas the Adv-ALSTM model was tested on over 130 different stocks [14], the Adv-ALSTM model was deemed the most suitable model to be replicated.

3 Methodology
In this section, the methods used to achieve the three objectives mentioned in Section 1.3
are going to be discussed.

3.1 Replicating the machine learning model


As stated in Section 2.4, the Adv-ALSTM model found in [14] was chosen to be replicated.1 According to [14], the model ran for a total of 150 epochs; hence, in order to fairly compare the replicated model with the original, the replicated model will also run for 150 epochs. Since the model comes with a small degree of randomness [14, 53], this process will be repeated for ten iterations, hence creating ten different models. The model that produces results most similar to those found in [14] will be selected. The best results generated by the vanilla model will be used as a benchmark.

3.2 Selected dataset


The chosen Adv-ALSTM model was originally implemented with the aim of enhancing stock movement prediction through the use of adversarial training [14]. The model was applied on two different datasets: the StockNet dataset [58] and another unnamed dataset that was used in [59] to predict stock prices. In order to be able to compare the results of the replicated model with the results of the original model, the same datasets were used. As done in [14], the unnamed dataset shall be referred to as the KDD17 dataset. The StockNet dataset contains historical data of 88 different stocks from the 1st of January 2014 till the 1st of January 2016, with a total of 652 data points for each stock, giving a total of 57,376 data points. It contains stocks from both the National Association of Securities Dealers Automated Quotations (NASDAQ) market and the New York Stock Exchange market. All of the data was collected from Yahoo! Finance2 [58]. On the other hand, the KDD17 dataset contains historical data of 50 different stocks from the 1st of January 2006 till the 1st of January 2016, with a total of 2,518 data points for each stock. This gives a total of 125,900 data points. All of the stocks belong to the United States markets. Once again, all of the data was collected from Yahoo! Finance2 [59]. As done in [14], the data is going to be split as follows: 80% training, 10% validation and 10% testing.
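Since this is time series data, the split must be chronological rather than shuffled, as shuffling would leak future prices into the training set. A minimal sketch of such a split (the function name and comments are ours):

    def chronological_split(points, train=0.8, valid=0.1):
        # Split a time-ordered sequence 80/10/10 without shuffling.
        n = len(points)
        a, b = int(n * train), int(n * (train + valid))
        return points[:a], points[a:b], points[b:]

    # For one KDD17 stock with 2,518 trading days this yields roughly
    # 2,014 training, 252 validation and 252 testing points.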
1 The original code for this model can be found at: https://www.github.com/fulifeng/Adv-ALSTM
2 https://finance.yahoo.com/

3.3 Selecting the concept drift detector
The aim of O2 is to find a suitable concept drift detector. To do this, two experiments are going to be carried out. In the first experiment, a number of concept drift detectors are going to be tested and evaluated on the StockNet and KDD17 datasets. In the second experiment, the same concept drift detectors are going to be tested on stock data from periods of time in which concept drift occurred beyond any reasonable doubt, that is, those periods in which a stock experienced events such as a crash, rally or surge. The ideal concept drift detector should be able to distinguish concept drift from noise and detect all four types of concept drift. A total of four concept drift detectors are going to be tested, these being EDDM [46], HDDMA [16], HDDMW [16] and PH [40]. These four concept drift detectors have been applied in numerous real-world applications/domains [38]. An explanation of each of these concept drift detectors can be found in Section 2.3.2. A short sketch of how such detectors can be driven in code is given below.
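The sketch below drives the four detectors using scikit-multiflow, the library used for the implementation (see Section 4.2). The toy error stream with an abrupt shift half-way is an assumption of ours standing in for real model errors, and default detector parameters are used rather than the tuned ones found in Experiment 4.

    from skmultiflow.drift_detection import EDDM, HDDM_A, HDDM_W, PageHinkley

    detectors = {"EDDM": EDDM(), "HDDM_A": HDDM_A(),
                 "HDDM_W": HDDM_W(), "PH": PageHinkley()}
    stream = [0] * 200 + [1] * 200   # toy 0/1 error stream with a sudden shift at day 200

    for name, detector in detectors.items():
        drifts = []
        for day, value in enumerate(stream):
            detector.add_element(value)      # feed one observation per trading day
            if detector.detected_change():   # a drift flag was raised
                drifts.append(day)
        print(name, "raised drift flags at days:", drifts)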
In the first experiment, the concept drift detectors are going to be evaluated on the
number of concept drifts they detect in the StockNet and KDD17 datasets. This experi-
ment does not account for type I and type II errors. Type I errors, also referred to as false
positives, occur when the concept drift detector detects a concept drift, even though no
concept drift occurred. These types of errors are the most damaging when computational
resources are limited, as they force a retraining process when it is not needed, hence effec-
tively wasting computational resources. Type II errors, also referred to as false negatives,
are much more damaging to the performance of the model. These types of errors take
place when a concept drift occurs, and the concept drift detector does not raise a drift
flag. This leads to the model not adapting to this new distribution, hence decreasing the
model’s performance.
Since the cause of a concept drift can be hidden or unknown, it is not easy to check for type I errors. However, it is possible to check for type II errors by testing a concept drift detector during a period in time where concept drift occurred beyond any reasonable doubt. To check for type II errors, in the second experiment the concept drift detectors are going to be tested on ten different events that led to the stock market experiencing a concept drift. These ten events can be found in Table 1. They were chosen because together they cover a broad range of well-known events.

Table 1: Ten different events that caused a concept drift in the stock market

Time Period    | Stock(s) affected            | Type          | Cause
March 1999     | Sony Group Corp.             | Rally         | Announcement of the PlayStation 2
June 2007      | Apple Inc.                   | Surge         | Release of the iPhone
September 2008 | Dow Jones Industrial Average | Crash         | Congress rejecting the bailout bill
May 2010       | Dow Jones Industrial Average | Crash         | Flash Crash of May 2010
August 2011    | S&P 500                      | Fall          | Fear of contagion of the European sovereign debt crisis
August 2015    | Dow Jones Industrial Average | Sell-off      | Multitude of reasons, such as Greece defaulting on its debt
February 2020  | S&P 500                      | Fall          | The COVID-19 pandemic
May 2020       | Tesla Inc.                   | Sudden Drop   | Elon Musk, CEO of Tesla, making a statement on Twitter that the Tesla stock price is too high3
September 2020 | NVIDIA Corp.                 | Surge         | Announcement of the Ray Tracing Texel eXtreme (RTX) 30 Series graphics processing unit
January 2021   | GameStop Corp.               | Short Squeeze | The WallStreetBets subreddit surging the price of the stock due to a multitude of hedge funds making short sells on the stock4

3 https://www.twitter.com/elonmusk/status/1256239815256797184
4 https://www.twitter.com/elonmusk/status/1256239815256797184

3.4 Attaching the concept drift detector to the model


In O3, the concept drift detector chosen in O2 will be used together with the model chosen
in O1. Every time the selected concept drift detector raises a drift flag, the model will
undergo a retraining process. In this dissertation, four retraining methods are proposed.

3.4.1 Method 1: Handling recurring distributions

As was explained in Section 2.2, it is not uncommon for certain distributions to reappear. If a recurring distribution was not present in the training set, then the model's performance will drop every time the distribution is encountered. Method 1 aims to remedy this: every time a concept drift occurs, the model is trained on the data distribution that has just ended. This is done so that, if that data distribution reappears in the future, the model will be able to make better predictions on it. Figure 2 illustrates this method and Algorithm 1 formalises it.
Originally, it was intended that the model would always train on the previously observed data distribution; however, this would not account for distributions that contain a small number of data points. Such instances are not sufficient for the model to fully learn the distribution. In order to account for this, a threshold system was set up (Algorithm 1, line 12). This threshold system ensures that the model will resume training on at least three months of data, meaning a total of 63 data points, as the stock market is open for an average of 21 days per month [60]. This threshold system works by checking whether the distribution contains at least 63 data points; if it does not, the model is instead trained on the last 63 data points. The reasoning behind this threshold system is that recent data tends to be similar to the current distribution [5, 7], and hence by using this recent data the model will be able to better learn the distribution.

Algorithm 1: Pseudocode for Method 1


1 Initialise: machine learning model: M
2 Initialise: concept drift detector: D
3 Initialise: list with potentially infinite length containing a stream of real values:
data
4 Initialise: empty list: currentDistribution
5 Split data into trainingData, validationData and testingData
6 Train model M on trainingData
7 Validate model M on validationData
8 foreach element e ∈ testingData do
9 Test model M on element e
10 Add element e to detector D
11 Append element e to list currentDistribution
12 # If concept drift just occurred
13 if D.driftDetected then
14 if currentDistribution.length ≥ 63 then
15 Train model M on currentDistribution
16 else
17 Train model M on the last 63 data points.
18 end
19 Clear currentDistribution
20 end
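For concreteness, the following is a runnable Python sketch of Algorithm 1's testing loop. The model and detector classes are dummy stand-ins of ours (the real implementation uses the replicated Adv-ALSTM and the chosen drift detector), so only the control flow mirrors the algorithm.

    import random

    class DummyModel:
        # Stand-in for the replicated Adv-ALSTM; train/test are placeholders.
        def train(self, points): pass
        def test(self, point): return random.random() < 0.5   # 1 = misclassified

    class DummyDetector:
        # Stand-in for a drift detector: flags a drift every 100 points.
        def __init__(self): self.n = 0
        def add_element(self, value): self.n += 1
        def detected_change(self): return self.n % 100 == 0

    MIN_POINTS = 63   # three months of trading data (21 trading days per month)

    model, detector = DummyModel(), DummyDetector()
    testing_data = list(range(500))          # toy stand-in for the test split
    history, current_distribution = [], []

    for point in testing_data:
        error = model.test(point)
        detector.add_element(error)
        history.append(point)
        current_distribution.append(point)
        if detector.detected_change():
            if len(current_distribution) >= MIN_POINTS:
                model.train(current_distribution)   # learn the distribution that just ended
            else:
                model.train(history[-MIN_POINTS:])  # fall back to the last 63 data points
            current_distribution = []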

3.4.2 Method 2: Periodic training every n days

If the model has never been trained on a previous instance of the current distribution, then its performance is bound to decrease. Method 2 aims to remedy this by training the model on data that is in-line with, and hence similar to, the current data distribution. Every time a concept drift occurs, the model is periodically trained every n days. From this we get the splitDays parameter, which controls how many days the model waits before training. Figure 2 illustrates this method and Algorithm 2 formalises it.

Algorithm 2: Pseudocode for Method 2


1 Initialise: machine learning model: M
2 Initialise: concept drift detector: D
3 Initialise: list with potentially infinite length containing a stream of real values:
data
4 Initialise: int: counter = 0
5 Initialise: bool: newDistribution = False
6 Split data into trainingData, validationData and testingData
7 Train model M on trainingData
8 Validate model M on validationData
9 foreach element e ∈ testingData do
10 Test model M on element e
11 Add element e to concept detector D
12 if D.driftDetected then
13 newDistribution = True
14 counter = 0
15 end
16 if newDistribution ∧ (counter ≥ splitDays) then
17 Train model M on the last splitDays number of data points
18 counter = 0
19 end
20 Increment counter
21 end

3.4.3 Method 3: Forgetting irrelevant data

While in Method 2 the model is trained on the current distribution, it has still previously been trained on data that is not in-line with the current distribution. This data is now irrelevant and prevents the model from making accurate predictions. Method 3 aims to remedy this by not only training the model on the current distribution, but also by making the model forget data that does not belong to the current distribution. In this method, when a concept drift occurs, the model first forgets all the new out-of-sample data that it has previously learnt, and then periodically learns from the new data distribution. Algorithm 3 formalises this method while Figure 2 illustrates it.

Algorithm 3: Pseudocode for Method 3


1 Initialise: machine learning model: M
2 Initialise: concept drift detector: D
3 Initialise: list with potentially infinite length containing a stream of real values:
data
4 Initialise: int: counter = 0
5 Initialise: bool: newDistribution = False
6 Split data into trainingData, validationData and testingData
7 Train model M on trainingData
8 Validate model M on validationData
9 Initialise: machine learning model: Mr = M # Make a copy of model M, in order to be able to roll back and forget all new out-of-sample data
10 foreach element e ∈ testingData do
11 Test model M on element e
12 Add element e to concept detector D
13 if D.driftDetected then
14 newDistribution = True
15 counter = 0
16 M = Mr # Rollback model M to Mr
17 end
18 if newDistribution ∧ (counter ≥ splitDays) then
19 Train model M on the last splitDays number of data points
20 counter = 0
21 end
22 Increment counter
23 end

3.4.4 Method 4: Training on similar distributions

While Method 3 has the advantage of ensuring that all the out-of-sample data learnt follows the current distribution, it also assumes that all previously encountered out-of-sample data is not in-line with the current distribution. This assumption is most likely not completely correct, as there probably exists some older data distribution that is at least slightly similar to the current distribution. Method 4 aims to enhance Method 3 by first measuring the distance between the current data distribution and previously encountered distributions. The model is then trained on the distribution that is closest to the current distribution.
Various metrics that measure the distance between two data distributions have been proposed in past literature, such as the Chi-square distance [61] and the Kolmogorov–Smirnov statistic [62]. These metrics each take into account different properties of the distributions. For this particular implementation, the Kullback–Leibler (KL) divergence test [63] is used, as it has been used for similar applications in other real-world domains [64, 65].

The KL divergence test calculates the relative entropy between two distributions, meaning that it measures how much one distribution differs from another. The relative entropy from distribution Q to distribution P is found by subtracting the entropy of P from the cross entropy of P and Q. This is denoted as:

D_KL(P ∥ Q) = H(P, Q) − H(P)    (3)

where:

• D_KL(P ∥ Q) = The relative entropy from distribution Q to distribution P. The further away this value is from 0, the larger the relative entropy.

• H(P, Q) = Cross entropy of P and Q

• H(P) = Entropy of P

As explained in [63], Equation 3 is simplified into:

D_KL(P ∥ Q) = Σ_i P(i) log( P(i) / Q(i) )    (4)
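A minimal sketch of how the relative entropy in Equation 4 can be estimated from two sets of samples. The shared-histogram discretisation and the small eps used to avoid log(0) are assumptions of ours; the dissertation leaves these details open.

    import numpy as np

    def kl_divergence(p_samples, q_samples, bins=20, eps=1e-9):
        # Estimate D_KL(P || Q) from samples via histograms over a shared range.
        lo = min(np.min(p_samples), np.min(q_samples))
        hi = max(np.max(p_samples), np.max(q_samples))
        p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
        q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
        p = p / p.sum() + eps   # approximate probabilities, padded away from zero
        q = q / q.sum() + eps
        return float(np.sum(p * np.log(p / q)))

    # Method 4 then trains on the stored distribution whose divergence from
    # the current one (list c in Algorithm 4) is closest to 0, e.g.:
    # closest = min(l, key=lambda d: kl_divergence(c, d))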

In Method 4, a list of data distributions l is maintained. Each time a concept drift occurs, the previous data distribution is added to l. Much like Method 3, the model forgets all of the new out-of-sample data it has previously learnt. It then continues to make predictions on the current distribution, each time waiting splitDays number of days before training on the current distribution. This time however, with each prediction made, the true value is stored and maintained in a list c. After a certain number of days, the relative entropy from distribution c to every distribution found in l is calculated. The distribution with the lowest relative entropy is used for training. The number of days the model waits before comparing c to the other distributions found in l is controlled by the numberOfWaitDays parameter. Only one distribution is chosen, as l tends to only contain two to four different distributions, with only one of them being somewhat similar to the current distribution. Should this method be applied on a larger test set, more than one distribution could be picked from l. Algorithm 4 formalises this method and Figure 2 illustrates it.

Figure 2: Flowcharts of the four proposed retraining methods

Algorithm 4: Pseudocode for Method 4
1 Initialise: machine learning model: M
2 Initialise: concept drift detector: D
3 Initialise: list with potentially infinite length containing a stream of real values:
data
4 Initialise: int: i, j = 0
5 Initialise: bool: newDistribution, learnt = False
6 Initialise: empty list: c, l
7 Split data into trainingData, validationData and testingData
8 Train model M on trainingData
9 Validate model M on validationData
10 Initialise: machine learning model: Mr = M
11 foreach element e ∈ testingData do
12 Test model M on element e
13 Add element e to concept detector D
14 Append e to list c
15 if D.driftDetected then
16 newDistribution = True
17 i, j = 0
18 M = Mr
19 Append a copy of c to l
20 Clear c
21 learnt = False
22 end
23 if newDistribution ∧ (i ≥ splitDays) then
24 Train model M on the last splitDays number of data points
25 i=0
26 end
27 if newDistribution ∧ ¬learnt ∧ (j ≥ numberOfWaitDays) ∧ l.length > 1 then
28 Calculate all the relative entropies from distribution c to each distribution in l, and train model M on the distribution whose relative entropy from distribution c is closest to 0.
29 j=0
30 learnt = True
31 end
32 Increment i
33 Increment j
34 end
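As an illustration of the selection step on lines 27-28 of Algorithm 4, the sketch below
picks the stored distribution whose relative entropy from the current distribution c is
closest to 0. The shared histogram binning, the helper name and the use of
scipy.stats.entropy (which returns the KL divergence when given two distributions) are
assumptions made for the sketch; the actual implementation may differ.

import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(P || Q)

def most_similar_distribution(c, l, bins=20, epsilon=1e-10):
    # Bin every distribution over a shared set of edges so that the
    # resulting probability vectors are directly comparable.
    pooled = np.concatenate([np.asarray(c)] + [np.asarray(d) for d in l])
    edges = np.histogram_bin_edges(pooled, bins=bins)
    c_hist, _ = np.histogram(c, bins=edges)
    divergences = [
        entropy(c_hist + epsilon, np.histogram(d, bins=edges)[0] + epsilon)
        for d in l
    ]
    # Index of the past distribution with the lowest relative entropy.
    return int(np.argmin(divergences))

# c holds the recent true values; l holds previously seen distributions.
rng = np.random.default_rng(1)
c = rng.normal(0.2, 1.0, 60)
l = [rng.normal(-1.0, 1.0, 250), rng.normal(0.2, 1.0, 250)]
print(most_similar_distribution(c, l))  # expected: 1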
3.4.5 Evaluation Plan

To evaluate Algorithms 1-4, the ACC [66] and MCC [67] metrics are going to be used. These
metrics were chosen over other possible metrics as they were used in the paper that pro-
posed the chosen Adv-ALSTM model [14]. By making use of these two metrics, Algorithms
1-4 can be easily compared to the original Adv-ALSTM model.
ACC represents the proportion of samples that are correctly classified and is described
as the percentage of correctly made predictions.

$\mathrm{ACC} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$  (5)

where:

• Accuracy ∈ [0, 1] (Values closer to 1 are better)

• TP = Number of true positives

• TN = Number of true negatives

• FP = Number of false positives

• FN = Number of false negatives

While accuracy is a simple and elegant metric, it does not take class imbalance into
account. The MCC metric, on the other hand, accounts for such imbalance, as it treats the
true values and the predicted values as binary variables and computes their phi
coefficient [67].

$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$  (6)

where:

• MCC ∈ [-1, 1] (Values closer to 1 are better)
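Both metrics are available off the shelf. As a small illustration (assuming scikit-learn,
which is not prescribed by this dissertation), the following computes Equations 5 and 6 on
a toy set of movement labels:

from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual movement directions (up=1)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # directions predicted by the model

print(f"ACC = {accuracy_score(y_true, y_pred):.4f}")     # Equation 5
print(f"MCC = {matthews_corrcoef(y_true, y_pred):.4f}")  # Equation 6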

4 Evaluation and Results
In this section, the results generated by the experiments detailed in Section 3 are
discussed. All the code used to conduct the experiments was written in Python 3.6
(https://www.python.org/) and is available at https://github.com/tm-26/Enhancing-stock-price-prediction-models-by-using-concept-drift-detectors.

4.1 O1: Selecting and replicating a machine learning model


For O1 to be completed, a suitable machine learning model needed to be selected and
replicated. As stated in Section 3.1, the Adv-ALSTM machine learning model found in
[14] was chosen to be replicated. In Experiment 1, the results generated by the replicated
model were compared to the results reported in [14]. Ideally, the replicated model's
results should closely resemble those of the original model.

4.1.1 Experiment 1 - Replicating the chosen Adv-ALSTM model

Experiment 1 was conducted to ensure that the chosen Adv-ALSTM model was successfully
replicated. The replicated model was trained and tested on both the StockNet [58] and
the KDD17 [59] datasets.

Table 2: KDD17 Replicated Results (RR) compared to Original Results (OR)


ACC MCC OR ACC %Difference OR MCC %Difference
Original Results (OR) 53.05 0.0523 0 0
Replicated Results (RR) 52.86 0.0520 -0.36% -0.57%

Table 3: StockNet Replicated Results (RR) compared to Original Results (OR)


ACC MCC OR ACC %Difference OR MCC %Difference
Original Results (OR) 57.2 0.1483 0 0
Replicated Results (RR) 57.19 0.1352 -0.02% -8.83%

In Table 2 and Table 3, the best results generated by the original Adv-ALSTM model,
denoted by OR, are compared to the best results generated by the replicated model, de-
noted by RR. The differences are very small and acceptable, with almost all differences
being less than 1%. This difference was expected, due to the small amount of randomness
involved with the model [14, 53]. O1 was therefore considered to be completed.
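For reference, the %Difference columns reported in this and the following tables are
signed percentage differences relative to the baseline result; a minimal sketch:

def percent_difference(new, baseline):
    # Signed percentage difference of a result relative to a baseline.
    return (new - baseline) / baseline * 100

# Reproducing the Table 2 columns:
print(round(percent_difference(52.86, 53.05), 2))    # -0.36 (ACC)
print(round(percent_difference(0.0520, 0.0523), 2))  # -0.57 (MCC)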

4.2 O2: Selecting the concept drift detector
For O2 to be completed, a suitable concept drift detector needed to be selected and
implemented. This was done by comparing four state-of-the-art concept drift detectors.
As stated in Section 3.3, EDDM [46], HDDMA [16], HDDMW [16] and PH [40] were chosen.
These four concept drift detectors were implemented using the scikit-multiflow library
(https://scikit-multiflow.github.io/). In Experiment 2, the four concept drift detectors
were tested on the StockNet [58] and the KDD17 [59] datasets, while in Experiment 3 they
were tested on periods in time where a concept drift occurred beyond any reasonable doubt.

4.2.1 Experiment 2 - Testing the concept drift detectors on the datasets

Experiment 2 was conducted in order to check the number of drift flags each concept drift
detector raises in the two chosen datasets.

Table 4: The different detectors tested on both datasets


Concept Drift Detector Num. Of Detections In StockNet Num. Of Detections in KDD17 Total Num. Of Detections
EDDM 408 378 786
HDDMA 6383 6624 13007
HDDMW 3128 2588 5716
PH 1058 404 1462

From Table 4 it is evident that the HDDMA concept drift detector has the largest number
of detections. This was expected as the same has been observed in [37, 38].
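As a rough sketch of how such detection counts can be produced with scikit-multiflow, the
snippet below streams values through HDDM_A and counts the raised flags. The simulated
stream is hypothetical, and the parameter mapping is an assumption: in the library,
drift_confidence is a significance level, so a drift confidence of 99.9% corresponds to
drift_confidence=0.001.

import numpy as np
from skmultiflow.drift_detection import HDDM_A

# Hypothetical stream: binary error indicators for one stock, with a
# simulated distribution change halfway through.
rng = np.random.default_rng(2)
stream = np.concatenate([rng.binomial(1, 0.3, 500),
                         rng.binomial(1, 0.7, 500)])

detector = HDDM_A(drift_confidence=0.001)  # assumed mapping of 0.999
detections = []
for t, value in enumerate(stream):
    detector.add_element(value)
    if detector.detected_change():
        detections.append(t)

print("Drift flags raised at indices:", detections)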

4.2.2 Experiment 3 - Testing the concept drift detectors for type II errors

In Experiment 2, the detectors were solely tested on the number of drifts that they can
detect. This however does not account for type I and type II errors. Therefore,
Experiment 3 was conducted to test for type II errors by testing the concept drift
detectors on ten different periods in time where concept drift occurred beyond any
reasonable doubt. A list of these periods in time is found in Section 3.3.

Table 5: The different detectors tested for type II errors


Detector Type Num. Of Drifts Detected In Time Num. Of Drifts Detected Late Num. Of Missed Drifts
EDDM 0 3 7
HDDMA 2 8 0
HDDMW 0 8 2
PH 0 6 4

The concept drift detectors were evaluated on three criteria: the number of drifts they
managed to detect in time, the number of drifts they detected late, and the number of
drifts they did not detect. If a drift was detected up to 31 days after it occurred, it
was considered to be detected in time; otherwise, it was considered to be detected late.
As can be seen in Table 5, the HDDMA concept drift detector once again outperformed the
other concept drift detectors. It was the only concept drift detector not to miss any of
the drifts, and also the only detector to make detections in time. This was to be
expected, since HDDMA is known to detect certain concept drifts while they are
happening [16, 38].
From Experiment 2 and Experiment 3, it was clear that HDDMA is the most suitable
concept drift detector.

4.2.3 Experiment 4 - Finding the optimal parameters for HDDMA

The aim of Experiment 4 was to find the optimal parameter values for the HDDMA concept
drift detector. As discussed in Section 2.3.3, the most important parameter that HDDMA
makes use of is the drift confidence parameter. This parameter controls how high the
probability of a drift occurring needs to be in order for the concept drift detector to
raise an alert. By default, this is set to 99%. Different values of the drift confidence
parameter were tested.
Table 6: HDDMA tested on the StockNet dataset
Drift Confidence Average Number Of Drifts Per Stock Total Number Of Drifts
0.999 73 6383
0.99 83 7294
0.9 102 8934
0.8 111 9731

Table 7: HDDMA tested on the KDD17 dataset


Drift Confidence Average Number Of Drifts Per Stock Total Number Of Drifts
0.999 130 6624
0.99 149 7580
0.9 185 9438
0.8 203 10329

As was expected, the lower the drift confidence parameter, the more drifts are detected.
Since retraining is a computationally taxing process, it was decided to keep the default
value of 0.99. Setting the parameter any lower would result in the model retraining too
often, while setting it any higher would lead to too few drifts being detected. The chart
in Figure 3 illustrates the number of drifts detected in a popular stock found in the
StockNet dataset. A change in colour signifies that a concept drift has occurred.

Figure 3: HDDMA tested on the Google stock

4.3 O3: Attaching the concept drift detector to the model


For O3 to be completed, the concept drift detector chosen in O2 needs to be applied to
the model chosen in O1. The model then uses the attached concept drift detector to
undergo one of four possible retraining methods. An explanation of each of the retraining
methods can be found in Section 3.3. In Sections 4.3.1 through 4.3.4 we evaluate the
methods and discuss the results for each of these retraining methods. All the generated
results are compared to the results generated by the vanilla model.
Originally, it was planned that all the proposed retraining methods would be tested on
both the StockNet and KDD17 datasets. However, since the test-set of the StockNet dataset
is small in size, little to no concept drift is present in it, and the four retraining
methods could therefore not be properly tested on it. It was hence decided to only
evaluate the retraining processes on the KDD17 dataset. All of the results shown in this
section indicate the mean and standard deviation of the results obtained across the 50
stocks present in the KDD17 dataset.

4.3.1 Experiment 5 - Evaluation and results of Method 1

In Method 1, every time a concept drift occurred, the model was trained on the previously
observed data distribution. This was done under the assumption that the model would be
better suited to handle recurring distributions.

Table 8: Method 1 results compared to vanilla model’s results


ACC MCC VM ACC %Difference VM MCC %Difference
Vanilla Model (VM) 52.86 0.0520 0 0
Method 1 51.29 ±0.11079 0.0286 ±0.21426 -2.97% -45.0%

As can be seen from Table 8, Method 1 produced worse results than the vanilla model. At
first it was hypothesised that the model was overfitting and was not able to generalise
enough to make accurate predictions. While this may be partly true, another and perhaps
more accurate hypothesis is that the model was being trained on data that is not in-line
(not similar) with the oncoming data distribution. Since this data is very different from
the current distribution, it confuses the model, leading to worse results than those of
the vanilla model. To test this hypothesis, in Experiments 6-8 the model is only trained
on data that is in-line with, and hence similar to, the current distribution.

4.3.2 Experiment 6 - Evaluation and results of Method 2

In Method 2, every time a concept drift occurred, the model was periodically retrained
every splitDays number of days. In this case, the assumption was that the model would
benefit from data that is in-line with the current distribution. The splitDays parameter
was tested with values ranging from 3 to 30. Table 9 shows the best results achieved by
Method 2; the complete set of results can be found in Appendix A. A sketch of the
retraining schedule is given below.
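The following minimal sketch of the Method 2 schedule returns the days on which
retraining would be triggered, given one drift flag per trading day; the helper name and
the flag representation are illustrative assumptions.

def method_2_schedule(drift_flags, split_days):
    # After a drift is flagged, retrain on the most recent split_days
    # data points every split_days days.
    retrain_days, new_distribution, i = [], False, 0
    for t, drifted in enumerate(drift_flags):
        if drifted:
            new_distribution, i = True, 0
        if new_distribution and i >= split_days:
            retrain_days.append(t)
            i = 0
        i += 1
    return retrain_days

flags = [False] * 100
flags[40] = True  # a single detected drift on day 40
print(method_2_schedule(flags, split_days=14))  # [54, 68, 82, 96]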

Table 9: Method 1 & 2 results compared to the vanilla model's results
ACC MCC VM ACC %Difference VM MCC %Difference
Vanilla Model (VM) 52.86 0.0520 0 0
Method 1 51.29 ±0.11079 0.0286 ±0.21426 -2.97% -45.0%
Method 2 (splitDays=14) 51.96 ±0.28763 0.0487 ±0.35922 -1.70% -6.35%
Method 2 (splitDays=23) 51.92 ±0.22472 0.0769 ±0.34640 -1.78% +47.88%

From Table 9 it can be observed that Method 2 gave a better MCC result; however, there
was still no improvement in the ACC result. Nonetheless, an improvement over Method 1 was
observed, which is due to the model being constantly fed recent data that was in-line
with the data distribution it was being tested on.

4.3.3 Experiment 7 - Evaluation and results of Method 3

Method 3 works in a similar manner to Method 2; however, this time, when a concept drift
occurred, all the previously learnt out-of-sample data was forgotten. Afterwards, the
model continued exactly like Method 2, periodically learning every splitDays number of
days. The splitDays parameter was once again tested with values ranging from 3 to 30.
Table 10 shows the best results achieved by Method 3. The complete set of results can be
found in Appendix A.

Table 10: Method 3 results compared to the vanilla model's results


ACC MCC VM ACC %Difference VM MCC %Difference
Vanilla Model (VM) 52.86 0.0520 0 0
Method 3 (splitDays=5) 54.02 ±0.39041 0.0783 ±0.32280 +2.19% +50.58%
Method 3 (splitDays=26) 52.48 ±0.23163 0.1141 ±0.35125 -0.72% +119.42%

As can be seen from Table 10, Method 3 produced better ACC and MCC results when compared
to those of the vanilla model. From these results, together with the results generated by
Method 2, it was observed that increasing the percentage of data learnt by the model that
is in-line with the current distribution leads to an improvement in performance.

4.3.4 Experiment 8 - Evaluation and results of Method 4

Method 4 increases the amount of in-line data that the model is trained on by comparing
the current distribution to previously encountered distributions. The model is then
trained on the distribution that is most similar to the current one, where the KL
divergence test is used to calculate the similarity between two given distributions. The
numberOfWaitDays parameter, which controls how many days the model waits before comparing
the current distribution to previous distributions, was tested with values ranging from 4
to 40. The complete set of results can be found in Appendix A.

Table 11: Method 4 results compared to the vanilla model's results


ACC MCC VM ACC %Difference VM MCC %Difference
Vanilla Model (VM) 52.86 0.0520 0 0
Method 4 (numberOfWaitDays=5) 54.18 ±0.11209 0.0747 ±0.09596 +2.50% +43.65%
Method 4 (numberOfWaitDays=9) 52.72 ±0.11944 0.1224 ±0.16686 -0.26% +135.38%

As can be seen from Table 11, Method 4 generated better ACC and MCC results when compared
to those of the vanilla model. This was expected due to the high amount of in-line data
that the model was trained on. The results themselves are very promising, with an ACC
improvement of 2.5% and an MCC improvement of 135.38%. While the MCC improved by a
substantial amount, its standard deviation is also quite high, meaning that a high MCC
score was not achieved across all of the different stocks.

4.3.5 Summary of the generated results

Table 12: Best results generated by each method


ACC MCC VM ACC %Difference VM MCC %Difference
Vanilla Model (VM) 52.86 0.0520 0 0
Method 1 51.29 ±0.11079 0.0286 ±0.21426 -2.97% -45.0%
Method 2 51.96 ±0.28763 0.0769 ±0.34640 -1.70% +47.88%
Method 3 54.02 ±0.39041 0.1141 ±0.35125 +2.19% +119.42%
Method 4 54.18 ±0.11209 0.1224 ±0.16686 +2.50% +135.38%

Table 12 summarises the best results generated by each method. There is a clear trend:
the larger the percentage of in-line data known by the model, the better the results
obtained. This is best illustrated by Method 1 and Method 4. In Method 1, the model was
fed data that did not follow the current distribution, and hence it was the only method
to achieve substantially worse results than the vanilla model. On the other hand, in
Method 4, the model was trained on both data from the current distribution and data from
a previous distribution similar to the current one. This led to Method 4 generating the
best results.

5 Conclusions
5.1 Revisiting aims and objectives
This dissertation sought to answer the research question “Can a machine learning model
that has been trained on a financial dataset get better results if it undergoes a retraining
process every time a concept drift has been detected?”. As stated in Section 1.3, to answer
this research question, O1 to O3 needed to be addressed.
In O1, a state-of-the-art machine learning model that had previously been applied to a
dataset containing stock market data was replicated. Through the research conducted, the
Adv-ALSTM model found in [14] was chosen as the most suitable model, and Experiment 1 was
conducted in order to see whether it was possible to replicate the selected model. The
observed differences in results were negligible, and hence O1 was considered to be
addressed.
O2 required that a state-of-the-art concept drift detector be selected and implemented.
Through past literature and the comparisons made in Experiments 2 and 3, the HDDMA
concept drift detector was deemed to be the most suitable [16]. Experiment 4 was then
conducted in order to find an optimal drift confidence parameter. After these three
experiments were conducted, O2 was considered to be addressed.
O3 required that HDDMA be used together with the chosen Adv-ALSTM model. In total, four
retraining methods were proposed, and Experiments 5 to 8 tested each of these methods.
The generated results showed that the model benefitted most from the fourth method, with
a 2.5% increase in ACC and a 135.38% increase in MCC being observed.
From this improvement in results, it is clear that the chosen model benefitted from the
attached concept drift detector; hence, the answer to the research question is that the
model does indeed benefit from an attached concept drift detector.

5.2 Future Work


While the results generated are very promising, further research can be done to improve
them even more.
Firstly, the performance of the proposed methods heavily depends on the performance of
the attached concept drift detector and, as shown in Section 4.3, state-of-the-art
concept drift detectors do not consistently detect concept drift in the stock market, as
they tend to detect it a few data points after it occurs. If a more accurate concept
drift detector is developed and the number of drifts detected in time increases, one
would expect an increase in performance. Since news and social media outlets affect stock
prices, it would be interesting to make use of such data to try to improve concept drift
detection. Data mining and web scraping techniques could be used to see if any stock is
currently being discussed on such platforms, similar to the work conducted in [58, 68].
Secondly, the proposed retraining methods can be applied to other stock price prediction
models in order to observe whether all models benefit equally from the proposed
retraining methods.
In addition to this, while this dissertation addresses the problem of concept drift in
the stock market domain, the proposed retraining techniques can be applied in other
domains where concept drift is also a major issue, such as the biomedical and weather
prediction domains [69]. This should be done in order to see whether this approach can be
generalised. For such a task to be carried out, one would first need to identify and
understand how the causes of concept drift differ from one domain to another, in order to
find the best way to detect concept drift in the different domains.

5.3 Final Remarks


This dissertation analysed the problem of concept drift in the context of stock price
movement prediction. It highlighted that for a machine learning model to make accurate
predictions on stock prices, it needs to adapt to any changes in the data distributions.
When comparing the results generated in this dissertation to results obtained in past
literature, one can see that they outperform the previously best observed results [14].
Even so, when making use of a state-of-the-art machine learning model enhanced with a
state-of-the-art concept drift detector, the best ACC score achieved was 54.18%. This
clearly highlights how difficult it is to make predictions on the stock market when using
such models, and how much more interesting work can be done in this domain.

References
[1] M. Harries and K. Horn, “Detecting concept drift in financial time series prediction
using symbolic machine learning,” in AI-CONFERENCE-, pp. 91–98, Citeseer, 1995.

[2] Z. Xiang-rong, H. Long-ying, and W. Zhi-sheng, “Multiple kernel support vector re-
gression for economic forecasting,” in 2010 International Conference on Management
Science Engineering 17th Annual Conference Proceedings, pp. 129–134, Nov. 2010.
ISSN: 2155-1855.

[3] B. M. Pavlyshenko, “Machine-Learning Models for Sales Time Series Forecasting,”


Data, vol. 4, no. 1, Mar. 2019. Publisher: Multidisciplinary Digital Publishing
Institute.

[4] S. Jung, C. Kim, and Y. Chung, “A Prediction Method of Network Traffic Using
Time Series Models,” in Computational Science and Its Applications - ICCSA 2006
(M. Gavrilova, O. Gervasi, V. Kumar, C. J. K. Tan, D. Taniar, A. Laganá, Y. Mun,
and H. Choo, eds.), Lecture Notes in Computer Science, (Berlin, Heidelberg), pp. 234–
243, Springer, 2006.

[5] P. Lindstrom, “Handling Concept Drift in the Context of Expensive Labels,” Doctoral,
Sept. 2013.

[6] J. C. Schlimmer and R. H. Granger, “Beyond incremental processing: tracking concept


drift,” in Proceedings of the Fifth AAAI National Conference on Artificial Intelligence,
AAAI’86, (Philadelphia, Pennsylvania), pp. 502–507, AAAI Press, Aug. 1986.

[7] A. Tsymbal, “The Problem of Concept Drift: Definitions and Related Work,” Com-
puter Science Department, Trinity College Dublin, May 2004.

[8] M. Sewell, “History of the efficient market hypothesis,” Rn, vol. 11, no. 04, p. 04,
2011.

[9] J. Kinlay and D. Rico, “Can Machine Learning Techniques Be Used To Predict Market
Direction? - The 1,000,000 Model Test,”

[10] The Editors of Encyclopaedia Britannica, “S&P 500,” Encyclopedia Britannica, Apr.
2019.

[11] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on
concept drift adaptation,” ACM Computing Surveys, vol. 46, pp. 44:1–44:37, Mar.
2014.

[12] M. Obthong, N. Tantisantiwong, W. Jeamwatthanachai, and G. Wills, A Survey on


Machine Learning for Stock Price Prediction: Algorithms and Techniques. Feb. 2020.

[13] B. Qian and K. Rasheed, “Stock market prediction with multiple classifiers,” Applied
Intelligence, vol. 26, pp. 25–33, Feb. 2007.

[14] F. Feng, H. Chen, X. He, J. Ding, M. Sun, and T.-S. Chua, “Enhancing Stock Move-
ment Prediction with Adversarial Training,” in Proceedings of the Twenty-Eighth In-
ternational Joint Conference on Artificial Intelligence, (Macao, China), pp. 5843–
5849, International Joint Conferences on Artificial Intelligence Organization, Aug.
2019.

[15] R. J. Bauer and J. R. Dahlquist, “Market Timing and Roulette Wheels Revisited,”
Financial Analysts Journal, 2012.

[16] I. Frías-Blanco, J. del Campo-Ávila, G. Ramos-Jiménez, R. Morales-Bueno, A. Ortiz-
Díaz, and Y. Caballero-Mota, “Online and Non-Parametric Drift Detection Methods
Based on Hoeffding’s Bounds,” IEEE Transactions on Knowledge and Data Engi-
neering, vol. 27, pp. 810–823, Mar. 2015.

[17] E. Keller and G. A. Gehlmann, “Introductory comment: a historical introduction to


the securities act of 1933 and the securities exchange act of 1934,” Ohio St. LJ, vol. 49,
p. 329, 1988.

[18] E. F. Fama, “Random Walks in Stock Market Prices,” Financial Analysts Journal,
vol. 21, no. 5, pp. 55–59, 1965. Publisher: CFA Institute.

[19] M. Godfrey, C. Granger, and O. Morgenstern, “The Random Walk Hypothesis of


Stock Market Behavior,” Kyklos, vol. 17, pp. 1–30, May 2007.

[20] A. W. Lo and A. C. MacKinlay, “Stock Market Prices Do Not Follow Random Walks:
Evidence from a Simple Specification Test,” The Review of Financial Studies, vol. 1,
pp. 41–66, Jan. 1988. Publisher: Oxford Academic.

[21] P. Vorburger, Catching the drift: when regimes change over time. Dissertation,
University of Zurich, Zürich, 2009.

[22] F. Zhu, C. Nam, and D. M. Aguilar, “Market Regime Classification Using Correlation
Networks,” p. 25.

[23] C. McIndoe, “A Data Driven Approach to Market Regime Classification,” p. 41.

[24] D. Yang and Q. Zhang, “Drift-Independent Volatility Estimation Based on High, Low,
Open, and Close Prices,” The Journal of Business, vol. 73, no. 3, pp. 477–492, 2000.
Publisher: The University of Chicago Press.

[25] J. C. Schlimmer and R. H. Granger, “Incremental learning from noisy data,” Machine
Learning, vol. 1, pp. 317–354, Sept. 1986.

[26] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under Concept
Drift: A Review,” IEEE Transactions on Knowledge and Data Engineering, 2018.
arXiv: 2004.05785.

[27] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden
contexts,” Machine Learning, vol. 23, pp. 69–101, Apr. 1996.

[28] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, “When


Training and Test Sets Are Different: Characterizing Learning Transfer,” in Dataset
Shift in Machine Learning, pp. 3–28, MIT Press, 2009.

[29] V. Losing, B. Hammer, and H. Wersing, “KNN Classifier with Self Adjusting Memory
for Heterogeneous Concept Drift,” in 2016 IEEE 16th International Conference on
Data Mining (ICDM), pp. 291–300, Dec. 2016. ISSN: 2374-8486.

[30] Y. Radhika and M. Shashi, “Atmospheric Temperature Prediction using Support Vec-
tor Machines,” International Journal of Computer Theory and Engineering, pp. 55–58,
2009.

[31] L. I. Kuncheva, “Using Control Charts for Detecting Concept Change in Streaming
Data,” Bangor University, 2009.

[32] I. Katakis, G. Tsoumakas, and I. Vlahavas, “Tracking recurring contexts using ensem-
ble classifiers: an application to email filtering,” Knowledge and Information Systems,
vol. 22, pp. 371–391, Mar. 2010.

[33] M. Black and R. J. Hickey, “Maintaining the performance of a learned classifier under
concept drift,” Intelligent Data Analysis, vol. 3, pp. 453–474, Dec. 1999.

[34] A. Liu, Y. Song, G. Zhang, and J. Lu, “Regional Concept Drift Detection and Density
Synchronized Drift Adaptation,” pp. 2280–2286, 2017.

[35] N. Lu, G. Zhang, and J. Lu, “Concept drift detection via competence models,” Arti-
ficial Intelligence, vol. 209, pp. 11–28, Apr. 2014.

[36] T. Dasu, S. Krishnan, S. Venkatasubramanian, and K. Yi, “An Information-Theoretic


Approach to Detecting Changes in MultiDimensional Data Streams,” Interfaces, Jan.
2006.

[37] A. Pesaranghader, H. Viktor, and E. Paquet, “Reservoir of diverse adaptive learn-


ers and stacking fast hoeffding drift detection methods for evolving data streams,”
Machine Learning, vol. 107, pp. 1711–1743, Nov. 2018.

[38] R. Barros and S. Santos, “A Large-scale Comparison of Concept Drift Detectors,”


Information Sciences, vol. 451-452, July 2018.

[39] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, “Learning with Drift Detection,”
Intelligent Data Analysis, vol. 8, p. 295, Sept. 2004.

[40] Y. Sakamoto, K. Fukui, J. Gama, D. Nicklas, K. Moriyama, and M. Numao, “Concept


Drift Detection with Clustering via Statistical Change Detection Methods,” in 2015
Seventh International Conference on Knowledge and Systems Engineering (KSE),
pp. 37–42, Oct. 2015.

[41] A. Qahtan, B. Alharbi, s. Wang, and X. Zhang, A PCA-Based Change Detection


Framework for Multidimensional Data Streams. Aug. 2015.

[42] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Statistical change detection for multi-
dimensional data,” in Proceedings of the 13th ACM SIGKDD international conference
on Knowledge discovery and data mining - KDD ’07, (San Jose, California, USA),
p. 667, ACM Press, 2007.

[43] C. Alippi and M. Roveri, “Just-in-Time Adaptive Classifiers—Part I: Detecting Non-
stationary Changes,” IEEE Transactions on Neural Networks, vol. 19, pp. 1145–1153,
July 2008.

[44] C. Alippi, G. Boracchi, and M. Roveri, “Hierarchical Change-Detection Tests,” IEEE


Transactions on Neural Networks and Learning Systems, vol. 28, pp. 246–258, Feb.
2017.

[45] S. Xu and J. Wang, “Dynamic extreme learning machine for data stream classifica-
tion,” Neurocomputing, vol. 238, pp. 433–449, May 2017.

[46] M. Baena-García, J. del Campo-Ávila, R. Fidalgo-Merino, A. Bifet, R. Gavaldà, and
R. Morales-Bueno, “Early Drift Detection Method,” Jan. 2006.

[47] J. Gama and G. Castillo, “Learning with Local Drift Detection,” in Advanced Data
Mining and Applications (X. Li, O. R. Zaïane, and Z. Li, eds.), Lecture Notes in
Computer Science, (Berlin, Heidelberg), pp. 42–55, Springer, 2006.

[48] D. V. Hinkley, “Inference About the Change-Point in a Sequence of Random Vari-


ables,” Biometrika, vol. 57, no. 1, pp. 1–17, 1970. Publisher: Oxford University Press,
Biometrika Trust.

[49] W. Hoeffding, “Probability Inequalities for Sums of Bounded Random Variables,”


Journal of the American Statistical Association, vol. 58, no. 301, pp. 13–30, 1963.
Publisher: American Statistical Association, Taylor & Francis, Ltd.

[50] J. L. Elman, “Finding Structure in Time,” Cognitive Science, vol. 14, no. 2, pp. 179–
211, 1990. https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1402_1.

[51] L. Medsker and L. C. Jain, Recurrent Neural Networks: Design and Applications.
CRC Press, Dec. 1999.

[52] S. Kumar Chandar, “Grey Wolf optimization-Elman neural network model for stock
price prediction,” Soft Computing, vol. 25, pp. 649–658, Jan. 2021.

[53] S. Hochreiter and J. Schmidhuber, “Long Short-term Memory,” Neural computation,


vol. 9, Dec. 1997.

[54] K. Chen, Y. Zhou, and F. Dai, “A LSTM-based method for stock returns prediction:
A case study of China stock market,” in 2015 IEEE International Conference on Big
Data (Big Data), pp. 2823–2824, Oct. 2015.

[55] Q. Wang and Y. Hao, “ALSTM: An attention-based long short-term memory frame-
work for knowledge base reasoning,” Neurocomputing, vol. 399, pp. 342–351, July
2020.

[56] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and Harnessing Adversarial


Examples,” arXiv:1412.6572 [cs, stat], Mar. 2015.

[57] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial Machine Learning at Scale,”


arXiv:1611.01236 [cs, stat], Feb. 2017.

[58] Y. Xu and S. B. Cohen, “Stock Movement Prediction from Tweets and Historical
Prices,” in Proceedings of the 56th Annual Meeting of the Association for Computa-
tional Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 1970–1979,
Association for Computational Linguistics, July 2018.

[59] L. Zhang, C. Aggarwal, and G.-J. Qi, “Stock Price Prediction via Discovering Multi-
Frequency Trading Patterns,” in Proceedings of the 23rd ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining, (Halifax NS Canada),
pp. 2141–2149, ACM, Aug. 2017.

[60] H. R. Stoll and R. E. Whaley, “Stock Market Structure and Volatility,” The Review
of Financial Studies, vol. 3, pp. 56–58, Jan. 1990. Publisher: Oxford Academic.

[61] K. Pearson, “On the Criterion that a Given System of Deviations from the Probable
in the Case of a Correlated System of Variables is Such that it Can be Reasonably
Supposed to have Arisen from Random Sampling,” in Breakthroughs in Statistics:
Methodology and Distribution (S. Kotz and N. L. Johnson, eds.), Springer Series in
Statistics, pp. 11–28, New York, NY: Springer, 1992.

[62] “Kolmogorov–Smirnov Test,” in The Concise Encyclopedia of Statistics, pp. 283–287,


New York, NY: Springer, 2008.

[63] S. Kullback and R. A. Leibler, “On Information and Sufficiency,” The Annals of Math-
ematical Statistics, vol. 22, pp. 79–86, Mar. 1951. Publisher: Institute of Mathematical
Statistics.

[64] B. Bigi, “Using Kullback-Leibler Distance for Text Categorization,” in Advances in In-
formation Retrieval (F. Sebastiani, ed.), Lecture Notes in Computer Science, (Berlin,
Heidelberg), pp. 305–319, Springer, 2003.

[65] X. Zhang, G. Zou, and R. J. Carroll, “Model Averaging Based On Kullback-Leibler


Distance,” Statistica Sinica, vol. 25, pp. 1583–1598, 2015.

[66] International Organization for Standardization, ISO 5725-1: Accuracy (trueness and
precision) of measurement methods and results - Part 1: General principles and
definitions. Dec. 1994.

[67] B. W. Matthews, “Comparison of the predicted and observed secondary structure


of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA) - Protein Structure,
vol. 405, pp. 442–451, Oct. 1975.

[68] G. Gidofalvi and G. Gidófalvi, Using News Articles to Predict Stock Price Movements.
2001.

[69] Y. Kadwe and V. Suryawanshi, “A Review on Concept Drift,” IOSR Journal of Com-
puter Engineering, vol. 17, pp. 20–26, Jan. 2015.

Appendix A: Results Generated

Table 13: Method 2 results

splitDays ACC MCC


3 50.67 ±0.43421 0.0054 ±0.26232
4 48.49 ±0.43764 -0.0018 ±0.23445
5 52.18 ±0.38482 0.0276 ±0.33250
6 51.19 ±0.38985 0.0097 ±0.29094
7 49.36 ±0.34505 0.0106 ±0.34335
8 51.76 ±0.34632 0.0289 ±0.33609
9 50.79 ±0.32627 0.0186 ±0.36261
10 50.20 ±0.32547 0.0269 ±0.3483
11 48.90 ±0.30522 0.0250 ±0.37127
12 49.69 ±0.30256 0.0458 ±0.36601
13 51.42 ±0.28662 0.0557 ±0.37774
14 51.96 ±0.28764 0.0487 ±0.35923
15 50.65 ±0.27399 0.0496 ±0.35796
16 49.57 ±0.27403 0.0207 ±0.34731
17 49.32 ±0.25288 0.0284 ±0.34930
18 50.38 ±0.26594 0.0440 ±0.35871
19 49.54 ±0.24776 0.0554 ±0.34758
20 49.71 ±0.25176 0.0474 ±0.34180
21 49.22 ±0.24695 0.0399 ±0.34000
22 49.89 ±0.24364 0.0472 ±0.32335
23 51.92 ±0.22472 0.0769 ±0.34639
24 49.74 ±0.23030 0.0390 ±0.34461
25 51.59 ±0.23141 0.0578 ±0.37147
26 49.99 ±0.23558 0.0422 ±0.34290
27 48.97 ±0.23172 0.0437 ±0.31380
28 50.63 ±0.24025 0.0560 ±0.36346
29 51.20 ±0.23855 0.0551 ±0.31765
30 51.45 ±0.23848 0.0661 ±0.30661
Figure 4: Method 2 - ACC results

Figure 5: Method 2 - MCC results

Table 14: Method 3 results

splitDays ACC MCC


3 53.93 ±0.43625 0.0759 ±0.25465
4 51.65 ±0.43593 0.0504 ±0.22863
5 54.02 ±0.39041 0.0783 ±0.32280
6 50.04 ±0.39006 0.0531 ±0.29493
7 52.21 ±0.34783 0.0752 ±0.33585
8 53.00 ±0.35259 0.0856 ±0.35465
9 53.32 ±0.32218 0.1093 ±0.35916
10 52.49 ±0.32944 0.0937 ±0.36482
11 51.92 ±0.30692 0.0964 ±0.37266
12 51.90 ±0.30564 0.0873 ±0.38683
13 51.95 ±0.28728 0.0976 ±0.37550
14 52.55 ±0.27763 0.1016 ±0.36024
15 53.31 ±0.27392 0.0969 ±0.38118
16 52.16 ±0.28846 0.0680 ±0.37739
17 51.49 ±0.25934 0.0889 ±0.37988
18 52.99 ±0.25973 0.1025 ±0.37188
19 51.70 ±0.24841 0.0948 ±0.35902
20 52.30 ±0.25503 0.0778 ±0.35669
21 52.53 ±0.24521 0.0715 ±0.35517
22 51.56 ±0.24745 0.0768 ±0.36173
23 52.03 ±0.22386 0.0967 ±0.35602
24 52.09 ±0.22680 0.0982 ±0.35103
25 51.83 ±0.21934 0.1059 ±0.34742
26 52.48 ±0.23163 0.1141 ±0.35125
27 52.27 ±0.23963 0.1047 ±0.34501
28 52.53 ±0.23584 0.1000 ±0.34923
29 53.40 ±0.24259 0.0960 ±0.35134
30 52.80 ±0.14514 0.0758 ±0.19357
Figure 6: Method 3 - ACC results

Figure 7: Method 3 - MCC results

Table 15: Method 4 results
numberOfWaitDays ACC MCC
4 48.87 ±0.43528 0.0129 ±0.23240
5 54.18 ±0.11209 0.0747 ±0.09596
6 50.16 ±0.10471 0.0484 ±0.11371
7 51.10 ±0.11936 0.0701 ±0.12425
8 51.90 ±0.13321 0.0837 ±0.14677
9 52.72 ±0.11945 0.1224 ±0.16686
10 52.04 ±0.15327 0.0900 ±0.17826
11 51.82 ±0.13738 0.0950 ±0.15446
12 51.66 ±0.12921 0.0759 ±0.19130
13 51.95 ±0.13274 0.1001 ±0.18381
14 52.48 ±0.12512 0.1060 ±0.18142
15 50.36 ±0.27622 0.0430 ±0.37787
16 51.82 ±0.13452 0.0696 ±0.19358
17 51.47 ±0.13306 0.1005 ±0.18613
18 52.46 ±0.11953 0.1095 ±0.20192
19 51.12 ±0.12462 0.1003 ±0.21149
20 51.96 ±0.13277 0.0864 ±0.20957
21 52.29 ±0.13113 0.0768 ±0.22172
22 51.47 ±0.14147 0.0846 ±0.23246
23 51.19 ±0.14414 0.1005 ±0.22901
24 51.65 ±0.14512 0.1024 ±0.22646
25 51.61 ±0.14093 0.1081 ±0.22287
26 52.04 ±0.14360 0.1128 ±0.23226
27 52.03 ±0.14299 0.1079 ±0.22613
28 51.87 ±0.14683 0.0993 ±0.20915
29 52.77 ±0.14787 0.0969 ±0.22001
30 52.12 ±0.14753 0.0767 ±0.21490
31 52.62 ±0.14361 0.0970 ±0.20989
32 52.10 ±0.14308 0.0691 ±0.20438
33 51.65 ±0.14541 0.0753 ±0.19004
34 51.02 ±0.14363 0.0897 ±0.20730
35 52.22 ±0.13461 0.0890 ±0.20964
36 52.32 ±0.12594 0.0771 ±0.22118
37 51.68 ±0.13563 0.0690 ±0.22299
38 51.75 ±0.13135 0.0771 ±0.22690
39 51.96 ±0.12667 0.0644 ±0.21806
40 52.56 ±0.12824 0.0670 ±0.22549
Figure 8: Method 4 - ACC results

Figure 9: Method 4 - MCC results
