Faculty of ICT
University of Malta
April 2021
Submitted in partial fulfillment of the requirements for the degree of B.Sc. ICT in
Artificial Intelligence (Hons.)
Faculty of Information and Communication Technology
FACULTY/INSTITUTE/CENTRE/SCHOOL______________________
0490200L
Student’s I.D. /Code _____________________________
Charlton Sammut
Student’s Name & Surname _________________________________________________
________________________________________________________________________
9500
Word Count ___________
I hereby declare that I am the legitimate author of this Long Essay/Dissertation and that it is my
original work.
No portion of this work has been submitted in support of an application for another degree or
qualification of this or any other university or institution of higher education.
I hold the University of Malta harmless against any third party claims with regard to copyright
violation, breach of confidentiality, defamation and any other third party right infringement.
I declare that I have abided by the University’s Research Ethics Review Procedures.
28/06/2021
_____________________
Date
Abstract:
Due to recent advances made in the field of machine learning, much research has been conducted on applying machine learning models to the stock market. As stated by the efficient market hypothesis, the market is constantly fluctuating, and due to its dynamic nature, certain underlying concepts start to change over time. This phenomenon is known as concept drift. When concept drift occurs, the performance of machine learning models tends to suffer, sometimes drastically. This decline in performance occurs because the data distributions that were used to train the model are no longer in line with the current data distribution.
This dissertation contributes four retraining processes to help mitigate the problem of concept drift. A state-of-the-art Adv-ALSTM model is used together with an HDDMA concept drift detector. Every time the HDDMA concept drift detector detects a concept drift, the model undergoes one of the four possible retraining methods.
In the evaluation, the results of the vanilla model are compared to the results of the
models that are fitted with a concept drift detector. The conducted experiments highlight
the effectiveness of each of the proposed retraining methods, as well as how each of the
methods mitigates the negative effects of concept drift in different ways. The best observed
results were a 2.5% increase in accuracy and a 135.38% increase in MCC when compared
to the vanilla model. These results validate the effectiveness of the proposed retraining
methods, and highlight how important it is for a machine learning model to address concept
drift.
Acknowledgements:
I would like to thank and express my gratitude to my supervisors Dr Charlie Abela and
Dr Vincent Vella, who guided me throughout my work and shared their knowledge and
expertise in their respective fields. This dissertation would not have been possible without them. I would also like to thank my family and Petra for their continuous support throughout my academic journey thus far. Lastly, I would like to thank my close friends, along with the JEF Executive Board, for making my University experience a memorable one.
Contents
1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Summary of Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Methodology 15
3.1 Replicating the machine learning model . . . . . . . . . . . . . . . . . . . . 15
3.2 Selected dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Selecting the concept drift detector . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Attaching the concept drift detector to the model . . . . . . . . . . . . . . 17
3.4.1 Method 1: Handling recurring distributions . . . . . . . . . . . . . 17
3.4.2 Method 2: Periodic training every n days . . . . . . . . . . . . . . . 18
3.4.3 Method 3: Forgetting irrelevant data . . . . . . . . . . . . . . . . . 19
3.4.4 Method 4: Training on similar distributions . . . . . . . . . . . . . 20
3.4.5 Evaluation Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 O3: Attaching the concept drift detector to the model . . . . . . . . . . . . 28
4.3.1 Experiment 5 - Evaluation and results of Method 1 . . . . . . . . . 29
4.3.2 Experiment 6 - Evaluation and results of Method 2 . . . . . . . . . 29
4.3.3 Experiment 7 - Evaluation and results of Method 3 . . . . . . . . . 30
4.3.4 Experiment 8 - Evaluation and results of Method 4 . . . . . . . . . 30
4.3.5 Summary of the generated results . . . . . . . . . . . . . . . . . . . 31
5 Conclusions 32
5.1 Revisiting aims and objectives . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A: Results Generated 41
List of Figures
1 Types of concept drift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Flowcharts of the four proposed retraining methods . . . . . . . . . . . . . 22
3 HDDMA tested on the Google stock . . . . . . . . . . . . . . . . . . . . . . 28
4 Method 2 - ACC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Method 2 - MCC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6 Method 3 - ACC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7 Method 3 - MCC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8 Method 4 - ACC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
9 Method 4 - MCC results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
List of Tables
1 Ten different events that caused a concept drift in the stock market . . . . 17
2 KDD17 Replicated Results (RR) compared to Original Results (OR) . . . 25
3 StockNet Replicated Results (RR) compared to Original Results (OR) . . . 25
4 The different detectors tested on both datasets . . . . . . . . . . . . . . . . 26
5 The different detectors tested for type II errors . . . . . . . . . . . . . . . . 26
6 HDDMA tested on the StockNet dataset . . . . . . . . . . . . . . . . . . . 27
7 HDDMA tested on the KDD17 dataset . . . . . . . . . . . . . . . . . . . . 27
8 Method 1 results compared to vanilla model’s results . . . . . . . . . . . . 29
9 Method 1 & 2 results compared to vanilla model’s result . . . . . . . . . . 30
10 Method 3 results compared to vanilla model’s result . . . . . . . . . . . . . 30
11 Method 4 results compared to vanilla model’s result . . . . . . . . . . . . . 31
12 Best results generated by each method . . . . . . . . . . . . . . . . . . . . 31
13 Method 2 results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
14 Method 3 results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
15 Method 4 results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1 Introduction
Time series prediction is the process of forecasting future values from a temporally ordered sequence of data [1]. It is used in multiple real-world domains, including economic forecasting [2], sales forecasting [3] and network load forecasting [4]. This dissertation aims to investigate time series prediction in the finance domain, specifically in the stock market domain. It also tackles a particularly challenging problem related to time series prediction, that of concept drift [1, 5, 6, 7].
One of the difficulties that the problem of concept drift presents is that there are multiple types of concept drift, with each one presenting its own unique set of challenges (refer to Section 2.2). Furthermore, another ensuing difficulty is that one needs to be able to differentiate between concept drift and noise, as it is very common for the two to be confused [6, 7, 11].
1.2 Motivation
A large number of machine learning techniques are used as analytical tools by traders in order to make predictions on different financial instruments [12]. If a model has an accuracy of around 56%, then that model can be considered satisfactory [13]. However, an accuracy of 56% is not much better than a random guess (a 50% chance of being correct), and even then, an accuracy of 56% is very difficult to achieve, let alone maintain [13, 14].
Furthermore, past research has shown that an investor needs an accuracy of at least 66% to make more profit than they would by simply flipping a coin [15]. Due to these issues, investors do not rely solely on machine learning models. With this in mind, one of the main motivators behind this dissertation was to see how close we can get to the 66% accuracy threshold by using state-of-the-art technologies.
Another motivator behind this dissertation was to investigate how well state-of-the-art technologies handle the problem of concept drift, as it is a problem that negatively affects many machine learning models in different fields of interest.
• Objective 1 (O1): Replicate a state-of-the-art machine learning model that has been previously applied on a dataset containing stock market data.
• Objective 2 (O2): Test and evaluate various state-of-the-art concept drift detectors, and select the most suitable one.
• Objective 3 (O3): Use the selected concept drift detector together with the selected machine learning model and observe whether together they give better results when compared to the vanilla machine learning model.
In O1, a number of machine learning models that have previously given good results in the field of stock movement prediction will be examined. We intend to replicate a machine learning model that is both feasible to replicate and gives the best results on a variety of stocks. The model will be replicated with a high degree of precision, which will be verified by comparing the results generated by the replicated model with the results of the original model. In O2, we intend to test and evaluate various state-of-the-art concept drift detectors based on a set of criteria, such as the number of concept drifts detected on time and the number of concept drifts detected late. Then
in O3, the most suitable concept drift detector will be integrated with the model chosen
from O1. Each time the concept drift detector detects a concept drift, the model will
undergo a retraining process. During the evaluation, the results of the vanilla model will
be compared to the results of the model that was fitted with a concept drift detector. If
the model with the concept drift detector gives better results than the vanilla model, it
would indeed suggest that machine learning models that have been trained on a financial
dataset would benefit from having a concept drift detector attached to them.
1.5 Document Structure
The remainder of this dissertation is organised as follows. Section 2 discusses the stock market while also explaining different machine learning approaches that have previously been used for stock price prediction. It also defines concept drift and discusses a variety of past literature related to handling concept drift. Section 3 details how the experiments
used to address O1, O2 and O3 are going to be conducted. Section 4 shows and discusses
the results generated from these experiments. Lastly, Section 5 shows what future work
can be conducted, followed by a conclusion.
2 Background and Literature Review
2.1 The Stock Market
The stock market, also referred to as the equity market [12], is essentially an organized and regulated financial market where tradable financial assets, referred to as securities, can be bought and sold. Some examples of securities include stocks, bonds and shares [17]. The market has previously been described as following a random walk model [18, 19]. If this were the case, it would be practically impossible to make accurate predictions based on previous prices; the most accurate possible prediction for any future price would simply be the current price. There is, however, a considerable amount of evidence to suggest that this is not the case [20].
It is known that the stock market experiences regime shifts, also referred to as regime
drifts or regime changes [21, 22]. A market regime shows the current state of a given
stock/group of stocks. A regime shift occurs when a stock stops following its previously
observed pattern, for instance, when it suddenly increases or decreases in price. A multitude of reasons can cause a market regime shift, such as inflation, economic growth and a change
in the political environment [12, 21]. In recent years, a lot of research and effort has
gone into classifying, detecting and even forecasting regime shifts [22]. For one to make good predictions on stock price movement, regime shifts need to be accounted for. This dissertation addresses the issue of regime shifts from a data-driven perspective, by addressing the problem of concept drift [21, 22, 23]. The problem of making predictions
on stock information has been extensively studied by market analysts and investors, as well as researchers from various disciplines. This is both due to the difficulty of
the problem and also the potential financial gain that comes with solving the problem. A
stock has a number of key features, these being [12, 14, 24]:
• Open Price: The price of a stock at the start of a given trading day.
• Close Price: The price of a stock at the end of a given trading day.
• Adjusted Close Price: The price of a stock at the end of a given trading day, adjusted to account for corporate actions such as dividends and stock splits. It is often considered the true price of the stock.
• High: The highest price that a stock was selling for on a given trading day.
• Low: The lowest price that a stock was selling for on a given trading day.
• Volume: The number of shares of a stock traded during a given trading day.
One can make predictions on one or more of these features; however, models that make predictions on either the close price or the adjusted close price tend to be the most popular [12]. Predicting the exact price of a stock is believed to be infeasible [14]; however, predicting the movement of a stock, that is, whether its price will increase or decrease, is a much more achievable goal. Hence, this dissertation will mainly focus on those models that make predictions on stock price movement.
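Models that predict movement need a binary target derived from these features. A minimal sketch of how such labels can be derived from consecutive adjusted close prices (the price series below is purely illustrative):

```python
def movement_labels(adj_close):
    """Label each trading day as 1 (price rose) or 0 (price fell or was flat)
    relative to the previous day's adjusted close."""
    labels = []
    for prev, curr in zip(adj_close, adj_close[1:]):
        labels.append(1 if curr > prev else 0)
    return labels

# Illustrative adjusted close prices over six trading days.
prices = [101.2, 102.5, 102.5, 99.8, 100.4, 103.0]
print(movement_labels(prices))  # [1, 0, 0, 1, 1]
```

A flat day is treated as "no increase" here; whether ties count as up or down is a labelling choice that varies across the literature.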
• Sudden: This kind of concept drift occurs when the current data distribution is abruptly replaced by a different one (example: a stock market crash).
• Gradual: This kind of concept drift occurs when a new data distribution gradually replaces the current data distribution. This normally happens over a prolonged period of time. At the start of this concept drift, the new data distribution can easily be confused for noise, as the two distributions will be fluctuating, until eventually the new distribution overrides the current one (example: a change in season).
• Incremental: This kind of concept drift occurs when one data distribution incre-
mentally changes to another over a period of time (example: global warming).
• Recurring: This kind of concept drift follows a cyclic pattern, where the data
distribution shifts between a selected set of data distributions (example: the four
seasons of the year).
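To make the distinction between these drift types concrete, each can be simulated as a small synthetic stream; the sketch below uses Gaussian data whose means and change points are arbitrary illustrative choices:

```python
import random

def sudden(n, mean_a=0.0, mean_b=3.0):
    """Mean jumps abruptly at the midpoint of the stream."""
    return [random.gauss(mean_a if i < n // 2 else mean_b, 1.0) for i in range(n)]

def incremental(n, mean_a=0.0, mean_b=3.0):
    """Mean slides linearly from one concept to the other."""
    return [random.gauss(mean_a + (mean_b - mean_a) * i / n, 1.0) for i in range(n)]

def gradual(n, mean_a=0.0, mean_b=3.0):
    """New concept is sampled with increasing probability until it takes over."""
    return [random.gauss(mean_b if random.random() < i / n else mean_a, 1.0)
            for i in range(n)]

def recurring(n, means=(0.0, 3.0), period=50):
    """Concepts alternate cyclically, like seasons."""
    return [random.gauss(means[(i // period) % len(means)], 1.0) for i in range(n)]
```

Plotting these streams reproduces the characteristic shapes usually shown in concept drift surveys.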
A number of different techniques and frameworks have been developed to try and tackle
concept drift. Most of these developed frameworks can be clustered into one of the following
categories [5, 7]:
sudden concept drift. An example of such an algorithm is the window resize algorithm
for batch data [31].
All of the techniques mentioned in Section 2.3.1 rely on concept drift detectors to identify
when a concept drift has occurred. A concept drift detector quantifies concept drift by
identifying points in the data stream where a change in data distribution occurs [26]. In
general, most concept drift detectors tend to follow these four stages [26]:
1. Data Retrieval Stage: In this stage, chunks of data are collected from the data
stream. The size of each collected chunk depends on the concept drift detector.
2. Data Modeling Stage: In this stage, the collected data is modeled/converted into a structure that is more easily consumed by the concept drift detector. Not only does this make it easier to infer certain key features from the data, but it also makes reading the data faster [34]. During this stage, any redundant data is discarded.
3. Test Statistics Calculation Stage: This stage measures the amount (if any) of
dissimilarity that is present in the data. How this dissimilarity is measured depends
on the concept drift detector, as each concept drift detector makes use of different
statistical tests, with each test considering different aspects of the data. This stage tends to be the most challenging, as it is still up for debate how to define an accurate and robust dissimilarity measurement [26].
4. Hypothesis Test Stage: This stage makes use of the previously obtained dissimilarity score to determine a drift confidence score. This score determines whether the dissimilarity present in the data is caused by concept drift, noise or random sample selection bias [26, 35]. This is accomplished by making use of a hypothesis test that evaluates the statistical significance of the value calculated in the third stage. Some hypothesis tests that have been used in past literature include: the permutation test [35], bootstrapping [36] and Hoeffding's inequality-based bound identification [16].
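The four stages above can be condensed into a single processing loop. The sketch below is a deliberately simple toy detector, not any published method: it models each chunk by its mean and variance and uses a z-style comparison against a reference chunk as the hypothesis test (the chunk size and threshold are arbitrary choices):

```python
import math
from collections import deque

def detect_drift(stream, chunk_size=30, threshold=3.0):
    """Toy four-stage detector: retrieve chunks, model them as (mean, variance)
    summaries, compute a dissimilarity statistic, and hypothesis-test it."""
    reference = None
    drifts = []
    window = deque(maxlen=chunk_size)
    for i, x in enumerate(stream):
        window.append(x)                      # 1. data retrieval
        if len(window) < chunk_size:
            continue
        n = len(window)
        mean = sum(window) / n                # 2. data modeling: summary stats
        var = sum((v - mean) ** 2 for v in window) / n
        if reference is None:
            reference = (mean, var)           # first full chunk is the reference
            continue
        ref_mean, ref_var = reference
        pooled = math.sqrt((var + ref_var) / n + 1e-12)
        stat = abs(mean - ref_mean) / pooled  # 3. test statistic
        if stat > threshold:                  # 4. hypothesis test
            drifts.append(i)
            reference = None                  # re-learn reference from new concept
            window.clear()
    return drifts
```

On a stream whose mean jumps abruptly, this flags a single drift shortly after the change and then re-learns its reference from the new concept.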
Modern concept drift detectors have added a variety of features to the above framework in order to achieve more accurate predictions. With the vast number of concept drift detectors that are available, it would be infeasible to list all of them; however, they can be clustered into the following three groups [26]:
Error rate-based drift detection methods: These methods, also referred to as
sequential analysis drift detection methods [37], are a group of concept drift detectors that
keep track of the predictions made by a base model. These predictions are compared to
the real observed values, and using these values, the model’s error rate is calculated. This
process is repeated with each new data point. If the error rate is proven to have changed
by a statistically significant amount, the concept drift detector raises a flag to indicate it
has detected a concept drift [26, 37]. This group of concept drift detectors is the most popular, as its detectors tend to give the best results [26, 37, 38]. Some popular examples
of concept drift detectors that belong to this group include the Drift Detection Method
(DDM) [39] and the Page-Hinkley Test (PHT) [40] concept drift detectors.
Data Distribution-based Drift Detection: Concept drift detectors that fall under
this group of detectors make use of a distance metric to quantify the level of dissimilarity
between the previously observed distributions and the current distribution. If this calcu-
lated dissimilarity value is proven to be statistically significant, the detector raises a flag to
indicate that concept drift has occurred. These detectors come with a unique advantage,
as they generate certain key information about the drift, such as accurately finding the
time when the drift started to occur. However, these algorithms tend to be computationally expensive [26]. Popular examples of concept drift detectors that belong to this group
include: Principal Component Analysis based Change Detection framework (PCA-CD)[41]
and Statistical Change Detection for multidimensional data (SCD) [42].
Multiple Hypothesis Test Drift Detection: This group of concept drift detectors
apply multiple hypothesis/statistical tests. For the detector to identify a concept drift, all
of these hypothesis tests need to evaluate to true. Some detectors run tests in parallel, while others run them sequentially, meaning that for test t_n to start, test t_(n-1) needs to evaluate to true. This group tends not to give good results when compared to the other two groups [26, 38]. Some examples of concept drift detectors that belong to this group
include: Just-In-Time adaptive classifiers (JIT) [43] and Hierarchical Change-Detection
Tests (HCDTs) [44].
This dissertation will focus on the most popular and accurate concept drift detectors
and will therefore provide further details about detectors from the first group.
One of the most cited and used concept drift detectors is the DDM detector [26, 38, 39], proposed by Gama et al. in 2004 [39]. DDM was one of the first concept drift detection algorithms to implement an error rate-based drift detection approach [26, 39], and while its performance is perhaps not as good as that of more modern concept drift detectors [38], it is definitely a pioneer in the field of detecting concept drift. Not only did it inspire a multitude of other drift detection methods, such as Dynamic Extreme Learning Machine (DELM) [45], Early Drift Detection Method (EDDM) [46], Hoeffding's inequality based Drift Detection Method (HDDM) [16] and Learning with Local Drift Detection (LLDD)
[47], it is also the first concept drift detector that is able to raise both warning and drift
flags [26]. Prior to DDM, all concept drift detectors raised a single flag to indicate that a concept drift had occurred. With the addition of warning flags, a detector can raise a warning flag to indicate that a concept drift will probably occur soon, and a drift flag to indicate that a concept drift has just occurred. This allowed models to take not only a reactive approach but also a proactive one.
DDM works by first monitoring the online error rate. If this error rate increases by a statistically significant amount that reaches the warning level threshold, DDM raises a warning flag and starts using the incoming data points to train a new model. If the error rate continues to increase, reaching the drift level threshold, a drift flag is raised and the old model is replaced by the new one. This new model is then used to make future predictions, since the data it was trained on is more in line with the current distribution.
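DDM's rule can be stated compactly: with online error rate p_i and standard deviation s_i = sqrt(p_i(1 − p_i)/i), it records the minima p_min and s_min, raises a warning once p_i + s_i ≥ p_min + 2·s_min, and a drift once p_i + s_i ≥ p_min + 3·s_min [39]. A minimal sketch of just this rule (the 30-instance warm-up mirrors the original; model retraining and post-drift resetting are omitted):

```python
import math

class SimpleDDM:
    """Minimal sketch of the DDM rule: track the online error rate and flag
    warning/drift when it degrades 2/3 standard deviations past its best."""

    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed one prediction outcome (True = misclassified).
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s     # remember the best point seen
        if self.n < 30:                       # warm-up, as in the original DDM
            return "stable"
        if p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

Feeding it a stream whose error rate jumps from 10% to 100% produces a drift flag shortly after the jump.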
Around two years after the release of the DDM detector, the EDDM concept drift detector was proposed [46]. This detector quickly grew in popularity [38]. When compared to DDM, it was able to detect gradual concept drift with better accuracy, while still scoring well on sudden concept drift [38, 46]. EDDM was proposed by García et al., who argued that when no concept drift is present, the distance between any two consecutive misclassifications should be larger than the distance between two consecutive misclassifications when concept drift is present. EDDM uses this hypothesis to detect concept drift: if the distance between misclassifications starts to decrease by a statistically significant amount, then warning and drift flags are raised. The significance of the decrease in distance depends on certain threshold values that can be changed through the concept drift detector's parameters [46].
Another popular concept drift detector that has been extensively used in past literature
is the PHT concept drift detector [38, 40]. This detector works by measuring and keeping track of the distance between the current data distribution and a normal distribution, with the distance calculated using the Page-Hinkley test [48]. If this distance changes by a statistically significant amount, the concept drift detector raises a drift flag [40].
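The Page-Hinkley statistic at the heart of PHT accumulates the deviation of each observation from the running mean, minus a small tolerance δ, and flags a change once this cumulative sum rises more than a threshold λ above its historical minimum [48]. A sketch for detecting an upward change in the mean (the δ and λ defaults are illustrative choices):

```python
class SimplePageHinkley:
    """Minimal Page-Hinkley test for an upward change in the stream mean."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta      # tolerance for small fluctuations
        self.lam = lam          # detection threshold
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0          # cumulative deviation m_t
        self.cum_min = 0.0      # historical minimum M_t

    def update(self, x):
        """Feed one observation; returns True when drift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n    # running mean
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.lam
```

Larger λ values make the test slower but more robust to noise; a mirrored version of the same statistic detects downward changes.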
In 2018, a survey was conducted by Barros and Santos [38], whereby 14 popular state-
of-the-art concept drift detectors were tested on a large number of datasets. From the
survey it emerged that the HDDMA [16] detector gave the best results [38].
To understand how HDDMA works, the concept drift detector on which it is based, HDDM, first needs to be explained [16]. HDDM makes use of three different states: the STABLE state, which implies that there is currently no concept drift; the WARNING state, which implies that the probability of a concept drift occurring soon is high; and the DRIFT state, which implies that the probability that a concept drift has just occurred is 99% or more. When the WARNING state is entered, the detector raises a warning flag, and similarly, when the DRIFT state is entered, the detector raises a drift flag. The detector always starts in the STABLE state, and for the detector to transition to the WARNING state, the following condition needs to hold [16]:
P(µ − 2σ ≤ c ≤ µ + 2σ) ≈ α_W        (1)
where:
• c = the current observed value
• µ = the expected value
• σ = the standard deviation
• α_W = the significance level associated with the WARNING state
Equation 1 determines if the current observed value c is statistically far from the
expected value µ, and if it is, then the detector transitions to the WARNING state. Once
Equation 1 no longer evaluates to true, the detector transitions from the WARNING state
back to the STABLE state. For the detector to transition from the STABLE or WARNING state to the DRIFT state, the following equation needs to evaluate to true [16]:
P(µ − 3σ ≤ c ≤ µ + 3σ) ≈ α_D        (2)
where α_D = the significance level associated with the DRIFT state.
Equation 2 works very similarly to Equation 1, in the sense that it checks whether c is statistically far from the expected value µ. If Equation 2 holds, then the detector transitions to the DRIFT state, regardless of which state it is currently in.
Due to the nature of how HDDM detects concept drift, it tends not to detect sudden concept drift very quickly [16, 38]. This is not ideal, as models need to react as quickly as possible to sudden concept drift. To combat this, HDDMA enhances HDDM by applying the bounding moving Averages Test (A-Test) [16]. This is done by keeping track of the average value of the current distribution; if this average value suddenly changes by a statistically significant amount, the probability of the detector transitioning to the WARNING or DRIFT state is increased [16, 38].
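The Hoeffding inequality that gives HDDM and HDDMA their guarantees is distribution-free: the mean of n values bounded in [0, 1] deviates from its expectation by more than ε = sqrt(ln(1/α)/(2n)) with probability at most α [16]. The sketch below uses this bound in an A-Test-flavoured comparison of a baseline moving average against a recent one; the window split and α value are illustrative, not the published parameterisation:

```python
import math

def hoeffding_bound(n, alpha):
    """Maximum deviation of a sample mean of n values in [0, 1] that is
    consistent with the true mean at confidence 1 - alpha."""
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

def a_test(baseline, recent, alpha_drift=0.001):
    """Flag drift when the recent moving average differs from the baseline
    moving average by more than the combined Hoeffding bounds."""
    mean_b = sum(baseline) / len(baseline)
    mean_r = sum(recent) / len(recent)
    eps = hoeffding_bound(len(baseline), alpha_drift) + \
          hoeffding_bound(len(recent), alpha_drift)
    return abs(mean_r - mean_b) > eps
```

Because the bound makes no distributional assumptions, the same test applies whether the monitored stream holds error indicators or any other values rescaled to [0, 1].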
Another variant of HDDM has also been developed: Hoeffding's inequality based Drift Detection Method with Weighted moving averages (HDDMW). This detector works very similarly to HDDMA, but rather than applying the A-Test to HDDM, it applies the Weighted moving averages Test (W-Test). The W-Test also keeps track of the average value of the current distribution, but it gives more weight (importance) to recent data [16]; hence, the older the data, the less weight it is given. While HDDMW has been used in past literature, HDDMA tends to give better results in a variety of real-world situations [38].
2.4 Machine Learning for Stock Price Prediction
A survey conducted in 2020 investigated and compared a large number of machine learning
techniques that are used by traders as analytical tools in order to make predictions on
different financial instruments [12]. Some of these techniques include k-means clustering,
neural networks and genetic algorithms. In the survey, these various machine learning
models were applied to a number of datasets containing data about different financial
instruments. One common element that was observed from the results generated is that
neural networks gave good results throughout [12].
In O1, a suitable machine learning model needs to be selected and replicated. To complete this objective, a number of state-of-the-art neural networks are going to be analysed against a number of criteria. The first criterion is to ensure that, when compared to other models, the chosen machine learning model can generate good results. Secondly, the model should resemble as closely as possible the ideal effective learner previously defined in Section 2.2. Another important criterion is to ensure that the chosen model can be replicated with a high degree of precision. This is to ensure that the results generated by the chosen model fitted with a concept drift detector can be fairly compared with past literature.
Elman Neural Network (ENN): ENNs are a special type of Recurrent Neural Network (RNN) [50, 51]. ENNs use the output generated by their hidden layer as feedback, through the addition of a context/recurrent layer. This allows ENNs to learn the current data distribution better than other neural network models [50]. Despite being originally proposed in 1990, the ENN is still used in a variety of domains, as it gives good results even when compared to more modern models [52]. In 2021, grey wolf optimization was applied to an ENN by S. Kumar [52]. This optimised ENN model was then applied to a financial dataset. The ENN managed to outperform all of the other previously proposed models, achieving results that had not been seen in past literature. That being said, the dataset only contained eight different stocks, and the ENN model had a relatively high standard deviation, meaning it could not make consistent predictions across all eight stocks.
Long Short-Term Memory (LSTM): An LSTM neural network [53] is a type of RNN that is capable of learning long-term dependencies. This is accomplished by replacing the traditionally used artificial neurons in the hidden layer with memory cells. These memory cells are capable of remembering data deemed important for a long period of time. Through this, the model is capable of connecting past data to present data, hence partially solving the problem of long-term dependencies [53]. LSTM
neural networks are one of the most successful RNN architectures designed, and they have previously given good results on the problem of stock price movement prediction [14, 54]. Due to their great success, efforts have been made to try and improve upon LSTMs.
Attentive Long Short-Term Memory (ALSTM): As seen in [14, 55], an ALSTM
neural network further builds upon the previously mentioned LSTM neural network. This
is mainly accomplished by the addition of a temporal attention layer. This layer compresses
data at different time-steps into an overall representation with adaptive weights. The aim
of this layer is to use multiple compressed representations of data at the same time, giving
more importance to those with a higher weighting. As can be seen in [14], an ALSTM model outperforms its LSTM counterpart.
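The temporal attention layer reduces to three operations: score each time-step's hidden state, softmax the scores into adaptive weights, and return the weighted sum as the compressed representation. The sketch below uses a fixed dot-product score against a query vector for illustration; a real ALSTM learns the scoring parameters:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (max-shifted for stability)."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(hidden_states, query):
    """Compress per-time-step hidden states into one vector using
    dot-product attention weights against a query vector."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    rep = [sum(w * state[d] for w, state in zip(weights, hidden_states))
           for d in range(dim)]
    return rep, weights
```

Time-steps with higher scores dominate the compressed representation, which is exactly the "adaptive weights" behaviour described above.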
Adversarial Attentive Long Short-Term Memory (Adv-ALSTM): Through
the use of adversarial training, the Adv-ALSTM neural network further enhances the
previously discussed ALSTM neural network [56, 57]. Adversarial training is the process of adding malicious data to the training data. These malicious data points, referred to as adversarial examples, are generated by systematically copying and then transforming parts of the training data; the applied transformation is referred to as an adversarial perturbation [14]. Through this method, not only does the model have more data to work with, but it also becomes more robust to noise. As seen in [14], Adv-ALSTM significantly outperforms both its LSTM and ALSTM counterparts.
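The perturbation step itself is small: the adversarial example is the original representation shifted a step ε along the normalised gradient of the loss, in the style of [14]. In the sketch below the gradient is supplied directly, since obtaining it requires the full model and is omitted here:

```python
import math

def adversarial_example(features, gradient, epsilon=0.05):
    """Perturb a feature vector a step of size epsilon along the normalised
    loss gradient; a zero gradient leaves the vector unchanged."""
    norm = math.sqrt(sum(g * g for g in gradient)) or 1.0
    return [x + epsilon * g / norm for x, g in zip(features, gradient)]
```

During adversarial training, both the clean and the perturbed copies contribute to the loss, which is what makes the model more robust to small input fluctuations.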
Since the Adv-ALSTM model gave the best results when compared to the LSTM and ALSTM models [14], and taking into account that the ENN model was only applied to eight different stocks [52], whereas the Adv-ALSTM model was tested on over 130 different stocks [14], the Adv-ALSTM model was deemed the most suitable model to replicate.
3 Methodology
In this section, the methods used to achieve the three objectives mentioned in Section 1.3
are going to be discussed.
3.3 Selecting the concept drift detector
The aim of O2 is to find a suitable concept drift detector. To do this, two experiments
are going to be carried out. In the first experiment, a number of concept drift detectors are going to be tested and evaluated on the StockNet and KDD17 datasets. In the second
experiment, the same concept drift detectors are going to be tested on stock data from
periods of time in which concept drift occurred beyond any reasonable doubt, that is, those
periods in time where a stock experienced events such as a crash, rally or surge. The ideal
concept drift detector should be able to distinguish concept drift from noise and detect all
four types of concept drift. A total of four concept drift detectors are going to be tested:
EDDM [46], HDDMA [16], HDDMW [16] and PH [40]. These four specific
concept drift detectors have been applied in numerous real-world applications/domains
[38]. An explanation for each of these concept drift detectors can be found in Section
2.3.2.
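To make the mechanics of such a detector concrete, the following is a minimal sketch of the PH (Page-Hinkley) test [40]. Parameter values are illustrative, and real implementations (such as the one in scikit-multiflow used later) differ in detail.

```python
class PageHinkley:
    """Minimal Page-Hinkley drift detector: alarms when the cumulative
    deviation of the stream from its running mean exceeds a threshold."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                  # tolerated drift magnitude
        self.threshold = threshold          # alarm threshold (lambda)
        self.min_instances = min_instances  # warm-up period
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def add_element(self, x):
        """Feed one observation; return True if a drift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_instances and self.cum - self.cum_min > self.threshold:
            self.reset()                    # restart after an alarm
            return True
        return False
```

Feeding a stable stream followed by a shifted one causes the detector to flag a drift shortly after the shift, which is exactly the behaviour the two experiments below probe.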
In the first experiment, the concept drift detectors are going to be evaluated on the
number of concept drifts they detect in the StockNet and KDD17 datasets. This experi-
ment does not account for type I and type II errors. Type I errors, also referred to as false
positives, occur when the concept drift detector detects a concept drift, even though no
concept drift occurred. These types of errors are the most damaging when computational
resources are limited, as they force a retraining process when it is not needed, hence effec-
tively wasting computational resources. Type II errors, also referred to as false negatives,
are much more damaging to the performance of the model. These types of errors take
place when a concept drift occurs, and the concept drift detector does not raise a drift
flag. This leads to the model not adapting to this new distribution, hence decreasing the
model’s performance.
Since the cause for a concept drift can be hidden or unknown, it is not easy to check for
type I errors. However, it is possible to check for type II errors by testing a concept drift
detector during a period in time where concept drift occurred beyond any reasonable
doubt. To check for type II errors, in the second experiment, the concept drift detectors
are going to be tested on ten different events that led to the stock market experiencing a
concept drift. These ten events can be found in Table 1. They were chosen because
together they cover a broad range of well-known events.
Table 1: Ten different events that caused a concept drift in the stock market

Time Period     | Stock(s) affected            | Type          | Cause
March 1999      | Sony Group Corp.             | Rally         | Announcement of the PlayStation 2
June 2007       | Apple Inc.                   | Surge         | Release of the iPhone
September 2008  | Dow Jones Industrial Average | Crash         | Congress rejecting the bailout bill
May 2010        | Dow Jones Industrial Average | Crash         | Flash Crash of May 2010
August 2011     | S&P 500                      | Fall          | Fear of contagion of the European sovereign debt crisis
August 2015     | Dow Jones Industrial Average | Sell-off      | Multitude of reasons, such as Greece defaulting on its debt
February 2020   | S&P 500                      | Fall          | The COVID-19 pandemic
May 2020        | Tesla Inc.                   | Sudden Drop   | Elon Musk, CEO of Tesla, stating on Twitter that the Tesla stock price is too high³
September 2020  | NVIDIA Corp.                 | Surge         | Announcement of the Ray Tracing Texel eXtreme (RTX) 30 Series graphics processing unit
January 2021    | GameStop Corp.               | Short Squeeze | The WallStreetBets subreddit surging the price of the stock due to a multitude of hedge funds making short sells on the stock⁴
As was explained in Section 2.2, it is not uncommon for certain distributions to reappear. If
a recurring distribution was not present in the training set, then the model's performance
would drop every time the distribution is encountered. Method 1 aims to remedy this. In
Method 1, when a concept drift occurs, the model is trained on the previous data
distribution, so that, if that distribution reappears in the future, the model is able to
make better predictions on it. Figure 2 illustrates this method
and Algorithm 1 formalises it.
Originally, it was intended that the model would always train on the previously observed
data distribution; however, this would not account for instances of distributions that
contain only a small number of data points. Such instances are not sufficient for the
model to fully learn the distribution. To account for this, a threshold system was
set up (Algorithm 1 - line 12). This threshold system ensures that the model will resume
³ https://www.twitter.com/elonmusk/status/1256239815256797184
⁴ https://www.twitter.com/elonmusk/status/1256239815256797184
training on at least three months of data, meaning a total of 63 data points, as the stock
market is open for an average of 21 days per month [60]. This threshold system works
by checking if the distribution is larger than 63 data points, and if it is not, the model is
instead trained on the last 63 data points. The reasoning behind this threshold system is
that recent data tends to be similar to the current distribution [5, 7], and hence by using
this recent data the model will be able to better learn the distribution.
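The threshold logic described above can be sketched as follows (the function and variable names are illustrative, not taken from the actual implementation):

```python
MIN_POINTS = 63  # ~3 months of data: 21 trading days per month [60]

def select_training_window(previous_distribution, all_seen_points):
    """Pick the data to resume training on after a drift (Method 1).

    Use the previous distribution if it is large enough; otherwise
    fall back to the last MIN_POINTS observations, since recent data
    tends to resemble the current distribution.
    """
    if len(previous_distribution) >= MIN_POINTS:
        return previous_distribution
    return all_seen_points[-MIN_POINTS:]
```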
If the model has never been trained on a previous instance of the current distribution,
then its performance is bound to decrease. Method 2 aims to remedy this by training the
model on data that is in-line with, hence similar to, the current data distribution. Every time
a concept drift occurs, the model is periodically trained every n days. From this
we get the splitDays parameter. This parameter controls how many days the model waits
before training. Figure 2 illustrates this method and it is also formalised in Algorithm 2.
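Method 2 can be sketched as the following stream loop. The model and detector interfaces (`predict`/`train`, and an `add_element` that returns a boolean drift flag) are hypothetical stand-ins for the actual Adv-ALSTM and HDDMA objects:

```python
def run_method2(model, detector, stream, split_days=14):
    """Sketch of Method 2: once a drift is flagged, retrain the model
    on the most recent split_days points every split_days steps."""
    seen = []
    in_new_distribution = False
    days_since_train = 0
    for x in stream:
        model.predict(x)                 # test on the new point first
        seen.append(x)
        if detector.add_element(x):      # True when a drift is flagged
            in_new_distribution = True
            days_since_train = 0
        if in_new_distribution:
            days_since_train += 1
            if days_since_train >= split_days:
                model.train(seen[-split_days:])
                days_since_train = 0
```

Smaller `split_days` values retrain more often on less data; larger values retrain rarely on bigger batches, which is exactly the trade-off explored in Experiment 6.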
While in Method 2 the model is trained on the current distribution, it has also previously
been trained on data that is not in-line with the current distribution. This data is
now irrelevant and prevents the model from making accurate predictions. Method 3 aims
to remedy this by not only training the model on the current distribution, but also by
making the model forget data that does not belong to the current distribution. In this
method, when a concept drift occurs, the model first forgets all the out-of-sample data
it has previously learnt, and then periodically learns from the new data distribution.
Algorithm 3 formalises this method while Figure 2 illustrates it.
While Method 3 has the advantage of ensuring that all the out-of-sample data learnt follows
the current distribution, it also assumes that all previously encountered out-of-sample data
is not in-line with the current distribution. This assumption is most likely not completely
correct, as there probably exists some older data distribution that is at least slightly similar
to the current distribution. Method 4 aims to enhance Method 3 by first measuring the
distance between the current data distribution and previously encountered distributions.
Then the model is trained on the distribution that is closest to the current distribution.
Various metrics that measure the distance between two data distributions have been
proposed in past literature, such as the Chi-square distance [61] and the Kolmogorov–Smirnov
statistic [62]. Each of these metrics takes different properties of the distributions into
account. For this particular implementation, the Kullback–Leibler (KL) divergence
test [63] is used, as it has been used for similar applications in other real-world domains
[64, 65].
The KL divergence test calculates the relative entropy between two distributions, meaning
that it measures how much one distribution differs from another. The relative entropy
from distribution Q to distribution P is found by subtracting the entropy of P from the
cross entropy of P and Q. This is denoted as:

    D_KL(P||Q) = Σᵢ P(i) log( P(i) / Q(i) )    (4)

where:
• D_KL(P||Q) = the relative entropy from distribution Q to distribution P. The further
  away this value is from 0, the larger the relative entropy.
• P(i), Q(i) = the probabilities that distributions P and Q assign to the i-th event
• H(P) = the entropy of P, so that D_KL(P||Q) = H(P, Q) − H(P), with H(P, Q) the
  cross entropy of P and Q
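Equation (4), together with the selection of the most similar stored distribution, can be sketched in code, under the assumption that each distribution has first been binned into a probability vector over a common support:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(P||Q) for two discrete probability
    vectors; eps guards against taking the log of zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def closest_distribution(current, candidates):
    """Return the stored distribution whose relative entropy from the
    current one is closest to 0, i.e. the most similar past distribution."""
    return min(candidates, key=lambda c: abs(kl_divergence(current, c)))
```

Note that D_KL is not symmetric: the direction used here (from each candidate to the current distribution) matches the selection step of Method 4.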
contain two to four different distributions, with only one of them being somewhat similar
to the current distribution. Should this method be applied on a larger test-set, more than
one distribution can be picked from l. Algorithm 4 formalises this method and Figure 2
illustrates it.
Algorithm 4: Pseudocode for Method 4
1 Initialise: machine learning model: M
2 Initialise: concept drift detector: D
3 Initialise: list with potentially infinite length containing a stream of real values: data
4 Initialise: int: i, j = 0
5 Initialise: bool: newDistribution, learnt = False
6 Initialise: empty list: c, l
7 Split data into trainingData, validationData and testingData
8 Train model M on trainingData
9 Validate model M on validationData
10 Initialise: machine learning model: Mr = M
11 foreach element e ∈ testingData do
12 Test model M on element e
13 Add element e to concept detector D
14 Append e to list c
15 if D.driftDetected then
16 newDistribution = True
17 i, j = 0
18 M = Mr
19 Append a copy of c to l
20 Clear c
21 learnt = False
22 end
23 if newDistribution ∧ (i ≥ splitDays) then
24 Train model M on the last splitDays number of data points
25 i = 0
26 end
27 if newDistribution ∧ ¬learnt ∧ (j ≥ numberOfWaitDays) ∧ (l.length > 1) then
28 Calculate all the relative entropies from distribution c to each distribution
in l, and train model M on the distribution whose relative entropy
from distribution c is closest to 0.
29 j = 0
30 learnt = True
31 end
32 Increment i
33 Increment j
34 end
3.4.5 Evaluation Plan
To evaluate Algorithms 1-4, the ACC [66] and MCC [67] metrics are going to be used. These
metrics were chosen over other possible metrics as they were used in the paper that pro-
posed the chosen Adv-ALSTM model [14]. By making use of these two metrics, Algorithms
1-4 can be easily compared to the original Adv-ALSTM model.
ACC represents the proportion of samples that are correctly classified and is described
as the percentage of correctly made predictions:

    ACC = (TP + TN) / (TP + TN + FP + FN)    (5)

where:
• TP = number of true positives
• TN = number of true negatives
• FP = number of false positives
• FN = number of false negatives

While accuracy is a simple and elegant metric, it does not take class imbalance into account.
The MCC metric, on the other hand, accounts for such imbalance, as it treats the true values
and the predicted values as binary variables and computes their phi coefficient [67]:

    MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (6)

where TP, TN, FP and FN are as defined above.
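Both metrics can be computed directly from the confusion-matrix counts; a minimal sketch:

```python
import math

def acc(tp, tn, fp, fn):
    """Accuracy: proportion of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (phi coefficient of the true and
    predicted binary labels); returns 0 when any marginal count is zero,
    where the formula is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A perfect classifier gives MCC = 1, random guessing gives MCC ≈ 0, which is why MCC is more informative than ACC on the imbalanced up/down labels of stock movement data.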
4 Evaluation and Results
In this section, the results generated by the experiments detailed in Section 3 are going to
be discussed.⁵ All the code used to conduct the experiments was written in Python 3.6.⁶
Experiment 1 was conducted to ensure that the chosen Adv-ALSTM model was successfully
replicated. The replicated model was trained and tested on both the StockNet [58] and
the KDD17 [59] datasets.
In Table 2 and Table 3, the best results generated by the original Adv-ALSTM model,
denoted by OR, are compared to the best results generated by the replicated model, de-
noted by RR. The differences are very small and acceptable, with almost all differences
being less than 1%. This difference was expected, due to the small amount of randomness
involved with the model [14, 53]. O1 was therefore considered to be completed.
⁵ The code used to conduct the experiments can be found at: https://github.com/tm-26/Enhancing-stock-price-prediction-models-by-using-concept-drift-detectors
⁶ https://www.python.org/
4.2 O2: Selecting the concept drift detector
For O2 to be completed, a suitable concept drift detector needed to be selected and
implemented. This was done by comparing four state-of-the-art concept drift detectors.
As stated in Section 3.3, EDDM [46], HDDMA [16], HDDMW [16] and PH [40] were chosen.
These four concept drift detectors were implemented using the scikit-multiflow7 library. In
Experiment 2, these four concept drift detectors were tested on the StockNet [58] and the
KDD17 [59] datasets, while in Experiment 3 the four concept drift detectors were tested
on periods in time where a concept drift occurred beyond any reasonable doubt.
Experiment 2 was conducted in order to check the number of drift flags each concept drift
detector raises in the two chosen datasets.
From Table 4 it is evident that the HDDMA concept drift detector has the largest number
of detections. This was expected as the same has been observed in [37, 38].
4.2.2 Experiment 3 - Testing the concept drift detectors for type II errors
In Experiment 2, the detectors were solely tested on the number of drifts that they can
detect. This however does not account for type I and type II errors. Therefore, Experiment
3 was conducted to test for type II errors: the concept drift detectors were tested on ten
different periods in time where concept drift occurred beyond any reasonable doubt. A
list of these periods in time can be found in Section 3.3.
⁷ https://scikit-multiflow.github.io/
The concept drift detectors were evaluated on three criteria: the number of drifts they
managed to detect in time, the number of drifts they detected late and the number of
drifts they did not detect. If a drift was detected within 31 days of the event, it was
considered to be detected in time; otherwise, it was considered late. As can be seen in Table
5, the HDDMA concept drift detector once again outperformed the other concept drift
detectors. It was the only concept drift detector to not miss any of the drifts, and also the
only detector to make detections in time. This was to be expected since HDDMA is known
to detect certain concept drifts while they are happening [16, 38].
From Experiment 2 and Experiment 3, it was clear that HDDMA is the most suitable
concept drift detector.
The aim of Experiment 4 is to find the optimal parameter values for the HDDMA concept
drift detector. As discussed in Section 2.3.3, the most important parameter that HDDMA
makes use of is the drift confidence parameter. This parameter controls how high the
probability of a drift occurring needs to be in order for the concept drift detector to return
an alert. By default this is set to 99%. Different values of the drift confidence parameter
were tested.
Table 6: HDDMA tested on the StockNet dataset

Drift Confidence | Average Number of Drifts Per Stock | Total Number of Drifts
0.999            | 73                                 | 6383
0.99             | 83                                 | 7294
0.9              | 102                                | 8934
0.8              | 111                                | 9731
As was expected, the lower the drift confidence parameter the more drifts are detected.
Since retraining is a computationally taxing process, it was decided to keep the default
value of 0.99. Setting the parameter any lower would result in the model retraining too
often, and setting the parameter any higher would lead to not enough drifts being detected.
The chart in Figure 3 illustrates the number of drifts detected in a popular stock found in
the StockNet dataset. A change in colour signifies that a concept drift has occurred.
could not be properly tested on the StockNet dataset. Hence, it was decided to only
evaluate the retraining processes on the KDD17 dataset. All of the results shown in this
section indicate the mean and standard deviation of all the results obtained across the 50
stocks present in the KDD17 dataset.
In Method 1, every time a concept drift occurs, the model was trained on the previously
observed data distribution. This was done with the assumption that the model would be
better suited to handle recurring distributions.
As can be seen from Table 8, when compared to the vanilla model, Method 1 produced
worse results. At first, it was hypothesised that the model was overfitting and was not
able to generalise enough to make accurate predictions. While this may hold some truth,
another, perhaps more accurate, hypothesis is that the model was being trained on
data that is not in-line with (not similar to) the oncoming data distribution. Since this data
is very different from the current distribution, it confuses the model, leading to worse
results than those of the vanilla model. To test this hypothesis, in Experiments 6-8, the
model is only trained on data that is in-line with, hence similar to, the current distribution.
In Method 2, every time a concept drift occurred, the model was periodically trained every
splitDays number of days. In this case the assumption was that the model would benefit
from data that is in-line with the current distribution. The splitDays parameter was tested
with values ranging from 3 to 30. Table 9 shows the best results achieved from Method 2.
The complete set of results can be found in Appendix A.
Table 9: Method 1 & 2 results compared to the vanilla model's results

Model                   | ACC            | MCC             | VM ACC %Difference | VM MCC %Difference
Vanilla Model (VM)      | 52.86          | 0.0520          | 0                  | 0
Method 1                | 51.29 ±0.11079 | 0.0286 ±0.21426 | -2.97%             | -45.0%
Method 2 (splitDays=14) | 51.96 ±0.28763 | 0.0487 ±0.35922 | -1.70%             | -6.35%
Method 2 (splitDays=23) | 51.92 ±0.22472 | 0.0769 ±0.34640 | -1.78%             | +47.88%
From Table 9 it can be observed that Method 2 gave a better MCC result; however, there
was still no improvement in the ACC result. Nonetheless, an improvement over Method 1
was observed, due to the model being constantly fed recent data
that was in-line with the data distribution it was being tested on.
Method 3 works in a similar manner to Method 2; however, this time, when a concept drift
occurs, all the previously learnt out-of-sample data is forgotten. Afterwards, the model
continues exactly like Method 2, periodically learning every splitDays number of days.
The splitDays parameter was once again tested with values ranging from 3 to 30. Table 10
shows the best results achieved from Method 3. The complete set of results can be found
in Appendix A.
As can be seen from Table 10, Method 3 produced better ACC and MCC results when
compared to those of the vanilla model. From these results, and the results generated by
Method 2, it was observed that increasing the percentage of data learnt by the model that
is in-line with the current distribution should lead to an increase in results.
Method 4 increases the amount of in-line data that the model has been trained on by
comparing the current distribution to previously encountered distributions. The model is
then trained on the distribution that is most similar to the current one. The KL divergence
test is used in order to calculate the similarity between two given distributions. The
numberOfWaitDays parameter, which controls how many days the model waits before
comparing the current distribution to previous distributions, was tested with values ranging
from 4 to 40. The complete set of results can be found in Appendix A.
As can be seen from Table 11, Method 4 generated a better ACC and MCC result
when compared to those of the vanilla model. This was expected due to the high amount
of in-line data that the model was trained on. The results themselves are very promising,
with an ACC improvement of 2.5% and an MCC improvement of 135.38%. While the MCC
improved by quite a substantial amount, its standard deviation is also quite high, meaning
that a high MCC score was not achieved across all of the different stocks.
Table 12 summarises the best results generated by each method. There is a clear trend:
the larger the percentage of in-line data known by the model, the better the results
obtained. This is best shown by Method 1 and Method 4. In Method 1, the model
was fed data that did not follow the current distribution, and hence it was the only model
to achieve substantially worse results when compared to the vanilla model. On the other
hand, in Method 4, the model is trained on both data from the current distribution, as
well as data from a previous distribution similar to the current distribution. This led to
Method 4 generating the best results.
5 Conclusions
5.1 Revisiting aims and objectives
This dissertation sought to answer the research question “Can a machine learning model
that has been trained on a financial dataset get better results if it undergoes a retraining
process every time a concept drift has been detected?”. As stated in Section 1.3, to answer
this research question, O1 to O3 needed to be addressed.
In O1, a state-of-the-art machine learning model that was previously applied on a
dataset that contains stock market data was replicated. Through the research conducted,
the Adv-ALSTM model found in [14] was chosen to be the most suitable model, and
Experiment 1 was conducted in order to see if it was possible to replicate the selected
model. The observed differences in results were negligible and hence O1 was considered
to be addressed.
O2 required that a state-of-the-art concept drift detector be selected and implemented.
Through past literature and the comparisons made in Experiments 2 and 3, the HDDMA
concept drift detector was deemed to be the most suitable [16]. Experiment 4 was then
conducted in order to find an optimal drift confidence parameter. After these three exper-
iments were conducted, O2 was considered to be addressed.
O3 required that HDDMA be used together with the chosen Adv-ALSTM model. In
total, four retraining methods were proposed. Experiments 5 to 8 tested each of these
methods. The results generated showed that the model benefited most from the fourth
method, as a 2.5% increase in ACC and a 135.38% increase in MCC were observed.
From this increase in results, it is clear that the chosen model benefited from the
attached concept drift detector; hence, the answer to the research question is yes: the
model does indeed benefit from an attached concept drift detector.
is developed and the number of drifts detected in time increases, one would expect an
increase in performance. Since news and social media outlets affect stock prices, it would
be interesting to make use of such data to try to improve concept drift detection. Data
mining and web scraping techniques could be used to see if any stock is currently being
discussed on such platforms, similar to the work conducted in [58, 68]. Secondly, the
proposed retraining methods can be applied to other stock price prediction models in
order to observe whether all models benefit equally from them.
In addition to this, while this dissertation addresses the problem of concept drift in the
stock market domain, the proposed retraining techniques can be applied in other domains
where concept drift is also a major issue, such as the biomedical domain and the weather
prediction domain [69]. This should be done in order to see if this approach can be gener-
alised. For such a task to be carried out, one would need to first identify and understand
how the causes of concept drift differ from one domain to another. This needs to be done
in order to find the best way to detect concept drift in the different domains.
References
[1] M. Harries and K. Horn, “Detecting concept drift in financial time series prediction
using symbolic machine learning,” in AI-CONFERENCE-, pp. 91–98, Citeseer, 1995.
[2] Z. Xiang-rong, H. Long-ying, and W. Zhi-sheng, “Multiple kernel support vector re-
gression for economic forecasting,” in 2010 International Conference on Management
Science Engineering 17th Annual Conference Proceedings, pp. 129–134, Nov. 2010.
ISSN: 2155-1855.
[4] S. Jung, C. Kim, and Y. Chung, “A Prediction Method of Network Traffic Using
Time Series Models,” in Computational Science and Its Applications - ICCSA 2006
(M. Gavrilova, O. Gervasi, V. Kumar, C. J. K. Tan, D. Taniar, A. Laganá, Y. Mun,
and H. Choo, eds.), Lecture Notes in Computer Science, (Berlin, Heidelberg), pp. 234–
243, Springer, 2006.
[5] P. Lindstrom, “Handling Concept Drift in the Context of Expensive Labels,” Doctoral,
Sept. 2013.
[7] A. Tsymbal, “The Problem of Concept Drift: Definitions and Related Work,” Com-
puter Science Department, Trinity College Dublin, May 2004.
[8] M. Sewell, “History of the efficient market hypothesis,” Rn, vol. 11, no. 04, p. 04,
2011.
[9] J. Kinlay and D. Rico, “Can Machine Learning Techniques Be Used To Predict Market
Direction? - The 1,000,000 Model Test,”
[10] The Editors of Encyclopaedia Britannica, “S&P 500,” Encyclopedia Britannica, Apr.
2019.
[11] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on
concept drift adaptation,” ACM Computing Surveys, vol. 46, pp. 44:1–44:37, Mar.
2014.
[13] B. Qian and K. Rasheed, “Stock market prediction with multiple classifiers,” Applied
Intelligence, vol. 26, pp. 25–33, Feb. 2007.
[14] F. Feng, H. Chen, X. He, J. Ding, M. Sun, and T.-S. Chua, “Enhancing Stock Move-
ment Prediction with Adversarial Training,” in Proceedings of the Twenty-Eighth In-
ternational Joint Conference on Artificial Intelligence, (Macao, China), pp. 5843–
5849, International Joint Conferences on Artificial Intelligence Organization, Aug.
2019.
[15] R. J. Bauer and J. R. Dahlquist, “Market Timing and Roulette Wheels Revisited,”
Financial Analysts Journal, 2012.
[18] E. F. Fama, “Random Walks in Stock Market Prices,” Financial Analysts Journal,
vol. 21, no. 5, pp. 55–59, 1965. Publisher: CFA Institute.
[20] A. W. Lo and A. C. MacKinlay, “Stock Market Prices Do Not Follow Random Walks:
Evidence from a Simple Specification Test,” The Review of Financial Studies, vol. 1,
pp. 41–66, Jan. 1988. Publisher: Oxford Academic.
[21] P. Vorburger, Catching the drift : when regimes change over time. Dissertation,
University of Zurich, Zürich, 2009. Publication Title: Vorburger, Peter. Catching
the drift : when regimes change over time. 2009, University of Zurich, Faculty of
Economics.
[22] F. Zhu, C. Nam, and D. M. Aguilar, “Market Regime Classification Using Correlation
Networks,” p. 25.
[24] D. Yang and Q. Zhang, “Drift-Independent Volatility Estimation Based on High, Low,
Open, and Close Prices,” The Journal of Business, vol. 73, no. 3, pp. 477–492, 2000.
Publisher: The University of Chicago Press.
[25] J. C. Schlimmer and R. H. Granger, “Incremental learning from noisy data,” Machine
Learning, vol. 1, pp. 317–354, Sept. 1986.
[26] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under Concept
Drift: A Review,” IEEE Transactions on Knowledge and Data Engineering, 2018.
arXiv: 2004.05785.
[27] G. Widmer and M. Kubat, “Learning in the presence of concept drift and hidden
contexts,” Machine Learning, vol. 23, pp. 69–101, Apr. 1996.
[29] V. Losing, B. Hammer, and H. Wersing, “KNN Classifier with Self Adjusting Memory
for Heterogeneous Concept Drift,” in 2016 IEEE 16th International Conference on
Data Mining (ICDM), pp. 291–300, Dec. 2016. ISSN: 2374-8486.
[30] Y. Radhika and M. Shashi, “Atmospheric Temperature Prediction using Support Vec-
tor Machines,” International Journal of Computer Theory and Engineering, pp. 55–58,
2009.
[31] L. I. Kuncheva, “Using Control Charts for Detecting Concept Change in Streaming
Data,” Bangor University, 2009.
[32] I. Katakis, G. Tsoumakas, and I. Vlahavas, “Tracking recurring contexts using ensem-
ble classifiers: an application to email filtering,” Knowledge and Information Systems,
vol. 22, pp. 371–391, Mar. 2010.
[33] M. Black and R. J. Hickey, “Maintaining the performance of a learned classifier under
concept drift,” Intelligent Data Analysis, vol. 3, pp. 453–474, Dec. 1999.
[34] A. Liu, Y. Song, G. Zhang, and J. Lu, “Regional Concept Drift Detection and Density
Synchronized Drift Adaptation,” pp. 2280–2286, 2017.
[35] N. Lu, G. Zhang, and J. Lu, “Concept drift detection via competence models,” Arti-
ficial Intelligence, vol. 209, pp. 11–28, Apr. 2014.
[39] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, Learning with Drift Detection,
vol. 8. Sept. 2004. Journal Abbreviation: Intelligent Data Analysis Pages: 295 Pub-
lication Title: Intelligent Data Analysis.
[42] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Statistical change detection for multi-
dimensional data,” in Proceedings of the 13th ACM SIGKDD international conference
on Knowledge discovery and data mining - KDD ’07, (San Jose, California, USA),
p. 667, ACM Press, 2007.
[43] C. Alippi and M. Roveri, “Just-in-Time Adaptive Classifiers—Part I: Detecting Non-
stationary Changes,” IEEE Transactions on Neural Networks, vol. 19, pp. 1145–1153,
July 2008. Conference Name: IEEE Transactions on Neural Networks.
[45] S. Xu and J. Wang, “Dynamic extreme learning machine for data stream classifica-
tion,” Neurocomputing, vol. 238, pp. 433–449, May 2017.
[47] J. Gama and G. Castillo, “Learning with Local Drift Detection,” in Advanced Data
Mining and Applications (X. Li, O. R. Zaı̈ane, and Z. Li, eds.), Lecture Notes in
Computer Science, (Berlin, Heidelberg), pp. 42–55, Springer, 2006.
[50] J. L. Elman, “Finding Structure in Time,” Cognitive Science, vol. 14, no. 2, pp. 179–
211, 1990. https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1402_1.
[51] L. Medsker and L. C. Jain, Recurrent Neural Networks: Design and Applications.
CRC Press, Dec. 1999. Google-Books-ID: ME1SAkN0PyMC.
[52] S. Kumar Chandar, “Grey Wolf optimization-Elman neural network model for stock
price prediction,” Soft Computing, vol. 25, pp. 649–658, Jan. 2021.
[54] K. Chen, Y. Zhou, and F. Dai, “A LSTM-based method for stock returns prediction:
A case study of China stock market,” in 2015 IEEE International Conference on Big
Data (Big Data), pp. 2823–2824, Oct. 2015.
[55] Q. Wang and Y. Hao, “ALSTM: An attention-based long short-term memory frame-
work for knowledge base reasoning,” Neurocomputing, vol. 399, pp. 342–351, July
2020.
[58] Y. Xu and S. B. Cohen, “Stock Movement Prediction from Tweets and Historical
Prices,” in Proceedings of the 56th Annual Meeting of the Association for Computa-
tional Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 1970–1979,
Association for Computational Linguistics, July 2018.
[59] L. Zhang, C. Aggarwal, and G.-J. Qi, “Stock Price Prediction via Discovering Multi-
Frequency Trading Patterns,” in Proceedings of the 23rd ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining, (Halifax NS Canada),
pp. 2141–2149, ACM, Aug. 2017.
[60] H. R. Stoll and R. E. Whaley, “Stock Market Structure and Volatility,” The Review
of Financial Studies, vol. 3, pp. 56–58, Jan. 1990. Publisher: Oxford Academic.
[61] K. Pearson, “On the Criterion that a Given System of Deviations from the Probable
in the Case of a Correlated System of Variables is Such that it Can be Reasonably
Supposed to have Arisen from Random Sampling,” in Breakthroughs in Statistics:
Methodology and Distribution (S. Kotz and N. L. Johnson, eds.), Springer Series in
Statistics, pp. 11–28, New York, NY: Springer, 1992.
[63] S. Kullback and R. A. Leibler, “On Information and Sufficiency,” The Annals of Math-
ematical Statistics, vol. 22, pp. 79–86, Mar. 1951. Publisher: Institute of Mathematical
Statistics.
[64] B. Bigi, “Using Kullback-Leibler Distance for Text Categorization,” in Advances in In-
formation Retrieval (F. Sebastiani, ed.), Lecture Notes in Computer Science, (Berlin,
Heidelberg), pp. 305–319, Springer, 2003.
[66] International Organization for Standardization, ISO 5725-1: Accuracy (trueness and
precision) of measurement methods and results - Part 1: General principles and definitions.
Dec. 1994.
[68] G. Gidofalvi and G. Gidófalvi, Using News Articles to Predict Stock Price Movements.
2001.
[69] Y. Kadwe and V. Suryawanshi, “A Review on Concept Drift,” IOSR Journal of Com-
puter Engineering, vol. 17, pp. 20–26, Jan. 2015.
Appendix A: Results Generated
Table 14: Method 3 results
Table 15: Method 4 results
numberOfWaitDays ACC MCC
4 48.87 ±0.43528 0.0129 ±0.23240
5 54.18 ±0.11209 0.0747 ±0.09596
6 50.16 ±0.10471 0.0484 ±0.11371
7 51.10 ±0.11936 0.0701 ±0.12425
8 51.90 ±0.13321 0.0837 ±0.14677
9 52.72 ±0.11945 0.1224 ±0.16686
10 52.04 ±0.15327 0.0900 ±0.17826
11 51.82 ±0.13738 0.0950 ±0.15446
12 51.66 ±0.12921 0.0759 ±0.19130
13 51.95 ±0.13274 0.1001 ±0.18381
14 52.48 ±0.12512 0.1060 ±0.18142
15 50.36 ±0.27622 0.0430 ±0.37787
16 51.82 ±0.13452 0.0696 ±0.19358
17 51.47 ±0.13306 0.1005 ±0.18613
18 52.46 ±0.11953 0.1095 ±0.20192
19 51.12 ±0.12462 0.1003 ±0.21149
20 51.96 ±0.13277 0.0864 ±0.20957
21 52.29 ±0.13113 0.0768 ±0.22172
22 51.47 ±0.14147 0.0846 ±0.23246
23 51.19 ±0.14414 0.1005 ±0.22901
24 51.65 ±0.14512 0.1024 ±0.22646
25 51.61 ±0.14093 0.1081 ±0.22287
26 52.04 ±0.14360 0.1128 ±0.23226
27 52.03 ±0.14299 0.1079 ±0.22613
28 51.87 ±0.14683 0.0993 ±0.20915
29 52.77 ±0.14787 0.0969 ±0.22001
30 52.12 ±0.14753 0.0767 ±0.21490
31 52.62 ±0.14361 0.0970 ±0.20989
32 52.10 ±0.14308 0.0691 ±0.20438
33 51.65 ±0.14541 0.0753 ±0.19004
34 51.02 ±0.14363 0.0897 ±0.20730
35 52.22 ±0.13461 0.0890 ±0.20964
36 52.32 ±0.12594 0.0771 ±0.22118
37 51.68 ±0.13563 0.0690 ±0.22299
38 51.75 ±0.13135 0.0771 ±0.22690
39 51.96 ±0.12667 0.0644 ±0.21806
40 52.56 ±0.12824 0.0670 ±0.22549
Figure 8: Method 4 - ACC results