The Impact of Twitter and News Count Variables on Stock Price Prediction via Neural Networks

By

Shamir Rizvi

A thesis in the program of
Author’s Declaration
I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research.

I further authorize Ryerson University to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.
Abstract
The Impact of Twitter and News Count Variables on Stock Price Prediction via Neural Networks
Shamir Rizvi
This study examines how Twitter and News Count variables generated by Bloomberg L.P., when utilized as inputs, impact the stock price prediction accuracy of two distinct neural network types: Multi-Layer Perceptron neural networks and Long Short-Term Memory neural networks. In addition, all models were tested on two distinct periods, one without any market panic and the other including a prolonged period of market panic. The results suggest that the inclusion of Twitter and News Count variables significantly improves Multi-Layer Perceptron networks, but no significant improvement occurred for Long Short-Term Memory networks. Regarding periods of panic and no panic, the inclusion of the variables improved stock price prediction.
Acknowledgments
I would like to thank my supervisor Dr. Hossein Zolfagharinia for guiding and mentoring me in
countless ways throughout my time at Ryerson University. I would also like to thank Dr. Amin
and Dr. Kalu, who kindly agreed to be part of my examining committee. I also extend my
appreciation to my parents Kamal Rizvi and Azra Rizvi, for their constant support and the example
of hard work and determination they set for me on a daily basis. I would not be where I am today
without them.
Table of Contents

Author's Declaration
Abstract
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
2.1. Stock Price Prediction Models
2.1.1. The K Nearest Neighbor (KNN) Algorithm
2.1.2. Random Forests
2.1.3. Fuzzy
2.1.4. ARIMA
2.1.5. Regression
2.1.6. Bayes
2.1.7. Principal Component Analysis (PCA)
2.1.8. Candlestick Analysis/K-Line
2.1.9. Hidden Markov Model (HMM)
2.1.10. Geometric Brownian Motion
2.1.11. Support Vector Machines
2.1.12. Support Vector Regression
2.1.13. Artificial Neural Networks
Chapter 3 Problem Definition
Chapter 4 Choice of Solution Method
4.1. Neural Network Selection
4.1.1. Multiple Layer Perceptron (MLP) Networks
4.1.2. Long Short-Term Memory (LSTM) Networks
Chapter 5 Designing Method
5.1. Neural Network Design
5.2. Data Selection
5.2.1. Stock Selection
5.2.2. Variable Selection
5.3. Final Experiment
Chapter 6 Results & Discussion
6.1. The Impact of T Variables vs the T+ Variables
6.2. Examining the Predictive Models under Panic and Normal Circumstances
6.3. Comparing the MLP and LSTM Methods
6.4. Comparing Walmart vs Amazon and Tesla vs Ford
Chapter 7 Conclusions & Managerial Insights
Appendices
References

List of Tables

Table 1. A summary of the related literature
Table 2. Types of neural networks found in recent stock price prediction literature
Table 3. Indicator selection
Table 4. Variable definitions
Table 5. RMSE percentage change when utilizing T vs T+ variables
Table 6. Average RMSE improvement across all configurations
Table 7. Mean Twitter Count to News Publication Count ratio
Table 8. Variable coefficient of variation change between training and test sets
Table 9. Mean Twitter & News Count in training data, normal test set, & panic test set
Table 10. T+ variables mean & coefficient of variation in panic test set and training
Table 11. RMSE analysis of T vs T+ and panic test set vs normal test set

List of Figures

Figure 1. Inner workings of LSTM neuron
Chapter 1
Introduction
The massive and ever-increasing role that the stock market plays on a societal level is evidenced by the fact that global total stock market capitalization exceeded US $80 trillion as of 2017, more than 300 percent growth from the 2009 global total of US $25 trillion, with Goldman Sachs estimating a steady rise to over US $100 trillion (Edwards, 2017).
Stock trading has a significant impact on the United States economy. The total value of stocks traded on U.S. markets has been increasing since 2013 and has consistently stayed above 200% of the United States' total annual GDP (World Bank, 2017). The New York Stock Exchange alone had an average market capitalization of approximately US $30 trillion (New York Stock Exchange, 2018). Furthermore, the New York Stock Exchange averages US $169 billion in stocks traded daily (FXCM, 2016). The movement of stocks plays a key role in determining
financial market health. On a macro level, stock market price booms have been shown to drive broader economic activity as well.
Related to individual traders, another reason to work on stock price prediction is the increase in first-time and novice investors actively buying and selling stocks. Over 54% of all U.S. adults have their money invested in some shape or form in the stock market (Jones, 2017). Since the 2008 financial crisis, consumer credit and mortgage borrowing have increased rapidly, and stock markets have become more accessible to smaller investors as financial products and services grow.
A major contributing factor to this increase is the rise in popularity of stock trading apps that allow users to trade individual stocks, and even to take on extra risk in the form of loans to finance their trading, all with just a swipe and a tap on a smartphone (Morris, 2018). It has always been beneficial for society that the average stock market investor/trader be as knowledgeable as possible when making investment and financial decisions (Antunes, Macdonald, & Stewart, 2014). However, this is the case now more than ever in order to ensure long-term economic health, since a large segment of the audiences actively targeted by stock trading apps are young, novice investors.
The largest player in this new stock trading app market is currently an app called Robinhood.
Robinhood’s initial marketing was centered around being a massive disruptor to the online
brokerage industry. Robinhood was launched on the Apple store for iPhones and tablets in 2014
with the Android version following shortly after. Robinhood's selling point is to allow any level of investor to buy and sell stocks and exchange-traded funds without any commission fees. Users initially could not short sell or trade mutual funds, options, or fixed income instruments; however, options trading and short selling are available in the current version. The
information Robinhood chose to convey to customers consisted of basic pricing graphs and dates
of shareholder events such as dividends and earnings announcements. The approach behind this
was that the target audience for the app, millennials, would go on their own to find any information
beyond the basic graphs and dates available on the app. This lack of information may not seem
harmful at first glance. However, the app is targeting millennials and novice investors. Therefore, the benefit of creating better stock prediction techniques that these young, inexperienced investors can draw on is clear.
The disruptive impact of these commission-free stock trading apps such as Robinhood prompted
traditional brokerages such as Charles Schwab, E*TRADE Financial, and TD Ameritrade to slash
their trading commissions by over 35% in February 2017 (Team, 2017). This is a clear indication that these apps are here to stay and that a new major goal for all stock brokerages will be the same as that of the apps themselves.

Furthermore, stock trading apps make increasing the number of transactions a top priority, even more so than traditional brokerages. It is estimated Robinhood is making ten times the revenue
from payment for order flow as other brokers for the same volume (Kane, 2018). Payment for
order flow can be defined as the compensation a brokerage firm receives for directing orders to
different parties for trade execution. The brokerage firm receives a small payment, usually a penny
per share, as compensation for directing the order to different third parties. This massive revenue generation from trade volume leads to a focus on increasing transactions, meaning that it is in Robinhood's best interest to have all of its investors trading actively. Because of this, stock price prediction techniques become even more beneficial, since an investor's trading decisions and trading frequency are shaped by the quality of the information available to them.
On an institutional level, stock price prediction techniques are utilized heavily in algorithmic
trading. Algorithmic trading utilizes computational power and mathematical formulas for trading.
Algorithmic trading makes use of complex formulas, combined with mathematical models and
human oversight, to make decisions to buy or sell financial securities on an exchange (Kim, 2010).
Stock price prediction techniques are often incorporated into institutional firms' trading algorithms. Many companies use algorithmic trading to minimize their transaction costs and market impact.
The rise and development of AI applied to the stock market and within financial firms has been a
major contributing factor in the increase of the algorithmic trading market. Firms such as Sentient
have already built multiple AI-powered algorithmic traders and then distilled them into a single
AI-powered algorithmic trader that is being discussed as having the potential to be incorporated as
its own company separate from other Sentient operations (Business Wire, 2019).
The North American algorithmic trading market accounted for the largest share in 2017, and is
expected to retain its dominance until 2026. This is due to strong technological advancements and the considerable adoption of algorithmic trading by banks and financial institutions across the region. Algorithmic trading is responsible for around 60-73% of all U.S.
equity trading. Moreover, according to the CEO of QuantInsti, algorithmic trading can potentially
help expand strategy portfolios by using more advanced quantitative tools and remove human
errors that often affect the performance of trading strategies. Approximately half of all futures and options trading volume happens through algorithmic trading (Business Wire, 2019).
Because algorithmic trading is primarily technical and mathematical in nature, research that explores quantitative stock prediction techniques is one of the primary ways that such an influential field can continue to improve incrementally. Because of the stock market's societal impact, the targeted increase in novice investors, and the industry-wide increase in algorithmic trading, stock market prediction is a societal problem that deserves academic attention. It is becoming more and more important to develop techniques and methodologies that will predict systematic risk in world stock markets and offer investors the possibility of minimizing risk through more informed buy and sell strategies while maximizing their profits. Therefore, this thesis focuses on improving stock price prediction via neural networks and the inclusion of Twitter and News Count variables as inputs.
The remainder of this thesis is structured in the following manner. Chapter 2 outlines the relevant literature. Chapter 3 provides a definition of the problem. Chapter 4 presents an overview of the choice of solution method. Chapter 5 examines the design decisions of the proposed models. Chapter 6 presents the results and analysis, and Chapter 7 discusses some important managerial insights and concludes the thesis.
Chapter 2
Literature Review
Articles for the literature review were selected based on their relevance within the past 10 years and on what various other literature reviews have deemed seminal papers in the field of stock price prediction. A summary of the literature review is presented in Table 1. After reviewing the
literature, it is important to note that due to the nature of stock price prediction research, formal
theorizing is less discussed in both practice and academia. Rather, the focus is more on model
selection and justification based on previous applications on other problems. Historically, models
that could solve time series and classification problems were deemed worthy to test on the stock
market. However, machine learning models have recently become able to deal with increasingly complex data sets. This is good news for the field of stock price prediction, as stock market data is highly complex and nonlinear.

Table 1 presents an overview of how the relevant papers have been classified. The column titled "Analysis" represents the type of analysis and input utilized in the research. The columns titled "ML", "NN", "STAT", and "EA" indicate whether the research utilized machine learning, artificial neural networks, traditional statistical methods, or evolutionary algorithms, respectively.
Table 1. A summary of the related literature
No. Authors Year Analysis ML NN STAT EA MODEL
1 Yoon 1993 TECH X NN
2 Kamstra, Donaldson 1996 TECH X NN
3 Tsaih 1998 TECH X X NN
4 Kim, Han 2000 TECH X X EA
5 Wang 2002 TECH X FUZZY
6 Leigh 2002 TECH X X EA
7 Kim 2003 TECH X SVM
8 Chen 2003 TECH X NN
9 Wang 2003 TECH X FUZZY
10 Pai & Lin 2005 TECH X X ARIMA
11 Enke & Thawornwong 2005 FUND X NN
12 Armano 2005 TECH X X EA
13 Hassan 2007 TECH X X X EA
14 Schumaker & Chen 2009 SENT X SVM
15 Yu 2009 TECH X SVM
16 Fernandez-Rodriguez 2009 TECH X NN
17 Huang & Tsai 2009 TECH X SVM
18 Demyanyk & Hasan 2010 N/A X X X N/A
19 Tsai & Hsiao 2010 FUND X X NN
20 Kara 2011 TECH X X SVM
21 Groth & Muntermann 2011 SENT X STAT
22 Wang, Wang, Zhang, & Guo 2012 TECH X X HYBRID
23 Schumaker, Zhang, Huang, & Chen 2012 SENT X STAT
24 Yolcu, Egrioglu, & Aladag 2013 TECH X X NN
25 Kao, Chiu, Lu, & Chang 2013 TECH X X SVM
26 Hagenau, Liebmann, & Neumann 2013 SENT X STAT
27 Yu, Duan, & Cao 2013 SENT N/A
28 Hagenau, Liebmann, & Neumann 2013 SENT X SVM
29 Umoh & Udosen 2014 TECH X FUZZY
30 Lv, Sun, & Liu 2014 TECH X X EA
31 Lv & Liu 2014 TECH X X EA
32 Li, Xie, Chen, Wang, & Deng 2014 SENT X SVM
33 Sun, Shen, & Cheng 2014 TECH X X NN
34 Dash, Dash, & Bisoi 2014 TECH X X EA
35 Adebiyi & Ayodele 2014 TECH X X ARIMA
36 Lee, Surdeanu, MacCartney, & Jurafsky 2014 SENT X X RF
37 De Fortuny, De Smedt, & Martens 2014 SENT X X SVM
38 Fenghua, Jihong, Zhifang, & Xu 2014 TECH X X SVM
39 Chaabane 2014 TECH X X ARIMA
Table 1. Continued.
40 Bisoi, Ranjeeta, & Dash 2014 T/F X X SVM
41 Mondal, Shit, & Goswami 2014 TECH X ARIMA
42 Li, Huan, Deng, & Zhu 2014 T/S X NN
43 Geva, & Zahavi 2014 T/S X X NN
44 Jiang, Chen, Nunamaker, & Zimbra 2014 SENT X STAT
45 Hafezi, Shahrabi, & Hadavandi 2015 TECH X X EA
46 Umoh & Inyang 2015 TECH X X FUZZY
47 Guo, Wang, Yang, & Miller 2015 TECH X X STAT
48 Hafezi, Shahrabi, & Hadavandi 2015 TECH X X NN
49 Ballings, Van den Poel, Hespeels, & Gryp 2015 TECH X X X RF
50 Patel, Shah, Thakkar, & Kotecha 2015 TECH X X RF
51 Nguyen, Shirai, & Velcin 2015 SENT X SVM
52 Rather, Agarwal, & Sastry 2015 TECH X X NN
53 Moghaddam, Moghaddam, & Esfandyari 2015 TECH X NN
54 Patel & Kotecha 2015 TECH X X RF
55 Wang, Wang, Zhao, & Tan 2015 FUND X STAT
56 Sun, Guo, Karimi, Ge, & Xiong 2015 TECH X FUZZY
57 Gocken, Ozcalici, & Boru 2016 TECH X X EA
58 Dash & Dash 2016 TECH X X FUZZY
59 Mahmud & Meesad 2016 TECH X FUZZY
60 Zhou, Gao, Wang, Chu, Todo, & Tang 2016 TECH X NN
61 Shynkevich, McGinnity, Coleman, & Belatreche 2016 SENT X KNN
62 Nie & Jin 2016 TECH X X SVM
63 Qiu & Song 2016 TECH X X NN
64 Chen & Pan 2016 TECH X X SVM
65 Gocken, Ozcalici, & Boru 2016 TECH X X NN
66 Di Persio & Honchar 2016 TECH X NN
67 Lahmiri 2016 TECH X X X SVM
68 Qiu, Song, & Akagi 2016 T/F X X NN
69 Qiu & Song 2016 TECH X X NN
70 Pröllochs, Feuerriegel, & Neumann 2016 SENT N/A
71 An & Chan 2017 TECH X EA
72 Wang, Wang, & Gao 2017 TECH X X EA
73 Shynkevich, McGinnity, Coleman, Belatreche, & Li 2017 TECH X KNN
74 Ouahilal, Mohajir, Chahhou, & Mohajir 2017 TECH X SVM
75 Tao, Hao, Hao, & Shen 2017 TECH X STAT
76 Rout, Dash, Dash, & Bisoi 2017 TECH X X EA
77 Mehmanpazir & Asadi 2017 TECH X X FUZZY
78 Castelli & Vanneschi 2017 TECH X EA
79 Chong, Han, & Park 2017 TECH X ARIMA
80 Weng, Ahmed, & Megahed 2017 TECH X X RF
81 Chen & Hao 2017 TECH X KNN
82 Zhuge, Xu, & Zhang 2017 SENT X NN
83 Kraus & Feuerriegel 2017 SENT X NN
84 Jeon, Hong, & Chang 2018 TECH X X STAT
85 Agustini, Affianti, & Putri 2018 TECH X STAT
86 Matsubara, Akita, & Uehara 2018 SENT X NN
The "Analysis" column also indicates what type of inputs the model will utilize. The three input categories are (1) technical indicators, (2) fundamental analysis, and (3) sentiment analysis. Technical indicators are the most popular type of input in academic research on stock market prediction. These indicators are defined as the result of mathematical and statistical techniques applied to historic time series data including stock price, volume, and, in some rare cases, interest information. This is in contrast to fundamental analysis, which is less popular in academia but is utilized by financial analysts daily. The reason for this could be that, in contrast to technical analysis, which is derived from the movement of the stock price, fundamental analysis must incorporate various
factors such as economic forecasts, efficiency of management, business opportunities, and
financial statements (Wafi, Hassan, & Mabrouk, 2015). Fundamental analysis can be defined as
the method of trying to find the intrinsic value of a stock based on financial analysis. This analysis
incorporates macroeconomic factors, industry factors, and company reports and releases such as
tax and quarterly reviews. The last type of analysis is a sentiment-based analysis. Sentiment
analysis can be defined as analysis based on linguistic feature extraction to extract subjective
information from written material. The relative lack of popularity of sentiment analysis has been due to the difficulty of developing reliable and efficient sentiment analysis tools. This is due to both design complexity and relevant source selection, which is a greater concern for sentiment analysis than for the other input categories.
Another important decision is to decide about the model’s output(s). This can either be stock price,
stock direction, index price, or index direction. Stock price refers to individual stock price data.
This data is a numerical value which the model believes the stock will be priced at in the near
future. The stock direction output is the direction in which the model believes a stock's price will move: either up or down. The difference between a stock and an index is that rather than being
data pertaining to an individual stock, an index measures a section of the stock market via the
combining of price data from multiple stocks. Some popular indices are S&P 500, and Dow Jones
Industrial Average. Based on the literature, it is evident that individual stock outputs are drastically more popular than index-based computations. Lastly, one of the most important decisions in stock price
prediction is to select the model. As seen in Table 1, machine learning techniques, artificial neural
networks, traditional statistical methods, and evolutionary algorithms are used in different studies.
2.1. Stock Price Prediction Models

As previously discussed, the focus of stock price prediction research is the method utilized. Therefore, it is important to have a full understanding of the logic that drives each model, some of the relevant recent research which has utilized it, and its advantages and disadvantages. A brief overview of the models covered in the literature review is provided in the following subsections.

2.1.1. The K Nearest Neighbor (KNN) Algorithm
The KNN algorithm is a supervised machine learning algorithm (a model that utilizes both input
and output data when learning) used exclusively for classification and regression. The primary
operation is identifying the closest neighbors to the data point being queried. Then, if the task is
classification, the most occurring neighbor value is returned and used as the output value. If
regression is the objective, then the average of all neighbor values is returned and used as the
output value. More specifically K nearest neighbors finds the distance between the data point being
investigated and all the other examples in the data, then the specified (denoted as K) number of
neighbors closest to the query are selected. After this, depending on whether the goal is regression or classification, the output value of the algorithm is either the mean of the neighbors' values or their most frequently occurring label.
The prediction of the stock market closing price is computed using KNN as follows: a) Determine
the number of nearest neighbors, K, b) Compute the distance between the training samples and the
query record, c) Sort all training records according to the distance values, d) Use a majority vote
for the class labels of K nearest neighbors and assign it as a prediction value of the query record
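To make these steps concrete, the following is a minimal sketch in Python; the toy features, labels, and the choice of Euclidean distance are illustrative assumptions and are not taken from the cited studies.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # (b) compute Euclidean distances between the query and every training sample
    distances = np.linalg.norm(train_X - query, axis=1)
    # (c) sort training records by distance and keep the k closest
    nearest = np.argsort(distances)[:k]
    # (d) majority vote over the class labels of the k nearest neighbours
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# toy example: two features per day, label 1 = price closed up, 0 = price closed down
train_X = np.array([[1.0, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
train_y = np.array([1, 1, 0, 0])
print(knn_predict(train_X, train_y, query=np.array([0.95, 0.15]), k=3))  # -> 1
```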
Based on the survey of recent stock price prediction literature, this method is used the least out of
the eight most commonly used methods. It is safe to say in recent years stock price prediction
research focused primarily on KNN has died down. This is most likely due to KNN being a very
simple method that often performs poorly compared to newer machine learning approaches. Some
recent examples of this phenomenon are demonstrated by Shynkevich et al. (2016), whose results showed that their KNN news sentiment-based stock prediction algorithm was consistently less accurate than a Support Vector Machine (SVM), a more complex machine learning algorithm. Shynkevich et al. (2017) also investigated the impact of forecasting window length on stock price prediction machine learning algorithms. Their results showed that their prediction related to forecasting window length held for artificial neural networks (ANNs) and Support Vector Machines (SVMs). However, regarding KNN, the results showed that "The prediction performance of the KNN approach is low, the pattern is still visible however its occurrence is significantly affected [...]".

2.1.2. Random Forests
Random forests (RF) is a machine learning algorithm that utilizes the aggregate analysis power of
a large number of individual decision tree algorithms. Decision trees can be defined as any model in which each node within the tree aims to split the observed data points into subgroups that are as different from each other as possible, while keeping the members within each subgroup as similar to each other as possible (Athey, Tibshirani, & Wager, 2019). This type of decision tree employed in
large numbers results in a random forest. Random forest is based on the simple principle that a
large number of relatively simple models that exhibit low correlation operating as a joint group
will outperform any of the individual models (Scornet, Biau, & Vert, 2015).
The overarching objective when designing a random forest is to keep the individual trees uncorrelated with one another. This is achieved by training each tree on a bootstrap sample of the data, i.e., sampling with replacement, and by ensuring feature randomness when constructing all individual trees. If a random forest is created successfully, then the prediction made by all trees communally should have greater accuracy than that of any individual tree. The reason why this method is successful is that, as long as all the trees do not contain the same bias, the members can protect each other from their individual errors. Another critical success factor is that the selected features need to carry a proven signal, so that the models relying on those features are not simply guessing.
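The sketch below illustrates this bootstrap-plus-feature-randomness idea using scikit-learn's RandomForestRegressor on a synthetic lagged-price series; the feature construction and parameter values are assumptions for illustration only, not the configuration used later in this thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100            # synthetic price series

# build lagged features: the previous 5 prices are used to predict the next one
X = np.column_stack([prices[i:-5 + i] for i in range(5)])
y = prices[5:]

# each tree sees a bootstrap sample and a random subset of features at every split
model = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)
model.fit(X[:-50], y[:-50])
print(model.predict(X[-50:])[:5])                           # out-of-sample predictions
```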
Lee, Surdeanu, MacCartney, and Jurafsky (2014) examined whether textual data improves stock
price prediction when employing random forests to train their models. Their proposed random
forest was made up of 2000 trees and was successful in training all the models tested. Their results
illustrate that the incorporation of textual data improved next day price movement prediction by
10%. Similarly, Weng, Ahmed, and Megahed (2017) examined the effectiveness of incorporating
textual data as well as technical data into stock price prediction. Applying decision trees was one
of the tested methods alongside SVM and NN. They also concluded that textual data incorporation
can improve prediction results. More recently, Zhang, Cui, Xu, Li, and Li (2018) utilized Random
forest as a critical part of training “Xuanwu”, a proprietary stock prediction model. The authors
concluded that the relatively effective performance of their prediction model in terms of accuracy
and returns is due to the incorporation of random forests as one of the integrated models used as a
learning method.
Patel and Kotecha (2015) combined random forests and Support Vector Regression (SVR) and
showed that the hybrid method performs better than applying each method individually. Patel,
Shah, Thakkar, and Kotecha (2015) compared RF to ANN and SVM and found that RF outperformed the others. However, it is important to note that the applied ANN was very simple, with only one hidden layer containing two neurons. This is below the optimal ANN architecture and can bias the comparison against the ANN.
2.1.3. Fuzzy
A prominent theory that has been applied to a wide variety of problems is the Fuzzy set theory,
first developed by Zadeh (1965). Fuzzy set theory is built upon the logic that elements may both
belong, and not belong, to the exact same set at certain levels. This essentially states that the
membership is in the interval [0,1]. Fuzzy Logic can define systems in both numeric and linguistic
terms. Fuzzy logic-based research has stated that this adds robustness to the method since it is not purely numerical or purely symbolic, as is sometimes the case with systems in real life. Purely statistical models are often dependent on relatively normal data distributions, large sample sizes, and no uncertain data to have the best chance at accurate forecasting with minimum bias. However, this is not the case with fuzzy-based systems, as they can outperform simpler statistical models when these assumptions do not hold.
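As a minimal illustration of partial set membership (the linguistic categories and breakpoints below are arbitrary assumptions, not taken from the cited studies), a single daily return can belong to several fuzzy sets at once:

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# fuzzify a daily return (%) into three illustrative linguistic categories
daily_return = 0.8
memberships = {
    "falling": triangular(daily_return, -3.0, -1.5, 0.0),
    "flat":    triangular(daily_return, -1.0,  0.0, 1.0),
    "rising":  triangular(daily_return,  0.0,  1.5, 3.0),
}
print(memberships)   # the return is partly "flat" and partly "rising" at the same time
```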
Unlike most relatively successful numerical models, it was quite some time before fuzzy logic had
been utilized for creating forecasting models to address time-series data. Song and Chissom (1993)
were the first to propose a generic fuzzy logic-based time series forecasting model. The work by Wang
(2002) was a seminal paper in fuzzy logic being applied to the stock market. He classified the
stock price into 6 different fuzzy categories based on time and price behavior. Sun, Guo, Karimi,
Ge, and Xiong (2015) developed a multivariate fuzzy time series model with multiple factors. To simplify the calculation and refine the rules, they incorporated rough set theory into the model, which enables it to process large amounts of data. Rough sets can essentially be considered less
precise fuzzy sets for the sake of computational efficiency. This utilization of rough sets to simplify
the calculations was originally employed by Wang (2003) and later by Chen and Yang (2018).
Recently, three studies have utilized a promising and interesting fuzzy logic inspired framework,
a Neuro-Fuzzy Inference System (Umoh and Inyang, 2015; Dash and Dash, 2016; Mahmud and
Meesad, 2016). A Neuro-Fuzzy system is a combination of neural networks which heavily utilize
fuzzy logic techniques. A good summary of how fuzzy techniques are utilized by neural networks is given by Karaboga and Kaya (2018), who state: "The first layer takes the input values and determines
the membership functions belonging to them. It is commonly called the fuzzification layer. The
membership degrees of each function are computed by using the premise parameter set, namely
{a,b,c}. The second layer is responsible for generating the firing strengths for the rules. Due to its
task, the second layer is denoted as the "rule layer". The role of the third layer is to normalize the
computed firing strengths, by diving each value for the total firing strength. The fourth layer takes
as input the normalized values and the consequence parameter set {p,q,r}. The values returned by
this layer are the defuzzificated ones and those values are passed to the last layer to return the final
output”. The advantage of these proposed systems is that, due to the ANN structure and learning ability, these models are powerful at tackling a wide variety of problems while utilizing interpretable if-then rules.
2.1.4. ARIMA
One of the more popular methods of stock price prediction is utilizing the simple stochastic time
series model (called ARIMA), which is commonly referred to as the Box-Jenkins model in the
finance industry. ARIMA stands for Auto-Regressive Integrated Moving Average. ARIMA is a
model that is solely intended to be trained to forecast time series data (Babu & Reddy, 2014).
ARIMA is a Generalized Random Walk model which is fine-tuned to eliminate all residual autocorrelation. Autocorrelation is the correlation of a time series with lagged values of itself, and it can reflect long-term trends and seasonality (Kumar & Anand, 2014).
The way ARIMA works is by capturing complex relationships via the utilization of lagged term
observations. The acronym ARIMA provides a good overview of the unique techniques the model
utilizes. This is due to the acronym itself describing the critical parameters of the model: “AR”
stands for autoregression. In the context of ARIMA, AR is how much the model utilizes the
dependent relationship between a time series value and other lagged values in the same time series.
AR(x) means x lagged values of the series are going to be used in the ARIMA model. “I” stands for
integrated. In the context of ARIMA, integrated refers to the technique of differencing (subtracting
a time series value from a previous value) so that a non-stationary time series (a time series that
does not have a constant mean, variance, and autocorrelation) can be utilized to create a more
stationary time series (ideally removing all seasonality) that will be easier to analyze. “MA” stands
for moving average. MA is how much the model utilizes the dependency between an observation and the residual errors from a moving average model applied to lagged observations. The common notation is “MA(x)”, where x represents the number of lagged error terms used to calculate the current observation.
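A hedged sketch of fitting an ARIMA(p, d, q) model with the statsmodels library is shown below; the order (1, 1, 1) and the synthetic price series are illustrative assumptions only.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0.1, 1.0, 300)) + 100   # synthetic non-stationary series

# AR(1): one lagged value, I(1): first-order differencing, MA(1): one lagged error term
model = ARIMA(prices, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))                         # next five predicted values
```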
Recently the majority of stock price prediction research involving ARIMA has been focused on
either incorporating ARIMA into another deep learning model or as a means of comparison. A
very early and prominent example of this is the work of Pai and Lin (2005), which created and
tested a hybrid ARIMA and SVM model. The authors recognized that ARIMA was on the decline
in terms of popularity and demonstrated that ARIMA could still be used to improve machine
learning models. The results showed that the proposed hybrid ARIMA SVM model outperformed
both the solo ARIMA and solo SVM model. Adebiyi, Adewumi, and Ayo (2014) compared
ARIMA against a three-layer ANN and found that the ANN outperformed ARIMA in the vast
majority of cases. When their ARIMA predictions were graphed, it became evident that the output
followed a linear direction and was not value-based forecasting. Chong, Han, and Park (2017) reported similar findings.
2.1.5. Regression
Statistical methods have been employed widely for a long time. However, there has been a massive
surge in more complex statistical methods since the latter half of the 20th century. This rise in
popularity can be attributed to the massive increase in accessible computing power. The
widespread availability of powerful computers has shifted the focus in statistical science from
traditional simple linear models to more nonlinear models, generalized linear models, and other computationally intensive methods.
Some examples of the computationally demanding statistical methods that can be practically
utilized to their full potential but would not have been possible before include permutation tests,
bootstrapping, and various other resampling-based methods. However, the rapid rise in the latter
half of the 20th century has slowed down to a near halt. The focus has now shifted to even more
robust and extremely complex non-linear models (Jordan & Mitchell, 2015). These models are
referred to as machine learning methods and not statistical models due to the lack of interpretability
in the former.
However, this does not mean that non-machine learning statistical methods have completely
disappeared from the stock price prediction literature. Along with the major stock prediction
methods covered in the literature review, a few papers have employed traditional statistical
methods to predict financial markets. The most common is linear regression-based prediction. Due to the sheer amount of research that has already been conducted utilizing linear regression, no ground-breaking developments are being made in linear regression stock price prediction research. One
recent example is the work by Jiang, Chen, Nunamaker, and Zimbra (2014) that visited company-
specific internet forums, divided users into stakeholder groups, and then analyzed how stakeholder
group postings correlated with events in the company and if that can be used to predict movements
in stock price. This data was incorporated as independent variables into their analysis. Their study
concluded that the inclusion of stakeholder sentiment can improve stock price prediction
performance.
2.1.6. Bayes
When discussing statistical models, a very familiar name in statistics literature has to be mentioned
and that is Bayes. Bayesian logic derived from Bayes’ theorem is a decision-making framework
utilized throughout inferential statistics. The defining characteristic of Bayesian logic is that it
quantifies uncertainty and deals with probabilistic outcomes rather than certain ones (Van de
Schoot et al., 2014). First introduced in the 18th century, Bayesian logic has become a fundamental
part of statistical analysis. As will be discussed later in this section, the application of Bayesian
logic to stock price prediction has been drastically influenced by the emergence of machine
learning techniques.
Based on the review of literature, it is evident that research focusing on Bayesian logic being
applied to stock price prediction has decreased in popularity. One of the most prominent papers is
by Tan, Wang, Wang, and Zhao (2015). Their study employed the use of Bayesian factor graphs.
Bayesian factor graphs can be summarized as a method involving finding the optimal structure of
a directed acyclic graph that represents the joint probability distribution for a set of factors. Tan et
al. utilized Bayesian factor graphs to study the impact of macroeconomic events on stock market
indexes. Due to this being a new application of Bayesian factor graphs, they concluded that this
early prototype required a greater statistical and theoretical grounding. In addition, they stated that the development and design of Bayesian factor graphs are extremely difficult. This raises the question of why one would utilize this method over other machine learning methods that are more accessible and better established.
Groth and Muntermann (2011) examined the ability of various machine learning algorithms to
improve stock price prediction utilizing textual data sentiment analysis. After showing that events reported in the news and media can, to a significant degree, create stock price volatility, the authors employed various machine learning algorithms to detect patterns in articles that could predict abnormal increases in risk and volatility stemming from the publication and consumption of the news and media. Among them was a Naïve Bayes model. Essentially, a Naïve Bayes model incorporates probabilistic Bayesian logic while treating each data point as independent rather than as part of a series; in the context of their study, this meant treating each word in a sentence independently of the others. The authors concluded that, as most would expect, Neural Networks and Support Vector Machines drastically outperformed Naïve Bayes. More notably, k-Nearest Neighbor also outperformed the Naïve Bayes algorithm, further suggesting that the decline of Naïve Bayes as a primary focus of stock price prediction research is well founded.
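As a minimal, hedged sketch of the Naïve Bayes idea applied to word features (the toy headlines and labels below are invented purely for illustration and are not the data or model used by Groth and Muntermann):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy headlines labelled 1 = followed by high volatility, 0 = not (labels are invented)
headlines = ["profit warning issued", "record earnings reported",
             "lawsuit filed against firm", "dividend increased again"]
labels = [1, 0, 1, 0]

# bag-of-words features: each word is treated independently of the others
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(headlines)

clf = MultinomialNB().fit(features, labels)
print(clf.predict(vectorizer.transform(["earnings warning issued"])))  # class from word-level probabilities
```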
2.1.7. Principal Component Analysis (PCA)

PCA is a dimensionality reduction method which attempts to transform a large number of variables into a smaller set, while retaining most of the uniquely defining characteristics of the original, noisier, larger set. This is accomplished by reducing the dimensionality of the original data set. By
definition, a reduction in the number of variables of a data set will result in a reduction of accuracy.
However, when done effectively, this reduction in accuracy should result in simplifications
(Jolliffe & Cadima, 2016). Increased simplicity makes data sets simpler to analyze and visualize,
essentially prepping the data set for easier processing by machine learning algorithms. How to
effectively and consistently balance simplicity and accuracy is a central challenge in PCA research. The new variables created by PCA, known as principal components, are linear combinations or various mixtures of the original data set's variables (Bro & Smilde,
2014). The configuration of the various combinations is done to ensure that the new variables are
not correlated. The goal is to fit as much of the information in the initial variables as possible into the new principal components. So, a ten-dimensional data set should result in ten principal components, with a descending amount of the original data set's information when going from the first principal
component to the other, i.e., the first will have as much information as possible, the 2nd will
contain as much of the remaining information as possible, etc. The way this is applied to stock price prediction is that the stock price time-series data, along with all technical and fundamental indicators, go through PCA to create principal components that capture the information of the original data while reducing noise, so that they should theoretically be good predictors of the stock price (Zahedi & Rounaghi, 2015).
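A minimal sketch of this dimensionality reduction step with scikit-learn is shown below; the number of indicators, the number of retained components, and the random data are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
indicators = rng.normal(size=(250, 10))        # 250 trading days x 10 technical indicators

# standardize, then project the ten indicators onto a few principal components
scaled = StandardScaler().fit_transform(indicators)
pca = PCA(n_components=3)
components = pca.fit_transform(scaled)

print(components.shape)                        # (250, 3): reduced feature set for a predictor
print(pca.explained_variance_ratio_)           # share of the original variance each component keeps
```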
Recently, Guo et al. (2015) utilized a two-dimensional PCA to reduce the dimensionality of a data
set to be utilized by a neural network for stock price forecasting. The results showed that the
proposed two-dimensional PCA reduced data dimensionality more effectively than traditional
PCA. This is similar to Wang and Wang (2015), who utilized PCA to extract the principal
components from their data set and then utilized an ANN variation for stock prediction. This
application of PCA alongside ANNs has proven to be relatively effective and may be the most promising direction for PCA-based stock price prediction.
2.1.8. Candlestick Analysis/K-Line
Candlestick analysis, or K-line pattern analysis as it is known in Asia, is a stock prediction method relying on the recognition of patterns in stock price time series and then determining indicators for when
this pattern will reoccur (Tsai & Quan, 2014). This method is widely utilized throughout the
industry as a basic analysis due to the less technical/numerical nature of the method. Candlestick
analysis is utilized across all levels of traders from hedge fund managers to individual day traders.
This simplicity may be the reason for the lack of attention to this method from academics.
Recently, Hao, Hao, Shen, and Tao (2017) set out to explicitly find “whether K-line patterns have
predictive power in academia”. They utilized various pattern recognition methods to test the
effectiveness of K-line analysis. The study concluded that the first issue with acceptance into academia is that there are no concrete and formal definitions of K-line patterns. This lack of formal definition is a significant barrier to K-line analysis's wider acceptance and to determining whether additional academic attention is warranted.

2.1.9. Hidden Markov Model (HMM)
A Hidden Markov model (HMM) is a finite state machine used to analyze a system that is assumed
to be a Markov chain with unobservable states. In a Markov chain, the next state in the process is
solely dependent on the immediately previous state and not on a sequence of states. HMMs have
proven to be powerful frameworks that can be utilized to analyze time series involving multivariate
data points. HMMs calculate the joint probability of a set of states that are hidden based on the
analysis of a set of states that have been observed. Initially, in the 1970s, HMMs were utilized to
improve speech recognition, where they have proven to be immensely successful. Currently,
HMMs are applied to a wide variety of problems ranging from speech recognition to tumor
recognition. However, the application towards stock price has not been one that has garnered as
much attention relative to other stock price prediction methods. Nguyen (2018) is a recent study
that further builds on the work of Nguyen (2014) and Nguyen and Nguyen (2015). Nguyen used
technical indicators as inputs into an HMM for stock price prediction. The study determined
through testing that the five-state model performs best. The five-state HMM also outperformed a
basic historical average return model. However, this study did not compare results with any other
machine learning stock price-prediction models. Due to the nature of HMM, model design
decisions regarding the number of states can alter the performance significantly. Therefore, a
potential future direction of HMM stock price research may be to standardize specific HMM
architectures.
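To make the idea concrete, the sketch below fits a five-state Gaussian HMM to synthetic daily returns using the third-party hmmlearn package; the package choice, the data, and all parameter values are assumptions for illustration and do not reproduce Nguyen's setup.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)
returns = rng.normal(0.0005, 0.01, size=(500, 1))     # synthetic daily returns (observed values)

# five hidden market regimes, echoing the five-state model discussed above
model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=100, random_state=3)
model.fit(returns)

hidden_states = model.predict(returns)                # most likely hidden regime for each day
print(hidden_states[:10], model.means_.ravel())
```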
2.1.10. Geometric Brownian Motion

Similarly, another statistics-based model that also utilizes Markovian logic is the Geometric Brownian Motion model (GBM). GBM is modeled after Brownian motion (the random movement of particles suspended in a fluid) and is a continuous-time Markov process. According to the geometric Brownian motion model, the future price of a financial asset follows a lognormal distribution; this lognormal probability distribution assumption is the main assumption in the Black-Scholes model. Black-Scholes is used to determine the fair price of a call or put option. Recently, the popularity of GBM
has stagnated when compared to other stock price prediction methods. Based on the literature review, most GBM research shows that while GBM is fairly accurate, it does not present any significant advantages over the more popular stock price prediction methods. Two examples of this are Agustini, Affianti, and Putri (2018) and Parungrojrat and Kidsom (2019).
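A short sketch of simulating future prices under GBM is shown below, sampling the closed-form solution S_t = S_0 * exp((mu - sigma^2/2) * t + sigma * W_t); the drift, volatility, and horizon are arbitrary illustrative values.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, days, n_paths, seed=4):
    """Sample GBM price paths using the closed-form solution of the underlying SDE."""
    rng = np.random.default_rng(seed)
    dt = 1 / 252                                         # one trading day in years
    # cumulative Brownian increments for each simulated path
    w = rng.normal(0, np.sqrt(dt), size=(n_paths, days)).cumsum(axis=1)
    t = dt * np.arange(1, days + 1)
    return s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w)

paths = simulate_gbm(s0=100.0, mu=0.08, sigma=0.25, days=60, n_paths=1000)
print(paths[:, -1].mean())   # Monte Carlo estimate of the expected price in 60 trading days
```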
2.1.11. Support Vector Machines
A very promising model type in the world of machine learning is Support Vector Machines (SVM).
SVMs work by performing a relatively intuitive and straightforward task. The main task is to
separate data into distinct classes via a decision boundary, and then maximize the margin. The
term margin when used in the context of support vector machines refers to the perpendicular
distance between the decision boundary and the data points closest to this line. What makes SVMs so effective is that they classify based on extreme examples, such as a cat that looks like a dog. These extreme examples become the support vectors, which are the points from
which to maximize the distance from the decision boundary. This maximization is simply a
quadratic optimization problem. This is simple and straightforward when the data is linearly
separable.
As mentioned before, stock price data is extremely nonlinear. The way support vector machines
combat this is by projecting the data points into a higher dimensional space by performing a
function on the data to make the classes linearly separable. The downside to this is that it is
extremely computationally expensive. This is why researchers often utilize the kernel trick, which uses optimized kernel functions that efficiently map data into a higher dimensional space (Bisoi, Dash, & Parida, 2019). Then the same optimization process takes place. The
selection of which kernel to utilize is a critical design decision for SVMs. There are multiple
examples of SVM being utilized for stock price prediction (Fenghua, Jihong, Zhifang, & Xu, 2014;
Lahmiri, 2016). Interestingly, there is also a good amount of research that has found success when
incorporating textual data into their SVMs for stock prediction (Hagenau, Liebmann, & Neumann, 2013; Chen, Deng, Wang, & Xie, 2014; Nguyen, Shirai, & Velcin, 2015).
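The sketch below illustrates the kernel idea with scikit-learn's SVC; the RBF kernel choice and the synthetic features and labels are assumptions for illustration, and the example classifies direction rather than predicting prices.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 2))                       # e.g. two technical indicators per day
y = (X[:, 0] * X[:, 1] > 0).astype(int)             # a non-linearly separable up/down label

# the RBF kernel performs the implicit high-dimensional mapping discussed above
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X[:300], y[:300])
print(clf.score(X[300:], y[300:]))                  # out-of-sample direction accuracy
```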
2.1.12. Support Vector Regression
Similar to SVM, Support Vector Regression utilizes the same components and techniques, but
instead of classifying, the mathematical operations are tasked with regressing the data points. Li,
Huan, Deng, and Zhu (2014) utilized a multiple kernel SVR to test whether or not the inclusion of
news articles alongside technical indicators improves the predictive power of the SVR model. The
reasoning the researchers presented for utilizing multiple kernels in their study was to have one
kernel dedicated to increasing the dimensionality of news sentiment data and the other kernel
dedicated to increasing the dimensionality of the Hong Kong stock exchange tick data. The results
showed that the Multiple Kernel SVR outperformed a normal SVR as well as a Naïve Bayes-based
model. What was also noteworthy is that the stock price prediction was improved when
incorporating news data. However, a major takeaway is that the experiment concluded that the
incorporation of a more significant number of news sources did not improve the model. Instead,
the focus should be on the selection of an effective approach that can appropriately incorporate the available news data.
One study utilized a model in which the stock price predictor aspect was an SVR based proprietary
model developed by a private company. Schumaker, Zhang, Huang, and Chen (2012) utilized the AZfintext
qualitative stock price predictor, which employs regressed stock quotes and financial news article
data as inputs into an SVR algorithm for stock price prediction. The main selling point of AZfintext is its focus on incorporating financial news data within 20 minutes of publication. The purpose of the research was to test whether incorporating sentiment-based analysis (the author's tone) into the AZfintext
system would improve stock direction prediction accuracy. The results showed that the
incorporation of sentiment analysis into the AZfintext system did not improve overall prediction
accuracy. A limitation of this research’s application to the broader body of stock price prediction
is AZfintext’s focus on a 20-minute time limit to analyze and incorporate news articles into its
model.
2.1.13. Artificial Neural Networks

Artificial Neural Networks (ANNs) experienced an increase in popularity similar to the one traditional statistical methods went through in the 20th century. Both booms were due to improved
computational power. The structure of ANNs is modeled after the structure of the neurons found
in the brain. The defining characteristic of neurons which has led to the creation of the most popular
machine learning model, is that the neurons in the brain are all interconnected. Specifically, each
neuron can be “triggered” by other neurons, while also being able to trigger additional neurons
(Gurney, 2014). This concept of triggering has led to the creation of what we now know as ANNs.
The structure of modern-day ANNs consists of 3 distinct types of layers: the input, the output, and
a hidden layer. The input layer sends data to the hidden layer, which sends data to the output layer,
which then gives the desired output. Each layer is composed of individual neurons, and the layers are connected, meaning that the output of one layer becomes the input of the next layer.
Each connection has a weight assigned to it. This weight determines the influence the output of
one neuron has as the input to the neuron in front of it. The term “learning” when it comes to ANNs
means the calculated adjustments that are applied to the weights of these connections (Nielsen,
2015). While this is a higher-level overview of ANN architecture, a more detailed description and
explanation of the technical and mathematical operations occurring in a neural network will be
provided in chapter 4.
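As a minimal illustration of this layered, weighted structure (the layer sizes, ReLU activation, and random weights below are assumptions for illustration; the weight-adjustment step, i.e., learning, is not shown):

```python
import numpy as np

rng = np.random.default_rng(6)

def layer(inputs, weights, bias):
    """One fully connected layer: weighted sum of the inputs followed by a ReLU activation."""
    return np.maximum(0.0, inputs @ weights + bias)

x = rng.normal(size=(1, 4))                 # input layer: e.g. four technical indicators
w_hidden, b_hidden = rng.normal(size=(4, 8)), np.zeros(8)
w_out, b_out = rng.normal(size=(8, 1)), np.zeros(1)

hidden = layer(x, w_hidden, b_hidden)       # hidden layer output becomes the next layer's input
prediction = hidden @ w_out + b_out         # output layer: a single predicted value
print(prediction)
```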
The first mathematical model of an ANN was proposed by McCulloch and Pitts in 1943. However, the perception of ANNs was not at all what it is today. It was not until the 1990s that ANNs started to be able to solve
relatively complex problems such as recognition of numbers written by hand. However, this was
still not the apex moment for ANN research as other early machine learning methods were still
outperforming ANNs. It was not until the early 2000s that there was a massive increase in the availability of data, which coincidentally was the same time that the full potential of parallel
computing on graphic processing units (GPUs) was realized. This led to an increase in the
utilization of ANNs. The defining moment which can be credited as the cause of the modern-day
fascination with ANNs was the creation of AlexNet. AlexNet was the brainchild of Geoffrey
Hinton, Alex Krizhevsky, and Ilya Sutskever, who won ImageNet 2012, an annual image
classification competition, utilizing their ANN. The three Canadian researchers successfully
demonstrated that a deep convolutional neural network with 60 million parameters, 650,000
neurons, and 5 convolutional layers could successfully classify 1.2 million images into 1000
different categories with an error rate of only 16.4%. The training for this extremely deep neural
network was done using parallel processing on GPUs. This record-breaking event launched ANN
into the public eye, and now ANNs are considered the face of machine learning and artificial
intelligence.
The first few notable applications of ANN towards stock market prediction were (Ahmadi, 1990;
Kamijo & Tanigawa, 1990; Kimoto, Asakawa, Yoda, & Takeoka, 1990; Trippi & DeSieno, 1992; Choi, Lee, & Rhee, 1995). More recently, based on the review of literature, it is evident that ANNs
are the most popular and widely utilized method for modern-day stock price prediction research.
The reason for this is that they have proven themselves as a practical and powerful stock price prediction method. Several recent studies (Di Persio & Honchar, 2016; Kraus & Feuerriegel, 2017; Chi, Wang, Zhang, Sun, & Xu, 2018; Hiransha, Vijay, & Krishna, 2018) have taken a standard model-testing approach to the various ANN architectures possible. It is important to note that, based on this literature review and the larger body of machine learning research, no one neural network structure has proven to be ideal, or even consistently superior, for stock price prediction.
Sun, Shen, and Cheng (2014) utilized data on the movement of stocks to analyze whether or not
trading behavior can be mapped and used to predict stock prices. For each individual stock, the
stock trading activities were analyzed, and a network was mapped. Then, the trading relationships
were classified and grouped into appropriate categories. The researchers then utilized a Granger
causality analysis to prove that stock prices were indeed correlated with the different trading
categories. To test the trading predictability power, a simple 3-layer feed-forward ANN was
utilized. The ANN incorporated technical indicators as well as the trading indicators. The results
show that ANN performs very well. This result can be considered relatively intuitive because it is
a well-known fact that traders get influenced by the activities of other traders.
Geva and Zahavi (2014) tested whether market data, simple news item counts, business events,
and sentiment scores, could improve various machine learning stock price prediction algorithms.
The models tested were ANNs, Decision Trees, and a basic regression. The results showed that
among the algorithms tested, only the ANN was able to fully exploit the more intricate nature of
the proposed sentiment/news inputs. The other models could not take advantage of these inputs by
realizing the very intricate relationship between price and the sentiment/news indicators.
Furthermore, another study incorporating sentiment data from news and microblogs is the work of
Zhuge, Xu, and Zhang (2017). The researchers utilized Shanghai Composite Index data as well as
what the paper refers to as emotional data. Emotional data, in this case, is sentiment analysis from
news and microblogs discussing the specific company being analyzed. The researchers found that
15 input variables made up of sentiment indicators as well as technical indicators can successfully be used to predict the index.
Chaima, Raoudha, and Fethi (2018) proposed a simple neural network to test whether the inclusion of accounting variables generated from the release of accounting disclosures improved the prediction accuracy of the ANNs. In addition, the study tested whether major events in the country impacted the degree to which the accounting variables improved the ANN. The market from which the
stocks were selected was the Tunisian Stock Exchange. The results of the study showed that by
combining 48,204 daily stock closing prices of 39 companies with the respective accounting
disclosure variables, the ANNs' quality of prediction improved. However, this level of
improvement drastically dropped when the ANN was predicting prices in 2011, a time of civil
unrest in Tunisia. This extreme example is noteworthy since a variable observed as consistently
improving the model accuracy lost its impact when emotionally charged events occurred. It further supports a point repeatedly brought up in recent stock price prediction research: that events and news can significantly influence stock prices and, in turn, model performance.
Lastly, Vanstone, Gepp, and Harris (2019) aimed to see if the prediction of the price of 20
Australian stocks by a Neural Network Autoregressive (NNAR) model could be improved via the
inclusion of inputs in the form of counts of news articles as well as counts of Twitter posts. The
sentiment based indicators utilized in the study were generated by Bloomberg. Sentiment based
indicators such as these are becoming increasingly available, and due to the overall improvement
in data mining techniques, these indicators should theoretically be more reliable than ever. The
study found that the NNAR that incorporated the Bloomberg generated news and Twitter-based
sentiment indicators had a higher quality of stock price predictions. A major takeaway from this
research is that due to the indicators being created by Bloomberg, no text/data mining models had
to be utilized by the researchers. Because they are readily available from Bloomberg, these news and Twitter indicators can be incorporated into ANNs as easily as any other technical indicator.

Considering the related literature discussed in this section, the contributions of this work are
summarized as follows:
• Based on the review of literature, this proposed research will be the first study to utilize
Bloomberg Twitter and News Publication Count variables as inputs for stock price prediction.
• In the proposed research, we utilize Twitter and News Publication count variables as inputs
into MLP and LSTM neural networks to test the impact of the variables on different neural
network types while also comparing both models’ stock price prediction performance.
Chapter 3
Problem Definition
Because of the volatile and chaotic nature of the stock market, accurate price prediction is a
challenging task. Improving stock price prediction will increase returns and reduce the risk for
investors both at the individual and institutional levels. Incremental improvements in stock price
prediction that can be utilized by active investors will result in more informed investors generating
greater returns. Being able to predict movement in the stock market does not only mean more profit
for investors, but the health of financial institutions directly impacts the health of the overall
economy. In North America alone, the financial sector had US $37.4 trillion in assets under
management (Szmigiera, 2019). Empirical evidence also suggests that non-bank financial
institutions such as hedge funds and investment firms introduce an excessive level of risk into the
financial sector negatively impacting the general economy (Liang & Reichert, 2012). Therefore,
any improvement in stock price prediction that can mitigate risk should positively impact the overall economy.
Based on the review of literature, a promising new direction being taken by stock price prediction researchers is that of including Twitter and news publication analysis in stock prediction models. This increase can be attributed to the rise in the use of Twitter and social media as a means of opinion sharing.
Direct (2019) and Investopedia (2019) have recently discussed the impact social media and news have on stock prices.
Regardless of technical analysis skills, Twitter and news publication analysis on its own is an
enormous task to undertake requiring specialized knowledge to scrape the web and develop
meaningful insights. Due to the difficulty of the task, a more practical approach to take advantage
of Twitter and news publication analysis, one that can be utilized widely by stock price prediction researchers, is that of including pre-generated Twitter and news indicators in their technical analysis models.
However, even sentiment analysis-based indicators require a plethora of decisions impacting if the
sentiment is determined to be good or bad or neutral. The subjective nature of sentiment analysis
strays very far from the nature of other technical analysis variables. Therefore, Twitter and News
Publication Count should be looked at instead. The reason for this is that Twitter and news count
are similar to all other technical indicators in that they should be the same for the specific day and
stock, whether the company providing the variable is Yahoo, Bloomberg, or Google. Because these variables are objective and universal, they are better candidates for inclusion in technical analysis-based prediction models.
Another area of interest that deserves greater attention in stock price prediction research is how
well these models perform when there is widespread panic influencing the economy. The importance of model performance during downturns stems from a mixture of two issues: the previously mentioned rise in dependency on algorithms within the stock trading market, and the macroeconomic implications of stock trading. The combination of financial institutions' influence on the economy and their reliance on algorithms can result in a slippery slope if high-frequency buy and sell decisions are being made on predicted prices and the predictive power of the model dramatically decreases when the market is in a state of panic.
The reason why investigating the accuracy of price prediction models during times of panic may
result in useful insights is that stock prices drastically increase in volatility during periods of
recession (Asteriou, Pilbeam, Sarantidis, 2019). This increased volatility will result in greater
difficulty in predicting prices accurately, potentially rendering the models useless. Interestingly, and contrary to the positive impact of news-related data on stock price prediction methods exhibited in the literature review, trade volume has been empirically observed to be a better predictor of panic-induced volatility than other traditional inputs utilized in market crash prediction research (Erdogan, Bennett, & Ozyildirim, 2014). On the other hand, it
has been shown that investors react differently to the news during times of panic (Kleinnijenhuis
et al., 2013; Angelovska, 2017). This may mean that the powerful robust sentiment/news-based
models that predict stock prices better than their technical only counterparts might not exhibit this
greater accuracy when panic strikes the market and investors’ reactions to news deviate from the
norm.
Based on these factors, this thesis will be the first to utilize neural networks as the base model for stock price prediction to examine whether Bloomberg-generated Twitter and News Publication Count indicators improve neural network stock price prediction within the North American context enough to warrant advising future academic research to include Bloomberg Twitter and news indicators in all technical analysis-based research. Furthermore, this
research will also be testing the change in predictive power of the proposed neural networks for
stock price prediction with and without Twitter and News Count indicators during times where the
stock market is greatly influenced by public panic compared to more stable economic periods.
Chapter 4
Based on the review of literature, it can be observed that neural networks are the primary technique
utilized in stock price prediction research currently and for the foreseeable future. This increase in
neural network utilization is due to two factors. First and foremost, neural networks have proven
to be extremely robust models with both the statistical strength and computational efficiency to
handle extremely non-linear chaotic data effectively. This is evidenced by the drastic rise in the
utilization of neural networks for the handling of very complex problems, as discussed in the
previous section. This popularity not only means that more researchers and practitioners are utilizing neural networks, but also that neural networks themselves are continually being worked on and improved.
The second reason for the increased utilization of neural networks is that compared to most
machine learning methods, the barrier to entry for neural networks is decreasing daily. The first increase in neural network accessibility can be credited to the rise in popularity of simpler, easy-to-learn programming languages such as Python and R. Due to the accessible nature of these languages, it is now more common than ever for non-computer scientists to know how to code. And if one knows how to code in Python, there is a plethora of packages that make the construction of custom neural networks relatively simple and, in some cases, even modular.
Because of this ease of use, and because stock price prediction is a topic of interest to everyone from fund managers to individuals investing in a single stock, neural networks can be described as having the greatest ease-of-use to predictive-power ratio. This effect is further compounded by companies such as Microsoft and Google working on tools that will make
neural network creation a matter of making design decisions and clicking. Microsoft is actively promoting and continuously working on Azure Machine Learning Studio, as well as ML.NET, both tools meant to make neural network formation as easy as creating a PowerPoint presentation. Google,
on the other hand, has been utilizing machine learning and neural networks across all their
products. Due to this, Google has also created TensorFlow, a suite of tools and frameworks meant
to simplify, visualize, and optimize specifically neural networks. This industry focus and funding
will ensure that neural network creation will continuously improve in terms of effectiveness and
ease, which will result in a continued rise in popularity. Therefore, it is a sound decision for current
and future stock price prediction research to utilize neural networks as the primary prediction
technique.
Due to these factors, this research will be utilizing neural networks as the base model for stock
price prediction to test whether or not Bloomberg generated Twitter and News publication count
indicators improve neural network stock price prediction within the North American context
enough to warrant advising future academic research to include Bloomberg sentiment indicators
in all technical analysis based research. In addition, we will also be examining the change in the
predictive power of proposed neural networks for stock price prediction with and without Twitter
and News Count indicators during times when the stock market is greatly influenced by public panic, compared to more stable economic periods.
Table 2 presents an overview of the types of neural networks utilized in technical variable-based
stock price prediction research in recent years. The two most prominent neural network types
found in Table 2 are the Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) networks. Only three papers out of 17 did not utilize either an MLP or LSTM as one of the types of stock price prediction neural networks. Therefore, for this research, the two commonly used network types, the MLP and the LSTM, will be utilized.
Table 2. Types of Neural Networks found in recent stock price prediction research
An MLP is a type of feed-forward neural network which utilizes multiple layers of perceptrons to
make sense of non-linear data. A single perceptron is an algorithm that is used to perform
classification tasks on linearly separable data effectively. The way this is done is via a single-layer
perceptron network, which can be defined as an input fed into a perceptron that utilizes the delta
between the desired output and calculated output to adjust the weights within the perceptron. A
single perceptron network is considered the most basic type of neural network. A simple mathematical representation of the output of a single perceptron is:

Y(t) = Φ( ∑_k w_k ⋅ X_k(t) + b )    (1)

where Φ is an activation function that essentially squeezes any value into a specific range.
The most commonly utilized activation functions are sigmoid and tanh. The sigmoid activation
function takes any values and plots them on a relative scale of 0 to 1. The tanh activation function
works similarly, but the range of output values is -1 to 1. 𝑋𝑘 (𝑡) is the inputs being fed into the
perceptron. 𝑤 and 𝑏 are the weights and bias, respectively. K represents which value within the
vector the weight is assigned to (i.e., each individual value that comprises the vector has a specific
weight associated with it). This algorithm is then repeated while the weights are adjusted based on the delta between the desired and calculated outputs.
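The delta-rule adjustment described above can be illustrated with a short sketch. This is a minimal, hedged illustration only; the learning rate, epoch count, and example data are assumptions for demonstration and are not values taken from this study.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Single-perceptron delta-rule training on linearly separable data.

    X: (n_samples, n_features) inputs; y: (n_samples,) targets in {0, 1}.
    Weights are nudged by the delta between desired and calculated output.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation Φ, squashes to (0, 1)

    for _ in range(epochs):
        for x_k, target in zip(X, y):
            output = sigmoid(np.dot(w, x_k) + b)    # Φ(Σ_k w_k · x_k + b), equation (1)
            delta = target - output                 # desired minus calculated output
            w += lr * delta * x_k                   # adjust each weight w_k
            b += lr * delta                         # adjust the bias
    return w, b

# Example: learn a simple linearly separable (AND-like) pattern.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w, b = train_perceptron(X, y)
```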
MLP’s take advantage of the linear classification power of individual perceptrons by stacking the
perceptrons on top of one another, creating a layer of multiple perceptrons. Each perceptron can
now be referred to as a hidden node (a neuron not part of either the input or output layer). For time
series analysis, it is common for each input to be fed into each perceptron. By increasing the
number of perceptrons, the MLP can make sense of chaotic and nonlinear data. By doing this, an
MLP is able to learn an XOR function, a feat that is impossible for a single perceptron network. A
simple mathematical representation of the input variable processing happening within an MLP is:
Y_n(t) = Φ( ∑_h w_n^o ⋅ Φ( ∑_k w_k,n ⋅ X_k(t) + b_n^h ) + b_n^o )    (2)
‘o’ represents values associated with the output neurons; ‘h’ represents values associated with the
hidden neurons. The above example illustrates a single pass-through of data where an initial input
will generate a single output. The learning aspect occurring in all of the machine learning can be
simply represented in the form of a classic optimization problem. Below is a representation of how
an MLP will learn patterns in nonlinear data via the minimization of error between actual and
predicted values.
Minimize  E_t ‖ Y(t) − Φ( ∑_h w_n^o ⋅ Φ( ∑_k w_k,n ⋅ X_k(t) + b_n^h ) + b_n^o ) ‖²    (3)

over the variables (w_k,n, b_n^h, w_n^o, b_n^o), 1 ≤ n ≤ N, 1 ≤ k ≤ K.
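A compact illustration of the single-hidden-layer computation in equation (2) and the error term minimized in equation (3) is sketched below, assuming NumPy and a tanh activation; the array shapes and function names are illustrative rather than taken from this study.

```python
import numpy as np

def mlp_forward(X_t, W_h, b_h, W_o, b_o, phi=np.tanh):
    """One pass of a single-hidden-layer MLP, mirroring equation (2).

    X_t: (K,) input vector at time t.
    W_h: (N, K) hidden weights, b_h: (N,) hidden biases.
    W_o: (N,) output weights, b_o: scalar output bias.
    """
    hidden = phi(W_h @ X_t + b_h)      # Φ(Σ_k w_k,n · X_k(t) + b_n^h) for each hidden node
    return phi(W_o @ hidden + b_o)     # Φ(Σ_h w_n^o · hidden_n + b_n^o)

def mse_objective(Y, X, params, phi=np.tanh):
    """Equation (3): mean squared error between actual and predicted outputs."""
    W_h, b_h, W_o, b_o = params
    preds = np.array([mlp_forward(x, W_h, b_h, W_o, b_o, phi) for x in X])
    return np.mean((Y - preds) ** 2)
```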
MLPs are proven to be effective neural networks when it comes to time series data. This effectiveness is evidenced by the widespread usage of MLPs within research, as can be seen in Table 2. This fact, along with their ease of implementation, makes MLPs an ideal candidate for this research.
As a counterbalance to the straightforward approach of MLP’s, the second neural network type
which we will be testing is the LSTM. The major reason for selecting LSTMs is because apart
from MLPs, LSTMs are the next most widely deployed neural networks in stock price prediction
research. LSTMs are a type of recurrent neural network. It means that LSTMs, unlike their feed-
forward counterparts, utilize the previous output as an input into the next timestamp allowing the
model to factor in time intervals. Whereas a feed-forward neural network such as an MLP treats
the 1st inputs and the 1000th inputs as the same, recurrent networks treat the data as a sequential
set. This makes LSTM prime candidates for time series prediction. Even though the basic linear
transformations occurring within an LSTM are relatively similar to that of an MLP, the major
defining factor which has resulted in LSTM’s widespread usage is the addition of “gates” and
“states”. This addition drastically changes the nature of the neural network. LSTMs create internal
memory by utilizing operation gates that tweak the internal state variables.
LSTM’s can be thought of as a manufacturing factory. The input is the raw materials, the desired
output is the final product. To become a final product, the raw materials need to be processed on
an assembly line. Based on this analogy, gates can be thought of as the point at which a choice is
made regarding if something needs to be added, taken away, or modified before the raw materials
can move on to the next stage. The hidden and cell states can be thought of as explicit instructions
stating what will move on to the next stage in the assembly line. The three gate types are input(i),
forget(f), and output(o). The two-state types are hidden(h) and cell(c). A simplified mathematical
representation of the gates and states and how they relate to each other can be found below.
i_t = σ( W_i ⋅ [h_(t−1), x_t] + b_i )    (4)
f_t = σ( W_f ⋅ [h_(t−1), x_t] + b_f )    (5)
o_t = σ( W_o ⋅ [h_(t−1), x_t] + b_o )    (6)
c_t = f_t ⊙ c_(t−1) + i_t ⊙ tanh( W_c ⋅ [h_(t−1), x_t] + b_c )    (7)
h_t = o_t ⊙ tanh(c_t)    (8)
All three gates utilize the previous hidden state and the current input data, then multiply them by
the weight of the specific gate, and add a gate specific bias. After that is completed, the value is
passed through a sigmoid function. Because of the sigmoid function, the input, forget, and output
gates will return a value between 0 and 1. The input gate determines how much information from
the previous hidden state and current input needs to be added into the cell state. The forget gate decides what percentage of the previous cell state needs to be discarded. The output gate is utilized in the calculation of the current neuron's hidden state. A visual representation of the inner workings
of an LSTM neuron can be found in Figure 1. Due to this utilization of selective memory, LSTMs have been proven to be effective at making sense of nonlinear time series data. Based on this fact and the previously discussed strengths of MLPs, we will be utilizing both LSTMs and MLPs for our analysis.
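To make the gate and state interactions in equations (4)-(8) concrete, the sketch below implements a single LSTM time step in NumPy; the dictionary-of-weights layout is an assumption made for readability, not the layout used by any particular deep learning library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the standard gate equations (4)-(8).

    W and b hold per-gate weights and biases; each W[g] has shape
    (hidden_size, hidden_size + input_size) and acts on [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])            # previous hidden state + current input
    i = sigmoid(W["i"] @ z + b["i"])             # input gate: how much new info enters the cell state
    f = sigmoid(W["f"] @ z + b["f"])             # forget gate: how much of the old cell state is kept
    o = sigmoid(W["o"] @ z + b["o"])             # output gate: how much of the cell state is exposed
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate cell state
    c_t = f * c_prev + i * c_tilde               # new cell state, equation (7)
    h_t = o * np.tanh(c_t)                       # new hidden state, equation (8)
    return h_t, c_t
```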
Chapter 5
Designing Method
There are a few critical design decisions that have to be taken into account for all neural networks.
These include the number of hidden layers, the number of hidden nodes, the number of training
epochs, as well as which cost function and which optimizer to utilize. It is important to note that in
the larger body of research that employs neural networks as the primary algorithm, there is a lack
of standardization regarding the design. Furthermore, the vast majority of research focused on
neural network design is more focused on hyperparameter optimization via algorithms rather than
establishing empirical standards that researchers can utilize as proven baselines to effectively
compare and further develop different neural networks (Yang & Shami, 2020). In this section, we
will go over how each neural network design decision was made.
Hidden layers refer to any layer that is not either the input or output layer. It has been proven that
a neural network with a single hidden layer is capable enough to approximate any univariate
function (Guliyev & Ismailov, 2016). However, the nature of stock price prediction is multivariate.
When it comes to multivariate problems, there is a general agreement in the machine learning
community that the number of hidden layers rarely needs to be over two. This widespread
agreement is backed up by the fact that two hidden layers in a simple feed-forward network have
been able to successfully approximate a multivariate function (Stathakis, 2009; Thomas, Petridis,
Walters, Gheytassi, & Morgan, 2017). Due to these factors, both the LSTM and MLP will have two hidden layers.
The cost function in terms of neural networks is the error measure that will be calculated for every
actual and predicted output. The cost function is what the optimization algorithm uses as a measure to assess how the neural network is performing. The cost function that will be minimized
by both networks will be Mean Squared Error (MSE). The reason for this is due to the proven
effectiveness of MSE as a cost function. MSE has been used as the cost function for simple
problems. However, MSE is proven very effective when tackling complex problems such as
utilizing neural networks to enhance the quality of speech audio when there is a large amount of
background noise (Saleem & Khattak, 2020). Also, it has been shown that MSE as a cost function
can optimize neural networks successfully, even when dealing with extensive magnitude training
data (Zhang, Shen, Zhou, & Xu, 2019). Due to the proven capability of MSE as a cost function for
neural networks, both the LSTM and MLP will be minimizing MSE as their objective function.
The term “learning” in machine learning refers to the optimization of the network via adjustments
to the weights and biases based on how much the cost function is reducing and how close it is to
minimization. The optimizer that will be utilized by both networks will be Adam. The name Adam
is derived from Adaptive Moment Estimation. Adam is a simple-to-implement, stochastic gradient-based optimization algorithm that is also computationally efficient. It was initially introduced by
Kingma and Ba (2014) and has been tried and tested by researchers at OpenAI and Google
Deepmind. Adam has been shown to be successful at handling non-stationary data, while also
being able to handle both sparse and noisy gradients. In addition, the specific combination of
LSTM and Adam has been demonstrated to be an effective one (Jiang & Chen, 2018; Chang,
Zhang, & Chen, 2019). Due to these reasons, Adam can be considered a logical choice as the
optimization algorithm for neural network-based time series research for the foreseeable future.
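As a brief, hedged illustration of how this cost-function and optimizer pairing is typically specified, a Keras-style sketch follows (assuming TensorFlow is available; the layer sizes and input shape here are placeholders, not the tuned values discussed later in this chapter).

```python
import tensorflow as tf

# Minimal sketch: a small feed-forward network compiled with the MSE cost
# function and the Adam optimizer. Layer sizes are illustrative placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="tanh", input_shape=(6,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```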
Hidden nodes refer to any neuron not found in either the input or output layer. The number of
hidden nodes is a decision that is specific to the type of network, the nature of the data, and the
problem trying to be solved. Due to this, iterative testing needs to be performed to determine the
optimal number of hidden nodes in the hidden layers. However, unlike epoch iterative testing,
which will be discussed next, there is a generally agreed-upon formula that can be used to create
a range for the iterative testing. The formula can be defined below.
N_h = N_s / ( α ∗ (N_i + N_o) )    (9)

where N_h is the number of hidden neurons, N_s is the number of samples in the training data, N_i is the number of input neurons, N_o is the number of output neurons, and α is an arbitrary scaling factor (here varied from 2 to 10).
Due to the number of hidden nodes being dependent on the number of input/output neurons and
the number of samples, we can utilize the above formula to determine the range of how many
potential hidden neurons both the LSTM and MLP should have. With the number of samples in the training data being 920, the input layer consisting of six neurons, and the output layer consisting of one neuron, the number of hidden neurons for both models (utilizing alphas from 2 to 10) should fall between 13 and 65. Based on this range, iterative testing was conducted for both types of neural networks, with model MSE loss during the training phase utilized as the measure of performance. Based on the results of the iterative testing, 13 hidden neurons are deemed proper for the MLP and 60 hidden neurons are deemed proper for the LSTM. Another design decision is the number of epochs. An epoch is one
pass-through of all sample data, meaning that if the sample data contains ten samples, one epoch
would be the neural network parsing through all ten samples one time. It is widely agreed that the
number of epochs should be determined on a case by case basis. Therefore, we have performed
iterative testing to see where the loss function stops decreasing, and no more “learning” is
occurring. Based on the results of the iterative testing, the optimal number of epochs for the LSTM
was determined to be 650. While the optimal number of epochs for the MLP was determined to be
4000. The reason for this large discrepancy could potentially be a result of the more simplistic architecture of the MLP.
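The hidden-neuron range reported above follows directly from equation (9); the short check below uses the values stated in this section (920 training samples, six input neurons, one output neuron, α from 2 to 10).

```python
# Hidden-neuron range from equation (9): N_h = N_s / (alpha * (N_i + N_o))
n_samples, n_inputs, n_outputs = 920, 6, 1

def hidden_neurons(alpha):
    return n_samples / (alpha * (n_inputs + n_outputs))

low, high = int(hidden_neurons(10)), int(hidden_neurons(2))
print(f"Candidate hidden-neuron range: {low} to {high}")   # 13 to 65
```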
Based on the review of stock price prediction literature, a wide spectrum of reasons are provided
as to why the stocks utilized were selected. In most cases, the reasoning behind the selection of
specific stocks is not a point of detailed discussion. In this research, the stocks that will be analyzed
are Facebook, Apple, Netflix, Google, Ford, Tesla, Walmart, and Amazon. The primary goal of
this study is to investigate the impact that Twitter and News count variables have on stock price
prediction, specifically within the North American context. Due to this focus on North America,
the group of stocks referred to as ‘FAANG’ was chosen. FAANG is an acronym for Facebook Inc.
(FB), Apple Inc. (AAPL), Amazon.com Inc (AMZN), Netflix Inc. (NFLX), and Alphabet Inc.
(GOOG). The term FAANG was first used by the former Goldman Sachs fund manager Jim
Cramer (Skehin, Crane, & Bezbradica, 2018). The reason for this specific grouping is that these
five publicly traded companies listed on the NASDAQ are very significant to the North American
investors and the stock market as a whole. This is evidenced by the fact that FAANG stocks had a
combined market capitalization of $4.1 Trillion, making FAANG stocks 16.64% of the total S&P
500 market capitalization as of July 2020. The S&P 500 is an index comprised of 500 companies,
and historically the index has had a market capitalization valued at 70% to 80% of the total U.S
stock market. The percentage FAANG stocks contribute to the S&P 500's total market capitalization is testimony to the broader significance of FAANG stocks to the overall North American stock market. The movement of FAANG stocks influences North American investors' decisions and the market as a whole.
The reason for selecting Walmart is due to the nature of the Bloomberg Twitter and News count
variables. As previously mentioned, the two variables of interest are count variables that measure
the number of occurrences a specific company is mentioned on either Twitter or a digital news
publication. Due to digital and social media both being relatively recent in terms of mass popularity
and more popular with a younger demographic, there may be a difference in the added utility of
Twitter and News count variables for companies with traditional business models compared to a
company with non-traditional business models. To test this, Walmart was selected to be
compared to Amazon. The purpose of this comparison is to see if there is any difference in the
way twitter and news count variables are formed for a traditional brick and mortar retailer as
opposed to a newer retailer with virtually no brick and mortar presence, such as Amazon. Similarly,
the reason for selecting Ford and Tesla is the comparison between a traditional automobile
manufacturer and a completely electric vehicle manufacturer. Ford operates on the traditional car
franchise dealership model (Crane, 2015) while Tesla showrooms are often found in malls, and all
orders are placed online (Johnson & Reed, 2019). It will be interesting to see if this contrast
between traditional and non-traditional business models has any impact on the Twitter and News
count variables themselves as well as the utility of these variables as inputs in stock price prediction
models.
Table 3 summarizes the technical indicator selection for neural network-based stock price
prediction within the past few years. The six most popular inputs are “Open price”, “High price”,
“Low price”, “Close price”, “Moving average price”, and “Trade Volume”. Out of these, the four
most popular are "Open", "High", "Low", and "Close". Therefore, Moving Average (price averaged over a specified number of periods) and Trade Volume (the number of shares traded in a day) will only be utilized in half of the models, while the other half use the four most popular inputs as well as the Bloomberg-generated Twitter and News Count data. There are multiple
reasons for replacing the two least used variables. The first being that this provides a more direct
comparison of the power of these variables on their own to improve stock price prediction. Since
the number of variables being utilized does not change, it cannot be said that any increase in
performance is a result of an increase in the amount of data fed into the models. Furthermore, specific to neural networks, the design decisions related to the number of neurons change as a result of a change in the number of input variables; if a model utilizes eight variables as its input, the range of the appropriate number of neurons also changes. Therefore, for the purposes of an empirically
sound comparison, we will be keeping the neural network’s design consistent regardless of what
variable set is being used. All the variables are collected from Bloomberg in a comma-separated
value format. The range of all data will be from January 2015 to May 2020. The definitions of all variables are provided in Table 4.
Table 4. Variable Definitions
Variable Name – Definition
Open Price – Dollar value of the first trade since market open
High Price – Highest dollar value trade of the day
Low Price – Lowest dollar value trade of the day
Close Price – Dollar value of the last trade before market close
30 Day Moving Average – Average dollar value of one share over the previous 30 days
Trade Volume – Total quantity of shares traded during the day
Twitter Count – Total number of tweets mentioning the parent company over a 24-hour period; the sources for this are Twitter and Stocktwits
News Publication Count – Total number of news publications mentioning the parent company over a 24-hour period; the sources are stated as all besides Twitter and Stocktwits
Once daily stock data has been collected from Bloomberg, the variables are then split into a
technical only set (T) and a technical plus Twitter & news set (T+). A log transformation will be
performed on the Twitter Count & News Publication Count variables due to the variables
exhibiting skewness. After the log transformation, all Twitter and News Publication Count variables are normalized; normalization is also performed on all remaining variables. The variables will then be further split into three sets: a train
set, a normal test set, and a panic test set. The training set is made of all daily variables from
January 2015 to 2019. The normal test set is defined as all daily variables from January 2019 to
November 2019. The panic test set contains all daily variables from January 2019 to May 2020.
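A hedged sketch of these preprocessing steps is shown below using pandas; the file name, column labels, and the use of a log(1 + x) transform are illustrative assumptions rather than the exact Bloomberg export format or transformation used in this study.

```python
import numpy as np
import pandas as pd

# Illustrative file and column names (assumptions, not the actual Bloomberg export).
df = pd.read_csv("AMZN_daily.csv", parse_dates=["Date"]).set_index("Date")

# Log-transform the skewed count variables.
df["Twitter Count"] = np.log1p(df["Twitter Count"])
df["News Count"] = np.log1p(df["News Count"])

# Split into the technical-only (T) and technical-plus-count (T+) variable sets.
T_cols  = ["Open", "High", "Low", "Close", "Volume", "MA30"]
Tp_cols = ["Open", "High", "Low", "Close", "Twitter Count", "News Count"]

# Chronological split into training, normal test, and panic test sets.
train       = df.loc["2015-01-01":"2018-12-31"]
normal_test = df.loc["2019-01-01":"2019-11-30"]
panic_test  = df.loc["2019-01-01":"2020-05-31"]
```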
Four price prediction models have been built for each stock. Each model trains and runs its neural network five times and then averages the predictions to obtain the final model prediction. This is due to the stochastic nature of neural networks, which results in slight variances in performance. The first two models will utilize LSTMs, with one model being fed Open price,
High price, Low price, Close price, Trade Volume, & 30 Day Moving Average as its inputs. The
second model will utilize an LSTM with Open price, High price, Low price, Close price, Twitter Count, & News Count as its inputs. All models will have the next day's Close price as their output.
The third and fourth models are comprised of MLPs with the same two sets of inputs. All
models will be tested on both the normal test set and the panic test set. RMSE will be the accuracy
measure utilized to evaluate model prediction results. The lower the RMSE, the better the stock
price prediction will be considered. The reasoning for selecting RMSE over other error measures
is due to the fact that RMSE is expressed in terms of the original unit being measured, making
RMSE useful for error gap analysis between the expected and predicted values (Kumar, Kumar,
& Kumar, 2020). Additionally, RMSE gives higher weight to larger errors relative to other error
measures and is considered best suited for domains where substantial errors in accuracy are
especially unwanted (Joseph, Obini, Sulaiman, & Loko, 2020). This incorporation of magnitude
of error into RMSE makes it useful in stock price prediction research due to the practical
implication of larger errors in prediction potentially resulting in a greater loss when making buy
or sell decisions. This is why recent stock price prediction research utilizes RMSE as the error
measure for comparative analysis of different companies' stock price prediction (Thakkar &
Chaudhari, 2020).
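The following sketch outlines how the four model configurations described in this chapter could be assembled and evaluated, assuming TensorFlow/Keras. The exact layer arrangement (for example, whether the tuned hidden-neuron counts apply per layer), input reshaping, and default epoch value are assumptions made for illustration rather than a definitive implementation.

```python
import numpy as np
import tensorflow as tf

def build_mlp(n_inputs, hidden=13):
    # Feed-forward MLP with two hidden layers, MSE loss, and the Adam optimizer.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden, activation="tanh", input_shape=(n_inputs,)),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def build_lstm(n_inputs, hidden=60):
    # Stacked LSTM with inputs reshaped to (timesteps=1, features), MSE loss, Adam.
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden, return_sequences=True, input_shape=(1, n_inputs)),
        tf.keras.layers.LSTM(hidden),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def averaged_prediction(build_fn, X_train, y_train, X_test, runs=5, epochs=650):
    # Train the network several times and average the predictions to smooth out
    # run-to-run variation from random weight initialization (epochs would be
    # set per model type, e.g. 650 for the LSTM and 4000 for the MLP).
    preds = []
    for _ in range(runs):
        model = build_fn(X_train.shape[-1])
        model.fit(X_train, y_train, epochs=epochs, verbose=0)
        preds.append(model.predict(X_test, verbose=0).ravel())
    return np.mean(preds, axis=0)

def rmse(y_true, y_pred):
    # RMSE, the accuracy measure used to compare all model/test-set configurations.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```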
Chapter 6
The final experiment results can be found in Appendix A. RMSE is calculated for each individual
model & test set configuration. Each model’s ability to predict stock prices is evaluated on how
low the RMSE is. The primary focus of this research is whether Twitter and news variables can
improve stock price prediction. T+ represents a model that includes technical variables along with
Twitter and news variables, while T represents a model that only consists of the technical variables. The relative improvement is calculated by subtracting the T+ model's RMSE from the T model's RMSE and then dividing the difference by the T model's RMSE.
Furthermore, when analyzing the impact of input data type, test data type, and neural network
selection, a focus will be placed on looking at the average statistics of the top and bottom 50% of
performers as a group rather than focusing on individual performances. Focusing on the average
relative difference between T+ input data and T input data for the same test set as a group should
help in developing insights that are independent of decisions (e.g., individual stock selection),
which may impact stock price prediction performance. The reason why these insights could
potentially be of greater value is due to the possibility of increased practical replicability and
utility. Therefore, when analyzing the impact of T+ variables, we will be doing so based on the average relative difference between the top and bottom 50% of performers.
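For clarity, the relative improvement reported in the tables that follow is simply the percentage reduction in RMSE of the T+ model relative to the T model; the short sketch below checks this definition against the Ford MLP panic-test values reported in Table 5.

```python
def pct_rmse_improvement(rmse_t, rmse_t_plus):
    # Percentage reduction in RMSE when moving from T inputs to T+ inputs.
    return (rmse_t - rmse_t_plus) / rmse_t * 100

# Example with the Ford MLP panic-test values from Table 5.
print(round(pct_rmse_improvement(0.160269481, 0.0366101)))   # 77
```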
Table 5 presents an overview of where utilizing T+ variables rather than T variables did and did
not result in an improvement in RMSE. It is interesting to note that out of the seven scenarios
which did not benefit from T+ variables, four involve Walmart. Walmart is an extreme outlier because the prediction of its stock closing price did not improve with T+ variables in any configuration, while all other stocks saw some improvement. Regarding the remaining three scenarios in which T+ did
not result in an improved prediction, Tesla stock predicted via LSTM did not improve for both test
sets, which is not the case for Tesla’s MLP counterpart. Finally, the last configuration which did
not result in improvement is Facebook utilizing MLP for the panic test set, making this
configuration the only one out of four for Facebook, which did not benefit from the addition of T+
variables.
NN Type   Company    Test Set   T RMSE        T+ RMSE      RMSE +/-       %RMSE +/-
LSTM      Amazon     PANIC      0.062463331   0.060182     0.002281367    4%
LSTM      Apple      PANIC      0.128599041   0.1271058    0.001493242    1%
LSTM      Facebook   PANIC      0.076986501   0.0739217    0.003064785    4%
LSTM      Ford       PANIC      0.136406012   0.075534     0.060872043    45%
LSTM      Google     PANIC      0.062709786   0.0493791    0.013330685    21%
LSTM      Netflix    PANIC      0.11226308    0.0921643    0.020098763    18%
LSTM      Tesla      PANIC      0.176320498   0.1875775    -0.01125698    -6%
LSTM      Walmart    PANIC      0.058447047   0.0727827    -0.01433569    -25%
LSTM      Amazon     NORMAL     0.046723821   0.0421004    0.004623444    10%
LSTM      Apple      NORMAL     0.058480253   0.0529901    0.005490104    9%
LSTM      Facebook   NORMAL     0.044041893   0.0366755    0.007366368    17%
LSTM      Ford       NORMAL     0.040241105   0.0300688    0.010172315    25%
LSTM      Google     NORMAL     0.034874718   0.0287432    0.006131553    18%
LSTM      Netflix    NORMAL     0.042107441   0.0370617    0.00504572     12%
LSTM      Tesla      NORMAL     0.053439369   0.1009144    -0.04747507    -89%
LSTM      Walmart    NORMAL     0.040209573   0.0425148    -0.0023052     -6%
MLP       Amazon     PANIC      0.01949539    0.0192764    0.000218955    1%
MLP       Apple      PANIC      0.021881629   0.0214114    0.000470186    2%
MLP       Facebook   PANIC      0.02745893    0.0240696    0.003389371    12%
MLP       Ford       PANIC      0.160269481   0.0366101    0.123659394    77%
MLP       Google     PANIC      0.023883099   0.0206876    0.003195483    13%
MLP       Netflix    PANIC      0.025532814   0.0239921    0.001540685    6%
MLP       Tesla      PANIC      0.084089884   0.0486324    0.035457453    42%
MLP       Walmart    PANIC      0.017525161   0.0181446    -0.00061946    -4%
MLP       Amazon     NORMAL     0.026022899   0.0242604    0.001762451    7%
MLP       Apple      NORMAL     0.020020966   0.0199544    6.65348E-05    0%
MLP       Facebook   NORMAL     0.020698101   0.0219327    -0.00123461    -6%
MLP       Ford       NORMAL     0.048863618   0.0204708    0.028392781    58%
MLP       Google     NORMAL     0.023554962   0.0214688    0.002086137    9%
MLP       Netflix    NORMAL     0.02954214    0.0284155    0.001126658    4%
MLP       Tesla      NORMAL     0.036097739   0.0353117    0.000786013    2%
MLP       Walmart    NORMAL     0.012775668   0.0136211    -0.00084542    -7%
Note. NN Type: Neural Network Type
Table 6 presents the average RMSE percentage improvement across all configurations resulting from the addition of T+ variables. The fact that Tesla has a model that improved by 42% and another that worsened by 89% shows the drastic need for researchers and traders to test across a wide variety of scenarios before deciding to utilize a variable as an input into a stock price prediction model. Ford is the clear leader in terms of percentage improvement, with the absolute difference between Ford and the second most improved stock prediction being 36%. Interestingly, both the MLP and LSTM configurations for Ford performed better on the panic test set than on the normal test set. This
ranking will be utilized to analyze input variables, test data, and neural network choice going
forward. No clear trends are present relating the magnitude of the mean Twitter and News count variables to the RMSE % improvement. However, Table 7 shows that, when comparing the average Twitter count to average News count ratios, the top 50 percent of models in terms of RMSE % improvement utilized data with an average ratio of 1.66 Twitter counts to News counts. This contrasts with an almost 57% higher ratio of 2.92 for the bottom 50% of models. A potential interpretation of these results is that the number of tweets or news counts alone does not necessarily strengthen the variables, but rather how
close the variables are to each other in terms of the mean value.
Table 6. Average RMSE % improvement across all configurations
Company Twitter to News Ratio Overall Mean News Publication Count Mean Twitter Count
Ford 0.712761099 274.7394834 195.824
Google 3.301257549 1,213.278229 4,005.34
Netflix 2.450254761 647.8841328 1,587.48
Facebook 0.178285882 3,983.518819 710.205
Amazon 3.974418341 769.4627306 3,058.17
Apple 1.98774911 2,545.850185 5,060.51
Walmart 1.982717614 380.7387454 754.897
Tesla 3.753669212 600.7311669 2,254.95
Top Half Avg 1.660639823 1,529.855166 1,624.71
Bottom Half Avg 2.924638569 1,074.195707 2,782.13
A unique requirement present in machine learning is that of evaluating models on a separate set of data from that used for training. Therefore, it is crucial to analyze the input variables throughout the various stages of training and testing. To investigate whether it is possible to estimate the degree to which Twitter and News count variables would improve stock price prediction, the variables were compared between the training data and both test sets. This was done to assess whether an analysis performed on the input data, before conducting the actual test, could determine whether the stock being predicted is one where the model can improve price prediction accuracy by including T+ variables. An analysis of the change in the coefficient of variation between the training and test sets can be found in Table 8. A point to note is that the coefficient of variation of the variables stays relatively the same between the training and test sets.
Table 9 presents an analysis of the mean T+ variables within the training data, the normal test set,
and the panic test set. It is interesting to observe that, when the stocks are ranked by the improvement in RMSE due to the addition of T+ variables, the top 50 percent of performers averaged a 25% decrease in mean Twitter Count between the train and test sets. This implies that for both
test sets, Twitter count on average was 25% lower than that of the training set. On the other hand,
for the bottom 50% of performers, there is an almost 60% decrease between the training data mean
Twitter count and the average of both test set’s mean Twitter counts. The difference between the
mean News publication count for the Training data and the test data for the top 50% of models
shows a 40% decrease. On the other hand, the bottom 50% of companies in terms of RMSE
percentage improvement due to the addition of T+ variables averaged a 25% increase between the training data mean News publication count and the average of the test sets' mean News publication counts.
Table 8. Variable Coefficient of Variation change between training and test sets
Table 9. Mean Twitter & News Count in Training Data, Normal Test Set, & Panic Test Set
Mean Twitter Count Mean News Publication Count
Company Train Normal Test Pan Test Train Normal Test Pan Test
Ford 212 131 149 253 334 336
Google 4,320 3,560 3,140 1,206 1,109 1,237
Netflix 1,823 1,077 907 620 956 730
Facebook 686 866 780 4,758 1,914 1,741
Amazon 3,515 1,816 1,734 727 1,035 890
Apple 6,229 1,612 1,675 2,479 2,864 2,735
Walmart 881 394 388 364 439 429
Tesla 2,499 1,395 1,548 488 823 930
Top 50% 1,760 1,408 1,244 1,709 1,078 1,011
Bot 50% 3,281 1,304 1,337 1,015 1,290 1,246
Top 50% Avg Test as % of Train: Twitter 75%, News 61%
Bot 50% Avg Test as % of Train: Twitter 40%, News 125%
To test whether these results are statistically significant, a statistical test must be conducted. The results of our experiment are extremely skewed, with a Shapiro-Wilk test p-value of 0.0006633, well below the 0.05 threshold required to retain the assumption of normality. Due to this, the normality requirement of a t-test cannot be met. Therefore, we opted to utilize the non-parametric
Wilcoxon Signed Rank Test. A one-sided directional test was conducted hypothesizing that the
accuracy for models utilizing only T variables will be lower than models utilizing T+ variables.
Based on the Wilcoxon Signed Rank Test, our results are significant, with a p-value of 0.00180761
and a confidence level of 95%. Furthermore, when only models that were tested on the normal test
set data were examined to see if the improvement brought on by T+ variables was significant in
this case, the resulting p-value was 0.032, meaning that the observed difference was statistically
significant. Additionally, when testing if the improvement between models that utilized T+
variables and were tested on the panic test set was significant, the results show that with a p-value
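A minimal sketch of this testing procedure with SciPy follows; the illustrative arrays below are the (rounded) MLP panic-test RMSE values from Table 5 and stand in for the full set of paired results actually tested.

```python
import numpy as np
from scipy import stats

# Paired RMSE values for T and T+ configurations (illustrative subset:
# the MLP panic-test rows of Table 5, rounded).
rmse_t  = np.array([0.0195, 0.0219, 0.0275, 0.1603, 0.0239, 0.0255, 0.0841, 0.0175])
rmse_tp = np.array([0.0193, 0.0214, 0.0241, 0.0366, 0.0207, 0.0240, 0.0486, 0.0181])

# Normality check on the paired differences (p < 0.05 rejects normality,
# ruling out a paired t-test).
_, shapiro_p = stats.shapiro(rmse_t - rmse_tp)

# One-sided Wilcoxon signed-rank test: hypothesis that T RMSE tends to be
# greater than T+ RMSE (i.e., T+ models are more accurate).
_, wilcoxon_p = stats.wilcoxon(rmse_t, rmse_tp, alternative="greater")
print(shapiro_p, wilcoxon_p)
```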
6.2. Examining the Predictive Models under Panic and Normal Circumstances
The reason for creating two separate test sets for each stock is due to the fundamental nature of
Bloomberg generated Twitter and News count variables relating to the amount the public mentions
a company. As previously discussed in the problem definition section, interesting insights into the
strength of Bloomberg generated Twitter and News count variables can be gained by analyzing
the performance of all models in times of increased panic surrounding the stock market. It is widely
agreed that in the first half of 2020, North America experienced widespread panic related to the
performance of the stock market and economy due to the impact of the global COVID-19
pandemic. The average RMSE for models tested on the market panic test set was 0.0670, while
the average RMSE for the normal test set was 0.0354. There is an apparent decrease in performance when testing models on the panic test set as compared to the normal test set. To test the significance of this difference, a Wilcoxon signed-rank test was conducted between the RMSEs of the models tested on the panic test set and the models tested on the normal test set. The results
To better understand this discrepancy in performance, an analysis of how the variables of interest
differ between the different test sets can be found in Appendix B. Appendix B. presents an
overview of how the T+ variable’s mean, standard deviation, and coefficient of variation change
when calculated for the overall data set, training data set, and two test sets, as well as the additional
period which differentiates the panic test set from the normal test set. The range of this panic-only period is from December 2019 to May 2020. The reason for selecting December 2019 as the start of the panic period is that the World Health Organization reported the first COVID-19 case in December 2019, which then resulted in panic related to the U.S. stock market (Baig, Butt, Haroon, & Rizvi, 2020). The range of data for the normal test set is January 2019 to November 2019, and the range of data for the panic test set is January 2019 to May 2020. The training data is
defined by data ranging from January 2015 to December 2018. By splitting our analysis into these
three distinct sections, we have a starting point to theorize as to what may have caused an average
decrease in performance for tests done on the pan-test set versus the normal test set.
Table 10 utilizes the information in Appendix B to show how the T+ variables and the price differ between the panic period data and the training data. The companies are displayed in descending order by mean RMSE for the panic test set. An interesting observation is that when comparing the bottom
50% to the top 50% of performances, the coefficient of variance for the Twitter and News Count
variables exhibit a 19% and 27% absolute difference, respectively. In addition, the percentage
change of mean Twitter and News count variables shows an 18% and 19% absolute difference,
respectively. The difference in variance between the data the models are learning from, and the
additional period added on to the normal test set to make it the pan-test set, is critical to our
analysis. The average percentage change in mean for the price data, as well as the average
coefficient of variance of the price data, exhibits a similar difference between the panic data and
training data for all stocks being predicted. This discrepancy in variable variance between the
training and test set might be the primary cause of the drop in performance on the panic test set.
Table 10. T+ Variables Mean & Coefficient of Variation in Panic Test Set and Training data
Company     Metric   Twitter Count (Train vs. Panic)   News Count (Train vs. Panic)   Price (Train vs. Panic)
Google      CV       -7%                               -15%                           -14%
Google      Mean     -46%                              23%                            -49%
Amazon      CV       -19%                              -7%                            -37%
Amazon      Mean     -55%                              -17%                           114%
Walmart     CV       -27%                              -35%                           6%
Walmart     Mean     -26%                              132%                           129%
Facebook    CV       -48%                              -46%                           -16%
Facebook    Mean     -10%                              -71%                           47%
Netflix     CV       22%                               38%                            -45%
Netflix     Mean     -69%                              -53%                           117%
Apple       CV       -59%                              -20%                           -16%
Apple       Mean     -71%                              0%                             105%
Ford        CV       -103%                             -2%                            12%
Ford        Mean     -14%                              34%                            -44%
Tesla       CV       -35%                              -13%                           -10%
Tesla       Mean     -57%                              13%                            52%
Top 50%     CV       -25%                              -26%                           -15%
Bot 50%     CV       -44%                              1%                             -15%
Top 50%     Mean     -35%                              17%                            60%
Bot 50%     Mean     -53%                              -2%                            57%
6.3. Comparing the MLP and LSTM methods
As is the current case with most stochastic based deep learning models, the inner workings of a
neural network are nearly impossible to analyze accurately. Therefore, any interpretations or
theorization beyond how complex a model seems or the model’s ability to utilize memory should
not be considered theoretically sound at this relatively early stage in the life cycle of machine
learning-based stock price prediction research. Therefore, a comparison is made between RMSE
mean and RMSE variance throughout the different groupings. Table 11 presents a summary of this
comparison.
Table 11. RMSE analysis for T vs. T+ and Panic Test Set vs. Normal Test Set
LSTM                      T             T+            Panic Test Set   Normal Test Set
Mean                      0.073394592   0.0693573     0.097052648      0.04569919
Range                     0.141445781   0.1588343     0.138198375      0.07217128
St. Dev                   0.040657937   0.0406329     0.04158476       0.01625939
CV                        55%           59%           43%              36%

MLP                       T             T+            Panic Test Set   Normal Test Set
Mean                      0.03735703    0.0248912     0.037060044      0.02518823
Range                     0.147493813   0.0350113     0.142744321      0.03608795
St. Dev                   0.035671165   0.0083744     0.035682809      0.00874798
CV                        95%           34%           96%              35%

+/- Mean MLP vs LSTM      -0.036037562  -0.044466     -0.0599926       -0.02051097
+/- CV MLP vs LSTM        40.09%        -24.94%       53.44%           -0.85%
Average +/- Mean RMSE MLP vs LSTM: -0.040251786
Average +/- CV RMSE MLP vs LSTM: 16.93%
It is clear that the MLP outperforms the LSTM in terms of mean RMSE for both the panic test set and the normal test set. This is also true when comparing the utilization of T+ variables versus only T variables. However, it is essential to note that even though the mean RMSE was substantially lower across all MLP configurations, the coefficient of variation was, on average, 16.93% higher for MLP models. Therefore, based on the mean and the range alone, we cannot conclude that this outperformance is meaningful and statistically significant. With LSTMs being a more complex version of the MLP, it would be intuitive to think that LSTMs would perform better. To test the significance of these results, a Wilcoxon Signed Rank Test was conducted to see if there was a
significant improvement brought on by the inclusion of T+ variables for each model type. Right
tailed tests were conducted for both MLP and LSTM. For the MLP the difference in performance
between T+ and T variables was significant with a p-value of 0.00381470. However, for the LSTM
the p-value of the right-tailed test was 0.0964050, meaning that the null hypothesis cannot be
rejected. To begin hypothesizing why this result was observed, we must note the fact that the main
differentiating feature present in LSTM’s that MLP’s lack, is memory. This memory is the reason
why LSTM’s perform well at complex tasks such as natural language processing (Soutner &
Muller, 2013) and video recognition (Ng et al., 2015). Via the LSTM’s memory, the model can
create a context for certain words mentioned later. Due to memory being the main differentiating
feature, we can hypothesize that for our experiment, the prediction of prices did not benefit from
utilizing memory. Another fact that backs this hypothesis is that the T variable set had a 30-day
moving average as one of the input variables, while the T+ input set did not. 30-day moving
average functions similarly to memory in that it is a value obtained entirely from previous values.
The T variable set on average performed worse across the board. The lack of improvement resulting from the inclusion of this variable can be considered similar to the lack of improvement brought on by the LSTM's memory feature.
Similar results are also found in the literature. For example, Hiransha et al. (2018) observed that
their LSTM model for stock price prediction does not perform better than their MLP model when
the length of the predicted period was 400 days. However, when the same models were tested on
a period of 10 years, then the LSTM becomes more accurate. This phenomenon could be a potential
explanation as to the trend observed in our results. The intuitive explanation of this phenomenon
is that the main advantage of LSTMs being the memory feature is much more advantageous when
utilized on a larger test set. This could be due to the length of the prediction period increasing,
resulting in more chances for memory to be utilized. That could explain why, in short testing periods, the extra complexity of the LSTM is not as beneficial as it would be over much longer prediction periods. This would then result in the MLP and LSTM having no significant
advantage over the other. Additionally, the magnitude of the difference could be attributed to the
fact that LSTM’s perform better when there is increased data preparation designed specifically for
the model and test data (Chacon, Kesici, & Najafirad, 2020).
In Section 5.2.1, it was hypothesized that due to potential differences in the operating styles of Amazon and Walmart, and of Tesla and Ford, there could be a difference in how the Twitter and News count variables influence stock price prediction for the respective companies. Due to the different business models of Amazon and Tesla, as opposed to the more traditional ones of Walmart and Ford, it was theorized that variables such as Twitter and News count, which have only been available since 2015 and therefore have not traditionally been utilized, would respond better to companies that are newer and currently more popular. To investigate this, a comparative analysis was conducted.
However, based on the results presented in Table 5, there is no indication that the prediction of the
stock price of companies with a different business model responds better to stock price prediction
utilizing newer social media variables. This is evidenced by the fact that Tesla stock price
prediction has benefited the least from the T+ Variables. In contrast, Ford has benefited the most,
resulting in a 51% difference in how much T+ variables improve prediction. On the other hand, Amazon and Walmart did not necessarily exhibit this same trend, with Amazon, the newer retailer, showing improvement while Walmart exhibited a -10% decrease when incorporating T+ variables. These results suggest that there is no
clear relationship between the broad perception of an organization based on age or operational style and the impact that new social media-based variables such as Twitter and News count variables have on stock price prediction.
Chapter 7
We proposed a stock price prediction technique that utilizes neural networks as the price prediction
model and technical variables as well as Twitter and News count variables as its inputs. Based on
the results and respective statistical tests, it is evident that the inclusion of these variables has the
potential to improve the stock price prediction ability across various model types and test data
configurations. However, further work must be done to assess the importance of the included
variables across a wide variety of companies, so a diagnosis can be made as to why there are
extreme outliers such as Walmart and Tesla. These results are of value, as stock price prediction is a domain where improvements in accuracy are a constant goal and the search for variables that can deliver such improvements is ongoing.
There are a wide variety of real-world practical implications that arise from these results. The
results of the T+ vs T analysis suggests that both traders and researchers can benefit by including
Twitter and News count variables in conjunction with technical indicators as inputs into neural
network-based stock price prediction models. According to the trends exhibited in this research,
stock traders who utilize technical indicator-based price prediction can include the two variables
without having to make any adjustments to their model and, for the overwhelming majority of scenarios, expect an improvement in prediction accuracy. Any improvement in accuracy is of great value to traders and investors of all levels. Another reason for traders to utilize
Twitter and News publication count indicators is that they are a way to successfully incorporate
public opinion with minimal effort into any prediction model. There is an ever-increasing influence
that social media has on society. This influence directly impacts the public’s perception of
corporations as well as products. Having robust, multifaceted analysis that can capture this impact is therefore of considerable value.
Future research related to technical analysis should aim to incorporate the T+ variables into its investigations due to the similar nature and standardization potential of these variables. The similar nature of the T+ variables stems from the fact that they are count variables. Unlike other Twitter and news-based indicators, the Bloomberg-generated Twitter and News count variables are defined in simple and replicable terms related to objective measures. This means that other
indicator providers such as Yahoo Finance can have the same value for Twitter count or news
publication count, making this variable an objective standardized one. However, more work needs
to be done in terms of figuring out exactly why the variable works well across the majority of
scenarios but poorly in some unique cases; this is a promising direction for future research. Another
opportunity for future research is to investigate whether the Twitter and News count variables improve
stock price prediction for medium and small-sized companies, as well as testing to see if the results
stay true when tested on other markets such as Europe and Asia.
The reason the difference between panic and normal test set results is of importance is related to
the increased utilization of algorithmic trading within investment funds and banks. According to
Deutsche Bank, high-frequency algorithmic trading at its peak accounted for 60% of total equity
trading within the United States in 2009. Algorithmic trading places a heavy reliance on the stock
price prediction. The functionality of any stock prediction methods directly impacts the
performance of these large algorithmically traded funds. The results from this experiment show
that the ability of these algorithms to maintain their accuracy comes into question in times of
market panic. Quality assurance and risk teams managing these large funds should ideally be
scrutinizing the models which drive their algorithmic funds. However, due to the nature of high-
frequency trading, even the shortest drop in prediction accuracy can result in a large number of ill-
advised trades. The results of our experiments show that there is a need to stress test stock price
prediction methods to ensure performance standards in all potential market scenarios. This is
important for traders to ensure a profitable model stays that way. An efficient way to test the price
prediction methods is through synthetic test data sets to ensure the models are robust enough to
handle all scenarios. The results of this stress test should be made clear to all customers and
regulatory boards.
As mentioned previously, the difference in mean prediction performance between MLP and
LSTM, as well as the magnitude of this difference, can potentially be attributed to two main
reasons. The first being the test period length. The second being the degree to which the data was
preprocessed. What these two observations mean for traders and researchers is that stock price
prediction model selection needs to incorporate both the aforementioned factors. For traders
utilizing stock price prediction, the amount of preprocessing as well as any expertise needed to
preprocess the data for each individual method successfully should be a critical point to consider
when trying to determine practically viable prediction methods. This is because extra data
preprocessing and requiring of expertise is not ideal for algorithms that try to predict a large
number of stock prices at once, such as index movement prediction. Future research work should
also focus on standardizing the data preparation techniques that ensure model optimality for all
model types to create an agreed-upon standard that should be replicated every time a specific
model is used. This standardization will benefit analysis related to stock price prediction methods
as well as research focused on improving neural networks. Furthermore, work needs to be done on
examining other existing neural network types thoroughly to ensure that simpler models that may
perform better in niche scenarios are not being underutilized for their computationally heavier and
more complex counterparts. A major limitation to the utilization of the Twitter count and news
publication count variables is that they do not factor in the influence that certain Twitter accounts or news publications have on the perception of the stock they are discussing. Certain Twitter accounts and news publications discussing a specific company may impact the stock price with a greater magnitude than the average tweet or article. An obvious example of this phenomenon would be a tweet by Elon Musk having more impact on the price of Tesla shares than a tweet by a non-Tesla-affiliated Twitter account mentioning Tesla. Therefore, an avenue for future work is the analysis and identification of how certain Twitter users and news publications differ in their influence on stock prices.
Appendices
Appendix A
Table A.1. Experimental Results
Amazon
  PANIC    LSTM (T+)   0.0601819638981050
           MLP (T)     0.0194953902448822
           MLP (T+)    0.0192764347624178
  NORMAL   LSTM (T)    0.0467238214804085
           LSTM (T+)   0.0421003771453041
           MLP (T)     0.0260228987852983
           MLP (T+)    0.0242604474588606
Tesla
  PANIC    LSTM (T)    0.17632049831447200
           LSTM (T+)   0.187577475928560000
           MLP (T)     0.084089883968641000
           MLP (T+)    0.048632431410302100
  NORMAL   LSTM (T)    0.053439369040910300
           LSTM (T+)   0.100914443840333000
           MLP (T)     0.036097739374578800
           MLP (T+)    0.035311726227490000
Apple
  PANIC    LSTM (T)    0.128599040948235000
           LSTM (T+)   0.127105799310714000
           MLP (T)     0.021881628833466800
           MLP (T+)    0.021411442404551200
  NORMAL   LSTM (T)    0.058480253278905900
           LSTM (T+)   0.052990149522751300
           MLP (T)     0.020020966236037400
           MLP (T+)    0.019954431466789200
Netflix
  PANIC    LSTM (T)    0.112263080026268000
           LSTM (T+)   0.092164317326074900
           MLP (T)     0.025532814481469400
           MLP (T+)    0.023992129105065700
  NORMAL   LSTM (T)    0.042107440714652000
           LSTM (T+)   0.037061721060168100
           MLP (T)     0.029542140415547400
           MLP (T+)    0.028415482727267900
Appendix B.
Table B.1. Comparing Variables in the Overall, Training, Normal Test, Pan Test, and Panic Period Test Sets
Ford | Google | Netflix | Facebook | Amazon | Apple | Tesla | Walmart
(each company contributes three columns, in the order Twitter, News, Price, giving 24 values per row)
Overall
Mean 196 275 11 4005 1213 956 1587 648 214 710 3984 147 3058 769 1184 5061 2546 164 2255 601 297 755 381 87
Variance 124300 34101 6 11703649 363891 59469 2666375 196237 12979 432228 17787737 1562 6884086 204387 327469 41627970 1797736 3117 4799559 275483 14664 747508 66393 331
St.Dev 353 185 3 3421 603 244 1633 443 114 657 4218 40 2624 452 572 6452 1341 56 2191 525 121 865 258 18
Coefficient of Variation
180% 67% 22% 85% 50% 26% 103% 68% 53% 93% 106% 27% 86% 59% 48% 127% 53% 34% 97% 87% 41% 115% 68% 21%
Training
Mean 212 253 12 4320 1206 1061 1823 620 170 686 4758 133 3515 727 947 6229 2479 141 2499 488 268 881 364 78
Variance 159345 28773 4 12068276 378508 34340 3092910 161330 9125 497886 21219182 1281 8044743 190822 208765 50038010 1775513 1228 5862998 203445 3227 913234 58531 134
St.Dev 399 170 2 3474 615 185 1759 402 96 706 4606 36 2836 437 457 7074 1332 35 2421 451 57 956 242 12
Coefficient of Variation
188% 67% 15% 80% 51% 17% 96% 65% 56% 103% 97% 27% 81% 60% 48% 114% 54% 25% 97% 92% 21% 108% 67% 15%
Normal Test
Mean 131 334 9 3560 1109 708 1077 956 330 866 1914 180 1816 1035 1790 1612 2864 202 1395 823 264 394 439 107
Variance 16797 41977 0 13358463 295951 3622 894611 239512 1255 275603 1315295 238 1266840 222560 11013 2371030 2337862 825 643150 273027 1873 93790 104811 74
St.Dev 130 205 1 3655 544 60 946 489 35 525 1147 15 1126 472 105 1540 1529 29 802 523 43 306 324 9
Coefficient of Variation
99% 61% 7% 103% 49% 9% 88% 51% 11% 61% 60% 9% 62% 46% 6% 96% 53% 14% 57% 63% 16% 78% 74% 8%
Pan Test
Mean 149 336 8 3140 1237 653 907 730 343 780 1741 185 1734 890 1869 1675 2735 232 1548 930 382 388 429 111
Variance 19947 44343 3 10181305 323122 8425 804327 288309 1773 235199 1103842 376 1171956 223752 37428 1895009 1812073 2471 1036664 343690 37967 88037 85835 89
St.Dev 141 211 2 3191 568 92 897 537 42 485 1051 19 1083 473 193 1377 1346 50 1018 586 195 297 293 9
Coefficient of Variation
95% 63% 19% 102% 46% 14% 99% 74% 12% 62% 60% 10% 62% 53% 10% 82% 49% 21% 66% 63% 51% 76% 68% 8%
Only Panic Period (the mean used for the st. dev. and coefficient of variation is taken from the Normal Test set)
Mean 182 339 7 2323 1487 545 572 290 367 615 1403 196 1569 607 2024 1790 2481 289 1842 1134 614 376 410 119
Variance 24273 48826 4 2920037 278824 321 457364 89130 1862 113794 512130 473 942224 104158 52289 943121 681514 642 1670824 417390 27544 75950 47737 28
St.Dev 156 221 2 1709 528 18 676 299 43 337 716 22 971 323 229 971 826 25 1293 646 166 276 218 5
Coefficient of Variation
86% 65% 27% 74% 36% 3% 118% 103% 12% 55% 51% 11% 62% 53% 11% 54% 33% 9% 70% 57% 27% 73% 53% 4%
References
Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and artificial neural
networks models for stock price prediction. Journal of Applied Mathematics, 2014.
Agustini, W. F., Affianti, I. R., & Putri, E. R. (2018, March). Stock price prediction using geometric
Brownian motion. In Journal of Physics: Conference Series (Vol. 974, No. 1, p. 012047). IOP
Publishing.
Ahmadi, H. (1990, June). Testability of the arbitrage pricing theory by neural network. In 1990 IJCNN
International Joint Conference on Neural Networks (pp. 385-393). IEEE.
Al-Jarrah, O. Y., Yoo, P. D., Muhaidat, S., Karagiannidis, G. K., & Taha, K. (2015). Efficient machine
learning for big data: A review. Big Data Research, 2(3), 87-93.
Angelovska, J. (2017). Investors’ behaviour in regard to company earnings announcements during the
recession period: Evidence from the Macedonian stock exchange. Economic Research-
Ekonomska Istraživanja, 30(1), 647-660. doi:10.1080/1331677x.2017.1305768
Antunes, P., Macdonald, A., & Steward, M. (2014). Boosting Retirement Readiness and The Economy
through Financial Advice. Retrieved from http://www.conferenceboard.ca
Asteriou, D., Pilbeam, K., & Sarantidis, A. (2019). The Behaviour of Banking Stocks During the
Financial Crisis and Recessions. Evidence from Changes-in-Changes Panel Data Estimations.
Scottish Journal of Political Economy, 66(1), 154-179. doi:10.1111/sjpe.12191
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47(2),
1148-1178
Babu, C. N., & Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA–ANN model for
forecasting time series data. Applied Soft Computing, 23, 27-38.
Baig, A., Butt, H. A., Haroon, O., & Rizvi, S. A. R. (2020). Deaths, Panic, Lockdowns and US Equity
Markets: The Case of COVID-19 Pandemic. Available at SSRN 3584947.
Bisoi, R., Dash, P. K., & Parida, A. K. (2019). Hybrid Variational Mode Decomposition and evolutionary
robust kernel extreme learning machine for stock price and movement prediction on daily basis.
Applied Soft Computing, 74, 652-678.
Braithwaite, T. (2017, July 28). Free stock trading for millennials comes at a cost. Retrieved from
https://www.ft.com/content/36ff325a-735e-11e7-aca6-c6bd07df1a3c
Bro, R., & Smilde, A. K. (2014). Principal component analysis. Analytical Methods, 6(9), 2812-2831.
Business Wire, (2019). Global Algorithmic Trading Market to Surpass US$ 21,685.53 Million by 2026.
Retrieved September 10, 2020, from
https://www.businesswire.com/news/home/20190205005634/en/Global-Algorithmic
Trading-Market-Surpass-21685.53-Million
Can Tweets And Facebook Posts Predict Stock Behavior? (2019). Retrieved 30 September 2019, from
https://www.investopedia.com/articles/markets/031814/can-tweets-and-facebook-posts-
predict-stock-behavior-and-rt-if-you-think-so.asp
Chacon, H. D., Kesici, E., & Najafirad, P. (2020). Improving Financial Time Series Prediction Accuracy
Using Ensemble Empirical Mode Decomposition and Recurrent Neural Networks. IEEE Access,
8, 117133-117145. doi:10.1109/access.2020.2996981
Chan, H. L., & Woo, K. Y. (2013). Studying the dynamic relationships between residential property
prices, stock prices, and GDP: Lessons from hong kong. Journal of Housing Research, 22(1), 75-
89. Retrieved from http://ezproxy.lib.ryerson.ca/login?url=https://search-proquest-
com.ezproxy.lib.ryerson.ca/docview/1353322722?accountid=13631
Chang, Z., Zhang, Y., & Chen, W. (2019). Electricity price prediction based on hybrid model of adam
optimized LSTM neural network and wavelet transform. Energy, 187, 115804.
doi:10.1016/j.energy.2019.07.134
Chatterjee, U. K. (2016). Do stock market trading activities forecast recessions? Economic Modelling, 59,
370-386. doi:10.1016/j.econmod.2016.08.007
Chen, Y., & Hao, Y. (2017). A feature weighted support vector machine and K-nearest neighbor
algorithm for stock market indices prediction. Expert Systems with Applications, 80, 340-355.
Cheng, C. H., & Yang, J. H. (2018). Fuzzy time-series model based on rough set rule induction for
forecasting stock price. Neurocomputing, 302, 33-45.
Choi, J. H., Lee, M. K., & Rhee, M. W. (1995, June). Trading S&P 500 stock index futures using a neural
network. In Proceedings of the third annual international conference on artificial intelligence
applications on wall street (pp. 63-72).
Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and
prediction: Methodology, data representations, and case studies. Expert Systems with
Applications, 83, 187-205.
Das, S., & Kadapakkam, P. R. (2018). Machine over Mind? Stock price clustering in the era of
algorithmic trading. The North American Journal of Economics and Finance.
Daniel, R. (2015, March 12). How Robinhood, an investing app, is luring stock-market newbies.
Retrieved from http://fortune.com/2015/03/12/robinhood-investing-app/
Dash, R., & Dash, P. (2016). Efficient stock price prediction using a self evolving recurrent neuro-fuzzy
inference system optimized through a modified differential harmony search technique. Expert
Systems with Applications, 52, 75-90.
Di Persio, L., & Honchar, O. (2016). Artificial neural networks architectures for stock price prediction:
Comparisons and applications. International journal of circuits, systems and signal processing, 10,
403-413.
Edwards, J. (2017, December 2). Global market cap is heading toward $100 trillion and Goldman Sachs
thinks the only way is down. Retrieved from https://www.businessinsider.de/global-market-cap-
is-about-to-hit-100-trillion-2017-12?r=UK&IR=T
Erdogan, O., Bennett, P., & Ozyildirim, C. (2014). Recession Prediction Using Yield Curve and Stock
Market Liquidity Deviation Measures. Review of Finance, 19(1), 407-422. doi:10.1093/rof/rft060
Every Time Trump Tweets About the Stock Market. (2019). Retrieved 30 September 2019, from
https://www.bloomberg.com/features/trump-tweets-market
Fenghua, W. E. N., Jihong, X. I. A. O., Zhifang, H. E., & Xu, G. O. N. G. (2014). Stock price prediction
based on SSA and SVM. Procedia Computer Science, 31, 625-631.
FXCM. (2016, June). New York Stock Exchange (NYSE). Retrieved from
https://www.fxcm.com/uk/insights/new-york-stock-exchange-nyse/
Geva, T., & Zahavi, J. (2014). Empirical evaluation of an automated intraday stock recommendation
system incorporating both market data and textual news. Decision support systems, 57, 212-223.
Groth, S. S., & Muntermann, J. (2011). An intraday market risk management approach based on textual
analysis. Decision Support Systems, 50(4), 680-691.
Guliyev, N. J., & Ismailov, V. E. (2016). A Single Hidden Layer Feedforward Network with Only One
Neuron in the Hidden Layer Can Approximate Any Univariate Function. Neural Computation,
28(7), 1289-1304. doi:10.1162/neco_a_00849
Guo, Z., Wang, H., Yang, J., & Miller, D. J. (2015). A stock market forecasting model combining two-
directional two-dimensional principal component analysis and radial basis function neural
network. PloS one, 10(4), e0122385.
Gurney, K. (2014). An introduction to neural networks. CRC press.
Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price prediction
based on financial news using context-capturing features. Decision Support Systems, 55(3), 685-
697.
Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE Stock Market Prediction
Using Deep-Learning Models. Procedia Computer Science, 132, 1351-1362.
doi:10.1016/j.procs.2018.05.050
How Does President Trump’s Twitter Use Impact Forex, Markets And Stocks? - Friedberg Direct. (2019).
Retrieved 30 September 2019, from https://www.fxcm.com/ca/insights/president-trumps-twitter-
impact-forex-markets-stocks/
Jiang, S., Chen, H., Nunamaker, J. F., & Zimbra, D. (2014). Analyzing firm-specific social media and
market: A stakeholder-based event analysis framework. Decision Support Systems, 67, 30-39.
Jiang, S., & Chen, Y. (2018). Hand Gesture Recognition by Using 3DCNN and LSTM with Adam
Optimizer. Advances in Multimedia Information Processing – PCM 2017 Lecture Notes in
Computer Science, 743-753. doi:10.1007/978-3-319-77380-3_71
Johnson, A., & Reed, A. (2019). Tesla in Texas: A Showdown Over Showrooms. SAM Advanced
Management Journal, 84(2), 47-56.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 374(2065), 20150202.
Jones, J. (2017, May 24). U.S. Ownership Down Among all but Older Higher Income. Retrieved from
https://news.gallup.com/poll/211052/stock-ownership-down-among-older-higher-income.aspx
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science,
349(6245), 255-260.
Joseph, I., Obini, N., Sulaiman, A., & Loko, A. (2020). Comparative Model Profiles of Covid-19
Occurrence In Nigeria. International Journal of Mathematics Trends and Technology, 68(6), 297-
310. doi:10.14445/22315373/ijmtt-v66i6p530
Kamijo, K. I., & Tanigawa, T. (1990, June). Stock price pattern recognition-a recurrent neural network
approach. In 1990 IJCNN International Joint Conference on Neural Networks (pp. 215-221).
IEEE.
Kane, L. (2018, September 10). Robinhood Is Making Millions Selling Out Their Millennial Customers
To High-Frequency Traders. Retrieved from https://seekingalpha.com/article/4205379-
robinhood-making-millions-selling-millennial-customers-high-frequency-traders
Karaboga, D., & Kaya, E. (2018). Adaptive network based fuzzy inference system (ANFIS) training
approaches: a comprehensive survey. Artificial Intelligence Review, 1-31.
Kim, K. (2010). Electronic and algorithmic trading technology: the complete guide. Academic Press.
Kimoto, T., Asakawa, K., Yoda, M., & Takeoka, M. (1990, June). Stock market prediction system with
modular neural networks. In 1990 IJCNN international joint conference on neural networks (pp.
1-6). IEEE.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Kleinnijenhuis, J., Schultz, F., Oegema, D., & Atteveldt, W. V. (2013). Financial news and market panics
in the age of high-frequency sentiment trading algorithms. Journalism: Theory, Practice &
Criticism, 14(2), 271-291. doi:10.1177/1464884912468375
Kooli, C., Trabelsi, R., & Tlili, F. (2018). The impact of accounting disclosure on emerging stock market
prediction in an unstable socio-political context. Accounting and Management Information
Systems, 17(3), 313-329.
Kraus, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural
networks and transfer learning. Decision Support Systems, 104, 38-48.
Kumar, M., & Anand, M. (2014). An application of time series ARIMA forecasting model for predicting
sugarcane production in India. Studies in Business and Economics, 9(1), 81-94.
Kumar, R., Kumar, P., & Kumar, Y. (2020). Time Series Data Prediction using IoT and Machine
Learning Technique. Procedia Computer Science, 167, 373-381. doi:10.1016/j.procs.2020.03.240
Lahmiri, S. (2016). Intraday stock price forecasting based on variational mode decomposition. Journal of
Computational Science, 12, 23-27.
Lee, H., Surdeanu, M., MacCartney, B., & Jurafsky, D. (2014, May). On the Importance of Text Analysis
for Stock Price Prediction. In LREC (pp. 1170-1175).
Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment
analysis. Knowledge-Based Systems, 69, 14-23.
Li, X., Huang, X., Deng, X., & Zhu, S. (2014). Enhancing quantitative intra-day stock return prediction
by integrating both market news and stock prices information. Neurocomputing, 142, 228-238.
Liang, H., & Reichert, A. K. (2012). The impact of banks and non-bank financial institutions on economic
growth. The Service Industries Journal, 32(5), 699-717. doi:10.1080/02642069.2010.529437
Lusardi, A., & Mitchell, O. S. (2014). The economic importance of financial literacy: Theory and
evidence. Journal of economic literature, 52(1), 5-44.
Mahmud, M. S., & Meesad, P. (2016). An innovative recurrent error-based neuro-fuzzy system with
momentum for stock price prediction. Soft Computing, 20(10), 4173-4191.
Moghaddam, A. H., Moghaddam, M. H., & Esfandyari, M. (2016). Stock market index prediction using
artificial neural network. Journal of Economics, Finance and Administrative Science, 21(41), 89-
93.
Morris, C. (2018, May 10). Robinhood Trading App Surpasses E*Trade in Total Users. Retrieved from
http://fortune.com/2018/05/10/robinhood-users-trading-app-tops-etrade/
New York Stock Exchange, New York Stock Exchange. (2018) NYSE Total Market Cap [Web page].
Retrieved from https://www.nyse.com/market-cap
Nguyen, T. H., Shirai, K., & Velcin, J. (2015). Sentiment analysis on social media for stock movement
prediction. Expert Systems with Applications, 42(24), 9603-9611.
Nguyen, N. (2018). Hidden Markov model for stock trading. International Journal of Financial Studies,
6(2), 36.
Nielsen, M. A. (2015). Neural networks and deep learning (Vol. 25). San Francisco, CA, USA:
Determination Press.
Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price
forecasting. Omega, 33(6), 497-505.
Parungrojrat, N., & Kidsom, A. (2019). Stock Price Forecasting: Geometric Brownian Motion and Monte
Carlo Simulation Techniques. MUT Journal of Business Administration, 16(1), 9-103.
Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price index movement
using trend deterministic data preparation and machine learning techniques. Expert Systems with
Applications, 42(1), 259-268.
Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock market index using fusion of
machine learning techniques. Expert Systems with Applications, 42(4), 2162-2172.
Piterbarg, L. I. (2011). Parameter estimation from small biased samples: Fuzzy sets vs statistics. Fuzzy
Sets and Systems, 170(1), 1-21.
Saleem, N., & Khattak, M. I. (2020). Deep Neural Networks for Speech Enhancement in Complex-Noisy
Environments. International Journal of Interactive Multimedia and Artificial Intelligence, 6(1),
84. doi:10.9781/ijimai.2019.06.001
Schumaker, R. P., Zhang, Y., Huang, C. N., & Chen, H. (2012). Evaluating sentiment in financial news
articles. Decision Support Systems, 53(3), 458-464.
Scornet, E., Biau, G., & Vert, J. P. (2015). Consistency of random forests. The Annals of Statistics, 43(4),
1716-1741.
Shynkevich, Y., McGinnity, T. M., Coleman, S. A., & Belatreche, A. (2016). Forecasting movements of
health-care stock prices based on different categories of news articles using multiple kernel
learning. Decision Support Systems, 85, 74-83.
Shynkevich, Y., McGinnity, T. M., Coleman, S. A., Belatreche, A., & Li, Y. (2017). Forecasting price
movements using technical indicators: Investigating the impact of varying input window length.
Neurocomputing, 264, 71-88.
Skehin, T., Crane, M., & Bezbradica, M. (2018, December). Day ahead forecasting of FAANG stocks
using ARIMA, LSTM networks and wavelets. CEUR Workshop Proceedings.
Stathakis, D. (2009). How many hidden layers and nodes? International Journal of Remote Sensing,
30(8), 2133-2147. doi:10.1080/01431160802549278
Song, Q., & Chissom, B. S. (1993). Fuzzy time series and its models. Fuzzy sets and systems, 54(3), 269-
277.
Sun, X. Q., Shen, H. W., & Cheng, X. Q. (2014). Trading network predicts stock price. Scientific reports,
4, 3711.
Sun, B., Guo, H., Karimi, H. R., Ge, Y., & Xiong, S. (2015). Prediction of stock index futures prices
based on fuzzy sets and multivariate fuzzy time series. Neurocomputing, 151, 1528-1536.
Szmigiera, M. (2019, July 19). Global assets under management by region 2017. Retrieved July 07, 2020,
from https://www.statista.com/statistics/264907/asset-under-management-worldwide-by-region/
Tao, L., Hao, Y., Yijie, H., & Chunfeng, S. (2017). K-Line Patterns’ Predictive Power Analysis Using the
Methods of Similarity Match and Clustering. Mathematical Problems in Engineering, 2017.
Team, T. (2017, June 17). How Much Will Commission-Free Brokerages Impact Traditional
Brokerages?. Retrieved from https://www.forbes.com/sites/greatspeculations/2017/06/14/how-
much-will-commission-free-brokerages-impact-traditional-brokerages/#1cc374e23b76
Thakkar, A., & Chaudhari, K. (2020). CREST: Cross-Reference to Exchange-based Stock Trend
Prediction using Long Short-Term Memory. Procedia Computer Science, 167, 616-625.
doi:10.1016/j.procs.2020.03.328
Thomas, A. J., Petridis, M., Walters, S. D., Gheytassi, S. M., & Morgan, R. E. (2017). Two Hidden
Layers are Usually Better than One. Engineering Applications of Neural Networks
Communications in Computer and Information Science, 279-290. doi:10.1007/978-3-319-65172-
9_24
Trippi, R. R., & DeSieno, D. (1992). Trading equity index futures with a neural network. Journal of
Portfolio Management, 19, 27-27.
Tsai, C. F., & Quan, Z. Y. (2014). Stock prediction by searching for similarities in candlestick charts.
ACM Transactions on Management Information Systems (TMIS), 5(2), 9.
Umoh, U. A., & Inyang, U. G. (2015). A Fuzzy-Neural Intelligent Trading Model for Stock Price
Prediction. International Journal of Computer Science Issues (IJCSI), 12(3), 36.
Van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Van Aken, M. A. (2014). A
gentle introduction to Bayesian analysis: Applications to developmental research. Child
development, 85(3), 842-860.
Vanstone, B. J., Gepp, A., & Harris, G. (2018, June). The effect of sentiment on stock price prediction. In
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent
Systems (pp. 551-559). Springer, Cham.
Wafi, A. S., Hassan, H., & Mabrouk, A. (2015). Fundamental analysis models in financial markets–
Review study. Procedia economics and finance, 30, 939-947.
Wang, Y. F. (2002). Predicting stock price using fuzzy grey prediction system. Expert systems with
applications, 22(1), 33-38.
Wang, Y. F. (2003). Mining stock price using fuzzy rough set system. Expert Systems with Applications,
24(1), 13-23.
Wang, L., Wang, Z., Zhao, S., & Tan, S. (2015). Stock market trend prediction using dynamical Bayesian
factor graph. Expert Systems with Applications, 42(15-16), 6267-6275.
Wang, J., & Wang, J. (2015). Forecasting stock market indexes using principle component analysis and
stochastic time effective neural networks. Neurocomputing, 156, 68-78.
Weng, B., Ahmed, M. A., & Megahed, F. M. (2017). Stock market one-day ahead movement prediction
using disparate data sources. Expert Systems with Applications, 79, 153-163.
World Bank, World Federation of Exchanges database. (2017) Stock traded, total value (% of GDP).
Retrieved from https://data.worldbank.org/indicator/CM.MKT.TRAD.GD.ZS
Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory
and practice. arXiv preprint arXiv:2007.15745.
Zadeh, L. A. (1965). Fuzzy sets. Information and control, 8(3), 338-353.
Zahedi, J., & Rounaghi, M. M. (2015). Application of artificial neural network models and principal
component analysis method in predicting stock prices on Tehran Stock Exchange. Physica A:
Statistical Mechanics and its Applications, 438, 178-187.
Zhang, G. (2007). Avoiding Pitfalls in Neural Network Research. IEEE Transactions on Systems, Man
and Cybernetics, Part C (Applications and Reviews), 37(1), 3-16. doi:10.1109/tsmcc.2006.876059
Zhang, J., Cui, S., Xu, Y., Li, Q., & Li, T. (2018). A novel data-driven stock price trend prediction
system. Expert Systems with Applications, 97, 60-69.
Zhang, L., Wang, F., Xu, B., Chi, W., Wang, Q., & Sun, T. (2018). Prediction of stock prices based on
LM-BP neural network and the estimation of overfitting point by RDCI. Neural Computing and
Applications, 30(5), 1425-1444.
Zhang, N., Shen, S., Zhou, A., & Xu, Y. (2019). Investigation on Performance of Neural Networks Using
Quadratic Relative Error Cost Function. IEEE Access, 7, 106642-106652.
doi:10.1109/access.2019.2930520
Zhuge, Q., Xu, L., & Zhang, G. (2017). LSTM Neural Network with Emotional Analysis for Prediction of
Stock Price. Engineering Letters, 25(2).