
THE IMPACT OF TWITTER AND NEWS COUNT VARIABLES ON STOCK PRICE

PREDICTION VIA NEURAL NETWORKS

By

Shamir Rizvi

BBA, Wilfrid Laurier University, 2017

A thesis

presented to Ryerson University

in partial fulfillment of the

requirements for the degree of

Master of Science in Management

in the program of

Master of Science in Management

Toronto, Ontario, Canada, 2020

© Shamir Rizvi, 2020

Author’s Declaration

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including

any required revisions, as accepted by my examiners.

I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose

of scholarly research.

I further authorize Ryerson University to reproduce this thesis or dissertation by photocopying or

by other means, in total or in part, at the request of other institutions or individuals for the purpose

of scholarly research.

I understand that my thesis may be made electronically available to the public.

Abstract

The Impact of Twitter and News Count Variables on Stock Price Prediction via Neural Networks
Shamir Rizvi

Master of Science in Management, 2020

Master of Science in Management, Ryerson University

This study examines how Twitter and News Count variables generated by Bloomberg L.P., when utilized as inputs, impact the stock price prediction accuracy of two distinct neural network types: Multi-Layer Perceptron networks and Long Short-Term Memory networks. In addition, all models were tested on two distinct periods, one without any market panic and the other including a prolonged period of market panic. The results suggest that the inclusion of Twitter and News Count variables significantly improves Multi-Layer Perceptron networks, but no significant improvement occurred for Long Short-Term Memory networks. Regarding periods of panic and no panic, the inclusion of the variables improved stock price prediction via neural networks in both scenarios.

Acknowledgments

I would like to thank my supervisor Dr. Hossein Zolfagharinia for guiding and mentoring me in

countless ways throughout my time at Ryerson University. I would also like to thank Dr. Amin

and Dr. Kalu, who kindly agreed to be part of my examining committee. I also extend my

appreciation to my parents Kamal Rizvi and Azra Rizvi, for their constant support and the example

of hard work and determination they set for me on a daily basis. I would not be where I am today

without them.

Table of Contents
Author’s Declaration.......................................................................................................................ii
Abstract..........................................................................................................................................iii
Acknowledgements........................................................................................................................iv
List of Tables.................................................................................................................................vii
List of Figures..............................................................................................................................viii
Chapter 1 Introduction.....................................................................................................................1
Chapter 2 Literature Review............................................................................................................6
2.1. Stock Price Prediction Models................................................................................................11
2.1.1. The K Nearest Neighbor (KNN) Algorithm........................................................................11
2.1.2. Random Forests...................................................................................................................12
2.1.3. Fuzzy....................................................................................................................................14
2.1.4. ARIMA................................................................................................................................15
2.1.5. Regression............................................................................................................................17
2.1.6. Bayes....................................................................................................................................18
2.1.7. Principal Component Analysis (PCA).................................................................................19
2.1.8. Candlestick Analysis/K-Line...............................................................................................21
2.1.9. Hidden Markov Model (HMM)...........................................................................................21
2.1.10. Geometric Brownian Motion.............................................................................................22
2.1.11. Support Vector Machines..................................................................................................23
2.1.12. Support Vector Regression................................................................................................24
2.1.13. Artificial Neural Networks................................................................................................25
Chapter 3 Problem Definition........................................................................................................30
Chapter 4 Choice of Solution Method...........................................................................................33
4.1. Neural Network Selection.......................................................................................................34
4.1.1. Multiple Layer Perceptron (MLP) Networks......................................................................35
4.1.2. Long Short-Term Memory (LSTM) Networks...................................................................37
Chapter 5 Designing Method.........................................................................................................40
5.1. Neural Network Design..........................................................................................................40
5.2. Data Selection.........................................................................................................................43
5.2.1. Stock Selection....................................................................................................................43
5.2.2. Variable Selection................................................................................................................44
5.3. Final Experiment....................................................................................................................46
Chapter 6 Results & Discussion....................................................................................................48
6.1. The Impact of T Variables vs the T+ Variables.....................................................................48
6.2. Examining the Predictive Models under Panic and Normal Circumstances..........................53
6.3. Comparing the MLP and LSTM Methods..............................................................................56
6.4. Comparing Walmart vs Amazon and Tesla vs. Ford..............................................................58
Chapter 7 Conclusions & Managerial Insights..............................................................................60
Appendices.....................................................................................................................................63
References......................................................................................................................................67

List of Tables
Table 1. A Summary of the related literature..................................................................................6
Table 2. Types of Neural Networks found in recent stock price prediction literature..................35
Table 3. Indicator Selection...........................................................................................................46
Table 4. Variable Definitions........................................................................................................46
Table 5. RMSE Percentage change when utilizing T vs T+ variables..........................................49
Table 6. Average RMSE improvement across all configurations.................................................51
Table 7. Mean Twitter Count to News Publication Count Ratio..................................................51
Table 8. Variable Coefficient of Variation change between training and test sets.......................52
Table 9. Mean Twitter & News Count in Training Data, Normal Test Set, & Panic Test Set.....53
Table 10. T+ Variables Mean & Coefficient of Variation in Panic Test Set and Training...........55
Table 11. RMSE analysis of T vs T+ and Panic Test Set vs Normal Test Set..............................56

List of Figures
Figure 1. Inner Workings of LSTM Neuron...............................................................................39

Chapter 1
Introduction

The massive and ever-increasing role that the stock market plays on a societal level is evidenced by the fact that global total stock market capitalization exceeded US $80 trillion as of 2017, over 300 percent growth from the 2009 global total of US $25 trillion, with Goldman Sachs estimating a steady rise to over US $100 trillion (Edwards, 2017).

Stock trading has a significant impact on the United States economy. The total value of stocks

being traded on U.S. markets since 2013 has been increasing and has managed to consistently stay

above 200% of the United States’ total annual GDP (World Bank, 2017). The New York Stock

Exchange alone had an average market capitalization of approximately US $30 trillion (New York

Stock Exchange, 2018). Furthermore, the New York Stock Exchange averages US $169 billion in

stocks traded daily (FXCM, 2016). The movement of stocks plays a key role in determining

financial market health. On a macro level, stock market price booms have been shown to drive

long-term economic growth (Chan & Woo, 2013).

Related to individual traders, another reason to work on stock price prediction is the increase in first-time and novice investors actively buying and selling stocks. Over 54% of U.S. adults have money invested in some form in the stock market (Jones, 2017). Since the 2008 financial crisis, consumer credit and mortgage borrowing have increased rapidly, and stock markets have become more accessible to smaller investors as financial products and services grow (Lusardi & Mitchell, 2014).

A major contributing factor to this increase is the rise in popularity of stock trading apps that allow users to trade individual stocks, and even to take on extra risk in the form of loans to finance their trading, all with just a swipe and a tap on a smartphone (Morris, 2018). It has always been beneficial for society for average stock market investors/traders to be as knowledgeable as possible when making their investment and financial decisions (Antunes, Macdonald, & Stewart, 2014). However, this is the case now more than ever, in order to ensure long-term economic health, because a large segment of the audience actively targeted by stock trading apps consists of millennials and novice investors (Daniel, 2015).

The largest player in this new stock trading app market is currently an app called Robinhood.

Robinhood’s initial marketing was centered around being a massive disruptor to the online

brokerage industry. Robinhood was launched on the Apple store for iPhones and tablets in 2014

with the Android version following shortly after. Robinhood's selling point is to allow any level

of investor to buy and sell stocks and exchange-traded funds without any commission fees. Users initially could not short sell or trade mutual funds, options, or fixed-income instruments; however, options trading and short selling are available in the current version. The

information Robinhood chose to convey to customers consisted of basic pricing graphs and dates

of shareholder events such as dividends and earnings announcements. The approach behind this

was that the target audience for the app, millennials, would go on their own to find any information

beyond the basic graphs and dates available on the app. This lack of information may not seem

harmful at first glance. However, the app is targeting millennials and novice investors. Therefore,

the benefit of creating better stock prediction techniques that these young inexperienced investors

could utilize is evident (Daniel, 2015).

The disruptive impact of these commission-free stock trading apps such as Robinhood prompted

traditional brokerages such as Charles Schwab, E*TRADE Financial, and TD Ameritrade to slash

their trading commissions by over 35% in February 2017 (Team, 2017). This is a clear indication

that these apps are here to stay, and a major new goal for all stock brokerages will be the same as Robinhood’s: to get as many people investing as possible regardless of experience, knowledge, or

total liquid assets (Braithwaite, 2017).

Furthermore, stock trading apps are making increasing the number of transactions a top priority, more so than traditional brokerages. It is estimated that Robinhood makes ten times the revenue

from payment for order flow as other brokers for the same volume (Kane, 2018). Payment for

order flow can be defined as the compensation a brokerage firm receives for directing orders to

different parties for trade execution. The brokerage firm receives a small payment, usually a penny

per share, as compensation for directing the order to different third parties. This massive revenue

generation from trade volume leads to a focus on increasing transactions, meaning it is in Robinhood’s best interest to have all of its investors actively trading. Because of this, stock price prediction techniques become even more beneficial, since an investor’s trading decisions and profit are greatly influenced by potential short-term movements.

On an institutional level, stock price prediction techniques are utilized heavily in algorithmic

trading. Algorithmic trading utilizes computational power and complex mathematical formulas, combined with mathematical models and

human oversight, to make decisions to buy or sell financial securities on an exchange (Kim, 2010).

Stock price prediction techniques are often incorporated into institutional firms’ trading

algorithms. Many companies use algorithmic trading to minimize their transaction cost and market

risk (Das & Kadapakkam, 2018).

The rise and development of AI applied to the stock market and within financial firms has been a

major contributing factor in the increase of the algorithmic trading market. Firms such as Sentient

have already built multiple AI-powered algorithmic traders and then distilled them into a single

AI-powered algorithmic trader that is being discussed as having the potential to be incorporated as

its own company separate from other Sentient operations (Business Wire, 2019).

The North American algorithmic trading market accounted for the largest share in 2017, and is

expected to retain its dominance until 2026. This is due to strong technological advancements and the considerable adoption of algorithmic trading by banks and financial

institutions across the region. Algorithmic trading is responsible for around 60-73% of all U.S.

equity trading. Moreover, according to the CEO of QuantInsti, algorithmic trading can potentially

help expand strategy portfolios by using more advanced quantitative tools and remove human

errors that often affect the performance of trading strategies. Approximately half of all trading volume in the futures and options markets occurs through algorithmic trading (Business Wire,

2019).

Because algorithmic trading is primarily technical and mathematical, research that explores quantitative stock prediction techniques is one of the primary ways that such an influential field can continue to incrementally improve. Because of the

stock market impact, a targeted increase in novice investors, and an industry-wide increase in

algorithmic trading, stock market prediction is a societal problem/issue that deserves academic

attention. It is becoming more and more important to develop techniques and methodologies that

will predict systematic risk in world stock markets and offer the possibility to investors to minimize

risk by making more informed buy and sell strategies while at the same time maximizing their

profits. Therefore, this thesis focuses on improving stock price prediction via algorithms and the

analysis of these techniques.

The remainder of this paper is structured in the following manner. Chapter 2 outlines the relevant

literature. Chapter 3 provides a definition of the problem. Chapter 4 presents an overview of the choice of solution method. Chapter 5 examines the design decisions of the proposed models. Chapter 6 presents the results and analysis, and Chapter 7 discusses important managerial implications and conclusions.

Chapter 2

Literature Review

Articles for the literature review were selected based on relevance within the past 10 years, together with papers that other literature reviews have deemed seminal in the field of stock price prediction. A summary of the literature review is presented in Table 1. After reviewing the

literature, it is important to note that due to the nature of stock price prediction research, formal

theorizing is less discussed in both practice and academia. Rather, the focus is more on model

selection and justification based on previous applications on other problems. Historically, models

that could solve time series and classification problems were deemed worthy to test on the stock

market. However, recently machine learning models have been able to deal with increasingly

complex data sets. This is good news for the field of stock price prediction as stock market data is

often non-linear and chaotic.

Table 1 presents an overview of how the relevant papers have been classified. The column titled

“Analysis” represents the type of analysis and input utilized in the research. The columns titled

“ML”, “NN”, “STAT”, and “EA” represent whether the research utilized machine learning, neural networks, statistical models, or evolutionary algorithms, respectively. The

following is a summary of the various categories organizing the review.

Table 1. A summary of the related literature
No. Authors Year Analysis ML NN STAT EA MODEL
1 Yoon 1993 TECH X NN
2 Kamstra, Donaldson 1996 TECH X NN
3 Tsaih 1998 TECH X X NN
4 Kim, Han 2000 TECH X X EA
5 Wang 2002 TECH X FUZZY
6 Leigh 2002 TECH X X EA
7 Kim 2003 TECH X SVM
8 Chen 2003 TECH X NN
9 Wang 2003 TECH X FUZZY
10 Pai & Lin 2005 TECH X X ARIMA
11 Enke & Thawornwong 2005 FUND X NN
12 Armano 2005 TECH X X EA
13 Hassan 2007 TECH X X X EA
14 Schumaker & Chen 2009 SENT X SVM
15 Yu 2009 TECH X SVM
16 Fernandez-Rodriguez 2009 TECH X NN
17 Huang & Tsai 2009 TECH X SVM
18 Demyanyk & Hasan 2010 N/A X X X N/A
19 Tsai & Hsiao 2010 FUND X X NN
20 Kara 2011 TECH X X SVM
21 Groth & Muntermann 2011 SENT X STAT
22 Wang, Wang, Zhang, & Guo 2012 TECH X X HYBRID
23 Schumaker, Zhang, Huang, & Chen 2012 SENT X STAT
24 Yolcu, Egrioglu, & Aladag 2013 TECH X X NN
25 Kao, Chiu, Lu, & Chang 2013 TECH X X SVM
26 Hagenau, Liebmann, & Neumann 2013 SENT X STAT
27 Yu, Duan, & Cao 2013 SENT N/A
28 Hagenau, Liebmann, & Neumann 2013 SENT X SVM
29 Umoh & Udosen 2014 TECH X FUZZY
30 Lv, Sun, & Liu 2014 TECH X X EA
31 Lv & Liu 2014 TECH X X EA
32 Li, Xie, Chen, Wang, & Deng 2014 SENT X SVM
33 Sun, Shen, & Cheng 2014 TECH X X NN
34 Dash, Dash, & Bisoi 2014 TECH X X EA
35 Adebiyi & Ayodele 2014 TECH X X ARIMA
36 Lee, Surdeanu, MacCartney, & Jurafsky 2014 SENT X X RF
37 De Fortuny, De Smedt, & Martens 2014 SENT X X SVM
38 Fenghua, Jihong, Zhifang, & Xu 2014 TECH X X SVM
39 Chaabane 2014 TECH X X ARIMA

Table 1. Continued.
40 Bisoi, Ranjeeta, & Dash 2014 T/F X X SVM
41 Mondal, Shit, & Goswami 2014 TECH X ARIMA
42 Li, Huan, Deng, & Zhu 2014 T/S X NN
43 Geva, & Zahavi 2014 T/S X X NN
44 Jiang, Chen, Nunamaker, & Zimbra 2014 SENT X STAT
45 Hafezi, Shahrabi, & Hadavandi 2015 TECH X X EA
46 Umoh & Inyang 2015 TECH X X FUZZY
47 Guo, Wang, Yang, & Miller 2015 TECH X X STAT
48 Hafezi, Shahrabi, & Hadavandi 2015 TECH X X NN
49 Ballings, Van den Poel, Hespeels, & Gryp 2015 TECH X X X RF
50 Patel, Shah, Thakkar, & Kotecha 2015 TECH X X RF
51 Nguyen, Shirai, & Velcin 2015 SENT X SVM
52 Rather, Agarwal, & Sastry 2015 TECH X X NN
53 Moghaddam, Moghaddam, & Esfandyari 2015 TECH X NN
54 Patel & Kotecha 2015 TECH X X RF
55 Wang, Wang, Zhao, & Tan 2015 FUND X STAT
56 Sun, Guo, Karimi, Ge, & Xiong 2015 TECH X FUZZY
57 Gocken, Ozcalici, & Boru 2016 TECH X X EA
58 Dash & Dash 2016 TECH X X FUZZY
59 Mahmud & Meesad 2016 TECH X FUZZY
60 Zhou, Gao, Wang, Chu, Todo, & Tang 2016 TECH X NN
61 Shynkevich, McGinnity, Coleman, & Belatreche 2016 SENT X KNN
62 Nie & Jin 2016 TECH X X SVM
63 Qiu & Song 2016 TECH X X NN
64 Chen & Pan 2016 TECH X X SVM
65 Gocken, Ozcalici, & Boru 2016 TECH X X NN
66 Di Persio & Honchar 2016 TECH X NN
67 Lahmiri 2016 TECH X X X SVM
68 Qiu, Song, & Akagi 2016 T/F X X NN
69 Qiu & Song 2016 TECH X X NN
70 Pröllochs, Feuerriegel, & Neumann 2016 SENT N/A
71 An & Chan 2017 TECH X EA
72 Wang, Wang, & Gao 2017 TECH X X EA
73 Shynkevich, McGinnity, Coleman, Belatreche, & Li 2017 TECH X KNN
74 Ouahilal, Mohajir, Chahhou, & Mohajir 2017 TECH X SVM
75 Tao, Hao, Hao, & Shen 2017 TECH X STAT
76 Rout, Dash, Dash, & Bisoi 2017 TECH X X EA
77 Mehmanpazir & Asadi 2017 TECH X X FUZZY
78 Castelli & Vanneschi 2017 TECH X EA
79 Chong, Han, & Park 2017 TECH X ARIMA
80 Weng, Ahmed, & Megahed 2017 TECH X X RF
81 Chen & Hao 2017 TECH X KNN
82 Zhuge, Xu, & Zhang 2017 SENT X NN
83 Kraus & Feuerriegel 2017 SENT X NN
84 Jeon, Hong, & Chang 2018 TECH X X STAT
85 Agustini, Affianti, & Putri 2018 TECH X STAT
86 Matsubara, Akita, & Uehara 2018 SENT X NN

Table 1. Continued.

87 Ning, Wah, & Erdan 2018 FUND X EA


88 Zhang, Cui, Xu, Li, & Li 2018 TECH X RF
89 Ebadati, & Mortazavi 2018 TECH X X NN
90 Chaima, Raoudha, & Fethi 2018 T/F X NN
91 Cheng & Yang 2018 TECH X FUZZY
92 Lahmiri 2018 TECH X X X SVM
93 Zhou, Pan, Hu, Tang, & Zhao 2018 TECH X NN
94 Bommareddy, Reddy, & Kumar 2018 TECH X STAT
95 Zhang, Wang, Xu, Chi, Wang, & Sun 2018 TECH X NN
96 Shah, Tairan, Garg, & Ghazali 2018 TECH X X EA
97 Zhang, Shi, Wang, & Fang 2018 SENT X X SVM
98 Hiransha, Vijay, & Krishna 2018 TECH X NN
99 Nguen & Nguyet 2018 TECH X STAT
100 Jeon, Hong, & Chang 2018 TECH X X NN
101 Gocken, Ozcalici, & Boru 2019 TECH X X X NN
102 Vantstone, Gepp, & Harris 2019 SENT X NN
103 Bisoi, Dash, & Parida 2019 TECH X X SVM
104 Liu, & Wang 2019 T/S X NN
105 Rundo, Trenta, & Di Stallo 2019 TECH X NN
Note. ML: Machine Learning, NN: Neural Network, STAT: Statistical Model, EA: Evolutionary
Algorithm, RF: Random Forest, SVM: Support Vector Machine, KNN: K Nearest Neighbour, TECH:
Technical, SENT: Sentiment-based, FUND: Fundamental, T/S: Technical & Sentiment, T/F: Technical &
Fundamental
The first decision researchers must make when attempting to create a stock prediction model is

what type of inputs the model will utilize. The three input categories are (1) technical

analysis/indicators, (2) fundamental analysis/indicators, or (3) sentiment-based analysis. Technical

indicators are the most popular type of input in academic research on stock market prediction.

These indicators are defined as the result of mathematical and statistical techniques applied to

historic time series data including stock price, volume, and in some rare cases interest information.

This is in contrast to fundamental analysis which is less popular in academia but is utilized by

financial analysts daily. The reason for this could be that in contrast to technical analysis which is

derived from the movement of the stock price, fundamental analysis must incorporate various

factors such as economic forecasts, efficiency of management, business opportunities, and

financial statements (Wafi, Hassan, & Mabrouk, 2015). Fundamental analysis can be defined as

the method of trying to find the intrinsic value of a stock based on financial analysis. This analysis

incorporates macroeconomic factors, industry factors, and company reports and releases such as

tax and quarterly reviews. The last type of analysis is a sentiment-based analysis. Sentiment

analysis can be defined as analysis based on linguistic feature extraction to extract subjective

information from written material. The relative lack of popularity of sentiment analysis has been due to the difficulty of developing reliable and efficient sentiment analysis tools; this stems from both design complexity and relevant source selection, which is of utmost importance. As exhibited in Table 1, it is also possible to use a combination of these

categories.

Another important decision is to decide about the model’s output(s). This can either be stock price,

stock direction, index price, or index direction. Stock price refers to individual stock price data.

This data is a numerical value which the model believes the stock will be priced at in the near

future. The stock direction output is the direction the model believes a stock’s price will move: either up or down. The difference between a stock and an index is that rather than being

data pertaining to an individual stock, an index measures a section of the stock market via the

combining of price data from multiple stocks. Some popular indices are S&P 500, and Dow Jones

Industrial Average. Based on the literature, it is evident that individual stock outputs are drastically more popular than index-based computations. Lastly, one of the most important decisions in stock price

prediction is to select the model. As seen in Table 1, machine learning techniques, artificial neural

networks, traditional statistical methods, and evolutionary algorithms are used in different studies.

The column labeled “MODEL” displays the main model type.

As previously discussed, the focus of stock price prediction research is the method utilized.

Therefore, it is important to have a full understanding of the logic that drives each model, the relevant recent research that has utilized it, and its advantages and disadvantages. A brief overview of the models covered in the literature review is

presented in the next section.

2.1. Stock Price Prediction Models

2.1.1. The K Nearest Neighbor (KNN) Algorithm

The KNN algorithm is a supervised machine learning algorithm (a model that utilizes both input

and output data when learning) used for classification and regression. The primary

operation is identifying the closest neighbors to the data point being queried. Then, if the task is

classification, the most frequently occurring neighbor value is returned and used as the output value. If

regression is the objective, then the average of all neighbor values is returned and used as the

output value. More specifically, K nearest neighbors finds the distance between the data point being

investigated and all the other examples in the data, then the specified (denoted as K) number of

neighbors closest to the query are selected. After this, depending on whether the goal is regression

or classification, the output value of the algorithm is either the mean of the value of the neighbors

or the mode, respectively.

The prediction of the stock market closing price is computed using KNN as follows: (a) determine the number of nearest neighbors, K; (b) compute the distance between the training samples and the query record; (c) sort all training records according to the distance values; and (d) use a majority vote for the class labels of the K nearest neighbors and assign it as the prediction value of the query record (Chen & Hao, 2017).
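These steps can be sketched in a few lines of Python. This is only an illustrative sketch with made-up sample data and hypothetical function names, not the implementation used in any of the cited studies:

```python
import numpy as np

def knn_predict(train_X, train_y, query, k, task="regression"):
    """Predict a value for `query` from its k nearest training samples."""
    # Step (b): distance between the query record and every training sample.
    dists = np.linalg.norm(train_X - query, axis=1)
    # Step (c): sort training records by distance and keep the k closest.
    neighbor_vals = train_y[np.argsort(dists)[:k]]
    if task == "regression":
        # Regression output: the mean of the neighbors' values.
        return neighbor_vals.mean()
    # Step (d): classification output: majority vote among the neighbors.
    vals, counts = np.unique(neighbor_vals, return_counts=True)
    return vals[np.argmax(counts)]

# Hypothetical data: two indicator features per day, closing price as target.
X = np.array([[1.0, 2.0], [1.1, 2.1], [5.0, 5.0]])
y = np.array([10.0, 12.0, 30.0])
print(knn_predict(X, y, np.array([1.05, 2.05]), k=2))  # mean of 10 and 12 -> 11.0
```

In a stock price setting, `train_X` would hold lagged prices or technical indicators and `train_y` the next day's closing price.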

Based on the survey of recent stock price prediction literature, this method is used the least out of

the eight most commonly used methods. It is safe to say in recent years stock price prediction

research focused primarily on KNN has died down. This is most likely due to KNN being a very

simple method that often performs poorly compared to newer machine learning approaches. Some recent examples of this phenomenon are demonstrated by Shynkevich et al. (2016), whose results

showed their KNN news sentiment-based stock prediction algorithm was consistently less accurate

when compared to Support Vector Machine (SVM), a more complex machine learning algorithm.

Shynkevich et al. (2017) also investigated the impact of forecasting window length on stock price

prediction machine learning algorithms. Their results showed that the predicted effect of forecasting window length held for artificial neural networks (ANN) and Support Vector Machines (SVMs). However, regarding KNN, the results showed that “The prediction performance of the

KNN approach is low, the pattern is still visible however its occurrence is significantly affected

by the (comparatively) low performance”.

2.1.2. Random Forests

Random forests (RF) is a machine learning algorithm that utilizes the aggregate analysis power of

a large number of individual decision tree algorithms. Decision trees can be defined as any model in which each node within the tree aims to split the observed data points into subgroups that are as different from each other as possible, while making the members within each subgroup as similar to each other as possible (Athey, Tibshirani, & Wagner, 2019). This type of decision tree employed in

large numbers results in a random forest. Random forest is based on the simple principle that a

large number of relatively simple models that exhibit low correlation operating as a joint group

will outperform any of the individual models (Scornet, Biau, & Vert, 2015).

The overarching objective when designing a random forest is to minimize correlation between the trees in the forest. This can be accomplished by performing bootstrap aggregation, a method of resampling with replacement, and by ensuring feature randomness when constructing all individual trees. If a

random forest is created successfully then the prediction done by all trees communally should have

greater accuracy than that of any individual tree. The reason this method is successful is that, as long as the trees do not all share the same bias, the members can protect each other from their individual errors. Another critical success factor for random forests is that the selected features need to carry a proven signal, so that the trees relying on those features are not merely guessing.
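The two de-correlation ingredients described above, bootstrap resampling of the rows and per-tree feature randomness, can be illustrated with a toy forest of one-split "stump" learners. This is a minimal numpy sketch of the principle on hypothetical data, not a production random forest:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, feat):
    """A one-split regression 'tree': threshold one feature at its median."""
    thr = np.median(X[:, feat])
    mask = X[:, feat] <= thr
    left = y[mask].mean()
    right = y[~mask].mean() if (~mask).any() else left
    return feat, thr, left, right

def predict_stump(stump, X):
    feat, thr, left, right = stump
    return np.where(X[:, feat] <= thr, left, right)

def fit_forest(X, y, n_trees=50):
    forest = []
    n, d = X.shape
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # bootstrap: resample rows with replacement
        feat = rng.integers(0, d)          # feature randomness: one random feature per tree
        forest.append(fit_stump(X[rows], y[rows], feat))
    return forest

def predict_forest(forest, X):
    # The communal prediction averages over all trees in the forest.
    return np.mean([predict_stump(s, X) for s in forest], axis=0)

# Hypothetical data: the target depends on both features plus noise.
X = rng.normal(size=(200, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
forest = fit_forest(X, y)
preds = predict_forest(forest, X)
```

Because each stump sees a different bootstrap sample and a different feature, their errors are only weakly correlated, so the averaged prediction tracks the target better than most individual stumps.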

Lee, Surdeanu, MacCartney, and Jurafsky (2014) examined whether textual data improves stock

price prediction when employing random forests to train their models. Their proposed random

forest was made up of 2000 trees and was successful in training all the models tested. Their results

illustrate that the incorporation of textual data improved next day price movement prediction by

10%. Similarly, Weng, Ahmed, and Megahed (2017) examined the effectiveness of incorporating

textual data as well as technical data into stock price prediction. Applying decision trees was one

of the tested methods alongside SVM and NN. They also concluded that textual data incorporation

can improve prediction results. More recently, Zhang, Cui, Xu, Li, and Li (2018) utilized random forests as a critical part of training “Xuanwu”, a proprietary stock prediction model. The authors

concluded that the relatively effective performance of their prediction model in terms of accuracy

and returns is due to the incorporation of random forests as one of the integrated models used as a

learning method.

Patel and Kotecha (2015) combined random forests and Support Vector Regression (SVR) and

showed that the hybrid method performs better than applying each method individually. Patel,

Shah, Thakkar, and Kotecha (2015) compared RF to ANN and SVM and found that RF outperforms the others. However, it is important to note that the applied ANN was very simple, with only one hidden layer containing 2 neurons. This is below the optimal ANN architecture and may explain the relatively poor performance of the ANN used.

2.1.3. Fuzzy

A prominent theory that has been applied to a wide variety of problems is the Fuzzy set theory,

first developed by Zadeh (1965). Fuzzy set theory is built upon the logic that elements may both

belong, and not belong, to the exact same set at certain levels. This essentially states that the

membership is in the interval [0,1]. Fuzzy Logic can define systems in both numeric and linguistic

terms. Fuzzy logic-based research has stated that this adds robustness to the method since it is not

purely numerical or purely symbolical, as is sometimes the case with systems in real life. Purely

statistical models are often dependent on relatively normal data distributions, large sample sizes,

and no uncertain data, to have the best chance at accurate forecasting with minimum bias.

However, this is not the case with fuzzy-based systems as they outperform simpler statistical

methods when dealing with small biased samples (Piterbarg, 2011).

Unlike most of the more successful numerical models, it took quite some time before fuzzy logic was utilized to create forecasting models for time-series data. Song and Chissom (1993) were the first to propose a generic fuzzy logic-based time series forecasting framework. The work by Wang

(2002) was a seminal paper in fuzzy logic being applied to the stock market. He classified the

stock price into 6 different fuzzy categories based on time and price behavior. Sun, Guo, Karimi,

Ge, and Xiong (2015) developed a multivariate fuzzy time series model with multiple factors. To

simplify the calculations and refine the rules, the model incorporates rough set theory, which enables it to process large amounts of data. Rough sets can essentially be considered less

precise fuzzy sets for the sake of computational efficiency. This utilization of rough sets to simplify

the calculations was originally employed by Wang (2003) and later by Chen and Yang (2018).

Recently, three studies have utilized a promising and interesting fuzzy logic inspired framework,

a Neuro-Fuzzy Inference System (Umoh and Inyang, 2015; Dash and Dash, 2016; Mahmud and

Meesad, 2016). A neuro-fuzzy system is a neural network that heavily utilizes fuzzy logic techniques. A good summary of how fuzzy techniques are utilized by neural networks is given by Karaboga and Kaya (2018), who state: "The first layer takes the input values and determines

the membership functions belonging to them. It is commonly called the fuzzification layer. The

membership degrees of each function are computed by using the premise parameter set, namely

{a,b,c}. The second layer is responsible for generating the firing strengths for the rules. Due to its

task, the second layer is denoted as the "rule layer". The role of the third layer is to normalize the

computed firing strengths, by dividing each value by the total firing strength. The fourth layer takes

as input the normalized values and the consequence parameter set {p,q,r}. The values returned by

this layer are the defuzzificated ones and those values are passed to the last layer to return the final

output”. The advantage of these proposed systems is that, due to the ANN structure and learning

ability, these models are powerful at tackling a wide variety of problems while utilizing fuzzy if-then rules to incorporate human-like reasoning.
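As an illustration of graded membership, the sketch below uses a triangular membership function; the function shape and the "high price" example are illustrative assumptions rather than details from the cited models:

```python
def triangular_membership(x, a, b, c):
    """Degree in [0, 1] to which x belongs to a fuzzy set with a
    triangular membership function: 0 at a, peaking at 1 at b, 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "high price" peaking at $150:
# a $140 price partially belongs to the set rather than being in or out.
print(triangular_membership(140, 100, 150, 200))  # 0.8
print(triangular_membership(150, 100, 150, 200))  # 1.0
```

A crisp (classical) set would return only 0 or 1 here; the fractional value 0.8 is exactly the [0,1] membership degree described above.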

2.1.4. ARIMA

One of the more popular methods of stock price prediction is utilizing the simple stochastic time

series model (called ARIMA), which is commonly referred to as the Box-Jenkins model in the

finance industry. ARIMA stands for Auto-Regressive Integrated Moving Average. ARIMA is a

model that is solely intended to be trained to forecast time series data (Babu & Reddy, 2014).

ARIMA is a generalized random walk model which is fine-tuned to eliminate all residual autocorrelation. Autocorrelation is the correlation of a time series with lagged values of itself. ARIMA can also be considered a generalized exponential smoothing model that can incorporate long-term trends and seasonality (Kumar & Anand, 2014).

The way ARIMA works is by capturing complex relationships via the utilization of lagged term

observations. The acronym ARIMA provides a good overview of the unique techniques the model

utilizes. This is due to the acronym itself describing the critical parameters of the model: “AR”

stands for autoregression. In the context of ARIMA, AR is how much the model utilizes the

dependent relationship between a time series value and other lagged values in the same time series.

AR(x) means x lagged values of the series are going to be used in the ARIMA model. “I” stands for

integrated. In the context of ARIMA, integrated refers to the technique of differencing (subtracting

a time series value from a previous value) so that a non-stationary time series (a time series that

does not have a constant mean, variance, and autocorrelation) can be utilized to create a more

stationary time series (ideally removing all seasonality) that will be easier to analyze. “MA” stands

for moving average. MA captures the dependency between an observation and the residual errors from a moving average model applied to lagged observations. The common notation is “MA(x)”, where x represents the number of lagged forecast errors used to calculate the current observation.
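As a minimal sketch of the "I" and "AR" components (not a full ARIMA estimator), the snippet below differences an invented price series once and fits an AR(1) coefficient by least squares:

```python
def difference(series, d=1):
    """The 'I' step: difference the series d times to remove trend."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def fit_ar1(series):
    """A minimal 'AR' step: least-squares slope of x_t on x_{t-1},
    i.e. a single AR(1) coefficient."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

prices = [100, 102, 101, 104, 103, 106, 105, 108]  # invented series
diffed = difference(prices)        # [2, -1, 3, -1, 3, -1, 3]
phi = fit_ar1(diffed)              # estimated AR(1) coefficient
forecast_price = prices[-1] + phi * diffed[-1]
```

A full ARIMA(p, d, q) fit would jointly estimate p autoregressive and q moving-average terms; this sketch only shows how differencing produces a more stationary series and how a lagged dependency is estimated from it.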

Recently, the majority of stock price prediction research involving ARIMA has focused on

either incorporating ARIMA into another deep learning model or as a means of comparison. A

very early and prominent example of this is the work of Pai and Lin (2005), which created and

tested a hybrid ARIMA and SVM model. The authors recognized that ARIMA was on the decline

in terms of popularity and demonstrated that ARIMA could still be used to improve machine

learning models. The results showed that the proposed hybrid ARIMA SVM model outperformed

both the standalone ARIMA and standalone SVM models. Adebiyi, Adewumi, and Ayo (2014) compared ARIMA against a three-layer ANN and found that the ANN outperformed ARIMA in the vast majority of cases. When their ARIMA predictions were graphed, it became evident that the output followed a nearly linear trajectory rather than tracking actual values. Similarly, Chong, Han, and Park

(2017) found ANNs to drastically outperform a benchmark autoregressive model.

2.1.5. Regression

Statistical methods have been employed widely for a long time. However, there has been a massive

surge in more complex statistical methods since the latter half of the 20th century. This rise in

popularity can be attributed to the massive increase in accessible computing power. The

widespread availability of powerful computers has shifted the focus in statistical science from

traditional simple linear models to more nonlinear models, generalized linear models, and also

multi-level models (Al-Jarrah, Muhaidat, Karagiannidis, & Taha, 2015).

Some examples of the computationally demanding statistical methods that can be practically

utilized to their full potential but would not have been possible before include permutation tests,

bootstrapping, and various other resampling-based methods. However, the rapid rise in the latter

half of the 20th century has slowed down to a near halt. The focus has now shifted to even more

robust and extremely complex non-linear models (Jordan & Mitchell, 2015). These models are

referred to as machine learning methods and not statistical models due to the lack of interpretability

in the former.

However, this does not mean that non-machine learning statistical methods have completely

disappeared from the stock price prediction literature. Along with the major stock prediction

methods covered in the literature review, a few papers have employed traditionally statistical

methods to predict financial markets. The most common is linear regression-based prediction. Due to the sheer amount of research that has been conducted utilizing linear regression, no groundbreaking developments are being made in linear regression stock price prediction research.

recent example is the work by Jiang, Chen, Nunamaker, and Zimbra (2014) that visited company-

specific internet forums, divided users into stakeholder groups, and then analyzed how stakeholder

group postings correlated with events in the company and if that can be used to predict movements

in stock price. This data was incorporated as independent variables into their analysis. Their study

concluded that the inclusion of stakeholder sentiment can improve stock price prediction

performance.

2.1.6. Bayes

When discussing statistical models, a very familiar name in statistics literature has to be mentioned

and that is Bayes. Bayesian logic derived from Bayes’ theorem is a decision-making framework

utilized throughout inferential statistics. The defining characteristic of Bayesian logic is that it

quantifies uncertainty and deals with probabilistic outcomes rather than certain ones (Van de

Schoot et al., 2014). First introduced in the 18th century, Bayesian logic has become a fundamental

part of statistical analysis. As will be discussed later in this section, the application of Bayesian

logic to stock price prediction has been drastically influenced by the emergence of machine

learning techniques.
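Bayes' theorem itself is compact enough to show directly; the numbers in the example below are invented purely for illustration:

```python
def bayes_update(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
    Returns the posterior probability of the hypothesis given evidence."""
    return likelihood * prior / evidence

# Illustrative (assumed) numbers: a stock rises on 55% of days (prior),
# positive news precedes 70% of up days, and positive news appears on
# 60% of all days. The posterior quantifies the updated uncertainty.
posterior = bayes_update(prior=0.55, likelihood=0.70, evidence=0.60)
# posterior = 0.70 * 0.55 / 0.60, roughly 0.64
```

The output is a probability rather than a certain outcome, which is precisely the "quantified uncertainty" that distinguishes Bayesian reasoning.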

Based on the review of literature, it is evident that research focusing on Bayesian logic being

applied to stock price prediction has decreased in popularity. One of the most prominent papers is

by Tan, Wang, Wang, and Zhao (2015). Their study employed the use of Bayesian factor graphs.

Bayesian factor graphs can be summarized as a method involving finding the optimal structure of

a directed acyclic graph that represents the joint probability distribution for a set of factors. Tan et

al. utilized Bayesian factor graphs to study the impact of macroeconomic events on stock market

indexes. Due to this being a new application of Bayesian factor graphs, they concluded that this early prototype required greater statistical and theoretical grounding. In addition, they stated that the development and design of Bayesian factor graphs are extremely difficult. This raises the

question of why one would utilize this method over other machine learning methods that are more

powerful and robust.

Groth and Muntermann (2011) examined the ability of various machine learning algorithms to

improve stock price prediction utilizing textual data sentiment analysis. After showing that events reported in the news and media can, to a significant degree, create stock price volatility, the authors employed various machine learning algorithms to detect patterns in articles that could

predict abnormal increases in risk and volatility stemming from the publication and consumption

of the news and media. One of the models employed was a Naïve Bayes model. Essentially, a Naïve Bayes model incorporates probabilistic Bayesian logic while treating each data point as independent rather than as part of a series; in the context of their study, this meant treating each word in a sentence as independent of the others. The authors concluded that, as most would expect, Neural

Networks and Support Vector Machines drastically outperformed Naïve Bayes. However, more

notably, it was shown that k-Nearest Neighbor also outperformed the Naïve Bayes algorithm,

further suggesting that Naïve Bayes’ decline in prominence as a primary focus of stock price prediction research is justified.
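The conditional-independence assumption described above can be sketched as a minimal word-count Naïve Bayes classifier; the four training "headlines" and their labels are invented for illustration:

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (word_list, label). Returns class priors,
    per-class word counts, and the vocabulary."""
    priors, counts, vocab = Counter(), {}, set()
    for words, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return priors, counts, vocab

def classify_nb(words, priors, counts, vocab):
    """Score each label with log P(label) + sum of log P(word | label),
    treating every word independently (the 'naive' assumption)."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for label, n_docs in priors.items():
        n_words = sum(counts[label].values())
        score = math.log(n_docs / total)
        for w in words:
            # Laplace smoothing so an unseen word does not zero the product
            score += math.log((counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("profits surge record".split(), "up"),
        ("strong earnings surge".split(), "up"),
        ("losses mount layoffs".split(), "down"),
        ("shares slump losses".split(), "down")]
priors, counts, vocab = train_nb(docs)
print(classify_nb("earnings surge".split(), priors, counts, vocab))  # up
```

Each word contributes its probability independently; no word order or sequence information survives, which is exactly why sequence-aware models tend to outperform Naïve Bayes on text.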

2.1.7. Principal Component Analysis (PCA)

PCA is a dimensionality reduction method which attempts to transform a large number of variables into a smaller set while preserving most of the uniquely defining characteristics of the original, noisier, larger set. By

definition, a reduction in the number of variables of a data set will result in a reduction of accuracy.

However, when done effectively, this reduction in accuracy should result in simplifications

(Jolliffe & Cadima, 2016). Increased simplicity makes data sets easier to analyze and visualize, essentially preparing the data set for easier processing by machine learning algorithms. How to effectively and consistently balance simplicity and accuracy is a central challenge in dimensionality reduction research.

PCA reduces dimensionality by utilizing covariance matrix calculations. Principal Components

are linear combinations, or mixtures, of the original data set’s variables (Bro & Smilde,

2014). The configuration of the various combinations is done to ensure that the new variables are

not correlated. The goal is to fit as much information from the initial variables as possible into the new principal components. So, a ten-dimensional data set will result in ten principal components, each capturing a descending share of the original data set’s information: the first will contain as much information as possible, the second will contain as much of the remaining information as possible, and so on. The way this is applied to stock

price prediction is that the stock price time-series data, along with all technical and fundamental indicators, go through PCA to create principal components that capture the variance of the original data while filtering out noise, so that they should theoretically be good predictors of the stock price (Zahedi &

Rounaghi, 2015).
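For the two-dimensional case, the covariance-matrix construction described above can be carried out by hand, since a 2x2 symmetric matrix has a closed-form eigen-decomposition; the toy data below are invented for illustration:

```python
import math

def pca_2d(points):
    """First principal component of 2-D data via its covariance matrix.
    Assumes the two variables are actually correlated (sxy != 0)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # sample covariance matrix entries [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # largest eigenvalue of the 2x2 covariance matrix (closed form)
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt((tr / 2) ** 2 - det)
    # corresponding eigenvector, normalized to unit length
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Strongly correlated toy data: the first component points near (1, 1),
# the direction along which the data varies the most.
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
vx, vy = pca_2d(pts)
```

Projecting each point onto this direction yields the first principal component score; with ten variables the same idea applies, just with a 10x10 covariance matrix solved numerically.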

Recently, Guo et al. (2015) utilized a two-dimensional PCA to reduce the dimensionality of a data

set to be utilized by a neural network for stock price forecasting. The results showed that the

proposed two-dimensional PCA reduced data dimensionality more effectively than traditional

PCA. This is similar to Wang and Wang (2015), who utilized PCA to extract the principal

components from their data set and then utilized an ANN variation for stock prediction. This

application of PCA in combination with ANNs has proven to be relatively effective and may be the direction that most PCA research takes in the near future.

2.1.8. Candlestick Analysis/K-Line

Candlestick analysis, known in Asia as K-line pattern analysis, is a stock prediction method relying on the recognition of patterns in stock price time series and determining indicators for when a pattern will reoccur (Tsai & Quan, 2014). This method is widely utilized throughout the

industry as a basic analysis due to the less technical/numerical nature of the method. Candlestick

analysis is utilized across all levels of traders from hedge fund managers to individual day traders.

This simplicity may be the reason for the lack of attention to this method from academics.

Recently, Hao, Hao, Shen, and Tao (2017) set out to explicitly find “whether K-line patterns have

predictive power in academia”. They utilized various pattern recognition methods to test the

effectiveness of K-line analysis. The study concluded that the first barrier to academic acceptance is that there are no concrete, formal definitions of K-line patterns. This lack of formal definition is a significant obstacle to K-line analysis’s wider acceptance and casts doubt on whether additional research should be conducted in this direction.

2.1.9. Hidden Markov model (HMM)

A Hidden Markov model (HMM) is a finite state machine used to analyze a system that is assumed

to be a Markov chain with unobservable states. In a Markov chain, the next state in the process is

solely dependent on the immediately previous state and not on a sequence of states. HMMs have

proven to be powerful frameworks that can be utilized to analyze time series involving multivariate

data points. HMMs calculate the joint probability of a set of states that are hidden based on the

analysis of a set of states that have been observed. Initially, in the 1970s, HMMs were utilized to

improve speech recognition, where they have proven to be immensely successful. Currently,

HMMs are applied to a wide variety of problems ranging from speech recognition to tumor

recognition. However, the application to stock price prediction has not garnered as much attention relative to other methods. Nguyen (2018) is a recent study

that further builds on the work of Nguyen (2014) and Nguyen and Nguyen (2015). Nguyen used

technical indicators as inputs into an HMM for stock price prediction. The study determined

through testing that the five-state model performs best. The five-state HMM also outperformed a

basic historical average return model. However, this study did not compare results with any other machine learning stock price prediction models. Due to the nature of HMMs, model design

decisions regarding the number of states can alter the performance significantly. Therefore, a

potential future direction of HMM stock price research may be to standardize specific HMM

architectures.
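The joint probability calculation described above can be sketched with the standard forward algorithm; the two-state "bull/bear" setup and every probability below are illustrative assumptions, not values from the cited studies:

```python
def hmm_forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of an observed sequence under an
    HMM, summing over all possible hidden state paths."""
    # base case: P(first observation, starting state)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        # Markov property: each step depends only on the previous state
        alpha.append({
            s: emit_p[s][o] * sum(prev[r] * trans_p[r][s] for r in states)
            for s in states
        })
    return sum(alpha[-1].values())

# Hypothetical two-state market HMM with "up"/"down" daily observations.
states = ("bull", "bear")
start_p = {"bull": 0.6, "bear": 0.4}
trans_p = {"bull": {"bull": 0.8, "bear": 0.2},
           "bear": {"bull": 0.3, "bear": 0.7}}
emit_p = {"bull": {"up": 0.7, "down": 0.3},
          "bear": {"up": 0.2, "down": 0.8}}
p = hmm_forward(["up", "up", "down"], states, start_p, trans_p, emit_p)
```

The hidden regime ("bull" or "bear") is never observed directly; only its emissions ("up"/"down" days) are, which is exactly the observable/hidden state split described above.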

2.1.10. Geometric Brownian Motion

Similarly, another statistical-based model that also utilizes Markovian logic is the Geometric

Brownian Motion Model (GBM). GBM is modeled after Brownian motion (the random movement

of colliding particles suspended in fluids). Brownian motion is considered a classic example of a

Markov process. According to the geometric Brownian motion model, the future price of financial

stocks is determined based on log-normal probability distribution (continuous probability

distribution of a random variable whose logarithm is normally distributed). This log-normal

probability distribution assumption is the main assumption in the Black-Scholes model. Black-

Scholes is used to determine the fair price of a call or put option. Recently, the popularity of GBM

has stagnated when compared to other stock price prediction methods. Based on the literature review, most GBM research shows that while GBM is fairly accurate, it does not present any

significant advantages over the more popular stock price prediction methods. Two examples of this are Agustini, Affianti, Farida, and Putri (2018) and Parungrojrat and Kidsom (2019).
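The log-normal price dynamics described above can be simulated in a few lines; the starting price, drift, and volatility below are invented for illustration:

```python
import math
import random

def gbm_path(s0, mu, sigma, days, seed=42):
    """Simulate one geometric Brownian motion price path with daily steps.
    Each step multiplies the price by exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z)
    with Z standard normal, so log-returns are normally distributed and
    prices remain log-normally distributed (and always positive)."""
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day, in years
    prices = [s0]
    for _ in range(days):
        z = rng.gauss(0, 1)
        step = math.exp((mu - 0.5 * sigma ** 2) * dt
                        + sigma * math.sqrt(dt) * z)
        prices.append(prices[-1] * step)
    return prices

# Assumed parameters: $100 start, 8% annual drift, 20% annual volatility.
path = gbm_path(100.0, mu=0.08, sigma=0.20, days=252)
```

Note the multiplicative update: because each factor is an exponential, a simulated GBM price can never go negative, which is one reason the model (and Black-Scholes on top of it) adopts the log-normal assumption.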

2.1.11. Support Vector Machines

A very promising model type in the world of machine learning is Support Vector Machines (SVM).

SVMs perform a relatively intuitive and straightforward task: they separate data into distinct classes via a decision boundary and then maximize the margin. The

term margin when used in the context of support vector machines refers to the perpendicular

distance between the decision boundary and the data points closest to this boundary. What makes SVMs so effective is that they classify based on extreme examples, such as a cat that looks like a dog. These extreme examples become the support vectors: the points from which the distance to the decision boundary is maximized. This maximization is simply a quadratic optimization problem, and it is straightforward when the data is linearly separable.
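The margin definition above can be computed directly; the weight vector, bias, and data points below are assumed toy values rather than the output of a trained SVM:

```python
import math

def margin(w, b, points):
    """Margin of a linear decision boundary w.x + b = 0: the smallest
    perpendicular distance |w.x + b| / ||w|| over all data points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x in points)

# Toy 2-D setup: boundary x + y - 3 = 0 (w and b are assumptions).
w, b = (1.0, 1.0), -3.0
points = [(0, 0), (1, 1), (2, 4), (4, 4)]
# The closest point, (1, 1), lies at distance 1/sqrt(2), about 0.707;
# that point would be a support vector for this boundary.
print(round(margin(w, b, points), 3))  # 0.707
```

Training an SVM amounts to choosing w and b so that this minimum distance is as large as possible, which is the quadratic optimization problem mentioned above.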

As mentioned before, stock price data is extremely nonlinear. The way support vector machines

combat this is by projecting the data points into a higher dimensional space by performing a

function on the data to make the classes linearly separable. The downside to this is that it is

extremely computationally expensive. This is why researchers often utilize the kernel trick: optimized kernel functions that efficiently compute similarities as if the data were mapped into a higher dimensional space (Bisoi, Dash, & Parida, 2019). Then the same optimization process takes place. The

selection of which kernel to utilize is a critical design decision for SVMs. There are multiple

examples of SVM being utilized for stock price prediction (Fenghua, Jihong, Zhifang, & Xu, 2014;

Lahmiri, 2016). Interestingly, there is also a good amount of research that has found success when

incorporating textual data into their SVMs for stock prediction (Hagenau, Leibmann, & Neumann,

2013; Chen, Deng, Wang, & Xie, 2014; Nguyen, Shirai, & Deng, 2015).
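As a sketch of the kernel idea, the widely used RBF kernel computes a similarity equivalent to an inner product in a much higher-dimensional space without ever performing the mapping; the gamma value below is an arbitrary assumption, and the cited works do not necessarily use this kernel:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    Acts as an inner product in an implicit high-dimensional space,
    so the expensive explicit projection is never computed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points score exactly 1; distant points decay toward 0.
print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))  # 1.0
print(rbf_kernel((1.0, 2.0), (4.0, 6.0)))  # near 0
```

Swapping this function into the SVM's optimization in place of the plain dot product is the entire "trick": the separating surface becomes nonlinear in the original space at little extra cost.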

2.1.12. Support Vector Regression

Similar to SVM, Support Vector Regression utilizes the same components and techniques, but

instead of classifying, the mathematical operations are tasked with regressing the data points. Li,

Huan, Deng, and Zhu (2014) utilized a multiple kernel SVR to test whether or not the inclusion of

news articles alongside technical indicators improves the predictive power of the SVR model. The

reasoning the researchers presented for utilizing multiple kernels in their study was to have one

kernel dedicated to increasing the dimensionality of news sentiment data and the other kernel

dedicated to increasing the dimensionality of the Hong Kong stock exchange tick data. The results

showed that the multiple kernel SVR outperformed a normal SVR as well as a Naïve Bayes-based

model. What was also noteworthy is that the stock price prediction was improved when

incorporating news data. However, a major takeaway is that the experiment concluded that incorporating a greater number of news sources did not improve the model. Instead,

the focus should be on the selection of an effective approach that can appropriately incorporate

and integrate multiple sources into one.

One study utilized a model in which the stock price predictor aspect was an SVR based proprietary

model developed by a private company. Schumaker, Zhang, and Huang (2012) utilized the AZfintext qualitative stock price predictor, which employs regressed stock quotes and financial news article data as inputs into an SVR algorithm for stock price prediction. The main selling point of AZfintext is its focus on incorporating financial news data within 20 minutes of publication. The purpose of

the research was to test if incorporating sentiment-based analysis (authors tone) into the AZfintext

system would improve stock direction prediction accuracy. The results showed that the

incorporation of sentiment analysis into the AZfintext system did not improve overall prediction

accuracy. A limitation of this research’s applicability to the broader body of stock price prediction is AZfintext’s 20-minute time limit for analyzing and incorporating news articles into its

model.

2.1.13. Artificial Neural Networks

Artificial Neural Networks (ANNs) experienced an increase in popularity similar to the one traditional statistical methods went through in the 20th century. Both booms were due to improved

computational power. The structure of ANNs is modeled after the structure of the neurons found

in the brain. The defining characteristic of neurons, which has led to the creation of the most popular machine learning model, is that the neurons in the brain are all interconnected. Specifically, each

neuron can be “triggered” by other neurons, while also being able to trigger additional neurons

(Gurney, 2014). This concept of triggering has led to the creation of what we now know as ANNs.

The structure of modern-day ANNs consists of 3 distinct types of layers: the input, the output, and

a hidden layer. The input layer sends data to the hidden layer, which sends data to the output layer,

which then gives the desired output. Each layer is composed of individual neurons, and the layers are connected, meaning that the output of one layer becomes the input of the next layer.

Each connection has a weight assigned to it. This weight determines the influence the output of

one neuron has as the input to the neuron in front of it. The term “learning” when it comes to ANNs

means the calculated adjustments that are applied to the weights of these connections (Nielsen,

2015). While this is a higher-level overview of ANN architecture, a more detailed description and

explanation of the technical and mathematical operations occurring in a neural network will be

provided in chapter 4.
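The layer-by-layer computation described here can be sketched as a single forward pass; all weights and biases below are hypothetical, chosen only to make the example concrete:

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a 3-layer network: input -> hidden -> output.
    Each hidden neuron weighs its inputs, adds a bias, and applies tanh;
    the hidden outputs then become the inputs of the output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output neuron.
w_hidden = [[0.5, -0.2], [0.1, 0.4]]  # one weight per connection
b_hidden = [0.0, 0.1]
w_out = [1.0, -1.0]
b_out = 0.05
y = forward([0.3, 0.7], w_hidden, b_hidden, w_out, b_out)
```

"Learning", as described above, would consist of nudging each entry of `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce the error between `y` and a target value; the forward pass itself stays exactly this simple.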

The first ANN was developed by McCulloch and Pitts in 1943. However, the perception of ANNs was not at all what it is today. It was not until the 1990s that ANNs started to be able to solve

relatively complex problems such as recognition of numbers written by hand. However, this was

still not the apex moment for ANN research as other early machine learning methods were still

outperforming ANNs. It was not until the early 2000s that there was a massive increase in the availability of data, which coincidentally was the same time that the full potential of parallel computing on graphics processing units (GPUs) was realized. This led to an increase in the

utilization of ANNs. The defining moment which can be credited as the cause of the modern-day

fascination with ANNs was the creation of AlexNet. AlexNet was the brainchild of Geoffrey

Hinton, Alex Krizhevsky, and Ilya Sutskever, who won ImageNet 2012, an annual image

classification competition, utilizing their ANN. The three Canadian researchers successfully

demonstrated that a deep convolutional neural network with 60 million parameters, 650,000

neurons, and 5 convolutional layers could successfully classify 1.2 million images into 1000

different categories with an error rate of only 16.4%. The training for this extremely deep neural

network was done using parallel processing on GPUs. This record-breaking event launched ANN

into the public eye, and now ANNs are considered the face of machine learning and artificial

intelligence.

The first few notable applications of ANN towards stock market prediction were (Ahmadi, 1990;

Kamijo & Tanigawa, 1990; Kimoto, Asakawa, Yoda, & Takeoka, 1990; Trippi & Desieno, 1992; Choi, Lee, & Rhee, 1995). More recently, based on the review of literature, it is evident that ANNs

are the most popular and widely utilized method for modern-day stock price prediction research.

The reason for this is that they have proven themselves as a practical and powerful stock price

prediction method. Papers such as (Moghaddam, Moghaddam, Esfandyari, 2015; Di Persio,

Honchar, 2016; Kraus, Feuerriegel, 2017; Chi, Wang, Zhang, Sun, Xu, 2018; Hiransha, Vijay,

Krishna, 2018) have taken a standard model testing approach to the various ANN architectures

possible. It is important to note that based on this literature review and the larger body of machine

learning research, no single neural network structure has proven to be ideal, or even significantly better suited, for stock price prediction.

Sun, Shen, and Cheng (2014) utilized data on the movement of stocks to analyze whether or not

trading behavior can be mapped and used to predict stock prices. For each individual stock, the

stock trading activities were analyzed, and a network was mapped. Then, the trading relationships

were classified and grouped into appropriate categories. The researchers then utilized a Granger causality analysis to show that stock prices were indeed correlated with the different trading

categories. To test the trading predictability power, a simple 3-layer feed-forward ANN was

utilized. The ANN incorporated technical indicators as well as the trading indicators. The results

show that the ANN performed very well. This result can be considered relatively intuitive because it is well known that traders are influenced by the activities of other traders.

Geva and Zahavi (2014) tested whether market data, simple news item counts, business events,

and sentiment scores, could improve various machine learning stock price prediction algorithms.

The models tested were ANNs, Decision Trees, and a basic regression. The results showed that

among the algorithms tested, only the ANN was able to fully exploit the more intricate nature of

the proposed sentiment/news inputs. The other models could not take advantage of these inputs, as they failed to capture the intricate relationship between price and the sentiment/news indicators.

Furthermore, another study incorporating sentiment data from news and microblogs is the work of

Zhuge, Xu, and Zhang (2017). The researchers utilized Shanghai Composite Index data as well as

what the paper refers to as emotional data. Emotional data, in this case, is sentiment analysis from

news and microblogs discussing the specific company being analyzed. The researchers found that

15 input variables made up of sentiment indicators as well as technical indicators can successfully

predict the Chinese company’s stock opening prices.

Chaima, Raoudha, and Fethi (2018) proposed a simple neural network to test whether the inclusion of

accounting variables generated from the release of accounting disclosures improved the prediction

accuracy of the ANNs. The study also tested whether major events in the country affected the degree to which the accounting variables improved the ANN. The market from which the

stocks were selected was the Tunisian Stock Exchange. The results of the study showed that by

combining 48,204 daily stock closing prices of 39 companies with the respective accounting

disclosure variables, the ANNs' quality of prediction improved. However, this level of

improvement drastically dropped when the ANN was predicting prices in 2011, a time of civil

unrest in Tunisia. This extreme example is noteworthy since a variable observed as consistently

improving the model accuracy lost its impact when emotionally charged events occurred. It further supports a point repeatedly brought up in recent stock price prediction research: that events and news can

significantly influence the stock price.

Lastly, Vanstone, Gepp, and Harris (2019) aimed to see if the prediction of the price of 20

Australian stocks by a Neural Network Autoregressive (NNAR) model could be improved via the

inclusion of inputs in the form of counts of news articles as well as counts of Twitter posts. The

sentiment-based indicators utilized in the study were generated by Bloomberg. Sentiment-based

indicators such as these are becoming increasingly available, and due to the overall improvement

in data mining techniques, these indicators should theoretically be more reliable than ever. The

study found that the NNAR that incorporated the Bloomberg generated news and Twitter-based

sentiment indicators had a higher quality of stock price predictions. A major takeaway from this

research is that because the indicators were created by Bloomberg, no text/data mining models had to be built by the researchers. Being readily available from Bloomberg, these news and Twitter indicators are as easy to incorporate into ANNs as any other technical indicator.

Considering the related literature discussed in this section, the contributions of this work are

summarized as follows:

• Based on the review of literature, this proposed research will be the first study to utilize

Bloomberg Twitter and News Publication Count variables as inputs for stock price

prediction within the North American context.

• In the proposed research, we utilize Twitter and News Publication count variables as inputs

into MLP and LSTM neural networks to test the impact of the variables on different neural

network types while also comparing both models’ stock price prediction performance.

• We also investigate whether there is a significant decrease in model performance in times

of widespread market panic.

Chapter 3

Problem Definition

Because of the volatile and chaotic nature of the stock market, accurate price prediction is a

challenging task. Improving stock price prediction will increase returns and reduce the risk for

investors both at the individual and institutional levels. Incremental improvements in stock price

prediction that can be utilized by active investors will result in more informed investors generating

greater returns. Being able to predict movement in the stock market does not only mean more profit

for investors, but the health of financial institutions directly impacts the health of the overall

economy. In North America alone, the financial sector had US $37.4 trillion in assets under

management (Szmigiera, 2019). Empirical evidence also suggests that non-bank financial

institutions such as hedge funds and investment firms introduce an excessive level of risk into the

financial sector negatively impacting the general economy (Liang & Reichert, 2012). Therefore,

any improvement in stock price prediction that can mitigate risk should positively impact the

overall economic health of the country.

Based on the review of literature, a promising new direction being taken by stock price prediction researchers is the inclusion of Twitter and news publication analysis in stock prediction models. This increase can be attributed to the rise in the use of Twitter and social media as a means of opinion sharing.

Furthermore, primary finance/investing information sources such as Bloomberg (2019), Friedberg

Direct (2019), and Investopedia (2019) have recently discussed the impact social media and news

have on the stock market.

Regardless of technical analysis skills, Twitter and news publication analysis is an enormous undertaking on its own, requiring specialized knowledge to scrape the web and distill meaningful insights. Given this difficulty, a more practical way for stock price prediction researchers to take advantage of Twitter and news publication analysis is to include pre-generated Twitter and news indicators in their technical indicator based price prediction research.

However, even sentiment analysis-based indicators require a plethora of decisions that affect whether sentiment is classified as positive, negative, or neutral. This subjective nature of sentiment analysis strays far from the nature of other technical analysis variables. Therefore, Twitter and News Publication Count are examined instead. Twitter and news counts resemble other technical indicators in that they should be identical for a given day and stock regardless of whether the provider is Yahoo, Bloomberg, or Google. Because these variables are objective and provider-independent, they are better candidates for inclusion in technical analysis research than purely sentiment-based variables.

Another area that deserves greater attention in stock price prediction research is how well these models perform when widespread panic influences the economy. The justification for investigating price prediction performance during panic-driven economic downturns stems from two issues: the previously mentioned rise in dependency on algorithms within the stock trading market, and the macroeconomic implications of stock trading. The combination of financial institutions' influence on the economy and their reliance on algorithms can become dangerous if high-frequency buy and sell decisions are made on predicted prices while the predictive power of the model dramatically decreases when the market exhibits panic.

Investigating the accuracy of price prediction models during times of panic may yield useful insights because stock prices drastically increase in volatility during periods of recession (Asteriou, Pilbeam, & Sarantidis, 2019). This increased volatility makes accurate price prediction more difficult, potentially rendering the models useless. Interestingly, and contrary to the positive impact of news-related data on stock price prediction relative to exclusively technical data exhibited in the literature review, trade volume has been empirically observed to be a better predictor of panic-induced volatility than other traditional inputs utilized in market crash prediction research (Erdogan, Bennett, & Ozyildirim, 2014). On the other hand, it has been shown that investors react differently to the news during times of panic (Kleinnijenhuis et al., 2013; Angelovska, 2017). This may mean that robust sentiment/news-based models that predict stock prices better than their technical-only counterparts might not exhibit this greater accuracy when panic strikes the market and investors' reactions to news deviate from the norm.

Based on these factors, this thesis will be the first to use neural networks as the base model for stock price prediction to examine whether Bloomberg-generated Twitter and News Publication Count indicators improve neural network stock price prediction within the North American context enough to warrant advising future academic research to include Bloomberg Twitter and news indicators in all technical analysis based research. Furthermore, this research will also test the change in predictive power of the proposed neural networks, with and without Twitter and News Count indicators, during times when the stock market is greatly influenced by public panic compared to more stable economic periods.

Chapter 4

Choice of Solution Method

Based on the review of literature, neural networks are currently the primary technique utilized in stock price prediction research and are likely to remain so for the foreseeable future. This prevalence is due to two factors. First and foremost, neural networks have proven to be extremely robust models with both the statistical strength and computational efficiency to handle highly non-linear, chaotic data effectively. This is evidenced by the drastic rise in the utilization of neural networks for very complex problems, as discussed in the previous section. This popularity not only means more researchers and practitioners are utilizing neural networks, but also that neural networks themselves are continually being improved more so than other models.

The second reason for the increased utilization of neural networks is that, compared to most machine learning methods, the barrier to entry for neural networks is decreasing daily. The first increase in accessibility can be credited to the rise in popularity of simpler, easy-to-learn programming languages such as Python and R. Because of the approachable nature of these languages, it is now more common than ever for non-computer scientists to know how to code. And for those who can code in Python, a plethora of packages makes the construction of custom neural networks relatively simple and, in some cases, even modular.

Because of this ease of use, and because stock price prediction interests everyone from fund managers to individuals investing in a single stock, neural networks can be described as the models with the greatest ratio of ease of use to predictive power. This effect is further compounded by companies such as Microsoft and Google working on tools that will reduce neural network creation to a matter of making design decisions and clicking. Microsoft is actively promoting and continuously working on Azure Machine Learning Studio as well as ML.NET, both tools meant to make building a neural network as easy as creating a PowerPoint presentation. Google, on the other hand, has been applying machine learning and neural networks across all of its products and has created TensorFlow, a suite of tools and frameworks meant to simplify, visualize, and optimize neural networks specifically. This industry focus and funding will ensure that neural network creation continues to improve in effectiveness and ease, resulting in a continued rise in popularity. Therefore, it is a sound decision for current and future stock price prediction research to utilize neural networks as the primary prediction technique.

Due to these factors, this research will utilize neural networks as the base model for stock price prediction to test whether Bloomberg-generated Twitter and News Publication Count indicators improve neural network stock price prediction within the North American context enough to warrant advising future academic research to include these Bloomberg indicators in all technical analysis based research. In addition, we will examine the change in predictive power of the proposed neural networks, with and without Twitter and News Count indicators, during times when the stock market is greatly influenced by public panic compared to more stable economic periods.

4.1 Neural Network Selection

Table 2 presents an overview of the types of neural networks utilized in technical variable-based stock price prediction research in recent years. The two most prominent neural network types found in Table 2 are the Multiple Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) networks. Only three papers out of 17 did not utilize either an MLP or an LSTM as one of the types of stock price prediction neural networks. Therefore, for this research, these two commonly used network types (i.e., LSTM and MLP) are chosen.

Table 2. Types of Neural Networks found in recent stock price prediction research

Authors Year MLP LSTM CNN RNN Novel


Rather, Agarwal, & Sastry 2015 X
Hafezi, Shahrabi, & Hadavandi 2015 X
Moghaddam, Moghaddam, & Esfandyari 2015 X X
Zhou, Gao, Wang, Chu, Todo, & Tang 2016 X
Qiu & Song 2016 X
Gocken, Ozcalici, & Boru 2016 X
Di Persio, & Honchar 2016 X X X X
Akita, Yoshihara, Matsubara, & Uehara 2016 X
Nelson, Pereira, & de Oliveira 2017 X
Ebadati & Mortazavi 2018 X
Zhou, Pan, Hu, Tang, & Zhao 2018 X X
Zhang, Wang, Xu, Chi, Wang, & Sun 2018 X
Hiransha & Vijay Krishna 2018 X X X X
Jeon, Hong, & Chang 2018 X
Gocken, Ozcalici, & Boru 2019 X
Liu & Wang 2019 X
Rundo, Trenta, & Di Stallo 2019 X
Note. MLP: Multiple Layer Perceptron, LSTM: Long Short-Term Memory, CNN: Convolutional Neural
Network, RNN: Recurrent Neural Network

4.1.1. Multiple Layer Perceptron (MLP) Network

An MLP is a type of feed-forward neural network which utilizes multiple layers of perceptrons to

make sense of non-linear data. A single perceptron is an algorithm that is used to perform

classification tasks on linearly separable data effectively. The way this is done is via a single-layer

perceptron network, which can be defined as an input fed into a perceptron that utilizes the delta

between the desired output and calculated output to adjust the weights within the perceptron. A

35
single perceptron network is considered the most basic type of neural network. A simple

mathematical representation of a perceptron for time series prediction is:

Y(t) = \Phi\Big(\sum_{k} w_k \cdot X_k(t) + b\Big)    (1)

where \Phi is an activation function that squeezes any value into a specific range. The most commonly utilized activation functions are sigmoid and tanh. The sigmoid activation function maps any value onto a relative scale of 0 to 1; the tanh activation function works similarly, but its output range is -1 to 1. X_k(t) denotes the inputs fed into the perceptron, and w and b are the weights and bias, respectively. The subscript k indicates which value within the input vector the weight is assigned to (i.e., each individual value that comprises the vector has a specific weight associated with it). This computation is then repeated while the weights are adjusted based on the delta between actual and predicted values.
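As an illustration of Equation (1), a single perceptron step and one delta-rule weight update can be sketched in Python. All numeric values below are arbitrary placeholders for illustration, not parameters from this study.

```python
import numpy as np

def sigmoid(x):
    # Activation function: squeezes any value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

w = np.array([0.2, -0.5, 0.1])   # weights w_k (illustrative)
b = 0.3                          # bias b
x_t = np.array([1.0, 2.0, 0.5])  # inputs X_k(t)

# Equation (1): Y(t) = sigmoid(sum_k w_k * X_k(t) + b)
y_t = sigmoid(np.dot(w, x_t) + b)

# Delta-rule update: adjust weights by the error between desired and computed output
learning_rate, desired = 0.1, 1.0
w = w + learning_rate * (desired - y_t) * x_t
```

Repeating the last two lines over many samples is the "learning" loop the text describes.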

MLPs take advantage of the linear classification power of individual perceptrons by stacking perceptrons on top of one another, creating a layer of multiple perceptrons. Each perceptron can now be referred to as a hidden node (a neuron not part of either the input or output layer). For time series analysis, it is common for each input to be fed into each perceptron. By increasing the number of perceptrons, the MLP can make sense of chaotic and nonlinear data; for example, an MLP is able to learn an XOR function, a feat that is impossible for a single perceptron network. A simple mathematical representation of the input variable processing happening within an MLP is:


Y(t) = \Phi\Big(\sum_{n} w_n^{o} \cdot \Phi\Big(\sum_{k} w_{k,n} \cdot X_k(t) + b_n^{h}\Big) + b_n^{o}\Big)    (2)

Here 'o' denotes values associated with the output neurons and 'h' denotes values associated with the hidden neurons. The above example illustrates a single pass-through of data, where an initial input generates a single output. The learning aspect of machine learning can be represented as a classic optimization problem. Below is a representation of how an MLP learns patterns in nonlinear data via the minimization of error between actual and predicted values.


\underset{(w_{k,n},\, b_n^{h},\, w_n^{o},\, b_n^{o})_{1 \le n \le N,\ 1 \le k \le K}}{\text{Minimize}} \quad \mathbb{E}_t \Big\| Y(t) - \Phi\Big(\sum_{n} w_n^{o} \cdot \Phi\Big(\sum_{k} w_{k,n} \cdot X_k(t) + b_n^{h}\Big) + b_n^{o}\Big) \Big\|^2    (3)

MLPs have proven to be effective neural networks when it comes to time series data, as evidenced by their widespread usage within the research summarized in Table 2. This, along with their ease of implementation, makes MLPs an ideal candidate for this research.
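A single forward pass of Equation (2) can be sketched in a few lines of Python. The layer sizes and random weights below are purely illustrative, not the trained parameters of any model in this thesis.

```python
import numpy as np

def mlp_forward(x, W_h, b_h, w_o, b_o):
    # Hidden layer: tanh(sum_k w_{k,n} * X_k(t) + b_n^h) for each hidden node n
    hidden = np.tanh(W_h @ x + b_h)
    # Output layer: tanh(sum_n w_n^o * hidden_n + b^o), using tanh as the activation
    return np.tanh(w_o @ hidden + b_o)

rng = np.random.default_rng(0)
x = rng.normal(size=6)            # six input variables, as in the later experiments
W_h = rng.normal(size=(13, 6))    # one hidden layer of 13 nodes (illustrative)
y = mlp_forward(x, W_h, np.zeros(13), rng.normal(size=13), 0.0)
```

Training would wrap this forward pass in the minimization of Equation (3) over the weights and biases.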

4.1.2. Long Short-Term Memory (LSTM) Networks

As a counterbalance to the straightforward approach of MLPs, the second neural network type we will test is the LSTM. The major reason for selecting LSTMs is that, apart from MLPs, LSTMs are the most widely deployed neural networks in stock price prediction research. LSTMs are a type of recurrent neural network. This means that LSTMs, unlike their feed-forward counterparts, utilize the previous output as an input at the next timestamp, allowing the model to factor in time intervals. Whereas a feed-forward neural network such as an MLP treats the 1st and the 1000th inputs the same, recurrent networks treat the data as a sequential set. This makes LSTMs prime candidates for time series prediction. Even though the basic linear transformations occurring within an LSTM are relatively similar to those of an MLP, the major defining factor behind the LSTM's widespread usage is the addition of "gates" and "states", which drastically changes the nature of the network. LSTMs create internal memory by utilizing operation gates that tweak the internal state variables.

An LSTM can be thought of as a manufacturing factory: the input is the raw material and the desired output is the final product. To become a final product, the raw materials need to be processed on an assembly line. In this analogy, gates are the points at which a choice is made about whether something needs to be added, taken away, or modified before the raw materials can move on to the next stage. The hidden and cell states can be thought of as explicit instructions stating what will move on to the next stage of the assembly line. The three gate types are input (i), forget (f), and output (o); the two state types are hidden (h) and cell (c). A simplified mathematical representation of the gates and states and how they relate to each other can be found below.

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)    (4)

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)    (5)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)    (6)

c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)    (7)

h_t = o_t \odot \tanh(c_t)    (8)

All three gates take the previous hidden state and the current input data, multiply them by the weights of the specific gate, and add a gate-specific bias. The result is then passed through a sigmoid function, so the input, forget, and output gates each return a value between 0 and 1. The input gate determines how much information from the previous hidden state and current input is added to the cell state. The forget gate decides what proportion of the previous cell state is overlooked. The output gate is utilized in the calculation of the current neuron's hidden state. A visual representation of the inner workings of an LSTM neuron can be found in Figure 1. Due to this use of selective memory, LSTMs have proven effective at making sense of nonlinear time series data. Based on this fact and the previously discussed strengths of MLPs, we will utilize both LSTMs and MLPs in our analysis.

Figure 1. Inner Workings of LSTM Neuron, adapted from Olah (2015)
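One time step of the gate and state relationships described above can be sketched as follows. The random weights are placeholders rather than trained values, and stacking the previous hidden state with the current input into a single vector is one common implementation convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])                  # previous hidden state + current input
    i = sigmoid(W["i"] @ z + b["i"])                   # input gate
    f = sigmoid(W["f"] @ z + b["f"])                   # forget gate
    o = sigmoid(W["o"] @ z + b["o"])                   # output gate
    c = f * c_prev + i * np.tanh(W["c"] @ z + b["c"])  # new cell state
    h = o * np.tanh(c)                                 # new hidden state
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 6, 4                                     # illustrative sizes
W = {g: rng.normal(size=(n_hid, n_hid + n_in)) for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Feeding each day's variables through this step in sequence, carrying h and c forward, is what lets the network treat the data as a sequential set.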

Chapter 5

Designing Method

5.1. Neural Network Design

A few critical design decisions must be made for all neural networks, including the number of hidden layers, the number of hidden nodes, the number of training epochs, and which cost function and optimizer to utilize. It is important to note that in the larger body of research employing neural networks as the primary algorithm, there is a lack of standardization regarding design. Furthermore, the vast majority of research on neural network design focuses on hyperparameter optimization via algorithms rather than establishing empirical standards that researchers can utilize as proven baselines to effectively compare and further develop different neural networks (Yang & Shami, 2020). In this section, we describe how each neural network design decision was made.

Hidden layers refer to any layer that is not either the input or output layer. It has been proven that

a neural network with a single hidden layer is capable enough to approximate any univariate

function (Guliyev & Ismailov, 2016). However, the nature of stock price prediction is multivariate.

When it comes to multivariate problems, there is a general agreement in the machine learning

community that the number of hidden layers rarely needs to be over two. This widespread

agreement is backed up by the fact that two hidden layers in a simple feed-forward network have

been able to successfully approximate a multivariate function (Stathakis, 2009; Thomas, Petridis,

Walters, Gheytassi, & Morgan, 2017). Due to these factors, both the LSTM and MLP will have

two hidden layers.

The cost function of a neural network is the error measure calculated for every actual and predicted output; it is what the optimization algorithm uses as a measure to assess how the neural network is performing. The cost function minimized by both networks will be Mean Squared Error (MSE), chosen for its proven effectiveness. MSE has long been used as the cost function for simple problems, but it has also proven very effective on complex problems, such as using neural networks to enhance speech audio quality in the presence of heavy background noise (Saleem & Khattak, 2020). It has also been shown that MSE as a cost function can successfully optimize neural networks even when dealing with very large training data (Zhang, Shen, Zhou, & Xu, 2019). Due to this proven capability, both the LSTM and the MLP will minimize MSE as their objective function.

The term "learning" in machine learning refers to the optimization of the network via adjustments to the weights and biases based on how much the cost function is decreasing and how close it is to its minimum. The optimizer utilized by both networks will be Adam, whose name is derived from Adaptive Moment Estimation. Adam is a simple-to-implement stochastic gradient-based optimization algorithm that is also computationally efficient. It was initially introduced by Kingma and Ba (2014) and has been tried and tested by researchers at OpenAI and Google DeepMind. Adam has been shown to handle non-stationary data successfully, as well as both sparse and noisy gradients. In addition, the specific combination of LSTM and Adam has been demonstrated to be effective (Jiang & Chen, 2018; Chang, Zhang, & Chen, 2019). For these reasons, Adam is a logical choice as the optimization algorithm for neural network-based time series research for the foreseeable future.
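The Adam update rule of Kingma and Ba (2014) can be written down directly. The sketch below applies it to a one-parameter stand-in for an MSE objective; the hyperparameter defaults are those proposed in the original paper, and the objective itself is purely illustrative.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (t is the 1-based step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize the stand-in MSE objective (w - 1)^2 starting from w = 0.5
w, m, v = np.array([0.5]), np.zeros(1), np.zeros(1)
for t in range(1, 101):
    grad = 2 * (w - 1.0)        # gradient of (w - 1)^2
    w, m, v = adam_step(w, grad, m, v, t)
```

The per-coordinate scaling by the second-moment estimate is what makes Adam robust to the sparse and noisy gradients mentioned above.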

Hidden nodes are neurons not found in either the input or output layer. The number of hidden nodes is a decision specific to the type of network, the nature of the data, and the problem being solved. Consequently, iterative testing needs to be performed to determine the optimal number of hidden nodes in the hidden layers. However, unlike the epoch iterative testing discussed next, there is a generally agreed-upon formula that can be used to create a range for the iterative testing, defined below.

N_h = \frac{N_s}{\alpha \cdot (N_i + N_o)}    (9)

Where:

N_h: the number of hidden nodes.
N_i: the number of input neurons.
N_o: the number of output neurons.
N_s: the number of samples in the training data set.
\alpha: an arbitrary scaling factor, usually 2-10.
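As a worked example of Equation (9), the range can be computed for the values used later in this thesis (920 training samples, six inputs, one output):

```python
def hidden_node_range(n_samples, n_inputs, n_outputs, alpha_low=2, alpha_high=10):
    """Equation (9): N_h = N_s / (alpha * (N_i + N_o)) at both alpha extremes."""
    low = n_samples / (alpha_high * (n_inputs + n_outputs))   # larger alpha -> fewer nodes
    high = n_samples / (alpha_low * (n_inputs + n_outputs))   # smaller alpha -> more nodes
    return int(low), int(high)

print(hidden_node_range(920, 6, 1))  # -> (13, 65)
```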

Because the number of hidden nodes depends on the number of input/output neurons and the number of samples, the above formula can be used to determine the range of potential hidden neurons for both the LSTM and the MLP. With 920 samples in the training data, six input neurons, and one output neuron, the number of hidden neurons for both models (utilizing \alpha from 2 to 10) should fall between 13 and 65. Based on this range, iterative testing was conducted for both types of neural networks, with model MSE loss during the training phase as the measure of performance. The results indicate that, for the specific purposes of our research, 13 hidden neurons are appropriate for the MLP and 60 for the LSTM.

Another design decision is the number of epochs. An epoch is one pass-through of all sample data: if the sample data contains ten samples, one epoch is the neural network parsing through all ten samples once. It is widely agreed that the number of epochs should be determined on a case-by-case basis. Therefore, we performed iterative testing to see where the loss function stops decreasing and no more "learning" occurs. Based on the results, the optimal number of epochs was determined to be 650 for the LSTM and 4,000 for the MLP. This large discrepancy could potentially result from the more simplistic nature of the MLP, which leads to the requirement of a larger number of epochs.

5.2. Data Selection

5.2.1. Stock Selection

Based on the review of stock price prediction literature, a wide spectrum of reasons is provided for why the stocks utilized were selected; in most cases, the reasoning behind specific stock selection is not discussed in detail. In this research, the stocks analyzed are Facebook, Apple, Netflix, Google, Ford, Tesla, Walmart, and Amazon. The primary goal of this study is to investigate the impact that Twitter and News Count variables have on stock price prediction, specifically within the North American context. Due to this focus on North America,

the group of stocks referred to as ‘FAANG’ was chosen. FAANG is an acronym for Facebook Inc.

(FB), Apple Inc. (AAPL), Amazon.com Inc (AMZN), Netflix Inc. (NFLX), and Alphabet Inc.

(GOOG). The term FAANG was first used by the former Goldman Sachs fund manager Jim

Cramer (Skehin, Crane, & Bezbradica, 2018). The reason for this specific grouping is that these

five publicly traded companies listed on the NASDAQ are very significant to the North American

investors and the stock market as a whole. This is evidenced by the fact that FAANG stocks had a

combined market capitalization of $4.1 Trillion, making FAANG stocks 16.64% of the total S&P

500 market capitalization as of July 2020. The S&P 500 is an index comprised of 500 companies,

and historically the index has had a market capitalization valued at 70% to 80% of the total U.S

stock market. The percentage that FAANG stocks contribute to the S&P 500's total market capitalization is testimony to the broader significance of FAANG stocks to the overall North American stock market. The movement of FAANG stocks influences North American investors' view of the overall stock market, directly impacting trading decisions.

The reason for selecting Walmart is due to the nature of the Bloomberg Twitter and News count

variables. As previously mentioned, the two variables of interest are count variables that measure

the number of occurrences a specific company is mentioned on either Twitter or a digital news

publication. Due to digital and social media both being relatively recent in terms of mass popularity

and more popular with a younger demographic, there may be a difference in the added utility of

Twitter and News count variables for companies with traditional business models compared to a

company with non-traditional business models. In order to test this, Walmart was selected to be compared to Amazon. The purpose of this comparison is to see whether there is any difference in the way Twitter and news count variables are formed for a traditional brick-and-mortar retailer as opposed to a newer retailer with virtually no brick-and-mortar presence, such as Amazon. Similarly,

the reason for selecting Ford and Tesla is the comparison between a traditional automobile

manufacturer and a completely electric vehicle manufacturer. Ford operates on the traditional car

franchise dealership model (Crane, 2015) while Tesla showrooms are often found in malls, and all

orders are placed online (Johnson & Reed, 2019). It will be interesting to see whether this contrast between traditional and non-traditional business models has any impact on the Twitter and News Count variables themselves, as well as on the utility of these variables as inputs in stock price prediction models.

5.2.2. Variables Selection

Table 3 summarizes the technical indicator selection for neural network-based stock price prediction within the past few years. The six most popular inputs are Open price, High price, Low price, Close price, Moving Average price, and Trade Volume; of these, the four most popular are Open, High, Low, and Close. Therefore, Moving Average (price averaged over a specified number of periods) and Trade Volume (the number of trades performed in a day) will be utilized in only half the models, while the other half use the four most popular inputs along with Bloomberg-generated Twitter and News Count data. There are multiple reasons for replacing the two least used variables. First, this provides a more direct comparison of the power of these variables on their own to improve stock price prediction: since the number of variables utilized does not change, any increase in performance cannot be attributed to an increase in the amount of data fed into the models. Furthermore, specific to neural networks, design decisions related to the number of neurons change with the number of input variables; if a model utilized eight input variables, the range of appropriate numbers of neurons would also change. Therefore, for the purposes of empirically sound comparison, we keep the neural network design consistent regardless of which variable set is used. All variables are collected from Bloomberg in comma-separated value format, covering January 2015 to May 2020. The definitions of the variables are provided in Table 4.

Table 3. Indicator Selection

Table 4. Variable Definitions

Open Price: Dollar value of the first trade since market open
High Price: Highest dollar value trade of the day
Low Price: Lowest dollar value trade of the day
Close Price: Dollar value of the last trade before market close
30 Day Moving Average: Average dollar value of one share over the previous 30 days
Trade Volume: Total quantity of shares traded all day
Twitter Count: Total number of tweets mentioning the parent company over a 24-hour period. Sources: Twitter and Stocktwits
News Publication Count: Total number of news publications mentioning the parent company over a 24-hour period. Sources: all besides Twitter and Stocktwits

5.3. Final Experiment

Once daily stock data has been collected from Bloomberg, the variables are split into a technical-only set (T) and a technical plus Twitter & news set (T+). A log transformation is performed on the Twitter Count and News Publication Count variables because they exhibit skewness; after the transformation, both count variables exhibit an acceptable level of skewness. Min-max normalization is performed on all remaining variables. The variables are then further split into three sets: a train set, a normal test set, and a panic test set. The training set consists of all daily variables from January 2015 to 2019. The normal test set consists of all daily variables from January 2019 to November 2019. The panic test set contains all daily variables from January 2019 to May 2020.
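The preprocessing steps above can be sketched as follows; the array layout and the choice to min-max scale every column after the log transform are illustrative assumptions, not the exact pipeline configuration.

```python
import numpy as np

def preprocess(X, count_cols):
    """X: 2-D array (rows = trading days, columns = variables).
    count_cols: column indices of the skewed Twitter/News count variables."""
    X = X.astype(float).copy()
    # Log transform the count variables (log1p also handles zero counts safely)
    X[:, count_cols] = np.log1p(X[:, count_cols])
    # Min-max normalization: rescale every column into [0, 1]
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)
```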

Four price prediction models have been built for each stock. Each model trains and runs its neural network five times and averages the five predictions to obtain the final model prediction; this accounts for the stochastic nature of neural networks, which results in slight variances in performance. The first two models utilize LSTMs: one is fed Open price, High price, Low price, Close price, Trade Volume, and 30 Day Moving Average as its inputs, while the second is fed Open price, High price, Low price, Close price, Twitter Count, and News Count. All models have the next day's Close price as their output. The third and fourth models are MLPs with the same two sets of inputs. All models are tested on both the normal test set and the panic test set. RMSE is the accuracy measure utilized to evaluate model prediction results; the lower the RMSE, the better the stock price prediction is considered. RMSE was selected over other error measures because it is expressed in the original unit being measured, making it useful for error gap analysis between expected and predicted values (Kumar, Kumar, & Kumar, 2020). Additionally, RMSE gives higher weight to larger errors relative to other error measures and is considered best suited for domains where substantial errors are especially unwanted (Joseph, Obini, Sulaiman, & Loko, 2020). This incorporation of error magnitude makes RMSE useful in stock price prediction research, since larger prediction errors can translate into greater losses when making buy or sell decisions. This is why recent stock price prediction research utilizes RMSE as the error measure for comparative analysis of different companies' stock price prediction (Thakkar & Chaudhari, 2020).
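The error measure itself is straightforward to state in code; note how the squaring step is what gives larger errors disproportionate weight.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error, expressed in the same units as the target series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# A single 4-unit miss is penalized more than two 2-unit misses would be
print(rmse([10.0, 10.0], [10.0, 14.0]))  # -> 2.8284271247461903
```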

Chapter 6

Results & Discussion

The final experiment results can be found in Appendix A. RMSE is calculated for each individual model and test set configuration, and each model's ability to predict stock prices is evaluated by how low its RMSE is. The primary focus of this research is whether Twitter and news variables can improve stock price prediction. T+ denotes a model that includes the technical variables along with the Twitter and news variables, while T denotes a model that uses only the technical variables. Stock price prediction improvement between the T+ and T models is evaluated by subtracting the T+ model's RMSE from the T model's RMSE and dividing the difference by the T model's RMSE to obtain the percentage difference.
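The comparison just described reduces to a one-line calculation; the example values are taken from the Amazon/LSTM/panic row of Table 5.

```python
def rmse_improvement_pct(t_rmse, t_plus_rmse):
    # Positive values mean the T+ model achieved a lower (better) RMSE
    return (t_rmse - t_plus_rmse) / t_rmse * 100

# Amazon, LSTM, panic test set (Table 5); reported rounded as 4%
improvement = rmse_improvement_pct(0.062463331, 0.060182)
```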

Furthermore, when analyzing the impact of input data type, test data type, and neural network selection, the focus is on the average statistics of the top and bottom 50% of performers as groups rather than on individual performances. Focusing on the average relative difference between T+ and T input data for the same test set as a group should help develop insights that are independent of decisions (e.g., individual stock selection) that may affect stock price prediction performance. Such insights could be of greater value because of their increased practical replicability and utility. Therefore, when analyzing the impact of T+ variables, we do so based on the RMSE difference relative to models utilizing exclusively technical variables.

6.1. The Impact of T Variables vs the T+ Variables

Table 5 presents an overview of where utilizing T+ variables rather than T variables did and did not result in an improvement in RMSE. It is interesting to note that of the seven scenarios that did not benefit from T+ variables, four involve Walmart. Walmart is an extreme outlier: it is the only stock whose closing-price prediction did not improve in any configuration as a result of the T+ variables, while all other stocks saw some improvement. Regarding the remaining three scenarios in which T+ did not improve prediction, Tesla stock predicted via LSTM did not improve on either test set, which is not the case for Tesla's MLP counterpart. Finally, the last configuration that did not improve is Facebook with MLP on the normal test set, making it the only one of Facebook's four configurations that did not benefit from the addition of the T+ variables.

Table 5. RMSE percentage change when utilizing T vs. T+ variables

NN Type  Company   Test Set  T RMSE       T+ RMSE    RMSE +/-      %RMSE +/-
LSTM     Amazon    PANIC     0.062463331  0.0601820  0.002281367    4%
LSTM     Apple     PANIC     0.128599041  0.1271058  0.001493242    1%
LSTM     Facebook  PANIC     0.076986501  0.0739217  0.003064785    4%
LSTM     Ford      PANIC     0.136406012  0.0755340  0.060872043    45%
LSTM     Google    PANIC     0.062709786  0.0493791  0.013330685    21%
LSTM     Netflix   PANIC     0.112263080  0.0921643  0.020098763    18%
LSTM     Tesla     PANIC     0.176320498  0.1875775  -0.011256980   -6%
LSTM     Walmart   PANIC     0.058447047  0.0727827  -0.014335690   -25%
LSTM     Amazon    NORMAL    0.046723821  0.0421004  0.004623444    10%
LSTM     Apple     NORMAL    0.058480253  0.0529901  0.005490104    9%
LSTM     Facebook  NORMAL    0.044041893  0.0366755  0.007366368    17%
LSTM     Ford      NORMAL    0.040241105  0.0300688  0.010172315    25%
LSTM     Google    NORMAL    0.034874718  0.0287432  0.006131553    18%
LSTM     Netflix   NORMAL    0.042107441  0.0370617  0.005045720    12%
LSTM     Tesla     NORMAL    0.053439369  0.1009144  -0.047475070   -89%
LSTM     Walmart   NORMAL    0.040209573  0.0425148  -0.002305200   -6%
MLP      Amazon    PANIC     0.019495390  0.0192764  0.000218955    1%
MLP      Apple     PANIC     0.021881629  0.0214114  0.000470186    2%
MLP      Facebook  PANIC     0.027458930  0.0240696  0.003389371    12%
MLP      Ford      PANIC     0.160269481  0.0366101  0.123659394    77%
MLP      Google    PANIC     0.023883099  0.0206876  0.003195483    13%
MLP      Netflix   PANIC     0.025532814  0.0239921  0.001540685    6%
MLP      Tesla     PANIC     0.084089884  0.0486324  0.035457453    42%
MLP      Walmart   PANIC     0.017525161  0.0181446  -0.000619460   -4%
MLP      Amazon    NORMAL    0.026022899  0.0242604  0.001762451    7%
MLP      Apple     NORMAL    0.020020966  0.0199544  0.000066535    0%
MLP      Facebook  NORMAL    0.020698101  0.0219327  -0.001234610   -6%
MLP      Ford      NORMAL    0.048863618  0.0204708  0.028392781    58%
MLP      Google    NORMAL    0.023554962  0.0214688  0.002086137    9%
MLP      Netflix   NORMAL    0.029542140  0.0284155  0.001126658    4%
MLP      Tesla     NORMAL    0.036097739  0.0353117  0.000786013    2%
MLP      Walmart   NORMAL    0.012775668  0.0136211  -0.000845420   -7%
Note. NN Type: Neural Network Type

Table 6 presents the individual stocks ranked by the percentage RMSE improvement obtained by utilizing T+ variables. The fact that Tesla has one model that improved by 42% and another whose accuracy decreased by 89% shows the clear need for researchers and traders to test across a wide variety of scenarios before deciding to utilize a variable as an input to a stock price prediction model. Ford is the clear leader in terms of percentage improvement, with an absolute difference of 36% between Ford and the second most improved stock prediction. Interestingly, both the MLP and LSTM configurations for Ford improved more on the panic test set than on the normal test set. This ranking will be used to analyze input variables, test data, and neural network choice going forward. No clear trend relates the magnitude of the mean Twitter and News count variables to the RMSE percentage improvement. However, Table 7 shows that, when comparing the ratio of average Twitter count to average News count, the top 50 percent of models in terms of RMSE improvement utilized data with an average ratio of 1.66 Twitter counts per News count, in contrast to the considerably higher average ratio of 2.92 for the bottom 50% of models. A potential interpretation of these results is that the number of tweets or news publications alone does not necessarily strengthen the variables; rather, what matters is how close the two variables' mean values are to each other.

Table 6. Average RMSE % improvement across all configurations

Rank  Company   Average Improvement via T+
1     Ford      51%
2     Google    15%
3     Netflix   10%
4     Facebook  7%
5     Amazon    5%
6     Apple     3%
7     Walmart   -10%
8     Tesla     -13%
      Mean      9%

Table 7. Mean Twitter Count to Mean News Publication Count Ratio

Company          Twitter-to-News Ratio (Overall)  Mean News Publication Count  Mean Twitter Count
Ford             0.712761099                      274.7394834                  195.824
Google           3.301257549                      1,213.278229                 4,005.34
Netflix          2.450254761                      647.8841328                  1,587.48
Facebook         0.178285882                      3,983.518819                 710.205
Amazon           3.974418341                      769.4627306                  3,058.17
Apple            1.98774911                       2,545.850185                 5,060.51
Walmart          1.982717614                      380.7387454                  754.897
Tesla            3.753669212                      600.7311669                  2,254.95
Top Half Avg     1.660639823                      1,529.855166                 1,624.71
Bottom Half Avg  2.924638569                      1,074.195707                 2,782.13

A requirement unique to machine learning is that models must be evaluated on data separate from the data they were trained on. It is therefore important to analyze the input variables across the various stages of training and testing. To investigate whether it is possible to estimate in advance the degree to which the Twitter and News count variables would improve stock price prediction, the variables were compared between the training data and both test sets. The aim was to determine whether an analysis of the input data, performed before conducting the actual test, could indicate whether including the T+ variables would improve price prediction accuracy for the stock in question. An analysis of the change in the coefficient of variation between the training and test sets can be found in Table 8. Notably, the coefficient of variation of the variables stays relatively stable between the training and test data for all models.

Table 9 presents an analysis of the mean T+ variables within the training data, the normal test set, and the panic test set. Considering the RMSE change brought by the addition of T+ variables across all eight stocks, the top 50 percent of performers averaged a 25% decrease in mean Twitter count between the training and test sets; in other words, for both test sets, the Twitter count was on average 25% lower than in the training set. For the bottom 50% of performers, by contrast, there is an almost 60% decrease between the training data's mean Twitter count and the average of both test sets' mean Twitter counts. For the mean News publication count, the top 50% of models show a 40% decrease between the training and test data, whereas the bottom 50% of companies in terms of RMSE percentage improvement averaged a 25% increase between the mean News publication count during training and during testing.

Table 8. Variable coefficient of variation change between training and test sets

Rank by                   % +/- in CV of Twitter Count        % +/- in CV of News Count
%RMSE +/-  Company        Train vs. Normal  Train vs. Panic   Train vs. Normal  Train vs. Panic
1          Ford           -90%              -93%              -6%               -4%
2          Google         22%               21%               -2%               -5%
3          Netflix        -9%               2%                -14%              9%
4          Facebook       -42%              -41%              -37%              -36%
5          Amazon         -19%              -18%              -14%              -7%
6          Apple          -18%              -31%              0%                -5%
7          Walmart        -31%              -32%              7%                2%
8          Tesla          -39%              -31%              -29%              -29%
Top 50% Avg               -30%              -28%              -15%              -9%
Bottom 50% Avg            -27%              -28%              -9%               -10%
Note. CV: Coefficient of Variation
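The coefficient-of-variation changes reported in Table 8 can be computed with helpers like these (a sketch; the count series below are hypothetical stand-ins for the actual Bloomberg data):

```python
import numpy as np

def coeff_of_variation(series):
    """Coefficient of variation: standard deviation scaled by the mean,
    making dispersion comparable across variables of different magnitude."""
    x = np.asarray(series, dtype=float)
    return x.std() / x.mean()

def cv_change(train_series, test_series):
    # Relative change in CV from the training window to a test window,
    # matching the percentage entries of Table 8
    cv_train = coeff_of_variation(train_series)
    cv_test = coeff_of_variation(test_series)
    return (cv_test - cv_train) / cv_train

# Hypothetical daily Twitter counts: the test window is less dispersed,
# so the CV change is negative (as for most companies in Table 8).
train_counts = [100, 300, 50, 400, 150]
test_counts = [180, 220, 200, 210, 190]
print(f"{cv_change(train_counts, test_counts):+.0%}")
```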

Table 9. Mean Twitter & News Count in Training Data, Normal Test Set, & Panic Test Set

             Mean Twitter Count                Mean News Publication Count
Company      Train   Normal Test  Pan Test    Train   Normal Test  Pan Test
Ford         212     131          149         253     334          336
Google       4,320   3,560        3,140       1,206   1,109        1,237
Netflix      1,823   1,077        907         620     956          730
Facebook     686     866          780         4,758   1,914        1,741
Amazon       3,515   1,816        1,734       727     1,035        890
Apple        6,229   1,612        1,675       2,479   2,864        2,735
Walmart      881     394          388         364     439          429
Tesla        2,499   1,395        1,548       488     823          930
Top 50%      1,760   1,408        1,244       1,709   1,078        1,011
Bot 50%      3,281   1,304        1,337       1,015   1,290        1,246

Average test-set mean as a share of the training mean:
Top 50%: 75% (Twitter), 61% (News); Bottom 50%: 40% (Twitter), 125% (News)

To test whether these results are statistically significant, a statistical test must be conducted. The distribution of our experimental results is extremely skewed, with a Shapiro-Wilk test p-value of 0.0006633, well below the 0.05 threshold required to retain the normality assumption. Consequently, the normality requirement of a t-test cannot be met, and we opted for the non-parametric Wilcoxon signed-rank test instead. A one-sided (directional) test was conducted under the hypothesis that models utilizing only T variables would be less accurate than models utilizing T+ variables. Based on the Wilcoxon signed-rank test, our results are significant at the 95% confidence level, with a p-value of 0.0018076. Furthermore, when only models tested on the normal test set were examined, the improvement brought on by the T+ variables remained statistically significant (p-value of 0.032). Likewise, for models tested on the panic test set, the difference in performance is significant, with a p-value of 0.0124817.
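The two-step testing procedure described above can be sketched with SciPy as follows (the arrays are illustrative stand-ins for the experiment's paired T and T+ RMSE values, not the actual results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical paired RMSEs: skewed (lognormal) errors, with the T+
# models erring uniformly 10% less than the T models for illustration.
t_rmse = rng.lognormal(mean=-3.0, sigma=0.8, size=32)
t_plus_rmse = t_rmse * 0.9

# Step 1: Shapiro-Wilk on the paired differences; a p-value below 0.05
# indicates the normality assumption of a paired t-test is violated.
sw_stat, sw_p = stats.shapiro(t_rmse - t_plus_rmse)

# Step 2: one-sided Wilcoxon signed-rank test, hypothesizing that
# T RMSEs tend to be larger than T+ RMSEs.
w_stat, w_p = stats.wilcoxon(t_rmse, t_plus_rmse, alternative="greater")
print(f"Shapiro-Wilk p={sw_p:.4f}, Wilcoxon one-sided p={w_p:.6f}")
```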

6.2. Examining the Predictive Models under Panic and Normal Circumstances

The reason for creating two separate test sets for each stock lies in the fundamental nature of the Bloomberg-generated Twitter and News count variables, which capture how often the public mentions a company. As previously discussed in the problem definition section, interesting insights into the strength of these variables can be gained by analyzing the performance of all models in times of increased panic surrounding the stock market. It is widely agreed that in the first half of 2020, North America experienced widespread panic related to the performance of the stock market and the economy due to the impact of the global COVID-19 pandemic. The average RMSE for models tested on the market panic test set was 0.0670, while the average RMSE for the normal test set was 0.0354. There is thus an apparent decrease in performance when testing models on the panic test set as compared to the normal test set. To test the significance of this difference, a Wilcoxon signed-rank test was conducted between the RMSEs of the models tested on the panic test set and those tested on the normal test set. The results show that this difference in performance is significant even at the 1% level (p-value ≈ 0.001).

To better understand this discrepancy in performance, an analysis of how the variables of interest differ between the test sets can be found in Appendix B. Appendix B presents an overview of how the mean, standard deviation, and coefficient of variation of the T+ variables change when calculated for the overall data set, the training data set, and the two test sets, as well as for the additional period that differentiates the panic test set from the normal test set. This additional panic period runs from December 2019 to May 2020. December 2019 was selected as the start of the panic period because the World Health Organization reported the first COVID-19 case in December 2019, which subsequently resulted in panic related to the U.S. stock market (Baig, Butt, Haroon, & Rizvi, 2020). The normal test set covers January 2019 to November 2019, while the panic test set covers January 2019 to May 2020. The training data ranges from January 2015 to December 2018. By splitting our analysis into these distinct sections, we have a starting point for theorizing about what may have caused the average decrease in performance for tests done on the panic test set versus the normal test set.
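The date-based split described above can be sketched with pandas (the price series is hypothetical; the date ranges follow the text):

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices over the full study period
dates = pd.date_range("2015-01-01", "2020-05-31", freq="B")  # business days
prices = pd.DataFrame({"close": np.linspace(50, 150, len(dates))}, index=dates)

# Training: January 2015 - December 2018
train = prices.loc["2015-01-01":"2018-12-31"]
# Normal test set: January 2019 - November 2019
normal_test = prices.loc["2019-01-01":"2019-11-30"]
# Panic test set: January 2019 - May 2020, i.e. the normal test set
# plus the December 2019 - May 2020 panic period
panic_test = prices.loc["2019-01-01":"2020-05-31"]

print(len(train), len(normal_test), len(panic_test))
```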

Table 10 utilizes the information in Appendix B to show how the T+ variables and the price differ between the panic-period data and the training data. The companies are displayed in descending order of mean RMSE on the panic test set. An interesting observation is that, when comparing the bottom 50% to the top 50% of performances, the coefficients of variation for the Twitter and News count variables exhibit a 19% and 27% absolute difference, respectively. In addition, the percentage change of the mean Twitter and News count variables shows an 18% and 19% absolute difference, respectively. The difference in variance between the data the models learn from and the additional period appended to the normal test set to form the panic test set is critical to our analysis. The average percentage change in mean of the price data, as well as its average coefficient of variation, exhibits a similar difference between the panic data and the training data for all stocks being predicted. This discrepancy in variable variance between the training and test sets might be the primary cause of the drop in performance on the panic test set compared to the normal test set.

Table 10. T+ Variables Mean & Coefficient of Variation: Panic Test Set vs. Training Data
(All values are percentage changes, Train vs. Panic)

Company    Metric  Twitter Count  News Count  Price
Google     CV      -7%            -15%        -14%
Google     Mean    -46%           23%         -49%
Amazon     CV      -19%           -7%         -37%
Amazon     Mean    -55%           -17%        114%
Walmart    CV      -27%           -35%        6%
Walmart    Mean    -26%           132%        129%
Facebook   CV      -48%           -46%        -16%
Facebook   Mean    -10%           -71%        47%
Netflix    CV      22%            38%         -45%
Netflix    Mean    -69%           -53%        117%
Apple      CV      -59%           -20%        -16%
Apple      Mean    -71%           0%          105%
Ford       CV      -103%          -2%         12%
Ford       Mean    -14%           34%         -44%
Tesla      CV      -35%           -13%        -10%
Tesla      Mean    -57%           13%         52%

Top 50%    CV      -25%           -26%        -15%
Bot 50%    CV      -44%           1%          -15%
Top 50%    Mean    -35%           17%         60%
Bot 50%    Mean    -53%           -2%         57%

6.3. Comparing the MLP and LSTM methods

As is currently the case with most stochastic deep learning models, the inner workings of a neural network are nearly impossible to analyze precisely. Therefore, any interpretation or theorizing beyond a model's apparent complexity or its ability to utilize memory should not be considered theoretically sound at this relatively early stage in the life cycle of machine learning-based stock price prediction research. Accordingly, a comparison is made between the RMSE mean and RMSE variance across the different groupings. Table 11 presents a summary of this comparison.

Table 11. RMSE analysis for T vs. T+ and Panic Test Set vs. Normal Test Set

          T with LSTM   T+ with LSTM  Panic Test Set (LSTM)  Normal Test Set (LSTM)
Mean      0.073394592   0.0693573     0.097052648            0.045699190
Range     0.141445781   0.1588343     0.138198375            0.072171280
St.Dev    0.040657937   0.0406329     0.041584760            0.016259390
CV        55%           59%           43%                    36%

          T with MLP    T+ with MLP   Panic Test Set (MLP)   Normal Test Set (MLP)
Mean      0.037357030   0.0248912     0.037060044            0.025188230
Range     0.147493813   0.0350113     0.142744321            0.036087950
St.Dev    0.035671165   0.0083744     0.035682809            0.008747980
CV        95%           34%           96%                    35%

+/- Mean, MLP vs. LSTM   -0.036037562  -0.044466  -0.059992600  -0.020510970
+/- CV, MLP vs. LSTM     40.09%        -24.94%    53.44%        -0.85%
Average +/- Mean RMSE, MLP vs. LSTM: -0.040251786
Average +/- CV RMSE, MLP vs. LSTM: 16.93%
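The group statistics reported in Table 11 (mean, range, standard deviation, and coefficient of variation of each grouping's RMSEs) can be reproduced with a helper like this (illustrative; the input list is a stand-in for one grouping's RMSE values, and the sample standard deviation is an assumption, since the thesis does not state which estimator was used):

```python
import numpy as np

def summarize_rmse_group(rmses):
    """Summary statistics for one grouping of RMSE values,
    mirroring the rows of Table 11."""
    x = np.asarray(rmses, dtype=float)
    st_dev = x.std(ddof=1)  # sample standard deviation (assumed estimator)
    return {
        "mean": x.mean(),
        "range": x.max() - x.min(),
        "st_dev": st_dev,
        "cv": st_dev / x.mean(),
    }

group_stats = summarize_rmse_group([0.02, 0.03, 0.05, 0.10])
print({k: round(v, 4) for k, v in group_stats.items()})
```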

It is clear that MLP outperforms LSTM in terms of mean RMSE for both the panic test set and the normal test set. This also holds when comparing the utilization of T+ variables against T variables only. However, it is essential to note that even though the mean RMSE was lower across all MLP configurations, the coefficient of variation was, on average, 16.93% higher for the MLP models. Therefore, based on the mean and the range alone, we cannot conclude that this outperformance is meaningful and statistically significant. Given that LSTMs are a more complex extension of MLPs, it would be intuitive to expect LSTMs to perform better. To test the significance of these results, a Wilcoxon signed-rank test was conducted to see whether the inclusion of T+ variables brought a significant improvement for each model type. Right-tailed tests were conducted for both MLP and LSTM. For MLP, the difference in performance between T+ and T variables was significant, with a p-value of 0.0038147. For LSTM, however, the p-value of the right-tailed test was 0.0964050, meaning that the null hypothesis cannot be rejected. To begin hypothesizing why this result was observed, note that the main differentiating feature present in LSTMs that MLPs lack is memory. This memory is the reason LSTMs perform well at complex tasks such as natural language processing (Soutner & Muller, 2013) and video recognition (Ng et al., 2015): via its memory, an LSTM can build context for inputs encountered later in a sequence. Because memory is the main differentiating feature, we can hypothesize that in our experiment the prediction of prices did not benefit from utilizing memory. A further observation supporting this hypothesis is that the T variable set included a 30-day moving average among its inputs, while the T+ input set did not; a 30-day moving average functions similarly to memory in that its value is obtained entirely from previous values. Since the T variable set on average performed worse across the board, the lack of improvement from including this memory-like variable can be considered analogous to the lack of improvement brought by the memory feature of LSTMs.
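The memory-like behaviour of the 30-day moving average input can be seen in a small sketch (illustrative code; this is not the thesis's actual feature pipeline):

```python
import numpy as np

def moving_average(closes, window=30):
    """Trailing moving average: each output depends only on the previous
    `window` closing prices, giving a feed-forward model a summary of
    recent history without recurrent (memory) connections."""
    closes = np.asarray(closes, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(closes, kernel, mode="valid")

# A short toy series with a 3-day window for readability
print(moving_average([1, 2, 3, 4, 5, 6], window=3))  # [2. 3. 4. 5.]
```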

Similar results are also found in the literature. For example, Hiransha et al. (2018) observed that their LSTM model for stock price prediction did not perform better than their MLP model when the length of the predicted period was 400 days; however, when the same models were tested on a period of 10 years, the LSTM became more accurate. This phenomenon is a potential explanation for the trend observed in our results. The intuitive explanation is that memory, the main advantage of LSTMs, is much more advantageous on a larger test set: as the length of the prediction period increases, there are more opportunities for memory to be utilized. That could explain why, for short testing periods, the extra complexity of the LSTM is not as beneficial as it would be for much longer prediction periods, resulting in neither MLPs nor LSTMs having a significant advantage over the other. Additionally, the magnitude of the difference could be attributed to the fact that LSTMs perform better when there is increased data preparation designed specifically for the model and test data (Chacon, Kesici, & Najafirad, 2020).

6.4. Comparing Walmart vs. Amazon and Tesla vs. Ford

In Section 5.2.1 it was hypothesized that, due to potential differences in the operating styles of Amazon and Walmart, and of Tesla and Ford, there could be a difference in how the Twitter and News count variables influence the stock price prediction of the respective companies. Given the newer business models of Amazon and Tesla, as opposed to the more traditional ones of Walmart and Ford, it was theorized that variables such as the Twitter and News counts, which have not traditionally been utilized because they were only created in 2015, would respond better for companies that are newer and currently more popular. A comparative analysis was conducted to investigate whether this was true. However, based on the results presented in Table 5, there is no indication that the stock prices of companies with newer business models respond better to prediction utilizing the newer social media variables. This is evidenced by the fact that Tesla's stock price prediction benefited the least from the T+ variables, whereas Ford's benefited the most, with a 51% average improvement. Amazon and Walmart, on the other hand, did not exhibit the same pattern as Tesla and Ford: Amazon, the newer company, saw a 5% average improvement in RMSE when utilizing T+ variables, while Walmart exhibited a 10% average decrease. These results suggest that there is no clear relationship between broad perceptions of an organization based on its age or operating style and the impact that newer social media-based variables, such as the Twitter and News count variables, can have on the prediction of a company's stock price.

Chapter 7

Conclusion & Managerial Insights

We proposed a stock price prediction technique that utilizes neural networks as the price prediction

model and technical variables as well as Twitter and News count variables as its inputs. Based on

the results and respective statistical tests, it is evident that the inclusion of these variables has the

potential to improve the stock price prediction ability across various model types and test data

configurations. However, further work must be done to assess the importance of the included

variables across a wide variety of companies, so a diagnosis can be made as to why there are

extreme outliers such as Walmart and Tesla. These results are of value as stock price prediction is

a domain where improvements in accuracy are a constant goal, and the search for variables that

can be incorporated into the existing analysis is continuously being pursued.

There is a wide variety of real-world practical implications arising from these results. The T+ vs. T analysis suggests that both traders and researchers can benefit from including Twitter and News count variables alongside technical indicators as inputs to neural network-based stock price prediction models. According to the trends exhibited in this research, stock traders who utilize technical indicator-based price prediction can include the two variables without having to make any adjustments to their model and, in the overwhelming majority of cases, notice an immediate improvement in prediction accuracy. Any improvement in prediction accuracy is of great value to traders and investors at all levels. Another reason for traders to utilize Twitter and News publication count indicators is that they offer a way to incorporate public opinion into any prediction model with minimal effort. Social media has an ever-increasing influence on society, and this influence directly shapes the public's perception of corporations and products. A robust, multifaceted analysis that can capture this impact is of interest to investors and their customers.

Future research related to technical analysis should aim to incorporate the T+ variables because of their similar nature and standardization potential. The T+ variables are similar in nature because they are count variables: unlike other Twitter- and news-based indicators, the Bloomberg-generated Twitter and News count variables are defined in simple, replicable terms tied to objective measures. This means that other indicator providers, such as Yahoo Finance, could report the same value for the Twitter or News publication count, making these variables objective and standardized. However, more work needs to be done to determine exactly why the variables work well across the majority of scenarios but poorly in some unique cases; this is a promising direction for future research. Another opportunity for future research is to investigate whether the Twitter and News count variables improve stock price prediction for medium- and small-sized companies, and to test whether the results hold in other markets such as Europe and Asia.

The difference between the panic and normal test set results is important because of the increased utilization of algorithmic trading within investment funds and banks. According to Deutsche Bank, high-frequency algorithmic trading at its peak accounted for 60% of total equity trading in the United States in 2009. Algorithmic trading relies heavily on stock price prediction, so the functionality of any stock prediction method directly impacts the performance of these large algorithmically traded funds. The results from this experiment show that the ability of these algorithms to maintain their accuracy comes into question in times of market panic. Quality assurance and risk teams managing these large funds should ideally be scrutinizing the models that drive their algorithmic funds; due to the nature of high-frequency trading, even the shortest drop in prediction accuracy can result in a large number of ill-advised trades. The results of our experiments show that there is a need to stress test stock price prediction methods to ensure performance standards in all potential market scenarios, which is important for traders wishing to ensure a profitable model stays profitable. An efficient way to test price prediction methods is through synthetic test data sets that verify models are robust enough to handle all scenarios. The results of such stress tests should be made clear to all customers and regulatory boards.

As mentioned previously, the difference in mean prediction performance between MLP and LSTM, as well as its magnitude, can potentially be attributed to two main factors: the test period length and the degree to which the data was preprocessed. For traders and researchers, this means that stock price prediction model selection needs to account for both factors. For traders utilizing stock price prediction, the amount of preprocessing, as well as any expertise needed to preprocess the data successfully for each individual method, should be a critical consideration when determining practically viable prediction methods, because extra data preprocessing and required expertise are not ideal for algorithms that predict a large number of stock prices at once, such as index movement prediction. Future research should also focus on standardizing the data preparation techniques that ensure model optimality for each model type, creating an agreed-upon standard that is replicated every time a specific model is used. Such standardization would benefit analyses of stock price prediction methods as well as research focused on improving neural networks. Furthermore, other existing neural network types should be examined thoroughly to ensure that simpler models that may perform better in niche scenarios are not underutilized in favour of their computationally heavier and more complex counterparts. A major limitation of the Twitter count and News publication count variables is that they do not factor in the influence that particular Twitter accounts or news publications have on the perception of the stock being discussed. Certain Twitter accounts and news publications discussing a specific company may impact the stock price with greater magnitude than the average tweet or article; an obvious example would be a tweet by Elon Musk having more impact on the price of Tesla shares than a tweet from a non-Tesla-affiliated account mentioning Tesla. Therefore, an avenue for future work is the analysis and identification of Twitter users and news publications whose posts impact share prices significantly more than others'.

Appendices

Appendix A
Table A.1. Experimental Results

COMPANY    TEST SET  MODEL      RMSE
Google     PANIC     LSTM (T)   0.0627097859587438
Google     PANIC     LSTM (T+)  0.0493791013187517
Google     PANIC     MLP (T)    0.0238830985020960
Google     PANIC     MLP (T+)   0.0206876153144940
Google     NORMAL    LSTM (T)   0.0348747175799209
Google     NORMAL    LSTM (T+)  0.0287431643651000
Google     NORMAL    MLP (T)    0.0235549620698895
Google     NORMAL    MLP (T+)   0.0214688247323170
Facebook   PANIC     LSTM (T)   0.0769865009259097
Facebook   PANIC     LSTM (T+)  0.0739217154544836
Facebook   PANIC     MLP (T)    0.0274589302195309
Facebook   PANIC     MLP (T+)   0.0240695590899780
Facebook   NORMAL    LSTM (T)   0.0440418927042600
Facebook   NORMAL    LSTM (T+)  0.0366755242351116
Facebook   NORMAL    MLP (T)    0.0206981011074381
Facebook   NORMAL    MLP (T+)   0.0219327085874696
Walmart    PANIC     LSTM (T)   0.0584470466096008
Walmart    PANIC     LSTM (T+)  0.0727827336491720
Walmart    PANIC     MLP (T)    0.0175251606873094
Walmart    PANIC     MLP (T+)   0.0181446193733898
Walmart    NORMAL    LSTM (T)   0.0402095725000273
Walmart    NORMAL    LSTM (T+)  0.0425147729091293
Walmart    NORMAL    MLP (T)    0.0127756683893731
Walmart    NORMAL    MLP (T+)   0.0136210862080269
Ford       PANIC     LSTM (T)   0.1364060120912110
Ford       PANIC     LSTM (T+)  0.0755339687199875
Ford       PANIC     MLP (T)    0.1602694814700700
Ford       PANIC     MLP (T+)   0.0366100870248053
Ford       NORMAL    LSTM (T)   0.0402411053744201
Ford       NORMAL    LSTM (T+)  0.0300687901462170
Ford       NORMAL    MLP (T)    0.0488636178267794
Ford       NORMAL    MLP (T+)   0.0204708371291519
Amazon     PANIC     LSTM (T)   0.0624633311636806
Amazon     PANIC     LSTM (T+)  0.0601819638981050
Amazon     PANIC     MLP (T)    0.0194953902448822
Amazon     PANIC     MLP (T+)   0.0192764347624178
Amazon     NORMAL    LSTM (T)   0.0467238214804085
Amazon     NORMAL    LSTM (T+)  0.0421003771453041
Amazon     NORMAL    MLP (T)    0.0260228987852983
Amazon     NORMAL    MLP (T+)   0.0242604474588606
Tesla      PANIC     LSTM (T)   0.1763204983144720
Tesla      PANIC     LSTM (T+)  0.1875774759285600
Tesla      PANIC     MLP (T)    0.0840898839686410
Tesla      PANIC     MLP (T+)   0.0486324314103021
Tesla      NORMAL    LSTM (T)   0.0534393690409103
Tesla      NORMAL    LSTM (T+)  0.1009144438403330
Tesla      NORMAL    MLP (T)    0.0360977393745788
Tesla      NORMAL    MLP (T+)   0.0353117262274900
Apple      PANIC     LSTM (T)   0.1285990409482350
Apple      PANIC     LSTM (T+)  0.1271057993107140
Apple      PANIC     MLP (T)    0.0218816288334668
Apple      PANIC     MLP (T+)   0.0214114424045512
Apple      NORMAL    LSTM (T)   0.0584802532789059
Apple      NORMAL    LSTM (T+)  0.0529901495227513
Apple      NORMAL    MLP (T)    0.0200209662360374
Apple      NORMAL    MLP (T+)   0.0199544314667892
Netflix    PANIC     LSTM (T)   0.1122630800262680
Netflix    PANIC     LSTM (T+)  0.0921643173260749
Netflix    PANIC     MLP (T)    0.0255328144814694
Netflix    PANIC     MLP (T+)   0.0239921291050657
Netflix    NORMAL    LSTM (T)   0.0421074407146520
Netflix    NORMAL    LSTM (T+)  0.0370617210601681
Netflix    NORMAL    MLP (T)    0.0295421404155474
Netflix    NORMAL    MLP (T+)   0.0284154827272679

Appendix B.

Table B.1. Comparing Variables in Overall, Training, Normal Test, Pan Test, and Panic Period Test Set
(For each company — Ford, Google, Netflix, Facebook, Amazon, Apple, Tesla, Walmart — the three columns are Twitter Count, News Count, and Price.)
Overall
Mean 196 275 11 4005 1213 956 1587 648 214 710 3984 147 3058 769 1184 5061 2546 164 2255 601 297 755 381 87
Variance 124300 34101 6 11703649 363891 59469 2666375 196237 12979 432228 17787737 1562 6884086 204387 327469 41627970 1797736 3117 4799559 275483 14664 747508 66393 331
St.Dev 353 185 3 3421 603 244 1633 443 114 657 4218 40 2624 452 572 6452 1341 56 2191 525 121 865 258 18
Coefficient of Variation
180% 67% 22% 85% 50% 26% 103% 68% 53% 93% 106% 27% 86% 59% 48% 127% 53% 34% 97% 87% 41% 115% 68% 21%
Training
Mean 212 253 12 4320 1206 1061 1823 620 170 686 4758 133 3515 727 947 6229 2479 141 2499 488 268 881 364 78
Variance 159345 28773 4 12068276 378508 34340 3092910 161330 9125 497886 21219182 1281 8044743 190822 208765 50038010 1775513 1228 5862998 203445 3227 913234 58531 134
St.Dev 399 170 2 3474 615 185 1759 402 96 706 4606 36 2836 437 457 7074 1332 35 2421 451 57 956 242 12
Coefficient of Variation
188% 67% 15% 80% 51% 17% 96% 65% 56% 103% 97% 27% 81% 60% 48% 114% 54% 25% 97% 92% 21% 108% 67% 15%
Normal Test
Mean 131 334 9 3560 1109 708 1077 956 330 866 1914 180 1816 1035 1790 1612 2864 202 1395 823 264 394 439 107
Variance 16797 41977 0 13358463 295951 3622 894611 239512 1255 275603 1315295 238 1266840 222560 11013 2371030 2337862 825 643150 273027 1873 93790 104811 74
St.Dev 130 205 1 3655 544 60 946 489 35 525 1147 15 1126 472 105 1540 1529 29 802 523 43 306 324 9
Coefficient of Variation
99% 61% 7% 103% 49% 9% 88% 51% 11% 61% 60% 9% 62% 46% 6% 96% 53% 14% 57% 63% 16% 78% 74% 8%
Pan Test
Mean 149 336 8 3140 1237 653 907 730 343 780 1741 185 1734 890 1869 1675 2735 232 1548 930 382 388 429 111
Variance 19947 44343 3 10181305 323122 8425 804327 288309 1773 235199 1103842 376 1171956 223752 37428 1895009 1812073 2471 1036664 343690 37967 88037 85835 89
St.Dev 141 211 2 3191 568 92 897 537 42 485 1051 19 1083 473 193 1377 1346 50 1018 586 195 297 293 9
Coefficient of Variation 95% 63% 19% 102% 46% 14% 99% 74% 12% 62% 60% 10% 62% 53% 10% 82% 49% 21% 66% 63% 51% 76% 68% 8%
Note: for the Panic Period only, the mean utilized for the st. dev. and coefficient of variation is taken from the Normal Test set.
Mean 182 339 7 2323 1487 545 572 290 367 615 1403 196 1569 607 2024 1790 2481 289 1842 1134 614 376 410 119
Variance 24273 48826 4 2920037 278824 321 457364 89130 1862 113794 512130 473 942224 104158 52289 943121 681514 642 1670824 417390 27544 75950 47737 28
St.Dev 156 221 2 1709 528 18 676 299 43 337 716 22 971 323 229 971 826 25 1293 646 166 276 218 5
Coefficient of Variation 86% 65% 27% 74% 36% 3% 118% 103% 12% 55% 51% 11% 62% 53% 11% 54% 33% 9% 70% 57% 27% 73% 53% 4%
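For reference, each block of Table B.1 reports the mean, variance, standard deviation, and coefficient of variation (standard deviation divided by the mean, expressed as a percentage) of a count series. The sketch below reproduces that computation on a hypothetical daily Twitter-count series, not the study's actual data, and assumes population variance (the table does not state which variant was used):

```python
import statistics

def describe(values):
    """Mean, variance, st. dev., and coefficient of variation
    (st. dev. / mean, as a percentage) for a count series."""
    mean = statistics.mean(values)
    # Assumption: population variance; the thesis does not state
    # whether population or sample variance was used.
    var = statistics.pvariance(values)
    sd = var ** 0.5
    cv = sd / mean * 100
    return mean, var, sd, cv

# Hypothetical daily Twitter-count series for one stock (not study data)
counts = [196, 275, 150, 320, 410, 95, 180]
mean, var, sd, cv = describe(counts)
print(f"Mean={mean:.0f}  Variance={var:.0f}  St.Dev={sd:.0f}  CV={cv:.0f}%")
```

The coefficient of variation normalizes dispersion by the mean, which is why Table B.1 can compare variability across stocks whose raw Twitter, News, and Price counts differ by orders of magnitude.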

References

Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and artificial neural
networks models for stock price prediction. Journal of Applied Mathematics, 2014.
Agustini, W. F., Affianti, I. R., & Putri, E. R. (2018, March). Stock price prediction using geometric
Brownian motion. In Journal of Physics: Conference Series (Vol. 974, No. 1, p. 012047). IOP
Publishing.
Ahmadi, H. (1990, June). Testability of the arbitrage pricing theory by neural network. In 1990 IJCNN
International Joint Conference on Neural Networks (pp. 385-393). IEEE.
Al-Jarrah, O. Y., Yoo, P. D., Muhaidat, S., Karagiannidis, G. K., & Taha, K. (2015). Efficient machine
learning for big data: A review. Big Data Research, 2(3), 87-93.
Angelovska, J. (2017). Investors’ behaviour in regard to company earnings announcements during the
recession period: Evidence from the Macedonian stock exchange. Economic Research-
Ekonomska Istraživanja, 30(1), 647-660. doi:10.1080/1331677x.2017.1305768
Antunes, P., Macdonald, A., & Steward, M. (2014). Boosting Retirement Readiness and the Economy
through Financial Advice. Retrieved from http://www.conferenceboard.ca
Asteriou, D., Pilbeam, K., & Sarantidis, A. (2019). The Behaviour of Banking Stocks During the
Financial Crisis and Recessions. Evidence from Changes-in-Changes Panel Data Estimations.
Scottish Journal of Political Economy, 66(1), 154-179. doi:10.1111/sjpe.12191
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47(2),
1148-1178
Babu, C. N., & Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA–ANN model for
forecasting time series data. Applied Soft Computing, 23, 27-38.
Baig, A., Butt, H. A., Haroon, O., & Rizvi, S. A. R. (2020). Deaths, Panic, Lockdowns and US Equity
Markets: The Case of COVID-19 Pandemic. Available at SSRN 3584947.
Bisoi, R., Dash, P. K., & Parida, A. K. (2019). Hybrid Variational Mode Decomposition and evolutionary
robust kernel extreme learning machine for stock price and movement prediction on daily basis.
Applied Soft Computing, 74, 652-678.
Braithwaite, T. (2017, July 28). Free stock trading for millennials comes at a cost. Retrieved from
https://www.ft.com/content/36ff325a-735e-11e7-aca6-c6bd07df1a3c
Bro, R., & Smilde, A. K. (2014). Principal component analysis. Analytical Methods, 6(9), 2812-2831.
Business Wire. (2019). Global Algorithmic Trading Market to Surpass US$ 21,685.53 Million by 2026.
Retrieved September 10, 2020, from
https://www.businesswire.com/news/home/20190205005634/en/Global-Algorithmic-
Trading-Market-Surpass-21685.53-Million
Can Tweets And Facebook Posts Predict Stock Behavior? (2019). Retrieved 30 September 2019, from
https://www.investopedia.com/articles/markets/031814/can-tweets-and-facebook-posts-
predict-stock-behavior-and-rt-if-you-think-so.asp

Chacon, H. D., Kesici, E., & Najafirad, P. (2020). Improving Financial Time Series Prediction Accuracy
Using Ensemble Empirical Mode Decomposition and Recurrent Neural Networks. IEEE Access,
8, 117133-117145. doi:10.1109/access.2020.2996981
Chan, H. L., & Woo, K. Y. (2013). Studying the dynamic relationships between residential property
prices, stock prices, and GDP: Lessons from hong kong. Journal of Housing Research, 22(1), 75-
89. Retrieved from http://ezproxy.lib.ryerson.ca/login?url=https://search-proquest-
com.ezproxy.lib.ryerson.ca/docview/1353322722?accountid=13631

Chang, Z., Zhang, Y., & Chen, W. (2019). Electricity price prediction based on hybrid model of adam
optimized LSTM neural network and wavelet transform. Energy, 187, 115804.
doi:10.1016/j.energy.2019.07.134
Chatterjee, U. K. (2016). Do stock market trading activities forecast recessions? Economic Modelling, 59,
370-386. doi:10.1016/j.econmod.2016.08.007
Chen, Y., & Hao, Y. (2017). A feature weighted support vector machine and K-nearest neighbor
algorithm for stock market indices prediction. Expert Systems with Applications, 80, 340-355.
Cheng, C. H., & Yang, J. H. (2018). Fuzzy time-series model based on rough set rule induction for
forecasting stock price. Neurocomputing, 302, 33-45.
Choi, J. H., Lee, M. K., & Rhee, M. W. (1995, June). Trading S&P 500 stock index futures using a neural
network. In Proceedings of the third annual international conference on artificial intelligence
applications on wall street (pp. 63-72).
Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and
prediction: Methodology, data representations, and case studies. Expert Systems with
Applications, 83, 187-205.
Das, S., & Kadapakkam, P. R. (2018). Machine over Mind? Stock price clustering in the era of
algorithmic trading. The North American Journal of Economics and Finance.
Daniel, R. (2015, March 12). How Robinhood, an investing app, is luring stock-market newbies.
Retrieved from http://fortune.com/2015/03/12/robinhood-investing-app/
Dash, R., & Dash, P. (2016). Efficient stock price prediction using a self evolving recurrent neuro-fuzzy
inference system optimized through a modified differential harmony search technique. Expert
Systems with Applications, 52, 75-90.
Di Persio, L., & Honchar, O. (2016). Artificial neural networks architectures for stock price prediction:
Comparisons and applications. International journal of circuits, systems and signal processing, 10,
403-413.
Edwards, J. (2017, December 2). Global market cap is heading toward $100 trillion and Goldman Sachs
thinks the only way is down. Retrieved from https://www.businessinsider.de/global-market-cap-
is-about-to-hit-100-trillion-2017-12?r=UK&IR=T
Erdogan, O., Bennett, P., & Ozyildirim, C. (2014). Recession Prediction Using Yield Curve and Stock
Market Liquidity Deviation Measures. Review of Finance, 19(1), 407-422. doi:10.1093/rof/rft060
Every Time Trump Tweets About the Stock Market. (2019). Retrieved 30 September 2019, from
https://www.bloomberg.com/features/trump-tweets-market

Wen, F., Xiao, J., He, Z., & Gong, X. (2014). Stock price prediction
based on SSA and SVM. Procedia Computer Science, 31, 625-631.
FXCM. (2016, June). New York Stock Exchange (NYSE). Retrieved from
https://www.fxcm.com/uk/insights/new-york-stock-exchange-nyse/
Geva, T., & Zahavi, J. (2014). Empirical evaluation of an automated intraday stock recommendation
system incorporating both market data and textual news. Decision support systems, 57, 212-223.
Groth, S. S., & Muntermann, J. (2011). An intraday market risk management approach based on textual
analysis. Decision Support Systems, 50(4), 680-691.
Guliyev, N. J., & Ismailov, V. E. (2016). A Single Hidden Layer Feedforward Network with Only One
Neuron in the Hidden Layer Can Approximate Any Univariate Function. Neural Computation,
28(7), 1289-1304. doi:10.1162/neco_a_00849
Guo, Z., Wang, H., Yang, J., & Miller, D. J. (2015). A stock market forecasting model combining two-
directional two-dimensional principal component analysis and radial basis function neural
network. PloS one, 10(4), e0122385.
Gurney, K. (2014). An introduction to neural networks. CRC press.
Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price prediction
based on financial news using context-capturing features. Decision Support Systems, 55(3), 685-
697.
Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE Stock Market Prediction
Using Deep-Learning Models. Procedia Computer Science, 132, 1351-1362.
doi:10.1016/j.procs.2018.05.050
How Does President Trump’s Twitter Use Impact Forex, Markets And Stocks? - Friedberg Direct. (2019).
Retrieved 30 September 2019, from https://www.fxcm.com/ca/insights/president-trumps-twitter-
impact-forex-markets-stocks/
Jiang, S., Chen, H., Nunamaker, J. F., & Zimbra, D. (2014). Analyzing firm-specific social media and
market: A stakeholder-based event analysis framework. Decision Support Systems, 67, 30-39.
Jiang, S., & Chen, Y. (2018). Hand Gesture Recognition by Using 3DCNN and LSTM with Adam
Optimizer. Advances in Multimedia Information Processing – PCM 2017 Lecture Notes in
Computer Science, 743-753. doi:10.1007/978-3-319-77380-3_71
Johnson, A., & Reed, A. (2019). Tesla in Texas: A Showdown Over Showrooms. SAM Advanced
Management Journal, 84(2), 47-56.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: a review and recent developments.
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 374(2065), 20150202.
Jones, J. (2017, May 24). U.S. Stock Ownership Down Among All but Older, Higher Income. Retrieved from
https://news.gallup.com/poll/211052/stock-ownership-down-among-older-higher-income.aspx
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science,
349(6245), 255-260.

Joseph, I., Obini, N., Sulaiman, A., & Loko, A. (2020). Comparative Model Profiles of Covid-19
Occurrence In Nigeria. International Journal of Mathematics Trends and Technology, 68(6), 297-
310. doi:10.14445/22315373/ijmtt-v66i6p530
Kamijo, K. I., & Tanigawa, T. (1990, June). Stock price pattern recognition-a recurrent neural network
approach. In 1990 IJCNN International Joint Conference on Neural Networks (pp. 215-221).
IEEE.
Kane, L. (2018, September 10). Robinhood Is Making Millions Selling Out Their Millennial Customers
To High-Frequency Traders. Retrieved from https://seekingalpha.com/article/4205379-
robinhood-making-millions-selling-millennial-customers-high-frequency-traders
Karaboga, D., & Kaya, E. (2018). Adaptive network based fuzzy inference system (ANFIS) training
approaches: a comprehensive survey. Artificial Intelligence Review, 1-31.
Kim, K. (2010). Electronic and algorithmic trading technology: the complete guide. Academic Press.
Kimoto, T., Asakawa, K., Yoda, M., & Takeoka, M. (1990, June). Stock market prediction system with
modular neural networks. In 1990 IJCNN international joint conference on neural networks (pp.
1-6). IEEE.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Kleinnijenhuis, J., Schultz, F., Oegema, D., & Atteveldt, W. V. (2013). Financial news and market panics
in the age of high-frequency sentiment trading algorithms. Journalism: Theory, Practice &
Criticism, 14(2), 271-291. doi:10.1177/1464884912468375
Kooli, C., Trabelsi, R., & Tlili, F. (2018). The impact of accounting disclosure on emerging stock market
prediction in an unstable socio-political context. Accounting and Management Information
Systems, 17(3), 313-329.
Kraus, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural
networks and transfer learning. Decision Support Systems, 104, 38-48.
Kumar, M., & Anand, M. (2014). An application of time series ARIMA forecasting model for predicting
sugarcane production in India. Studies in Business and Economics, 9(1), 81-94.
Kumar, R., Kumar, P., & Kumar, Y. (2020). Time Series Data Prediction using IoT and Machine
Learning Technique. Procedia Computer Science, 167, 373-381. doi:10.1016/j.procs.2020.03.240
Lahmiri, S. (2016). Intraday stock price forecasting based on variational mode decomposition. Journal of
Computational Science, 12, 23-27.
Lee, H., Surdeanu, M., MacCartney, B., & Jurafsky, D. (2014, May). On the Importance of Text Analysis
for Stock Price Prediction. In LREC (pp. 1170-1175).
Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment
analysis. Knowledge-Based Systems, 69, 14-23.
Li, X., Huang, X., Deng, X., & Zhu, S. (2014). Enhancing quantitative intra-day stock return prediction
by integrating both market news and stock prices information. Neurocomputing, 142, 228-238.

Liang, H., & Reichert, A. K. (2012). The impact of banks and non-bank financial institutions on economic
growth. The Service Industries Journal, 32(5), 699-717. doi:10.1080/02642069.2010.529437
Lusardi, A., & Mitchell, O. S. (2014). The economic importance of financial literacy: Theory and
evidence. Journal of economic literature, 52(1), 5-44.
Mahmud, M. S., & Meesad, P. (2016). An innovative recurrent error-based neuro-fuzzy system with
momentum for stock price prediction. Soft Computing, 20(10), 4173-4191.
Moghaddam, A. H., Moghaddam, M. H., & Esfandyari, M. (2016). Stock market index prediction using
artificial neural network. Journal of Economics, Finance and Administrative Science, 21(41), 89-
93.
Morris, C. (2018, May 10). Robinhood Trading App Surpasses E*Trade in Total Users. Retrieved from
http://fortune.com/2018/05/10/robinhood-users-trading-app-tops-etrade/
New York Stock Exchange. (2018). NYSE Total Market Cap [Web page].
Retrieved from https://www.nyse.com/market-cap
Nguyen, T. H., Shirai, K., & Velcin, J. (2015). Sentiment analysis on social media for stock movement
prediction. Expert Systems with Applications, 42(24), 9603-9611.
Nguyen, N. (2018). Hidden Markov model for stock trading. International Journal of Financial Studies,
6(2), 36.
Nielsen, M. A. (2015). Neural networks and deep learning (Vol. 25). San Francisco, CA, USA:
Determination Press.
Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price
forecasting. Omega, 33(6), 497-505.
Parungrojrat, N., & Kidsom, A. (2019). Stock Price Forecasting: Geometric Brownian Motion and Monte
Carlo Simulation Techniques. MUT Journal of Business Administration, 16(1), 9-103.
Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price index movement
using trend deterministic data preparation and machine learning techniques. Expert Systems with
Applications, 42(1), 259-268.
Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock market index using fusion of
machine learning techniques. Expert Systems with Applications, 42(4), 2162-2172.
Piterbarg, L. I. (2011). Parameter estimation from small biased samples: Fuzzy sets vs statistics. Fuzzy
Sets and Systems, 170(1), 1-21.
Saleem, N., & Khattak, M. I. (2020). Deep Neural Networks for Speech Enhancement in Complex-Noisy
Environments. International Journal of Interactive Multimedia and Artificial Intelligence, 6(1),
84. doi:10.9781/ijimai.2019.06.001
Schumaker, R. P., Zhang, Y., Huang, C. N., & Chen, H. (2012). Evaluating sentiment in financial news
articles. Decision Support Systems, 53(3), 458-464.
Scornet, E., Biau, G., & Vert, J. P. (2015). Consistency of random forests. The Annals of Statistics, 43(4),
1716-1741.

Shynkevich, Y., McGinnity, T. M., Coleman, S. A., & Belatreche, A. (2016). Forecasting movements of
health-care stock prices based on different categories of news articles using multiple kernel
learning. Decision Support Systems, 85, 74-83.
Shynkevich, Y., McGinnity, T. M., Coleman, S. A., Belatreche, A., & Li, Y. (2017). Forecasting price
movements using technical indicators: Investigating the impact of varying input window length.
Neurocomputing, 264, 71-88.
Skehin, T., Crane, M., & Bezbradica, M. (2018, December). Day ahead forecasting of FAANG stocks
using ARIMA, LSTM networks and wavelets. CEUR Workshop Proceedings.
Stathakis, D. (2009). How many hidden layers and nodes? International Journal of Remote Sensing,
30(8), 2133-2147. doi:10.1080/01431160802549278
Song, Q., & Chissom, B. S. (1993). Fuzzy time series and its models. Fuzzy sets and systems, 54(3), 269-
277.
Sun, X. Q., Shen, H. W., & Cheng, X. Q. (2014). Trading network predicts stock price. Scientific reports,
4, 3711.
Sun, B., Guo, H., Karimi, H. R., Ge, Y., & Xiong, S. (2015). Prediction of stock index futures prices
based on fuzzy sets and multivariate fuzzy time series. Neurocomputing, 151, 1528-1536.
Szmigiera, M. (2019, July 19). Global assets under management by region 2017. Retrieved July 07, 2020,
from https://www.statista.com/statistics/264907/asset-under-management-worldwide-by-region/
Tao, L., Hao, Y., Yijie, H., & Chunfeng, S. (2017). K-Line Patterns’ Predictive Power Analysis Using the
Methods of Similarity Match and Clustering. Mathematical Problems in Engineering, 2017.
Team, T. (2017, June 17). How Much Will Commission-Free Brokerages Impact Traditional
Brokerages? Retrieved from https://www.forbes.com/sites/greatspeculations/2017/06/14/how-
much-will-commission-free-brokerages-impact-traditional-brokerages/#1cc374e23b76
Thakkar, A., & Chaudhari, K. (2020). CREST: Cross-Reference to Exchange-based Stock Trend
Prediction using Long Short-Term Memory. Procedia Computer Science, 167, 616-625.
doi:10.1016/j.procs.2020.03.328
Thomas, A. J., Petridis, M., Walters, S. D., Gheytassi, S. M., & Morgan, R. E. (2017). Two Hidden
Layers are Usually Better than One. Engineering Applications of Neural Networks
Communications in Computer and Information Science, 279-290. doi:10.1007/978-3-319-65172-
9_24
Trippi, R. R., & DeSieno, D. (1992). Trading equity index futures with a neural network. Journal of
Portfolio Management, 19, 27-27.
Tsai, C. F., & Quan, Z. Y. (2014). Stock prediction by searching for similarities in candlestick charts.
ACM Transactions on Management Information Systems (TMIS), 5(2), 9.
Umoh, U. A., & Inyang, U. G. (2015). A Fuzzy-Neural Intelligent Trading Model for Stock Price
Prediction. International Journal of Computer Science Issues (IJCSI), 12(3), 36.

Van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Van Aken, M. A. (2014). A
gentle introduction to Bayesian analysis: Applications to developmental research. Child
development, 85(3), 842-860.
Vanstone, B. J., Gepp, A., & Harris, G. (2018, June). The effect of sentiment on stock price prediction. In
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent
Systems (pp. 551-559). Springer, Cham.
Wafi, A. S., Hassan, H., & Mabrouk, A. (2015). Fundamental analysis models in financial markets–
Review study. Procedia economics and finance, 30, 939-947.
Wang, Y. F. (2002). Predicting stock price using fuzzy grey prediction system. Expert systems with
applications, 22(1), 33-38.
Wang, Y. F. (2003). Mining stock price using fuzzy rough set system. Expert Systems with Applications,
24(1), 13-23.
Wang, L., Wang, Z., Zhao, S., & Tan, S. (2015). Stock market trend prediction using dynamical Bayesian
factor graph. Expert Systems with Applications, 42(15-16), 6267-6275.
Wang, J., & Wang, J. (2015). Forecasting stock market indexes using principle component analysis and
stochastic time effective neural networks. Neurocomputing, 156, 68-78.
Weng, B., Ahmed, M. A., & Megahed, F. M. (2017). Stock market one-day ahead movement prediction
using disparate data sources. Expert Systems with Applications, 79, 153-163.
World Bank, World Federation of Exchanges database. (2017) Stock traded, total value (% of GDP).
Retrieved from https://data.worldbank.org/indicator/CM.MKT.TRAD.GD.ZS

Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory
and practice. arXiv preprint arXiv:2007.15745.
Zadeh, L. A. (1965). Fuzzy sets. Information and control, 8(3), 338-353.
Zahedi, J., & Rounaghi, M. M. (2015). Application of artificial neural network models and principal
component analysis method in predicting stock prices on Tehran Stock Exchange. Physica A:
Statistical Mechanics and its Applications, 438, 178-187.
Zhang, G. (2007). Avoiding Pitfalls in Neural Network Research. IEEE Transactions on Systems, Man
and Cybernetics, Part C (Applications and Reviews), 37(1), 3-16. doi:10.1109/tsmcc.2006.876059

Zhang, J., Cui, S., Xu, Y., Li, Q., & Li, T. (2018). A novel data-driven stock price trend prediction
system. Expert Systems with Applications, 97, 60-69.
Zhang, L., Wang, F., Xu, B., Chi, W., Wang, Q., & Sun, T. (2018). Prediction of stock prices based on
LM-BP neural network and the estimation of overfitting point by RDCI. Neural Computing and
Applications, 30(5), 1425-1444.
Zhang, N., Shen, S., Zhou, A., & Xu, Y. (2019). Investigation on Performance of Neural Networks Using
Quadratic Relative Error Cost Function. IEEE Access, 7, 106642-106652.
doi:10.1109/access.2019.2930520
Zhuge, Q., Xu, L., & Zhang, G. (2017). LSTM Neural Network with Emotional Analysis for Prediction of
Stock Price. Engineering Letters, 25(2).
