Professional Documents
Culture Documents
Testing Stock Market Efficiency Using Historical Trading Data and Machine Learning
Testing Stock Market Efficiency Using Historical Trading Data and Machine Learning
CSC SCHOOL
Testing Stock Market Efficiency Using Historical
Trading Data and Machine Learning
SAMI PURMONEN
PAUL GRIFFIN
1 Introduction 1
1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Background 3
2.1 Efficient market hypothesis (EMH) . . . . . . . . . . . . . . . . . . . 3
2.2 Machine Learning (ML) . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.1 Artificial Neural Network (ANN) . . . . . . . . . . . . . . . . 4
2.2.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.3 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.4 Nearest Neighbors . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Method 7
3.1 ML algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Naive algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Test Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.4 Investment Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.5 Performance measures . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5.1 Absolute Percentage Error (MAPE) . . . . . . . . . . . . . . 9
3.5.2 Movement accuracy . . . . . . . . . . . . . . . . . . . . . . . 9
3.5.3 Investment profit . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5.4 Investment profit without fees . . . . . . . . . . . . . . . . . . 10
4 Results 11
4.1 S&P 500 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 FTSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 Discussion 27
5.1 Investment Profits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 MAPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Local Predictor Behavior . . . . . . . . . . . . . . . . . . . . . . . . 28
5.4 Movement Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Bibliography 31
Chapter 1
Introduction
In the stock market shares of companies are traded. It’s an important component of
the economy because it gives companies access to capital in exchange for a slice of
ownership. Investors can speculate in stock prices, buy when they think prices are
going up and sell when they think prices are going down. This would be profitable
for investors making accurate predictions of stock prices. However, the predictability
of stock prices has been questioned. According to the efficient market hypothesis
stock prices are unpredictable in such a way that one can not beat the market
[4]. In this paper we attempt to predict prices using machine learning algorithms.
The predicted prices are then used with an investment strategy and its profits
are compared to market returns. If the investment strategy beats the market the
efficient market hypothesis is falsified.
1
Chapter 2
Background
3
CHAPTER 2. BACKGROUND
• Reinforcement - The algorithm only gets the signal success of failure. The
task could be long and complicated, for example when playing a game of chess
the algorithm does not get feedback every move, but only get to know if the
whole game was a victory or defeat.
4
2.3. RELATED WORK
patterns and the closest k samples are chosen and maybe weighted. The average of
those is the result [3].
5
Chapter 3
Method
3. Test - Different investment strategies are tested. If one that beats the market
is found then the EMH is falsified. Otherwise it is confirmed.
Proving that EMH is true is a problem that has been attempted by many re-
searchers but there is no definitive consensus. This paper presents an attempt at
disproving EMH by looking at a necessary consequence and attempting to falsify it.
The EMH states that it is not possible to consistently beat the market. This
means that if one can find an investment strategy that consistently produces excess
returns compared to market indexes, the EMH would not be true.
The method used in this paper is based on price predictions. The first step is
predicting prices using historical trading data. If historical trading data is related
to future prices it would be hard to find out how they are related manually since it
may be complex non-linear relationships. Therefore ML algorithms are used which
has been shown to be good at finding such relationships if they exist [9]. The second
step is to invest money based on the predictions and evaluate the returns compared
with the market. If the market is outperformed, EMH must be falsified.
3.1 ML algorithms
Four different ML algorithms are used. Artificial neural network, linear regression,
nearest neighbors and random forest. The implementation of each algorithm comes
from Mathematica v10.0.1.0. This ensures that the implementations are well tested
since they are used by many people. All the ML algorithms used in the paper use
supervised learning and train on 90 days data prior to the predicted date. Input to
the algorithm after training is the past 5 days data.
7
CHAPTER 3. METHOD
8
3.5. PERFORMANCE MEASURES
else
i f p r e d i c t e d P r i c e s ( t ) − t r a n s a c t i o n C o s t >=
r e a l P r i c e s ( t −1)
money = ( money−t r a n s a c t i o n C o s t ) ∗ r e a l P r i c e s ( t )
/ r e a l P r i c e s ( t −1)
hasInvested = true
The benchmark strategy is to invest all money on the first day and sell all shares
on the last day of the prediction period. This is called the buy-and-hold strategy.
The money made using this method is:
money = r e a l P r i c e s [ end ] / r e a l P r i c e s [ 1 ]
9
CHAPTER 3. METHOD
10
Chapter 4
Results
11
CHAPTER 4. RESULTS
12
4.1. S&P 500
13
CHAPTER 4. RESULTS
Figure 4.6. S&P 500 Profits with and without fees for each algorithm
14
4.1. S&P 500
15
CHAPTER 4. RESULTS
16
4.1. S&P 500
17
CHAPTER 4. RESULTS
4.2 FTSE
18
4.2. FTSE
19
CHAPTER 4. RESULTS
20
4.2. FTSE
21
CHAPTER 4. RESULTS
Figure 4.16. FTSE Profits with and without fees for each algorithm
22
4.2. FTSE
23
CHAPTER 4. RESULTS
24
4.2. FTSE
25
Chapter 5
Discussion
Several interesting things were discovered when looking at the results. A general
look at the indices is available in figure 4.1 and 4.11 and more details are discussed
in this section.
27
CHAPTER 5. DISCUSSION
5.2 MAPE
As seen in figure 4.4 and 4.14 all algorithms produce errors that are less than 1%.
Without looking at other measurements this may seem low and suggest that pre-
dictions are accurate. However, the naive algorithm outperforms all ML algorithms
suggesting that the ML algorithms ability to predict prices is very low.
28
5.5. CONCLUSION
5.5 Conclusion
None of the ML algorithms managed to beat the market on either index which
means that the results support the EMH.
29
Bibliography
[3] Thomas Cover and Peter Hart. Nearest neighbor pattern classification. Infor-
mation Theory, IEEE Transactions on, 13(1):21–27, 1967.
[4] Eugene F. Fama. Efficient capital markets: A review of theory and empirical
work*. The Journal of Finance, 25(2):383–417, 1970.
[5] Tin Kam Ho. Random decision forests. In Document Analysis and Recognition,
1995., Proceedings of the Third International Conference on, volume 1, pages
278–282. IEEE, 1995.
[6] Laurent Hyafil and Ronald L Rivest. Constructing optimal binary decision
trees is np-complete. Information Processing Letters, 5(1):15–17, 1976.
[7] Burton G. Malkiel. The efficient market hypothesis and its critics. Journal of
Economic Perspectives, 17(1):59–82, 2003.
[8] A. Meier and P. Olsson. The optimal training interval for a multilayer percep-
tron on a day to day estimation of the swedish omxs30 index. 2014.
[9] Donald Michie, David J Spiegelhalter, and Charles C Taylor. Machine learning,
neural and statistical classification. Citeseer, 1994.
[11] J. Suljkic and E. Molin. Testing the predictability of stock markets on real
data. 2014.
[12] Jingtao Yao and Chew Lim Tan. A study on training criteria for financial
time series forecasting. In Proceedings of International Conference on Neural
Information Processing. Citeseer, 2001.
31
www.kth.se