
Prediction of Stock Returns using Machine Learning

A project report submitted

to

MANIPAL ACADEMY OF HIGHER EDUCATION

For Partial Fulfillment of the Requirement for the

Award of the Degree

of

Bachelor of Technology

in

Information Technology

by

Sahil Singh

Reg. No. 160911051

Under the guidance of

Dr. Sanjay Singh
Professor
Department of I & CT
Manipal Institute of Technology

AUGUST 2020
I dedicate my thesis to my friends and family.

DECLARATION

I hereby declare that this project work entitled Prediction of Stock Returns Using Machine Learning is original and has been carried out by me in the Department of Information and Communication Technology of Manipal Institute of Technology, Manipal, under the guidance of Dr. Sanjay Singh, Professor, Department of Information and Communication Technology, M.I.T., Manipal. No part of this work has been submitted for the award of a degree or diploma either to this University or to any other Universities.

Place: Manipal

Date: 12-08-20

Sahil Singh

CERTIFICATE

This is to certify that this project entitled Prediction of Stock Returns using Machine Learning is a bonafide project work done by Mr. Sahil Singh (Reg. No.: 160911051) at Manipal Institute of Technology, Manipal, independently under my guidance and supervision for the award of the Degree of Bachelor of Technology in Information Technology.

Dr. Sanjay Singh
Professor
Department of I & CT
Manipal Institute of Technology
Manipal, India

Dr. Balachandra
Professor & Head
Department of I & CT
Manipal Institute of Technology
Manipal, India

ACKNOWLEDGEMENTS

I would like to thank my internal guide for this project, Dr. Sanjay Singh, who guided me in the right direction to conduct my research on the subject. I would also like to thank my college, Manipal Institute of Technology, for having provided its lab for my project work.

ABSTRACT
Financial market forecasting has been a very challenging problem for both researchers and industrialists, as the markets generally have a very low signal-to-noise ratio. It can therefore be considered one of the toughest problems in the Machine Learning domain. There have been attempts to solve this problem using modified ML techniques. For instance, in McNally, Roche, and Caton [8] the authors tried to predict the directional movement of the price of Bitcoin using LSTM networks. They show how LSTM networks outperform standard time series forecasting techniques like ARIMA, although the best accuracy achieved is only 52 percent, which further demonstrates the difficulty of financial market prediction.

It was clear that some data transformation techniques would be needed to modify the noisy price data before using it as input to the model. Two approaches have been taken to solve the prediction problem: fundamental analysis, in which certain key financial ratios that reflect the health of a company are used as input, and technical analysis, wherein certain transformations are applied to the historical price data before it is given to the model in order to smooth out the inherent noise, similar to time series analysis.

The yearly predictions using fundamental analysis inputs yielded an MCC of 0.11, indicating a positive correlation between predictions and actual results. The technical analysis process yielded an accuracy of 58 percent in directional predictions.

CCS CONCEPTS

• Applied Computing → Forecasting; Decision analysis;

• Computing methodologies → Neural networks; Deep belief networks;

Contents

Acknowledgements

Abstract

List of Tables

List of Figures

Abbreviations

1 Introduction
  1.1 Problem Definition
  1.2 Objectives

2 Methodology
  2.1 Fractal Structure of the Markets
    2.1.1 Estimation of Hurst's Exponent
    2.1.2 Use of Hurst's Exponent
    2.1.3 Hurst Exponent as a Feature
  2.2 Using Control Systems Theory to Filter Time Series Data
    2.2.1 Butterworth Filter
    2.2.2 High Pass Digital Filters
    2.2.3 The Problem of Spectral Dilation
    2.2.4 Automatic Gain Control
      2.2.4.1 Calculation of K
    2.2.5 Roofing Filter
  2.3 RSI
    2.3.1 Modified RSI
  2.4 MLP-LSTM
  2.5 Prediction using Fundamental Analysis
    2.5.1 General Approach
    2.5.2 Normalization Details
    2.5.3 Features of the Data
    2.5.4 AdaBoost Classifier

3 Results
  3.1 Results on Fundamental Analysis
  3.2 Results on Technical Analysis

4 Conclusion

Appendices

A
  A.1 Butterworth Filter
  A.2 High pass filter

References

Project Detail
List of Tables

3.1 Performance of Neural Net and AdaBoost Classifier in the Fundamental Analysis classification task

A.1 Project Detail
List of Figures

2.1 A visual representation of the behaviour of a time series based on its Hurst's Exponent value

2.2 Power-frequency and phase-frequency plots for the Butterworth filter

2.3 Power-frequency and phase-frequency plots for the Moving Average filter

2.4 High Pass Filter gain-frequency plot

2.5 Details of the MLP-LSTM neural network used

2.6 MLP-LSTM
ABBREVIATIONS

LDA : Latent Dirichlet Allocation

API : Application Programming Interface

ML : Machine Learning

LSTM : Long Short-Term Memory networks

ARIMA : Autoregressive Integrated Moving Average

MCC : Matthews Correlation Coefficient

CNN : Convolutional Neural Network

tp : True Positives

fp : False Positives

fn : False Negatives

tn : True Negatives

CAPM : Capital Asset Pricing Model
Chapter 1

Introduction

The problem of predicting future stock returns is among the most challenging machine learning problems, for in financial markets there is no guarantee that a pattern exists and the signal-to-noise ratio of the data is very low. The risk of overfitting increases as a result, and hence novel techniques have to be devised to deal with such problems. There are two common approaches to predicting returns: fundamental analysis (which relies on financial statements for prediction) and technical analysis (time series prediction). In this project an effort has been made to devise an algorithm that combines both approaches.

Efforts have been made by researchers in this domain; for example, in McNally, Roche, and Caton [8], the researchers used LSTM networks to forecast the directional price movement of Bitcoin. While they found that LSTM performs better than benchmark techniques like ARIMA, it achieved only 52 percent accuracy, which limits the practical application of the model to real-world trading. This project attempts to achieve greater accuracy.

Algorithmic trading is on the rise everywhere due to the availability of large amounts of data. It is important for an individual investor or an investment firm to take advantage of these developments to remain competitive.

The systems used here attain an accuracy of 58 percent in predicting the directional returns 10 days ahead, while the fundamental analysis system attains an MCC of 0.11. It is important to note that the results of these models could be further improved by using more relevant data. Hedge funds and other investment firms generally have access to alternative datasets that are not freely available, and can therefore further improve the model using their own data.

Recently, a technique called manifold mixup (Verma et al. [10]) has been devised to address the problem of overfitting in neural networks on classification problems. The basic idea is to treat the output as a weighted combination of all the classes, so that the continuous values produced as a result yield a smooth decision function that is less likely to overfit. It is useful for reducing overfitting as well as for data augmentation in cases where sufficient data is not available. In this project, modifications have been made to use it for classification with imbalanced classes.

Modelling spatio-temporal time series requires modelling both the temporal and the spatial aspects of the data. One particular approach to such a problem was taken in Wang et al. [11], where a CNN-LSTM neural network was used for sentiment analysis of textual data. The dimensional aspect of each text was encoded by the CNN as a separate region within a sentence; each consecutive sentence had its own CNN, and the output of these CNNs was fed as input to the LSTM blocks. In this project a similar strategy has been devised, wherein an MLP-LSTM neural network is used to model the multivariate OHLC (Open, High, Low, and Close of each day for a given stock) time series data.

1.1 Problem Definition

• The fractal nature of the markets, which plays an important role in understanding the state of the markets and, in effect, any time series, has largely been ignored by practitioners.

• Traditional time series analysis uses arbitrary methods like Moving Averages for smoothing and data transformation, which have a lot of flaws.

• Practitioners often lack proper metrics for judging the performance of a classifier.

1.2 Objectives

• To use well studied techniques for the pre-processing of financial time series data.

• To develop an approach specifically tailored for a multivariate time series.

• To use Machine Learning classification techniques on financial statements to predict annual stock returns.

Chapter 2

Methodology

2.1 Fractal Structure of the Markets

It has long been suspected that markets have a fractal structure, i.e. the time series looks the same whether one samples the data hourly, daily, weekly, etc. This makes intuitive sense, as the shape of the time series curve is related to the fluctuation in prices, which is a reflection of the risks. Therefore, for a market to be stable, the inherent risk should scale up according to the fractal law relative to the time horizon of the investment. The problem of determining the Hurst's exponent of a time series was first studied in the field of hydrology. The problem of predicting river and lake levels for the design of reservoirs was studied in Hurst [5], wherein past data was used to predict river levels for the forthcoming year. As a result of this work, the Hurst's exponent was devised, which characterizes the nature of a time series, i.e. whether the time series is mean reverting, persistent, or following a random Brownian motion. The equation for the Hurst's exponent is as follows:

E\left[\frac{R(n)}{S(n)}\right] = C\,n^{H} \quad \text{as } n \to \infty \qquad (2.1)

where R(n) is the range of the cumulative deviations from the mean, S(n) is the standard deviation, E(x) is the expected value, n is the time period of observation relative to the above measurements, and C is a constant.

In a time series that is self-similar, H is related to the fractal dimension D as D = 2 - H, such that 1 < D < 2. For more details on the fractal dimension see Mandelbrot [7].

2.1.1 Estimation of Hurst’s Exponent

There are many techniques to estimate the Hurst's exponent; the one used in this project is Rescaled Range analysis, as given in Gilmore et al. [4].

A time series of length N is divided into shorter series whose lengths are its factors, i.e. n = N, N/2, N/4, and the rescaled range is then calculated for each series.

Consider a time series of length n, X = X_1, X_2, \dots, X_n:

1. The mean is given by m = \frac{1}{n}\sum_{i=1}^{n} X_i.

2. The next step is to create a mean-adjusted series: Y_t = X_t - m for t = 1, 2, \dots, n.

3. We calculate the cumulative sum of the deviated series: Z_t = \sum_{i=1}^{t} Y_i for t = 1, 2, \dots, n.

4. Then we compute the range R: R(n) = \max(Z_1, Z_2, \dots, Z_n) - \min(Z_1, Z_2, \dots, Z_n).

5. The standard deviation S of the considered interval is calculated: S(n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - m)^2}.

6. The Hurst exponent is then obtained by performing linear regression of \log\left[\frac{R(n)}{S(n)}\right] on \log n, where the slope of the line gives the value of H, as sketched in the code below.
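A minimal Python sketch of this procedure (function and variable names are illustrative, not taken from the project code):

import numpy as np

def rescaled_range(x):
    # R/S statistic for one window, following steps 1-5 above
    m = x.mean()                 # step 1: mean
    y = x - m                    # step 2: mean-adjusted series
    z = np.cumsum(y)             # step 3: cumulative deviations
    r = z.max() - z.min()        # step 4: range
    s = x.std()                  # step 5: standard deviation
    return r / s if s > 0 else np.nan

def hurst_exponent(series, splits=(1, 2, 4)):
    # estimate H as the slope of log(R/S) against log(n)
    # for window sizes N, N/2 and N/4 (step 6)
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for k in splits:
        n = len(series) // k
        # average R/S over the k non-overlapping windows of length n
        rs = np.nanmean([rescaled_range(series[j * n:(j + 1) * n])
                         for j in range(k)])
        log_n.append(np.log(n))
        log_rs.append(np.log(rs))
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return slope

# e.g. H over the previous 150 closing prices, as used in Section 2.1.3:
# h = hurst_exponent(closes[-150:])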

2.1.2 Use of Hurst’s Exponent

The value of Hurst’s exponent tells us about the nature of the

time series,which in turn gives us information about the certain-

ity of the prediction.The H value of 0.5 tells us that the time

series follows a geometric Brownian motion and hence is random,

indicating that it is not possible to make accurate predictions.For

H > 0.5 we say that the time series is persistent which means that

it is bound to keep moving in one direction whereas for H < 0.5

it is mean reverting, which means it is likely to be range bound.

Figure 2.1: A visual representation of the behaviour of a time series based on its Hurst's Exponent value

2.1.3 Hurst Exponent as a Feature

In this project, the Hurst’s Exponent has been used as a feature

of the multivariate time series.Since our model gives the prob-

ability of an up or down move for a given period,the value of

the exponent is an important indicator of the nature of the most

recent time series, as described in the previous section, which

in turns tells the model whether its possible in the first place

to make an accurate prediction.The value of Hurst’s Exponent

has been calculated using the Closing prices of the previous 150

days.Therefore for every day, the corresponding value is calcu-

lated by using the look back period of 150 days.

2.2 Using Control Systems Theory to Filter Time Series Data

The principles of Control Systems theory have been widely studied and applied in the field of electronics, mainly for the purpose of smoothing an incoming signal or getting rid of unwanted frequencies. Since any time series can be regarded as a signal (analog or digital), these same principles can be applied to modify or clean the time series data before it enters our prediction model.

Definition 2.1 (Laplace Transform) The Laplace transform is the mapping of a function f(t), where t is time, into the s-plane such that

F(s) = \int_{0}^{\infty} e^{-st} f(t)\,dt \qquad (2.2)

where s is a complex number, s = a + ib.

Definition 2.2 (Transfer Function) A transfer function denotes the mapping of input to output, generally written as

\frac{O(s)}{I(s)} = f(s) \qquad (2.3)

where O(s) is the output and I(s) is the input.

Definition 2.3 (Z-Transform) The Z-transform is obtained from the Laplace transform by the substitution Z = e^{sT}, where T is the sampling interval. The following relation holds when applying it to a function f(t):

f(t - k) = Z^{-k} f(t) \qquad (2.4)

Definition 2.4 (Moving Average) A Moving Average F(t) of a time series f(t) at a point t is defined by

\frac{F(t)}{f(t)} = \frac{Z^{-1} + Z^{-2} + \cdots + Z^{-N}}{N} \qquad (2.5)

where N is the period of the moving average.

Definition 2.5 (Gain (dB)) The gain in decibels at an angular frequency ω is given by

G = 20 \log_{10}(H(\omega)) \qquad (2.6)

where H(ω) is the transfer function.

For our purpose we will use digital filters, since financial market time series data is not continuous.

2.2.1 Butterworth Filter

To smooth the data, the simplest and most widely used technique is the moving average, as given in Definition 2.4. However, to obtain more smoothing, more data points are needed as inputs to the Moving Average, which creates lag, i.e. the filter takes time to react to changes. Traders want entry signals as fast as possible, but for our model to interpret the series properly we also want sufficient smoothing. The problem is solved by the digital version of the Butterworth filter as given in Ehlers [1]. The Butterworth filter removes the higher frequencies in a signal or time series so that it is less noisy; this clean removal of the high-frequency components is not achieved by Moving Averages.

The equations of the two-pole Butterworth filter are as follows:

a = e^{-1.414\,\pi/T}
b = 2a\cos\left(1.414 \cdot 1.25\,\pi/T\right)
c_2 = b
c_3 = -a^2
c_1 = 1 - c_2 - c_3
O(t) = c_1\,\frac{I(t) + I(t-1)}{2} + c_2\,O(t-1) + c_3\,O(t-2) \qquad (2.7)

where O(t) is the filtered time series and I(t) is the original time series. T is the time period corresponding to the desired cutoff frequency, F = 1/T, such that only frequencies below this are retained. As can be seen in Figure 2.2, the gain response is flat for frequencies below the cutoff frequency, after which it drops sharply, eliminating the high-frequency components. In comparison, the higher-frequency components are not removed by a Moving Average filter, and its frequency response is also not smooth, as shown in Figure 2.3.

The code is given in Appendix A.1.

Figure 2.2: Power-frequency and phase-frequency plots for the Butterworth filter

Figure 2.3: Power-frequency and phase-frequency plots for the Moving Average filter
2.2.2 High Pass Digital Filters

A fundamental requirement of time series analysis is making the time series stationary, i.e. having a constant mean and variance throughout the series. This property is usually pursued by taking the difference of successive terms; however, this renders the resulting output quite jittery, not to mention the high-frequency noise present in such data. Therefore a balance is required between the high-frequency and low-frequency components. This is where high-pass filters come in: they attenuate the frequency components below the cutoff frequency and let the higher ones pass.

The transfer function of a single-pole high-pass filter is:

\alpha = \frac{\cos(0.707 \cdot 2\pi/T) + \sin(0.707 \cdot 2\pi/T) - 1}{\cos(0.707 \cdot 2\pi/T)}

\frac{O(t)}{I(t)} = \frac{(1 - \alpha/2)\left(1 - Z^{-1}\right)}{1 - (1 - \alpha)Z^{-1}} \qquad (2.8)

where I(t) is the input time series and O(t) is the output time series.

Figure 2.4: High Pass Filter Gain-Frequency plot

2.2.3 The Problem of Spectral Dilation

Spectral Dilation, as described in Ehlers [2], is the increase in the amplitude of signal components as their frequency decreases, which results in the output signal being dominated by these lower frequencies. This effect is again due to the fractal nature of time series: when the time interval under consideration increases, the range of price swings also increases, thus increasing the amplitude. The power (gain) increases in proportion to 1/F^{\alpha}, where F is the frequency and \alpha = 2H, with H being the Hurst's Exponent of the time series. The amplitude increases at 6 dB per octave for \alpha = 1, i.e. for a time series that is a Brownian motion; for a persistent time series the increase is even greater. A single-pole high-pass filter only attenuates at a rate of -6 dB per octave and is therefore not enough for a persistent time series. It is for this reason that we use a two-pole high-pass filter, so that the attenuation exceeds the spectral dilation gain. The code for a two-pole high-pass filter is provided in Appendix A.2.

2.2.4 Automatic Gain Control

It is well known that for an ML algorithm to perform optimally, the input has to be normalized within the range of -1 to 1 or 0 to 1. In this project, the technique of Automatic Gain Control, as given in Ehlers [3], is applied to the filter output to maintain a steady gain ratio. The steps taken are as follows (a code sketch follows the list):

1. The peak value is initially set to 0, and the first value of the series is then set as the peak.

2. Moving forward, check at each step whether the current value is greater than the peak. If it is not, the new peak value is calculated as Peak(t) = Peak(t-1) * K, where K is decided beforehand. If the value at the current time step is greater than the peak, the peak takes this new value.

3. If the peak value is greater than 0, divide the value at the current time step by the peak value.
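A minimal sketch of this procedure; comparing against the absolute value of the series is an assumption here, so that negative swings of the zero-centered filter output also register as peaks:

import numpy as np

def automatic_gain_control(x, k=0.991):
    # scale a filtered series into roughly [-1, 1] via a decaying peak
    out = np.zeros(len(x))
    peak = 0.0
    for i, v in enumerate(x):
        # step 2: decay the old peak by K, or reset it on a new high
        # (abs() is an assumption; the text compares raw values)
        peak = max(abs(v), peak * k)
        # step 3: normalize by the peak when it is positive
        out[i] = v / peak if peak > 0 else 0.0
    return out

The value K = 0.991 used as the default is derived in the next subsection.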

2.2.4.1 Calculation of K

The gain decay factor for a theoretical sine wave would be

Gain = K^{Period/2} \qquad (2.9)

Since we will be considering only periods between 10 and 48 days in this project, the effective gain becomes K^{24-5} = K^{19}. A reasonable value of attenuation would be -1.5 dB; therefore we get

-1.5 = 20 \log_{10}\left(K^{19}\right)

solving which gives K = 10^{-1.5/380} \approx 0.991.

2.2.5 Roofing Filter

To obtain the advantages of both stationarity and smoothing, a combination of high-pass and low-pass filters has to be used. In this project, the series of daily closing prices is first passed through a high-pass filter with a critical period of 48 days, removing all frequency components with periods above that, and the resulting output is smoothed by passing it through a two-pole Butterworth filter with a critical period of 10 days, removing all components with periods below that. Hence we get a final series containing components with periods between 10 and 48 days.
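A minimal sketch of the roofing filter, assuming the two routines of Appendix A (highpass2 and butterworth2) are in scope as free functions:

def roofing_filter(closes, hp_period=48, lp_period=10):
    # remove components with periods longer than 48 days
    hp = highpass2(closes, hp_period)
    # smooth away components with periods shorter than 10 days
    return butterworth2(hp, lp_period)

This roofed series is what the modified RSI of Section 2.3.1 is computed from.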

2.3 RSI

The RSI value is an indicator of the strength of a trend. It is calculated as follows:

RS = \frac{SMMA(U, n)}{SMMA(D, n)}

where SMMA(U, n) is the average of the positive returns over the last n time periods and SMMA(D, n) is the same for the (absolute) negative returns.

Using the relative strength factor RS, the RSI is calculated by the following formula:

RSI = 100 - \frac{100}{1 + RS}

2.3.1 Modified RSI

The RSI equation can be rearranged and written as

RSI = \frac{100 \cdot SMMA(U, n)}{SMMA(U, n) + SMMA(D, n)} \qquad (2.10)

where 100 is just a scaling term and can be ignored. Let D(t) = SMMA(U(t), n) + SMMA(D(t), n), where SMMA(U(t), n) is the average of the positive returns over the previous n time steps corresponding to time t, and likewise for SMMA(D(t), n), the series D(t) inside the SMMA being the absolute magnitudes of the negative returns. We apply the Butterworth filter in the calculation of the RSI, where c_1, c_2 and c_3 are given in equation (2.7):

RSI(t) = c_1 \cdot \frac{SMMA(U(t), n)/D(t) + SMMA(U(t-1), n)/D(t-1)}{2} + c_2 \cdot RSI(t-1) + c_3 \cdot RSI(t-2) \qquad (2.11)

Before being fed into the RSI, the original time series is passed through a roofing filter as described in the previous section, with a two-pole high-pass filter having a critical period of 48 and the Butterworth filter having a critical period of 10. The above modified RSI values with periods 5 and 14 are used as features for our neural net along with the Hurst's Exponent.
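A sketch of the modified RSI computation described above; using a causal rolling mean to stand in for the SMMA is an assumption, and the Butterworth coefficients follow equation (2.7):

import numpy as np

def rolling_mean(x, n):
    # causal mean of the last n values (shorter windows at the start)
    out = np.zeros(len(x))
    c = np.cumsum(np.insert(x, 0, 0.0))
    for i in range(len(x)):
        lo = max(0, i - n + 1)
        out[i] = (c[i + 1] - c[lo]) / (i + 1 - lo)
    return out

def modified_rsi(roofed, n, lp_period=10):
    # up/down move series from the roofed (filtered) prices
    diffs = np.diff(roofed, prepend=roofed[0])
    up = np.where(diffs > 0, diffs, 0.0)
    down = np.where(diffs < 0, -diffs, 0.0)
    smma_u = rolling_mean(up, n)
    smma_d = rolling_mean(down, n)
    denom = np.where(smma_u + smma_d > 0, smma_u + smma_d, 1.0)
    frac = smma_u / denom          # the SMMA(U(t), n)/D(t) term of Eq. (2.11)
    # Butterworth coefficients from Eq. (2.7)
    a = np.exp(-1.414 * np.pi / lp_period)
    b = 2 * a * np.cos(1.414 * 1.25 * np.pi / lp_period)
    c2, c3 = b, -a * a
    c1 = 1 - c2 - c3
    rsi = np.zeros(len(frac))
    for t in range(2, len(frac)):
        rsi[t] = (c1 * (frac[t] + frac[t - 1]) / 2
                  + c2 * rsi[t - 1] + c3 * rsi[t - 2])
    return rsi

The RSI-5 and RSI-14 features would then be modified_rsi(roofed, 5) and modified_rsi(roofed, 14).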

2.4 MLP-LSTM

The time series we had was a multivariate time series; therefore it was clear that a simple LSTM network would not suffice. Hence a modified version of the LSTM was created, wherein there is a Multi-Layer Perceptron for each block of the LSTM, and the output of the MLP is fed into the input of the corresponding LSTM block.

Three inputs were taken for each timestep: the RSI value with period 5, the RSI value with period 14, and the Hurst's exponent calculated over the previous 150 days. A look-back period of 20 days was used; therefore each training tensor was a 20 x 3 matrix. The output target was a binary number indicating whether the return over the next 10 days was positive or negative.

In Figure 2.6 we show the structure of the MLP-LSTM, where X_{it} is the i-th feature of timestep t. The output from the MLP is taken as the input for the LSTM block.
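A minimal Keras sketch of this architecture: wrapping Dense layers in TimeDistributed gives every timestep its own MLP whose output feeds the corresponding LSTM step. The layer sizes here are illustrative, not the exact configuration of Figure 2.5:

import tensorflow as tf
from tensorflow.keras import layers, models

def build_mlp_lstm(lookback=20, n_features=3):
    inp = layers.Input(shape=(lookback, n_features))
    # per-timestep MLP: the same Dense stack applied at every timestep
    x = layers.TimeDistributed(layers.Dense(16, activation="relu"))(inp)
    x = layers.TimeDistributed(layers.Dense(8, activation="relu"))(x)
    # the LSTM consumes the per-timestep MLP outputs
    x = layers.LSTM(32)(x)
    # probability that the 10-day-ahead return is positive
    out = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model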

2.5 Prediction using Fundamental Analysis

2.5.1 General Approach

The financial statements of the companies for the last 20 years were downloaded, and the key financial ratios were taken as inputs, which were fed into a prediction model after being normalized. The output to be predicted was the directional return for the next year.

2.5.2 Normalization Details

MinMax normalization was used to scale the ratios into the range of 0 to 1. However, this normalization was performed sector-wise, i.e. the maximum and minimum values used for stocks in the auto sector are different from those of the IT sector. The rationale was that when an investor is deciding on buying a stock, he or she compares it with similar stocks, and that is why financial ratios should be scaled relative to other stocks in the same sector.
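A sketch of sector-wise MinMax scaling, assuming a pandas DataFrame with one row per stock-year and a sector column (the column names are illustrative):

import pandas as pd

def sectorwise_minmax(df, feature_cols, sector_col="sector"):
    # scale each ratio to [0, 1] using the min and max within its sector,
    # so each stock is compared against its sector peers
    def scale(col):
        lo, hi = col.min(), col.max()
        rng = hi - lo if hi > lo else 1.0
        return (col - lo) / rng
    out = df.copy()
    out[feature_cols] = out.groupby(sector_col)[feature_cols].transform(scale)
    return out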

Figure 2.5: Details of the MLP-LSTM neural network used

[Figure: a standard LSTM cell (gates σ, σ, Tanh, σ; states c^{<t-1>} → c^{<t>} and h^{<t-1>} → h^{<t>}) whose input x^{<t>} is the output O_1 of an MLP over the timestep's features X_{1t}, ..., X_{nt}.]

Figure 2.6: MLP-LSTM

2.5.3 Features of the Data

1. The data had 64 features and 11,200 data points.

2. There was a class imbalance, as there were more stocks with negative returns than positive ones.

2.5.4 AdaBoost Classifier

An algorithm well suited to classifying data with class imbalance is the AdaBoost classifier. The classifier works in the following way (a usage sketch follows the list):

1. Classification is performed using a weak learner and the results are recorded.

2. In the next stage, the data vectors that were assigned an incorrect label have a greater probability of being chosen for the next round of classification. After choosing the data vectors, the classification process is repeated.

3. This process goes on for n stages. The final classifier is the sum of the classifiers of all the stages, weighted inversely to their classification errors.
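A usage sketch with scikit-learn, corresponding to the configurations reported in Table 3.1 ("ada depth" uses stumps of depth 1, "ada depth6" trees of depth 6); the number of boosting stages shown is illustrative:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# depth-1 decision stumps as weak learners; use max_depth=6 for "ada depth6"
# (older scikit-learn versions name the first argument base_estimator)
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,
    learning_rate=1.0,
)
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)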

22
Chapter 3

Results

Definition 3.1 (True Positives (TP)) The number of labels that have been predicted as positive correctly.

Definition 3.2 (False Positives (FP)) The number of labels that have been predicted as positive incorrectly.

Definition 3.3 (False Negatives (FN)) The number of labels that have been predicted as negative incorrectly.

Definition 3.4 (True Negatives (TN)) The number of labels that have been predicted as negative correctly.

Definition 3.5 (Recall) Recall is the proportion of true positive cases among all positive labels.

Recall = \frac{TP}{TP + FN}

Definition 3.6 (Precision) Precision is the proportion of true positives to the total number of labels identified as positive by the classifier.

Precision = \frac{TP}{TP + FP}

Definition 3.7 (F1-Score) The F1-Score is the harmonic mean of Precision and Recall.

F1\text{-}Score = \frac{2(Precision \times Recall)}{Precision + Recall}

Definition 3.8 (Matthews Correlation Coefficient) The Matthews Correlation Coefficient is defined according to the following equation:

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (3.1)
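All of these metrics are available off the shelf in scikit-learn; a quick sketch:

from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }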

3.1 Results on Fundamental Analysis

Accuracy is not a sufficient measure for judging the performance of a classifier; therefore we use additional metrics such as MCC and F1-Score, which are defined above. The F1-Score measures the balance between precision and recall, and the MCC measures the correlation between the model's predictions and the observed data. We get an MCC of 0.11 for our fundamental analysis classifier using AdaBoost, indicating a positive correlation between our predictions and the observations, i.e. the classifier performs better than random. A comparative study has also been done between an MLP classifier and the AdaBoost classifier; the results are given in Table 3.1, which shows accuracy and F1-Score for various pre-processing types. In training the data with neural nets, the method of manifold mixup has been used to generate new training vectors so that the effect of class imbalance can be mitigated. Broadly, higher weights were assigned to those vectors belonging to the less frequently occurring class label and lower weights to the more frequently occurring one. These weights have been assigned using two types of distributions, namely beta and normal. The term "balanced beta" implies the use of Beta(0.5, 0.5) as the distribution for assigning the weights. Otherwise the weights were assigned in the following manner:
For Beta:

1. If the frequency of occurrence of Class 0 is f1 as a fraction of the total number of training vectors, i.e. 0 < f1 < 1, then our weight assignment distribution will be \alpha \sim Beta(1 - f1, f1).

2. The training vectors are created such that the new training input vector X_{new} and the corresponding training output vector Y_{new} are given by the following relations:

X_{new} = \alpha X_0 + (1 - \alpha) X_1
Y_{new} = \alpha Y_0 + (1 - \alpha) Y_1 \qquad (3.2)

where X_0, X_1 are input training vectors belonging to Class 0 and Class 1 respectively, and likewise for Y_0, Y_1. A code sketch of this mixing is given below.
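A sketch of the Beta-based mixing above; the array layout, with class-0 and class-1 samples held in separate arrays, is an assumption:

import numpy as np

def mixup_beta(X0, Y0, X1, Y1, f1, n_new, seed=0):
    # X0/Y0 hold class-0 samples, X1/Y1 class-1 samples,
    # f1 is the fraction of class-0 vectors in the training set
    rng = np.random.default_rng(seed)
    i0 = rng.integers(0, len(X0), n_new)
    i1 = rng.integers(0, len(X1), n_new)
    alpha = rng.beta(1 - f1, f1, size=n_new)   # weights per Eq. (3.2)
    X_new = alpha[:, None] * X0[i0] + (1 - alpha[:, None]) * X1[i1]
    Y_new = alpha * Y0[i0] + (1 - alpha) * Y1[i1]
    return X_new, Y_new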

For Normal:

1. The weight assignment procedure remains the same as above, except that in this case \alpha \sim Normal(1 - f1, 0.1k), where k can take any integer value assigned by the user, according to how much variation the user wants in the weights. "normal2" indicates a k value of 1, while "norm4" indicates a k value of 4. In this case the weights will be more centered around the mean value than in the case of the Beta distribution.

2. "normal features9", "normal features30" and so on indicate that feature pruning has been used for those models to see if it performs better; the number indicates the number of features included, selected on the basis of having a higher correlation with annual returns than the others.

For the AdaBoost Classifier:

AdaBoost has been used along with a decision tree classifier; "depth" indicates a maximum tree depth of 1 per iteration, while "depth6" indicates a maximum depth of 6.

In essence, a comparative analysis has been done between different ways of using manifold mixup to correct class imbalance and the classic approach of using the AdaBoost classifier.
Table 3.1: Performance of Neural Net and AdaBoost Classifier in the Fundamental Analysis classification task

     model type   preprocessing type    accuracy   F1 Score
0    neural net   balanced beta         0.470859   0.485842
1    neural net   beta                  0.679448   0.205323
2    neural net   normal                0.671779   0.286667
3    neural net   normal2               0.564417   0.466165
4    neural net   beta2                 0.630368   0.377261
5    neural net   beta3                 0.664110   0.220641
6    ada          depth                 0.662577   0.402174
7    ada          depth6                0.682515   0.378378
8    neural net   norm4                 0.673313   0.116183
9    neural net   normal features9      0.579755   0.424370
10   neural net   normal features30     0.613497   0.388350
11   neural net   normal features5      0.595092   0.394495
3.2 Results on Technical Analysis

The best accuracy obtained in the technical analysis time series classification was 58.5 percent, with a final cross-entropy loss of 0.512, which is good from a financial market point of view.

Chapter 4

Conclusion

From the above results it is reasonable to conclude that our models predict the direction of returns better than random. Investors and traders who do not have time to research the markets to select stocks can rely on these models as recommender systems.

In this project, the time series data used was sampled at a daily interval; in the future, similar methods can be applied to data sampled at shorter intervals. The additional data available at a higher sampling frequency would make the model more robust and adaptive to all kinds of conditions.

Only one technical indicator has been studied in this project as a feature, but other technical indicators in the category of oscillators, such as the stochastic oscillator (Ni, Liao, and Huang [9]), can be studied for their potential use as features.

The scope of this project was also limited by the availability of only freely available data. Since financial markets are very competitive, there is not much capitalizable information present in freely available data; therefore alternative datasets, such as consumer surveys and other proprietary datasets, can be used to make more accurate predictions.

Another avenue not explored in this project, which is gaining much popularity these days, is sentiment analysis: using NLP (Natural Language Processing) techniques to gauge public sentiment or opinion on a particular financial product. For example, researchers have found that media sentiment significantly affects Bitcoin's price and that investors tend to overreact to news over shorter time frames (Karalevicius, Degrande, and De Weerdt [6]). Therefore, in future modifications of the system presented in this project, sentiment analysis could be added alongside the other methods.

Appendices

Appendix A

A.1 Butterworth Filter

Listing A.1: Two-pole Butterworth filter Python code

import numpy as np

def butterworth2(cl, period):
    cl1 = cl                      # the original (input) series
    cl = np.zeros(len(cl1))       # the filtered output
    a = np.exp(-1.414 * 3.14159 / period)
    # note: Eq. (2.7) in the text uses cos(1.414 * 1.25 * pi / period) here
    b = 2 * a * np.cos(1.414 * (3.14159 / 2) / period)
    c2 = b
    c3 = -a * a
    c1 = 1 - c2 - c3
    for i in range(3, len(cl)):
        # two-pole recursion as in Eq. (2.7); the tail of this line was
        # truncated in the original and is completed on that basis
        cl[i] = (c1 * (cl1[i] + cl1[i - 1]) / 2
                 + c2 * cl[i - 1] + c3 * cl[i - 2])
    return cl

A.2 High pass filter

Listing A.2: Two-pole high-pass filter Python code

import numpy as np

def highpass2(cl, period):
    hp = np.zeros(len(cl))
    cos_element = np.cos(0.707 * 2 * np.pi / period)
    sin_element = np.sin(0.707 * 2 * np.pi / period)
    alpha = (cos_element + sin_element - 1) / cos_element
    for i in range(3, len(cl)):
        # standard two-pole high-pass recursion (Ehlers [2]); the tail of
        # this line was truncated in the original and is completed on that basis
        hp[i] = ((1 - alpha / 2) ** 2 * (cl[i] - 2 * cl[i - 1] + cl[i - 2])
                 + 2 * (1 - alpha) * hp[i - 1]
                 - (1 - alpha) ** 2 * hp[i - 2])
    return hp

References

[1] John F Ehlers. "Cycle Analytics for Traders, + Downloadable Software: Advanced Technical Trading Concepts". In: John Wiley & Sons, 2013, pp. 31–33.

[2] John F Ehlers. "Cycle Analytics for Traders, + Downloadable Software: Advanced Technical Trading Concepts". In: John Wiley & Sons, 2013, pp. 77–79.

[3] John F Ehlers. "Cycle Analytics for Traders, + Downloadable Software: Advanced Technical Trading Concepts". In: John Wiley & Sons, 2013, pp. 54–55.

[4] M Gilmore et al. "Investigation of rescaled range analysis, the Hurst exponent, and long-time correlations in plasma turbulence". In: Physics of Plasmas 9.4 (2002), pp. 1312–1317.

[5] Harold E Hurst. "The problem of long-term storage in reservoirs". In: Hydrological Sciences Journal 1.3 (1956), pp. 13–27.

[6] Vytautas Karalevicius, Niels Degrande, and Jochen De Weerdt. "Using sentiment analysis to predict interday Bitcoin price movements". In: The Journal of Risk Finance (2018).

[7] Benoit B Mandelbrot. "Self-affine fractals and fractal dimension". In: Physica Scripta 32.4 (1985), p. 257.

[8] S. McNally, J. Roche, and S. Caton. "Predicting the Price of Bitcoin Using Machine Learning". In: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). 2018, pp. 339–343.

[9] Yensen Ni, Yi-Ching Liao, and Paoyu Huang. "Momentum in the Chinese stock market: Evidence from stochastic oscillator indicators". In: Emerging Markets Finance and Trade 51.sup1 (2015), S99–S110.

[10] Vikas Verma et al. "Manifold mixup: Encouraging meaningful on-manifold interpolation as a regularizer". In: stat 1050 (2018), p. 13.

[11] Jin Wang et al. "Dimensional sentiment analysis using a regional CNN-LSTM model". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016, pp. 225–230.
Table A.1: Project Detail

Student Details
Student Name: Your Name
Registration Number: 160911051
Section/Roll No.: A/35
Email Address: sahil.singh3@learner.manipal.edu
Phone No. (M): 7004164643

Project Details
Project Title: Prediction of Stock Returns using Machine Learning
Project Duration: 4-6 Months
Date of Reporting: 03-01-2020
Faculty Name: Dr. Sanjay Singh
Full Contact Address with PIN Code: Department of Information and Communication Technology, Manipal Institute of Technology, Manipal-576104
Email Address: sanjay.singh@manipal.edu
