Project Report Stock Market
LEARNING ALGORITHM
Submitted by
THIRUMUTHUKUMARAN M (211517205114)
of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
APRIL 2021
PANIMALAR INSTITUTE OF
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Certified that the candidates were examined in the university project viva-voce
held on at Panimalar Institute of Technology, Chennai
600 123.
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT
A project of this magnitude and nature requires kind co-operation and support from
many, for successful completion. We wish to express our sincere thanks to all those who
were involved in the completion of this project.
We seek the blessing from the Founder of our institution Dr.JEPPIAAR, M.A,
Ph.D, for having been a role model who has been our source of inspiration behind our
success in education in his premier institution. Our sincere thanks to the Honorable
Chairman of our prestigious institution Mrs.REMIBAI JEPPIAAR for her sincere
endeavor in educating us in her premier institution.
We would like to express our deep gratitude to our beloved Secretary and
Correspondent Dr.P.CHINNADURAI, M.A, Ph.D, for his kind words and enthusiastic
motivation which inspired us a lot in completing this project.
We also express our sincere thanks and gratitude to our dynamic Directors
Mrs.C.VIJAYA RAJESHWARI, Dr.C.SAKTHI KUMAR, M.E, Ph.D, and
Mrs.S.SARANYA SREE SAKTHI KUMAR, B.E, M.B.A, for providing us with
necessary facilities for completion of this project.
We also express our appreciation and gratefulness to our respected Principal Dr. T.
JAYANTHY, M.E, Ph.D, who helped us in the completion of the project. We wish to
convey our thanks and gratitude to our Head of the Department,
Dr.R.JOSPHINELEELA, M.E(CSE), Ph.D(CSE), for her full support by providing
ample time to complete our project.
Special thanks to our Project Coordinator Dr.J.CHENNI KUMARAN, B.E(CSE),
M.Tech.(CSE), Ph.D(CSE), Professor, and Internal Guide Mr.R.PRAVEEN KUMAR,
B.Tech, M.E, Assistant Professor for their expert advice, valuable information and
guidance throughout the completion of the project.
Last, we thank our parents and friends for providing their extensive moral support and
encouragement during the course of the project.
ABSTRACT
Stock price movement is non-linear and complex. Stock market data is non-linear, noisy,
complex, and time-dependent. Several works have been carried out to predict stock prices,
and the traditional stock prediction method is to build a linear prediction model based on
historical stock data. Other traditional approaches such as Linear Regression and Support
Vector Regression were also used, but those algorithms did not reach an adequate level of
accuracy. Because of the very high variation in stock prices, various systems applied deep
learning algorithms, which have proven accurate in many analytics fields. Artificial Neural
Networks were deployed to predict stock prices in existing papers, but because stock prices
are time-series data, they do not attain sufficient accuracy. To improve on the existing
systems, the Recurrent Neural Network (RNN) was applied to raise prediction accuracy.
However, an RNN cannot retain long-term dependencies and suffers from the vanishing
gradient problem. Therefore, we have constructed and applied a sequential deep learning
model, namely the Long Short-Term Memory (LSTM) model, to predict stock prices for the
next 30 days using 4 years of historical data. Our input data are carefully selected and fed
into the models. The results show that stock price prediction using LSTM is more efficient
and effective than the other models. Furthermore, we found that the stacked-LSTM model
improves predictive power over a single LSTM.
TABLE OF CONTENTS
ABSTRACT
LIST OF ABBREVIATIONS
1. INTRODUCTION
2. SYSTEM DESCRIPTION
2.1 Existing system
2.2 Proposed system
3. LITERATURE SURVEY
4. MACHINE LEARNING
4.1 Machine Learning Description
4.2 Deep Learning
4.3 Working of ML Algorithms
5. PYTHON
6. ANACONDA
7. ML FOR STOCK PRICE PREDICTION
7.1 Long Short-Term Memory
7.2 Implementation
8. REQUIREMENT SPECIFICATION
9. SYSTEM DESIGN
9.1 Architecture Diagram
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 OVERVIEW
The stock market is the place where stocks are transferred, traded, and
circulated. On the one hand, the issuance of stock provides a legal and reasonable
channel for capital flow, which enables a large amount of idle capital to be
gathered in the stock market [1]. Such effective accumulation of capital can
improve the organic composition of enterprise capital and greatly promote
economic development. On the other hand, the circulation of stock enables
capital to be collected effectively, which further promotes its accumulation [2].
For this reason, scholars generally regard the stock market as an intuitive
reflection of the economic development of a country or region in a given period.
One of the main reasons is that stock market trading prices objectively reflect
supply and demand in the stock market [3]. Moreover, the stock market is often
regarded as an indicator of stock prices and quantities. With the rapid development
of the economy, the number of listed companies is increasing, so stocks have
become one of the major topics in the financial field. The changing trend of stocks
often affects the direction of many economic behaviours to a certain extent [4], so
the prediction of stock prices has received more and more attention from scholars.
Stock market data is non-linear, noisy, complex, and time-dependent [5].
The traditional stock prediction method is to build a linear prediction model based
on historical stock data. Bowden et al. [7] proposed using the ARIMA method to
build an autoregressive model to predict stock prices. Although this method has some
advantages in computational efficiency, its assumptions about the statistical
distribution and stationarity of the data limit its modelling ability. The non-linear
and non-stationary nature of financial time series, and the outliers in the data, also
have a great impact on the prediction results. There are many factors affecting stock
prices. In our system, the Long Short-Term Memory (LSTM) model of deep learning is
used to predict stock prices.
1.2 OBJECTIVE
The main goal of our system is to discover the role of time series by analysing
the historical information of the stock market, and to explore its internal rules
in depth through the selective memory of the LSTM neural network model, in order
to predict the stock price trend.
1.3 SCOPE
2. SYSTEM DESCRIPTION
2.1 EXISTING SYSTEM
The existing work compares nine machine learning models (Decision Tree, Random
Forest, Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost),
Support Vector Classifier (SVC), Naïve Bayes, K-Nearest Neighbors (KNN),
Logistic Regression and Artificial Neural Network (ANN)) and two powerful deep
learning methods (Recurrent Neural Network (RNN) and Long Short-Term Memory
(LSTM)). Ten technical indicators from ten years of historical data are used as the
input values, and they are employed in two ways: first, the indicators are calculated
from stock trading values as continuous data; second, the indicators are converted
to binary data before use. Each prediction model is evaluated by three metrics for
each input approach. Results are reported for the continuous data and for the binary
data. In the binary data evaluation, the deep learning methods are found to perform
well; however, the difference between the models becomes smaller because of the
noticeable improvement in the models' performance with the second approach.
Disadvantages:
2.2 PROPOSED SYSTEM
Our system proposes the Long Short-Term Memory model to predict the stock
price trend. The input consists of the past 4 years of observed prices, and the
network output is the prediction for the next 30 days. We split the data into a
training set and a test set. The test set is composed of k periods, each of which
is a series of 30 daily predictions. Using the Long Short-Term Memory unit as the
forecasting technique helps investors, analysts, or anyone interested in investing
in the stock market by giving them a good picture of the likely future situation
of the stock market.
ADVANTAGE
3. LITERATURE SURVEY
YEAR : 2019
DESCRIPTION :
2. TITLE : STOCK VOLATILITY PREDICTION BY HYBRID NEURAL NETWORK
Zhang
YEAR : 2019
DESCRIPTION :
This paper proposes a hybrid time-series predictive neural network (HTPNN) that
incorporates the effect of news. The features of news headlines are expressed as
distributed word vectors, which are dimensionally reduced by sparse autoencoders to
optimize the efficiency of the model. Then, according to the timeliness of stocks,
the daily K-line data is combined with the news. HTPNN captures the underlying
pattern of stock price fluctuation by learning the fused features of news and time
series, which not only retains the effective information of the news and stock data
but also eliminates redundant information in the text. Compared with
state-of-the-art methods, this method combines richer stock characteristics and
runs faster. Besides, the accuracy is improved by nearly 5% on average. A drawback
of this paper is that the segmentation lengths for the news and index sequences are
fixed, whereas in actual trading the effect of different events on stock fluctuation
may differ. The authors would like to analyze the impact of event intensity and how
to divide the sequence window more scientifically for better prediction accuracy,
and ultimately to achieve profitability by applying an investment strategy.
3. TITLE : AUGMENTED TEXTUAL FEATURES-BASED STOCK MARKET
PREDICTION
YEAR : 2020
DESCRIPTION :
YEAR : 2020
DESCRIPTION :
YEAR : 2019
DESCRIPTION :
the traditional signal process methods and frequency trading patterns modeling
approach with deep learning in stock trend prediction.
4. MACHINE LEARNING
This kind of machine learning is called “deep” because it includes many layers of
the neural network and massive volumes of complex and disparate data. To
achieve deep learning, the system engages with multiple layers in the network,
extracting increasingly higher-level outputs. For example, a deep learning system
that is processing nature images and looking for Gloriosa daisies will – at the first
layer – recognize a plant. As it moves through the neural layers, it will then
identify a flower, then a daisy, and finally a Gloriosa daisy. Examples of deep
learning applications include speech recognition, image classification, and
pharmaceutical analysis.
Neural Network
5. PYTHON
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability
and therefore reduces the cost of program maintenance. Python supports modules
and packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed. Often,
programmers fall in love with Python because of the increased productivity it
provides. Since there is no compilation step, the edit-test-debug cycle is incredibly
fast. Debugging Python programs is easy: a bug or bad input will never cause a
segmentation fault. Instead, when the interpreter discovers an error, it raises an
exception. When the program doesn't catch the exception, the interpreter prints a
stack trace. A source level debugger allows inspection of local and global
variables, evaluation of arbitrary expressions, setting breakpoints, stepping through
the code a line at a time, and so on. The debugger is written in Python itself,
testifying to Python's introspective power. On the other hand, often the quickest
way to debug a program is to add a few print statements to the source: the fast
edit-test-debug cycle makes this simple approach very effective.
Python is a dynamic programming language that supports several different
programming paradigms:
Procedural programming
Object-oriented programming
Functional programming
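As a brief illustration of the three paradigms listed above, the following sketch computes the mean of a small, made-up list of closing prices in each style. The price values and the names mean_procedural, PriceSeries and mean_functional are illustrative assumptions and are not part of the project code.

```python
from functools import reduce

prices = [101.5, 102.0, 99.5, 103.0]  # made-up closing prices (assumption)

# Procedural: an explicit loop with an accumulator variable
def mean_procedural(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# Object-oriented: data and behaviour bundled together in a class
class PriceSeries:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)

# Functional: a fold over the values with no mutable state
def mean_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0.0) / len(values)

print(mean_procedural(prices), PriceSeries(prices).mean(), mean_functional(prices))
```

All three functions return the same value; which style to use is largely a matter of how the surrounding program is organised.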
6. ANACONDA
Anaconda will enable you to create virtual environments and install packages
needed for data science and deep learning. With virtual environments you can
install specific package versions for a particular project or a tutorial without
worrying about version conflicts.
# create a new environment with conda
$ conda create -n [my-env-name]
$ conda create -n [my-env-name] python=[python-version]
# activate the environment you created
$ conda activate [my-env-name]
# take a look at the environment you created
$ conda info
$ conda list
# install a package with conda and verify it's installed
$ conda install numpy
$ conda list
# take a look at the list of environments you currently have
$ conda info -e
# remove an environment
$ conda env remove --name [my-env-name]
Conda vs Pip install
NumPy
NumPy is the fundamental package for scientific computing in Python. It is a
Python library that provides a multidimensional array object, various derived
objects (such as masked arrays and matrices), and an assortment of routines for fast
operations on arrays, including mathematical, logical, shape manipulation, sorting,
selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical
operations, random simulation and much more. At the core of the NumPy package,
is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data
types, with many operations being performed in compiled code for performance.
There are several important differences between NumPy arrays and the standard
Python sequences:
⮚ NumPy arrays have a fixed size at creation, unlike Python lists (which can
grow dynamically). Changing the size of an array will create a new array
and delete the original.
⮚ The elements in a NumPy array are all required to be of the same data type,
and thus will be the same size in memory. The exception: one can have
arrays of (Python, including NumPy) objects, thereby allowing for arrays of
different sized elements.
⮚ NumPy arrays facilitate advanced mathematical and other types of
operations on large numbers of data. Typically, such operations are
executed more efficiently and with less code than is possible using Python’s
built-in sequences.
⮚ A growing plethora of scientific and mathematical Python-based packages
are using NumPy arrays; though these typically support Python-sequence
input, they convert such input to NumPy arrays prior to processing, and they
often output NumPy arrays. In other words, in order to efficiently use much
(perhaps even most) of today’s scientific/mathematical Python-based
software, just knowing how to use Python’s built-in sequence types is
insufficient - one also needs to know how to use NumPy arrays.
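The differences listed above can be seen in a few lines. This is a minimal sketch on made-up values; the variable names are illustrative assumptions.

```python
import numpy as np

prices = np.array([10.0, 12.0, 11.0, 13.0])  # made-up values (assumption)

# Element-wise arithmetic runs in compiled code, with no explicit Python loop
daily_change = prices[1:] - prices[:-1]

# All elements share one dtype: mixing an int and a float upcasts to float64
mixed = np.array([1, 2.5])

# "Growing" an array returns a new array; the original keeps its size
longer = np.append(prices, 14.0)

print(daily_change, mixed.dtype, prices.shape, longer.shape)
```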
Pandas
Data processing is an important part of analyzing data, because data is not always
available in the preferred format. Various processing steps, such as cleaning,
restructuring, or merging, are necessary before the data can be analyzed. NumPy,
SciPy, Cython, and pandas are tools available in Python that can be used for fast
data processing. pandas is built on top of NumPy and provides a rich set of
functions to process various types of data. Working with pandas is fast, easy, and
more expressive than with other tools: it offers the fast data processing of NumPy
together with the flexible data manipulation techniques of spreadsheets and
relational databases. Lastly, pandas integrates well with the matplotlib library,
which makes it a very handy tool for analyzing data.
pandas provides two very useful data structures to process data: Series and
DataFrame. A Series is a one-dimensional array that can store various data
types, including mixed data types. The row labels in a Series are called the index.
Any list, tuple, or dictionary can be converted into a Series using pd.Series.
DataFrame is the most widely used data structure of pandas. Series is used to
work with one-dimensional arrays, whereas DataFrame is used with two-dimensional
arrays. A DataFrame has two different indexes: a column index and a row index.
The most common way to create a DataFrame is from a dictionary of equal-length
lists. Further, spreadsheets and text files are read as DataFrames, which makes it
a very important data structure of pandas.
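A minimal sketch of both data structures follows. The labels, column names and values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labelled array; the labels form its index
close = pd.Series([150.0, 152.5, 151.0], index=["d1", "d2", "d3"])

# A DataFrame built from a dictionary of equal-length lists
df = pd.DataFrame({
    "open": [149.0, 150.5, 152.0],
    "close": [150.0, 152.5, 151.0],
})

print(close["d2"])        # look up a value by its row label
print(df.shape)           # (rows, columns)
print(list(df.columns))   # the column index
```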
7. ML FOR STOCK PRICE PREDICTION
Prediction and analysis of the stock market are among the most complicated tasks
to do. There are several reasons for this, such as market volatility and the many
other dependent and independent factors that decide the value of a particular
stock in the market. These factors make it very difficult for any stock market
analyst to predict the rise and fall with a high degree of accuracy.
However, with the advent of Machine Learning and its robust algorithms, recent
market analysis and stock market prediction work has started incorporating such
techniques to understand stock market data.
In short, Machine Learning algorithms are widely used by many organisations to
analyse and predict stock values. This section goes through a simple implementation
of analysing and predicting a popular worldwide online retail store's stock values
using Machine Learning algorithms in Python.
speech or video). To understand the concept behind LSTM, let us take a simple
example of an online customer review of a Mobile Phone.
The Long Short-Term Memory algorithm works by remembering only the relevant
information and using it to make predictions, while ignoring the non-relevant
data. In this way, we build an LSTM model that recognises only the essential data
about the stock and leaves out its outliers.
The above LSTM architecture diagram shows that LSTM is an advanced version
of the Recurrent Neural Network that retains memory to process sequences of data.
It can remove or add information to the cell state, carefully regulated by
structures called gates.
The LSTM unit comprises a cell, an input gate, an output gate, and a forget gate.
The cell remembers values over arbitrary time intervals, and the three gates
regulate the flow of information into and out of the cell.
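To make the role of the gates concrete, the following is a minimal NumPy sketch of a single LSTM time step using the standard gate equations. It is not the Keras implementation used in this project; the function name lstm_step, the stacked parameter layout, the random weights and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U and b stack the parameters of the
    forget, input, candidate and output transforms, in that order."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # shape (4 * n,)
    f = sigmoid(z[0:n])               # forget gate: how much old cell state to keep
    i = sigmoid(z[n:2 * n])           # input gate: how much new information to write
    g = np.tanh(z[2 * n:3 * n])       # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])       # output gate: how much cell state to expose
    c = f * c_prev + i * g            # updated cell state
    h = o * np.tanh(c)                # updated hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4                 # illustrative sizes (assumption)
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for price in [0.2, 0.4, 0.6]:         # a short, made-up scaled price sequence
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state c is carried from step to step and is only changed through the forget and input gates, which is what lets the unit keep information over long intervals.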
7.2 IMPLEMENTATION:
The first step is to import libraries that are necessary to pre-process the stock data
and the other required libraries for building and visualising the outputs of the
LSTM model. For this, we will use the Keras library under the TensorFlow
framework. The required modules are imported from the Keras library
individually.
Print the Data Frame Shape and Check for Null Values.
In this next crucial step, we first print the shape of the dataset. We then check
the data frame for null values to make sure that none are present. The presence of
null values in the dataset tends to cause problems during training, as they act
like outliers and cause wide variance in the training process.
print("Dataframe Shape: ", df.shape)
print("Null Value Present: ", df.isnull().values.any())
>> Dataframe Shape: (7334, 6)
>> Null Value Present: False
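The check above can be tried on a small made-up frame that does contain a missing value. The column names and values here are assumptions for illustration only; dropping the affected rows is one simple way to handle them, not necessarily the approach taken in this project.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing closing price (assumption)
df = pd.DataFrame({"close": [150.0, np.nan, 151.0], "volume": [100, 120, 90]})

print("Dataframe Shape: ", df.shape)
print("Null Value Present: ", df.isnull().values.any())

null_counts = df.isnull().sum()   # number of nulls per column
clean = df.dropna()               # drop rows that contain any null value
print(null_counts)
print(clean.shape)
```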
Scaling
To reduce the computational cost of working with the data, we scale the stock
values to values between 0 and 1. Bringing all features into the same small range
keeps large raw values from dominating and helps the network train more stably,
which can also improve accuracy. This is performed by the MinMaxScaler class of
the scikit-learn library.
#Scaling
scaler = MinMaxScaler()
feature_transform = scaler.fit_transform(df[features])
feature_transform= pd.DataFrame(columns=features, data=feature_transform,
index=df.index)
feature_transform.head()
Executing this, the feature variables’ values are scaled down to smaller values
compared to the real values given above.
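With its default range of 0 to 1, min-max scaling maps each value x to (x - min) / (max - min). The sketch below reproduces that formula and its inverse in plain NumPy on made-up prices; the helper names min_max_scale and min_max_invert are hypothetical, and the sketch assumes the input is not constant (max differs from min).

```python
import numpy as np

def min_max_scale(x):
    """Map values linearly onto [0, 1]; also return (min, max) for inversion."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def min_max_invert(scaled, bounds):
    """Undo min_max_scale, recovering values on the original scale."""
    x_min, x_max = bounds
    return scaled * (x_max - x_min) + x_min

close = np.array([100.0, 150.0, 125.0, 200.0])  # made-up closing prices (assumption)
scaled, bounds = min_max_scale(close)
restored = min_max_invert(scaled, bounds)
print(scaled)
print(restored)
```

The inverse step matters later on, because model predictions are produced on the scaled range and have to be mapped back to prices before they are plotted or compared with actual values.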
advantage of using this Time Series split is that the split time series data samples
are observed at fixed time intervals.
training_size=int(len(df1)*0.65)
test_size=len(df1)-training_size
train_data,test_data=df1[0:training_size,:],df1[training_size:len(df1),:1]
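After the chronological split, each segment has to be turned into pairs of an input window and the value that follows it before it can be fed to the LSTM. The sketch below shows both steps on a toy series with a window of 3 (the appendix code uses 100). The helper name make_windows is hypothetical, and this sketch keeps the last available window, whereas the appendix loop stops one step earlier.

```python
import numpy as np

def make_windows(data, time_step):
    """Turn an (n, 1) series into inputs of shape (samples, time_step)
    and the value that follows each window as the target."""
    X, y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:i + time_step, 0])
        y.append(data[i + time_step, 0])
    return np.array(X), np.array(y)

# Stand-in for the scaled closing prices (assumption): 0.0, 1.0, ..., 19.0
series = np.arange(20, dtype=float).reshape(-1, 1)

# Chronological split: first 65% for training, the rest for testing (no shuffling)
training_size = int(len(series) * 0.65)
train_data, test_data = series[:training_size], series[training_size:]

X_train, y_train = make_windows(train_data, time_step=3)
print(train_data.shape, test_data.shape)
print(X_train.shape, y_train.shape)
print(X_train[0], y_train[0])
```

Because the split is by position rather than at random, every test observation lies later in time than every training observation, so the model is never evaluated on data that precedes what it was trained on.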
We use the Adam optimizer and Mean Squared Error as the loss
function for compiling the model. This is a commonly preferred
combination for an LSTM model. Additionally, the
model is also plotted and is displayed below.
#Model Training
model.fit(X_train,y_train,validation_data=(X_test,ytest),epochs=100,batch_size=64,
verbose=1)
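For reference, the mean squared error used as the loss is the average of the squared differences between the actual and predicted values, and the RMSE computed in the appendix is its square root. A minimal sketch on made-up numbers follows; the function name mse is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [1.0, 2.0, 3.0]   # made-up targets (assumption)
y_pred = [1.0, 2.0, 5.0]   # made-up predictions (assumption)

loss = mse(y_true, y_pred)   # (0 + 0 + 4) / 3
rmse = loss ** 0.5
print(loss, rmse)
```

Both quantities are only meaningful when the targets and the predictions are on the same scale, either both scaled or both converted back to prices.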
LSTM Prediction
With our model ready, it is time to apply the trained LSTM network to the test set
and predict the Adjusted Close value of the input stock. This is done by calling
the predict function on the LSTM model built above.
#LSTM Prediction
y_pred = model.predict(X_test)
plt.plot(day_new,scaler.inverse_transform(df1[1158:]))
plt.plot(day_pred,scaler.inverse_transform(lst_output))
df3=df1.tolist()
df3.extend(lst_output)
plt.plot(df3[1200:])
df3=scaler.inverse_transform(df3).tolist()
The above graph shows that some pattern is detected by the LSTM network model
built above. By fine-tuning several parameters and adding more LSTM layers to
the model, we achieved a more accurate representation of stock value.
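The 30-day forecast in the appendix is produced recursively: the model predicts one step ahead, the prediction is appended to the input window, the oldest value is dropped, and the process repeats. The sketch below shows that loop in isolation. To stay self-contained it replaces the trained LSTM with a stand-in predictor that returns the mean of the window; the names forecast and mean_predictor are hypothetical.

```python
import numpy as np

def forecast(predict_next, history, n_steps, horizon):
    """Roll a one-step predictor forward `horizon` steps,
    feeding each prediction back in as the newest input."""
    window = list(history[-n_steps:])
    outputs = []
    for _ in range(horizon):
        yhat = predict_next(np.array(window))
        outputs.append(yhat)
        window = window[1:] + [yhat]   # drop the oldest value, append the prediction
    return outputs

# Stand-in for the trained model (assumption): predicts the mean of the window
def mean_predictor(window):
    return float(window.mean())

history = [1.0, 2.0, 3.0, 4.0]         # made-up scaled prices (assumption)
preds = forecast(mean_predictor, history, n_steps=2, horizon=3)
print(preds)
```

Because later predictions are built on earlier ones, errors can accumulate as the horizon grows, which is one reason long-range forecasts are less reliable than next-day ones.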
8. REQUIREMENT SPECIFICATION
HARDWARE:
SOFTWARE:
9. SYSTEM DESIGN
9.1 Architecture Diagram
9.2 Use Case Diagram:
9.4 Sequence Diagram:
9.5 Dataflow Diagram:
9.6 Activity Diagram:
10. CONCLUSION AND FUTURE ENHANCEMENT
This system establishes a forecasting framework to predict the prices of stocks. We
leveraged combinations of price, volume, and corporate statistics as input data.
We proposed, developed, trained, and tested stacked-LSTM models, and built up
trading prediction strategies according to our model. The LSTM shows superior
results over other models due to its ability to assign different weights to the
input features and hence automatically choose the most relevant ones. The
stacked-LSTM is therefore better able to capture long-term dependence in the time
series and more suitable for predicting financial time series. The superior trading
return from the LSTM further validates our experimental result. Moreover, we have
shown that, despite its more complicated model structure and the associated
potential for overfitting, the stacked-LSTM performs better than the single LSTM
model.
FUTURE SCOPE
Various algorithms shall be included to obtain predictions for a longer period and
to achieve higher accuracy in price prediction. One direction of future work will
be dealing with the volatility of stock time series. One difficulty in predicting
the stock market arises from its non-stationary behaviour. It would be interesting
to see how the stacked-LSTM performs on denoised data.
11. APPENDIX I
SOURCE CODE
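import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import tensorflow as tf
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
# Download daily AAPL prices from Tiingo (requires a personal API key)
key="YOUR_TIINGO_API_KEY"
df = pdr.get_data_tiingo('AAPL', api_key=key)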
df.to_csv('AAPL.csv')
df=pd.read_csv('AAPL.csv')
df.head()
df.tail()
df1=df.reset_index()['close']
df1
plt.plot(df1)
plt.show()
scaler=MinMaxScaler(feature_range=(0,1))
df1=scaler.fit_transform(np.array(df1).reshape(-1,1))
print(df1)
training_size=int(len(df1)*0.65)
test_size=len(df1)-training_size
train_data,test_data=df1[0:training_size,:],df1[training_size:len(df1),:1]
training_size,test_size
train_data
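# Convert a series into input windows of length time_step and next-step targets
def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-time_step-1):
        a = dataset[i:(i+time_step), 0]
        dataX.append(a)
        dataY.append(dataset[i+time_step, 0])
    return np.array(dataX), np.array(dataY)
time_step = 100
X_train, y_train = create_dataset(train_data, time_step)
X_test, ytest = create_dataset(test_data, time_step)
print(X_train.shape, y_train.shape)
print(X_test.shape, ytest.shape)
# reshape input to be [samples, time steps, features] which is required for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)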
model=Sequential()
model.add(LSTM(50,return_sequences=True,input_shape=(100,1)))
model.add(LSTM(50,return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error',optimizer='adam')
model.summary()
model.fit(X_train,y_train,validation_data=(X_test,ytest),epochs=100,batch_size=64,verbose=1)
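train_predict=model.predict(X_train)
test_predict=model.predict(X_test)
# Transform predictions and targets back to the original price scale
train_predict=scaler.inverse_transform(train_predict)
test_predict=scaler.inverse_transform(test_predict)
y_train_orig=scaler.inverse_transform(y_train.reshape(-1,1))
ytest_orig=scaler.inverse_transform(ytest.reshape(-1,1))
# RMSE on the training data and on the test data, in price units
math.sqrt(mean_squared_error(y_train_orig,train_predict))
math.sqrt(mean_squared_error(ytest_orig,test_predict))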
### Plotting
look_back=100
trainPredictPlot = np.empty_like(df1)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(train_predict)+look_back, :] = train_predict
testPredictPlot = np.empty_like(df1)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(train_predict)+(look_back*2)+1:len(df1)-1, :] = test_predict
plt.plot(scaler.inverse_transform(df1))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
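len(test_data)
n_steps=100
# Start from the last n_steps scaled values of the test data
x_input=test_data[len(test_data)-n_steps:].reshape(1,-1)
x_input.shape
temp_input=list(x_input)
temp_input=temp_input[0].tolist()
temp_input
# Predict the next 30 days, feeding each prediction back in as input
lst_output=[]
i=0
while(i<30):
    if(len(temp_input)>100):
        x_input=np.array(temp_input[1:])
        x_input=x_input.reshape(1,-1)
        x_input=x_input.reshape((1,n_steps,1))
        yhat=model.predict(x_input,verbose=0)
        temp_input.extend(yhat[0].tolist())
        temp_input=temp_input[1:]
        lst_output.extend(yhat.tolist())
        i=i+1
    else:
        x_input=x_input.reshape((1,n_steps,1))
        yhat=model.predict(x_input,verbose=0)
        print(yhat[0])
        temp_input.extend(yhat[0].tolist())
        print(len(temp_input))
        lst_output.extend(yhat.tolist())
        i=i+1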
day_new=np.arange(1,101)
day_pred=np.arange(101,131)
len(df1)
plt.plot(day_new,scaler.inverse_transform(df1[len(df1)-100:]))
plt.plot(day_pred,scaler.inverse_transform(lst_output))
df3=df1.tolist()
df3.extend(lst_output)
plt.plot(df3[1200:])
12. APPENDIX II
EXPERIMENTAL RESULTS
X – Axis denotes Days.
13. REFERENCES
[11] JINTAO LIU; HONGFEI LIN; LIANG YANG; BO XU; DONGZHEN WEN,
“MULTI-ELEMENT HIERARCHICAL ATTENTION CAPSULE NETWORK FOR
Sixth International Conference On Innovative and Emerging
Trends in Engineering and Technology ( ICIETET ’21 )