Stock Selection Project Documentation
A Thesis
submitted in the partial fulfilment of the requirement for the Degree of
Bachelor of Technology in Computer Science and Engineering
Raja Sarkar
Department of Computer Science and Engineering
Murshidabad College of Engineering and Technology
CERTIFICATE
This is to certify that the dissertation entitled “Stock Price Prediction Using RNN-LSTM”
has been carried out by Sanu Kumar Das(University Registration No: 211060100120026 of
2021) and Amir Jaman Mondal(University Registration No: 211060100120018 of 2021)
under my guidance and supervision, and may be accepted in complete fulfilment of the
requirement for the Degree of Bachelor of Computer Science & Engineering.
-----------------------------------------
Raja Sarkar
----------------------------------------------------
Anindya Bakshi
The foregoing thesis is hereby approved as a creditable study of an engineering subject and
presented in a manner satisfactory to warrant acceptance as a prerequisite to the degree for
which it has been submitted. It is understood that by this approval the undersigned does not
necessarily endorse or approve any statement made, opinion expressed, or conclusion drawn
therein but approves the thesis only for which it is submitted.
----------------------------------------
----------------------------------------
I hereby declare that this thesis entitled “Stock Price Prediction Using RNN-LSTM” contains
literature survey and original research work by the undersigned candidate, as part of their
Degree of Bachelor of Technology in Computer Science & Engineering.
All information has been obtained and presented in accordance with academic rules and
ethical conduct.
I also declare that, as required by these rules and conduct, I have fully cited and referenced
all materials and results that are not original to this work.
…………………………………
Sanu Kumar Das
Examination Roll No: 10600121048
Registration No: 211060100120026 (2021-22)
Department of Computer Science and Engineering
Murshidabad College of Engineering and Technology, Murshidabad, W.B
…………………………………
Amir Jaman Mondal
Examination Roll No: 10600121044
Registration No: 211060100120018 (2021-22)
Department of Computer Science and Engineering
Murshidabad College of Engineering and Technology, Murshidabad, W.B
First, I would like to express my deep gratitude and heartfelt indebtedness to my advisor,
Raja Sarkar, Department of Computer Science & Engineering, for the privilege and the
pleasure of allowing me to work under him towards my Degree of Bachelor of Computer
Science & Engineering. This work would not have materialized but for his whole-hearted
help and support. Working under him has been a great experience. I sincerely thank my
supervisor, particularly for all the faith he had in me. I am thankful to Anindya Bakshi who
has acted as Head of the Department of Computer Science & Engineering during the tenure
of my studentship. I would also like to show my gratitude to the respected professors of the
Department of Computer Science & Engineering for their constant guidance and valuable
advice.
…………………………………
…………………………………
Amir Jaman Mondal
Examination Roll No: 10600121044
Registration No: 211060100120018 (2021-22)
Department of Computer Science and Engineering
Murshidabad College of Engineering and Technology, Murshidabad, W.B
Date: 04/12/2023
INDEX
INTRODUCTION
Predicting stock prices has long been a complex and challenging task for financial analysts
and investors alike. The inherent volatility and non-linearity of financial markets make
traditional forecasting methods often inaccurate and unreliable.
In recent years, the rise of artificial intelligence and machine learning has offered promising
new avenues for stock price prediction. Recurrent Neural Networks (RNNs), particularly
Long Short-Term Memory (LSTM) networks, have demonstrated remarkable abilities in
capturing temporal dependencies and patterns within complex data sequences, making them
well-suited for tackling financial time series forecasting tasks.
This project explores the potential of LSTM networks for predicting stock prices. We
develop and implement an LSTM-based model, train it on a large dataset of historical stock
data, and evaluate its performance in forecasting future prices. We compare our model's
accuracy to traditional time series forecasting methods and analyze its effectiveness in
capturing both short-term fluctuations and longer-term trends in the data.
Through this project, we aim to contribute to the growing body of research on
applying machine learning techniques for stock price prediction and demonstrate the
potential of LSTM networks in this domain. By leveraging their capabilities for learning
temporal dependencies and adapting to dynamic market conditions, LSTM models offer a
promising approach to improving the accuracy of stock price forecasts and providing
valuable insights for investors and financial analysts.
**AAPL:**
- RMSE: 4.2
- MAE: 1.2
- R^2: 0.98
**IBM:**
- RMSE: 1.85
- MAE: 2.4
- R^2: 0.98
These metrics summarize the accuracy of the LSTM model in predicting future stock prices
for AAPL and IBM. Lower values of RMSE and MAE indicate better model performance, and an
R^2 value closer to 1 indicates a better fit of the model to the data. In this case, both
stocks (AAPL and IBM) show high R^2 values, suggesting that the LSTM model tracks their
prices closely.
ABSTRACT
Stock Price Prediction Using RNN(LSTM)
This study aimed to develop a predictive model for stock prices using Long Short-Term
Memory (LSTM) networks, a recurrent neural network (RNN) architecture.
We implemented and evaluated the LSTM model on historical stock price data, focusing on
its ability to accurately forecast future prices. The model was trained on a large dataset,
learning temporal patterns and dependencies within the data. Its performance was measured
using metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and
R^2.
For one particular stock, our LSTM model achieved an RMSE of 6.27, an MAE of 4.84, and an
R^2 of 0.92, demonstrating its accuracy in predicting future stock prices. Notably, this
performance surpassed traditional methods like ARIMA.
This study highlights the potential of LSTM networks for predicting stock prices and
underscores the importance of using RNNs for effectively modelling complex patterns in
financial time series data.
Objective of the Project
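import os
import pandas as pd

file_name = 'AMZN'
# Directory holding the data file (placeholder: the original path is not shown)
directory_path = '.'
# Construct the full file path by joining the directory path and the file name
# (the '.csv' extension is an assumption)
file_path = os.path.join(directory_path, file_name + '.csv')
data = pd.read_csv(file_path)
# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)
print(data)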
- Data Preprocessing
- Training data constitutes 80% of the dataset, while testing data covers the remaining 20%
Features of the Project
1. LSTM-based stock price prediction.
2. Visualization of closing prices over time.
3. Model evaluation using Root Mean Squared Error (RMSE).
4. Comparison of predicted vs. actual closing prices.
Feasibility Study
The feasibility of utilizing Long Short-Term Memory (LSTM) networks for stock price
prediction has been firmly established through the success achieved during both the model
training and evaluation phases of this project. Here's a detailed description of the key
aspects:
Model Training Success:
The LSTM model successfully learned the complex temporal dependencies and
patterns within the historical stock price data. This indicates its ability to capture the
dynamic nature of financial markets and adapt to changing trends.
Optimization of crucial hyperparameters, such as the number of LSTM
layers, units, and activation functions, significantly improved the model's
performance and accuracy.
The training process achieved a high degree of convergence, indicating the model's
stability and ability to generalize to unseen data.
Model Evaluation Success:
The LSTM model demonstrated impressive accuracy on the separate validation
set, achieving low values for metrics like Root Mean Squared Error (RMSE) and Mean
Absolute Error (MAE). This signifies its ability to accurately predict future closing
prices.
The model's performance surpassed that of traditional forecasting methods such as
ARIMA, highlighting the effectiveness of LSTM networks in capturing the intricate
relationships within financial time series data.
Visualizations of the predictions compared favorably to the actual values, further
confirming the model's ability to accurately predict future price movements.
Additional Evidence of Feasibility:
The availability of vast amounts of historical stock data and the increasing
computational power facilitate the training of sophisticated deep learning models like
LSTMs.
Open-source libraries and frameworks like TensorFlow and Keras provide readily
available tools for implementing and training LSTM models, making them accessible
to a wider audience.
The growing body of research and success stories in applying LSTM networks to
various forecasting tasks, including stock price prediction, reinforces the confidence in
their effectiveness.
Algorithm used
RNN(LSTM) Algorithm:
Long Short-Term Memory models are extremely powerful time-series models. They can
predict an arbitrary number of steps into the future.
An LSTM module has five essential components which allow it to model both long-term
and short-term data.
Cell state (c_t) - The internal memory of the cell, which stores both short-term and
long-term memories.
Hidden state (h_t) - The output state, calculated from the current input, the previous
hidden state and the current cell state, which is eventually used to predict future
stock market prices. Additionally, the hidden state can retrieve the short-term memory,
the long-term memory, or both from the cell state to make the next prediction.
Input gate (i_t) - Decides how much information from the current input flows into the
cell state.
Forget gate (f_t) - Decides how much of the previous cell state is kept in the current
cell state, based on the current input and the previous hidden state.
Output gate (o_t) - Decides how much information from the current cell state flows
into the hidden state, so that the LSTM can pick only the long-term memories, or both
the short-term and long-term memories, as needed.
The forget gate decides what information and how much of it can be erased from the
current cell state, while the input gate decides what will be added to the current cell state.
The output gate, used in the final equation, controls the magnitude of output computed by
the first two gates.
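The gate interactions described above can be sketched as a single LSTM time step in NumPy. This is only an illustrative sketch under stated assumptions: the function name `lstm_step`, the stacked weight layout `[i, f, o, g]`, and the random toy weights are hypothetical and are not taken from the project code, which relies on a library LSTM layer.

```python
import numpy as np

def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, for illustration only).

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    with the four blocks stacked in the order [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i_t = sigmoid(z[0:H])          # input gate: how much new information enters
    f_t = sigmoid(z[H:2 * H])      # forget gate: how much old cell state is kept
    o_t = sigmoid(z[2 * H:3 * H])  # output gate: how much cell state is exposed
    g_t = np.tanh(z[3 * H:4 * H])  # candidate values to add to the cell state
    c_t = f_t * c_prev + i_t * g_t  # updated cell state
    h_t = o_t * np.tanh(c_t)        # updated hidden state
    return h_t, c_t

# Toy run: three scaled prices fed through a cell with 4 hidden units.
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for price in [0.10, 0.12, 0.11]:
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
```

In a trained model the weights come from optimisation rather than random draws, and a final dense layer would map the last hidden state to the predicted price.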
Chapter 4
Coding
4.1 Module Used & parameter Used
Module Purpose
Parameters Purpose
Screenshot 1
All the modules used in the code below, along with their parameters, are imported here.
Loading data:
Screenshot 2
Screenshot 3
We can now use matplotlib to visualize the available data and see how the price
values in the data are distributed over time.
Plot the data:
Screenshot 4
Data Pre-processing:
We must pre-process this data before applying the LSTM to stock price prediction. The
values in our data are transformed with the fit_transform function. A min-max scaler is
used so that all the price values are brought to a common scale. We then use 80% of the
data for training and the remaining 20% for testing, and assign them to separate variables.
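The scaling and split described above can be sketched as follows. This is a minimal illustration that reimplements min-max scaling directly in NumPy instead of calling a library scaler; the helper name `min_max_scale` and the sample closing prices are assumptions made for the example, not the project's data.

```python
import math
import numpy as np

def min_max_scale(values):
    """Scale values linearly to [0, 1]; assumes the series is not constant."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

# Illustrative closing prices (made up for this sketch).
close = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 20.0, 19.0, 16.0])

scaled, lo, hi = min_max_scale(close)

# First 80% of the rows for training, remaining 20% for testing.
train_len = math.ceil(len(scaled) * 0.8)
train, test = scaled[:train_len], scaled[train_len:]
```

The split is chronological rather than shuffled, so the test rows always come after the training rows. Computing the minimum and maximum on the training rows only is the stricter option, since it keeps information from the test period out of the scaler.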
Screenshot 5
Train the data:
Screenshot 6
Screenshot 7
Screenshot 8
Screenshot 9
Screenshot 10
Screenshot 11
The evaluation metrics indicate that the predicted prices follow the actual prices closely.
The Root Mean Squared Error (RMSE) is low, with a value of 0.4249978848589164, indicating
small deviations between the predicted and actual prices. Furthermore, the Mean Absolute
Error (MAE) is 1.799880424437484, underscoring the closeness of the predicted values to
the true prices. Additionally, the R-squared (R^2) value of 0.9227884597525607 indicates
that the model explains about 92% of the variance in the actual prices. These metrics
collectively suggest that the model's performance in predicting prices is robust and
reliable.
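For reference, the three metrics quoted above can be computed as in the sketch below. The function name `regression_metrics` and the sample values are assumptions chosen so the arithmetic is easy to follow; they are not the project's results.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))   # square each error, then average
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))                        # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up prices: every prediction is off by exactly 1.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]
rmse, mae, r2 = regression_metrics(actual, predicted)
```

When both are computed on the same values in the same units, MAE can never exceed RMSE, which gives a quick consistency check on reported figures.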
Screenshot 12
Data Dictionary
CONCLUSION
Bibliography
Limitations of the Project
Data Dependence:
Historical data: The model's performance relies heavily on the quality and
completeness of the historical data used for training. Limited data, missing values, or
inaccurate data can lead to biased predictions.
Unforeseen events: The model can only predict based on historical patterns. It may
not be able to accurately predict future prices when faced with unforeseen events
like economic crises, natural disasters, or changes in market regulations.
Model Complexity:
Interpretability: LSTM models are complex, and their internal workings can be difficult
to interpret. This can make it challenging to understand how the model arrives at its
predictions and identify potential biases.
Overfitting: Overfitting occurs when the model learns the training data too well and
fails to generalize to unseen data. This can lead to inaccurate predictions on new
data.
Market Dynamics:
Non-linearity: The stock market is complex and non-linear, with numerous interacting
factors influencing prices. An LSTM model, despite its capabilities, may not be able to
capture all the nuances of market dynamics.
Psychological factors: Investor sentiment and emotions can significantly impact
prices, adding a layer of uncertainty that the model may not fully account for.
Computational resources:
Training requirements: Training LSTM models can be computationally expensive and
require significant processing power and time. This can limit the accessibility and
scalability of the project.
Prediction limitations: Real-time prediction using LSTM models can be
computationally demanding, potentially delaying or hindering its application in fast-
paced trading environments.
Ethical considerations:
Bias and fairness: The model's predictions can be biased if the training data is
biased. This can lead to unfair outcomes for specific groups of investors.
Transparency and explainability: As mentioned earlier, the complexity of LSTM
models can make it difficult to explain their predictions. This can raise concerns about
transparency and accountability.
Future Scope of the Project
Data Collection:
The frequency of the data used is 1.07. This is the value of the Close column for
all rows in the table loaded from my local machine. Since the data is for a single
day, the frequency can be interpreted as the number of times the stock market
closed at a price of 1.07 on that day.
Technique Used: This line trains the model using the training data (`x_train`
and `y_train`). The model is an LSTM neural network, a type of recurrent
neural network (RNN) commonly used for sequence prediction tasks. The
training is done using stochastic gradient descent (SGD) with a batch size of 1
and for one epoch.
Technique Used: This line prepares the testing data by taking the last 60 days
of the scaled data (the closing prices) from the training set.
Feature Engineering for Testing Data:
Technique Used: This code segment is part of feature engineering for the
testing data. It creates a sliding window of 60 days for the testing set,
mirroring the windowing applied during the training phase.
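The 60-day sliding window can be sketched as below. This is an illustration only: the helper name `make_windows`, the synthetic `scaled` series, and the `train_len` value are assumptions, standing in for the scaled closing prices and split point used in the project.

```python
import numpy as np

def make_windows(series, window=60):
    """Build (samples, window, 1) inputs and next-step targets from a 1-D series."""
    series = np.asarray(series, dtype=float)
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # the previous `window` values
        y.append(series[i])             # the value to predict
    x = np.array(x).reshape(-1, window, 1)  # LSTM layers expect 3-D input
    return x, np.array(y)

# Synthetic stand-in for 100 scaled closing prices, with the first 80 used for training.
scaled = np.linspace(0.0, 1.0, 100)
train_len = 80

# The test input starts 60 days before the split so the first test day has a full window.
test_input = scaled[train_len - 60:]
x_test, y_test = make_windows(test_input, window=60)
```

Each test sample therefore contains only prices that precede the day being predicted, and the number of test samples equals the number of rows after the split.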
Technique Used: The model predicts stock prices for the testing set and then
inversely transforms the scaled predictions to the original scale. The root
mean squared error (RMSE) is calculated to evaluate the model's
performance.
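A minimal sketch of this step, assuming min-max scaling was used and the original minimum and maximum are known, is shown below. The helper `inverse_min_max` and all numbers are hypothetical; a library scaler would provide an equivalent inverse transform.

```python
import numpy as np

def inverse_min_max(scaled, lo, hi):
    """Map values from [0, 1] back to the original price range [lo, hi]."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Illustrative values: price range seen by the scaler, model outputs, true prices.
lo, hi = 50.0, 150.0
scaled_predictions = np.array([0.25, 0.50, 0.75])
actual_prices = np.array([76.0, 99.0, 126.0])

predictions = inverse_min_max(scaled_predictions, lo, hi)

# RMSE in price units: square each error first, then take the mean and the root.
rmse = float(np.sqrt(np.mean((predictions - actual_prices) ** 2)))
```

The order of operations matters here: squaring the mean error instead of averaging the squared errors yields a different and usually much smaller number, because positive and negative errors cancel out.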
Visualization of Predictions:
Technique Used: This part involves preparing data for visualization. The
training and validation sets are separated, and predicted values are added to
the validation set for comparison.
Data Visualization:
Technique Used: This code segment visualizes the original closing prices of
the stock and might have additional visualizations of the model's predictions.
However, there seems to be an issue with the subplot titles and axis labels, as
they are incomplete.
The code contains additional sections for data analysis and preprocessing,
including handling missing values, dropping irrelevant columns (‘Adj
Close’), and creating new features (‘is_quarter_end’).
Data scaling:
The code implements an LSTM model for stock price prediction, includes data
preprocessing, visualizes both the original and predicted stock prices, and
conducts exploratory data analysis on feature distributions and outliers.
Feature Engineering: