STOCK PRICE PREDICTION USING MACHINE LEARNING

BY

UDEH ONYEDIKACHI PETER
U17/NAS/CSC/251

DEPARTMENT OF COMPUTER SCIENCE/MATHEMATICS

ENUGU.

JULY, 2021
APPROVAL
This research titled STOCK PRICE PREDICTION USING MACHINE LEARNING has been assessed and approved by the Committees of the Department of Computer Science and the Faculty.
CERTIFICATION
This is to certify that this research titled STOCK PRICE PREDICTION USING MACHINE LEARNING was carried out by UDEH ONYEDIKACHI PETER with Registration Number U17/NAS/CSC/251 of the University, Enugu.
DEDICATION
This research work is dedicated to Almighty God, who gave me the wisdom, understanding and strength to complete it.
ACKNOWLEDGEMENTS
With all my heart, I appreciate God Almighty for his protection and guidance and for seeing this research work through from the beginning to the end. I wish to express my heartfelt gratitude to my able supervisor, Dr. S.C. Echezona, who made this research work a reality through his dedication and corrections. Special thanks to my Head of Department, Dr. J.B. Agbogun, and all the lecturers and staff of the Computer Science/Mathematics department for their contributions to my academic life. I express my immense gratitude to my lovely parents for their unfailing support, words of encouragement and prayers. I say a big thank you for your immense support.
ABSTRACT
This study lays the groundwork for democratizing machine learning technology for retail investors by connecting machine learning models to a web application built for them. It provides predictions and visualizations to help investors navigate the stock markets and make more intelligent decisions. Prophet, Ridge Regression, Recurrent Neural Networks, and Bagging Regression are among the stock price prediction algorithms and models created. The models are driven by time-series data carrying a wide variety of basic features, including technical-analysis features computed over distinct historical windows. Numerous feature selection and feature extraction approaches are used to find the best features for the problem. The major technologies used for the web application include Streamlit, SQLite and Bootstrap.
TABLE OF CONTENTS
Title page
Approval
Certification
Dedication
Acknowledgements
Abstract
Table of contents
List of tables
List of figures
2.1 Introduction
3.1 Overview
4.1 Preamble
5.1 Summary
5.2 Conclusion
5.3 Recommendation
References
Appendix A
Appendix B
LIST OF TABLES
Table 2.1: The train time, accuracy score and mean squared error (MSE) on both the train and test data for each model
LIST OF FIGURES
Figure 2.1: A line plot of the actual price versus the predicted price using the Ridge Regression model
Figure 2.2: A line plot of the actual price versus the predicted price using the Prophet model
Figure 2.3: A line plot of the actual price versus the predicted price using the Bagging Regression model
Figure 2.4: A line plot of the actual price versus the predicted price using the LSTM model
Figure 4.2: The raw data and interactive visualization of the raw data
Figure 4.3: The forecast data and interactive visualization of the forecast data
CHAPTER ONE
INTRODUCTION
In recent years, economists and investors have made many attempts to predict the behavior of bonds, currencies, stocks and other economic markets. These attempts were inspired by various observations of how uncertain economic markets can be.
Retail investors spend a lot of time trying to find investment opportunities. Wealthier investors can seek professional financial advisory services, but this is not true for retail investors because the costs are prohibitive. Thus, retail investors need to study the market themselves and make informed decisions on their own, which makes investing very stressful for them. Their decisions get swayed by cognitive biases or personal emotions, leading to avoidable losses. Even if retail investors are watchful enough, most do not have sufficient skills to process the huge volume of data required to make good investment decisions. Institutional investors rely on sophisticated models backed by technology to avoid such traps, but retail investors do not have access to those technologies and often find themselves falling behind the market.
Without access to quantitative and data-driven models, one obvious approach retail investors could use to gauge the market is through simple indicators, for instance, linear regression, Bollinger bands and the exponential moving average (EMA). Another common approach retail investors might use to predict the stock market is to draw a linear regression line that extrapolates the recent price trend. Inspired by the increasing popularity of machine learning algorithms for forecasting applications, these algorithms might function as potential tools to discover hidden patterns within the trend of stock prices, and such information might supply extra insights for retail investors when making investment decisions. Therefore, this final year project aims to research the usefulness of machine learning in predicting stock prices and to democratize such technologies through a simple web application.
Various prediction techniques have been studied in the stock market prediction field, and researchers still focus on implementing the latest techniques so as to enhance prediction accuracy.
Retail investors often find it difficult to seek professional financial advisory services because the costs are excessive. The main goal of this research project is to provide retail investors with a web application that uses machine learning to help them steer within the fast-changing stock market. The objective of the project is to introduce and democratize current Machine Learning and Deep Learning technologies for retail investors to help them make investment decisions. No prediction is 100% accurate. Therefore, the upper bound and lower bound of the stock prices will be displayed to illustrate the trading range the investors should be looking at. This application is an additional quantitative tool for investors to gauge the market.
The main aim of the study is to develop a web application that provides stock price predictions.
Specific Objectives:
i. Explore how different machine learning approaches can be used and how they will affect the accuracy of the stock price prediction.
ii. Investigate how different hyperparameters can be tuned for better performance of the models.
This research is carried out with the main objective of introducing and democratizing the latest machine learning technologies and helping retail investors make investment decisions. Based on the results obtained, it is hoped that this study will:
1. Provide retail investors access to quantitative and data-driven models useful for making investment decisions.
3. Provide a basis for researchers who are interested in applying machine learning algorithms to fundamental data such as accounting information and other financial data.
1. The scope of this project does not exceed a generalized suggestion tool.
2. No prediction is 100% accurate, since there are many parameters that can directly affect the stock market and not all of them can be taken into account.
3. This system is limited to users who have knowledge of the stock market.
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
This chapter outlines the various technologies used in this study and presents a review of related works.
This project is partitioned into two sections, namely a research segment and an application segment. The Machine Learning and Deep Learning algorithms used in the research part include the Ridge Linear Regression model, the BaggingRegressor model, LSTM, Prophet and the AdaBoostRegressor model, while the major technologies used for the application part include Streamlit, SQLite and Bootstrap.
2.2.1 Python
Python is one of the most powerful and most popular programming languages for scientific computing. It is easy to learn, has efficient data structures, and takes a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms (Rossum 2020). In this project, Python packages/libraries will be used for the Machine Learning predictions. The server-side web framework that will be used to serve the predicted prices is also written in Python.
2.2.2 Streamlit
Streamlit is an open-source software framework for deploying Data Science and Machine Learning projects. It provides simple data optimization, deployment, and statistical analysis with very little code. It also eliminates the need for prior experience with web service frameworks such as Django and Flask. This is especially handy when working on data dashboards with a team that is mainly made up of non-technical people. Streamlit is simple to use because it builds an interactive data-driven web application using predefined commands. Simple instructions like st.write() may be used to build a wide range of objects, from simple text to interactive charts.
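A minimal sketch of this style, assuming Streamlit is installed and the script is launched with "streamlit run app.py" (the data below is a placeholder, not the project's dataset):

import streamlit as st
import pandas as pd

st.title("Stock Price Prediction")  # page title
st.write("Closing prices for a sample ticker")  # st.write renders text, dataframes and more
df = pd.DataFrame({"close": [101.2, 102.5, 99.8, 103.1]})  # placeholder data
st.line_chart(df)  # a one-line interactive chart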
2.2.3 SQLite
SQLite is a relational database management system (RDBMS) that will be used in this project to
store user data and preferences about their choice of stocks. SQLite is an open source embedded
relational database designed to provide a convenient way for applications to manage data without
the overhead that often comes with dedicated relational database systems (Owens 2006). SQLite
is easy to configure, easy to use, highly portable and efficient compared to other relational database management systems. It is serverless: it does not need a server process to operate, and the database storage file is accessed directly through the SQLite library.
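A minimal sketch of how such preferences could be stored with Python's built-in sqlite3 module (the table and column names here are illustrative, not the project's actual schema):

import sqlite3

conn = sqlite3.connect("app.db")  # the whole database is a single local file
conn.execute("CREATE TABLE IF NOT EXISTS preferences (username TEXT, ticker TEXT)")
conn.execute("INSERT INTO preferences VALUES (?, ?)", ("peter", "GOOG"))
conn.commit()
print(conn.execute("SELECT ticker FROM preferences").fetchall())
conn.close()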
2.2.4 Bootstrap
Bootstrap is now the most widely used HTML, CSS, and JS framework for creating responsive,
mobile-first web projects. It is used to speed up and simplify front-end web development. It's
designed for people of all skill levels, devices of all sizes, and projects of all sizes. In this project,
Bootstrap will be used as the front end technology for displaying the Machine Learning predicted
results and other features the user will be seeing on the web page.
This study will explore different Machine Learning models, how they can be used and how they
will affect the accuracy of the stock price prediction. The Machine Learning algorithms used in
this project are part of the Scikit-learn library while the Deep Learning algorithms are part of the
PyTorch library.
2.2.5.1 Scikit-learn
Scikit-learn is an open source Python library that provides supervised and unsupervised Machine Learning algorithms through a consistent and easy-to-use interface tightly integrated with the Python programming language (Pedregosa et al., 2011).
2.2.5.2 PyTorch
PyTorch is open source and focuses on both usability and speed. PyTorch offers an imperative and Pythonic programming style that supports code as a model, simplifies debugging, and is consistent with other widely known machine learning libraries, all while remaining efficient and continuing to support hardware accelerators like GPUs (Paszke 2019). PyTorch is easy to learn, simple to use and more Pythonic compared to other Deep Learning frameworks like TensorFlow. PyTorch has a special feature called data parallelism which allows it to distribute computational workload among multiple CPU or GPU cores. Some of the PyTorch Deep Learning algorithms include the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN).
Ridge Regression: a regularization term equal to $\alpha \sum_{i=1}^{n} \theta_i^2$ is added to the cost function. This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible (Aurélien 2019). The parameter $\alpha$ determines how much the model should be regularized. If $\alpha = 0$, Ridge Regression is simply Linear Regression; if it is very large, all weights end up very close to zero, resulting in a flat line passing through the mean of the data. Ridge regression is a model tuning method used on data that suffer from multicollinearity; it performs L2 regularization. When there is a problem with multicollinearity, least-squares estimates are unbiased but their variances are large, resulting in predicted values that are far from the actual values.
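A minimal sketch of the effect of $\alpha$ on the learned weights, using synthetic data rather than the project's dataset:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Larger alpha shrinks the learned weights toward zero (L2 regularization).
for alpha in (0.0, 1.0, 100.0):
    print(alpha, np.round(Ridge(alpha=alpha).fit(X, y).coef_, 3))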
Prophet is a simple yet robust estimation method because of its structure of adjusting parameters without investigating the original model's details. It includes a time series model that can be decomposed into three main components: trend, holidays, and seasonality. A recent study, "Long-Term Forecasting of Electrical Loads in Kuwait Using Prophet and Holt–Winters Models", compared Prophet with the Holt–Winters model in Kuwait's long-term peak load forecasting. The use of this method in forecasting is expected to spread due to its robustness and accuracy. Prophet is a method for forecasting time series data that uses an additive model to fit non-linear trends with yearly, weekly, and daily seasonality, as well as holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.
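The decomposition described above can be written as the additive model

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

where $g(t)$ is the trend component, $s(t)$ the seasonality, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term.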
Bagging combines multiple regression models with the aim of reducing prediction-process variance (Sadrmomtazi 2013). Bagging is built on the development of individual regression models that use a randomly drawn training set to train a single algorithm. There is a random training set of N instances for each regression model (N = size of the original training set), so a significant number of the original instances may be repeated while others are completely omitted. The average of the prediction values is used to provide the final prediction after iteratively building several regression models. For a long time, the approach has also been associated with a proper way of handling missing values in datasets.
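With $M$ such bootstrap models $f_1, \dots, f_M$, the final bagged prediction for an input $x$ is their average:

$$\hat{y}(x) = \frac{1}{M} \sum_{m=1}^{M} f_m(x)$$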
A recurrent neural network (RNN) is a type of artificial neural network that recognizes sequential patterns in data to predict the following scenarios (Laskowski 2018). This architecture is particularly powerful due to its node connections, which enable the display of temporal dynamic behavior. The use of feedback loops to process a sequence is another important feature, referred to as memory. Because of this behavior, RNNs are ideal for time series problems. Long short-term memory (LSTM) architectures were developed based on this structure. LSTMs are specifically designed to prevent the problem of long-term dependency. They don't have to work hard to remember information for lengthy periods of time; it's nearly second nature to them (Olah 2015). They are currently frequently utilized and function exceptionally effectively in a wide range of situations.
The dataset used for this research is OHLCV (Open, High, Low, Close, Volume) historical data fetched using the Yahoo Finance API. Other features were extracted from the fetched dataset. The features used are the closing price of each trading day, the volume of stock traded on each day, the S&P 500 index, and the Bollinger bands derived from the Simple Moving Average.
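A minimal sketch of this feature extraction (the ticker, date range and 20-day window are illustrative; the project's full code is in Appendix A):

import yfinance as yf

df = yf.download("GOOG", start="2015-01-01", end="2021-07-01")

# Bollinger bands: a simple moving average plus/minus two rolling standard deviations
sma = df["Close"].rolling(window=20).mean()
std = df["Close"].rolling(window=20).std()
df["upper_band"] = sma + 2 * std
df["lower_band"] = sma - 2 * std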
After data extraction, the next step was feature scaling. The reason for feature scaling is to bring every feature onto the same footing without any upfront importance. Another reason feature scaling is applied is that some algorithms, such as neural networks trained with gradient descent, converge much faster with feature scaling than without it (Roy 2020). The MinMaxScaler class from the Scikit-learn library was used to perform the feature scaling. It scales the data by using a technique called normalization, in which the values are shifted and rescaled so that they end up ranging between 0 and 1.
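Min-max normalization rescales each feature value $x$ using the feature's minimum and maximum:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$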
After feature scaling, the scaled data is used to train each model. After training, the mean squared error was used to measure the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values.
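For $n$ samples with actual values $y_i$ and predicted values $\hat{y}_i$, this is

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$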
Table 2.1: The train time, accuracy score and mean squared error (MSE) on both the train and test data for each model
Table 2.1 above shows the performance of each model on the prepared dataset. It can be observed that every model except the LSTM is overfitting: the MSE on their respective test data is much higher than on their training data, so they may not generalize well to unseen data.
Figure 2.1: A line plot of the actual price versus the predicted price using Ridge Regression
model.
Figure 2.2: A line plot of the actual price versus the predicted price using the Prophet
model.
Figure 2.3: A line plot of the actual price versus the predicted price using Bagging
Regression model
Figure 2.4: A line plot of the actual price versus the predicted price using LSTM model
We don't always know what the best model architecture is for a given problem, so we'd like to be able to experiment with a variety of options. In true machine learning fashion, we'll ideally ask the machine to perform this exploration and automatically select the best model architecture. The parameters that define the model architecture are known as hyperparameters, and the process of searching for the best model architecture is known as hyperparameter tuning (Jordan 2017). The difficulty with hyperparameters is that there is no single magic number that works everywhere; the best values vary depending on the task and the dataset.
Knowing where to begin can be difficult given the number of hyperparameters that you may want to tune. With this in mind, the hyperparameters tuned in this project are described below.
The single most important hyperparameter is the learning rate, and it should always be tuned. The learning rate is a hyperparameter that governs how much we adjust the weights of our network in relation to the loss gradient (Zulkifli 2018). A good starting point is 0.01. If our learning rate is too low, it will take a much longer time (hundreds or thousands of epochs) to reach the ideal state. On the other hand, if our learning rate is too high, it will overshoot the ideal state and training may never converge.
In the report "Cyclical Learning Rates for Training Neural Networks", Smith (2018) argued that you could estimate a good learning rate by training the model initially with a very low learning rate and gradually increasing it, observing at which rate the loss falls fastest.
Gradient descent is a popular optimization algorithm for determining the weights or coefficients of machine learning algorithms like artificial neural networks and logistic regression. It works by having the model predict on training data and then using the error of those predictions to update the model in the direction that reduces the error.
Mini-batch gradient descent is a gradient descent variation that divides the training dataset into small batches that are used to calculate model error and update model coefficients (Brownlee 2017). Implementations may choose to sum the gradient over the mini-batch, which reduces the gradient's variance even further. Mini-batch gradient descent is the recommended gradient descent variant for most applications, particularly deep learning. Mini-batch sizes, also known as "batch sizes" for brevity, are frequently tuned to an aspect of the computational architecture on which the implementation is running, for example a power of two that matches the memory layout of the GPU or CPU hardware, such as 32, 64, 128 or 256. Small values result in a learning process that converges quickly at the expense of training noise; large values result in a slowly converging learning process with accurate estimates of the error gradient.
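A sketch of mini-batch loading in PyTorch (batch size 64 matches the value used later in this project; the tensors here are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 8)  # placeholder features
y = torch.randn(1000, 1)  # placeholder targets
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=False, drop_last=True)
for x_batch, y_batch in loader:
    pass  # one gradient update would happen here per mini-batch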
The number of epochs determines how many times the network's weights will be updated. As the number of epochs increases, the neural network's weights are updated more and more times, and the fit shifts from underfitting to optimal to overfitting. The validation error is the metric we should be looking at when deciding on the number of epochs for our training step. The intuitive manual method is to train the model for as long as the validation error keeps decreasing. To determine when to stop training the model, a technique known as Early Stopping is used: if the validation error has not improved in the last 10 or 20 epochs, the training process is terminated.
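A minimal early-stopping loop under these assumptions, with a patience of 10 epochs (train_one_epoch and validate are hypothetical helpers standing in for the real training and validation code):

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    train_one_epoch()  # hypothetical: one pass over the training set
    val_loss = validate()  # hypothetical: mean loss on the validation set
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0  # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no improvement for `patience` epochs
            break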
Various researchers have worked to improve the accuracy of stock price prediction; some of their studies are reviewed below.
Piramuthu (2004) thoroughly evaluated various feature selection methods for data mining applications. He compared how different feature selection methods optimized decision tree performance using datasets such as credit approval data, loan default data, web traffic data, Tam data, and Kiang data. He compared probabilistic distance measures such as the Bhattacharyya measure, the Matusita measure, the divergence measure, the Mahalanobis distance measure, and the Patrick-Fisher measure, as well as inter-class distance measures such as the Minkowski distance measure, city block distance measure, Euclidean distance measure, and nonlinear distance measure. The author's evaluation of both probabilistic distance-based and several inter-class feature selection methods is a strength of this paper, as is the fact that the evaluation was conducted on various datasets. The evaluation algorithm, on the other hand, was only a decision tree; we don't know if the feature selection methods would still perform as well with other learning algorithms.
In their study "Stock market forecasting using Hidden Markov Model", Hassan and Nath (2005) used the Hidden Markov Model (HMM) to forecast the stock prices of four different airlines. They divide the model's states into four categories: the opening price, the closing price, the highest price, and the lowest price. The approach used in this paper is unique in that it does not require expert knowledge to build a prediction model. However, because this work is limited to the airline industry and is based on a very small dataset, it may not result in a generalizable prediction model. One of the approaches used in other stock market prediction works could have been used as a comparison. The authors also limited the date range of the training and testing datasets to a maximum of two years.
Lee (2009) predicted stock trends using the support vector machine (SVM) and a hybrid feature selection method in "Using support vector machine with a hybrid feature selection method to the stock trend prediction". The dataset used in this study is a subset of the NASDAQ Index from the Taiwan Economic Journal Database (TEJD) in 2008. The feature selection part used a hybrid method, with supported sequential forward search (SSFS) acting as the wrapper. Another advantage of this work is that the authors created a detailed procedure for parameter adjustment and reported performance under various parameter values. The feature selection model's clear structure is also helpful at the primary stage of model structuring. One limitation was that the performance of SVM was only compared to back-propagation neural networks (BPNNs) and not to other machine learning models.
Lei (2018) used a Wavelet Neural Network (WNN) to predict stock price trends in "Wavelet neural network prediction method of stock price trend based on rough set attribute reduction". As an optimization, the author used Rough Set (RS) for attribute reduction: it was used to reduce the dimensions of the stock price trend features and to determine the Wavelet Neural Network's structure. This work's dataset includes five well-known stock market indices: (1) the SSE Composite Index (China), (2) the CSI 300 Index (China), (3) the All Ordinaries Index (Australia), (4) the Nikkei 225 Index (Japan), and (5) the Dow Jones Index (USA). The model was evaluated on these various stock market indices, and the results were convincing and general. The computational complexity is reduced by using Rough Set to optimize the feature dimension before processing. However, in the discussion section, the author only emphasized parameter adjustment and did not mention the model's flaws. Because the evaluations were done on indices, the same model may not perform as well when applied to a specific stock.
Recent studies make use of input data from a variety of sources and in a variety of formats. Some systems use historical stock data to perform mathematical analysis, others use financial news articles or expert reviews to perform sentiment analysis, and still others use a hybrid system that combines multiple inputs to forecast the market.
CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN
3.1 Overview
In this study, the Object-Oriented Analysis and Design (OOAD) methodology and the notation symbols of the Unified Modelling Language (UML) will be used for the analysis of the system. OOAD models a system as a group of interacting objects, encouraging and facilitating the reuse of software components. The OOAD methodology uses UML diagrams, such as use case and activity diagrams, to describe the system.
Before the proposal of the new system, retail investors spend a lot of time trying to find investment opportunities. One obvious approach retail investors could use to gauge the market is by drawing technical indicators on a visualized price history, for instance, Bollinger bands, the Simple Moving Average, the Exponential Moving Average (EMA) and the Relative Strength Index (RSI). This requires a lot of work, and decisions still get swayed by cognitive biases or personal emotions.
Having an overview of the existing system, some of its problems are as follows:
1. Avoidable losses from bad investment decisions due to cognitive biases or personal emotions.
2. Avoidable losses due to lack of proper analysis of stock market data.
3. Time consuming: a lot of time is spent trying to find investment opportunities or trying to analyse market data.
The proposed system will process a huge volume of data required to make good investment
decisions and use quantitative and data-driven models to gauge the market and predict stock
prices. Because it does not account for human behaviors, the proposed system will not always
produce accurate results. Factors such as a change in company leadership, internal issues, strikes,
protests, natural disasters, and a change in authority cannot be considered when relating to a
change in the stock market by the machine. The system's goal is to provide an estimate of where
the stock market might be headed. Many factors and parameters may influence it along the way.
1. The use of quantitative and data driven models to make investment decisions.
2. Time saving: retail investors no longer have to waste time processing market data.
3. Since the models are data driven, the losses will be less because models cannot be
swayed by emotions.
4. Less expensive: retail investors won’t have to spend the huge amount required to hire
financial advisors.
The proposed solution is comprehensive as it includes pre-processing of the stock market dataset,
utilization of multiple features and visualization of the stock market price trend prediction.
3.4.1 Use Case
A use case is a list of steps, typically defining interactions between a role (actor) and a system. Use case diagrams are used to gather the requirements of a system, including internal and external influences. These requirements are mostly design requirements. Hence, when a system is analyzed to gather its functionalities, use cases are prepared and actors are identified. When the initial task is complete, use case diagrams are modelled to present the outside view.
Activity diagrams represent workflows in a graphical way. They can be used to describe the flow of activities and actions within the system.
CHAPTER FOUR
IMPLEMENTATION
4.1 Preamble
This chapter describes the implementation of the new system and the software and hardware used.
My choice of IDE is Jupyter Notebook. Jupyter Notebook is a free and open-source web tool that lets us write and share code and documents. It provides an environment in which you may document your code, run it, examine the findings, visualize data, and view the results without having to leave the environment. This makes it a useful tool for completing end-to-end data science workflows, including data cleansing, statistical modeling, and constructing and training machine learning models.
All the machine learning related code is written in Python. The Python programming language was used mainly for the development of the back-end server of the system, and the Streamlit framework was used to serve the machine learning predictions to the user interface.
There was extensive testing of the software throughout the implementation. The testing was in two phases:
1. Component test
2. System test
Component test: during the development of the project, Jupyter Notebook provided the capability to test each model or function in real time. The functions were tested with different variables in order to detect any possible error, while the models were tuned with different hyperparameters.
System test: the completed system was tested on more realistic data after completion to confirm that it behaves as intended.
Screenshots
Figure 4.1 shows different stock tickers of different companies to select from, figure 4.2 shows the raw data and interactive visualization of the raw data for any selected stock, and figure 4.3 shows the forecast data and interactive visualization of the forecast data for any selected stock.
Figure 4.1: Different stock tickers of different companies to select from.
Figure 4.2: The raw data and interactive visualization of the raw data.
Figure 4.3: The forecast data and interactive visualization of the forecast data.
4.5 User Manual
To run the whole program on a personal workstation or PC, follow these steps:
2. Install all the required modules by running "$ pip install ." in the project directory (which contains setup.py) on the terminal or command prompt.
Some of the source code snippets for this implementation can be seen at Appendix A and
Appendix B.
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 Summary
Stock investment can help you build your savings and protect your money from inflation and taxes. But deciding what to invest in, or where, can be difficult, and the cost of hiring investment professionals/experts can be prohibitive for retail investors. This study lays the groundwork for democratizing machine learning technology for retail investors by connecting machine learning models to a web application built for them. It provides predictions and visualizations to help investors navigate the stock markets and make more intelligent decisions.
5.2 Conclusion
This stock price prediction web application helps retail investors make informed decisions about the stock market, using technical and data-driven models to help them reduce investment losses and save time. Currently, the platform is limited to a few features, but it is easily extensible to support upgrades.
5.3 Recommendation
Having presented all that is necessary for the implementation of this research, the following recommendation is made: future work should consider other factors that affect the price of the stock market, such as fundamental data (for example, accounting information) and market sentiment.
REFERENCES
Aurélien, G. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2019, p. 135).

Brownlee, J. (2017, July 21, para. 8-22). A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size. Machine Learning Mastery. https://tinyurl.com/kvnd852f

Hassan, M. R., & Nath, B. (2005). Stock market forecasting using Hidden Markov Model: a new approach. In Proceedings of the 5th International Conference on Intelligent Systems Design and Applications (ISDA'05), pp. 192-196. https://doi.org/10.1109/ISDA.2005.85

Jordan, J. (2017, November 2, para. 1). Hyperparameter tuning for machine learning models. Jeremy Jordan. https://www.jeremyjordan.me/hyperparameter-tuning/

Laskowski, N. (2018, para. 1). Recurrent neural networks. Retrieved from searchenterpriseai.techtarget.com/definition/recurrent-neural-networks

Lee, M. C. (2009). Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, 36(8), pp. 10896-10904. https://doi.org/10.1016/j.eswa.2009.02.038

Lei, L. (2018). Wavelet neural network prediction method of stock price trend based on rough set attribute reduction. Applied Soft Computing Journal, 62, pp. 923-932. https://doi.org/10.1016/j.asoc.2017.09.029

Olah, C. (2015, August 27, para. 14). Understanding LSTM Networks. Colah. http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Owens, M. (2006). The Definitive Guide to SQLite (1st ed., p. 1). Apress.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019, p. 1). PyTorch: An Imperative Style, High-Performance Deep Learning Library.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, pp. 2825-2830.

Piramuthu, S. (2004). Evaluating feature selection methods for learning in data mining applications. European Journal of Operational Research, 156(2), pp. 483-494. https://doi.org/10.1016/S0377-2217(02)00911-6

Rossum, V. G., & Team, P. D. (2018b, p. 1). Python Tutorial: Release 3.6.4. 12th Media Services.

Roy, B. (2020, April 7, para. 9). All about Feature Scaling. Towards Data Science, Medium. https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35

Sadrmomtazi, S. M. (2013). Modelling Compressive Strength of EPS Lightweight Concrete Using Regression, Neural Networks and ANFIS. Construction and Building Materials Journal, 42, pp. 205-216.

Saxena, A., Dhadwal, M., & Kowsigan, M. (2020, p. 5). Indian Crop Production: Prediction and Model Deployment Using ML and Streamlit. Turkish Journal of Physiotherapy and Rehabilitation.

Smith, L. N. (2018, July 16, p. 5). Cyclical Learning Rates for Training Neural Networks.

Zulkifli, H. (2019, March 28, para. 3). Understanding Learning Rates and How It Improves Performance in Deep Learning. Medium. https://tinyurl.com/39kefbyr
APPENDICES
Appendix A: Source code for the machine learning models

import yfinance as yf
import datetime as dt
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.ensemble import BaggingRegressor, AdaBoostRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from prophet import Prophet
# get the start and end date dynamically using the datetime module
end = dt.datetime.today()
start = end - dt.timedelta(days=365 * 5)  # assumed look-back window of about five years
print(start, end)

indicator = 'GOOG'
# download the OHLCV history for the selected ticker and normalize the column/index names
selected_df = yf.download(indicator, start=start, end=end)
selected_df.columns = [col.lower() for col in selected_df.columns]
selected_df.index.name = 'date'
selected_df.close.plot(figsize=(15,7))

selected_df['previous_3_days'] = selected_df.close.shift(-3)
# Nasdaq index
selected_df['nasdaq'] = yf.download('NDX', start=start, end=end)['Close']
# Bollinger Band (assumed 20-day window)
sma_20 = selected_df.close.rolling(window=20).mean()
std_20 = selected_df.close.rolling(window=20).std()
# upper band
selected_df['upper_band'] = sma_20 + 2 * std_20
# lower band
selected_df['lower_band'] = sma_20 - 2 * std_20

# Target: the closing price a few trading days ahead
interval_for_target = 5
selected_df['target'] = selected_df.close.shift(-interval_for_target)
selected_df.dropna(inplace=True)
# Chronological train/test split (assumed 80/20)
split = int(len(selected_df) * 0.8)
train_data, test_data = selected_df.iloc[:split], selected_df.iloc[split:]
X_train = train_data.drop(columns=['target'])
y_train = train_data['target']
X_test = test_data.drop(columns=['target'])
y_test = test_data['target']

min_max_scaler = MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(X_train)

# Note that Ridge regression performs linear least squares with L2 regularization.
# Create and train the Ridge Linear Regression Model
ridge_model = Ridge()
start = time.time()
ridge_model.fit(X_train_scaled, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

X_test_scaled = min_max_scaler.transform(X_test)
accuracy = ridge_model.score(X_test_scaled, y_test)  # R^2 score on the test data
print(accuracy)

selected_df_plus_pred = test_data
selected_df_plus_pred['ridge_predicted'] = ridge_model.predict(X_test_scaled)
selected_df_plus_pred.head()

# Plotting the performance of the Ridge model
selected_df_plus_pred[['target', 'ridge_predicted']][300:].plot(figsize=(15,7))
mean_squared_error(y_train, ridge_model.predict(X_train_scaled))
# Create and train the Bagging Regression model
lr_bag = BaggingRegressor(LinearRegression())
start = time.time()
lr_bag.fit(X_train_scaled, y_train)
stop = time.time()
print(f"Training time: {stop - start}s")

X_test_scaled = min_max_scaler.transform(X_test)
accuracy = lr_bag.score(X_test_scaled, y_test)  # R^2 score on the test data
print(accuracy)
mean_squared_error(y_train, lr_bag.predict(X_train_scaled))

selected_df_plus_pred['bagging_predicted'] = lr_bag.predict(X_test_scaled)
selected_df_plus_pred.head()

# Plotting the performance of the Bagging model
selected_df_plus_pred[['target', 'bagging_predicted']][300:].plot(figsize=(15,7))
# Create and train the AdaBoost Regression model
lr_boost = AdaBoostRegressor(LinearRegression())
lr_boost.fit(X_train_scaled, y_train)

X_test_scaled = min_max_scaler.transform(X_test)
accuracy = lr_boost.score(X_test_scaled, y_test)  # R^2 score on the test data
print(accuracy)
mean_squared_error(y_test, lr_boost.predict(X_test_scaled))

selected_df_plus_pred['adaboost_predicted'] = lr_boost.predict(X_test_scaled)
selected_df_plus_pred.tail()

# Plotting the performance of the AdaBoost model
selected_df_plus_pred[['target', 'adaboost_predicted']][300:].plot(figsize=(15,7))
# Prepare the data in the two-column (ds, y) format that Prophet expects
df_for_prophet = selected_df.reset_index()[['date', 'close']].rename(columns={'date': 'ds', 'close': 'y'})
df_for_prophet.reset_index(drop=True, inplace=True)

# Chronological train/test split for Prophet (assumed 80/20)
prophet_split = int(len(df_for_prophet) * 0.8)
prophet_train_data = df_for_prophet.iloc[:prophet_split]
prophet_test_data = df_for_prophet.iloc[prophet_split:]
prophet_X_test = prophet_test_data.drop(columns=['y'])
prophet_y_test = prophet_test_data['y']

# Make prediction
pht_model = Prophet()
start = time.time()
pht_model.fit(prophet_train_data)
stop = time.time()
print(f"Training time: {stop - start}s")

future = pht_model.make_future_dataframe(periods=len(prophet_y_test), freq='MS')
prophet_pred = pht_model.predict(future)
prophet_pred

df_for_prophet['predicted'] = prophet_pred['yhat']
mean_squared_error(df_for_prophet['y'], df_for_prophet['predicted'])
r2_score(df_for_prophet['y'], df_for_prophet['predicted'])

# Plotting the performance of the Prophet model
df_for_prophet[['y', 'predicted']][300:].plot(figsize=(15,7))
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

device = 'cpu'  # assumed device; change to 'cuda' if a GPU is available

training_data = selected_df.copy()
training_data['target'] = training_data.close.shift(-5)
training_data.dropna(inplace=True)

# split a dataframe into a feature matrix (X) and a label column (y)
def feature_label_split(df, target_col):
    y = df[[target_col]]
    X = df.drop(columns=[target_col])
    return X, y

# assumed chronological 60/20/20 split built from two sklearn train_test_split calls
def train_val_test_split(df, target_col, test_ratio):
    val_ratio = test_ratio / (1 - test_ratio)
    X, y = feature_label_split(df, target_col)
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=test_ratio, shuffle=False)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=val_ratio, shuffle=False)
    return X_train, X_val, X_test, y_train, y_val, y_test
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(training_data, 'target', 0.2)
scaler = MinMaxScaler()
X_train_arr = scaler.fit_transform(X_train)
X_val_arr = scaler.transform(X_val)
X_test_arr = scaler.transform(X_test)
y_train_arr = scaler.fit_transform(y_train)
y_val_arr = scaler.transform(y_val)
y_test_arr = scaler.transform(y_test)
batch_size = 64

train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
test_features = torch.Tensor(X_test_arr)
test_targets = torch.Tensor(y_test_arr)

# wrap the tensors in datasets and mini-batch loaders
train = TensorDataset(train_features, train_targets)
val = TensorDataset(val_features, val_targets)
test = TensorDataset(test_features, test_targets)
train_loader = DataLoader(train, batch_size=batch_size, shuffle=False, drop_last=True)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=False, drop_last=True)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=False, drop_last=True)
test_loader_one = DataLoader(test, batch_size=1, shuffle=False, drop_last=True)
class LSTMModel(nn.Module):
    """LSTMModel class extends nn.Module class and works as a constructor for LSTMs.

    It has only two methods, namely init() and forward(). While the init()
    method initiates the model with the given input parameters, the forward()
    method defines how the forward propagation is computed.

    Attributes:
        hidden_dim (int): The number of nodes in each layer
        layer_dim (int): The number of layers in the network
        lstm (nn.LSTM): The LSTM model constructed with the input parameters.
        fc (nn.Linear): The fully connected layer mapping the hidden state to the output
    """

    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        """
        Args:
            input_dim (int): The number of nodes in the input layer
            hidden_dim (int): The number of nodes in each layer
            layer_dim (int): The number of layers in the network
            output_dim (int): The number of nodes in the output layer
            dropout_prob (float): The probability of nodes being dropped out
        """
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim

        # LSTM layers
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        # Fully connected output layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        """The forward method takes input tensor x and does forward propagation

        Args:
            x (torch.Tensor): The input tensor of the shape (batch size, sequence length, input_dim)

        Returns:
            torch.Tensor: The output tensor of the shape (batch size, output_dim)
        """
        # Initialize hidden state and cell state for the first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Detach the states as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        # Forward propagation by passing in the input, hidden state, and cell state into the model
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        # Keep only the output of the last time step in the sequence
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)
        return out
def get_model(model, model_params):
    models = {
        "lstm": LSTMModel,
    }
    return models.get(model.lower())(**model_params)
class Optimization:
    """
    Optimization is a helper class that takes model, loss function, optimizer function,
    learning scheduler (optional), early stopping (optional) as inputs. In return, it
    provides a framework to train and validate the models, and to predict future values
    based on the models.

    Attributes:
        model (RNNModel, LSTMModel, GRUModel): Model class created for the type of RNN
        loss_fn (torch.nn.modules.loss): Loss function to calculate the losses
        optimizer (torch.optim.Optimizer): Optimizer function to optimize the loss
        train_losses (list): The loss values from training
        val_losses (list): The loss values from validation
    """

    def __init__(self, model, loss_fn, optimizer):
        """
        Args:
            model (RNNModel, LSTMModel, GRUModel): Model class created for the type of RNN
            loss_fn (torch.nn.modules.loss): Loss function to calculate the losses
            optimizer (torch.optim.Optimizer): Optimizer function to optimize the loss
        """
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.train_losses = []
        self.val_losses = []

    def train_step(self, x, y):
        """The method completes one step of training.

        Given the features (x) and the target values (y) tensors, the method completes
        one step of the training. First, it activates the train mode to enable back prop.
        After generating the predictions (yhat) by forward propagation, it calculates
        the losses by using the loss function. Then, it computes the gradients by doing
        back propagation and updates the weights by calling the step() function.

        Args:
            x (torch.Tensor): Tensor of features for one training step
            y (torch.Tensor): Tensor of target values for calculating losses
        """
        # Sets model to train mode
        self.model.train()
        # Makes predictions
        yhat = self.model(x)
        # Computes loss
        loss = self.loss_fn(y, yhat)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        self.optimizer.step()
        self.optimizer.zero_grad()
        return loss.item()
    def train(self, train_loader, val_loader, batch_size=64, n_epochs=50, n_features=1):
        """The method performs the model training.

        The method takes DataLoaders for training and validation datasets, batch size for
        mini-batch training, number of epochs and number of features as inputs.
        Then, it carries out the training by iteratively calling the method train_step for
        n_epochs times. If early stopping is enabled, then it checks the stopping condition
        to decide whether the training needs to halt before n_epochs steps. Finally, it saves
        the model in a designated file path.

        Args:
            train_loader (torch.utils.data.DataLoader): DataLoader that stores training data
            val_loader (torch.utils.data.DataLoader): DataLoader that stores validation data
            batch_size (int): Batch size for mini-batch training
            n_epochs (int): Number of epochs, i.e. complete passes, to train for
            n_features (int): Number of feature columns
        """
        model_path = 'lstm_model.pt'  # assumed save path
        for epoch in range(1, n_epochs + 1):
            batch_losses = []
            for x_batch, y_batch in train_loader:
                x_batch = x_batch.view([batch_size, -1, n_features]).to(device)
                y_batch = y_batch.to(device)
                loss = self.train_step(x_batch, y_batch)
                batch_losses.append(loss)
            training_loss = np.mean(batch_losses)
            self.train_losses.append(training_loss)

            with torch.no_grad():
                batch_val_losses = []
                for x_val, y_val in val_loader:
                    x_val = x_val.view([batch_size, -1, n_features]).to(device)
                    y_val = y_val.to(device)
                    self.model.eval()
                    yhat = self.model(x_val)
                    val_loss = self.loss_fn(y_val, yhat).item()
                    batch_val_losses.append(val_loss)
                validation_loss = np.mean(batch_val_losses)
                self.val_losses.append(validation_loss)

            print(
                f"[{epoch}/{n_epochs}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}"
            )
        torch.save(self.model.state_dict(), model_path)
    def evaluate(self, test_loader, batch_size=1, n_features=1):
        """The method performs the model evaluation.

        The method takes DataLoaders for the test dataset, batch size for mini-batch testing,
        and number of features as inputs. Similarly to the train method, it iteratively
        predicts the target values and calculates losses. Then, it returns two lists that
        hold the predictions and the actual values.

        Note:
            This method assumes that the prediction from the previous step is available at
            the time of the prediction, and only does one-step prediction into the future.

        Args:
            test_loader (torch.utils.data.DataLoader): DataLoader that stores test data
            batch_size (int): Batch size for mini-batch testing
            n_features (int): Number of feature columns

        Returns:
            list: The values predicted by the model, and the actual values
        """
        with torch.no_grad():
            predictions = []
            values = []
            for x_test, y_test in test_loader:
                x_test = x_test.view([batch_size, -1, n_features]).to(device)
                y_test = y_test.to(device)
                self.model.eval()
                yhat = self.model(x_test)
                predictions.append(yhat.to(device).detach().numpy())
                values.append(y_test.to(device).detach().numpy())
        return predictions, values

    def plot_losses(self):
        """The method plots the calculated loss values for training and validation
        """
        plt.plot(self.train_losses, label="Training loss")
        plt.plot(self.val_losses, label="Validation loss")
        plt.legend()
        plt.title("Losses")
        plt.show()
        plt.close()
input_dim = len(X_train.columns)
output_dim = 1
hidden_dim = 64
layer_dim = 3
batch_size = 64
dropout = 0.2
n_epochs = 50
learning_rate = 1e-3
weight_decay = 1e-6

model_params = {'input_dim' : input_dim,
                'hidden_dim' : hidden_dim,
                'layer_dim' : layer_dim,
                'output_dim' : output_dim,
                'dropout_prob' : dropout}

model = get_model('lstm', model_params).to(device)

loss_fn = nn.MSELoss(reduction="mean")
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

opt = Optimization(model=model, loss_fn=loss_fn, optimizer=optimizer)
opt.train(train_loader, val_loader, batch_size=batch_size, n_epochs=n_epochs, n_features=input_dim)
opt.plot_losses()

predictions, values = opt.evaluate(
    test_loader_one,
    batch_size=1,
    n_features=input_dim
)
def inverse_transform(scaler, df, columns):
    # invert the min-max scaling so values are back on the original price scale
    for col in columns:
        df[col] = scaler.inverse_transform(df[col])
    return df

def format_predictions(predictions, values, df_test, scaler):
    # assumed helper: align predictions/actuals with the test index, then un-scale
    vals = np.concatenate(values).ravel()
    preds = np.concatenate(predictions).ravel()
    df_result = pd.DataFrame({"value": vals, "prediction": preds}, index=df_test.head(len(vals)).index)
    df_result = df_result.sort_index()
    return inverse_transform(scaler, df_result, [["value", "prediction"]])

df_result = format_predictions(predictions, values, X_test, scaler)
df_result

# Plotting actual values versus LSTM predictions
df_result[['value', 'prediction']].plot(figsize=(15,7))

def calculate_metrics(df):
    # assumed metrics: MSE and R^2 between actual and predicted values
    return {'mse': mean_squared_error(df.value, df.prediction),
            'r2': r2_score(df.value, df.prediction)}

result_metrics = calculate_metrics(df_result)
Appendix B: Source code for Streamlit web application using Prophet model only
import streamlit as st
import yfinance as yf
from datetime import date
from prophet import Prophet
from prophet.plot import plot_plotly
from plotly import graph_objs as go

START = "2015-01-01"
TODAY = date.today().strftime("%Y-%m-%d")
stocks = ('NSRGF','NVDA','NFLX','AMZN','PYPL','TSLA','GOOG', 'AAPL', 'MSFT', 'GME')
selected_stock = st.selectbox('Select a stock ticker for prediction', stocks)  # assumed widget label
n_years = st.slider('Years of prediction:', 1, 4)  # assumed forecast-horizon slider
period = n_years * 365
@st.cache
def load_data(ticker):
    data = yf.download(ticker, START, TODAY)
    data.reset_index(inplace=True)
    return data
data = load_data(selected_stock)
st.subheader('Raw data')
st.write(data.tail())
def plot_raw_data():
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=data['Date'], y=data['Open'], name='stock_open'))
    fig.add_trace(go.Scatter(x=data['Date'], y=data['Close'], name='stock_close'))
    fig.layout.update(title_text='Time series data', xaxis_rangeslider_visible=True)
    st.plotly_chart(fig)
plot_raw_data()
df_train = data[['Date','Close']]
df_train = df_train.rename(columns={"Date": "ds", "Close": "y"})  # Prophet expects ds/y columns

m = Prophet()
m.fit(df_train)
future = m.make_future_dataframe(periods=period)
forecast = m.predict(future)
st.subheader('Forecast data')
st.write(forecast.tail())

fig1 = plot_plotly(m, forecast)  # interactive forecast figure
st.plotly_chart(fig1)
st.write("Forecast components")
fig2 = m.plot_components(forecast)
st.write(fig2)