CCP Final Stock Market

Stock Price Prediction Using LSTM On Foreign Share Market
INTRODUCTION
INTRODUCTION
Department of Computer Engineering Page 1

The environment and public health greatly depend on the quality of the air we breathe.
Knowing how pollutants affect our health depends in large part on the collection,
monitoring, and analysis of data on air quality. Using a dataset with a plethora of data on
air pollutants and meteorological variables, we explore the field of air quality data
analysis and prediction in this report. Our goal is to forecast air quality indices (AQI) and
obtain important insights into trends in air quality by utilizing data science and machine
learning. A multitude of factors, such as particulate matter (PM2.5 and PM10), different
gasses (NO, NO2, NOx, CO, SO2, O3), and volatile organic compounds (benzene,
toluene, xylene), are involved in the complex and dynamic domain of air quality data.
Furthermore, weather factors like humidity, temperature, and wind speed can have a big
impact on air quality. 'city_day.csv,' the dataset we have access to, contains a significant
amount of information from several cities and presents a complete picture of air quality in
different scenarios. This report provides a thorough analysis of the data, including
visualization, clustering analysis, predictive modeling, and data preprocessing. First, we
must complete the crucial work of cleaning and preparing the data. We deal with missing
values, eliminate outliers, standardize the data, and reduce dimensionality using Principal
Component Analysis (PCA). These preprocessing procedures are essential to guarantee
The use of K-Means clustering to reveal innate patterns and clusters within the air quality
data is one of this report's highlights. This helps us comprehend the relationships and
groupings between the different pollutants and meteorological variables on a deeper
level. Additionally, we examine the dataset's correlation structure in more detail. We can
learn a great deal about the interactions between variables by examining the correlation
matrix. These insights can be very helpful to environmental scientists and policymakers.
Predictive modeling is where machine learning comes into play. With the help of a
Random Forest Regressor, we hope to predict air quality indices (AQI) from the
information at hand. With major ramifications for environmental management and public
health, this predictive model offers a potent tool for predicting air quality conditions. To
sum up, this report offers a thorough investigation of the analysis and prediction of air
quality data, demonstrating the potential of data science and machine learning to improve
our comprehension of air quality trends and enable better informed decision-making. The

analysis's conclusions could help create preventative strategies to deal with air quality
issues and safeguard both the environment and community health.
Abstract
Stock market prediction is that the scene of trying to complete the long-run value of
company stock. Analysis of the stock price we take the price. By using the neural
network, we develop a model. within the neural network, we use a recurrent neural
network that remembers each and each information through time. LSTM networks
maintain to stay contextual information of inputs by associate a loop that permits
information to travel from one step to the following. These loops make recurrent neural
networks seem magical. Each x train has the last 100 days value for the present-day y
data. this can add more accurate results when a measure to existing stock price prediction
algorithms. The network is trained and evaluated for accuracy with various sizes of
knowledge, and also the results are tabulated.
A. Existing System
In the existing system, SVM and Backpropagation Algorithm there is no which won't do
dropout process. Because of this, unwanted data have been processed which leads to
wastage of time and memory space. The prediction of future stock price by SVM and
Backpropagation Algorithm is less efficient because of processing unwanted data. The
SVM and Backpropagation Algorithm which is used in the existing system is not that
effective in handling non-linear data. So, in our proposed future stock price prediction is
done using LSTM (Long Short Term Memory) which is a higher accurate value for the
next day than SVM and Backpropagation Algorithm .
B. Proposed System
In the proposed system we try to find the accurate value of the next day closing value that
helps the investors to invest or sell their shares. Long Short Term Memory (LSTM) is an
artificial neural network in the field of deep learning. LSTM is an advance Neural

network with having a memory cell that stores a small amount of data for further
references. LSTM has feedback links that make it a "general-purpose computer". LSTM
can also process an entire series of data not only single value like image. Because of the
dropout process which takes place in the LSTM algorithm, it is comparatively faster than
SVM and Backpropagation. LSTM algorithm is more suitable in predicting the future
stock price than the SVM and Backpropagation algorithm because of removing the
undesired data. The time and memory consumption are also reduced when compared to
the exciting system due to the dropout process. LSTM algorithm is more proper in
handling non-linear data. We predict the 07-company stock price and store them in a
tabular format and visualize them.
An Introduction to U.S. Stock Market Indexes
Stock market indexes around the world are powerful indicators for global and country-
specific economies. In the United States the S&P 500, Dow Jones Industrial Average, and
Nasdaq Composite are the three most broadly followed indexes by both the media and
investors.1 In addition to these three indexes, there are approximately 5,000 others that
make up the U.S. equity market.
KEY TAKEWAYS –
 There are approximately 5,000 U.S. indexes.
 The three most widely followed indexes in the U.S. are the S&P 500, Dow Jones
Industrial Average, and Nasdaq Composite.
 The Wilshire 5000 includes all the stocks from the U.S. stock market.
 Indexes can be constructed in a wide variety of ways but they are commonly
identified generally by capitalization and sector segregation.
INDEXES ABOUND –

With so many indexes, the U.S. market has a wide range of methodologies and
categorizations that can serve a broad range of purposes. The media most often reports on
the direction of the top three indexes regularly throughout the day with key news items
serving as contributors and detractors. Investment managers use indexes as benchmarks
for performance reporting.
Meanwhile, investors of all types use indexes as performance proxies and allocation
guides. Indexes also form the basis for passive index investing often done primarily
through exchange-traded funds that track indexes specifically.
Overall, an understanding of how market indexes are constructed and utilized can help to
add meaning and clarity for a wide variety of investing avenues. Below, we elaborate on
the three most followed U.S. indexes, the Wilshire 5000 which includes all the stocks
across the entire U.S. stock market, and a roundup of some of the other most notable
indexes.

PROJECT
SCHEDULE
PROJECT SCHEDULE

Software Engineering is an establishment of sound engineering principles in

order to obtain economical software that is reliable and works efficiently on real machine.
We have used classic life cycle paradigm also called “Water fall model”. For software
engineering which is sequential approach to software development that begins at the
system level and progress through analysis, design, coding, testing and maintenance. We
had completed system engineering and analysis by the end of September 2021 which
encompasses requirement gathering at the system level in the small amount of top-level
design analysis.
We had completed software requirement analysis by the mid of November 2021 which
encompasses both system and software requirement gathering. By the end of December
2021 we had completed the project planning and design. On the basis of design prepared
in the previous stage by the end of April 2022 we completed coding stage.
After completion of coding page the important part of the software development is
carried out, which is the testing phase by the mid of May 2022. Various criteria’s of
testing were taken in account has unit testing, integration testing, validation testing and
system testing. First, each and every module of the project was tested under the unit
testing. After unit testing, integration testing was carried out by integrating all module
tested in unit testing. After unit testing the module prepared was cross checked with the
design. At last the whole module was tested over the system and desired result was
obtained.

Sep System
Engineering
Oct
Requirement
Analysis
Nov
Dec
Design
Nov
Feb
Coding
Mar
Apr
May Testing

REQUIREMENTS
ANALYSIS

REQUIREMENTS ANALYSIS
Software Requirements :
I.Python: Python is a high-level, interpreted programming language known for its
simplicity and readability. It is versatile and widely used in various domains, including
web development, data analysis, machine learning, scientific computing, and more.
Python's easy-to-read syntax and extensive standard library make it a popular choice for
both beginners and experienced developers.
II.Python Libraries: The code relies on several Python libraries and modules.We install
these libraries using Python's package manager, pip. The required libraries include:
● pandas: Pandas is an open-source Python library that provides high performance data
structures and data analysis tools. It is particularly valuable for data manipulation,
cleaning, and analysis. Pandas introduces two main data structures, the DataFrame and
Series, which allow users to work with tabular data effectively. It's an essential tool for
anyone dealing with data in Python.
● numpy: NumPy, short for Numerical Python, is a fundamental library for numerical
and scientific computing in Python. It provides support for multi dimensional arrays and
matrices, along with a wide range of mathematical functions to operate on these arrays.
NumPy is the foundation for many other scientific and data manipulation libraries in
Python, and it's crucial for tasks like linear algebra, statistics, and more.
● scikit-learn (sklearn): Scikit-learn is a machine learning library used for clustering and
predictive modeling in the code. It offers tools for data preprocessing, modeling, and
evaluation.
● matplotlib: Matplotlib is a popular library for creating static, animated, or interactive

visualizations in Python. It is used for generating various plots and graphs in the code.
● seaborn: Seaborn is a data visualization library built on top of matplotlib. It provides a

high-level interface for creating informative and attractive statistical graphics.
III.Jupyter Notebook : Jupyter is an open-source application that allows interactive and

collaborative computing. Jupyter Notebook and JupyterLab are popular interfaces that
enable users to create and share documents that contain live code, equations,
visualizations, and narrative text. Jupyter is widely used in data analysis, research, and
education. It supports multiple programming languages, with Python being one of the
most commonly used languages in Jupyter environments.
Hardware Requirements

The hardware requirements depend on the size of your dataset and the complexity of
the analysis. However, for a smooth execution of the code provided in this report, a
standard modern computer or laptop with the following specifications is typically
sufficient: 4
Processor: A multi-core processor, such as Intel Core i5 or AMD Ryzen, is
recommended for faster data processing.
Memory (RAM): At least 8 GB of RAM is advisable, especially when dealing with large
datasets.
Storage: Adequate storage space for the dataset and any additional data files or models.
An SSD is preferred for faster read/write operations.
Operating System: The code is compatible with Windows, macOS, and Linux operating
systems.

DESIGN
DESIGN OF LSTM
Long Short Term Memory Network consists of four different gates for
different purposes as described below:-

 Forget Gate(f): It determines to what extent to forget the previous data.
 Input Gate(i): It determines the extent of information be written onto the Internal
Cell State.
 Input Modulation Gate(g): It is often considered as a sub-part of the input gate and
much literature on LSTM’s does not even mention it and assume it is inside the Input
gate. It is used to modulate the information that the Input gate will write onto the
Internal State Cell by adding non-linearity to the information and making the
information Zero-mean. This is done to reduce the learning time as Zero-mean input
has faster convergence. Although this gate’s actions are less important than the others
and are often treated as a finesse-providing concept, it is good practice to include this
gate in the structure of the LSTM unit.
 Output Gate(o): It determines what output(next Hidden State) to generate from the
current Internal Cell State.
The basic workflow of a Long Short Term Memory Network is similar to the
workflow of a Recurrent Neural Network with the only difference being that the
Internal Cell State is also passed forward along with the Hidden State.
Single Recurrent Unit containing different

Gates
Working of an LSTM recurrent unit :
 Take input the current input, the previous hidden state, and the previous internal cell
state.
 Calculate the values of the four different gates by following the below steps :
 For each gate, calculate the parameterized vectors for the current input and the
previous hidden state by element-wise multiplication with the concerned vector
with the respective weights for each gate.
 Apply the respective activation function for each gate element-wise on the
parameterized vectors. Below given is the list of the gates with the activation
function to be applied for the gate.
 Calculate the current internal cell state by first calculating the element-wise
multiplication vector of the input gate and the input modulation gate, then calculate
the element-wise multiplication vector of the forget gate and the previous internal cell
state and then adding the two vectors.
 Calculate the current hidden state by first taking the element-wise hyperbolic tangent
of the current internal cell state vector and then performing element-wise
multiplication with the output gate.

Recurrent Unit
That the blue circles denote element-wise multiplication. The weight matrix W contains
different weights for the current input vector and the previous hidden state for each gate.
Just like Recurrent Neural Networks, an LSTM network also generates an output at each
time step and this output is used to train the network using gradient descent.
Network Using Gradient Descent
Why do we use LSTM with text data?

When performing normal text modelling, most of the pre-processing task and modelling
task focuses on creating data sequentially. Examples of such tasks can be POS tagging,
stop words elimination, sequencing of the text. These are the methods that try to make
data understood by a model with less effort according to the known pattern. It can give
the results.
Here applying LSTM networks can have its own special feature. We have discussed that
LSTM has a feature through which it can memorize the sequence of the data. It has one
more feature that it works on the elimination of unused information and as we know the
text data always consists a lot of unused information which can be eliminated by the
LSTM so that the calculation timing and cost can be reduced,
So basically the feature of elimination of unused information and memorizing the
sequence of the information makes the LSTM a powerful tool for performing text
classification or other text-based tasks.

IMPLEMENTATIO
N
`

SYSTEM ARCHITECTURE
A. Download Share price Data
There are many sources for getting the share price data. Python has a library to get the
data from the internet. Pandas_datareader has 4 parameters that will indicate how and
where to get the data to the API. The first parameter is it specify the stock name of a
company. next, it has a data source parameter that indicates that from where the API
should you want to collect the data and the next parameters are starting date and ending
date. There are many API to collect the data from the internet every API has some
limitation of requesting data. so, by using Pandas_datareader I collected the 5 companies.
the data files are in CSV format that can be used easily in Jupiter notebook. data set has
Date, High, Low, Open, Close, Volume, Adj Close .
Collecting The Data Using Pandas Data Reader

B. Data Set : Air Quality India(2015-2020)
Dataset Description: Dataset appears to be a collection of air quality measurements from

different monitoring stations on various dates. Each row represents the daily data record
for a specific monitoring station, while the columns contain the following types of
information:
● StationId: Identifier for the monitoring station.\
● Date: The date of the recorded measurements. The dates are obfuscated in the
screenshots, but they would typically be in a standard date format.
● Pollutants: Columns labeled PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3,
Benzene, Toluene, and Xylene represent the daily average concentrations of these air
pollutants measured in micrograms per cubic meter (µg/m³) or in parts per million (ppm),
depending on the typical concentration levels of the pollutants.
● Air Quality Index (AQI): The AQI column shows the calculated Air Quality Index,
which is a dimensionless number used to communicate the level of pollution and its
health implications.
● AQI Bucket: This column categorizes the AQI into different buckets such as 'Good',
'Satisfactory', 'Moderate', 'Poor', 'Very Poor', and 'Severe', representing the overall air
quality on that day.
● Total no. of Columns : 16 , Rows : 29532 This dataset could be used for various
analytical purposes, such as:
• Tracking pollution trends over time at different locations.
• Assessing the effectiveness of air quality management policies.
• Correlating air pollution levels with public health data to understand the impact on
respiratory or cardiovascular diseases.
• Developing predictive models for air quality forecasting.
Dataset Of APPLE
C. Data Preprocessing
Data set is a collection of feathers and N number of rows there are many values and
values are in different formats. In a dataset, they may be duplicate values or null values
that may lead to some loss inaccuracy and there may be dependent. Data have been
collected from different sources so there use a different type of format to notate a single
value like gender someone represents M/F or Male/Female. The machine can understand
only 0 and 1 so an image will be in 3-dimension data should be reduced to a 2-dimension

format like data show to free from noisy data, null values, an incorrect format. Data
cleaning can be performed by panda’s tabular data and OpenCV for images.
D. Feather Scaling
The dataset consists of numerical value which are constantly increasing day by day in the
data set the closing value gradually increasing from 1 to 2000. As in the neural network
target variable with a large spread of values, in turn, may result in large error gradient
values causing weight values to change dramatically, making the learning process
unstable. So, we scale the dataset into a small scale of (0,1) or (0,2) that make simple and
reduce the loss. By scaling it easy for the model to store similar data for further reference
in the further. We scale all the data to 0 to 1 and then try the model and pass the scaled
value to get the output and reconvert the scale to an original scale value.
Closed Values of the APPLE after Scaling

A. LSTM Training and Evaluation

Long Short Term Memory is an Artificial neural network that has a feedback connection
that makes the network work efficiently. LSTM has the advantage of handling sequential
time series data not only a single valid data. It has three cells that regulate the
functioning, input, and output of the network. This type of network is suitable for
classifying and predicting the time series problems which we can make things better
untestable to a network. There are several architectures for LSTM but most commonly
used are three cell ones that make the network work efficiently and predict the values
accurately. An input gate, an output gate and a forget gate. The Input Gate is the point
where we pass the data into the network that we want to teach the model. Here there is
another input from the output gate that is also called as a feedback signal that is the main
one that was model learn what mistake it hades done and learn how to overcome from
that loss. In these input gate, we use the sigmoid function and tanh function to combine
the input and hidden value and get the output in the range -1 to 1. The Output Gate is the
last gate in the circuit we it gives the output value and it has an important duty that has to
decide what data should be hidden for the next time. First, the previse hidden data and
input passed to sigmoid function then pass the input value and sigmoid output multiply at
tanh function that decides what should be stored for the next step. The Forget Gate is the
main gate that stores the previous information for farther references. The input from the
previous data and the current data to the sigmoid function that gives the value 0 or 1. If
we get 1 the forget gate store the data otherwise it will forget the present data.

LSTM Architecture Diagram
PROPOSED WORK
By using LSTM, we try to implement the predicter model for the top 10 companies. For
that, we have done some data modification that made the algorithm more efficient to find
the accurate value. First, we get the data from the yahoo server by using the pandas web
Reader that uses four parameters name of the company, server name, starting date and
ending date. That data has been scaled because we have ranged value from 0 to different
so for making the model more efficient, we scale the input data from 0 to 1. For that, we
have a package sklearn that has a scaling method to convert the input data to a range from
0 to 1. The main point is to create an x_train and y_train after the scaling we create
training data we take 100 days of previous data for 'y' day value. So, we create train data
for every 'y' day that the previous 100 days values are responsible for that day. Like that
we create a training data set. We try to create a network model with two LSTM layers
and 2 dense layers that make the model try to find the closing value for the next day.

Sequential methods are used because of the advantages of finding patterns of similarities.
it helps in finding the next event for that particular X data. Model Compile defines the
loss function, the optimizer, and the metrics. That's all. It has nothing to do with the
weights and you can compile a model as many times as you want without causing any
problem to pretrained weights. LSTM. The compiler has optimizer Adam that makes the
network learn the value. We use the loss as have a mean squared error than try to reduce
the loss that has accrued at the time of learning.
LSTM Model
Then we try to fix the x train data and y train data to the model fit to make the model
learn from the historic data. The epochs are used to make the model learn the same data
repeated times. At epochs 2 is apply for that we got the nearest value for the next day
closing value. Batch size is 1 because each value is individual and it is independent that
makes the prediction accuracy. Then we create x train and y train to predict the accuracy
of the model and predict the values for x train and get the y to predict value. We compare
both y train and y predict values. The RMSE is 5.77 that gives the assurance of that we
can find the max closest value for the next day closing value .

Flow of project


TESTING
REPORT
TESTING REPORTS
1. Unit Testing
Unit testing focuses each module individually, ensuring that if function properly as unit.
In this testing we have tested of our software to ensure maximum error detection. It helps
to remove bugs from sub modules & prevent arrival of huge bugs after words.
 Functional Partitioning:
1. Fetching data from www.Tiingo.com Website
2. Inserting Dataset of particular stock
3. Pre-processing of data
4. Train , Test of data
5. Final Plotting of graphs

Functional Description:
1) Name: Fetching data from Tiingo Website

Input:
Linking of API key to Stock market website to Project
Output:
Successfully Linked and Data Fetching Started
Specification:
This module provides the fetching of particular stock data from stock market website
Using API key linking .
2) Name: Inserting Dataset of particular stock

Input:
Dataset should be in CSV format extension and should contain –
Symbol , date , close , high , low , open volume , adjClos , adjHigh , adjLow ,
adj Open , adjVolume , divCash , splitFactor
Output:
Requirements Satisfied in .csv file and dataset format is correct
Specification:
This module provides the arraignment of correct data format in dataset
3) Name: Pre-processing of data

Input:
Process of transforming raw data into an understandable format
Output:
Raw data converted into understandable format
Specification:
This module provides the raw data which we had fetched from stock market website
and .csv file it is successfully converted into understandable format .
4) Name: Train , Test of data
Input:
To recognize patterns and validation of data
Output:
Pattern recognized successfully and data validated
Specification:
Training data is the initial dataset you use to teach a machine learning application to
recognize patterns or perform to your criteria, while testing or validation data is used to
evaluate your model's accuracy .
5) Name: Final plotting of graphs

Input:
Recognized patterns and validated data should be plotted on graph
Output:
Data plotted successfully
Specification:
The Data which we had Train Test and Pre-processed should be plotted on different
graphs .

2 . Integration Testing
Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing. The
objective is to take unit tested components and build a program structure that has been
dictated by the design.
We prefer the Top-down integration testing as a testing approach for our project. The
Top-down integration testing is an incremental approach to construction of program
structure. Modules are integrated by moving downward through the control hierarchy,
beginning with the main control module
Modules subordinate to the main module are incorporated into the structure in depth-first
order.
Depth-first integration would integrate all components on a major control path of a
structure. Selection of a major path is somewhat arbitrary and depends on application-
specific characteristics.
3. Validation testing
The purpose of validation testing is to ensure that all expectations of the customer have
been satisfied. The testing is done by the project group members by inspecting the
requirements and validating each requirement.
We prefer the alpha testing technique for the validation purpose. In which case the a
group of students is called on to test Linking , fetching , pre-processing and plotting . The
group members will present at the place and observing the working and collecting the
bugs which have been occurred. The changes are made according to the requirements and
testing is done again to gather more errors if present .

4. System Testing
In the system testing system undergoes various exercises to fulfill the system
requirements.
These tests include:
a. Linking Tests : These are designed to ensure that the project is successfully linked to
the stock website .
b. Performance Tests: The tests are conducted to check the performance of the system.

PERFORMANCE
ANALYSIS

PERFORMANCE ANALYSIS
 Figure below shows the details of performance analysis .

INSTALLATION
GUIDE

INSTALLATION GUIDE
1. Operating System: Microsoft windows or mac

2. .Net framework 4.0 and above
3. Jupyter Notebook software
4. Python Should be installed on system
INSTALL PROCEDURE FOR CLIENT
 Install all dependencies required for project

 Insert API key of any stock exchange in project
 Give .csv file as input
 Get predictions of stock


RESULT
 Functional Partitioning:
1. Apple Stock data plotted from 2017 till 2022
2. Plotting of Training , Test , Dataset

PREDECTION OF FUTURE 30 DAYS PRICES

Representations of lines in graphs –

• Blue line – Dataset
• Green Line – Test Data
• Orange Line – Training Dataset
• Yellow Line – Prediction of future 30 days of stock prices

APPLICATION
APPLICATIONS
This project can be modified and used in following:
Random Forest Regressor: The code employs a Random Forest Regressor, which is a
machine learning model, to perform feature importance analysis.
Model Training: The Random Forest Regressor is trained on the dataset, with
'X_encoded' as the input and 'AQI' as the target variable..
Environmental Monitoring: Ridge Regression models can be trained on historical air

quality data along with various environmental and meteorological variables such as
temperature, humidity, wind speed, and geographical features. These models can then
predict future air quality levels based on these factors. Environmental agencies and
organizations can use these predictions to monitor and manage air pollution levels in
different regions.
Health Alerts and Advisories: Predictive models can help in issuing health alerts and
advisories to the public when air quality is expected to deteriorate. By forecasting high
pollution days in advance, individuals with respiratory conditions or other health
concerns can take precautionary measures such as staying indoors or using masks to
reduce exposure to harmful pollutants.

Urban Planning and Policy Making: Governments and urban planners can use air
quality prediction models to inform policy decisions related to transportation, industrial
zoning, and urban development. By understanding the factors contributing to poor air
quality, policymakers can implement measures to mitigate pollution and improve overall
air quality in cities.
Traffic Management: Traffic congestion is a significant contributor to air pollution in
urban areas. By incorporating traffic data into Ridge Regression models, transportation
authorities can predict how changes in traffic flow patterns, such as implementing
congestion pricing or promoting public transportation, may impact air quality.
Industrial Emissions Control: Industries can use predictive models to optimize their
operations and reduce missions. By analyzing the relationship between production
activities, weather conditions, and air quality, companies can adjust their processes to
minimize pollution and comply with environmental regulations.
Community Engagement and Education: Air quality prediction models can be used to
raise awareness among the general public about the sources and health effects of air
pollution. By providing easily accessible forecasts through mobile apps or websites,
individuals can make informed decisions to protect their health and the environment .

BIBLIOGRAPHY
BIBLIOGRAPHY
TEXT BOOKS REFERED
 "Air Pollution Control Engineering" by Noel de Nevers.

 “Fundamentals of Air Pollution" by Daniel Vallero
WEBSITES
 https://en.wikipedia.org/wiki/Long_short-term_memorywww.beyondlogic.org
 https://colah.github.io/posts/2015-08-Understanding-LSTMs/
www.howstuffworks.com
 https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-
networks-experts/

CCP Final Stock Market

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CCP Final Stock Market

Uploaded by

Copyright:

Available Formats

Stock Price Prediction Using LSTM On Foreign Share Market

Department of Computer Engineering Page 1

Department of Computer Engineering Page 2

Department of Computer Engineering Page 3

An Introduction to U.S. Stock Market Indexes

Department of Computer Engineering Page 4

Department of Computer Engineering Page 5

Department of Computer Engineering Page 6

Software Engineering is an establishment of sound engineering principles in

Department of Computer Engineering Page 7

Department of Computer Engineering Page 8

Department of Computer Engineering Page 9

● matplotlib: Matplotlib is a popular library for creating static, animated, or interactive

● seaborn: Seaborn is a data visualization library built on top of matplotlib. It provides a

III.Jupyter Notebook : Jupyter is an open-source application that allows interactive and

Department of Computer Engineering Page 10

Department of Computer Engineering Page 11

Department of Computer Engineering Page 12

 Forget Gate(f): It determines to what extent to forget the previous data.

Single Recurrent Unit containing different

Working of an LSTM recurrent unit :

Department of Computer Engineering Page 14

Network Using Gradient Descent

Why do we use LSTM with text data?

Department of Computer Engineering Page 15

Department of Computer Engineering Page 16

Department of Computer Engineering Page 17

A. Download Share price Data

Collecting The Data Using Pandas Data Reader

Department of Computer Engineering Page 18

B. Data Set : Air Quality India(2015-2020)

Dataset Description: Dataset appears to be a collection of air quality measurements from

Department of Computer Engineering Page 19

Closed Values of the APPLE after Scaling

Department of Computer Engineering Page 20

A. LSTM Training and Evaluation

Department of Computer Engineering Page 21

LSTM Architecture Diagram

Department of Computer Engineering Page 22

Department of Computer Engineering Page 23

Department of Computer Engineering Page 24

Department of Computer Engineering Page 25

2. Inserting Dataset of particular stock

4. Train , Test of data

5. Final Plotting of graphs

Department of Computer Engineering Page 26

1) Name: Fetching data from Tiingo Website

2) Name: Inserting Dataset of particular stock

3) Name: Pre-processing of data

5) Name: Final plotting of graphs

Department of Computer Engineering Page 28

Department of Computer Engineering Page 29

Department of Computer Engineering Page 30

Department of Computer Engineering Page 31

 Figure below shows the details of performance analysis .

Department of Computer Engineering Page 32

Department of Computer Engineering Page 33

1. Operating System: Microsoft windows or mac

INSTALL PROCEDURE FOR CLIENT

 Install all dependencies required for project

Department of Computer Engineering Page 34

Department of Computer Engineering Page 35

1. Apple Stock data plotted from 2017 till 2022