Research Paper TARP Final Upload


Prediction using Machine Learning techniques

(LSTM MODEL)
Rohith Anil Kumar
Computer Science and Engineering specializing in Information Security
Vellore Institute of Technology, Vellore

Sai Anirudh Guduru
Computer Science and Engineering
Vellore Institute of Technology, Vellore

Thodunupuru Sahithi
Computer Science and Engineering
Vellore Institute of Technology, Vellore

Abstract—Predicting time series data is difficult. Owing to environmental factors such as global warming, the temperature, pressure and various other components of the planet are found to increase every year. Based on certain features, we can predict the temperature at a future time, so that the necessary changes and precautions can be taken to reduce the temperature and other components of the Earth. To this end, we create a time series forecasting machine learning model that allows us to predict the temperature and other components of the Earth in the future. The traditional recurrent neural network (RNN) tries to remember all information, whether it is useful or not, which can lead to problems such as vanishing and exploding gradients.

Keywords—recurrent neural networks, long short-term memory model (LSTM model), root mean square error, preprocessing, dataset, training

I. INTRODUCTION

A time series is a set of repeated measurements of the same phenomenon taken sequentially over time. Time is usually the independent variable in a time series. The temperature data represents temperature anomalies. We will not be working with absolute temperature data because, in climate change studies, anomalies are more important than absolute temperature. From the given dataset, the features that correlate well with the predicted feature are used by the machine learning prediction model. In this way, features such as temperature or pressure can be predicted beforehand based on a test dataset, and necessary measures can be taken by organisations like the UN to reduce the variation. Common prediction models include clustering models, classification models, neural networks (convolutional and recurrent) and time series models. Here, a variant of the generic recurrent neural network, the LSTM RNN model, is used as the prediction model. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.

II. PROBLEM STATEMENT

The main idea is to predict the values of an important feature, such as temperature, for a forecast dataset. For this, a good machine learning model with a low RMSE value must be chosen. The existing common prediction models, such as recurrent neural networks and convolutional neural networks, have certain shortcomings. LSTM networks are an extension of recurrent neural networks, mainly introduced to handle situations where RNNs fail. RNNs fail to store information for a longer period of time. At times, a reference to information stored quite a long time ago is required to predict the current output, but plain RNNs struggle to handle such "long-term dependencies". Another problem of RNNs is that they have no control over which part of the context needs to be carried forward and how much of the past needs to be forgotten. One of the most prominent issues with RNNs is exploding and vanishing gradients, which occur while training the network through backpropagation. The problem with CNNs is that the accuracy of prediction is found to be low for smaller datasets. During model fitting, the CNN model needed more epochs, and hence its performance was reduced for smaller datasets.

III. PROPOSED METHODOLOGY

LSTM is designed in such a manner that the vanishing gradient problem is almost completely removed, while the training model is left unaltered. LSTMs bridge long time lags in certain problems and also handle noise, distributed representations and continuous values. LSTMs take care of the RNN's long-term dependency problem. LSTMs have feedback connections, which make them different from more traditional feedforward neural networks. This property enables LSTMs to process entire sequences of time series data without treating each point in the sequence independently, instead retaining useful information about previous data in the sequence to help with the processing of new data points. Also, the performance and accuracy of LSTMs are good for both larger and smaller datasets, which overcomes the drawback of using the CNN model for smaller datasets.
The LSTM model is chosen to predict the temperature. The behaviour of the preceding X years helps us forecast what the current or future temperature would be. LSTMs are better at identifying complex patterns in data by remembering what is useful and discarding what is not.
The programming language used is Python. Some of the libraries used are:
Pandas: for data manipulation and analysis
Numpy: for working with arrays
Seaborn: data visualization library based on matplotlib
Matplotlib: plotting library for the Python programming language
Plotly: scientific graphing library
Tensorflow: used to create machine learning models
Keras: used for creating deep models which can be productized on smartphones. It is also used for distributed training of deep learning models. Keras is the high-level API of Tensorflow 2.

DATASET USED:
https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip
Number of records: 420551
Number of features: 15

First of all, the necessary libraries are imported. The dataset is then loaded; it is a CSV file and is read using pandas. The chosen dataset has 420551 rows and 15 columns. Pre-processing of this dataset then takes place. Starting from the fifth row, every sixth row is taken to create a smaller dataset, which is stored as another data frame. The first column, which is the index column, is changed to the date-time column. From plotting the temperature on a graph, we see that the variations in temperature follow a particular pattern. This point is kept in mind while training the machine learning model.

For deep learning models, the data should always be in the form of an input matrix. Each row is taken as the input and has its corresponding label. The data is then reshaped to the form required for training the machine learning model. The first 60000 rows are used for training, and the rest of the rows are used for testing. We plot a graph to see the training data and testing data. The model is then created by adding the necessary layers. Initially we create a model with an LSTM layer. The LSTM layers contain a Dropout to prevent overfitting in the model. The output layer consists of a dense layer with 1 neuron and ReLU activation. We compile this model with mean squared error as the loss and Adam as the optimizer. After compiling, model fitting is done. The feature used for training the machine learning model is temperature, and the feature that is predicted is also temperature. The test dataset is then given to the machine learning model to predict the values, and the result is represented in a tabular format. The predicted values are then compared with the actual values and plotted graphically for better understanding. The RMSE value for the values predicted by the machine learning model is calculated and displayed along with the graph. Here, the LSTM model is the univariate LSTM model.

The same dataset is pre-processed and a model is compiled with a CNN layer. The output layer consists of a dense layer with 1 neuron and ReLU activation. We compile this model with mean squared error as the loss and Adam as the optimizer. After model fitting is done, test data is given to the machine learning model to predict the values, and the result is represented in a tabular format. The predicted values are then compared with the actual values and plotted graphically for better understanding. The RMSE value for the values predicted by the machine learning model is calculated and displayed along with the graph.

The same dataset is pre-processed and a model is compiled with an LSTM layer. The LSTM layers contain a Dropout to prevent overfitting in the model. The output layer consists of a dense layer with 1 neuron and ReLU activation. We compile this model with mean squared error as the loss and Adam as the optimizer. After compiling, model fitting is done. The features used for training the machine learning model are temperature and the timestamp, which is processed into four other features: Day sin, Day cos, Year sin and Year cos. The periodic change in temperature can be noticed from the graph of date-time against temperature. This is why temperature along with date-time is considered for training. The test data values are fed into the machine learning model to predict the values, and the result is represented in a tabular format. The predicted values are then compared with the actual values and plotted graphically for better understanding. The RMSE value for the values predicted by the machine learning model is calculated and displayed along with the graph.

The same process is repeated for another LSTM machine learning model where the features used for training are temperature and pressure. The RMSE value for the values predicted by the machine learning model is calculated and displayed along with the graph.

To test the performance, the machine learning models mentioned (univariate LSTM and univariate CNN) have also been tested with smaller datasets. The layers added to the model here are only the LSTM layer with the ReLU activation function and a Dense layer. We compile this model with mean squared error as the loss and Adam as the optimizer. After model fitting is done, test data is given to the machine learning model to predict the values, and the result is represented in a tabular format. The predicted values are then compared with the actual values and plotted graphically for better understanding. The RMSE value for the values predicted by the machine learning model is calculated and displayed along with the graph.
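
The data preparation steps described in this section (taking every sixth row, deriving the Day sin, Day cos, Year sin and Year cos features, building an input matrix with labels, and splitting by row position) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: the helper names `subsample`, `cyclical_features` and `make_windows`, the window length of 5, the zero-based start index of 5, the sine/cosine encoding of the timestamp, and the synthetic series are all hypothetical, and the split index is scaled down from the 60000 rows used in the paper.

```python
import math

DAY = 24 * 60 * 60        # seconds in a day
YEAR = 365.2425 * DAY     # seconds in a mean calendar year

def subsample(rows, start=5, step=6):
    """Keep every `step`-th row, beginning at zero-based index `start`."""
    return rows[start::step]

def cyclical_features(timestamp_s):
    """Encode a timestamp (seconds) as (Day sin, Day cos, Year sin, Year cos)."""
    day = 2 * math.pi * timestamp_s / DAY
    year = 2 * math.pi * timestamp_s / YEAR
    return math.sin(day), math.cos(day), math.sin(year), math.cos(year)

def make_windows(series, window=5):
    """Pair each run of `window` consecutive values with the value after it."""
    inputs = [series[i:i + window] for i in range(len(series) - window)]
    labels = [series[i + window] for i in range(len(series) - window)]
    return inputs, labels

# Synthetic stand-in for the temperature column (hypothetical data).
raw = [10.0 + 5.0 * math.sin(i / 50.0) for i in range(600)]

reduced = subsample(raw)                    # 600 rows -> 100 rows
X, y = make_windows(reduced, window=5)      # 95 (input, label) pairs
n_train = 80                                # the paper uses the first 60000 rows
train_X, train_y = X[:n_train], y[:n_train]
test_X, test_y = X[n_train:], y[n_train:]

print(len(reduced), len(X), len(train_X), len(test_X))
print(cyclical_features(DAY / 4))           # a quarter of the way through a day
```

Encoding the timestamp as sine and cosine pairs keeps the end of one day (or year) numerically close to the start of the next, which a raw timestamp would not.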

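The model described in the methodology (an LSTM layer with Dropout, a one-neuron Dense output with ReLU activation, mean squared error loss and the Adam optimizer) might be built in Keras roughly as follows. The layer width of 64, the dropout rate of 0.2, the window length of 5 and the function name `build_univariate_lstm` are assumptions made for this sketch; the text does not state them.

```python
import numpy as np
import tensorflow as tf

def build_univariate_lstm(window=5, n_features=1, units=64, dropout=0.2):
    """LSTM -> Dropout -> Dense(1, ReLU), compiled with MSE loss and Adam.

    `units` and `dropout` are illustrative values, not taken from the paper.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, n_features)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dropout(dropout),
        # ReLU output as described in the text; note that it cannot
        # produce negative predictions.
        tf.keras.layers.Dense(1, activation="relu"),
    ])
    model.compile(
        loss="mse",
        optimizer="adam",
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    return model

model = build_univariate_lstm()
# Hypothetical batch: 8 windows of 5 time steps with 1 feature each.
dummy = np.zeros((8, 5, 1), dtype="float32")
print(model.predict(dummy, verbose=0).shape)
```

Under the same assumptions, the multivariate variants would only change `n_features`, for example 5 for temperature plus the four date-time features, or 2 for temperature plus pressure.
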
IV. RESULTS
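The RMSE values reported in this section are taken here to follow the usual definition, the square root of the mean squared difference between actual and predicted values. A minimal sketch, with a hypothetical function name `rmse` and made-up sample values:

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(rmse(actual, predicted))
```

A lower RMSE means the predictions lie closer to the actual values on average, which is how the models are compared below.
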
• When the univariate LSTM model is used to predict the temperature with temperature as the only feature, the RMSE value is found to be 0.625. In the graph, the actual values are plotted in orange and the values predicted by the model are plotted in blue. The X axis shows the index number of the temperatures and the Y axis shows the temperature values. The RMSE value is calculated based on 5086 rows and 2 columns. The graph shows the actual and predicted temperature values for the first 100 instances.

• When the multivariate LSTM model is used to predict the temperature with temperature as one feature and date-time divided into four other features, the RMSE value is found to be 0.504. In the graph, the actual values are plotted in orange and the values predicted by the model are plotted in blue. The X axis shows the index number of the temperatures and the Y axis shows the temperature values. The RMSE value is calculated based on 5085 rows and 2 columns. The graph shows the actual and predicted temperature values for the first 100 instances.

• When the CNN model is used to predict the temperature with temperature as the only feature, the RMSE value is found to be 0.6384. In the graph, the actual values are plotted in orange and the values predicted by the model are plotted in blue. The X axis shows the index number of the temperatures and the Y axis shows the temperature values. The RMSE value is calculated based on 5086 rows and 2 columns. The graph shows the actual and predicted temperature values for the first 100 instances.

• When the multivariate LSTM model is used to predict the temperature with temperature as one feature and pressure as the other feature, the RMSE value is found to be 0.584. In the graph, the actual values are plotted in orange and the values predicted by the model are plotted in blue. The X axis shows the index number of the temperatures and the Y axis shows the temperature values. The RMSE value is calculated based on 5084 rows and 2 columns. The graph shows the actual and predicted temperature values for the first 100 instances.

• When very small datasets are used to train the LSTM recurrent neural network model, the accuracy of the model is very good when model fitting is done for 200 epochs. The CNN machine learning model can reach comparable accuracy on the same dataset when model fitting is done for 1000 epochs. But even in this case the accuracy of the LSTM model is found to be higher, as its predicted values are closer to the actual values.

[Figure panels: LSTM MODEL and CNN MODEL]

V. CONCLUSION

From the RMSE values of the univariate CNN and LSTM models, LSTM is found to have the lower RMSE value and hence better accuracy. The accuracy of LSTM models is good for both large and small datasets, while the accuracy of CNN models is low for smaller datasets, and the CNN model requires more training to achieve the accuracy that an LSTM model reaches with less training. The LSTM is said to solve the vanishing and exploding gradient problems of recurrent neural networks, hence achieving good accuracy as a prediction model.
Among the multivariate LSTM models, the model trained on temperature and the four features derived from date-time has a lower RMSE value than the model trained on temperature and pressure. Hence, the four features derived from date-time are said to have a better correlation with the label, which is temperature.
Also, from the RMSE values of the LSTM models (univariate and multivariate), we can see that the multivariate models give lower RMSE values than the univariate LSTM model. Hence, their predicted values are more accurate.
From the above results, it can be concluded that the LSTM model is a better predictive model than the CNN model and the generic RNN model, as it takes care of the dependency problem in RNNs while doing predictive analysis and also of the vanishing/exploding gradient problem.

VI. REFERENCES

1. Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition".
2. Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (May 2009). "A Novel Connectionist System for Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence.
3. Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). "Predictive Business Process Monitoring with LSTM neural networks". Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). Lecture Notes in Computer Science. Vol.
4. Thireou, T.; Reczko, M. (2007). "Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins". IEEE/ACM Transactions on Computational Biology and Bioinformatics.
5. Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.Y.; Liu, J. (2017). "LSTM network: A deep learning approach for Short-term traffic forecast". IET Intelligent Transport Systems.
6. Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber (2015). "LSTM: A Search Space Odyssey". IEEE Transactions on Neural Networks and Learning Systems.
7. A. Vohra, N. Pandey and S. K. Khatri, "Decision Making Support System for Prediction of Prices in Agricultural Commodity," 2019 Amity International Conference on Artificial Intelligence (AICAI), 2019, pp. 345-348, doi:10.1109/AICAI.2019.8701273.
8. T. M. N. P. Karunarathna et al., "Consumer and Farmer Centric Subscription Based Organic Vegetable/Fruit Delivery System," 2021 3rd International Conference on Advancements in Computing (ICAC), 2021, pp. 109-115, doi:10.1109/ICAC54203.2021.9671159.
9. D. K, R. M, S. V, P. N and I. A. Jayaraj, "Meta-Learning Based Adaptive Crop Price prediction for Agriculture Application," 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2021, pp. 396-402, doi:10.1109/ICECA52323.2021.9675891.
10. G. S. Sajja, S. S. Jha, H. Mhamdi, M. Naved, S. Ray and K. Phasinam, "An Investigation on Crop Yield Prediction Using Machine Learning," 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 2021, pp. 916-921, doi:10.1109/ICIRCA51532.2021.9544815.
11. D. Elavarasan and P. M. D. Vincent, "Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian Applications," in IEEE Access, vol. 8, pp. 86886-86901, 2020, doi:10.1109/ACCESS.2020.2992480.
