
IoT Framework for Real Time Weather Monitoring using Machine Learning Techniques


2023 Second International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT) | 979-8-3503-9763-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICEEICT56924.2023.10156901

Felicia Sharon*, Asnath Victy Phamila Y, Geetha S

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India

*in.felicia.sharon@gmail.com

Abstract: Weather forecasts and weather warnings are used to protect human lives and property. Temperature, outlook, humidity, and wind forecasts are critical for farmers as well as for traders in product markets. Because weather data analytics demands high precision, high-performance computing is required to handle the massive amount of data. The significant variability of the climatic observations obtained in a single day makes weather forecasting difficult. The objective of this project is to forecast the weather parameters for the next 24 hours using the Auto ARIMA model and to use machine learning techniques to reliably predict the weather. Machine learning classifiers predict the weather conditions for the day based on current data. A cost-effective IoT framework is designed to read real-time input using sensors integrated with the Arduino platform. Given average temperature, humidity, pressure, and other variables as input, decision trees and the Random Forest algorithm are used to predict events such as fog, rain, dry, windy, clear, breezy, and thunder. The algorithms are evaluated on performance metrics that include precision, recall, F-score, and accuracy.

Keywords: Decision Tree; Random Forest; ARIMA; Weather Forecasting; Internet of Things; Arduino

I. INTRODUCTION
Weather forecasting uses science and technology to predict the state of the atmosphere at a given location. Several constraints hinder better forecasting, making it difficult to predict the weather accurately. Weather forecasting plays an important role in meteorology, and making an exact prediction is one of the most difficult challenges facing meteorologists all around the world.

There is an ever-increasing demand for more precise weather forecasts in modern society. Farmers can use early rainfall data to reduce damage. Weather changes, in combination with the resulting socio-economic pressures, may affect food production, quantity, and price in any community around the world. As a result, an extremely accurate weather forecasting device is needed, and we aim to build a high-performance device that can predict natural disasters and so help people remain safe. Because weather warnings are used to protect lives and property, they are extremely important. Temperature, outlook, humidity, and wind forecasts are critical for farmers, as well as traders in product markets. Utility providers use temperature projections to gauge demand over the next few days.

A recent development in the literature is the use of machine learning for weather forecasting. Aditya et al. introduced a weather forecasting model that uses deep belief networks to take into account the combined influence of important weather factors and generate predictions [1]. Radhika et al. [2] applied support vector machines to the weather prediction problem. Montori et al. used crowdsensing to study environmental phenomena by having participants share data from their smartphones [3]. They introduced the Sen-Square architecture, which manages data from IoT sources and crowdsensing platforms and presents the data collectively to subscribers. These data came from the environmental monitoring of smart cities. Decision trees can be used to forecast dependent variables such as fog and rain in order to fully utilise the potential of enormous amounts of data. Software that uses decision trees can give a machine artificial intelligence. In the information era, it is very cost-effective to equip a machine with artificial intelligence software and sensors so that it can act intelligently [4]. Holmstrom et al. proposed a technique to forecast the maximum and minimum temperature for the next seven days, given the data of the past two days [5]. Predictive models that identify patterns in meteorological conditions using data mining have been explored in various research articles [6-9]. Historical data patterns are used to predict the upcoming weather conditions.

In this paper, the weather parameters for the next 24 hours are forecast using the Auto ARIMA model, and machine learning techniques are used to reliably predict the weather. Machine learning classifiers such as Decision Tree and Random Forest predict the weather conditions for the day. Given average temperature, humidity, pressure, and other variables as input, decision trees and the Random Forest algorithm are used to predict events such as fog, rain, dry, windy, clear, breezy, and thunder. Sensors are integrated with the Arduino platform to read inputs such as temperature, humidity, and pressure.

Authorized licensed use limited to: Indian Institute of Technology - Jodhpur. Downloaded on September 30,2023 at 03:29:44 UTC from IEEE Xplore. Restrictions apply.
II. MATERIALS AND METHODS
Because weather data analytics demands high precision, high-performance computing is required to handle the massive amount of data. The significant variability of the climatic observations obtained in a single day makes weather forecasting difficult. The proposed system forecasts the weather parameters (temperature, pressure, humidity, etc.) for the next 24 hours using an ARIMA model. Decision Tree and Random Forest classifiers then take average temperature, humidity, pressure, and other variables as input and predict events such as fog, rain, dry, windy, clear, breezy, and thunder for the forecast values.
Further, sensors are integrated with the Arduino platform to read inputs such as temperature, humidity, and pressure. The two algorithms are also compared using performance metrics. The complete workflow is depicted in Figure 1.

A. Decision tree
The decision tree algorithm belongs to the family of supervised learning algorithms and, unlike many other supervised learning algorithms, it can be used for both classification and regression problems [10]. A decision tree is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label. Decision trees classify instances by sorting them down the tree from the root to a leaf node, which gives the instance's class. To classify an instance, one tests the attribute specified by the root node and then moves down the branch that corresponds to the attribute's value. The same procedure is then repeated for the subtree rooted at the new node. The aim of using a decision tree is to build a training model that can predict the class or value of the target variable.

B. Random Forest algorithm
Random forests, like decision trees, are non-parametric models used for both classification and regression [10]. The random forest method is built entirely on top of decision trees. It is an ensemble technique that combines many decision trees through bootstrap aggregation, commonly known as bagging. The core idea is to determine the final output from many decision trees rather than relying on a single tree, so the base learners of a random forest are decision trees. Rows and features are sampled at random from the dataset to create a sample dataset for each tree; this step is known as bootstrapping. The word "random" denotes that a random selection of data is used to construct each decision tree. Combining many learning models in this way is intended to improve performance.

C. Autoregressive Integrated Moving Average (ARIMA)
An autoregressive integrated moving average (ARIMA) model is a statistical analysis model that uses time series data either to better understand the data set or to forecast future trends [11].
A statistical model is said to be autoregressive if it forecasts future values from past values. For instance, an ARIMA model might try to anticipate a company's profitability based on previous periods or predict a stock's future price based on its historical performance.

An ARIMA model is a form of regression analysis in which the variable of interest is regressed on its own lagged values (the autoregressive part) and on lagged forecast errors (the moving average part), after the series has been differenced to make it stationary (the integrated part).

D. Tools & Techniques
We used the following analytical techniques for analyzing the data:

• Summary statistics for each variable
• Graphs and box plots to represent the variables visually
Tools used:
1. R Studio: An IDE for the R language.
2. Arduino UNO: An open-source microcontroller board based on the Microchip ATmega328P microcontroller and developed by Arduino. It can be integrated into a variety of electronic projects.
3. Arduino Integrated Development Environment (IDE): A cross-platform application written using functions from C and C++. It is used to write programs and upload them to Arduino-compatible boards.
4. Sensors (DHT22, BMP180): A sensor is a device that detects and reacts to some form of physical input.

Fig. 2. Summary of Dataset
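
The root-to-leaf classification procedure described in Section II-A can be illustrated with a short sketch. The authors worked in R and do not show their code, so this is a Python illustration only; the tree below is hand-written, and its attribute names (`humidity`, `wind_speed`) and thresholds are hypothetical values chosen for the example, not a tree learned from the weather dataset.

```python
# Internal node: {"attribute", "threshold", "left", "right"}; leaf: {"label"}.
# All attribute names and thresholds here are hypothetical, for illustration only.
tree = {
    "attribute": "humidity", "threshold": 0.85,
    "left": {  # taken when humidity <= 0.85
        "attribute": "wind_speed", "threshold": 25.0,
        "left": {"label": "Clear"},    # wind_speed <= 25.0
        "right": {"label": "Breezy"},  # wind_speed > 25.0
    },
    "right": {"label": "Foggy"},       # taken when humidity > 0.85
}


def classify(node, instance):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while "label" not in node:
        value = instance[node["attribute"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["label"]


prediction = classify(tree, {"humidity": 0.91, "wind_speed": 8.0})
```

Each loop iteration performs the test at the current node and descends into the matching subtree, which is the same procedure applied again to a smaller tree.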

III. IMPLEMENTATION
A. Data Source
The data for the project was obtained from Kaggle. The dataset has twelve features and 96,464 records. The features include Time, Temperature, Humidity, Wind Speed, Visibility, Pressure, and Summary, as shown in Figure 2.

B. Data Quality
1. Formatted date was not considered for building the supervised ML models, as classification might not yield a good result on a feature made up of unique dates.
2. Summary and precip type were also removed as they were insignificant.
3. Loud cover was also removed because it contains only zeros.
4. Daily summary was renamed to Summary, and classes with very few instances (for example, a class with only 1 instance) were removed. Some of the classes were merged into a single class, and some instances were also dropped.

Fig. 3a. Density Plot
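
Two of the cleaning steps above, removing a column that holds a single constant value and removing classes with too few instances, can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R code: the rows are made-up sample data, and `min_count` is a hypothetical parameter name for the minimum number of instances a class must have.

```python
from collections import Counter


def drop_constant_columns(rows):
    """Remove every column whose value is identical in all rows."""
    if not rows:
        return rows
    constant = {key for key in rows[0] if len({row[key] for row in rows}) == 1}
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


def drop_rare_classes(rows, target, min_count=2):
    """Remove rows whose target class occurs fewer than min_count times."""
    counts = Counter(row[target] for row in rows)
    return [row for row in rows if counts[row[target]] >= min_count]


# Made-up sample rows; only the column names echo the dataset description.
rows = [
    {"Temperature": 9.4, "Loud Cover": 0, "Summary": "Partly Cloudy"},
    {"Temperature": 8.2, "Loud Cover": 0, "Summary": "Partly Cloudy"},
    {"Temperature": 1.5, "Loud Cover": 0, "Summary": "Foggy"},
    {"Temperature": 2.0, "Loud Cover": 0, "Summary": "Foggy"},
    {"Temperature": 20.1, "Loud Cover": 0, "Summary": "Dry"},
]
cleaned = drop_rare_classes(drop_constant_columns(rows), "Summary")
```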

C. Variables Transformation
1. For building the Decision Tree and Random Forest classifier models, all independent variables were checked to be either integer or numeric.
2. The target (dependent) variable, Summary, was nominal data. Building the models requires the target variable to be a factor, so the Summary variable was transformed into a factor.

Fig. 3b. Histogram

D. Exploratory Data Analysis


In statistics, Exploratory Data Analysis (EDA) is a method of examining and exploring data sets to highlight their key features, frequently using statistical graphics and other data visualization techniques. It is a data analytics procedure used to understand the data thoroughly and discover insightful patterns in it.

Inferences from the analysis of the weather dataset, as shown in Figure 3:
1. There are 8 features in total, of which the last feature, ‘Summary’, is the target variable. The target variable has 7 classes: Breezy, Clear, Foggy, Mostly Cloudy, Overcast, Partly Cloudy, and Rain.
2. From the density plot (Figure 3a), we can observe that ‘Partly cloudy’ is the most widely distributed of the classes.
3. From the histogram (Figure 3b) drawn for the variable ‘Temperature’, we can observe that most of the readings lie in the range 0-10.
4. From the histogram (Figure 3c) drawn for the target variable ‘Summary’, we can observe that ‘Mostly cloudy’ occurs most frequently, followed by ‘Partly cloudy’, while ‘Breezy’ and ‘Rain’ occur least.
5. From the scatter plot (Figure 3d) drawn to show the relationship between pairs of variables, it can be observed that Temperature and Wind Speed have a positive linear relationship, and that Temperature and Pressure have a negative linear relationship.
6. From the correlation plot (Figure 3e), we can observe that Visibility has a positive relationship with Temperature and Apparent Temperature. Wind speed has a positive relationship with Visibility and Temperature. Humidity has a negative relationship with each of Temperature, Apparent Temperature, Visibility, and Wind speed. Pressure also has a negative relationship with Temperature, Apparent Temperature, Visibility, and Wind speed.
7. Finally, the ggpairs plot summarizes all the relationships between the variables.

Fig. 3c. Summary Histogram
Fig. 3d. Scatter Plot

E. Weather Forecasting using ARIMA


Here we are only forecasting the weather parameters – the
independent variables. For each variable the forecast graph
is drawn and the forecast dataset contains the values for the
next 24 hours.
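
To make the 24-hour forecasting step concrete, the sketch below hand-rolls the simplest ARIMA variant, ARIMA(1,1,0): the series is differenced once, an AR(1) coefficient is fitted to the differences by least squares, and the forecast differences are summed back onto the last observation. This is a simplified Python illustration and not the authors' method: Auto ARIMA selects the model orders automatically, whereas the order here is fixed by assumption, and the hourly temperature readings are made up.

```python
def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept) for a differenced series."""
    numerator = sum(diffs[t] * diffs[t - 1] for t in range(1, len(diffs)))
    denominator = sum(d * d for d in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast_arima_110(series, steps=24):
    """Forecast `steps` future values with a fixed-order ARIMA(1,1,0) model."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]  # integrated part, d = 1
    phi = fit_ar1(diffs)                                 # autoregressive part, p = 1
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predict the next difference
        last_value += last_diff       # undo the differencing
        forecasts.append(last_value)
    return forecasts


# Made-up hourly temperature readings, for illustration only.
hourly_temperature = [10.0, 10.8, 11.4, 11.9, 12.2, 12.4, 12.5, 12.55]
next_day = forecast_arima_110(hourly_temperature, steps=24)
```

In this setting one such model would be fitted per weather parameter, giving a 24-value forecast for each independent variable.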

Fig. 3e. Relationship between variables

Fig. 4a. ARIMA Model Results
Fig. 4b. ARIMA Model – Forecast Results

F. Predictive Model Development
Decision Tree Model: First the dataset is split into training and testing datasets. The model is then built on the training dataset and is tested and validated on the testing dataset.

Random Forest Model: The Random Forest model is built on the same training dataset, and the same testing dataset is used to validate and test the model.

The training and testing datasets are separated in the ratio 80:20. After splitting, the dimension of the training data is (7055, 8) and that of the testing data is (1719, 8). The decision tree has a maximum depth of 15, and the minimum number of observations required to attempt a split at a node is 2. The Random Forest model has a maximum of 1000 decision trees, the number of variables randomly sampled as candidates at each split is 5, and the minimum node size is 2. Because Random Forest uses a technique called ‘bagging’ and the replace parameter is set to TRUE, training rows are sampled with replacement, so a training sample that was used to build one decision tree can be used to build another.

G. IoT Framework
A weather monitoring system driven by an Arduino UNO is used to measure environmental factors including temperature, humidity, and barometric pressure, using the following components:
1. BMP180 pressure and temperature sensor
2. DHT11 humidity sensor
3. Arduino Uno
4. Jumper wires
5. Breadboard

H. Arduino Connections
Figure 5 depicts the process of reading real-time sensor values (Temperature, Apparent temperature, Humidity, Pressure) from the Arduino.
1. The Arduino Uno is an ATmega328P-based microcontroller board. It has fourteen digital input/output pins, six of which provide PWM output, six analog inputs, a ceramic resonator operating at 16 MHz, a USB port, a power jack, an ICSP header, and a reset button. The Arduino Uno can automatically draw power from either the USB connection or an external power supply. The required board drivers are installed. The BMP180 barometric pressure sensor reads the atmospheric pressure in hPa and the temperature, and the DHT11 sensor reads temperature and humidity, from which the apparent temperature is calculated.
2. The +5V and GND pins of the DHT11 are connected to the +5V and GND pins of the Arduino, respectively.
3. The S (signal) pin of the DHT11 is connected to D2 of the Arduino.
4. The Vin and GND pins of the BMP180 are connected to the +5V and GND pins of the Arduino, respectively. The BMP180's SDA and SCL pins are connected to the A4 and A5 pins of the Arduino, respectively.

Fig. 5a. Arduino UNO and Sensor Connections
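
The bagging step described in Section III-F, sampling training rows with replacement and letting the trees vote, can be sketched as follows. This is a Python illustration under stated assumptions and not the authors' R model: `train_stump` is a hypothetical one-split stand-in for a full decision tree, the two-class humidity data is made up, and the per-split random feature sampling that a real Random Forest also performs is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so a row can be picked repeatedly."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def train_stump(sample):
    """Hypothetical stand-in for a decision tree: a single split on humidity."""
    foggy = [h for h, label in sample if label == "Foggy"]
    clear = [h for h, label in sample if label == "Clear"]
    if not foggy or not clear:  # the bootstrap sample happened to contain one class
        only_label = sample[0][1]
        return lambda humidity: only_label
    threshold = (sum(foggy) / len(foggy) + sum(clear) / len(clear)) / 2
    return lambda humidity: "Foggy" if humidity >= threshold else "Clear"


def train_forest(rows, n_trees, seed=0):
    """Train each tree on its own bootstrap sample of the training rows."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(forest, humidity):
    return majority_vote([tree(humidity) for tree in forest])


# Made-up (humidity, label) training rows, for illustration only.
rows = [(0.95, "Foggy"), (0.92, "Foggy"), (0.90, "Foggy"),
        (0.55, "Clear"), (0.60, "Clear"), (0.50, "Clear")]
forest = train_forest(rows, n_trees=25)
```

Because every tree sees a different resample of the same rows, individual trees disagree slightly, and the majority vote averages out those differences.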

Fig. 5b. Real time sensor readings

IV. RESULTS & DISCUSSION

A. Evaluation Metrics
The performance is evaluated using the confusion matrix and the Precision, Recall, F-Measure, and Accuracy metrics.
Confusion Matrix: The confusion matrix offers additional information about a predictive model's performance, including which classes are predicted correctly, which are predicted incorrectly, and what kinds of errors are being made.
Precision: Precision is the ratio of correctly predicted positive samples to the total number of samples that are predicted positive.
Recall: Recall is the ratio of correctly predicted positive samples to the total number of samples that are actually positive.
F-Measure: F-Measure combines recall and precision into a single measure that captures both properties.
F-Measure = (2 * Precision * Recall) / (Precision + Recall)

B. Weather Prediction - Decision tree Model
From the confusion matrix (Figure 6), it is observed that 178 values belonging to ‘Partly cloudy’ were wrongly predicted as ‘Mostly cloudy’, and 67 values belonging to ‘Partly cloudy’ were wrongly predicted as ‘Clear’. The majority of the observations are predicted correctly by the model.

Fig. 6. Confusion Matrix for Decision Tree Model
Fig. 7. Evaluation metrics for Decision Tree Model

C. Weather Prediction - Random Forest model
From the confusion matrix shown in Figure 8, it is observed that 132 values belonging to ‘Partly cloudy’ were wrongly predicted as ‘Mostly cloudy’, and 53 values belonging to ‘Partly cloudy’ were wrongly predicted as ‘Clear’. The majority of the observations are predicted correctly by the model.

Fig. 8. Confusion Matrix for Random Forest Model

From the evaluation metrics of the models implemented above (Figure 7 and Figure 9), we can conclude that both models perform well, but the Random Forest model is better than the Decision Tree model on all the evaluation metrics (Accuracy, Precision, Recall, F1-Score).
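
The metrics defined in Section IV-A can be computed directly from a multi-class confusion matrix, as the sketch below shows. It is a Python illustration under stated assumptions: rows are taken to be true classes and columns predicted classes, which may differ from the orientation of the authors' figures, and the three-class matrix holds made-up counts, not the values from Figures 6 and 8.

```python
def evaluate(matrix, labels):
    """Return overall accuracy and per-class (precision, recall, F-measure).

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(labels)))
    report = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        predicted_positive = sum(row[k] for row in matrix)  # column sum
        actual_positive = sum(matrix[k])                    # row sum
        precision = true_positive / predicted_positive if predicted_positive else 0.0
        recall = true_positive / actual_positive if actual_positive else 0.0
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = (precision, recall, f_measure)
    return correct / total, report


# Made-up counts for three classes, for illustration only.
labels = ["Clear", "Foggy", "Rain"]
matrix = [[8, 1, 1],
          [2, 6, 0],
          [0, 1, 1]]
accuracy, report = evaluate(matrix, labels)
```

Reporting the metrics per class matters for this dataset because a rare class such as ‘Rain’ can score poorly even when overall accuracy looks high.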

Fig. 9. Evaluation Metrics for Random Forest Model

D. Weather Prediction
The input values for the prediction are read in real time from the sensors through the Arduino framework. For the model to predict the weather, the input values must have the same form as the training data (a data frame) and the same data types. The model correctly predicted the weather as ‘Foggy’, as shown in Figure 10, for the real-time sensor readings taken. The results are validated against the real weather information available on weather.com.

Fig. 10. Weather Prediction

V. CONCLUSION & FUTURE SCOPE
Descriptive analysis and Exploratory Data Analysis were performed on the weather dataset to explore the relationships between variables. Auto ARIMA was used to forecast the weather parameters accurately for the next 24 hours. Decision trees and the Random Forest algorithm were used to predict events such as fog, rain, dry, windy, clear, and breezy from inputs such as average temperature, humidity, and pressure. The Random Forest and Decision Tree algorithms were analyzed in terms of Accuracy, Precision, Recall, and F1-Score, and a confusion matrix was built for each model. It is concluded that the Random Forest algorithm is better than the Decision Tree algorithm on these evaluation metrics. Hence Random Forest is recommended over a single decision tree when the size of the dataset or the number of parameters in the dataset is high.
The weather dataset is highly imbalanced, as can be seen from the plots. In the future, the imbalanced dataset could be balanced using Variational Autoencoders (VAE), Generative Adversarial Networks (GANs), or the Synthetic Minority Oversampling Technique (SMOTE), so that the supervised machine learning classifiers can classify and predict the data points with high accuracy. The collection of the values read by the Arduino sensors DHT11 and BMP180 could also be automated so that the values are saved automatically to a CSV file on a cloud platform. This could also serve as a way to balance the imbalanced dataset and improve the accuracy.

REFERENCES

[1] Aditya Grover, Ashish Kapoor, and Eric Horvitz, “A deep hybrid model for weather forecasting”, In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 379–386, August 2015.
[2] Rohit Kumar Yadav and Ravi Khatri, “A Weather Forecasting Model using the Data Mining Technique”, International Journal of Computer Applications, vol. 139, no. 14, pp. 4-12, April 2016.
[3] Y Radhika and M Shashi, “Atmospheric temperature prediction using support vector machines”, International Journal of Computer Theory and Engineering, vol. 1, no. 1, pp. 55-58, April 2009.
[4] Rajesh Kumar, “Decision Tree for the Weather Forecasting”, International Journal of Computer Applications, vol. 76, no. 2, pp. 31-34, August 2013.
[5] Mark Holmstrom, Dylan Liu, and Christopher, “Machine Learning Applied to Weather Forecasting”, Environmental Science, pp. 1-6, December 2016.
[6] S. G. Totad, Sushmitha Kothapalli, “A real-time weather forecasting and analysis”, In Proceedings of the IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), pp. 1567-1570, June 2017.
[7] Wayan Suparta, Edi Abdurahman, Afan Galih Salman, Yaya Heryadi, “Weather Forecasting Using Merged Long Short-term Memory Model”, Electrical Engineering and Informatics, vol. 7, no. 3, pp. 377-385, September 2018.
[8] Padma, N. Krishnaveni, “Weather forecast prediction and analysis using sprint algorithm”, Journal of Ambient Intelligence and Humanized, vol. 12, no. 2, pp. 4901-4909, May 2021.
[9] Sayantanu Barua, Tanni Dhoom, Munmun Biswas, “Weather Forecast Prediction: An Integrated Approach for Analyzing and Measuring Weather Data”, International Journal of Computer Applications, vol. 182, no. 34, pp. 20-24, December 2018.
[10] Aurélien Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed., O’Reilly Media, 2019.
[11] Jamal Fattah, Latifa Ezzine, Zineb Aman, Haj Moussami, Abdeslam Lachhab, “Forecasting of demand using ARIMA model”, International Journal of Engineering Business Management, vol. 10, pp. 1-9, October 2018.
