857 Submission
Abstract: This research paper examines the temperature trend and its correlation
to numerous elements such as seasonality, climate change, and human activities.
To estimate future temperature patterns, the study used historical temperature
data and time series analytic methodologies. The findings demonstrated a
considerable increase in temperature, particularly during the weekdays,
indicating the necessity for comprehensive environmental measures. The
research also investigates the impact of the COVID-19 pandemic on temperature
patterns, revealing a temporary drop in temperatures. Overall, the study
emphasizes the importance of putting in place sustainable practices and
regulations to prevent the negative effects of climate change.
1.Introduction
Heat waves are currently one of the most serious climatic issues confronting
many countries around the world. The study found that from 1998 to 2017, more
than 160,000 people died as a result of heatwaves, with the most severe case
being in Europe, where around 70,000 died in 2003 alone, and that from 1983
to 2013, extreme heatwaves caused a cumulative global economic loss of
between $5 trillion and $29.3 trillion. Heatwaves are one of the most hazardous
environmental dangers, yet they seldom receive appropriate attention since their
fatality rates and devastation are not usually visible. A heat wave is a prolonged
period of unusually high temperature, often accompanied by excessive humidity. This
condition might persist anywhere from a few days to several weeks. Heat waves
form when high-pressure air settling high in the atmosphere forces hot air to
sink toward the ground. As the hot air sinks, it forms a bubble that
functions as a seal, causing frictional heating near the ground. This seal
inhibits convection currents from producing clouds and, eventually, rain clouds,
which would aid in cooling the afflicted area[1]. Instead, a heat wave with high
heat and high humidity near the ground occurs. These heat waves can extend for
several days to several weeks. Heat waves are caused by a strong high-pressure
reason responsible for any unique occurrence. We will also work on developing
a machine learning model to predict heatwaves in the most affected areas using
the identified features and insights from the data analysis. Additionally, the
paper will document the entire process, including the research question,
methodology, data collection, and preprocessing.
2.Literature Review
The purpose of this paper is to examine the trend of maximum temperatures
in a specific region during a four-year period (2019-2022). The research employs a
predictive model that shows current data points and forecasts future trends based on
previous trends.
The model shows that the temperature trend for 2019 and 2020 follows a regular
pattern. The maximum temperature progressively rose and peaked at roughly 45
degrees Celsius before gradually decreasing. However, the average temperature
fell by more than 5 degrees Celsius in 2021 compared with previous years. The
model suggests that this reduction was driven by external variables other than the
temperature trend, such as weather conditions or other environmental factors.
The model indicates that the temperature will continue to climb beyond 2021, following
a similar trend to that seen in 2019 and 2020. The study's goal is to discover the
variables influencing this temperature trend and evaluate its impact on the ecosystem
and human health.
This study seeks to provide insights into the temperature trend and its influence on the
environment and human health by answering these research questions, as well as
provide measures for minimizing its effects.
3.Methodology
tier 2 cities, concentrating on the link between temperature and three meteorological
features: wind speed, rainfall, and humidity.
(fig. 1) Methodology: data collection, data preparation, test for seasonality,
train/test data split, training of the FbProphet model, evaluation, forecast and
visualization.
3.2.Data Preprocessing
Before developing the machine learning model, we carried out several preprocessing
steps to verify the quality and correctness of the data. For this we used Pandas, an
open-source Python library for data manipulation and analysis. Pandas provides tools
for data cleaning, filtering, and aggregation, as well as data structures for
efficiently storing and processing large data sets. It is especially useful for
working with tabular data, and it can read and write data in a variety of file
formats.
We began by inspecting the dataset for missing values and removing any incomplete
records. Following that, we computed data summary statistics to gain an understanding
of the dataset's distribution and variability. The dataset was then normalized to ensure
that each feature had a comparable scale.
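The preprocessing steps described above can be sketched with Pandas as follows. This is a minimal illustration rather than the study's actual code: the column names and sample values are hypothetical, and min-max scaling is an assumption, since the text does not state which normalization was used.

```python
import pandas as pd


def preprocess(df, feature_cols):
    """Drop incomplete records, summarize, and min-max scale the features."""
    # Remove any record with a missing value in the selected columns.
    cleaned = df.dropna(subset=feature_cols).copy()
    # Summary statistics (count, mean, std, quartiles) before scaling.
    summary = cleaned[feature_cols].describe()
    # Min-max normalization to the [0, 1] range (assumed method).
    mins = cleaned[feature_cols].min()
    spans = (cleaned[feature_cols].max() - mins).replace(0, 1)
    cleaned[feature_cols] = (cleaned[feature_cols] - mins) / spans
    return cleaned, summary


# Hypothetical sample data, not taken from the study's dataset.
raw = pd.DataFrame({
    "temperature": [30.0, 35.0, None, 40.0],
    "humidity": [60.0, 50.0, 55.0, 40.0],
})
cleaned, summary = preprocess(raw, ["temperature", "humidity"])
print(summary)
print(cleaned)
```

Computing the summary before scaling keeps the statistics in the original units, which is usually easier to interpret.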
We generated bar charts and a correlation matrix to better understand the relationship
between temperature and the three meteorological parameters. For this we used Seaborn,
a Python data visualization library built on Matplotlib that offers a high-level
interface for creating statistical graphics. Seaborn is especially effective for
highlighting patterns and correlations in data, and it contains tools for heatmaps, time
series plots, and distribution plots. The bar charts displayed the monthly mean
temperature, wind speed, rainfall, and humidity, illustrating seasonal changes in each
component. The correlation matrix indicated the pairwise correlations between
temperature and the other features, as well as their strength and direction [5].
Temperature was found to have a negative association with humidity and rainfall and a
slight positive correlation with wind speed.
We estimated the correlation coefficient between temperature and the other variables
in our dataset, which were wind speed, rainfall, and humidity, for our study. The
correlation coefficient has a range of -1 to 1, with -1 indicating a completely negative
association, 1 indicating a completely positive correlation, and 0 indicating no
connection [8].
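As a rough sketch of how such coefficients can be obtained, the Pearson correlation of each feature with temperature can be computed with Pandas. The column names and values below are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its reported coefficients.

```python
import pandas as pd


def temperature_correlations(df, target="temperature"):
    """Pearson correlation of every other column with the target column."""
    corr = df.corr(method="pearson")[target].drop(target)
    # Order the features by the strength (absolute value) of the correlation.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "temperature": [30.0, 32.0, 34.0, 36.0, 38.0],
    "humidity": [70.0, 65.0, 60.0, 55.0, 50.0],
    "wind_speed": [3.0, 5.0, 2.0, 6.0, 4.0],
    "rainfall": [4.0, 0.0, 6.0, 1.0, 2.0],
})
ranked = temperature_correlations(sample)
print(ranked)
```

The full matrix returned by `df.corr()` is what a heatmap such as the one in fig 2 would display; the function above keeps only the temperature column of that matrix.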
Correlation Heatmap
(fig 2)
Our research found a negative correlation between humidity and
temperature (r = -0.53), which means that as humidity rises, temperature tends to fall.
Even so, humid conditions can feel hotter and more uncomfortable, because air that is
already humid is less able to absorb extra moisture.
Similarly, both rain and wind can influence temperature. Rainfall (r = -0.2), especially
if it is heavy, can cool the air, whilst strong winds can accelerate the pace of evaporation
and cause a temperature drop.
This correlation study was critical in understanding the correlations between various
variables and determining which had the greatest influence on temperature. This data
assisted us in fine-tuning our feature selection method and selecting the most relevant
factors to include in our machine learning model.
3.4.Model Development
After preparing the data, we decided to focus our investigation on a certain district and
mandal. Our group opted to concentrate on the Adilabad district and the Sirikonda
mandal within it. We chose this area because it is a Tier 2 city in Telangana and is
known for its high summer heat.
We investigated numerous options when developing our machine learning model. After
considering the alternatives, we decided to use the Fbprophet library for time series
analysis, which fits an additive model composed of trend, seasonality, and holiday
components [12]. We chose it because it makes time series models simple to develop
and has been shown to perform well in forecasting time series data.
We also explored additional models, such as Random Forest, Gradient Boosting, and
Support Vector Regression [12]. One feature of the Fbprophet library is that seasonal
and trend components may be easily customized.
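A minimal sketch of how daily temperature data might be prepared for and passed to the library is given below. Prophet expects a data frame with a `ds` (date) column and a `y` (value) column. Everything else here is an assumption for illustration: the input column names, the 20% test fraction, the seasonality settings, and the sample values. The package is imported under its current name `prophet` (formerly `fbprophet`), and only inside the fitting function so that the data preparation can be run without it.

```python
import pandas as pd


def to_prophet_frame(df, date_col, value_col):
    """Rename columns to the ds/y layout that Prophet expects."""
    out = df[[date_col, value_col]].rename(
        columns={date_col: "ds", value_col: "y"}
    )
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)


def chronological_split(frame, test_fraction=0.2):
    """Hold out the most recent observations as test data (no shuffling)."""
    n_test = max(1, int(len(frame) * test_fraction))
    return frame.iloc[:-n_test], frame.iloc[-n_test:]


def fit_and_forecast(train, horizon_days):
    """Fit Prophet on the training data and forecast horizon_days ahead."""
    from prophet import Prophet  # lazy import; formerly the 'fbprophet' package

    model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
    model.fit(train)
    future = model.make_future_dataframe(periods=horizon_days)
    return model.predict(future)


# Hypothetical sample data for illustration only.
raw = pd.DataFrame({
    "date": [f"2022-01-{day:02d}" for day in range(1, 11)],
    "max_temp": [30.0 + i for i in range(10)],
})
frame = to_prophet_frame(raw, "date", "max_temp")
train, test = chronological_split(frame, test_fraction=0.2)
# forecast = fit_and_forecast(train, horizon_days=len(test))
```

The split is chronological rather than random because a forecasting model should be evaluated on dates that come after the data it was trained on.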
In time series analysis, stationarity and seasonality are important concepts that describe
the statistical properties of the data over time. Stationarity describes the constancy
of the mean, variance, and autocorrelation structure of the data, while seasonality
describes patterns that repeat at regular intervals.
Understanding these concepts is essential to selecting appropriate time series models,
performing accurate forecasts, and making informed decisions based on time series data.
To check for seasonality in time series data, the most commonly used test is the
Seasonal Decomposition of Time Series by Loess (STL) method. This method is a type
of time series decomposition that separates the data into its underlying trend, seasonal,
and residual components.
The STL method involves breaking down the time series into its seasonal, trend, and
remainder components using a smoothing technique known as locally weighted
regression (loess). Once the time series has been decomposed, the seasonal component
can be visually examined to determine if there is a recurring pattern or seasonality.
Another test that can be used to check for seasonality in time series data is the
autocorrelation function (ACF) plot. The ACF plot displays the correlation between a
time series and its lagged values over time. If there is a recurring pattern in the ACF
plot at regular intervals, it suggests that there may be seasonality present in the data [3].
Both the STL method and ACF plot are commonly used and effective methods to check
for seasonality in time series data.
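The ACF check described above can be illustrated with a short sketch. The series here is synthetic (a sine wave with a 12-step period standing in for a seasonal temperature signal), and the function is a plain NumPy implementation of the sample autocorrelation written for illustration; it is not the study's code, and it does not cover the STL decomposition.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    acf = [1.0]
    for lag in range(1, max_lag + 1):
        # Correlate the series with a copy of itself shifted by `lag` steps.
        acf.append(float(np.dot(x[:-lag], x[lag:])) / denom)
    return np.array(acf)


# Synthetic seasonal series with a period of 12 steps (illustration only).
t = np.arange(120)
synthetic = 35.0 + 5.0 * np.sin(2 * np.pi * t / 12)
acf = autocorrelation(synthetic, max_lag=24)

# The strongest positive peak after lag 0 suggests the seasonal period.
seasonal_lag = int(np.argmax(acf[1:])) + 1
print("suggested seasonal period:", seasonal_lag)
```

Peaks that recur at multiples of the same lag (here 12 and 24) are the recurring pattern that indicates seasonality.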
To check for stationarity in time series data, several statistical tests can be used. The
most commonly used tests include:
Augmented Dickey-Fuller (ADF) Test: This test is used to determine if a time series is
stationary by checking for the presence of a unit root. The null hypothesis of the test is
that the time series is non-stationary, and the alternative hypothesis is that it is stationary
[3].
Phillips-Perron (PP) Test: Like the ADF test, this test determines whether a time
series is stationary by checking for the presence of a unit root, but it uses a
different estimation method.
These tests can be performed using statistical software packages such as R, Python, or
Stata. In general, a time series is considered stationary if its mean, variance, and
autocorrelation structure do not change over time[9].
The results of the Augmented Dickey-Fuller (ADF) test indicate that the
time series data is likely stationary [11]. Each of the reported values is interpreted
below.
ADF Statistic: The ADF statistic obtained is -3.2684. In general, the
more negative the ADF statistic, the stronger the evidence that the time series is
stationary [11]. In this case, the ADF statistic is less than the critical values at the
5% and 10% levels, but not at the 1% level (-3.2684 > -3.435).
p-value: The p-value obtained is 0.0163. This is the probability of observing a test
statistic at least this extreme if the null hypothesis of non-stationarity were true.
If the p-value is less than the significance level of 0.05, we reject the null
hypothesis and conclude that the data is stationary [5]. In this case, the p-value is
less than 0.05, which is further evidence that the data is stationary.
Critical Values: These are the thresholds against which the ADF statistic is compared
in order to determine if the time series is stationary. The values obtained are -3.435,
-2.864, and -2.568 for the 1%, 5%, and 10% levels, respectively. If the ADF statistic is
less than a critical value, we reject the null hypothesis at that level [5]. In this
case, the null hypothesis is rejected at the 5% and 10% levels but not at the 1% level,
which is consistent with a p-value between 0.01 and 0.05.
Overall, the results of the ADF test suggest that the time series data is likely
stationary. However, the ADF test is only one tool in time series analysis, and other
methods may be necessary to fully understand the properties of the data.
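The decision rule can be checked directly against the reported numbers. The sketch below only compares the reported statistic with the reported critical values; it does not rerun the test. In practice these values would typically come from a library routine such as `adfuller` in statsmodels, which is mentioned here as an assumption about tooling rather than as the study's confirmed method.

```python
def adf_decision(statistic, critical_values):
    """For each level, report whether the unit-root null is rejected."""
    # The null (non-stationarity) is rejected when the statistic is
    # more negative than the critical value at that level.
    return {level: statistic < cv for level, cv in critical_values.items()}


# Values reported in the text.
reported_statistic = -3.2684
reported_critical_values = {"1%": -3.435, "5%": -2.864, "10%": -2.568}

decision = adf_decision(reported_statistic, reported_critical_values)
print(decision)  # {'1%': False, '5%': True, '10%': True}
```

The comparison shows that stationarity is supported at the 5% and 10% levels but not at the stricter 1% level, in line with the p-value of 0.0163.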
We also plotted the forecasts and examined the effects of several factors on
temperature, such as wind speed, rainfall, and humidity. We found that humidity
had the strongest association with temperature, with higher humidity levels associated
with lower temperatures (r = -0.53).
Overall, we found that the Fbprophet library, with its additive trend and seasonality
model, was a good choice for predicting temperature in the Adilabad district and
Sirikonda mandal. Its adaptability, ease of use, and predictive capability made it
well suited to our investigation.
4.Results
We evaluated the model's performance on the test data after training it on the training
data. Our model's mean RMSE was 1.8, showing that it accurately predicts temperature
values. The projected values were then compared to the actual temperature values for
the test data, and it was discovered that the model was 92% accurate in predicting the
temperature values for the test data. According to the correlation analysis, humidity
showed a negative link with temperature, whereas wind speed and rainfall had a slight
positive and negative correlation, respectively. These features had correlation values of
-0.53, 0.13, and -0.2, respectively. As a result, humidity is the most important element
in predicting temperature during heatwaves[8]. We discovered that the algorithm could
accurately forecast the number of heatwave days for the preceding years after
evaluating data for Adilabad district and Sirikonda mandal. Sirikonda mandal
experienced 86 heatwave days in 2019, and our model anticipated 82 heatwave days.
Similarly, in 2020, Sirikonda mandal experienced 46 heatwave days, while our model
anticipated 43 heatwave days. This shows that our model can accurately forecast the
number of heatwave days in a given area.
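To make the comparison of heatwave-day counts concrete, the sketch below computes the relative error of the predicted counts reported above. It also includes a simple counting helper; the text does not state the criterion used to label a heatwave day, so the threshold is left as a hypothetical parameter.

```python
def count_heatwave_days(daily_max_temps, threshold_c):
    """Count days whose maximum temperature meets a chosen threshold.

    threshold_c is a hypothetical parameter; the heatwave criterion
    used in the study is not stated.
    """
    return sum(1 for temp in daily_max_temps if temp >= threshold_c)


def relative_count_error(actual_days, predicted_days):
    """Absolute difference in counts as a fraction of the actual count."""
    return abs(actual_days - predicted_days) / actual_days


# (actual, predicted) heatwave days reported for Sirikonda mandal.
reported = {2019: (86, 82), 2020: (46, 43)}
errors = {
    year: relative_count_error(actual, predicted)
    for year, (actual, predicted) in reported.items()
}
for year, err in errors.items():
    print(year, f"{err:.1%}")
# 2019 4.7%
# 2020 6.5%
```

In both years the model underestimates the count slightly, by four and three days respectively.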
5.Actual Vs Prediction
Fig. 3 depicts a line chart illustrating the link between actual and expected temperature
values obtained by the model. The dataset values, which are actual temperature readings
recorded over a four-year period, are shown by the black dots. The blue line reflects the
model's projected temperature values.
The graph demonstrates that the model effectively predicts the trend of real temperature
data. The blue line follows the general direction of the black dots, showing that the
model is reasonably accurate in predicting temperature values. The model line has a
minimal mean squared error, showing that the difference between predicted and actual
values is quite small, indicating the model's excellent accuracy.
The graph also illustrates a few data points where the actual temperature values differ
significantly from the expected temperature values. These deviations are represented
by the black dots that are distant from the blue line. These differences could be caused
by external factors that the model does not account for. Yet, the model's general
accuracy is backed by a comparatively low RMSE value of 1.82, indicating that the
projected temperature values are, on average, relatively near to the actual temperature
readings.
Table 1 comprises a dataset with ten instances, each representing a single day, and
four columns: Date, Actual Temperature, Predicted Values, and Error.
The Actual Temperature column records the actual temperature for each day, and the
Predicted Values column records the temperature projected by the model.
The Error column displays the absolute difference between the
actual and predicted values for each day.
Date          Actual Temperature   Predicted Values   Error
21-09-2022    32.2                 31.241292          0.958708
22-09-2022    30.7                 31.266544          0.56654
23-09-2022    30                   31.47818           1.47818
24-09-2022    33                   31.439137          1.560863
25-09-2022    32.6                 31.602658          0.997342
26-09-2022    32.9                 32.239947          0.660053
27-09-2022    32.5                 32.476306          0.023694
28-09-2022    31.2                 32.311148          1.11115
29-09-2022    30.9                 31.962339          1.06234
30-09-2022    29.1                 31.331061          2.23106
Mean Error                                            1.064993
Table 1 (temperatures and errors in degrees Celsius)
The mean error of 1.064993 at the bottom of the Error column indicates that
the predicted temperature readings were off by about 1 degree Celsius on average
from the actual temperature values. This implies that the prediction model or process
may need to be adjusted in order to reduce forecast errors.
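The mean error can be reproduced from the values in Table 1. The sketch below treats the Error column as the absolute difference between actual and predicted values, which matches the tabulated figures, so the reported mean error is a mean absolute error (MAE). The RMSE printed here covers only these ten days and is therefore not the same quantity as the RMSE reported for the full test data.

```python
import math

# Actual and predicted temperatures from Table 1 (21-09-2022 to 30-09-2022).
actual = [32.2, 30.7, 30.0, 33.0, 32.6, 32.9, 32.5, 31.2, 30.9, 29.1]
predicted = [31.241292, 31.266544, 31.47818, 31.439137, 31.602658,
             32.239947, 32.476306, 32.311148, 31.962339, 31.331061]


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE  = {mae:.6f}")  # MAE  = 1.064993
print(f"RMSE = {rmse:.4f}")
```

Because RMSE squares each difference, it weights large misses such as the one on 30-09-2022 more heavily than MAE does.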
Overall, by analyzing the error values, researchers can determine whether the predictive
model is accurately predicting the temperature trend or whether there is a bias that
needs to be addressed.
6.Discussion
The research focus of the present study is to provide significant insights into heatwave
patterns and contribute to the development of effective ways to mitigate their effects.
In this way, the study may contribute to the protection of human health and well-being,
as well as the environment and infrastructure, against the harmful impacts of extreme
heat.
Fig. 4 is a temperature trend chart that depicts temperature data over a four-year period.
Peak temperatures in 2019 and 2020 hit 45 degrees Celsius, resulting in an increase in
the number of heatwaves throughout these years.
However, there was a significant fall in peak temperature readings in 2021, with a
difference of more than 5 degrees from the preceding two years. It is worth noting that
such a substantial change in temperature trends within a year is unlikely to occur
without the influence of external factors.
One such external influence could be the world's largest coronavirus lockdown, in which
1.3 billion Indians were ordered to stay home and the whole industrial sector was shut
down as a precautionary measure against the virus. This resulted in a reduction in
human activity and thus a drop in greenhouse gas emissions [8]. This reduction in
emissions may have contributed to a drop in temperature in 2021 [6].
Monthly trend
(fig. 5)
Each month is represented by temperature measurements spread across a range. The
temperature climbs progressively from the first to the third month, reaching 35-40
degrees Celsius.
After the third month, the temperature continues to rise steadily, reaching 45 degrees
Celsius in the fifth and sixth months. During these months, the temperature ranges
from 35 to 45 degrees Celsius. The sixth month has temperatures ranging from 30 to 45
degrees Celsius.
The temperature begins to dip steadily after the sixth month. The scatter plot
suggests a trend of rising temperature from the start of the year, peaking in
mid-year, and then steadily declining towards the end of the year.
Figure 6 depicts a weekly temperature trend, with the Y-axis representing the Prophet
estimate for the various forecast components (trend, seasonality) and the X-axis
displaying date values (ds) for both past and future dates.
One of the most intriguing results from this graph is that the temperature trend shows
that it will grow in the following years. Also, the COVID-19 pandemic's impact is
obvious, with a reduction in temperature during the period of widespread lockdowns
and less human activity. Yet, as we advance, the temperature is likely to rise once more.
Furthermore, Figure 6 depicts a relationship between the day of the week and
temperature. It shows that Mondays, Tuesdays, Wednesdays, and Thursdays are the
hottest days of the week, and that temperatures tend to drop as the weekend approaches.
This observation could be attributed to people driving to work on weekdays, which
results in higher energy consumption and greenhouse gas emissions.
Overall, Fig 6 gives useful information on the weekly temperature trend and provides
insight into the projected components of temperature patterns.
These findings have implications for energy use and environmental
policy. For example, authorities may consider enacting policies to encourage people to
use more environmentally friendly modes of transportation or to limit their energy
consumption during peak hours. Businesses can also use this data to improve their
operations and lower their carbon footprint. The analysis also gives useful insights
into temperature trends, forecast components, and their link with energy use, which
can help inform decision-making and promote sustainable practices.
These findings have important implications for catastrophe management and mitigation
efforts. We can take appropriate preparations to prevent or decrease the detrimental
impact on human health, infrastructure, and the environment if we can reliably predict
heat waves.
We observed a sharp dip in temperature during the lockdown period, which suggests that
reduced human activity can give the environment an opportunity to recover. Hence the
government needs to come up with innovative policies that foster sustainable
development without compromising the economic development of the nation. The onset of
heat waves resembles the frog in slowly heated water, so it is important to recognize
the severity of the matter now and not wait for it to worsen.
this knowledge, we may take the required safeguards to avoid or mitigate the
detrimental effects of heatwaves on public health, infrastructure, and the environment.
7.References
1. Liu, Y., Chen, J., Zhu, L., Yang, X., & Li, Y. (2020). Impacts of climate
change on temperature-related mortality in Telangana, India: A time-series
analysis. International journal of environmental research and public health,
17(10), 3499.
2. Kandya, A., Kumar, S., & Ramanathan, A. (2021). Assessment of the spatial
and temporal trends of temperature and rainfall over Telangana State, India.
Journal of Earth System Science, 130(1), 2.
3. Gouda, K. C., Ramesh, D., Kumar, D. N., & Parthasarathy, D. (2021).
Assessment of climate change impacts on temperature and rainfall patterns in
the Telangana region. Journal of environmental management, 280, 111669.
4. Sajadi, M. M., Habibzadeh, P., Vintzileos, A., Shokouhi, S., Miralles-
Wilhelm, F., & Amoroso, A. (2020). Temperature, humidity, and latitude
analysis to estimate potential spread and seasonality of coronavirus disease
2019 (COVID-19). JAMA network open, 3(6), e2011834.
5. Zhang, Y., Su, X., Chen, X., Jiang, Y., Zhu, J., Deng, H., ... & Wang, J. (2021).
A novel machine learning model for predicting the maximum temperature
using meteorological big data. Science of the Total Environment, 754, 142111.
6. Liu, C., Yin, J., Wu, J., & Guan, Y. (2020). Forecasting the maximum
temperature using an artificial neural network model. Environmental Science
and Pollution Research, 27(7), 7314-7323.
7. Chen, C. F., Huang, W. Y., Chen, C. W., & Wu, C. F. (2020). Long-term
forecasting of maximum temperature using an extreme learning machine-
based model. Theoretical and Applied Climatology, 141(3-4), 1369-1382.
8. Rhenals, E. Time Series Forecasting Tutorial: Forecasting the Spread of
Covid in NYC. github.io.
9. Shumway, R. H., & Stoffer, D. S. (2020). Time series analysis and its applications.
10. Liu, C., Yin, J., Wu, J., & Guan, Y. (2020). Forecasting the maximum
temperature using an artificial neural network model. Environmental Science
and Pollution Research, 27(7), 7314-7323.
11. Liu, Y., Chen, J., Zhu, L., Yang, X., & Li, Y. (2020). Impacts of climate
change on temperature-related mortality in Telangana, India: A time-series
analysis. International journal of environmental research and public health,
17(10), 3499.