3 RD Literature Paper



Forecasting And Prediction Of Air Pollution Levels To Protect Human Beings From Health Hazards
S. Suganya, Professor Dr. T. Meyyappan

Abstract: Prevention and control of air pollution has become an essential activity in many cities. Air is polluted to unacceptable levels by industries and heavy vehicular traffic in cities, which affects human health to a great extent. Forecasting, predicting and controlling air pollution is the need of the hour to protect human beings from health hazards. Air pollution poses threats not only to humans but also to the entire flora and fauna. The prime objective of this paper is to propose a new method to predict air pollution using data collected on a monthly basis and to provide recommendations to prevent and control air pollution. This research work comprises two phases. The first phase preprocesses the chosen dataset using Python. The second phase analyzes the preprocessed data to predict air pollution levels. A Kaggle dataset containing monthly air pollution data collected over the period 2000 to 2010 is subjected to the proposed method. Predictions for a future month are made by computing the Air Quality Index (AQI) metric for the previous two months and comparing it with a computed threshold value. The proposed method attains a classification accuracy rate of 0.6 for the illustrated month (April).

Index Terms: Air Pollution, Air Quality Index, Analysis, Pollution Forecasting, Prediction, Prevention, Control.
——————————  ——————————

• S. SUGANYA is currently pursuing a Ph.D. in the Department of Computer Science, Alagappa University, India, PH-9786383278. E-mail: suganyasudhakar04@gmail.com
• DR. T. MEYYAPPAN is currently working as a Professor in the Department of Computer Science, Alagappa University, India. E-mail: meyyappant@alagappauniversity.ac.in

1 INTRODUCTION
Air pollution occurs when harmful or excessive quantities of substances, including gases, are present in the air. They may cause severe health problems, diseases, allergies and even death in humans. Air pollution plays a significant role in weakening the health of the skin, eyes and other human organs, which reduces human life expectancy. Flora and fauna are also affected by polluted air. The healthy life of future generations is under threat from ever-increasing air pollution at the global level due to industrialisation and the growing use of petroleum products. These problems motivated this research on air pollution data for accurate and early prediction. Storage, processing and analysis of the pollution data set using traditional techniques is complex due to its huge volume. Hence, there is a need to move to Big Data analytic techniques. Among Big Data tools, HDFS and HBase can store high volumes of data, Hive and Pig can process both semi-structured and unstructured data, and MapReduce can be used to analyze the dataset. The proposed work analyzes the air pollution data set to predict air pollution for all the months of any future year. The results obtained can be used by policy makers to control and prevent air pollution in future. In this proposed work, the most dangerous air pollutants considered are carbon monoxide (CO), nitrogen dioxide (NO2), sulphur dioxide (SO2) and ozone (O3). The main sources of these pollutants are vehicles, traffic, smoke, burning plastics, burning electronic waste, and industries. When human beings inhale polluted air, these pollutants get into their blood and cause dangerous diseases. Children and aged people are affected at a faster rate by polluted air.

The Air Quality Index (AQI) is a number used to characterize the quality of the air at a particular time in a given location. The proposed research work predicts the air pollution for a future date using the AQI of past months.

2 RELATED WORKS
A lot of work on the learning, analysis and prediction of air pollution, as well as on forecasting future trends, is found in the literature. The following are four active researchers in this field: Polaiah Bojja, Yi-Ting Tsai, Ranjana Waman Gore, and Ling Wang.

Polaiah Bojja [6] studied Artificial Neural Networks (ANN), Fuzzy Logic Controller, Pollution Forecasting, Ecosystem, and the effect of PM10 and SO2. Accuracy of measurement is ensured by evaluating the minimum forecasting error using MATLAB software. The level of air pollution due to the increase in the number of vehicles in India and Andhra Pradesh is determined using these techniques, coded in MATLAB. The soft computing approaches, a feed-forward Back Propagation Network (BPN) model and a Mamdani Fuzzy Inference model, are trained and tested using the past five years of meteorological data.

Yi-Ting Tsai [7] proposed an approach to forecast PM2.5 concentration using an RNN (Recurrent Neural Network) with LSTM (Long Short-Term Memory). The training data used in the network is retrieved from the EPA (Environmental Protection Administration) of Taiwan for the years 2012 to 2016 and is combined into 20-dimensional data; the forecasting test data is from the year 2017. Experiments are conducted to evaluate the forecast value of PM2.5 concentration for the next four hours at 66 stations around Taiwan. Keras, a high-level neural networks API written in Python, is used in their research work.

Ranjana Waman Gore [8] analyzed how air pollution affects people. The classes based on the Air Quality Index (AQI) are good, moderate, unhealthy for sensitive groups, unhealthy, and very unhealthy. The paper focused on the analysis of air based on the available data of various air pollutants such as NO2, SO2, CO and O3 with corresponding AQI values. Naïve Bayes and the Decision Tree J48 algorithm are adopted for predicting the health concern.

Ling Wang [9] proposed a model named "Prediction of Air Pollution Based on FCM-HMM Multi-model". It analyzes the relationships between the air pollution index (API) and meteorological factors using correlation analysis and principal component regression. A multi-model frame is constructed with FCM-HMM clustering and TS fuzzy inference. First, the fuzzy c-means clustering (FCM) algorithm is adopted to obtain the initial clusters of the observation sequences used as a tool for the prediction of the air pollution index. Compared to nonlinear regression, the gray model and ANN, HMM offers a powerful framework for temporal modeling of features extracted from
time series data. The proposed strategy derives a predictive model to predict air pollution index values in urban areas. A multi-model method based on FCM-HMM is implemented. Many researchers have adopted clustering, the fuzzy c-means clustering algorithm, Hidden Markov Models, and observation sequence generation based on PCA methods. Many researchers have analyzed and monitored the air quality and air pollution in Delhi, Agra and the USA.

In our work, we have introduced a new formula to compute the Air Quality Index. The new AQI formula uses weightages of 30%, 30%, 20% and 20% for NO2, CO, SO2 and O3 respectively. The air pollution for a given period is related to the air pollution levels of the past months. Hence, the AQI for a future month is computed based on the AQI values of the past two months.

3 METHODOLOGY
3.1 Data Set Used
The proposed research work uses an air pollution data set downloaded from the Kaggle website. This data set contains data with four attributes that were collected during the years 2000 to 2010. The air pollutant parameters are the Air Quality Index (AQI) values of NO2, SO2, CO, and O3. The data set is preprocessed and analysed using Big Data analytic techniques to predict air pollution on a monthly basis. Python is used to implement the techniques adopted. Sample data from the data set is shown in Fig. 1.

3.2 Process Flow in the Proposed Work:

Fig. 2. The Steps in the Proposed Air Pollution Prediction Model

Steps in Analysis:
1. In the preprocessing step, the parameters NO2, CO, SO2 and O3 are separated on a monthly basis.
2. The following new formula is used to find the value of the Air Quality Index for every month in a year:
   AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3
3. The average of the AQI values over the 24 hours of a day is computed for each day.
4. The average of the AQI values of all the days in the chosen month is computed as the threshold:
   $T = \frac{1}{n} \sum_{i=1}^{n} AQI_i$, where n is the no. of days in the month
5. Compute the difference between the threshold and the actual AQI values of all the days in the chosen month:
   $DIFF_i = T - AQI_i$ ($i = 1, 2, \dots, n$), where n is the no. of days in the month
   If $DIFF_i < 0$ then
      Air pollution is present
   Else
      Air pollution is not present
6. Stop
Classification Accuracy Rate and Error Rate for the month are computed using the confusion matrix as follows:
Classification Accuracy Rate = (TP + FN) / (TP + TN + FP + FN)
Classification Error Rate = (TN + FP) / (TP + TN + FP + FN)
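
Steps 2, 4 and 5 of the analysis can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `weighted_aqi`, `threshold` and `classify_days` and the sample readings are assumptions introduced here, and the inputs are assumed to be daily values that have already been averaged as in step 3.

```python
from statistics import mean

# Weightages from the AQI formula in step 2 (NO2, CO, SO2, O3).
WEIGHTS = {"NO2": 0.3, "CO": 0.3, "SO2": 0.2, "O3": 0.2}


def weighted_aqi(no2, co, so2, o3):
    """Step 2: combine the four pollutant AQI values into a single AQI."""
    return (WEIGHTS["NO2"] * no2 + WEIGHTS["CO"] * co
            + WEIGHTS["SO2"] * so2 + WEIGHTS["O3"] * o3)


def threshold(daily_aqi):
    """Step 4: threshold T is the mean of the daily AQI values of the month."""
    return mean(daily_aqi)


def classify_days(daily_aqi):
    """Step 5: DIFF_i = T - AQI_i; a negative DIFF_i means pollution is present."""
    t = threshold(daily_aqi)
    return [(t - aqi) < 0 for aqi in daily_aqi]


# Made-up pollutant readings for three days (not taken from the paper).
readings = [(4.0, 2.0, 1.0, 9.0), (6.0, 3.0, 2.0, 12.0), (12.0, 8.0, 5.0, 20.0)]
daily = [weighted_aqi(*r) for r in readings]
print(daily)
print(classify_days(daily))
```

A day is flagged only when its AQI exceeds the monthly mean, so a day whose AQI equals the threshold exactly is treated as "not present" under the DIFF rule of step 5.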


Fig. 1. Example Dataset
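
As a rough illustration of the monthly separation performed during preprocessing, the sketch below groups date-stamped records by month and averages each pollutant column. The record layout, the sample values and the function name `monthly_means` are hypothetical; the text does not specify the column layout of the Kaggle file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (ISO date, NO2 AQI, CO AQI, SO2 AQI, O3 AQI).
# These values are invented for illustration only.
records = [
    ("2010-02-01", 12.0, 4.0, 2.0, 20.0),
    ("2010-02-02", 14.0, 6.0, 4.0, 22.0),
    ("2010-03-01", 10.0, 5.0, 3.0, 30.0),
]


def monthly_means(rows):
    """Group rows by (year, month) and average each pollutant column."""
    groups = defaultdict(list)
    for date, *values in rows:
        year, month, _day = date.split("-")
        groups[(int(year), int(month))].append(values)
    return {
        key: tuple(mean(column) for column in zip(*vals))
        for key, vals in groups.items()
    }


for key, means in sorted(monthly_means(records).items()):
    print(key, means)
```

Keying the groups on the (year, month) pair keeps the same calendar month of different years apart, which matters for a data set spanning 2000 to 2010.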

Aggregation of Air Quality Index:

Fig. 3. AQI Aggregation Index

The proposed work is carried out in two phases. Phase I computes the AQI (Air Quality Index) value, which is used in Phase II to make the prediction. Phase II makes the prediction based on the AQI values and the computed threshold value.

Phase I
Step 1: Pre-processing
Step 2: AQI Value Computation

Phase II
Step 1: Analysis of New AQI

In this proposed work, data sets for the years 2000 to 2010 are collected from the Kaggle website and preprocessed using big data analytics and Python. After preprocessing, the AQI values of NO2, CO, SO2 and O3 are computed on a monthly basis. Then the New AQI is calculated using formula (1) for every month in the years from 2000 to 2010.

New AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3 ............ (1)

Analysis of New AQI
After calculating the New AQI for every month, air pollution is predicted for every month using the AQI values of the previous two months. For example, to make a prediction for the month of April, the AQI values of the previous two months, February and March, are used, and the average of these two months' AQI values is calculated.

Threshold Calculation
The threshold value is computed as the average of all the AQI values of the chosen month.

Prediction
Prediction for the chosen month is made by comparing AQI values with the threshold value: the threshold value is compared with each of the average values of the previous two months' AQI. An average value less than the threshold value indicates the absence of air pollution. An average value greater than or equal to the threshold value indicates the presence of air pollution. The process is repeated for all the months in a year.
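
The prediction rule for a chosen month can be sketched as below, under stated assumptions. The function name `predict_month` is introduced here for illustration; the input values are the February and March entries for days 1, 2, 3 and 9 of Table 1, and T is the April threshold reported there. The sketch uses the DIFF < 0 test from the Steps in Analysis, so an average exactly equal to the threshold would count as "not present".

```python
def predict_month(prev1, prev2, threshold):
    """Return one boolean per day: True where pollution is predicted.

    prev1, prev2: daily AQI values of the two preceding months.
    threshold: threshold T of the chosen month.
    zip() stops at the shorter month, so extra days are ignored.
    """
    predictions = []
    for a, b in zip(prev1, prev2):
        average = (a + b) / 2         # A: mean of the two earlier months
        diff = threshold - average    # DIFF = T - A
        predictions.append(diff < 0)  # negative DIFF -> pollution present
    return predictions


# February and March values for days 1, 2, 3 and 9 as listed in Table 1.
february = [5.383333, 5.95, 6.3, 4.35]
march = [4.833333, 5.266667, 5.266667, 3.9]
T = 4.380556  # threshold reported for April

print(predict_month(february, march, T))
```

For these four days the sketch flags days 1 to 3 and clears day 9, in line with the Prediction column of Table 1.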


4 RESULT AND DISCUSSION
Table 1 shows the difference between the calculated and actual AQI values for a given month. The threshold value is 4.38, which is computed from the actual AQI values. The threshold value is compared with each of the daily average values of the previous two months. The presence or absence of air pollution is determined from the sign of the difference between the threshold value and that average.

Table 1. Prediction of Air Pollution for a month based on AQI

Day  AQI Actual Value (April)  February  March     Average (A)  Difference (T - A)  Prediction (T - A >= 0 or T - A < 0)
1    4.1                       5.383333  4.833333  5.108333     -0.72778            pollution present
2    4.816667                  5.95      5.266667  5.608333     -1.22778            pollution present
3    5.083333                  6.3       5.266667  5.783333     -1.40278            pollution present
4    5.366667                  8.066667  3.95      6.008333     -1.62778            pollution present
5    4.633333                  7.216667  2.683333  4.95         -0.56944            pollution present
6    4.9                       6.166667  2.833333  4.5          -0.11944            pollution present
7    5.0                       6.483333  2.716667  4.6          -0.21944            pollution present
8    5.783333                  7.566667  3.633333  5.6          -1.21944            pollution present
9    5.216667                  4.35      3.9       4.125        0.255556            pollution not present
10   5.05                      3.883333  4.283333  4.083333     0.297222            pollution not present
11   4.0                       4.083333  5.7       4.891667     -0.51111            pollution present
12   4.383333                  4.233333  5.716667  4.975        -0.59444            pollution present
13   3.85                      3.283333  4.7       3.991667     0.388889            pollution not present
14   3.583333                  4.516667  4.766667  4.641667     -0.26111            pollution present
15   3.416667                  6.266667  4.7       5.483333     -1.10278            pollution present
16   6.333333                  5.633333  4.233333  4.933333     -0.55278            pollution present
17   4.65                      4.733333  4.266667  4.5          -0.11944            pollution present
18   2.716667                  4.266667  3.883333  4.075        0.305556            pollution not present
19   5.783333                  6.666667  4.65      5.658333     -1.27778            pollution present
20   5.95                      6.733333  3.85      5.291667     -0.91111            pollution present
21   6.95                      2.55      3.083333  2.816667     1.563889            pollution not present
22   6.316667                  3.483333  3.683333  3.583333     0.797222            pollution not present
23   5.45                      3.383333  4.266667  3.825        0.555556            pollution not present
24   4.3                       3.733333  4.766667  4.25         0.130556            pollution not present
25   3.366667                  4.083333  6.183333  5.133333     -0.75278            pollution present
26   5.183333                  5.466667  5.8       5.633333     -1.25278            pollution present
27   5.233333                  6.65      3.9       5.275        -0.89444            pollution present
28   0.0                       3.866667  2.833333  3.35         1.030556            pollution not present
29   0.0                       4.966667  4.633333  4.8          -0.41944            pollution present
30   0.0                       5.383333  4.133333  2.066667     2.313889            pollution not present

THRESHOLD T = 4.380556 (Average for the month of April)

It is evident from the data shown in Table 1 that, for the given month (for example April), the difference (T - A) is non-negative for 10 days. Hence, it is concluded that air pollution is not present on those days and is present on the remaining 20 days.

Confusion Matrix
The confusion matrix has the information on actual class and predicted class. The performance of this proposed work is evaluated using the data in the matrix.

Table 2. Actual Class and Predicted Class - Illustration

                   Actual TRUE   Actual FALSE
Predicted TRUE     14            8
Predicted FALSE    4             4

Accuracy Rate:
Accuracy Rate is the proportion of the total number of predictions that are correct. It is determined by the following equation.
Classification Accuracy Rate = (TP + FN) / (TP + TN + FP + FN)
Accuracy Rate for the April Month = (14+4) / 30 = 18/30 = 0.6

Error Rate:
Error Rate is the proportion of the total number of predictions that are incorrect. It is determined by the following equation.
Classification Error Rate = (TN + FP) / (TP + TN + FP + FN)
Error Rate for the April Month = (8+4) / 30 = 12/30 = 0.4
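
The two rates can be reproduced with a short sketch. The function name `classification_rates` is an assumption made for this illustration. Note that the text labels the correctly classified cells TP and FN, whereas the more common convention writes accuracy as (TP + TN) divided by the total; to stay neutral on the labelling, the sketch simply takes the counts of correct and incorrect predictions as used in the computations above.

```python
def classification_rates(correct_counts, incorrect_counts):
    """Return (accuracy rate, error rate) from confusion-matrix cell counts."""
    correct = sum(correct_counts)
    incorrect = sum(incorrect_counts)
    total = correct + incorrect
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total, incorrect / total


# Cell counts as reported in Table 2: the accuracy computation in the text
# uses 14 and 4, the error computation uses 8 and 4.
accuracy, error = classification_rates((14, 4), (8, 4))
print(f"accuracy = {accuracy:.2f}, error = {error:.2f}")
```

Because every prediction is either correct or incorrect, the two rates always sum to 1, which gives a quick consistency check on the reported figures.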

Air pollution is dangerous for nature as well as for human beings. Prediction and remedial action are the need of the hour. In this research work, the data set chosen from the Kaggle website is first preprocessed to separate the pollutant parameters NO2, CO, SO2 and O3. The prediction of air pollution is performed in two phases. The first phase computes AQI (Air Quality Index) values for all the days in a month. The second phase computes the threshold value as the average of the AQI values of the chosen month. Air pollution for the days in the chosen month is predicted by comparing the threshold value with the average of the previous two months' values. Big data analytics are used to handle huge data volumes and Python is used to implement the computational procedures. Prediction accuracy and error rate are computed: for the illustrated month of April, the accuracy rate is 0.6 and the error rate is 0.4. Further research work is in progress to include other environmental parameters.

This article has been written with the financial support of the RUSA-Phase 2.0 grant sanctioned vide Letter No. F.24-51/2014-U, Policy (TN Multi-Gen), Dept. of Edn., Govt. of India, Dt. 09.10.2018.

[1] https://en.wikipedia.org/wiki/Air_pollution.
[2] Shweta Taneja, Dr. Nidhi Sharma, "Predicting Trends in air pollution in Delhi using data mining", 2016 IEEE.
[3] Peijiang Zhao, Koji Zettsu, "Convolution Recurrent Neural Networks Based Dynamic Transboundary Air Pollution Prediction", 2019 the 4th IEEE International Conference on Big Data Analytics.
[4] HOW can affect the human being atmospheric And environment pollution.
[5] https://en.wikipedia.org/wiki/Air_pollution.
[6] Polaiah Bojja, Vivith Kumar Karumuri, "Development and Evaluation of Pollution Forecasting Model Using Soft-Computing Methods for PM10 and SO2 in Ambient Air", IEEE WiSPNET 2016 conference.
[7] Yi-Ting Tsai, Dept. of Computer Science and Information Engineering, National Taipei University, "Air pollution forecasting using RNN with LSTM", 2018 IEEE 16th Int.
[8] Ranjana Waman Gore, "An Approach for Classification of Health Risks Based on Air Quality Levels", 978-1-5090-4264-7/17/$31.00 ©2017 IEEE.
[9] Ling Wang, "Prediction of Air Pollution Based on FCM-HMM Multi-model", Proceedings of the 35th Chinese Control Conference, July 27-29, 2016, Chengdu, China.

