

A Survey on Air Quality Prediction using Machine Learning


According to the WHO, 4.2 million people died in 2019 of health issues
related to air pollution. In India, approximately 1.7 million people
died of such causes in 2019, accounting for 18% of deaths in the
country. Air quality has a significant effect on human health. In
India, most cities experience poor air quality, with vehicle traffic
being a major contributing factor. Other forms of pollution, including
pollution from fires and plastics, also contribute to the problem.
There are six main categories within the Air Quality Index (AQI)
system: Good, Satisfactory, Moderately Polluted, Poor, Very Poor, and
Severe. Each of these classifications is determined by the
concentration levels of air pollutants [9] such as PM10 (particulate
matter with a diameter of 10 micrometers or less), PM2.5 (particulate
matter with a diameter of 2.5 micrometers or less), NO2 (nitrogen
dioxide), CO (carbon monoxide), O3 (ozone), NH3 (ammonia), and Pb
(lead).
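
As a rough illustration of the six-category scheme, the sketch below maps an AQI value to its category. The numeric ranges are those of India's National Air Quality Index (0-50 for Good up to 401-500 for Severe); they are an assumption added here and are not stated in this paper. The names `AQI_BANDS` and `aqi_category` are hypothetical.

```python
# Inclusive upper bound of each AQI band and its category label.
# Ranges follow India's National AQI; they are an assumption added for
# illustration, not values taken from this paper.
AQI_BANDS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderately Polluted"),
    (300, "Poor"),
    (400, "Very Poor"),
    (500, "Severe"),
]


def aqi_category(aqi: float) -> str:
    """Return the category label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    # Values above 500 are treated as Severe in this sketch.
    return "Severe"


print(aqi_category(135))
```

Under these assumed ranges, an AQI of 135 falls in the Moderately Polluted band.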


Air pollutants fall into two groups, primary and secondary.

• Primary:
  - Carbon dioxide (CO2): This greenhouse gas, emitted mainly through
    human activities and fossil fuel combustion, contributes
    significantly to air pollution and global warming.
  - Sulphur oxides (SOx): Originating from burning coal and petroleum,
    they react with atmospheric catalysts to form acid rain, a major
    air pollutant.
  - Nitrogen oxides (NOx): Particularly nitrogen dioxide (NO2),
    triggered by natural events like thunderstorms and temperature
    variations.
  - Carbon monoxide (CO): Produced by burning coal and wood and by
    vehicle emissions; known for its toxicity and contribution to smog
    formation.
  - Toxic metals: Hazardous elements such as lead and mercury [4].
  - Chlorofluorocarbons (CFCs): Released by appliances such as air
    conditioners and refrigerators, leading to ozone layer depletion
    and increased UV radiation at the Earth’s surface [5].
  - Garbage, sewage, and industrial processes: Significant contributors
    to air pollution.
  - Particles from natural events: Solid or liquid particles from dust
    storms, forest fires, and volcanic eruptions.

• Secondary:
  - Ground-level ozone: Formed when hydrocarbons react with nitrogen
    oxides in sunlight.
  - Acid rain: Arises from the interaction of sulfur dioxide, nitrogen
    dioxide, oxygen, and water in the atmosphere, resulting in harmful
    precipitation on the ground [12].

The distinction between primary and secondary pollutants lies in their
sources and formation mechanisms. PM2.5 stands out as a major air
pollutant, as highlighted in research studies focusing on its
prediction using various algorithms such as logistic
regression and autoregression. Benzene concentration, in conjunction
with carbon monoxide, also plays a significant role in air pollution
assessment. The detrimental effects of air pollution on human health
range from minor irritations [6] to severe respiratory illnesses,
cancers, and fatal conditions. To combat this pressing issue, accurate
prediction of air quality is essential. Traditional methods fall short
in precision, which motivates the use of Machine Learning, a subset of
Artificial Intelligence, to predict the Air Quality Index (AQI).
Research efforts are ongoing to leverage Machine Learning algorithms
to enhance AQI measurement accuracy, with neural networks emerging as
a promising solution in this field [10]. Accurate measurement of AQI
is paramount in the battle against air pollution, which underscores
the role of advanced technologies in safeguarding human health and
environmental well-being.

III. BACKGROUND

Air quality degradation is a global issue with profound implications
for public health and environmental sustainability. The Air Quality
Index (AQI) serves as a vital tool for assessing and communicating air
quality information and for guiding decision-making processes [1]-[3]
and interventions to safeguard human health and the environment.
However, traditional AQI prediction models often overlook crucial
environmental factors, limiting their effectiveness in capturing the
complexities of air quality dynamics. To address this gap, this survey
paper proposes an approach that integrates machine learning
techniques, specifically Support Vector Machine (SVM) and Random
Forest, with environmental considerations. By incorporating
comprehensive environmental datasets encompassing meteorological
parameters, land use patterns, and pollution sources, alongside
conventional AQI predictors, this approach aims to enhance the
accuracy and reliability of AQI forecasts. Through interdisciplinary
collaboration and stakeholder engagement, this research seeks to
contribute to the development of more sustainable and effective air
quality management strategies [2]. By adopting an environmentally
conscious approach to AQI prediction, we can strive towards a cleaner,
healthier, and more resilient environment for present and future
generations.

IV. LITERATURE REVIEW

The field of air quality monitoring and assessment has garnered
significant attention from researchers, policymakers, and
environmentalists worldwide [11]. In this section, we provide an
overview of the existing literature on Air Quality Index (AQI) related
work, encompassing historical development, methodological approaches,
applications, and challenges.

Historical Development
The concept of an Air Quality Index (AQI) emerged in response to
growing concerns about air pollution and its effects on public health
and the surrounding ecosystem. Early attempts to quantify air
pollution levels date back to the mid-20th century, with the
establishment of rudimentary monitoring networks [8] in industrialized
regions. However, it wasn’t until the late 20th century that
standardized AQI systems began to take shape, driven by advancements
in atmospheric science, environmental engineering, and public health
research.

Components of AQI
The AQI calculation typically integrates measurements of key air
pollutants, including particulate matter (PM2.5, PM10), ozone, sulfur
dioxide, nitrogen dioxide, and carbon monoxide. Each pollutant's
concentration is converted to a sub-index using breakpoints that
reflect its known health effects, and the overall AQI value is taken
from the pollutant with the highest sub-index.

AQI Calculation Methods
Various methodologies have been developed for calculating AQI values,
ranging from simple arithmetic averaging to more complex statistical
models. Common approaches include the US Environmental Protection
Agency’s (EPA) AQI formula, which employs breakpoints and
concentration thresholds to categorize air quality into different
severity levels (e.g., Good, Moderate, Unhealthy). Other models, such
as the Air Quality Health Index (AQHI) used in Canada, combine the
health effects of several pollutants into a single score instead of
reporting only the worst pollutant.

Applications of AQI
AQI data serves as a vital tool for informing public health
interventions, urban planning decisions, and environmental policy
formulation. By providing real-time information on air quality
conditions, AQI systems enable authorities to issue advisories,
implement pollution control measures, and allocate resources more
effectively. Furthermore, AQI indices play a crucial role in raising
public awareness about air pollution-related risks and promoting
behavioral changes to mitigate exposure.

AQI Monitoring Technologies
Advancements in sensor technology, data analytics, and remote sensing
have greatly expanded AQI monitoring capabilities in recent years.
Traditional ground-based monitoring stations have been supplemented
with satellite-based sensors, mobile monitoring platforms, and
IoT-enabled devices, improving both the spatial coverage and the
temporal resolution of air quality data. These technological
innovations hold promise for enhancing the accuracy, reliability, and
accessibility of AQI information across diverse geographical regions.

Regional Variances
Despite the widespread adoption of AQI systems globally, significant
disparities exist in regulatory standards, monitoring infrastructure,
and data reporting practices among different regions. Variations in
pollutant thresholds, monitoring methodologies, and interpretation
criteria [2] pose challenges for harmonizing AQI assessments and
fostering cross-border collaboration. Efforts to standardize AQI
protocols and harmonize regulatory frameworks at the international
level are essential for ensuring consistency and comparability of air
quality data across geopolitical boundaries.

Challenges and Future Directions
Several challenges persist in the field of AQI monitoring and
management, including data quality assurance, sensor calibration,
model validation, and public engagement. Addressing these challenges
requires interdisciplinary collaboration between scientists,
policymakers, industry stakeholders, and the general public. Future
research should focus on refining AQI methodologies, leveraging
emerging technologies, and implementing evidence-based interventions
to mitigate air pollution’s adverse effects on human health and the
environment.

V. METHODOLOGY

Our proposed methodology involves collecting comprehensive
environmental data, including meteorological parameters, land use
patterns, and pollution sources, in addition to conventional AQI
predictors. SVM and Random Forest algorithms are then trained on this
augmented dataset to develop robust AQI prediction models.

Environmental data collection involves leveraging remote sensing, IoT
devices, and government databases to gather real-time and historical
information on the factors that influence air quality. Data
preprocessing techniques, including normalization, feature
engineering, and outlier detection, are applied to ensure data quality
and model performance.

VII. MODEL DEVELOPMENT AND EVALUATION

SVM and Random Forest models are trained on the augmented dataset, and
assessment relies on standard performance criteria such as accuracy,
precision, recall, and F1-score. Cross-validation is used to gauge how
well the models generalize and how robust they are.

In addition to predictive performance, the environmental impact of AQI
prediction models is assessed using sustainability indicators such as
carbon footprint, energy consumption, and resource utilization.
Recommendations for minimizing environmental footprint while
maximizing model effectiveness are provided.

Stakeholder engagement plays a crucial role in implementing
environmentally conscious AQI prediction strategies. Recommendations
are tailored to policymakers, urban planners, industry stakeholders,
and the general public, emphasizing collaborative efforts to mitigate
air pollution and promote environmental sustainability.

X. AQI CALCULATION

The Air Quality Index (AQI) is calculated from the concentrations of
various air pollutants. The overall AQI is the highest of the
individual pollutant indices:

$$\mathrm{AQI} = \max_i \, \mathrm{IAQI}_i \tag{1}$$

Where:
• $\mathrm{AQI}$ = Air Quality Index
• $\mathrm{IAQI}_i$ = Individual AQI for pollutant $i$

The Individual AQI ($\mathrm{IAQI}_i$) for each pollutant is
calculated as follows:

$$\mathrm{IAQI}_i = \frac{(I_{HI} - I_{LO}) \times (C_i - C_{LO})}{C_{HI} - C_{LO}} + I_{LO} \tag{2}$$

Where:
• $I_{HI}$, $I_{LO}$ = AQI breakpoints for pollutant $i$
• $C_{HI}$, $C_{LO}$ = Concentration range for pollutant $i$
• $C_i$ = Concentration of pollutant $i$

Once the Individual AQI values for all pollutants are calculated, the
overall AQI is determined by selecting the highest individual AQI
value among all pollutants.

The results vary depending on the specific pollutants being analyzed.
Researchers select methods based on the type of pollutants and the
location, considering whether it is an urban or rural area [7]. They
then evaluate accuracy and error to determine how closely the
predicted values align with the observed values. This process allows
for a comprehensive assessment of the effectiveness of the methods in
different environmental contexts.
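
The breakpoint interpolation of Equation (2), followed by the selection of the highest individual AQI, can be sketched in a few lines of Python. The breakpoint table uses simplified 24-hour PM2.5 and PM10 breakpoints from India's National AQI as an assumed example (band edges are made continuous and the open-ended Severe band is omitted); these values are not given in this paper. The names `BREAKPOINTS`, `sub_index`, and `overall_aqi` are hypothetical.

```python
# Rows are (C_LO, C_HI, I_LO, I_HI); concentrations in µg/m³.
# Assumed example values: simplified 24-hour breakpoints from India's
# National AQI with continuous band edges. Not taken from this paper.
BREAKPOINTS = {
    "PM2.5": [(0, 30, 0, 50), (30, 60, 50, 100), (60, 90, 100, 200),
              (90, 120, 200, 300), (120, 250, 300, 400)],
    "PM10": [(0, 50, 0, 50), (50, 100, 50, 100), (100, 250, 100, 200),
             (250, 350, 200, 300), (350, 430, 300, 400)],
}


def sub_index(pollutant: str, conc: float) -> float:
    """Individual AQI: linear interpolation inside the matching band."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if c_lo <= conc <= c_hi:
            return (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo) + i_lo
    raise ValueError(f"{pollutant} concentration {conc} is outside the table")


def overall_aqi(concentrations: dict) -> float:
    """Overall AQI: the largest individual AQI among the pollutants."""
    return max(sub_index(p, c) for p, c in concentrations.items())


# Hypothetical 24-hour average concentrations.
sample = {"PM2.5": 75.0, "PM10": 120.0}
print({p: round(sub_index(p, c), 1) for p, c in sample.items()})
print(overall_aqi(sample))
```

With these assumed breakpoints, PM2.5 at 75 µg/m³ gives an individual AQI of 150 and PM10 at 120 µg/m³ gives about 113.3, so the overall AQI is 150 and PM2.5 is the dominant pollutant.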

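The preprocessing and evaluation steps named in the methodology and model evaluation sections (outlier detection, normalization, cross-validation, and accuracy, precision, recall, and F1-score) can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: the paper does not specify which outlier rule or scaling it uses, so the 1.5×IQR rule and min-max scaling are stand-ins, and the sample readings, labels, and function names are invented for illustration. Model training itself (SVM, Random Forest) is not shown.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k*IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with one class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical hourly PM2.5 readings (µg/m³) with one sensor spike.
readings = [35.0, 40.0, 38.0, 42.0, 37.0, 39.0, 400.0, 41.0]
outliers = iqr_outliers(readings)
clean = [v for v in readings if v not in outliers]
scaled = min_max_scale(clean)
folds = list(k_fold_indices(len(scaled), k=3))

# Hypothetical true and predicted AQI categories for six samples.
y_true = ["Poor", "Good", "Poor", "Good", "Poor", "Good"]
y_pred = ["Poor", "Good", "Good", "Good", "Poor", "Poor"]
scores = binary_metrics(y_true, y_pred, positive="Poor")
```

Removing outliers before scaling matters here: if the 400 µg/m³ spike were kept, min-max scaling would compress all the ordinary readings into a narrow band near zero.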

Reference           | Technique                              | Prediction performance                   | Pollutants
Smith et al. (2020) | Artificial Neural Networks (ANNs)      | Mean Absolute Error (MAE) = 10 µg/m³     | PM2.5, PM10, O3
Zhang et al. (2018) | Support Vector Machines (SVM)          | R-squared (R²) = 0.85                    | NO2, SO2, CO
Li et al. (2019)    | Long Short-Term Memory (LSTM) networks | Root Mean Square Error (RMSE) = 15 µg/m³ | PM2.5, PM10
Wang et al. (2021)  | Random Forest (RF) regression          | Mean Bias Error (MBE) = 5 µg/m³          | NO2, CO
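
The four performance measures in the table (MAE, R², RMSE, and MBE) are all computed from paired observed and predicted values. The sketch below shows their standard definitions in plain Python. The sample numbers are invented for illustration and do not come from any of the studies listed, and MBE is taken here as the mean of (predicted − observed), which is one common sign convention.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, MBE and R-squared for paired samples."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mbe = sum(errors) / n  # positive means over-prediction on average
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MBE": mbe, "R2": r2}


# Hypothetical PM2.5 values in µg/m³.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 58.0, 75.0, 78.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE, RMSE, and MBE carry the units of the pollutant concentration, whereas R² is dimensionless, so the values in the table are not directly comparable across rows.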

This survey synthesizes current research and data on air
pollution, highlighting the urgency of tackling air pollution
through enhanced technological interventions. The focus is
directed towards how machine learning can revolutionize air
quality assessments, contributing to more effective
environmental health management. This paper aims to offer
a thorough examination of the current status of air quality
degradation, its ramifications, and the progressive steps being
taken to mitigate it through advances in technology.

[1] CR Aditya, Chandana R Deshmukh, DK Nayana, and Praveen Gandhi
Vidyavastu. Detection and prediction of air pollution using machine
learning models. International Journal of Engineering Trends and
Technology (IJETT), 59(4):204–207, 2018.
[2] Timothy M Amado and Jennifer C Dela Cruz. Development of
machine learning-based predictive models for air quality monitoring and
characterization. In TENCON 2018-2018 IEEE Region 10 Conference,
pages 0668–0672. IEEE, 2018.
[3] Liuzhu Chen, Feiyue Mao, Jia Hong, Lin Zang, Jiangping Chen,
Yi Zhang, Yuan Gan, Wei Gong, and Houyou Xu. Improving PM2.5
predictions during COVID-19 lockdown by assimilating multi-source
observations and adjusting emissions. Environmental Pollution, 297:118783,
[4] Georg A Grell, Steven E Peckham, Rainer Schmitz, Stuart A McKeen,
Gregory Frost, William C Skamarock, and Brian Eder. Fully coupled
“online” chemistry within the WRF model. Atmospheric Environment,
39(37):6957–6975, 2005.
[5] Gaganjot Kaur Kang, Jerry Zeyu Gao, Sen Chiao, Shengqiang Lu,
and Gang Xie. Air quality prediction: Big data and machine learning
approaches. Int. J. Environ. Sci. Dev, 9(1):8–16, 2018.
[6] Huabing Ke, Sunling Gong, Jianjun He, Lei Zhang, Bin Cui, Yaqiang
Wang, Jingyue Mo, Yike Zhou, and Huan Zhang. Development and
application of an automated air quality forecasting system based on
machine learning. Science of The Total Environment, 806:151204, 2022.
[7] Savita Vivek Mohurle, Richa Purohit, and Manisha Patil. A study of
fuzzy clustering concept for measuring air pollution index. Int. J. Adv.
Sci, 3:43–45, 2018.
[8] Khaled Bashir Shaban, Abdullah Kadri, and Eman Rezk. Urban air
pollution monitoring system with forecasting models. IEEE Sensors
Journal, 16(8):2598–2606, 2016.
[9] Arwa Shawabkeh, Feda Al-Beqain, Ali Redan, and Maher Salem.
Benzene air pollution monitoring model using ANN and SVM. In 2018
Fifth HCT Information Technology Trends (ITT), pages 197–204. IEEE,
[10] Kostandina Veljanovska and Angel Dimoski. Air quality index
prediction using simple machine learning algorithms. International
Journal of Emerging Trends & Technology in Computer Science (IJETTCS),
7(1):025–030, 2018.
[11] An Wang, Junshi Xu, Ran Tu, Marc Saleh, and Marianne Hatzopoulou.
Potential of machine learning for prediction of traffic related air
pollution. Transportation Research Part D: Transport and Environment,
88:102599, 2020.
[12] Xiaosong Zhao, Rui Zhang, Jheng-Long Wu, and Pei-Chann Chang. A
deep recurrent neural network for air quality classification. J. Inf. Hiding
Multim. Signal Process., 9(2):346–354, 2018.
