Air Quality Prediction

AIR QUALITY PREDICTION USING MACHINE
LEARNING ALGORITHMS
ABSTRACT
 Predicting air quality is a necessary step for governments to take, as air pollution is becoming a major concern for human health. The Air Quality Index (AQI) measures the quality of air.
 Air pollutants such as carbon dioxide, nitrogen dioxide and carbon monoxide are released by burning natural gas, coal and wood, and by industries, vehicles and other sources.
 Air pollution can cause severe diseases such as lung cancer and brain disease, and can even lead to death. Machine learning algorithms help in determining the air quality index.
 Much research has been done in this field, but the results are still not accurate. Datasets are available from Kaggle and from air quality monitoring sites, and they are divided into training and testing sets.
 The machine learning algorithms employed for this are Linear Regression, Decision Tree, Random Forest, Artificial Neural Network and Support Vector Machine.
INTRODUCTION
We forecast air quality in India by using machine learning to predict the air quality index (AQI) of a given area. India's AQI is a standard measure of pollutant levels (SO2, NO2, RSPM, SPM, etc.) over a period. We developed a model that predicts the AQI for a particular upcoming year from historical data of previous years, treating the task as a gradient descent boosted multivariable regression problem. We improve the efficiency of the model by applying cost estimation to the prediction problem. Given historical pollutant concentration data, the model can predict the AQI of an entire country, any state or any other bounded region. By implementing the proposed parameter-reducing formulations, we achieved better performance with the standard regression models.
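The AQI itself is usually derived from pollutant concentrations rather than measured directly. A common scheme, used by India's National AQI, converts each pollutant's concentration C into a sub-index by linear interpolation, I = I_lo + (I_hi - I_lo) * (C - C_lo) / (C_hi - C_lo), and reports the largest sub-index as the AQI. The sketch below illustrates that idea. The breakpoint values are simplified, gap-free approximations of the Indian bands and are an assumption to check against the official CPCB table; the function names are hypothetical.

```python
# Simplified breakpoint tables: (C_lo, C_hi, I_lo, I_hi) per band.
# ASSUMPTION: values approximate India's National AQI bands (24-hour
# averages, ug/m3) with the gaps between bands closed; verify before use.
BREAKPOINTS = {
    "so2": [(0, 40, 0, 50), (40, 80, 50, 100), (80, 380, 100, 200),
            (380, 800, 200, 300), (800, 1600, 300, 400)],
    "no2": [(0, 40, 0, 50), (40, 80, 50, 100), (80, 180, 100, 200),
            (180, 280, 200, 300), (280, 400, 300, 400)],
    "rspm": [(0, 50, 0, 50), (50, 100, 50, 100), (100, 250, 100, 200),
             (250, 350, 200, 300), (350, 430, 300, 400)],
}


def sub_index(pollutant, concentration):
    """Linearly interpolate a concentration onto the AQI scale."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if concentration <= c_hi:
            return i_lo + (i_hi - i_lo) * (concentration - c_lo) / (c_hi - c_lo)
    # Above the last tabulated band: clip to the top of the scale
    # (a simplifying assumption for this sketch).
    return 500.0


def compute_aqi(concentrations):
    """Return (AQI, dominant pollutant) for a dict of pollutant -> concentration."""
    subs = {p: sub_index(p, c) for p, c in concentrations.items()}
    dominant = max(subs, key=subs.get)
    return subs[dominant], dominant


aqi, dominant = compute_aqi({"so2": 20, "no2": 60, "rspm": 150})
print(f"AQI = {aqi:.1f}, dominant pollutant = {dominant}")  # AQI = 133.3, dominant pollutant = rspm
```

Taking the maximum sub-index also names the dominant pollutant, which is the quantity needed to trace back the major pollution-causing pollutant.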

OBJECTIVE
 The AQI of upcoming years can be predicted from present AQI values.
 By predicting the AQI, we can trace back the major pollution-causing pollutant and the locations across India most seriously affected by it.
 With this forecasting model, knowledge is extracted from the data using various techniques to identify heavily affected regions (clusters).
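Identifying heavily affected regions can be framed as clustering regions by their average pollutant profile. The sketch below uses k-means from scikit-learn as one illustrative technique (the report does not name the clustering method); the states "A" to "D", the readings and the rule "the cluster with the highest mean RSPM is heavily affected" are all hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy readings; states "A"-"D" are placeholders.
readings = pd.DataFrame({
    "state": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "so2": [10, 12, 11, 9, 40, 42, 38, 41],
    "no2": [20, 22, 19, 21, 80, 85, 78, 82],
    "rspm": [60, 65, 58, 62, 250, 260, 240, 255],
})


def cluster_regions(df, pollutants=("so2", "no2", "rspm"), n_clusters=2, seed=0):
    """Group regions by mean pollutant profile and flag the worst cluster."""
    means = df.groupby("state")[list(pollutants)].mean()
    # Standardise so no single pollutant dominates the distance metric
    scaled = StandardScaler().fit_transform(means)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    means["cluster"] = km.fit_predict(scaled)
    # Assumed rule: the cluster with the highest average RSPM is "heavily affected"
    worst = means.groupby("cluster")["rspm"].mean().idxmax()
    means["heavily_affected"] = means["cluster"] == worst
    return means


result = cluster_regions(readings)
print(result)
```

Standardising before k-means matters here because RSPM values are an order of magnitude larger than SO2 values and would otherwise dominate the distances.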
EXISTING SYSTEM:
The existing system predicts the ozone level depending upon temperature, humidity, wind speed and wind direction. The machine learning algorithms used are MLP, XGBoost, SVR and DTR. The dataset is selected according to area, and preprocessing then removes any fluctuations in the data using Holt-Winters, moving average and Savitzky-Golay smoothing; Savitzky-Golay was observed to give the best preprocessing results.
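Two of the smoothing methods named above can be compared on a synthetic series, as in the sketch below. It uses `scipy.signal.savgol_filter` and a pandas centred rolling mean; the seasonal "ozone-like" signal, the noise level and the window sizes are assumptions chosen for illustration, and Holt-Winters is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

# Hypothetical daily ozone-like series: a seasonal curve plus random noise.
rng = np.random.default_rng(42)
days = np.arange(365)
true_signal = 40 + 15 * np.sin(2 * np.pi * days / 365)
noisy = true_signal + rng.normal(0, 5, size=days.size)


def moving_average(x, window=7):
    """Centred moving average; shorter windows are used at the edges."""
    return pd.Series(x).rolling(window, center=True, min_periods=1).mean().to_numpy()


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


smoothed_ma = moving_average(noisy)
# Fit a quadratic polynomial in each 15-sample window
smoothed_sg = savgol_filter(noisy, window_length=15, polyorder=2)

for name, series in [("raw", noisy), ("moving average", smoothed_ma),
                     ("Savitzky-Golay", smoothed_sg)]:
    print(f"{name:15s} RMSE vs. true signal: {rmse(series, true_signal):.2f}")
```

Because Savitzky-Golay fits a local polynomial instead of a flat average, it tends to preserve peaks and slopes better for a given window size, which is one plausible reason it performed best in the existing system.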
Feature selection is then performed using forward wrapper feature selection, because some features are irrelevant and should not be taken into account when predicting the pollutant level in the air. The system predicts ozone levels on a day-to-day basis using MLP.
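Forward wrapper selection starts from an empty feature set and repeatedly adds the feature that most improves a model's cross-validated score. scikit-learn provides this as `SequentialFeatureSelector`; the sketch below uses it with linear regression on synthetic data where, by construction, only temperature and wind speed drive the target. The feature names, data and choice of estimator are hypothetical, since the existing system's exact setup is not specified.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
feature_names = ["temperature", "humidity", "wind_speed", "random_noise"]
X = rng.normal(size=(n, 4))
# Synthetic target that depends only on temperature and wind speed
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Forward selection: add one feature at a time, scored by 5-fold CV R^2
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)
selected = [name for name, keep in zip(feature_names, sfs.get_support()) if keep]
print("Selected features:", selected)
```

Because each candidate feature is judged by actually training the model, wrapper methods are more expensive than filter methods but account for how features interact with the chosen estimator.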
DISADVANTAGES OF EXISTING SYSTEM:
 The machine learning algorithms described above are run and compared on the coefficient of determination, root mean square error and mean absolute error, and MLP comes out as the superior model.
 The result is not clear.
 Accuracy is low.
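The three comparison metrics can be computed with scikit-learn as follows. RMSE is taken as the square root of the mean squared error; the helper name `regression_report` and the sample AQI values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Coefficient of determination, root mean square error and mean absolute error."""
    return {
        "r2": r2_score(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": mean_absolute_error(y_true, y_pred),
    }


# Hypothetical observed vs. predicted AQI values
report = regression_report([50, 80, 120, 200], [55, 75, 130, 190])
print(report)
```

Here the absolute errors are 5, 5, 10 and 10, so MAE is 7.5 and RMSE is the square root of 62.5; RMSE penalises the larger errors more heavily than MAE does.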
PROPOSED SYSTEM:
 The datasets entering the data processing phase go through steps such as finding the data shape and data types and eliminating null values.
 The output of data processing is a proper dataset with correct values and no repeated values.
 The data then enters the training and testing stage.
 The model is trained on the training data and evaluated on the test data. Several machine learning algorithms are trained in this way and compared to find the one with the best accuracy.
 Supervised machine learning algorithms are used, and the final output is displayed in a GUI.
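Comparing several supervised algorithms on the same train/test split can be sketched with scikit-learn as below. The synthetic `make_regression` data stands in for the cleaned pollutant table, the hyperparameters are illustrative defaults, and R^2 on the test set is used as the "accuracy" score because AQI is a continuous target; all of these are assumptions rather than the report's exact setup.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned pollutant table (an assumption)
X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR and MLP are sensitive to feature scale, so they get a scaler
    "Support Vector Machine": make_pipeline(StandardScaler(), SVR()),
    "Artificial Neural Network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} R^2 = {score:.3f}")
print("Best model:", best)
```

Fixing `random_state` everywhere keeps the comparison repeatable, so differences in score come from the algorithms rather than from a different random split.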
ADVANTAGES OF PROPOSED SYSTEM:
 Provides an index for reporting air quality on a daily basis.
 The index measures how air pollution affects one's health within a short time period.
 Accurate in predicting air pollution.

LITERATURE REVIEW
S.No | Author | Title | Year | Algorithm / Description | Advantages | Disadvantages
1 | E. Kalapanidas, N. Avouris | Applying machine learning techniques in air quality prediction | Oct. 2021 | Artificial neural network | Provides above 70% accuracy | Not a web-based application
2 | R. Yu, Y. Yang, L. Yang | A random forest approach for predicting air quality | May 2018 | Random forest model (RAQ) proposed for urban sensing systems | Web-based application | Low accuracy
3 | S. Deleawe, J. Kusznir, B. | Predicting air quality in smart environment | Oct. 2010 | Decision tree model | Improves the accuracy of water quality prediction | Pre-processing techniques not satisfactory
4 | M. Vong, J. Y. Yang | Support vector prediction for daily atmospheric pollutant level | May 2018 | Support Vector Machine model | Techniques are very strong | Low accuracy
FLOW DIAGRAM:
BLOCK DIAGRAM:
DATASET COLLECTION
 Collecting data captures a record of past events, so that data analysis can be used to find recurring patterns. From those patterns, predictive models are built using machine learning algorithms that look for trends and predict future changes.
 Predictive models are only as good as the data from which they are built, so good data collection practices are crucial to developing high-performing models.
 The data needs to be error-free (garbage in, garbage out) and contain information relevant to the task at hand. For example, a loan default model would not benefit from tiger population sizes but could benefit from gas prices over time. In this module, we collect the data from Kaggle dataset archives. This dataset contains air pollutant concentration data from previous years.
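Once the Kaggle file is downloaded, loading it and checking its shape, column types and missing values takes a few lines of pandas. In the sketch below, `RAW_CSV` is a tiny hypothetical stand-in for the real file, and its column names are an assumption modelled on the pollutants named in this report (so2, no2, rspm, spm); `load_air_quality` is a hypothetical helper.

```python
import io

import pandas as pd

# Stand-in for the downloaded Kaggle CSV. The column names are an ASSUMPTION
# modelled on the pollutants named in this report (so2, no2, rspm, spm).
RAW_CSV = """state,location,date,so2,no2,rspm,spm
Delhi,Delhi,2014-01-01,5.0,60.0,250.0,
Delhi,Delhi,2014-01-02,6.0,58.0,240.0,
Kerala,Kochi,2014-01-01,2.0,15.0,,40.0
"""


def load_air_quality(source):
    """Read the raw file and report its shape, column types and missing values."""
    df = pd.read_csv(source, parse_dates=["date"])
    summary = {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }
    return df, summary


df, summary = load_air_quality(io.StringIO(RAW_CSV))
print(summary["shape"])  # (3, 7)
print(summary["missing"])
```

With the real dataset, `source` would be the path to the downloaded CSV instead of the in-memory `io.StringIO` buffer.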
DATA CLEANING
 Data cleaning is a critically important step in any machine learning project.

 In this module data cleaning is done to prepare the data for analysis by removing or modifying
the data that may be incorrect, incomplete, duplicated or improperly formatted.
 In tabular data, there are many statistical analysis and data visualization techniques you can use to explore the data and identify the data cleaning operations you may want to perform.
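The cleaning operations listed above (fixing improperly formatted values, removing duplicates, handling incomplete rows) can be sketched with pandas. The column names, the toy rows and the median-fill strategy for remaining gaps are assumptions for illustration; other imputation strategies may suit the real data better. `clean_air_quality` is a hypothetical helper.

```python
import pandas as pd

POLLUTANTS = ["so2", "no2", "rspm"]  # assumed pollutant columns


def clean_air_quality(df):
    """Fix formatting, drop duplicates and handle missing pollutant values."""
    out = df.copy()
    # Improperly formatted text: stray spaces and inconsistent capitalisation
    out["state"] = out["state"].str.strip().str.title()
    # Non-numeric placeholders such as "--" become NaN
    for col in POLLUTANTS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.drop_duplicates()
    # Rows with no pollutant reading at all carry no information
    out = out.dropna(subset=POLLUTANTS, how="all").copy()
    # Remaining gaps: fill with the column median (one simple, assumed strategy)
    out[POLLUTANTS] = out[POLLUTANTS].fillna(out[POLLUTANTS].median())
    return out.reset_index(drop=True)


# Hypothetical raw rows: a duplicate, a badly formatted state name,
# a "--" placeholder and a row with no readings at all
raw = pd.DataFrame({
    "state": ["Delhi", "Delhi", " kerala ", "Goa"],
    "so2": ["5", "5", "2", None],
    "no2": [60, 60, "--", None],
    "rspm": [250, 250, 80, None],
})
cleaned = clean_air_quality(raw)
print(cleaned)
```

Normalising formats before calling `drop_duplicates` matters: rows that differ only in spacing or number formatting would otherwise not be recognised as duplicates.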
FEATURE EXTRACTION:
 This is done to reduce the number of attributes in the dataset, which can speed up training and improve accuracy.
 In machine learning, pattern recognition and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps and, in some cases, leading to better human interpretations. Feature extraction is related to dimensionality reduction.
 When the input data to an algorithm is too large to be processed and it is suspected to be redundant (e.g. the same
measurement in both feet and meters, or the repetitiveness of images presented as pixels), then it can be
transformed into a reduced set of features (also named a feature vector).
 Determining a subset of the initial features is called feature selection. The selected features are expected to
contain the relevant information from the input data, so that the desired task can be performed by using this
reduced representation instead of the complete initial data.
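The feet-and-meters example above can be made concrete with principal component analysis (PCA), one common dimensionality-reduction technique. PCA is an illustrative choice here, since the report does not name a specific method, and the feature names and data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
distance_m = rng.uniform(1, 100, size=50)   # hypothetical distance to nearest road (metres)
distance_ft = distance_m * 3.28084          # the same measurement in feet (redundant)
humidity = rng.uniform(20, 90, size=50)     # an independent hypothetical feature
X = np.column_stack([distance_m, distance_ft, humidity])

# Standardise, then project the 3 features onto 2 principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)
```

Because the metre and foot columns carry the same information, two components retain essentially all of the variance of the three original features.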
MODEL TRAINING
 The training dataset is used to train an ML algorithm. It consists of sample output data and the corresponding sets of input data that influence the output.
 During training, the input data is run through the algorithm and the processed output is compared against the sample output. The result of this comparison is used to modify the model.
 This iterative process is called “model fitting”. The quality of the training and validation datasets is critical for the precision of the model.
 Model training in machine learning is the process of feeding an ML algorithm with data to help it identify and learn good values for all the parameters involved.
 There are several types of machine learning, of which the most common are supervised and unsupervised learning.
 In this module we use supervised regression algorithms such as linear regression to train the model on the cleaned dataset after dimensionality reduction.
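The iterative fitting loop described above can be illustrated with multivariable linear regression trained by batch gradient descent. This NumPy sketch assumes that the "gradient descent" and "cost estimation" mentioned in the introduction refer to minimising a mean-squared-error cost J(w, b) = (1/2m) * sum((Xw + b - y)^2); the function name `fit_linear_gd`, the learning rate and the iteration count are illustrative choices, not values from the report.

```python
import numpy as np


def fit_linear_gd(X, y, lr=0.1, n_iters=2000):
    """Fit y ~ X @ w + b by batch gradient descent on the cost
    J(w, b) = (1 / 2m) * sum((X @ w + b - y) ** 2)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    history = []
    for _ in range(n_iters):
        error = X @ w + b - y                     # residuals for current parameters
        history.append(error @ error / (2 * m))   # cost before this update
        w -= lr * (X.T @ error) / m               # gradient of J with respect to w
        b -= lr * error.mean()                    # gradient of J with respect to b
    return w, b, history


# Synthetic, noise-free data with known coefficients (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0
w, b, history = fit_linear_gd(X, y)
print(np.round(w, 3), round(b, 3))
```

Each pass compares the model's output with the sample output (the residuals) and uses that comparison to adjust the parameters. Gradient descent converges fastest when features are on similar scales, so standardising pollutant features first is usually advisable.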
TESTING MODEL:
 In this module we test the trained machine learning model using the test dataset.
 Quality assurance is required to make sure that the software system works according to the requirements. Were all the features implemented as agreed? Does the program behave as expected? All the parameters that you test the program against should be stated in the technical specification document.
 Moreover, software testing can reveal defects and flaws during development. You don't want your clients to encounter bugs after the software is released and come to you waving their fists. Different kinds of testing allow us to catch bugs that are visible only at runtime.
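Because the goal is to forecast an upcoming year from earlier years, a time-aware split is one reasonable way to build the test dataset: train on all years before a chosen year and test on that year alone, so the model never sees the test year during training. The helper `split_by_year` and the sample rows are hypothetical.

```python
import pandas as pd


def split_by_year(df, test_year, date_col="date"):
    """Train on all years before `test_year`; test on `test_year` only."""
    years = pd.to_datetime(df[date_col]).dt.year
    train = df[years < test_year]
    test = df[years == test_year]
    return train, test


# Hypothetical readings spanning three years
data = pd.DataFrame({
    "date": ["2013-01-01", "2013-06-01", "2014-01-01",
             "2014-06-01", "2015-01-01", "2015-06-01"],
    "rspm": [180.0, 150.0, 190.0, 160.0, 200.0, 170.0],
})
train, test = split_by_year(data, test_year=2015)
print(len(train), len(test))  # 4 2
```

A random split would mix future and past readings in the training set, which tends to overstate how well the model forecasts a genuinely unseen year.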
PERFORMANCE EVALUATION
 In this module, we evaluate the performance of the trained machine learning model using evaluation criteria such as F1 score, accuracy and classification error.
 If the model performs poorly, we optimize the machine learning algorithms to improve its performance.
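F1 score, accuracy and classification error are classification metrics, while the AQI is a continuous value. One way to apply them, assumed here rather than taken from the report, is to map actual and predicted AQI values onto AQI categories first. The band edges below follow India's National AQI categories (Good up to 50, Satisfactory up to 100, Moderate up to 200, Poor up to 300, Very Poor up to 400, Severe above); the function name and sample values are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Upper edges of India's National AQI categories (assumed for this sketch)
BAND_EDGES = [50, 100, 200, 300, 400]
CATEGORIES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def to_category(aqi_values):
    """Map continuous AQI values to category labels (50 -> Good, 51 -> Satisfactory)."""
    # right=True puts a value equal to an edge in the lower category
    idx = np.digitize(aqi_values, BAND_EDGES, right=True)
    return [CATEGORIES[i] for i in idx]


# Hypothetical actual and predicted AQI values
actual = to_category([45, 90, 150, 320, 420])
predicted = to_category([55, 85, 160, 290, 410])

accuracy = accuracy_score(actual, predicted)
f1 = f1_score(actual, predicted, average="macro", zero_division=0)
print(f"accuracy = {accuracy:.2f}, classification error = {1 - accuracy:.2f}, "
      f"macro F1 = {f1:.2f}")
```

Macro averaging weights every category equally, so poor performance on rare categories such as "Severe" is not hidden by good performance on common ones.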
SOFTWARE REQUIREMENTS
 Operating system : Windows 10.
 Coding Language : Python

HARDWARE REQUIREMENTS
 System : Pentium i3 Processor.
 Hard Disk : 500 GB.
 Monitor : 15’’ LED
 Input Devices : Keyboard, Mouse
 RAM : 2 GB
REFERENCE PAPERS:
[1] Dixian Zhu, Changjie Cai, Tianbao Yang and Xun Zhou, "A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization," Big Data and Cognitive Computing, Big Data Cogn. Comput. 2018, 2, 5; doi:10.3390/bdcc2010005.
[2] J. He, S. Gong, Y. Yu, L. Yu, L. Wu, H. Mao, C. Song, S. Zhao, H. Liu, X. Li et al., "Air pollution characteristics and their relation to meteorological conditions during 2014–2015 in major Chinese cities," Environmental Pollution, vol. 223, pp. 484–496, 2017.
[3] Sachit Mahajan, Ling-Jyh Chen and Tzu-Chieh Tsai, "An Empirical Study of PM2.5 Forecasting Using Neural Network," IEEE SmartWorld Congress, San Francisco, USA, 2017.
[4] E. Kalapanidas and N. Avouris, "Applying machine learning techniques in air quality prediction," in Proc. ACAI, vol. 99, September 2017.
[5] Tragos, E. Z., Angelakis, V., Fragkiadakis, A., Gundlegard, D., Nechifor, C. S., Oikonomou, G., Gavras, A.
