Water Potability Prediction Using Neural Network


Kalpana A V, Mugesh Raj, Godfrey Ashwanth


Department of Data Science and Business System, School of Computing, SRM Institute of Science and Technology, Chengalpet

KEYWORDS
CNN, LSTM, GRU

ABSTRACT
Water is an essential part of our planet and of all living things on it. Water covers about 70% of
the Earth's surface, and the remaining 30% is land. The quality of water depends largely on its
sources, whether natural or man-made: lakes, rivers, ponds, etc. It is important that the quality of
water be constantly monitored in order to ensure an ample supply. Water is considered good only when
it is potable (drinkable). Chemicals present in stagnant water can make the water hard and, as a
result, cause health problems not just for human beings but also for other living organisms. This
paper discusses a water quality prediction model that uses different deep learning models.

1 Introduction

Potable water is one of the most valued resources in the environment, as it is a basic need for every living thing on the planet.
There are multiple ways to check whether water is potable; in this paper we look at how deep learning models can be used to predict
the potability of water. Water quality is threatened by various factors such as pollution, climate change, population growth, and
industrialization. Therefore, it is important to monitor and predict water quality to ensure its safety and suitability for different purposes. Water
quality prediction is a challenging task that involves analyzing multiple parameters and their interactions over time and space. Traditional
methods of water quality prediction are often based on physical models that require extensive data collection and calibration, which may not
be feasible or accurate in many situations. Data-driven models based on deep learning can often address these issues more effectively
than traditional methods. Solutions for monitoring environmental pollutants are also becoming more advanced with innovation in sensor and
telecommunications technologies. One of the challenges in machine learning is that many existing methods do not take into account
missing values or unbalanced data points. In recent years, machine learning and deep learning techniques have emerged as powerful tools for
water quality prediction. These techniques can learn from data and capture complex nonlinear relationships among water quality parameters
without relying on prior assumptions or domain knowledge. Moreover, these techniques can handle large-scale and high-dimensional data
efficiently. Some of the machine learning and deep learning techniques that have been applied to water quality prediction
include support vector machines (SVM), decision trees (DT), random forests (RF), gradient boosting (GB), AdaBoost, artificial neural
networks (ANN), long short-term memory (LSTM), and auto deep learning (AutoDL). This paper discusses a water quality prediction model
that uses three different types of neural network: a long short-term memory (LSTM) network, a convolutional neural network (CNN), and a Gated
Recurrent Unit (GRU). These are then combined, and the resulting models are compared to assess the effectiveness of each.

The goal of this research is to show that neural networks can produce results as good as those of traditional machine learning algorithms, and that
LSTM layers are suited not only to time series data but also to discrete numerical data.

2 Related Works

Water potability prediction has advanced enormously, from traditional methods to machine learning
and neural networks. Recently, machine learning algorithms such as K Nearest Neighbours (KNN) [1]
have shown promising results, and the XGBoost classifier and random forest classifier have shown
even more promising results.

There was also a study which concluded that neural networks are not an ideal fit for this problem
and that the highest recorded accuracy was produced by the XGBoost classifier.

Some other works imputed missing values with the mean of each column and built a machine learning
model to predict on the imputed dataset.

Several studies have explored the use of machine learning techniques to predict water potability.
One study used a machine learning-based model that included techniques such as the Synthetic
Minority Oversampling Technique (SMOTE) and explainable AI (XAI) to predict water potability. The
model was trained on the Water Quality Index dataset available on Kaggle and used various
machine learning approaches such as Support Vector Machine (SVM), Decision Tree (DT), Random Forest,
Gradient Boost, and AdaBoost for water quality classification. The study found that Random Forest
and Gradient Boost gave the highest accuracy, 81%.

Another study used an artificial neural network to forecast water quality parameters for irrigation
purposes. The study analyzed the water quality of the Ele River, Nnewi, Anambra State, and aimed to
predict a one-year water quality index using an artificial neural network. The model was used to
predict four water quality parameters, namely pH, Total Dissolved Solids (TDS), Electrical
Conductivity (EC), and Sodium (Na), at four different locations. The study found that the artificial
neural network modeled the actual water quality data very well, with good predictions.

Other studies have also explored the use of neural networks for predicting water quality. For
example, one study used a radial-basis-function (RBF) neural network for prediction in various water
environments because of its simple structure, fast training speed, and ability to approximate
arbitrary functions globally with arbitrary precision.

These studies demonstrate the potential of machine learning techniques, including neural networks,
to predict water potability. Further research in this area could build upon these findings to
develop more accurate and reliable models for predicting water potability.

3 Methodology

Preprocessing

Preprocessing is an important step in data analysis. Preprocessing the dataset carefully with
existing methods can improve the quality of the dataset and make the resulting model more accurate
and reliable. Here we have imputed the missing values with the mean of each column. Although this
introduces bias into the data, we found that it performed better than the alternatives for handling
missing values, such as using the median or mode instead of the mean.

Also, upon analysing the dataset further, it was found that the dataset was unbalanced: the number
of non-potable samples (1998) was far greater than the number of potable samples (1278). So, we
upsampled the potable samples to make the dataset balanced. This improved the accuracy of the model.

Model Building

We have used deep learning algorithms in this project as they can give a better understanding of the
sample dataset used here. To get started, we tested the samples with Rectified Linear Unit (ReLU)
activations, Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM).
Running the samples through these techniques gave an accuracy of 76%. To increase the model's
accuracy, we then combined two different layer types, convolutional (CNN) and LSTM, in a single
model; this hybrid model performed better and showed a marked increase in accuracy. We then added a
Gated Recurrent Unit (GRU) layer in addition to the convolutional and LSTM layers. This raised the
accuracy to 80.8%, the best performance of the models tried so far. The results from each model
created during this research were recorded and then compared.

4 Results

Irrespective of the region the water came from, we achieved 81% accuracy using this model and
methodology.

This shows that adding LSTM and GRU layers to the neural network gives a large performance boost to
our model, and also that neural networks can perform as well as traditional machine learning
algorithms such as the decision tree classifier, random forest classifier, and XGBoost classifier.

5 Conclusion

In this research work we have proposed a deep learning, neural network based solution for assessing
the quality of water for its potability. With the help of a neural network, we can monitor changes
in potability. We have also extensively investigated the implications of changing the pH. Optimal pH
was considered for performance comparison and subsequent investigation. To identify the best
possible solution, we have also examined the efficacy of various deep learning approaches compared
to one another. With continued development, this approach could be used in agricultural applications
and extended to the treatment of water from industrial operations. Furthermore, active constituents
may be added to the water if modifying its pH balance is appropriate for its intended application.
Deep learning technology is flourishing since it helps to create efficient, inexpensive, and
user-friendly diagnostic techniques for water quality. The technique has the potential to be used
productively to tackle issues concerning water quality. Near real-time water quality surveying with
a portable methodology is feasible in industrial and resource-limited settings. Also, regarding the
nature of the terrain and water source, it would be advantageous to record extra details, for
example, the local conditions in the area where the water sample is collected. Finally, we recommend
employing cutting-edge deep learning techniques to improve the system's profitability.

Thus it is concluded that LSTM layers can be used not only in applications with time series data but
also with discrete numerical data, and that neural networks perform as well as traditional machine
learning algorithms.

6 References

[1] Y. Yu, J. Cao and J. Zhu, "An LSTM short-term solar irradiance forecasting under complicated
weather conditions", IEEE Access, vol. 7, pp. 145651-145666, 2019.

[2] D. R. Mishra, E. J. D'Sa and S. Mishra, "Preface: Remote sensing of water resources", Remote
Sens., 2018.

[3] Md Omar Faruq et al., "Design and implementation of cost effective water quality evaluation
system", Humanitarian Technology Conference (R10-HTC) 2017 IEEE Region 10, 2017.

[4] Baihaqi Siregar et al., "Monitoring quality standard of waste water using wireless sensor
network technology for smart environment", ICT For Smart Society (ICISS) 2017 International
Conference on, 2017.

[5] M. Zounemat-Kermani, Ö. Kişi, J. Adamowski and A. Ramezani-Charmahineh, "Evaluation of data
driven models for river suspended sediment concentration modeling", J. Hydrol., vol. 535,
pp. 457-472, 2016.

[6] S. I. Abba, S. J. Hadi and J. Abdullahi, "River water modelling prediction using multi-linear
regression artificial neural network and adaptive neuro-fuzzy inference system techniques",
Procedia Comput. Sci., vol. 120, pp. 75-82, 2017.

[7] N. Vijayakumar and R. Ramya, "The real time monitoring of water quality in IoT environment",
Circuit Power and Computing Technologies (ICCPCT) 2015 International Conference on, pp. 1-4,
March 2015.

[8] S. S. Ahuja, "Monitoring water quality pollution assessment and remediation to assure
sustainability" in Monitoring Water Quality, Amsterdam, The Netherlands: Elsevier, pp. 1-18, 2013.

Dataset

The dataset used for this research is available on Kaggle under the name water_potability.csv.
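
The two preprocessing steps described in the Methodology, mean imputation followed by upsampling of the minority class, can be sketched as below. This is an illustrative sketch only, not the authors' code: the function names, the use of None for a missing value, the random duplication strategy, and the toy rows are all assumptions, and plain Python is used so that the example has no dependencies. If the real dataset were balanced to exact parity in this way, 1998 - 1278 = 720 duplicated potable rows would be added.

```python
import random
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))  # raises if a column has no observed values
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def upsample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit


# Toy stand-in for the CSV: two feature columns, label 1 = potable (hypothetical values).
rows = [[7.0, None], [None, 200.0], [9.0, 220.0], [8.0, 210.0], [6.0, 190.0]]
labels = [0, 0, 0, 1, 1]

filled = impute_column_means(rows)
balanced_rows, balanced_labels = upsample_minority(filled, labels)
```

When a held-out test set is used, resampling is normally applied to the training split only, so that duplicated rows do not appear on both sides of the split.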

Code

The code is available at: mugeshraj11/water_potability (github.com)
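
The repository above holds the authors' implementation. Purely as an illustration of the layer stacking described under Model Building (convolution, then LSTM, then GRU), a minimal sketch might look as follows. It is not the authors' code: the function names, layer sizes, kernel size, optimizer, and loss are all assumptions, and it presumes TensorFlow with its bundled Keras API is installed. Each row of tabular features is treated as a short sequence with one channel, which is one common way to feed non-sequential numerical data to convolutional and recurrent layers.

```python
def to_sequences(rows):
    """Reshape each flat feature row into a sequence of 1-element steps,
    i.e. the (timesteps, channels) layout Conv1D and recurrent layers expect."""
    return [[[float(v)] for v in row] for row in rows]


def build_hybrid_model(n_features):
    """Hypothetical Conv1D -> LSTM -> GRU binary classifier (all sizes are arbitrary)."""
    # Imported lazily so the reshaping helper above works without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(n_features, 1)),
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        layers.LSTM(32, return_sequences=True),  # pass the full sequence on to the GRU
        layers.GRU(16),                          # keep only the final recurrent state
        layers.Dense(1, activation="sigmoid"),   # estimated probability of "potable"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Two toy rows with three features each (hypothetical values).
seqs = to_sequences([[7.0, 205.0, 3.1], [7.5, 200.0, 2.9]])
```

Calling build_hybrid_model with the number of feature columns in the CSV returns a compiled model that can be trained with its fit method on the reshaped features and the 0/1 potability labels.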
