Mustakim 2021 J. Phys. - Conf. Ser. 1783 012020
*mustakim@uin-suska.ac.id
Abstract. The Pollution and Environmental Damage Control Division of the Department of Environment
and Forestry has an active role in monitoring water quality in Riau Province. The rivers that are
still monitored and managed are the Kampar, Siak and Indragiri Rivers. The Division of
Environment Pollution calculates river quality status manually in Microsoft Excel, which is slow
for important information that should be processed quickly. The division must also determine the
right calculation to obtain the water quality status. Of the many calculation formulas set by the
government, the most commonly used are the STORET method and the Pollution Index. To address this
classification problem, the researcher proposes a learning method that can predict or determine
the status of water quality with a data mining classification technique, namely Modified
K-Nearest Neighbor (MKNN), a modification of K-NN. The MKNN algorithm produced its highest
accuracy, 85.10%, at K = 5 using STORET results as training data. Using Pollution Index results,
the highest accuracy was 76.92% at K = 1. Attribute analysis indicates that the attributes that
influence the determination of river water quality are BOD, COD, NH3, Fecal Coli and Total Coli.
This result can be taken into consideration by the Division of Environmental Pollution when
addressing and reducing pollutant loads that exceed quality standards.
1. Introduction
River water has a very important role in the lives of humans and other living things. From the past until
now, river water has been used to meet daily needs such as bathing and washing, for transportation between
areas, for cultivation, fishing, industrial production and agricultural irrigation, and as a source of
clean water and freshwater fisheries [1]. Maintaining river water quality requires monitoring [2]. Under
Government Regulation of the Republic of Indonesia Number 82 Year 2001 Concerning Water Quality
Management and Water Pollution Control, the Government is responsible for monitoring river water quality.
The government agency that plays an active role in monitoring water quality in Riau Province is the
Division of Pollution and Environmental Damage Control of the Department of Environment and Forestry [3].
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
Annual Conference on Science and Technology Research (ACOSTER) 2020 IOP Publishing
Journal of Physics: Conference Series 1783 (2021) 012020 doi:10.1088/1742-6596/1783/1/012020
One important role of this agency is to prevent pollution and to manage and monitor river water quality
regularly. The rivers that are still monitored and managed by the Department of Environment and Forestry
of Riau Province are the Kampar, Siak and Indragiri Rivers. The river water quality status cannot be read
directly from the collected data; the data must be processed further. After monitoring, the
Environmental Pollution Department must compile the data again and calculate the status of river water
quality manually in Microsoft Excel, which is slow for important information that should be processed
quickly. The division must also determine the right calculation to obtain the water quality status. Of
the many calculation formulas determined by the government for water quality status, the most commonly
used are the STORET method and the Pollution Index [4].
Data mining is a branch of computer science that combines computational processes and statistical
techniques such as clustering, classification and pattern finding to extract information from large
datasets and transform it into an understandable format [5]. Several studies have addressed the
classification of water quality status. Hamidi et al. (2017) classified river water quality with the
Learning Vector Quantization (LVQ) algorithm and obtained an average accuracy of 81.13% [6]. Alamelu,
M. J. et al. (2013) evaluated the correctness of results based on accuracy. In their webpage
classification tests, MKNN performed better than KNN, as shown by the accuracy of each method. MKNN had a
lowest accuracy of 92.05% and a highest of 97.60% at the 9th to 13th test values of k, whereas KNN had a
lowest accuracy of 82.14% at the same test values of k. Therefore, MKNN was recommended for improving
webpage classification [7].
One of the standards for determining the status of water quality is Storage and Retrieval (STORET).
STORET is used to measure river water quality, but it requires considerable time and cost. A weakness of
the method is that it needs sufficient data: if a single value is missing, the maximum, minimum and
average values used to determine the water quality status cannot be calculated [8]. The Pollution Index
(PI) method is also used to calculate river quality status. As an index-based approach, it is built on
two quality indices. The first is the average index (IR), which shows the average pollution level of all
parameters in one observation, and the second is the maximum index [9].
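As a rough illustration of how the two indices can be combined, the sketch below assumes the commonly cited Nemerow-type form of the Pollution Index, PI = sqrt((maximum ratio^2 + average ratio^2) / 2), where each ratio is a measured concentration divided by its quality standard. The formula, the function name pollution_index and the sample numbers are assumptions for illustration and are not given in this text; regulatory variants may also adjust ratios above 1 before combining them.

```python
import math


def pollution_index(concentrations, standards):
    """Return (average ratio, maximum ratio, pollution index).

    Assumption: PI = sqrt((max_ratio**2 + avg_ratio**2) / 2), the
    commonly cited Nemerow-type form. Not taken from the paper text.
    """
    if not concentrations or len(concentrations) != len(standards):
        raise ValueError("need one quality standard per measured parameter")
    # Ratio of each measured concentration to its quality standard.
    ratios = [c / s for c, s in zip(concentrations, standards)]
    avg_ratio = sum(ratios) / len(ratios)  # average index
    max_ratio = max(ratios)                # maximum index
    pi = math.sqrt((max_ratio ** 2 + avg_ratio ** 2) / 2)
    return avg_ratio, max_ratio, pi


# Hypothetical values for three parameters, all with a standard of 2.0.
avg_ratio, max_ratio, pi = pollution_index([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(avg_ratio, max_ratio, round(pi, 4))
```

Because the maximum ratio enters the formula directly, a single parameter far above its standard raises the index even when the average is low.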
The water pollution index provides a quick and simple initial assessment of the status of water quality.
This assessment must be followed by regular monitoring of the water resources in order to assess water
quality for ecosystem health. The water pollution index greatly reduces data volume and simplifies the
expression of water quality status [10]. Water quality index calculation is based on a number of
physico-chemical and bacteriological parameters [11].
To address this classification problem, the researcher proposes a learning method that can help predict
or determine the status of water quality with classification techniques in data mining. One algorithm
used in classification is Modified K-Nearest Neighbor (MKNN). MKNN is a classification algorithm that
improves on the performance of its predecessor, K-Nearest Neighbor (KNN), which uses the nearest
neighbors in the training data. The advantage of the MKNN algorithm is its improved accuracy compared to
the K-NN method [12]. MKNN improves the performance of K-Nearest Neighbor by using strong neighbors in
the training data, which are detected through a validation process [13]. Its main idea is to classify
test samples according to the tags of their neighbors. The method is a kind of weighted KNN in which the
weight is determined by a different procedure. Experiments show a very good increase in accuracy
compared to the KNN method [5].
By applying the MKNN algorithm to river water data, with training data taken from the results of the
STORET method and the Water Pollution Index (WPI), the classification yields the water quality class of
each record. The community needs information on river water quality to know the status of water
pollution around where they live. The same information supports the relevant government bodies and
decision makers in preventing river water pollution and raises the awareness of industrial centers about
protecting the surrounding environment. Index-based river water quality management gives decision makers
an alternative for assessing the quality of water bodies for a designated use and for taking action to
improve quality when water quality declines.
is inputted in the training set by calculating the validity using the top nearest neighbor from each training
set document [17].
In the MKNN algorithm, each sample in the training data set must be validated in the first step. The
validity of each point is calculated from its neighbors, and this step is repeated for all samples.
Classification then uses weight voting together with the validity of the data points. This development
of the K-NN method is very efficient because it reduces the number of data points and overcomes low
accuracy [18]. The stages of MKNN are described as follows:
The distance is calculated with Equation 1 [19] [7].
$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$ (1)
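The distance and validation steps can be sketched as follows. This is a minimal illustration, assuming that Equation 1 is the Euclidean distance and that validity is the fraction of a training point's h nearest neighbors sharing its label, as in the commonly cited MKNN formulation. The function names, the parameter h and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def validity(train_x, train_y, h):
    """Validity of every training point.

    Assumption: validity = share of the h nearest *other* training
    points that carry the same label as the point itself.
    """
    if h < 1 or len(train_x) < 2:
        raise ValueError("need h >= 1 and at least two training points")
    scores = []
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        # Distances from point i to every other training point.
        others = [
            (euclidean(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(train_x, train_y))
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:h]
        same = sum(1 for _, yj in nearest if yj == yi)
        scores.append(same / len(nearest))
    return scores


# Toy data: two attributes per record, hypothetical class labels.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["light", "light", "heavy", "heavy"]
print(validity(points, labels, h=1))
```

A point surrounded by neighbors of another class receives a low validity, so it contributes less in the later weight voting step.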
data record is 624, the cross validation was divided equally in 3 models. Cross validation ran 10 tests
to determine the value of the parameter "K" and to identify a good model for MKNN. The results of cross
validation testing can be seen in Table 2.
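The equal division into three cross validation models can be sketched as below. This is only an illustration: it assumes contiguous, unshuffled parts, which the text does not specify, and the function name three_way_splits is hypothetical. With 624 records, each part holds 208 records.

```python
def three_way_splits(n_records, n_parts=3):
    """Split record indices into equal parts and build one
    (train_indices, test_indices) pair per part.

    Assumption: parts are contiguous blocks in the stored order.
    """
    if n_records % n_parts != 0:
        raise ValueError("records cannot be divided equally")
    size = n_records // n_parts
    indices = list(range(n_records))
    splits = []
    for part in range(n_parts):
        start, stop = part * size, (part + 1) * size
        test = indices[start:stop]                 # held-out part
        train = indices[:start] + indices[stop:]   # remaining parts
        splits.append((train, test))
    return splits


splits = three_way_splits(624)
for number, (train, test) in enumerate(splits, start=1):
    print(f"Cross {number}: {len(train)} training, {len(test)} testing")
```

Each split can then be evaluated for K = 1 to K = 10, and the K and split with the best average accuracy kept.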
[Figure: bar chart of accuracy (%) for K = 1 to K = 10. Bar labels as extracted: 91.48%, 91.48%, 91.48%, 93.61%, 91.48%, 89.36%, 89.36%, 87.23%, 91.48%, 76.59%.]
Based on the cross validation results, the best "K" parameter is K = 1, with an average accuracy of
83.00%, and the best "Cross" model is Cross 1, with an average accuracy of 82.49%. The training data of
Cross 1 were therefore used as training data for the MKNN algorithm with K = 1. The weight voting (WV)
value was obtained with Equation 2 from the validity of each training record and its distance to the
testing data (TT). Once the weight voting values were obtained, the next step was to find the highest
weight voting values, as many as the predetermined K value, that is K = 1.
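The weight voting step can be sketched as follows. Equation 2 is not reproduced in this text, so the sketch assumes the commonly cited MKNN weight, validity / (distance + 0.5). The function names and toy data are hypothetical, and summing weights per class is an assumption about how ties among the K selected records are resolved.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mknn_predict(train_x, train_y, validities, query, k):
    """Predict the class of one testing record.

    Assumption: weight = validity / (distance + 0.5), the commonly
    cited MKNN form; the 0.5 term is not shown in the paper text.
    """
    weighted = []
    for x, y, v in zip(train_x, train_y, validities):
        weight = v / (euclidean(x, query) + 0.5)
        weighted.append((weight, y))
    # Keep the K highest weight voting values.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for weight, label in weighted[:k]:
        votes[label] += weight
    # Class with the largest summed weight among the K selected records.
    return max(votes, key=votes.get)


# Toy example with one attribute and hypothetical labels.
train_x = [[0.0], [1.0], [10.0]]
train_y = ["light", "light", "heavy"]
print(mknn_predict(train_x, train_y, [1.0, 1.0, 1.0], [9.0], k=1))
```

With K = 1 the prediction is simply the label of the training record with the highest weight voting value.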
4. Conclusion
The classification using the MKNN algorithm with STORET results as training data and 47 predicted
records placed all 47 records in the heavily polluted class. With Water Pollution Index results and 208
predicted records, 108 records were classified as medium polluted, 99 as lightly polluted and one as
heavily polluted. The MKNN algorithm applied to the classification of river water quality status produced
its highest accuracy, 85.10%, at K = 5 using STORET results as training data. Using Pollution Index
results as training data, the highest accuracy was 76.92% at K = 1, with accuracy measured from the
confusion matrix. The classification used water quality monitoring data of the Siak, Kampar and
Indragiri Rivers from 2014 to 2016, and the MKNN algorithm was able to classify and predict water quality
classes in agreement with manual calculation of the algorithm. The results are useful for decision
making by the Division of Pollution and Environmental Damage Control. Attribute analysis indicates that
the attributes that influence the determination of river water quality are three chemical parameters,
namely BOD, COD and NH3, and the microbiological parameters Fecal Coli and Total Coli. This can be
considered by the Division of Environmental Pollution of the Department of Environment and Forestry in
Pekanbaru city when addressing and reducing pollutant loads that exceed quality standards.
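For reference, accuracy from a confusion matrix is the share of records on the diagonal. The sketch below uses a hypothetical three-class matrix; the counts are invented for illustration and are not the confusion matrices behind the reported results.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified records / all records.

    Rows are actual classes and columns are predicted classes, so the
    diagonal holds the correct predictions.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical counts for lightly, medium and heavily polluted classes.
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
print(f"{accuracy_from_confusion(example):.2%}")
```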
References
[1] R. Karolina and Y. G. C. Sianipar, “The utilization of stone ash on cellular lightweight concrete,”
in IOP Conference Series: Materials Science and Engineering, 2018, vol. 309, no. 1.
[2] R. Karolina and A. L. A. Putra, “The effect of steel slag as a coarse aggregate and Sinabung volcanic
ash a filler on high strength concrete,” in IOP Conference Series: Materials Science and
Engineering, 2018, vol. 309, no. 1.
[3] G. Regulation, “Peraturan Pemerintah Republik Indonesia Nomor 82 Tahun 2001,” Jakarta
Peratur. Pemerintah, pp. 1–32, 2001.
[4] Keputusan Menteri Negara Lingkungan Hidup, “Keputusan Menteri Negara Lingkungan Hidup
Nomor 115 Tentang Pedoman Penentuan Status Mutu Air,” Jakarta Menteri Negara Lingkung.
Hidup, pp. 1–15, 2003.
[5] V. VijayanV and A. Ravikumar, “Study of Data Mining Algorithms for Prediction and Diagnosis
of Diabetes Mellitus,” Int. J. Comput. Appl., vol. 95, no. 17, pp. 12–16, 2014.
[6] R. Agrawal, “A modified K-nearest neighbor algorithm using feature optimization,” Int. J. Eng.
Technol., vol. 8, no. 1, pp. 28–37, 2016.
[7] M. M. Siti Mutrofin, Abidatul Izzah, Arrie Kurniawardhani, “Optimasi Teknik Klasifikasi Modified
K Nearest Neighbor Menggunakan Algoritma Genetika,” J. GAMMA, vol. s3-VII, no. 182, p. 504,
2015.
[8] D. Purwitasari, O. P. Putri, and W. N. Khotimah, “Aturan Asosiasi Dengan Standar Storet Pada
Model Prediksi Parameter Pendukung Uji Kualitas Air Baku,” J. Inf. Syst. Eng. Bus. Intell., vol. 1,
no. 1, pp. 1–8, 2015.
[9] I. dan A. Mutiara, “Penerapan K-Optimal Pada Algoritma Knn Untuk Prediksi Kelulusan Tepat
Waktu Mahasiswa Program Studi Ilmu Komputer Fmipa Unlam Berdasarkan Ip Sampai Dengan
Semester 4,” Klik - Kumpul. J. Ilmu Komput., vol. 2, no. 2, pp. 159–173, 2015.
[10] S. V. Mohan, P. Nithila, and S. J. Reddy, “Estimation of heavy metals in drinking water and
development of heavy metal pollution index,” J. Environ. Sci. Heal. Part A, vol. 31, no. 2, pp. 283–
289, 1996.
[11] H. Parvin, H. Alizadeh, and B. Minaei-Bidgoli, “MKNN: Modified K-Nearest Neighbor,” Proc.
World Congr. Eng. Comput. Sci. WCECS, pp. 22–25, 2008.
[12] D. A. Adeniyi, Z. Wei, and Y. Yongquan, “Automated web usage data mining and recommendation
system using K-Nearest Neighbor (KNN) classification method,” Appl. Comput. Informatics, vol.
12, no. 1, pp. 90–108, 2016.
[13] W. Wu, W. Guo, and K.-L. Tan, “Distributed processing of moving k-nearest-neighbor query on
moving objects,” in 2007 IEEE 23rd International Conference on Data Engineering, 2007, pp.
1116–1125.
[14] Rezaei, Alizadeh, and H. Parvin, “An extended MKNN modified K-nearest neighbor,” J. Netw.
Technol., vol. 4, no. 2, pp. 162–168, 2011.
[15] Okfalisa, I. Gazalba, Mustakim, and N. G. I. Reza, “Comparative analysis of k-nearest neighbor and
modified k-nearest neighbor algorithm for data classification,” in Proceedings - 2017 2nd
International Conferences on Information Technology, Information Systems and Electrical
Engineering, ICITISEE 2017, 2018, vol. 2018-Janua.
[16] C. Shi, J. Ma, J. Wu, K. Chen, and B. Wu, “(Bi0.5Na0.5)ZrO3 modified KNN-based ceramics:
Enhanced electrical properties and temperature insensitivity,” Ceram. Int., vol. 46, no. 3, pp. 2798–
2804, 2020.
[17] P. Bolaj and S. Govilkar, “Text Classification for Marathi Documents using Supervised Learning
Methods,” Int. J. Comput. Appl., vol. 155, no. 8, pp. 6–10, 2016.
[18] T. Dharani and I. L. Aroquiaraj, “Content Based Image Retrieval System using Feature