Sustainability 11 02058 v2

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

sustainability

Article
Analysis and Prediction of Water Quality Using
LSTM Deep Neural Networks in IoT Environment
Ping Liu 1 , Jin Wang 1,2,3, * , Arun Kumar Sangaiah 4 , Yang Xie 5 and Xinchun Yin 6
1 School of Information Engineering, Yangzhou University, Yangzhou 225127, China; pliu@yzu.edu.cn
2 School of Computer & Communication Engineering, Changsha University of Science & Technology,
Changsha 410114, China
3 School of Information Science and Engineering, Fujian University of Technology, Fujian 350118, China
4 School of Computing Science and Engineering, Vellore Institute of Technology (VIT), Vellore,
Tamil Nadu 632014, India; arunkumarsangaiah@gmail.com
5 Yangzhou Municipal Bureau of Ecology and Environment, Yangzhou 225007, China; yxie@yzu.edu.cn
6 Guangling College, Yangzhou University, Yangzhou 225000, China; xcyin@yzu.edu.cn
* Correspondence: jinwang@csust.edu.cn

Received: 1 March 2019; Accepted: 1 April 2019; Published: 7 April 2019 

Abstract: This research paper focuses on a water quality prediction model which requires high-quality
data. In the process of construction and operation of smart water quality monitoring systems based
on Internet of Things (IoT), more and more big data are produced at a high speed, which has made
water quality data complicated. Taking advantage of the good performance of long short-term
memory (LSTM) deep neural networks in time-series prediction, a drinking-water quality model
was designed and established to predict water quality big data with the help of the advanced deep
learning (DL) theory in this paper. The drinking-water quality data measured by the automatic water
quality monitoring station of Guazhou Water Source of the Yangtze River in Yangzhou were utilized
to analyze the water quality parameters in detail, and the prediction model was trained and tested
with monitoring data from January 2016 to June 2018. The results of the study indicate that the
predicted values of the model and the actual values were in good agreement and accurately revealed
the future developing trend of water quality, showing the feasibility and effectiveness of using LSTM
deep neural networks to predict the quality of drinking water.

Keywords: IoT; big data; LSTM; prediction model; water quality

1. Introduction
With the rapid development of economy and accelerated urbanization, water pollution has become
more and more serious [1]. Therefore, understanding the problems and trends of water pollution is
of great significance for the prevention and control of water pollution. In order to fully understand
the quality of the water environment, most cities in China have started the construction of water
environment monitoring systems. The development of remote sensing (RS) [2,3], Internet of Things
(IoT) [4–8], cloud computing, big data and artificial intelligence (AI) provides new opportunities and
approaches for the innovation and application of water environment monitoring technology. Relying
on different kinds of hydrological and water quality automatic monitoring stations, RS monitoring
systems, wireless sensor networks (WSNs), monitoring ships, and advanced underwater bionic robots,
smart monitoring systems for water environment protection have been built in cities and counties
in China.
On the basis of the historical data collected by the smart water quality monitoring systems,
a water quality prediction model can establish a corresponding mapping relationship between the

Sustainability 2019, 11, 2058; doi:10.3390/su11072058 www.mdpi.com/journal/sustainability


Sustainability 2019, 11, 2058 2 of 14

multi-monitoring data and the changes of water quality parameters and can predict changes of water
quality status in certain future periods. In recent years, the establishment of reliable water quality
prediction models [9–16] has become one of the research hotspots in the field of water environmental
science. However, with the large-scale application of the smart water quality monitoring systems,
although the shortcomings of the traditional collection method are solved, a large amount of big data
is produced. In addition, the water environment is affected by many factors and has strong nonlinear
characteristics. However, the traditional water quality prediction model cannot comprehensively
consider the influence of physics, chemistry, biology, meteorology, and hydraulics factors. At present,
researchers mainly focus on improving the applicability and reliability of water quality prediction
models and have introduced a variety of new technologies, such as fuzzy mathematics, stochastic
mathematics, 3S technology, artificial neural networks (ANN), etc., to improve water quality prediction
models and promote the scope of applications [17,18].
Among the technologies mentioned above, ANN has become a popular method for water quality
prediction because of its excellent applicability to uncertain and nonlinear situations. As a typical
representative of artificial neural networks, the traditional back-propagation (BP) neural network and
its improved algorithms have obvious advantages in predicting nonlinear problems and have been
effectively applied to water quality prediction [19]. The radial-basis-function (RBF) neural network
has been widely used in prediction research in various water environment by virtue of its simple
structure, fast training speed, and ability to approximate arbitrary functions globally with arbitrary
precision [20,21]. However, the above artificial neural network models are not suitable for time-series
prediction problems. The so-called time series is a series of observations in chronological order.
The automatic water quality monitoring station of drinking water source automatically collects water
quality parameters at a fixed time interval (such as on a daily basis) and uploads them to a server to
reflect the fluctuations of water quality at regular intervals at the monitoring points. Therefore, water
quality parameters appear in the form of time series. Due to the good performance of long short-term
memory (LSTM) models in time-series prediction, the application of LSTM in environmental research
has become more and more extensive [22]. However, few studies have applied the LSTM models to
the prediction of drinking-water quality parameters.
This paper proposes a drinking-water quality prediction model based on LSTM deep neural
networks to predict drinking-water quality data measured by the automatic water quality monitoring
station of the Guazhou Water Source in Yangzhou City and then compares the predicted results with
the measured data. The results show the potential of application of LSTM and deep learning in
predicting drinking water quality.

2. Data Source and Pre-Processing

2.1. Study Area Description and Water Quality Data Analyzed


With the popularization and application of IoT, water quality monitoring has gradually evolved
into automation and networking. Yangzhou City, Jiangsu Province, China, as the source city of the
East Route of the South-to-North Water Diversion Project, has organized the water quality monitoring
of major rivers (lakes) for many years and established a water quality monitoring and control system
at the city and county levels, with drinking-water source protection as the main objective. The main
urban area of Yangzhou includes users within towns and townships where regional water supply has
been implemented. There are three main water sources: one in the waters of Liaojiagou, the river
embayment of the Huaihe River, one in the Guazhou section of the Yangtze River, and the one in
Sanjiangying of Yangtze River in Touqiao town. The three water intakes are abundant in water, and
the water quality is stable and meets the national second-class drinking-water source water quality
standards. In order to protect the water sources, Yangzhou Environmental Protection Bureau has
carried out real-time monitoring to the water intakes. Up to now, a number of automatic water quality
monitoring stations have been built in the water source areas of Sanjiangying, Guazhou, Liaojiagou,
Sustainability 2019, 11, 2058 3 of 14
Sustainability 2019, 10, x FOR PEER REVIEW 3 of 14
Sustainability 2019, 10, x FOR PEER REVIEW 3 of 14

of
of the
the Yangzhou
Wanfu Chemical
Gate, and Shierwei
Yangzhou Industry
Industry Park
Chemicaldownstream to
to ensure
of the
Park ensure continuous
Yangzhou Chemicalmonitoring
continuous Industry Park
monitoring of
of water
to quality
ensure
water status
continuous
quality status
all
all day long. On the basis of actual needs, the commonly used six water quality parameters are
day long.
monitoring ofOnwaterthe basis
quality of actual
status allneeds,
day the
long. commonly
On the basisused
of six
actual water
needs, quality
the parameters
commonly used six
are
collected
collected and
water quality uploaded
and parameters
uploaded toare to the
the monitoring
collected
monitoring center
and uploaded server of
to the
center server Yangzhou
of monitoring Environmental
Yangzhou center Protection
server of Protection
Environmental Yangzhou
Bureau to
Environmentalform an automatic
Protection Bureauwater
to quality
form an monitoring
automatic network
water covering
quality
Bureau to form an automatic water quality monitoring network covering urban drinking-water urban
monitoring drinking-water
network covering
sources.
urban
sources. drinking-water sources.
Figure
Figure 11 isisaaalocation
Figure location
locationmap map
map ofof
of some
some
some water
waterwater quality
quality automatic
automatic
quality monitoring
monitoring
automatic stations
stations
monitoring in
in Yangzhou
in Yangzhou
stations water
Yangzhou
water
source source
area area
extracted extracted
by using by using
the Gaodethe Gaode
map map
API/SDK API/SDK
interface. interface.
water source area extracted by using the Gaode map API/SDK interface. The three red balloonsThe threeThe
red three red
balloons balloons
shown in
shown
shown in
the figure the
in are figure
the the
figure are
locations the
are the locations
of locations of
water quality water quality
automatic
of water automatic
monitoring
quality monitoring
automaticstations stations
in Shierwei,
monitoring in Shierwei,
stationsGuazhou, and
in Shierwei,
Guazhou,
Guazhou, and Wanfu Gate, from the left to the right. The blue area is part of the water system of the
Wanfu and
Gate, Wanfu
from the Gate,
left to from
the the
right. left
The to the
blue right.
area is The
part blue
of thearea is
water part of
system the
of water
the system
Yangzhou of
section
the
Yangzhou section
of the Yangtze
Yangzhou River.
section of
of the
the Yangtze
Yangtze River.
River.

Figure 1. Locations of partial water quality automatic monitoring stations in Yangzhou water source
Figure
Figure 1.1.Locations of partial
Locations water
of partial quality
water automatic
quality monitoring
automatic stations
monitoring in Yangzhou
stations water source
in Yangzhou water
area.
area.
source area.

The
The experimental data samples in this paper were collected from the water quality monitoring
The experimental
experimental data data samples
samples in in this
this paper
paper werewere collected
collected from the water quality monitoring monitoring
data
data from
from111January
January
January2016 2016
2016 to 30
30 June
to June June 2018 in the
the automatic water
water quality monitoring station of
data from to 30 20182018 in theinautomaticautomatic
water quality quality monitoring
monitoring station ofstation
Guazhou of
Guazhou
Guazhou Water Source,
WaterYangzhou Yangzhou
Source, Yangzhou City of
City ofRiver,Yangtze
Yangtze River, with a total of 912 groups. The water
Water Source, City of Yangtze withRiver,
a totalwith a total
of 912 of 912The
groups. groups.
waterThe water
source of
source
source of Guazhou
of belongs
Guazhou belongs to the Yangtze River system and is a river-type water source. The
Guazhou to belongs
the Yangtze to the Yangtze
River systemRiverand issystem and iswater
a river-type a river-type
source. Thewater source.water
Guazhou The
Guazhou
Guazhou water
water quality
quality automatic
automatic monitoring station automatically collects six
six water quality
quality automatic monitoring stationmonitoring
automatically station automatically
collects six water quality collects
parameterswater andquality
water
parameters
parameters and
and water temperatures at
at aaand
fixed time
time every day and uploads
uploads them them to
to the
the server.
temperatures at water
a fixedtemperatures
time every day fixed
uploads every
themdayto theand
server. server.
According
According to the 731 sets of water quality monitoring data collected in 2016 and 2017, the time
According to to the
the 731
731 sets
sets of water quality
quality monitoring
monitoring datadata collected
collected in in 2016
2016 and
and 2017,
2017, thethe time
time
variation
variation trend of water
of water quality
waterquality parameters
parametersofof
qualityparameters ofthethe
the water
water source
source of
of the
the Yangtze
Yangtze River
River in
in Guazhou
Guazhou
trend of water source of the Yangtze River in Guazhou was
was
was analyzed.
analyzed. For
For time-series
time-series data,
data, we
we drew
drew aa line
line graph
graph to
to reflect
reflect the
the characteristics
characteristics of
of each
each water
water
analyzed. For time-series data, we drew a line graph to reflect the characteristics of each water quality
quality
quality parameter
parameter over
over time. Figures
time. 2–7Figures 2–7 show the annual trends of
of pH, dissolved oxygen,
parameter over time. Figures show2–7 the show
annualthe annual
trends trends
of pH, dissolved pH, dissolved
oxygen, oxygen,
conductivity,
conductivity,
conductivity, turbidity,
turbidity, chemical
chemical oxygen
oxygen demand
demand (CODMn), and NH 3–N in 2016 and 2017. Tables 1
turbidity, chemical oxygen demand (CODMn), and(CODMn),
NH3 –N in and 2016NH and3–N in 2016
2017. Tablesand 2017.
1 and Tablesthe
2 show 1
and
and 2 show
2 show the monthly
the monthly average values of water quality monitoring parameters of the water source
monthly average values average
of watervalues
qualityofmonitoring
water quality monitoring
parameters parameters
of the of theinwater
water source Guazhousource
in
in
in Guazhou
Guazhou in
in 2016
2016 andand 2017.
2017.
2016 and 2017.
8.5 8
8.5 8
7.95
8.4
7.95
8.4 7.9
8.3 7.9
8.3 7.85
8.2 7.85
7.8
8.2
7.8
pH pH
pH pH

8.1 7.75
8.1 7.75
7.7
8
7.7
8 7.65
7.9 7.65
7.9 7.6
7.8 7.6
7.55
7.8
7.55
7.7 7.5
2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31 2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31
7.7 Date 2017-07-28 7.5
2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-09-18 2017-11-09 2017-12-31 2016-01-01 2016-03-14 2016-05-26 Date 2016-08-07 2016-10-19 2016-12-31
Date Date

(a)
(a) (b)
(b)
Figure
Figure2.
2.Graphs
Graphsofofthe
thepH
pHvariation
variationtrend
trendof
ofGuazhou
GuazhouWater
WaterSource.
Source.(a)
(a)pH
pHvariation
variationtrend
trendinin2016,
2016,
Figure 2. Graphs of the pH variation trend of Guazhou Water Source. (a) pH variation trend in 2016,
(b)
(b)pH
pHvariation
variationtrend
trend in
in 2017.
2017.
(b) pH variation trend in 2017.
Sustainability 2019, 10, x FOR PEER REVIEW 4 of 14
Sustainability 2019, 10, x FOR PEER REVIEW 4 of 14
Sustainability 11, 2058
2019,2019,
Sustainability 10, x FOR PEER REVIEW 44 of
of1414
Sustainability
13
2019, 10, x FOR PEER REVIEW 12
4 of 14
13 12
12
11
13 12
12
11 11
13 12
12 10
oxygen/(mg/L)

oxygen/(mg/L)
11 11
10
12 10
oxygen/(mg/L)

oxygen/(mg/L)
11 11
10 9
9 10
oxygen/(mg/L)

oxygen/(mg/L)
11
10 9
9 10
Dissolved oxygen/(mg/L)

Dissolved oxygen/(mg/L)
8
Dissolved

Dissolved
10 8
9 9
8
Dissolved

Dissolved
7 8
9 9
8 7
Dissolved

Dissolved
7 8
6
8 7
7 8
6 6
5 7
7
6 6
5 7
4 5
6
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2017-01-01
62017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
5 Date Date
4 5
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 6
2017-01-01
2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
5 Date Date
4 5
2016-01-01
4 (a) 2016-03-14 2016-05-26
Date (b)
2016-08-07 2016-10-19 2016-12-31 2017-01-01
5
2017-01-01
2017-02-22
2017-02-22
2017-04-15
2017-04-15
2017-06-06
2017-06-06
Date
2017-07-28
2017-07-28
2017-09-18
2017-09-18
2017-11-09
2017-11-09
2017-12-31
2017-12-31
2016-01-01
(a)
2016-03-14 2016-05-26
Date (b)
2016-08-07 2016-10-19 2016-12-31
Date

(a) (b)
Figure 3. Dissolved(a)oxygen variation trend of Guazhou Water Source. (b)
(a) Dissolved oxygen
Figure 3. Dissolved oxygen variation trend of Guazhou Water Source. (a) Dissolved oxygen
variation
Figure 3.trend in 2016,oxygen
Dissolved (b) dissolved oxygen
variation variation
trend trend inWater
of Guazhou 2017. Source. (a) Dissolved oxygen
variation
Figure trend
3.3.Dissolvedin 2016,
Dissolved (b) variation
oxygen
oxygen dissolved oxygen
variation trend
trend variation trend in
of Guazhou
of Guazhou Water 2017.Source.
Water
Source. (a) Dissolved
(a) Dissolved oxygen
oxygen variation
variation trend in 2016, (b) dissolved oxygen variation trend in 2017.
variation
trend trend(b)
in 2016,
400
indissolved
2016, (b) dissolved oxygen trend
oxygen variation variation trend in 2017.
in 2017. 400
400 400
350 350
400 400
350
400 350
400
300
300 350
Conductivity/(us/cm)

350

Conductivity/(us/cm)
300
300
350 350
Conductivity/(us/cm)

Conductivity/(us/cm)
250
300
250
300
Conductivity/(us/cm)

Conductivity/(us/cm)
250
300
250 200
300
Conductivity/(us/cm)

Conductivity/(us/cm)
250
200
250 200
250
150
200
250 200
150
150
200 200
100
150
150
200 100
150
100150 50
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2017-01-01
100 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
Date 50 Date
150100 2017-01-01
100 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31
Date 50 Date
100 2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
2016-01-01
100 (a) 2016-03-14 2016-05-26
Date
2016-08-07
(b) 2016-10-19 2016-12-31
50
2017-01-01 2017-02-22 2017-04-15 2017-06-06
Date
2017-07-28 2017-09-18 2017-11-09 2017-12-31
2016-01-01
(a)
2016-03-14 2016-05-26
Date
2016-08-07
(b)
2016-10-19 2016-12-31
Date

(a) (b)
Figure 4. Conductivity
(a) variation trend of Guazhou Water Source. (a) Conductivity
(b) variation trend in
Figure 4. Conductivity variation trend of Guazhou Water Source. (a) Conductivity variation trend in
2016,
Figure
Figure (b)
4.4.conductivity
Conductivityvariation
Conductivity variationtrend
variation trendin
trend ofof2017.
GuazhouWater
Guazhou WaterSource.
Source.(a)
(a)Conductivity
Conductivityvariation
variationtrend
trendinin
2016, (b)
Figure conductivity variation
4. Conductivity variation trend
trend in
of 2017.
Guazhou Water Source. (a) Conductivity variation trend in
2016,(b)
2016, (b)conductivity
conductivityvariation
variationtrend
trendinin2017.
2017.
2016, (b) conductivity variation trend in 2017.
200 200
200 200
180 180
200 200
180 180
160 160
200 200
180 180
160 160
140 140
180 180
Turbidity/(NTU)

Turbidity/(NTU)Turbidity/(NTU)

160 160
140 140
120 120
Turbidity/(NTU)

160
Turbidity/(NTU)

160
140 140
120 120
100 100
Turbidity/(NTU)

Turbidity/(NTU)

140 140
120 120
100
Turbidity/(NTU)

100 80
80
120 120
100 100
80 80
60 60
100 100
80 80
60 60
40 40
80 80
60 60
40 40
20 20
2016-01-01
60 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2016-01-01
60 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31
40 40 Date
20 Date 20
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31
40 40 Date
20 Date 20
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31
2016-01-01
20 (a) 2016-03-14 2016-05-26
Date
2016-08-07
(b) 2016-10-19 2016-12-31
20
2016-01-01 2016-03-14 2016-05-26
Date
2016-08-07 2016-10-19 2016-12-31
2016-01-01
(a)
2016-03-14 2016-05-26
Date
2016-08-07
(b)
2016-10-19 2016-12-31
Date

(a) (b)
Figure 5. Turbidity variation
(a) trend of Guazhou Water Source. (a) Turbidity variation
(b) trend in 2016,
Figure5.5.Turbidity
Figure Turbidityvariation
variationtrend
trendof ofGuazhou
Guazhou Water
WaterSource.
Source. (a)
(a) Turbidity
Turbidity variation
variation trend
trend in
in 2016,
2016,
(b) turbidity
Figure variation
5. Turbidity trend intrend
variation 2017. of Guazhou Water Source. (a) Turbidity variation trend in 2016,
(b)turbidity
(b) turbidity
Figure variation
variation
5. Turbidity trendinin
trend
variation 2017.of Guazhou Water Source. (a) Turbidity variation trend in 2016,
2017.
trend
(b) turbidity variation trend in 2017.
(b) turbidity variation trend in 2017.
7 4.5
7 4.5
6 4
7 4.5
6 4
7 3.5
4.5
5 4
6
3.5
5 3
CODMn/(mg/L)CODMn/(mg/L)

4
CODMn/(mg/L)CODMn/(mg/L)

6
4 3.5
5 3
CODMn/(mg/L)

CODMn/(mg/L)

4 2.5
3.5
5 3
CODMn/(mg/L)

CODMn/(mg/L)

3 2.5
4
32
3 2.5
4
2 2
3 1.5
2.5
2 2
3 1.5
1
2 21
1.5
1 1
2
0 0.5
1.5
2016-01-01
1 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
1
Date 0.5 Date
0 2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
1
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 1
Date 0.5 Date
0 2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
2016-01-01
0 (a) 2016-03-14 2016-05-26
Date
2016-08-07
(b)
2016-10-19 2016-12-31
0.5
2017-01-01 2017-02-22 2017-04-15 2017-06-06
Date
2017-07-28 2017-09-18 2017-11-09 2017-12-31
2016-01-01
(a) 2016-03-14 2016-05-26
Date
2016-08-07
(b)
2016-10-19 2016-12-31
Date

(a) (b)
Figure 6.6.Chemical
Figure Chemical oxygen
oxygen
(a) demand
demand (CODMn
(CODMn) ) variation
variation trend trend of Guazhou
of Guazhou Water Water (a)
(b) Source. Source.
CODMn(a)
Figure 6. Chemical oxygen demand (CODMn) variation trend of Guazhou Water Source. (a)
CODMn
variation
Figure 6. variation
trend trend
in 2016,
Chemical inCODMn
(b)
oxygen2016, (b)variation
demand CODMn trend
(CODMn ) invariation
variation trend intrend
2017. 2017.of Guazhou Water Source. (a)
CODMn
Figure 6. variation
Chemicaltrend
oxygenin 2016, (b) CODMn
demand (CODMn ) variation
variation trend trend
in 2017.
of Guazhou Water Source. (a)
CODMn variation trend in 2016, (b) CODMn variation trend in 2017.
CODMn variation trend in 2016, (b) CODMn variation trend in 2017.
Sustainability 2019, 11, 2058 5 of 14
Sustainability 2019, 10, x FOR PEER REVIEW 5 of 14

0.5 0.7

0.45
0.6
0.4

0.35 0.5
NH3-N/(mg/L)

0.3

NH3-N/(mg/L)
0.4
0.25
0.3
0.2

0.15 0.2

0.1
0.1
0.05

0 0
2016-01-01 2016-03-14 2016-05-26 2016-08-07 2016-10-19 2016-12-31 2017-01-01 2017-02-22 2017-04-15 2017-06-06 2017-07-28 2017-09-18 2017-11-09 2017-12-31
Date Date

(a) (b)
Figure 7.
Figure NH33–N
7. NH –Nvariation
variationtrend
trendofofGuazhou
GuazhouWater
WaterSource.
Source.(a)(a)
NH NH
3–N N variation
3–variation trend
trend in 2016,
in 2016, (b)
(b)
NHNH
3–N3 –N variation
variation trend
trend in 2017.
in 2017.

Table 1. Monthly mean values of water quality monitoring parameters of Guazhou Water Source
Table 1. Monthly mean values of water quality monitoring parameters of Guazhou Water Source in
in 2016.
2016.
Water Dissolved
Water Dissolved Conductivity Turbidity CODMn NH3 -N
Month Temperature pH Oxygen Conductivity Turbidity CODMn NH3-N
Month Temperature
◦ pH Oxygen /(us/cm) /(NTU) /(mg/L) /(mg/L)
/( C) /(mg/L) /(us/cm) /(NTU) /(mg/L) /(mg/L)
/(℃) /(mg/L)
January
January 9.7 9.7 7.67 7.67 11.90 11.90 238
238 81
81 3.12
3.12 0.30
0.30
February 9.3 7.62 12.26 248 55 2.50 0.13
February 9.3 7.62 12.26 248 55 2.50 0.13
March 12.7 7.78 11.14 203 64 2.26 0.13
March 12.7 7.78 11.14 203 64 2.26 0.13
April 17.2 7.76 9.52 209 84 2.61 0.13
MayApril 21.0 17.2 7.74 7.76 8.39 9.52 209
222 84
75 2.61
1.68 0.13
0.13
JuneMay 24.4 21.0 7.71 7.74 6.23 8.39 222
224 75
76 1.68
1.81 0.13
0.19
JulyJune 26.6 24.4 7.71 7.71 5.54 6.23 224
207 76
136 1.81
3.19 0.19
0.17
August July 30.0 26.6 7.77 7.71 5.58 5.54 207
241 136
105 3.19
2.61 0.17
0.16
September
August 26.9 30.0 7.81 7.77 6.35 5.58 220
241 101
105 2.53
2.61 0.17
0.16
September 22.1 26.9 7.80 7.81 7.36
October 6.35 217
220 97
101 3.01
2.53 0.14
0.17
November
October 16.9 22.1 7.84 7.80 8.47 7.36 288
217 110
97 2.58
3.01 0.16
0.14
November 12.9 16.9 7.83 7.84 9.51
December 8.47 298
288 119
110 2.55
2.58 0.24
0.16
December 12.9 7.83 9.51 298 119 2.55 0.24
Table 2. Monthly mean values of water quality monitoring parameters of Guazhou Water Source
in 2017.2. Monthly mean values of water quality monitoring parameters of Guazhou Water Source in
Table
2017.
Water Dissolved
Conductivity Turbidity CODMn NH3 -N
Month Temperature
Water pH Oxygen
Dissolved
◦ /(us/cm)
Conductivity /(NTU)
Turbidity /(mg/L)
CODMn NH/(mg/L)
3-N
Month /(Temperature
C) pH /(mg/L)
Oxygen
/(us/cm) /(NTU) /(mg/L) /(mg/L)
January 10.4 /(℃) 7.81 /(mg/L)
10.36 294 95 2.54 0.30
February
January 10.0 10.4 7.89 7.81 11.21 10.36 300
294 68
95 1.89
2.54 0.35
0.30
March
February 12.7 10.0 7.88 7.89 9.91 11.21 309
300 100
68 2.26
1.89 0.19
0.35
April
March 16.9 12.7 7.84 7.88 8.08 9.91 281
309 115
100 2.55
2.26 0.08
0.19
MayApril 21.8 16.9 7.95 7.84 7.71 8.08 297
281 83
115 1.57
2.55 0.08
0.08
June 24.8 7.97 6.79 286 105 1.88 0.18
May 21.8 7.95 7.71 297 83 1.57 0.08
July 27.6 7.99 6.23 208 123 2.65 0.21
June 24.8 7.97 6.79 286 105 1.88 0.18
August 29.7 7.97 6.92 264 95 1.87 0.19
July
September 27.4 27.6 8.06 7.99 7.586.23 208
294 123
105 2.65
1.67 0.21
0.20
August
October 22.8 29.7 8.10 7.97 8.206.92 264
311 95
133 1.87
1.99 0.19
0.20
September 18.9 27.4 8.12 8.06 8.97
November 7.58 294
308 105
97 1.67
1.77 0.20
0.14
October
December 13.7 22.8 8.14 8.10 11.03 8.20 311
348 133
98 1.99
1.87 0.20
0.17
November 18.9 8.12 8.97 308 97 1.77 0.14
December 13.7 8.14 11.03 348 98 1.87 0.17
As can be seen from Figures 2–7 and Tables 1 and 2, the pH value in 2016 was maintained between
7.52 and 7.96, while the pH value in 2017 was between 7.75 and 8.33, which was higher than that in
As can be seen from Figures 2–7 and Tables 1 and 2, the pH value in 2016 was maintained
2016. The minimum average values of dissolved oxygen in 2016 and 2017 appeared in July, a very
between 7.52 and 7.96, while the pH value in 2017 was between 7.75 and 8.33, which was higher
hot month, and the maximum average values of dissolved oxygen in 2016 and 2017 appeared in
than that in 2016. The minimum average values of dissolved oxygen in 2016 and 2017 appeared in
February, in the winter with low temperature, showing that the value was greatly affected by the
July, a very hot month, and the maximum average values of dissolved oxygen in 2016 and 2017
water temperature. The minimum average values of CODMn in 2016 and 2017 appeared in May, while
appeared in February, in the winter with low temperature, showing that the value was greatly
the maximum values appeared in July, according to certain rules. The maximum average values of
affected by the water temperature. The minimum average values of CODMn in 2016 and 2017
appeared in May, while the maximum values appeared in July, according to certain rules. The
Sustainability 2019, 11, 2058 6 of 14

conductivity in 2016 and 2017 appeared in December, and the minimum average values of turbidity
in 2016 and 2017 appeared in February. The minimum and maximum average values of NH3 –N
were irregular.
In order to deeply analyze the relationship between the awater quality parameters, correlation
analysis of water quality parameters was conducted for 731 groups of data collected between 1 January
2016 and 31 December 2017. For the sake of eliminating the influence of dimension, the data were
standardized, and Pearson correlation coefficient matrix was acquired. As can be seen from Table 3,
the water temperature in Guazhou Water Source of Yangtze River showed a significant relationship
with the pH value, dissolved oxygen, conductivity, turbidity, CODMn, and NH3 –N under the 0.01
significant level, showing a positive correlation with pH value and turbidity and a negative correlation
with the other parameters. In particular, a high correlation with dissolved oxygen was observed, with
a Person value of −0.93512. Dissolved oxygen was negatively correlated with water temperature
and turbidity at the significant level of 0.01; the correlation with turbidity was −0.56432, while
the correlation with CODMn was not significant. Conductivity was significantly correlated with
water temperature, dissolved oxygen, turbidity, CODMn, and NH3 –N at the significant level of 0.01;
specifically, conductivity was negatively correlated with water temperature and CODMn, while it was
positively correlated with the other parameters, especially, it was highly correlated with pH value,
with a Person value of 0.66766. The correlation between NH3 –N and pH value and turbidity was not
significant, considering the significant level of 0.01.

Table 3. Pearson correlation coefficient matrix of water quality parameters.

Water Dissolved
Parameters pH Conductivity Turbidity CODMn NH3 -N
Temperature Oxygen
Water Temperature 1.00000 0.24196 −0.93512 −0.30045 0.46194 −0.13766 −0.30422
pH 1.00000 −0.11692 0.66766 0.39667 −0.57278 0.00138
Dissolved Oxygen 1.00000 0.30731 −0.56432 0.00149 0.29651
Conductivity 1.00000 0.17924 −0.48991 0.19584
Turbidity 1.00000 0.27434 −0.00203
CODMn 1.00000 0.11754
NH3 -N 1.00000

2.2. Data Pre-Processing


Water quality automation monitoring stations likely encounter problems, such as missing data
and abnormal data during operation, due to sensor failures, network failures, or accidental events
(i.e., ships or fish shoal passing by). Data defects will eventually lead to excessive deviation between
water quality prediction results and actual monitoring values. To improve forecasting accuracy and to
provide clean, accurate, and concise data for predictive models, monitoring data of raw water quality
must be processed in advance.
In automatic water quality monitoring system, data are often missing. From the time dimension,
the monitored water quality parameters were only monitored once at a certain time. Because the time
is irreversible, the missing data can no longer be acquired, so the missing data can only be estimated
as accurately as possible. In the current study, the commonly used treatment method was to fill in
the missing part of the data and then predict water quality. As long as the method of filling in is
appropriate, the results obtained are satisfactory.
The current method of missing value imputation is mainly divided into single imputation (SI) and
multiple imputation (MI). The operation of MI is more complicated, and its cost is relatively higher.
As illustrated in Figures 2 and 3, the water quality parameters monitored by the automatic water
quality monitoring station showed continuous changes in the time dimension, and the monitoring
values showed usually certain time correlations. Therefore, this paper used a linear interpolation (LIN)
algorithm [23]. On the basis of the temporal correlation of the monitoring data, the algorithm uses a
Sustainability 2019, 11, 2058 7 of 14

linear imputation model to obtain a better estimation effect for the missing values of the monitoring
data in a short time interval.

Definition 1. Time series of water quality parameters.

Usually, the data set of water quality monitored by the same automatic monitoring station can
be regarded as a time series. It is an ordered collection composed of water quality measurement
values and measuring times. Supposing an automatic monitoring station measures the water quality
parameters at a fixed time every day, and the parameter number is j, we define

Si,n = (hyi,1 , T1 i, . . . , hyi,n , Tn i) (1)

to represent the n-length time series of water quality parameters, where yi,k is the value of the ith water
quality parameter measured by the monitoring station at observation time Tk (1 ≤ i ≤ j, 1 ≤ k ≤ n),
and, for any Tk , the sampling time interval is fixed at 4 T = Tk+1 − Tk = 1 (day).
If the observed value yi,k at Tk is missing, then we can obtain its estimated value ŷi,k and change

the problem of minimum ŷi,k − yi,k into the missing value estimation problem. For a water quality
monitoring station, based on the monitoring data yi,u and yi,v at any Ti,u and Ti,v , the linear imputation
function can be constructed as:
yi,u − yi,v
L(t) = yi,u + (t − Ti,u ) (2)
Ti,u − Ti,v

When the water quality data at a certain moment is missing, the LIN algorithm will firstly find
the two closest moments Ti,u , Ti,v (Ti,u < t < Ti,v ) and then estimate the missing value at time t by using
the monitoring data yi,u and yi,v of the two moments, based on formula (2), that is, ŷn = L(t).
In a word, the LIN algorithm can obtain a good estimation of the missing value in stationary
time series in a short period of time, but its estimation of missing values in non-stationary time
series is poor [23]. Therefore, for non-stationary time series, we used mean imputation to process the
missing data.

3. Water Quality Prediction Model Based on LSTM Deep Neural Networks

3.1. Water Quality Medium- to Long-Term Prediction

Definition 2. Time series prediction of water quality parameters.

For the n-length time series Si,n = (hyi,1 , T1 i, . . . , hyi,n , Tn i) of water quality parameters,
the prediction period length is defined as m, m ∈ {1,2, . . . }. LSTM deep neural networks are used to
realize the time-series prediction task and give a new future time series

Si, n+m = (hyi,n+1 , Tn+1 i, hyi, n+2 , Tn+2 i, . . . , hyi,n+m , Tn+m i) (3)

Because of the complexity, uncertainty, and lack of relevant information of water


quality-influencing factors, the precision of water quality prediction in several medium- to long-term
(m = 10 to 365) scales are still not satisfactory, which brings great difficulties to the scientific
decision-making of water pollution control and water quality planning and management. Aiming
at the problem, we built a prediction model on the basis of LSTM deep neural networks and deeply
learned and trained the monitoring data of water quality from the automatic monitoring station
of Guazhou Water Source in Yangzhou City in 2016 and 2017 (n = 731), to predict water quality
data for the next six months (m = 181). In this way, a more reasonable medium- to long-term water
quality prediction method was proposed to improve the accuracy of the medium- to long-term water
quality prediction.
Sustainability 2019, 11, 2058 8 of 14

3.2. LSTM Networks


As a special kind of recurrent neural network (RNN), LSTMs have the ability to learn long-term
dependencies. All RNNs take the form of chained repeating modules of neural network. LSTMs that
use purpose-built memory cells to store information also have this chained in similar structure, but the
repeating module is structured differently. As shown in Figure 8, there are four interacting layers in an
LSTM cell [24].
Sustainability 2019, 10, x FOR PEER REVIEW 8 of 14

ht −1 ht ht + 1

ct −1 ct

A A
tanh
it ot
ft ct
ht −1
σ σ tanh σ ht

x t −1 xt x t +1

Neural Network Pointwise Vector


Concatenate Copy
Layer Operation Transfer

Figure 8.
Figure 8. Long
Long short-term
short-term memory
memory (LSTM)
(LSTM) structure.
structure.

In Figure8,8,xxisisthe
In Figure input,h hand
theinput, andCCarearetwo
twomemory
memory vectors, and
vectors, andC are cellcell
C are activation
activation vectors, all all
vectors, of
which
of whichare are
the the
same sizesize
same as the hidden
as the vector
hidden h; σh;
vector 𝜎 islogistic
is the sigmoid
the logistic function.
sigmoid The task
function. The of tanh
task of is to
tanh
push the values
is to push to beto
the values between −1 and
be between 1. 1.
−1 and
For
For aa start,
start, the
the “forget
“forget gate
gate layer”
layer” decides
decides what
what information
information we we are
are going
going toto delete
delete from
from the
the cell
cell
state.
state. Secondly,
Secondly,the the“input
“inputgate
gatelayer”
layer”decides
decideswhich
whichvalues
valueswewewill
willupdate. Afterwards, aa tanh
update. Afterwards, tanh layer
layer
creates
createsaavector
vectorfor forthe
thenew
newcandidate valuesCet𝐶. Then,
candidatevalues . Then,weweupdate
update the old
the cell
old state
cell CtC−t−11 into the new
state new
cell state C
cell state Ctt . In the
the next part, the “output gate layer” decides what parts of the cell state
next part, the “output gate layer” decides what parts of the cell state wewe mean
mean to to
put
put out.
out. Finally,
Finally,we weput
putthe
thecell
cellstate throughtanh
statethrough tanhandandmultiply
multiplyititby
bythe
theoutput
outputgate.
gate.
Here
Here are
arethetheequations
equationscomputed
computedby byaacell:
cell:
1. Forget gate layer.
1. Forget gate layer.
𝑓 = 𝜎(𝑤 ∙ ℎ , 𝑥 + 𝑏) (4)
f t = σ w f ·[ht−1 , xt ] + b f (4)
2. Input gate layer.
2. Input gate layer.
𝑖it = σ(wi ·[∙htℎ−1 , x, 𝑥t ] ++bi𝑏) )
= 𝜎(𝑤 (5)
(5)

3.
3. New memory cell.
 
Ce tanh w ·[
𝐶 t = 𝑡𝑎𝑛ℎ(𝑤 f∙ ℎ t−1, 𝑥 t + 𝑏 f)
= h , x ] + b (6)
(6)

4.
4. Final memory cell.
Ct = f t ∗ Ct−1 + it ∗ Cet (7)
𝐶 =𝑓 ∗𝐶 +𝑖 ∗𝐶 (7)
5. Output gate layer.
5. Output gate layer.
ot = σ (wo ·[ht−1 , xt ] + bo ) (8)
𝑜 = 𝜎(𝑤 ∙ ℎ , 𝑥 + 𝑏 ) (8)
ht = ot + tanh(Ct ) (9)

ℎ = 𝑜 + 𝑡𝑎𝑛ℎ(𝐶 ) (9)
where w is a matrix, b is the bias, i is the input gate, f is the forget gate, and o is the output gate.
where w is a matrix, b is the bias, i is the input gate, f is the forget gate, and o is the output gate.
3.3. Water Quality Prediction Model Based on LSTM Deep Neural Networks
3.3. Water
As anQuality Prediction
extended Modelmultiple
model with Based onhidden
LSTM Deep
LSTM Neural Networks
layers, each layer of the stacked LSTM has
numerous memory cells, which makes the model earning the deep
As an extended model with multiple hidden LSTM layers, each learning
layertechnique [25]. As
of the stacked shown
LSTM has
in Figure 9, memory
numerous we designed
cells,a water
whichquality
makesprediction
the modelmodel based
earning theondeep
LSTM deep neural
learning networks.
technique [25]. As
shown in Figure 9, we designed a water quality prediction model based on LSTM deep neural
networks.
Sustainability 2019, 11, 2058 9 of 14
Sustainability 2019, 10, x FOR PEER REVIEW 9 of 14

Figure 9. Water quality prediction


prediction model
model based
based on
on LSTM
LSTM deep
deep neural
neural networks.
networks.

3.4.
3.4. Workflow
Workflow of
of Water
Water Quality
Quality Prediction
Prediction Model
Model Based
Based on
on LSTM
LSTM Deep
Deep Neural
Neural Networks
Networks
This
This paper tried to build a deep neural network architecture using Keras
paper tried to build a deep neural network architecture using Keras and
and Tensorflow
Tensorflow to
to
provide
provide water
water quality
quality forecasting
forecasting by
by means
means ofof Python
Python version
version 3.6.
3.6. The
The whole
whole specific
specific workflow
workflow of
of
water
water quality
quality prediction
prediction model
model based
based on
on deep
deep neural
neuralnetworks
networkswaswasasasfollows:
follows:
1.
1. Transform and load the water quality data from the CSV file to a pandas
pandas dataframe
dataframe which will
then be used to put out a NumPy array that will feed the LSTM.
2. Build the LSTM deep neural network model.
3. Train the model with training data.
4.
4. Run
Run aa point-by-point
point-by-point prediction
prediction for
for the
the output.
output.
If we tried to train the model on the raw data without normalizing, it would never converge
If we tried to train the model on the raw data without normalizing, it would never converge
because the water quality data is not just in the numerical range of −1 to 1. To solve this problem, we
because the water quality data is not just in the numerical range of −1 to 1. To solve this problem, we
normalized each n-sized list of training/testing data to reflect the percentage changes from the start
normalized each n-sized list of training/testing data to reflect the percentage changes from the start of
of the list. The following equation was used for normalizing:
the list. The following equation was used for normalizing:
𝑝
𝑛 
=  −1 (10)
pi𝑝
ni = −1 (10)
p
where n is a normalized list of water quality data, 0and p is a raw list of adjusted water quality data.
At thenend
where is a of the prediction
normalized list ofprocess, de-normalization
water quality data, and p iswas used
a raw list to
of get the real
adjusted water
water quality
quality data.data
At
out end
the of the
of prediction withprocess,
the prediction the following equation: was used to get the real water quality data out of
de-normalization
the prediction with the following equation: 𝑝 = 𝑝 (𝑛 + 1) (11)

pi = means
In terms of step 4, a point-by-point prediction p0 (ni +that
1) we only predict a single point before each
(11)
time, plot this point as a prediction, then take the next data listed along with the full testing data, and
In terms
finally of stepthe
predict 4, anext
point-by-point prediction means that we only predict a single point before each
point once again.
time, plot this point as a prediction, then take the next data listed along with the full testing data, and
finally predict the next point once again.
4. Experiments

4.1. Model Parameters Configuration


We used the water quality monitoring data from the automatic monitoring station of Guazhou
Water Source from January 2016 to December 2017 as the training set (time series length n = 731) and,
respectively, predicted the different six parameters in the first half of 2018 (prediction period length
m = 181). When performing the LSTM model training, the number of iterations epoch was set to 100.
Sustainability 2019, 11, 2058 10 of 14

4. Experiments

4.1. Model Parameters Configuration


We used the water quality monitoring data from the automatic monitoring station of Guazhou
Water Source from January 2016 to December 2017 as the training set (time series length n = 731) and,
respectively, predicted
Sustainability 2019, 10, xthe
FORdifferent six parameters in the first half of 2018 (prediction period10length
PEER REVIEW of 14
m = 181). When performing the LSTM model training, the number of iterations epoch was set to 100.
TheThe training
training setssets were
were grouped
grouped byby mini-batchgradient
mini-batch gradientdescent.
descent. After
After grouping,
grouping,the
thegradients
gradientswere
were
evaluated for each group, and then the parameters were updated. Mean square error (MSE) was
evaluated for each group, and then the parameters were updated. Mean square error (MSE) was
chosen as loss function. Adaptive moment estimation (Adam), which has a good effect in practical
chosen as loss function. Adaptive moment estimation (Adam), which has a good effect in practical
applications, was adopted as the LSTM model optimization algorithm, and the model update
applications, was adopted as the LSTM model optimization algorithm, and the model update weight
weight and the deviation parameters were adjusted. We set 100 neurons in each LSTM layer.
and the deviation parameters were adjusted. We set 100 neurons in each LSTM layer.
4.2. Water Quality Prediction Effects
4.2. Water Quality Prediction Effects
Taking the training set as a learning sample, deep learning and training were carried out. We
Taking the training set as a learning sample, deep learning and training were carried out. We took
took the actual monitoring data of 1 January 2018 to 30 June 2018 as the test set and compared the
the predicted
actual monitoring datathe
results with ofactual
1 January 2018 to data.
monitoring 30 June
The2018 as the
results are test set and
shown compared
in Figure 10. the predicted
results with the actual monitoring data. The results are shown in Figure 10.

(a) (b)

(c) (d)

(e) (f)
Figure
Figure 10. Comparison
10. Comparison of predicted
of predicted and actual
and actual valuesvalues of quality
of water water quality parameters.
parameters. (a) dissolved
(a) pH, (b) pH, (b)
dissolved
oxygen, oxygen, (c) conductivity,
(c) conductivity, (d) turbidity,(d)
(e)turbidity,
CODMn,(e) NH3 –N.(f) NH3–N.
(f)CODMn,

We concluded that the predicted values had a good agreement with the effective values of the
model, indicating that this model performed well in predicting the water quality parameters
(Figure 10). Our result reveals the potential of applying LSTM and deep learning to predict
drinking water quality, which can provide a reliable foundation for the formulation for water
source protection policies and concrete measures.
Sustainability 2019, 11, 2058 11 of 14

We concluded that the predicted values had a good agreement with the effective values of the
model, indicating that this model performed well in predicting the water quality parameters (Figure 10).
Our result reveals the potential of applying LSTM and deep learning to predict drinking water quality,
which can provide a reliable foundation for the formulation for water source protection policies and
concrete measures.
Sustainability 2019, 10, x FOR PEER REVIEW 11 of 14
In the course of predicting various water quality parameters, we recorded the values of loss
function after
In theeach iterating
course and drew
of predicting the change
various trend diagrams
water quality parameters, of we
lossrecorded
functionthewith respect
values to the
of loss
number of iterations
function after each epoch, as shown
iterating and drewin Figure 11. A loss
the change trendisdiagrams
a “penalty” score
of loss to reduce
function withwhen training
respect to
an algorithm
the numberon data. It is usually
of iterations epoch,called the in
as shown objective
Figure 11.function
A loss istoaoptimize.
“penalty” Drinking-water
score to reduce whenquality
prediction pursues
training the accuracy
an algorithm on data.ofItprediction results
is usually called theand the stability
objective functionoftoprediction error fluctuation.
optimize. Drinking-water
In our model, MSE was used to evaluate the degree of fitting between the predicted valueserror
quality prediction pursues the accuracy of prediction results and the stability of prediction and the
actual fluctuation.
values in the In model
our model, MSE was
analysis. used to evaluate
The smaller MSE is, thethe lower
degree the
of fitting between
dispersion the predicted
degree between the
values and the actual values in the model analysis. The smaller MSE is, the lower the dispersion
predicted values and the true values is, and the more reliable the predicted result will be [12]. As can
degree between the predicted values and the true values is, and the more reliable the predicted
be seen from Figure 11, after each round of whole training, the value of loss function always kept a
result will be [12]. As can be seen from Figure 11, after each round of whole training, the value of
decreasing trend and decreased less after a certain period of time, demonstrating that the learning
loss function always kept a decreasing trend and decreased less after a certain period of time,
outcome of this model
demonstrating thatwas good. outcome of this model was good.
the learning
-4
x 10
1.4 0.014

1.2 0.012

1 0.01
0.8 0.008
Loss

Loss

0.6 0.006
0.4 0.004
0.2 0.002
0
20 40 60 80 100 0
Epoch
20 40 60 80 100
Epoch
(a) (b)
0.035 0.08
0.03

0.025 0.06

0.02
Loss

Loss

0.04
0.015

0.01
0.02
0.005

0
20 40 60 80 100 0
Epoch 20 40 60 80 100
Epoch
(c)
(d)
0.2

0.6
0.15
0.5

0.4
Loss

Loss

0.1
0.3

0.05 0.2

0.1
0 0
20 40 60 80 100 20 40 60 80 100
Epoch
Epoch
(e) (f)

FigureFigure 11. Variation


11. Variation of loss
of the the loss function
function with
with the
the numberof
number of iterations
iterations epochs.
epochs.(a)(a)
pH, (b)(b)
pH, dissolved
dissolved
oxygen,
oxygen, (c) conductivity,
(c) conductivity, (d) turbidity,
(d) turbidity, (e) (e) CODMn,
CODMn, (f)(f)NH 3–N.
NH–N.
3

4.3. Contrastive Models


In order to verify the prediction effect of our model based on LSTM deep neural networks in
depth, this paper compared the model with two other classical time-series prediction models,
Sustainability 2019, 11, 2058 12 of 14

4.3. Contrastive Models


Sustainability 2019, 10, x FOR PEER REVIEW 12 of 14
In order to verify the prediction effect of our model based on LSTM deep neural networks in depth,
this paper compared the model with two other classical time-series prediction models, autoregressive
autoregressive integrated moving average (ARIMA) and support vector regression (SVR). We used
integrated moving average (ARIMA) and support vector regression (SVR). We used a set of 731
a set of 731 dissolved oxygen data monitored by the Guazhou automatic water quality monitoring
dissolved oxygen data monitored by the Guazhou automatic water quality monitoring station in 2016
station in 2016 and 2017 as the training set (n = 731) and respectively predicted the dissolved oxygen
and 2017 as the training set (n = 731) and respectively predicted the dissolved oxygen data in the next
data in the next 10 days (m = 10) and the next 6 months (m = 181). For model prediction accuracy,
10 days (m = 10) and the next 6 months (m = 181). For model prediction accuracy, MSE was chosen as
MSE was chosen as the unified metrics. The experiment results are shown in Figure 12 and Table 4.
the unified metrics. The experiment results are shown in Figure 12 and Table 4.

(a) (b)
Figure12.
Figure 12.Dissolved
Dissolved oxygen
oxygen prediction
prediction results
results of different
of different models.
models. (a) Prediction
(a) Prediction results
results from from 1
1 January
January
2018 to 1 2018 to 12018,
October October 2018, (b) prediction
(b) prediction results
results from from 12018
1 January January
to 302018
Juneto 30 June 2018.
2018.

Table 4. Comparison of the prediction accuracy of the three models. ARIMA, autoregressive integrated
Table 4. Comparison of the prediction accuracy of the three models. ARIMA, autoregressive
moving average, SVR, support vector regression.
integrated moving average, SVR, support vector regression.
MSEMSE Values
Values
ModelModel
PredictionPeriod
Prediction Period Length
Lengthmm= =
1010Prediction Period
Prediction LengthLength
Period m = 181m = 181
ARIMA 0.0126 2.4326
ARIMASVR 0.0126
0.0046 1.99432.4326
SVR LSTMs 0.0017
0.0046 0.00201.9943

LSTMs 0.0017 0.0020


The ARIMA model can be expressed as ARIMA (p, d, q), and, in the ARIMA modeling process,
we determined the values of the three parameters by calculating the Bayesian information criterion
(BIC) The ARIMA
value. Aftermodel
tuningcanthebe expressed we
parameters, as ARIMA
found that d, q), and,
(p, when in the
the BIC ARIMA
value modeling
was the minimum,process,
p = 0,
we determined
d = 1, and q = 2. the values of the three parameters by calculating the Bayesian information criterion
(BIC) Invalue.
the After
SVR tuning
model,the theparameters,
frequently we found
used that when
Gaussian RBFthe BICchosen
was value was the nonlinear
as the p = 0,
minimum,kernel
dfunction,
= 1, and qand = 2.the kernel function parameter σ was set to 0.001. If m = 10, we set the penalty factor C
In the
to 100, SVR
while if model,
m = 181,the wefrequently
set C to 10. used Gaussian RBF was chosen as the nonlinear kernel function,
and the kernel function parameter
Compared with the other two prediction σ was set tomodels, m dissolved
0.001. Ifthe = 10, we set the penalty
oxygen factor
prediction C to 100,
precision of
while if m = 181, we set C to 10.
our model was higher, no matter if it was for short-term or medium- to long-term predictions.
Compared
Especially, when with
thethe other twoperiod
prediction prediction
lengthmodels, thethe
m = 181, dissolved
dissolvedoxygen prediction
oxygen precision
precision of our of our
model
model was higher, no matter if it was for short-term or medium- to long-term
was obviously superior to those of ARIMA and SVR. Besides, it can be seen in Figure 11b that our predictions. Especially,
when
model the predictionquickly.
converged period length m = 181, with
Furthermore, the dissolved
the sameoxygen
amount precision
of data of
asour model
those was
of the obviously
training set,
superior
under ARIMA to thoseand of ARIMA
SVR, the and SVR. Besides,
prediction it can beoxygen
of dissolved seen in in Figure 11b six
the next thatmonths
our model converged
would become
quickly.
inaccurate Furthermore,
because of with thelong
the too sameprediction
amount ofperiod,
data aswhereas
those of LSTM
the training set, under
deep neural ARIMA
networks and
would
SVR, the prediction
still work well, withofonlydissolved oxygen
a relatively in the
small next six
training setmonths
of data.would become inaccurate because of
the too long prediction period, whereas LSTM deep neural networks would still work well, with only
a5.relatively
Conclusion smallandtraining
Futureset Workof data.

Combined with the analysis and pretreatment of the water quality data collected by the
automatic water quality monitoring station of Guazhou Water Source of the Yangtze River in
Yangzhou, this paper proposes a water quality forecasting method with the help of LSTM deep
neural networks and establishes the sample, data pre-processing, parameter setting, and learning
procedure of LSTM deep neural networks. The established prediction model can be trained and
learned automatically in the face of different water quality data samples and thus has broad
Sustainability 2019, 11, 2058 13 of 14

5. Conclusion and Future Work


Combined with the analysis and pretreatment of the water quality data collected by the automatic
water quality monitoring station of Guazhou Water Source of the Yangtze River in Yangzhou, this
paper proposes a water quality forecasting method with the help of LSTM deep neural networks and
establishes the sample, data pre-processing, parameter setting, and learning procedure of LSTM deep
neural networks. The established prediction model can be trained and learned automatically in the
face of different water quality data samples and thus has broad application scenarios. The result shows
that the built water quality model can predict the drinking-water quality in the future 6 months well,
offering a feasible approach for water quality prediction.
Our model only considered single dimensional inputs, while there are more complex datasets
with many different dimensions for sequences in water quality monitoring. Our future work will focus
on the model optimization by combining the various water quality parameters, and multi-dimensional
input datasets will be used to predict the target parameters, so as to improve the accuracy of the model.
Besides, the present research predicted the drinking-water quality data in one monitoring station in
Guazhou of the Yangtze River, and, for future research, we will take into account three monitoring
stations in the Yangzhou section of the Yangtze River (Guazhou, Shierwei, Xiaohekou of Yizheng) to
predict the water quality data, making predictions of water quality in spatial dimensions under the
hydrodynamic principle.

Author Contributions: Conceptualization, P.L. and J.W.; Methodology, P.L.; Software, P.L.; Validation, P.L., J.W.
and A.K.S.; Formal Analysis, P.L.; Investigation, P.L.; Resources, X.Y.; Data Curation, Y.X.; Writing-Original Draft
Preparation, P.L.; Writing-Review & Editing, P.L.; Visualization, P.L.; Supervision, J.W.; Project Administration,
A.K.S.; Funding Acquisition, J.W.
Funding: This research was funded by the National Natural Science Foundation of China (61772454, 61811530332,
61811540410).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Tirkolaee, E.B.; Hosseinabadi, A.A.R.; Soltani, M.; Sangaiah, A.K.; Wang, J. A Hybrid genetic algorithm for
multi-trip Green Capacitated Arc routing problem in the scope of urban services. Sustainability 2018, 10,
1366. [CrossRef]
2. Mishra, D.R.; D’Sa, E.J.; Mishra, S. Preface: Remote sensing of water resources. Remote Sens. 2018, 10, 115.
[CrossRef]
3. Gleason, C.J.; Wada, Y.; Wang, J. A hybrid of optical remote sensing and hydrological modelling improves
water balance estimation. J. Adv. Model. Earth Syst. 2017, 10, 2–17. [CrossRef]
4. Ren, Y.; Liu, Y.; Ji, S.; Sangaiah, A.K.; Wang, J. Incentive mechanism of data storage based on blockchain for
wireless sensor networks. Mob. Inf. Syst. 2018, 2018, 6874158. [CrossRef]
5. Yin, C.; Xi, J.; Sun, R.; Wang, J. Location privacy protection based on differential privacy strategy for big data
in industrial internet of things. IEEE Trans. Ind. Inform. 2018, 14, 3628–3636. [CrossRef]
6. Wang, J.; Ju, C.; Gao, Y.; Sangaiah, A.K.; Kim, G.-J. A PSO based energy efficient coverage control algorithm
for wireless sensor networks. Comput. Mater. Contin. 2018, 56, 433–446.
7. Wang, J.; Gao, Y.; Yin, X.; Li, F.; Kim, H.-J. An enhanced PEGASIS algorithm with mobile sink support for
wireless sensor networks. Wirel. Commun. Mob. Comput. 2018, 2018, 9472075. [CrossRef]
8. Wang, J.; Cao, J.; Sherratt, R.S.; Park, J.H. An improved ant colony optimization-based approach with mobile
sink for wireless sensor networks. J. Supercomput. 2018, 74, 6633–6645. [CrossRef]
9. Zheng, J.; Jiao, J.; Liping Sun, L. A modeling approach for early-warning of water bloom risk in urban lake
based on neural network. China Environ. Sci. 2017, 37, 1872–1878.
10. Li, X.H.; Ding, Y.; Si, A. Application of fuzzy comprehensive evaluation model in groundwater quality
evaluation—A case study in Baodi. Ground Water 2014, 1, 6–8.
Sustainability 2019, 11, 2058 14 of 14

11. Liu, S.; Tai, H.; Ding, Q.; Li, D.; Xu, L.; Wei, Y. A hybrid approach of support vector regression with genetic
algorithm optimization for aquaculture water quality prediction. Math. Comput. Model. 2013, 58, 458–465.
[CrossRef]
12. Zhang, C.C.; Chen, Q.W.; Xu, Q.; Zhang, X. A chlorophyll a prediction model for meiliang bay of Taihu
based on support vector machine. Acta Scientiae Circumstantiae 2013, 33, 2856–2861.
13. Theo, W.; Wong, S.H.C.; Choi, K.W.; Lee, J.H.W. Daily prediction of marine beach water quality in Hong
Kong. J. Hydro Environ. Res. 2012, 6, 164–180. [CrossRef]
14. Gazzaz, N.M.; Yousoff, M.K.; Aris, A.Z.; Juahir, H.; Ramli, M.F. Artificial neural network modeling of the
water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar. Pollut. Bull.
2012, 64, 2409–2420. [CrossRef]
15. Liu, D.J.; Zou, Z.H. Application of weighted combination model on forecasting water quality. Acta Scientiae
Circumstantiae 2012, 32, 3128–3132.
16. Clark, R.; Hakim, S.; Ostfeld, A. Handbook of Water and Wastewater Systems Protection (Protecting Critical
Infrastructure); Springer: New, York, NY, USA, 2011.
17. Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods used for the development of neural networks
for the prediction of water resource variables in river systems: Current status and future directions.
Environ. Model. Softw. 2010, 25, 891–909. [CrossRef]
18. Sangmok, L.; Donghyun, L. Improved prediction of harmful algal blooms in four major South Korea’s rivers
using deep learning models. Int. J. Environ. Res. Public Health 2018, 15, e1322.
19. Storey, M.V.; van der Gaag, B.; Burns, B.P. Advances in on-line drinking water quality monitoring and early
warning systems. Water Res. 2011, 45, 741–747. [CrossRef]
20. Yesilnacar, M.I.; Sahinkaya, E.; Naz, M.; Ozkaya, B. Neural network pre-diction of nitrate in groundwater of
Harran Plain, Turkey. Environ. Geol. 2008, 56, 19–25. [CrossRef]
21. Bouamar, M.; Ladjal, M. A comparative study of RBF neural net-work and SVM classification techniques
performed on real data fordrinking water quality. In Proceedings of the 5th International Multi-Conference
on Systems, Signals and Devices 2008, Amman, Jordan, 20–22 July 2008; pp. 1–5.
22. Huang, C.-J.; Kuo, P.-H. A deep CNN-LSTM model for particulate matter (PM2.5 ) forecasting in smart cities.
Sensors 2220, 18, 2220. [CrossRef]
23. Pan, L.; Li, J.; Luo, J. A temporal and spatial correction based missing values imputation algorithm in wireless
sensor networks. Chin. J. Comput. 2010, 33, 1–10. [CrossRef]
24. Christopher Olah. Understanding LSTM Networks. 27 August 2015. Available online: http://colah.github.
io/posts/2015-08-Understanding-LSTMs/ (accessed on 26 December 2018).
25. Jason Brownlee. Stacked Long Short-Term Memory Networks Develop Sequence Prediction Models in Keras.
18 August 2017. Available online: https://machinelearningmastery.com/stacked-long-short-term-memory-
networks/ (accessed on 19 January 2019).

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

You might also like