Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

water

Article
Real-Time Burst Detection in District Metering Areas
in Water Distribution System Based on Patterns of
Water Demand with Supervised Learning
Pingjie Huang, Naifu Zhu , Dibo Hou *, Jinyu Chen, Yao Xiao, Jie Yu, Guangxin Zhang and
Hongjian Zhang
State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang
University, Hangzhou 310027, China; huangpingjie@zju.edu.cn (P.H.); znf@zju.edu.cn (N.Z.);
chenjinyu@zju.edu.cn (J.C.); xiaoyaoleon@zju.edu.cn (Y.X.); yu_jie@zju.edu.cn (J.Y.); gxzhang@zju.edu.cn (G.Z.);
hj_zhang@zju.edu.cn (H.Z.)
* Correspondence: houdb@zju.edu.cn; Tel.: +86-571-8795-2241

Received: 9 November 2018; Accepted: 28 November 2018; Published: 1 December 2018 

Abstract: This paper proposes a new method to detect bursts in District Metering Areas (DMAs)
in water distribution systems. The methodology is divided into three steps. Firstly, Dynamic Time
Warping was applied to study the similarity of daily water demand, extract different patterns of
water demand, and remove abnormal patterns. In the second stage, according to different water
demand patterns, a supervised learning algorithm was adopted for burst detection, which established
a leakage identification model for each period of time, respectively, using a sliding time window.
Finally, the detection process was performed by calculating the abnormal probability of flow during a
certain period by the model and identifying whether a burst occurred according to the set threshold.
The method was validated on a case study involving a DMA with engineered pipe-burst events.
The results obtained demonstrate that the proposed method can effectively detect bursts, with a low
false-alarm rate and high accuracy.

Keywords: burst detection; district metering areas; dynamic time warping; patterns of water demand;
supervised learning

1. Introduction
Water leakage in water distribution systems (WDS) is a common issue and has caused widespread
concern in recent years [1]. One of the major forms of leakage are those caused by burst events
(high volume and short duration). An underground burst in WDS may not be reported for a long
time, resulting in a large amount of leakage [2]. Burst detection is a challenging problem that plagues
water supply industries. Bursts will not only cause serious waste of water resources, but will also
affect normal water supply [3]. For water supply industries, timely leakage detection in WDS is
of great significance for ensuring the continuance of water supply, as well as public safety. For the
purposes of improving burst detection efficiency, when trying to shorten the burst location detection
time and evaluating the pipe network’s leakage level, water supply companies usually divide the
entire pipe network into several small, separate district areas (district metering areas) for flow and
pressure monitoring, as shown in Figure 1.

Water 2018, 10, 1765; doi:10.3390/w10121765 www.mdpi.com/journal/water


Water 2018, 10, 1765 2 of 16
Water 2018, 10, x 2 of 16

Figure1.1.District
Figure Districtmetering
meteringarea
areadiagram.
diagram.

InInthethepastpastfew fewdecades,
decades,with withthe thedevelopment
developmentofofsupervisorysupervisorycontrolcontroland anddata
dataacquisition
acquisition
(SCADA) systems that can collect pressure and flow
(SCADA) systems that can collect pressure and flow data in real time, data-driven methodsdata in real time, data-driven methodshave
have prevailed in leakage detection. There are generally three
prevailed in leakage detection. There are generally three categories of data-driven methods: the categories of data-driven methods:
the classification
classification method,
method, prediction-classificationmethod,
prediction-classification method,and andthe
the statistical
statistical method.
method. MostMost of ofthese
these
methods
methods were tested and validated on DMA (district metering areas) under real circumstances [4].
were tested and validated on DMA (district metering areas) under real circumstances [4].
Related
Relatedstudiesstudies about
about DMA DMA burst detection
burst primarily
detection involve
primarily a single-inlet
involve flow meter
a single-inlet flow[5–14].
meter Mounce
[5–14].
etMounce
al. (2002) presented
et al. a neural network
(2002) presented methodology
a neural network to detecttobursts
methodology detect[5], and [5],
bursts Mounce et al. (2006)
and Mounce et al.
proposed static and time-delay artificial neural networks (ANNs)
(2006) proposed static and time-delay artificial neural networks (ANNs) to detect bursts. The to detect bursts. The results showed
results
that the time-delay
showed neural network
that the time-delay was better
neural network wasthan thethan
better staticthenetwork [6]. Mounce
static network et al. (2010)
[6]. Mounce also
et al. (2010)
combined mixture density (MDN) and fuzzy inference networks
also combined mixture density (MDN) and fuzzy inference networks (FIS) to detect bursts [7]. (FIS) to detect bursts [7]. Similarly,
Romano
Similarly, et al. (2014) proposed
Romano et al. (2014) a novel method
proposed to automatically
a novel method to detect bursts and
automatically otherbursts
detect abnormal flow
and other
events
abnormal in DMAs. The new
flow events methodology
in DMAs. The new combined
methodology AI algorithm
combined withAIthe statistical
algorithm process
with control
the statistical
(SPC) technique, as well as Bayesian inference systems [8]. Ye and
process control (SPC) technique, as well as Bayesian inference systems [8]. Ye and Fenner (2011) Fenner (2011) proposed a method
for burst detection
proposed a methodbyfor a Kalman filter (KF).
burst detection by aThis
Kalman methodfilterrequires
(KF). This a lot of normal
method historical
requires a lot ofdataset.
normal
By training a normal dataset, an optimal estimation for each time step
historical dataset. By training a normal dataset, an optimal estimation for each time step is calculated,is calculated, and the deviation
between
and the estimation
deviation between and actual values represents
estimation and actualthe size of
values bursts. [9].
represents theBesides, Ye and[9].
size of bursts. Fenner (2014)
Besides, Ye
also
andproposed
Fenner (2014) a methodology
also proposed to usea polynomial
methodology functions based on thefunctions
to use polynomial weightedbased least squares method,
on the weighted
with
leastexpectation
squares method, maximization (EM) to predict
with expectation normal (EM)
maximization flow or pressurenormal
to predict values.flow When bursts occur
or pressure in
values.
DMAs, observed data are significantly different from predictive values
When bursts occur in DMAs, observed data are significantly different from predictive values because because predictions depend on
normal historical
predictions depend dataon [10]. However,
normal the above
historical datamethods only determine
[10]. However, the above whether
methods or not
onlyburst occurs
determine
inwhether
DMA based on the predicted residual at a single time-point. When
or not burst occurs in DMA based on the predicted residual at a single time-point. When the the predicted residual exceeds
the set threshold,
predicted residual it exceeds
indicatesthe thatseta threshold,
burst has occurred.
it indicatesInthat general,
a burstthehas choice of threshold
occurred. In general,should the
be based on the maximum prediction residual of the predictive
choice of threshold should be based on the maximum prediction residual of the predictive model, model, but in practical applications,
the
butchoice of threshold
in practical often depends
applications, the choiceon experience,
of threshold thusoften
greatly limiting
depends onthe detection effect
experience, of the
thus greatly
threshold
limiting the classification
detection effect model. Besides,
of the threshold it is classification
very easy formodel. an outlier detection
Besides, at one
it is very easytime-point
for an outlierto
cause a false alarm, due to sudden water usage or a sensor fault. In
detection at one time-point to cause a false alarm, due to sudden water usage or a sensor fault. In addition, Jung and Lansey (2014)
used the Kalman
addition, Jung and filter methodology
Lansey (2014) used to estimate
the Kalman thefilter
WDSmethodology
state and detect bursts. To
to estimate thesome
WDSextent,state andit
avoided some false
detect bursts. To some alarms caused
extent, by the operation
it avoided some false of the pipecaused
alarms network by[11].
the Loureiro
operationetofal.the (2016)
pipe
introduced a renewed concept of outlier regions and utilized robust
network [11]. Loureiro et al. (2016) introduced a renewed concept of outlier regions and utilized statistics to identify abnormal flow
from
robustDMAs [12]. to identify abnormal flow from DMAs [12].
statistics
Burst
Burstdetection
detectionbased basedon onprediction
predictionclassification
classificationor orstatistical
statisticalmethods
methodsmainly mainlyrelyrelyon onlots
lotsofof
historical monitoring data under normal pipe conditions. Therefore,
historical monitoring data under normal pipe conditions. Therefore, the method based on prediction the method based on prediction
classification
classificationshould shouldeliminate
eliminateabnormal
abnormaldata dataincluded
includedininhistorical
historicaldata.
data.According
Accordingtotothe thedifferent
different
manifestations
manifestations of anomalies, time series anomalies can be mainly divided into point anomaliesand
of anomalies, time series anomalies can be mainly divided into point anomalies and
pattern
pattern(sequence)
(sequence)anomalies,
anomalies,which whichare areused
usedtotodiscover
discoveranomalies
anomaliesininaatime timeseries.
series.In
Inpast
pastresearch,
research,
ininorder
order to to
obtain the normal
obtain the normal operating patterns
operating dataset,dataset,
patterns it was widely
it wasused widelyto determine
used to the acceptable
determine the
lower
acceptable lower and upper confidence limits of each time point by statistical methods. If athe
and upper confidence limits of each time point by statistical methods. If the value at certain
value
time
at a point
certainexceeded
time point theexceeded
defined upper or lower
the defined upperboundary,
or lower theboundary,
data of thethe day needed
data of thetodaybe removed.
needed to
be removed. This data preprocessing method greatly reduces lots of normal historical data. In view
of the fact that a pipe-burst is a sustained event, this paper studies pattern anomaly instead of point
Water 2018, 10, 1765 3 of 16

This data preprocessing method greatly reduces lots of normal historical data. In view of the fact
thatWater
a pipe-burst
2018, 10, x is a sustained event, this paper studies pattern anomaly instead of point3anomaly, of 16
which refers to the pattern that is significantly different from other patterns in time series. The pattern
anomaly, which
can characterize therefers
mainto the pattern
feature that
of time is significantly
series and avoiddifferent
the bad from
effectsother patterns
of some in time series.
single-point outliers.
The
Forpattern can characterize
real historical flow data,theitmain feature
is usually of time
easy series data
to obtain and avoid the badaeffects
that contain large of some single-
amount of normal
datapoint
and aoutliers.
small amount of abnormal data. Thence, the primary question is how to distinguish those
abnormal Forflow
realdata
historical
fromflow data, it is
the normal usually
ones. easy to most
Secondly, obtainburst
data that contain
detection a large amount
methods of a
have better
normal data and a small amount of abnormal data. Thence, the primary question is how to
detection effect on larger bursts than smaller ones. This is mainly because the flow data of DMAs are
distinguish those abnormal flow data from the normal ones. Secondly, most burst detection methods
not fixed, and varies with time. Even at the same time of different days, the flow data will fluctuate
have better a detection effect on larger bursts than smaller ones. This is mainly because the flow data
within a certain normal range, and small bursts can easily be ignored. Aiming at solving the above
of DMAs are not fixed, and varies with time. Even at the same time of different days, the flow data
problems, this paper
will fluctuate withinproposes
a certain a methodology
normal range, andthat uses
small supervised
bursts can easilylearning, with
be ignored. patterns
Aiming of water
at solving
demand to detect small bursts in DMAs.
the above problems, this paper proposes a methodology that uses supervised learning, with patterns
of water demand to detect small bursts in DMAs.
2. Methodology
2.The
Methodology
method is mainly divided into three steps: (1) Firstly, using historical flow data of DMAs,
the similarity of daily
The method is water
mainlydemand
divided isintoestablished,
three steps:different patterns
(1) Firstly, of waterflow
using historical demand areDMAs,
data of extracted,
the similarity of daily water demand is established, different patterns of water
and abnormal patterns are removed; (2) normal data is divided into different time-period segments demand are extracted,
usingand abnormal
sliding patternsand
windows, are burst
removed;
events(2) normal datato
are added is normal
divided into
flowdifferent
data to time-period segmentsflow
simulate abnormal
datausing sliding windows,
(i.e., bursts). Then, theand burstflow
normal events are added
dataset to normal
(i.e., no bursts) flow dataabnormal
and the to simulate abnormal
flow datasetflow
(bursts)
data (i.e., bursts). Then, the normal flow dataset (i.e., no bursts) and the abnormal
can be trained by using the supervised learning algorithm, and the burst identification model can be flow dataset
(bursts) can be trained by using the supervised learning algorithm, and the burst identification model
established; (3) the detection process is performed by calculating the abnormal probability of flow
can be established; (3) the detection process is performed by calculating the abnormal probability of
during a certain period by the model, and identifying whether a burst occurs according to the defined
flow during a certain period by the model, and identifying whether a burst occurs according to the
threshold. The proposed method has been validated on a case study involving a DMA with engineered
defined threshold. The proposed method has been validated on a case study involving a DMA with
pipe-burst
engineeredevents (i.e., simulated
pipe-burst by opening
events (i.e., simulatedfirebyhydrants).
opening fire Therefore,
hydrants).a Therefore,
diagrammatic representation
a diagrammatic
of the burst identification
representation process
of the burst is shownprocess
identification in Figure 2. in Figure 2.
is shown

Figure Diagrammatic
2. 2.
Figure Diagrammaticrepresentation of the
representation of theburst
burstidentification
identification process.
process.
Water2018,
Water 10,x1765
2018,10, 4 4ofof16
16

2.1. Daily Patterns of Water Demand


2.1. Daily Patterns of Water Demand
As known already, time-series data change with time, yet appear to follow a certain regular
As known already, time-series data change with time, yet appear to follow a certain regular pattern;
pattern; however, there can be certain differences in different cycles. The data with the above
however, there can be certain differences in different cycles. The data with the above characteristics is
characteristics is called pseudo-period data. Since the daily water consumption in the same area is
called pseudo-period data. Since the daily water consumption in the same area is almost the same,
almost the same, hydraulic data from water distribution systems is a type of pseudo-period data as
hydraulic data from water distribution systems is a type of pseudo-period data as well, as shown
well, as shown in Figure 3. Analysis of this type of data involves an important issue: how to find out
in Figure 3. Analysis of this type of data involves an important issue: how to find out abnormal
abnormal patterns [15]. This paper presents the application of the Dynamic Time Warping (DTW)
patterns [15]. This paper presents the application of the Dynamic Time Warping (DTW) algorithm
algorithm to study the similarity of patterns in such flow data. DTW was first used widely in the
to study the similarity of patterns in such flow data. DTW was first used widely in the study of
study of speech recognition. Berndt and Clifford (1994) put forward DTW into the study of time-
speech recognition. Berndt and Clifford (1994) put forward DTW into the study of time-series data [16].
series data [16]. Currently, DTW is being widely applied to many occasions involving time-series
Currently, DTW is being widely applied to many occasions involving time-series data. DTW is a
data. DTW is a method that calculates the minimum distance between two time series by bending the
method that calculates the minimum distance between two time series by bending the time axis,
time axis, and determines the best correspondence between each point. Compared to Euclidean
and determines the best correspondence between each point. Compared to Euclidean distance,
distance, DTW can solve the problem of stretching (or compressing) and linear drift along the time
DTW can solve the problem of stretching (or compressing) and linear drift along the time axis for two
axis for two time-series data [17].
time-series data [17].

Figure 3. Several days of flow data from a District Metering Area (DMA).
Figure 3. Several days of flow data from a District Metering Area (DMA).
2.1.1. Dynamic Time Warping
2.1.1. Dynamic Time Warping
The DTW algorithm is used to find the best-matching path between two time series by
The DTW
adjusting the algorithm
correspondence is usedbetween
to find the best-matching
time points so aspath between the
to measure twosimilarity
time seriesofbytimeadjusting
series.
the correspondence between time points so as to measure the similarity of time
Specifically, there are two kinds of time-series flow data: dayi = x1 , x2 , · · · , xi , · · · xm and series. Specifically,
there
day j = yare two kinds of time-series flow data: dayi = xday , xday
1 , xi2 ,and i , xm defined
and
1 , y2 , · · · , yi , · · · ym . The DTW distance D ( dayi , day j ) between j is
day j = y1 , y2 , , yi ,  ym . The DTW distance D ( day i , day j ) between dayi and day j is
as follows:
D (dayi , day j ) = d(m, m). (1)
defined as follows:
d(i, j) = dist( xi , y j ) + min[d(i, j − 1), d(i − 1, j.), d(i − 1, j − 1)]. (2)
D ( dayi , day j ) = d ( m , m ) (1)
d(0, 0) = 0, d(i, 0) = d(0, j) = ∞, (i = 1, 2, · · · , m; j = 1, 2, · · · , m). (3)
d (i , j ) = dist ( x , y j ) + different
where dist( xi , y j ) can be calculatedi using
j − 1), d (measurements
min[ d (i ,distance i − 1, j ), d (i − 1,(e.g.,
j − 1)] . (2)
Euclidean distance,

Manhattan distance, etc.). In this paper, let dist( xi , y j ) = xi − y j .
When calculating d (0,0)the=DTW i,0) = dbetween
0 , d (distance (0, j) = ∞ (i = 1,2,
the, time series day j =day
, mi;and 1,2,j , , malgorithm
the ). needs(3)
to
construct a dynamic time-warping distance matrix with m × m, and the calculation process uses the
where dist ( xi , y j ) can be calculated using different distance measurements (e.g., Euclidean
recursive Formulation (2) to fill in the matrix. Since the calculation does not need to use the information
dist i , y j ) = xi − y j .
of the whole
distance, matrix,distance,
Manhattan the algorithm only
etc.). In thisneeds tolet
paper, save the( xunit of the current column and the unit of
the previous column in the matrix involved in the calculation. An example is shown in Figure 4.
When calculating the DTW distance between the time series dayi and day j , the algorithm
needs to construct a dynamic time-warping distance matrix with m× m , and the calculation process
uses the recursive Formulation (2) to fill in the matrix. Since the calculation does not need to use the
information of the whole matrix, the algorithm only needs to save the unit of the current column and
the unit of the previous column in the matrix involved in the calculation. An example is shown in
Figure 4.
Water 2018, 10, 1765 5 of 16
Water 2018, 10, x 5 of 16

Figure
Figure 4. An4.example
An example of a warping
of a warping path.path. The blue
The blue andcurves
and red red curves represent
represent different
different two daily
two daily flows.
flows.
2.1.2. Abnormal Daily Patterns of Water Demand
2.1.2. Abnormal
A method Daily
which Patterns
can be usedoffor
Water Demand
detecting abnormal periods in the pseudo-period data based on
DTW is Athe fact that
method DTW
which candistance
be usedbetween each period
for detecting abnormaldataperiods
is calculated
in the over the entire historical
pseudo-period data based
dataset, and the number of similar periods is determined accordingly.
on DTW is the fact that DTW distance between each period data is calculated over the entire Furthermore, it historical
can be
determined
dataset, and whether the detected
the number cycleperiods
of similar is abnormal according accordingly.
is determined to the percentage of the amount
Furthermore, it canofbe
similar cycles to the total amount of historical cycles [18,19]. In this paper,
determined whether the detected cycle is abnormal according to the percentage of the amount one cycle represents theof
daily pattern of water demand, and abnormal cycles is defined as abnormal patterns of
similar cycles to the total amount of historical cycles [18,19]. In this paper, one cycle represents the water demand.
Thepattern
daily flow data of the DMA
of water demand,inletand
is determined by theisuser’s
abnormal cycles definedwater-usage
as abnormalbehavior and of
patterns mainly
water
relates
demand.to the natural time. Therefore, historical flow data is divided into several continuous cycles
according
Theto a 24data
flow h interval.
of the In
DMApractice,
inlet less abnormal patterns
is determined can bewater-usage
by the user’s detected, and the DTW
behavior anddistance
mainly
between
relates to the natural time. Therefore, historical flow data is divided into several continuous they
different abnormal patterns may be larger. There are many kinds of abnormalities and cycles
cannot all betocompletely
according known.
a 24 h interval. This paper
In practice, less proposes
abnormalapatterns
novel method to identifyand
can be detected, abnormal
the DTW patterns,
distance
with two steps
between as follows:
different abnormal patterns may be larger. There are many kinds of abnormalities and they
1. cannot all be completely
Calculate known.between
the DTW distance This paper proposes
tested a novel
pattern method
i and each to identify
pattern abnormal dataset,
in the historical patterns,
withand
twocalculate
steps asthe
follows:
sum of the DTW distance Disti .
2.1. Sort Disti the
Calculate from DTWsmall to large.
distance Thetested
between largepattern i and
Disti , the higher abnormal
each pattern probability
in the historical of the
dataset,
and calculate the sum of the DTW distance Dist .
tested pattern.
i

2. Using Distabove
Sort the i from small toabnormal
method, large Dist
large. Thepatterns i , the demand
of water higher abnormal probability
in the historical of the
dataset tested
can be
eliminated to obtain normal patterns of water demand, which provides reliable data for establishment
pattern.
of the subsequent model.
Using the above method, abnormal patterns of water demand in the historical dataset can be
eliminated
2.2. to obtain
Burst Identification normal
Method Basedpatterns of water
on Supervised demand, which provides reliable data for
Learning
establishment of the subsequent model.
This paper presents the Random Forest for burst identification using DMA flow data. The Random
Forest [20] Identification
2.2. Burst has the advantages
Method of having
Based less manual
on Supervised setting parameters, better tolerance to noise,
Learning
high classification accuracy, and better explanations, and compared with other classification methods,
also hasThis paper
strong presents The
robustness. the Random Forest has
Random Forest for been
burstwidely
identification
applied using DMA
in many flow
fields, data.
such The
as for
Random
data mining Forest [20]
[21] or has the
pattern advantages
recognition of having
[22], and hasless
been manual
a focussetting parameters,
for research better tolerance to
as of late.
noise, high classification accuracy, and better explanations, and compared with other classification
methods,
2.2.1. also has strong
Burst Detection Based robustness.
on RandomThe Random
Forest Forest has been widely applied in many fields,
Classifier
such as for data mining [21] or pattern recognition [22], and has been a focus for research as of late.
Burst detection in DMA can be modeled as a binary classification, where the label “+1” means
a non-burst, and label “−1” means a burst. The model is trained using both normal samples and
2.2.1. Burst Detection Based on Random Forest Classifier
abnormal samples (i.e., burst events). After the model is established, each classification decision-tree
Burst detection in DMA can be modeled as a binary classification, where the label “+1” means a
non-burst, and label “−1” means a burst. The model is trained using both normal samples and
Water 2018, 10, x 6 of 16
Water 2018, 10, 1765 6 of 16
abnormal samples (i.e., burst events). After the model is established, each classification decision-tree
gives an independent classification result. The final decision is based on the most recognized
gives an independent classification result. The final decision is based on the most recognized category.
category. The final decision result is formulated as given in Equation (4). Figure 5 shows a
The final decision result is formulated as given in Equation (4). Figure 5 shows a diagrammatic
diagrammatic representation of the Random Forest:
representation of the Random Forest:
k
M ( x) = arg maxk  I (mi ( x) = T ) . (4)
T
M ( x ) = arg max ∑i=I1(mi (x) = T ).
T
(4)
i =1
where M (x) is the Random Forest mode, mi ( x) is the classification decision tree, T is the
where M ( x ) is the Random Forest mode, mi ( x ) is the classification decision tree, T is the
category, and
category, I (⋅)is isthethediscriminant
and I (·) discriminant function
function (if (if x )i ( x=) =T,T I,(mIi ((m
mi ( m x )i ( x=) =TT = 11;; otherwise,
) )= otherwise,
II((mmi (i x( )x )==TT) )==0)0 [23].
) [23].

5. Diagrammatic
Figure 5. Diagrammatic representation of the Random
Random Forest. Red and green solid circles
circles represent
represent
two different
different categories.
categories. Black
Black solid
solid circles
circles represent
represent the
the value
value of
ofan
anattribute.
attribute.

2.2.2. Node-Splitting for Random Forest


2.2.2. Node-Splitting for Random Forest
Node-splitting is the key step of the Random Forest algorithm. In this paper, Random Forest uses
Node-splitting is the key step of the Random Forest algorithm. In this paper, Random Forest
a classification and regression tree (CART) [24], and the node-splitting method is the minimum of the
uses a classification and regression tree (CART) [24], and the node-splitting method is the minimum
Gini index principle. The Gini index is used to describe the purity or uncertainty of the dataset and
of the Gini index principle. The Gini index is used to describe the purity or uncertainty of the dataset
determines the optimal dichotomy problem for the categorical variable.
and determines the optimal dichotomy problem for the categorical variable.
Assuming there are K categories, and that the probability of samples belonging to the class K is
Assuming there are K categories, and that the probability of samples belonging to the class
pk , then the Gini index of the probability distribution can be defined as the following [25]:
K is pk , then the Gini index of the probability distribution can be defined as the following [25]:
K K
K K 2
Gini( p) k=∑
=
pk (1 − pk ) = 1 −k∑
1 kpk2 .
Gini( p) = p k (1 − p k ) = 1 − p . (5)
1 = (5)
k =1 k =1
If the sample-set D is divided into two different parts, D1 and D2 , according to a certain feature f ,
If the sample-set D is divided into two different parts, D1 and D2 , according to a certain
then the formula for calculating the Gini index of set D is as follows:
feature f , then the formula for calculating the Gini index of set D is as follows:
D1 D2
Gini( D, f ) = D Gini( D1 ) + DGini( D2 ). (6)
Gini( D, f ) = D 1
Gini( D ) +D 2 Gini( D ) . (6)
1 2
D D
where Gini( D, f ) reflects data uncertainty of the divided subset. The smaller the value, the smaller the
uncertainty of the divided subset.
Water 2018, 10, 1765 7 of 16

2.2.3. Implementation of the Algorithm


The normal dataset is obtained by removing abnormal patterns in raw data using DTW,
and the abnormal dataset is obtained by adding engineered burst events into the normal dataset,
thereby obtaining the training sample set required by the Random Forest algorithm. To achieve
real-time burst detection, we used a sliding time window. The sliding time window length is set
in conjunction with the sampling frequency T (e.g., 15 min) of the DMA sensors, and the intervals
(e.g., every 2 h) for data upload. For each time period, we establish a corresponding burst detection
model. Table 1 shows the parameters of the training model.

Table 1. Parameters of the training model.

Parameters Settings
Feature attribute variables Flow data during N hours
Amount of feature attributes M (M = N/T, T is sampling frequency)
Number of trees NTree
Amount of selection attribute randomly NFeature
Categories Non-burst: 1; Burst: −1
Training dataset Normal dataset S0, Abnormal dataset S1

2.3. Performance Evaluation


The methodology was evaluated by the true positive rate (TPR) and false positive rate (FPR),
as shown by Formula (7) [26]:

TP FP
TPR = , FPR = . (7)
TP + FN FP + TN

where TPR represents the percentage of detected bursts when bursts occurred, and FPR defines the
percentage of false-burst identifications during a normal operation time.

3. Experiments and Results

3.1. Flow Data Acquisition


The method proposed in this paper is applicable to burst detection and other abnormal flow
events in DMAs [7]. In general, flow meters are installed at the DMA inlet and outlet. Flow meters
typically sample data at fixed time intervals (e.g., 1 min, 5 min, 15 min), then send data to water supply
companies at a fixed frequency (e.g., every hour, or a longer time interval) [8].
For example, the actual application was tested to verify the methodology using flow data of a
DMA in a city in China. The water consumption was caused by almost all residents. There is a single
flow sensor installed at the entrance of DMA. The flow meter collects the data at a fixed frequency of
15 min, then sends it to the water supply companies at a fixed frequency (every 2 h). Some statistical
data about this DMA is described in Table 2. Flow data from 1 January to 22 April 2018 (sixteen weeks)
were used in this study for method evaluation, as shown in Figure 6.

Table 2. Some statistical data about this DMA.

Flow (m3 /h) Total Water Daily Average Water


Month Consumption (m3 ) Consumption (m3 )
Maximum Minimum Average
January 2018 21 4 10 8179
February 2018 19 4 10 7005
261
March 2018 19 6 10 8002
April 2018 21 5 11 8089
Water 2018, 10, x 8 of 16
Water 2018, 10, 1765 8 of 16
Water 2018, 10, x 8 of 10

25

20

15
Flow (T/h)

10

0
10096
88
80
72 2018-04-23
64
56 2018-03-26
48
40
32 2018-02-26
24
16 2018-01-29
8
Time (15min) 0 2018-01-01
Figure 6. Sixteen weeks of flow data from the DMA.
Figure 6. Sixteen weeks of flow data from the DMA.
As shown in FigureFigure 6, the6.flow data
Sixteen change
weeks with
of flow time
data according
from the DMA.to a certain period, but there
As shown in Figure 6, the flow data change with time according
are some differences in different periods. Flow data is a kind of pseudo-period to a certain period, but there are
data stream.
some Water
differences in different periods. Flow data is a kind of pseudo-period data stream.
demand is greatly affected by consumer activities. This paper segments flow data into 4.5

timeWater demand
intervals is Since
(24 h). greatlytheaffected
samplingby consumer activities.
interval was 15 min,This paper
there weresegments
a total offlow data intopoints
96 sampling time 4

intervals (24
in one day. h). Since the sampling interval was 15 min, there were a total of 96 sampling points in
3.5
one day.
3
3.2. Results and Discussion About Patterns of Daily Water Demand
Standard Deviation (m³/h)

3.2. Results and Discussion About Patterns of Daily Water Demand 2.5

Sixteen weeks of flow data were studied using DTW. The ranking of abnormal probability of
eachSixteen
day is weeks of flow data were studied using DTW. The ranking of abnormal probability of each 2
from small to large, as shown in Figure 7.
day is from small to large, as shown in Figure 7. 1.5

0.5
0:00 4:00 8:00 12:00 16:00 20:00 23:45
Time

Figure 14. The standard deviation of flow data at different times.

Abnormal probability of each time period in leakage event (1m³/h) Abnormal probability of each time stage in leakage event (2 m³/h)
1 1.0

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
Abnormal Probability

Abnormal Probability

0.5 0.5

0.4 0.4

0.3 0.3

0.2 Figure 7. The ranking of abnormal probability of each day from small to large. 0.2

0.1 Figure 7. The ranking of abnormal probability of each day from small to large. 0.1

According to abnormal probability, periods with large abnormal probability can be detected.
0
0 10:15-12:00 12:15-14:00 14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00
Time
22:15-24:00 0:15-2:00 2:15-4:00 4:15-6:00 6:15-8:00 8:15-10:00
0
14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00 22:15-24:00 0:15-2:00 2:15-4:00
Time
4:15-6:00 6:15-8:00 8:15-10:00 10:15-12:00 12:15-14:00

whichAccording
are definedtoasabnormal
abnormalprobability, periods
patterns of daily with
water large abnormal
demand. probabilitydate
The corresponding canofbeabnormal
detected.
which are (a)
definedinasTable
abnormal (b)
patterns of daily water demand. The corresponding date of abnormal
patterns is shown 3.
Figure
patterns is16.
Figure The Table
7shown
and abnormal
in Table probability
3.
3 indicate of different
that patterns leakage
from Monday sizestoduring
Fridaydifferent
are veryperiods.
similar, (a)
butThe
there are
abnormalbetween
differences probability of burst events
weekdays (1 m /h) during
3
and weekends. differentpatterns
Obviously, periods; of
(b)festivals
the abnormal
suchprobability
as 19 February,
of burst events
20 February, (2 m3/h) during
21 February, Table
26 different
February, periods.
3. The corresponding
and date of
2 March (The abnormal
Spring patterns.
Festival in China) are different from
non-festivals. For some seasons, such asWeek
Date when thereDate
are changes in pipe-network
Week operating conditions,
4. Conclusions
Water 2018, 10, x 9 of 16

Water 2018, 10, 1765 9 of 16


28 January Sun. 19 February Mon. (Festival)
29 January Mon. 20 February Tues. (Festival)
30 January
the patterns of water demand Tues. to 21
differ greatly February
other patterns.Wed. (Festival)
As shown in Table 3, these patterns are
31 January Wed. 26 February Mon. (Festival)
defined as abnormal data from 29 January to 2 February and 9 April. These abnormal data must be
1 February
removed to get a pure historical Thur.
dataset. 2 March
It is crucial Mon.
to establish the (Festival)
corresponding burst identification
2 February Fri. 9 April
model for different patterns of water demand. The original historical Mon.data on weekdays (Monday to
Friday) and the normal historical data are shown in Figure 8.
Figure 7 and Table 3 indicate that patterns from Monday to Friday are very similar, but there are
differences between weekdays and
Table 3. Theweekends. Obviously,
corresponding patternspatterns.
date of abnormal of festivals such as 19 February,
20 February, 21 February, 26 February, and 2 March (The Spring Festival in China) are different from
Date
non-festivals. For some seasons, Week
such as when Date there are changesWeek in pipe-network operating
conditions, the patterns of28water
JanuarydemandSun.differ 19 February
greatly Mon.
to other (Festival)
patterns. As shown in Table 3, these
29 January Mon. 20 February Tues. (Festival)
patterns are defined as abnormal data from 29 January to 2 February and 9 April. These abnormal
30 January Tues. 21 February Wed. (Festival)
data must be removed to 31getJanuary
a pure historical
Wed. dataset. It is crucial
26 February to establish
Mon. (Festival) the corresponding burst
identification model for different
1 February patterns
Thur.of water demand.
2 March The original historical data on weekdays
Mon. (Festival)
(Monday to Friday) and the normal historical
2 February Fri. data9are shown in Figure
April Mon. 8.

(a) (b)
Figure
Figure 8.
8. (a)
(a) The
The original
original historical
historical data
data on
on weekdays,
weekdays, and
and (b)
(b) the normal historical data.

Figure 8a,b
Figure 8a,b depicts
depicts patterns
patterns of
of water
water demand
demand as being more concentrated
concentrated and
and regular
regular after
after
removing the abnormal patterns, indicating that the study of patterns of water demand is highly
removing the abnormal patterns, indicating that the study of patterns of water demand is highly
effective when based on DTW.

3.3. Results
3.3. Results and
and Discussion
Discussion About
About Burst
Burst Identification
Identification Method
Method
Flow data
Flow data from
from 1919 February
February toto 88 June
June were
were acquired
acquired from
from aa supervisory
supervisory control
control and
and data
data
acquisition (SCADA)
acquisition (SCADA) system.
system. Firstly,
Firstly, itit needed
needed toto remove
remove abnormal
abnormal patterns
patterns in
in aa historical
historical dataset,
dataset,
and then to divide normal data into patterns on weekdays and weekends. The method
and then to divide normal data into patterns on weekdays and weekends. The method was validated was validated
by simulating
by simulating burst
burst events
events through
through opening
opening firefire hydrants
hydrants in
in the
the DMA
DMA [8].
[8]. Experimental
Experimental information
information
can be
can be seen
seen in
in Table
Table 4.
4.
Table 4. Details about two simulated bursts in the DMA.
Table 4. Details about two simulated bursts in the DMA.
Hydrant Opened Hydrant Closed Leakage Percent Average
Leakage Hydrant Opened
Leakage Event Hydrant Closed Leakage 3 Percent Average Inflow
Time Time Rate/(m /h) Inflow (%)
Event Time Time Rate/(m³/h) (%)
E1 E1 6 June 2018—9:30
6 June 2018—9:30 7 June 2018—10:30 About
7 June 2018—10:30 About
1.0 1.0 About
About1010
E2 E2 7 June 2018—14:00
7 June 2018—14:00 8 June 2018—15:00 About
8 June 2018—15:00 About
2.0 2.0 About
About2020

Burst events where their size was 1 m 3 /hwere


m³/h wereattached
attachedtotothe
thenormal
normal dataset
dataset from
from 19
19 February
February
to
to 8 June 2018 to obtain abnormal
abnormal data
data with
with burst
burst events.
events. In
In order
order to
to have
have real-time
real-time burst
burst detection,
detection,
Water 2018, 10, 1765 10 of 16
Water 2018, 10, x 10 of 16

combined with the sampling frequency T (15 min) of the DMA sensors and the interval (every 2 h),
combined with the sampling frequency T (15 min) of the DMA sensors and the interval (every 2 h),
the sliding time window can be set to eight sampling points (120 min ÷ 15 min). One day can be
the sliding time window can be set to eight sampling points (120 min ÷ 15 min). One day can be
divided into 12 time periods, and the time window slides once every 2 h. This paper selected 60 days
divided into 12 time periods, and the time window slides once every 2 h. This paper selected 60 days
of historical dataset as the training dataset, with the next 10 days as the test dataset. The supervised
of historical dataset as the training dataset, with the next 10 days as the test dataset. The supervised
learning method was used to train the model, and the corresponding burst identification method was
learning method was used to train the model, and the corresponding burst identification method was
modeled for twelve time periods (0:15–2:00, 2:15–4:00, 4:15–6:00, 6:15–8:00, 8:15:10:00, 10:15–12:00,
modeled for twelve time periods (0:15–2:00, 2:15–4:00, 4:15–6:00, 6:15–8:00, 8:15:10:00, 10:15–12:00,
12:15–14:00, 14:15–16:00, 16:15–18:00, 18:15–20:00, 20:15–22:00, and 22:15–24:00). Table 5 shows the
12:15–14:00, 14:15–16:00, 16:15–18:00, 18:15–20:00, 20:15–22:00, and 22:15–24:00). Table 5 shows the
parameters of the training model.
parameters of the training model.

Table 5. Parameters of the training model.


Table 5. Parameters of the training model.
Parameters Settings
Parameters Settings
Feature attribute variables Flow data during 2 h
Feature attribute variables Flow data during 2 h
Amount of feature attributes 8
Amount of feature attributes 8
Number
Number ofof
trees
trees 10
10
Amount of selection
Amount attribute
of selection attributerandomly
randomly 66
Categories
Categories Non-burst:
Non-burst:1;1;Burst: −1−1
Burst:
Training
Training dataset
dataset Normal data: 60 days, Abnormal
Normal data: 60 days, Abnormal data: 60 60
data: days
days

For instance, Figure 9 depicts the results obtained by burst identification, with patterns of the
water demand system when burst events occurred. The observed flow data when simulative burst
events occurred is depicted in Figure 9a, 9a, and
and the
the test
test result
resultisisdepicted
depictedin
inFigure
Figure9b.
9(b).
TheThe
redred parts
parts in
in Figure
Figure 9a9a indicate
indicate burst
burst events.InInFigure
events. Figure9b,
9b,the
thered
redpart
partindicates
indicates that
that bursts
bursts have been detected,
detected,
and the blank part indicates no burst. Furthermore,
Furthermore, the time of the missed burst detection is depicted
in Figure 9c.

Figure 9. Results obtained


obtained by burst identification, with patterns
patterns of the water demand system when
burst events occurred. The red parts in
The red parts in (a) indicate burst events. In
In (b),
(b), the
the red part indicates that
bursts have been detected,
detected, and the blank part indicates no burst. (c) indicate
indicate the time of the missed
burst alarm.

Conversely, Figure
Conversely, Figure 10
10 shows
shows the
the results
results obtained by a
obtained by a burst
burst identification, with patterns
identification, with patterns of
of the
the
water demand system when no burst event occurred. The observed flow data when no
water demand system when no burst event occurred. The observed flow data when no burst event burst event
occurred is depicted in Figure 10a, and the test result is depicted in Figure 10b. In Figure 10b, the red
part indicates that bursts have been detected, and the blank part indicates no burst. The time of the
false-burst alarm is depicted in Figure 10c.
Water 2018, 10, x 11 of 16
Water 2018, 10, x
1765 1111of
of 16
16

Figure 10. Results obtained by the burst identification, with patterns of the water demand system
Figure 10. Results
Resultsobtained
obtainedby bythe
theburst identification,
burst with
identification, patterns
with of the
patterns ofwater demand
the water system
demand when
system
when no burst event occurred. (a) depicts the observed flow data when no burst events occurred. In
no burst
when no event occurred.
burst event (a) depicts
occurred. the observed
(a) depicts flow flow
the observed data data
whenwhen
no burst events
no burst occurred.
events In (b),
occurred. In
(b), the red part indicates that bursts have been detected, and the blank part indicates no burst. The
the red
(b), the part indicates
red part that bursts
indicates have have
that bursts been detected, and the
been detected, blank
and the part
blankindicates no burst.
part indicates noThe time
burst. of
The
time of the false-burst alarm is depicted in (c).
the false-burst alarm is depicted in (c).
time of the false-burst alarm is depicted in (c).
TPR and
TPR and FPR
and FPR were
FPR were calculated
were calculated
calculated forfor burst
for burst identification
burst identification with
identification with patterns
with patterns of
patterns of the
of the water
thewater demand
waterdemand system.
demandsystem.
system.
TPR
The system
The system
system hashas a high
has aa high
high TPRTPR (98.3%)
TPR (98.3%) with a low FPR (6.7%).
The (98.3%) with
with aa low
low FPR
FPR (6.7%).
(6.7%).
In
In previous research,
previous research, mostmost burst
burst detection
detection methods
methods (prediction-classification,
(prediction-classification, or statistical
or statistical
statistical
In previous research, most burst detection methods (prediction-classification, or
methods based
methods based
based onon normal
on normal flow
normal flow data)
flow data) were
data) were tested
were tested and
tested and validated
and validated on
validated on DMAs.
on DMAs.
DMAs. In In this
In this paper,
this paper, the
paper, the proposed
the proposed
proposed
methods
method
method was was compared
wascompared
comparedwith with other
withother methods,
othermethods,
methods, such as
such the Kalman
as the KalmanFiltering
Filtering(KF) method [27,28], and
method such as the Kalman Filtering (KF)(KF)
methodmethod [27,28],
[27,28], and
statistical process
and statistical control
process (SPC)
control [29,30].
(SPC) As a Detectability
[29,30]. Comparison ofofrandom forest (RF) with
statistical process control (SPC) [29,30]. As As a Detectability
a Detectability Comparisonof
Comparison randomforest
random forest(RF)
(RF) with
with
patterns
patterns of
of water
water demand,
demand, the
the KF
KF and
and SPC
SPC methods
methods are
are depicted
depicted in
in Figures
Figures 11
11 and
and 12.
12. In
In these
these two
two
patterns of water demand, the KF and SPC methods are depicted in Figures 11 and 12. In these two
figures, the
figures, the
red
the red
part
red part
indicates
part indicates
that
indicates that
bursts
that bursts
have
bursts have
been
have been
detected
been detected
and
detected and
the
and the
blank
the blank
blank part
part indicates
part indicates
indicates no
no burst.
no burst.
burst.
figures,
Figure
Figure
11 shows
11
shows thethe results
resultsof ofdifferent
differentmethods
methodswhen whenburst
burstevents
events occurred.
occurred. On
Onthe
the contrary,
contrary,
Figure
Figure 12
Figure 11 shows the results of different methods when burst events occurred. On the contrary, Figure
12 shows
shows
the
thethe
results
results
of different
of different
methods
methods when
when no
no no
burst
burst
event
event
occurred.
occurred.
By
By By
observing
observing
Figures
Figures
11
11 and
and
12,
12 shows results of different methods when burst event occurred. observing Figures 11 and
12, it’s obvious to draw a conclusion that the method proposed in this paper has a higher TPR and a
12, it’s obvious to draw a conclusion that the method proposed in this paper has a higher TPR and aa
it’s obvious to draw a conclusion that the method proposed in this paper has a higher TPR and
lower FPR.
lower FPR.
lower FPR.

Figure 11.
Figure Results obtained
11. Results obtained using
using (a)
(a) an
an random
random forest
forest (RF)
(RF) with
with patterns
patterns of
of water
water demand,
demand, (b)
(b) the
the
Figure
Kalman11. Resultsmethod,
Filtering obtainedand
using (a) Statistical
(c) the
the an random forest Control
Process (RF) with patterns
method of water
when demand,
burst events
events (b) the
occurred.
Kalman Filtering method, and (c) Statistical Process Control method when burst occurred.
Kalman Filtering method, and (c) the Statistical Process Control method when burst events occurred.
Water 2018, 10, 1765 12 of 16
Water 2018, 10, x 12 of 16

Figure 12. Results obtained using (a) an RF with


Figure 12. with patterns
patterns of
of water
water demand,
demand, (b) the
the Kalman
Kalman Filtering
Filtering
method, and (c) the Statistical Process Control method when no burst event occurred.
method, and (c) the Statistical Process Control method when no burst event occurred.

As
As shown
shown inin Table
Table 6,6, compared
compared withwith the
the KFKF and
and SPCSPC methods,
methods, the the proposed
proposed method
method performs
performs
better
better regardless of TPR or FPR. The SPC method is very ineffective for small bursts and large
regardless of TPR or FPR. The SPC method is very ineffective for small bursts and large
fluctuations
fluctuations inin water
water demand.
demand. The The main
main reason
reason for for this
this is
is that
that the
the SPC
SPC method
method is is point
point anomaly
anomaly
detection based on
detection based on statistical
statisticalmethods,
methods,and anddefines
definesa anormal
normal range
range (mainly
(mainly using
using thethe
mean meanandand the
the standard deviation). If the normal range is set to be small, false alarms
standard deviation). If the normal range is set to be small, false alarms are easily caused when theare easily caused when
the fluctuation
fluctuation of water
of water demand
demand is large.
is large. If If
thethe normalrange
normal rangeisislarge,
large,small
smallburst
burst events
events are
are easily
easily
ignored. Moreover, the
ignored. Moreover, the Kalman
Kalman filter
filter is
is used
used to to model
model normal
normal waterwater demand,
demand, and and thethe residual
residual
between
between thethe predictive
predictive flow
flow and
and the
the actual
actual flow
flow can
can indicate
indicate the the size
size ofof bursts. Essentially, it
bursts. Essentially, it is
is still
still
point-anomaly detection, so when the water usage changes greatly, it can also
point-anomaly detection, so when the water usage changes greatly, it can also easily cause false easily cause false alarms.
The burstThe
alarms. detection method based
burst detection methodon patterns
based onofpatterns
water demandof water anddemand
the Random Forest
and the classification
Random Forest
algorithm
classification algorithm proposed in this paper has a lower false-alarm rate while being burst
proposed in this paper has a lower false-alarm rate while being more sensitive to more
events.
sensitiveIntoactual
burstsituations,
events. In burst
actualevents in DMA
situations, burstrarely
events occur,
in DMAso the falseoccur,
rarely alarm soratetheshould be as
false alarm
small as possible.
rate should Otherwise,
be as small a highOtherwise,
as possible. false-alarmaratehighwill bring unnecessary
false-alarm laborunnecessary
rate will bring to the management
labor to
of water supply companies.
the management of water supply companies.

Table6.6. Detectability
Table Detectability Comparison
Comparison of of RF
RF with
with patterns
patterns of
of water
water demand,
demand, the
the Kalman
Kalman Filtering
Filtering (KF)
(KF)
and statistical process control (SPC) methods.
and statistical process control (SPC) methods.
Methods
Methods TPR (%)(%)
TPR FPR
FPR(%)
(%)
RF
RFwith
withpatterns
patternsof of
water demand
water demand 98.3
98.3 6.7
6.7
The
TheKalman
Kalman Filtering
Filtering 77.5
77.5 26.9
26.9
Statistical process control 64.4 31.7
Statistical process control 64.4 31.7

In addition, using flow data between 2:00 and 4:00 at night, we compared the proposed method
and the Minimum Night Flow (MNF) method, which is mostly widely used [31]. From Table 7, it is is
apparent 3
apparent that
thatMNF
MNFisissensitive toto
sensitive small burst
small events
burst (e.g.,
events flowflow
(e.g., rate rate
of 1 m
of 1/h), but there
m3/h), are some
but there false
are some
alarms. However,
false alarms. the proposed
However, method
the proposed in this in
method paper
this is able is
paper to able
detect
to small
detectbursts
smallwithout false alarms.
bursts without false
alarms.
Table 7. Detectability Comparison of RF with diurnal demand pattern and Minimum Night Flow (MNF).
Table 7. Detectability Comparison of RF with diurnal demand pattern and Minimum Night Flow
Methods TPR (%) FPR (%)
(MNF).
RF with patterns of water demand 100 0
Methods
MNF TPR
100 (%) FPR 10.5(%)
RF with patterns of water demand 100 0
MNF 100 10.5
Water 2018,Water
10, x2018, 10, 1765 13 of 16 13 of 16

Since burst events may occur at any time, this paper studied the detection performance of the
Water 2018, 10,Since
x burst events may occur at any time, this paper studied the detection performance of the 8 of 10
model atmodel
different time periods. The results are shown in Figure 13.
at different time periods. The results are shown in Figure 13.
True Positive Rate

25

20

15
Flow (T/h)

10
False Positive Rate

0
10096
88
80
Figure 13. The true-positive
72 rate (TPR) and false-positive rate (FPR) of the model2018-04-23
at different
Figure 13. The true-positive 64rate
56
(TPR) and false-positive rate (FPR) of the model at different
2018-03-26
time
time periods. 48
40
periods. 32 2018-02-26
24
16 2018-01-29
Figure 13 shows thatTime
false-positive
(15min)
rate is8 high between 4:15–8:00 and 18:15–24:00. This paper uses
0 2018-01-01
Figure 13 shows that false-positive rate to
the standard deviation of flow at each time is reflect
high the magnitude
between of waterand
4:15–8:00 fluctuations. FigureThis
18:15–24:00. 14 paper
shows the standard
uses the standard deviation deviation
of flowofatflow eachdata
timeat each time. It
to reflect can
the be seen thatof
magnitude the waterfluctuations.
water usage varies Figure
relatively greatly during the two time periods of 5:45–8:00 and 19:45–23:00. This results in a high
14 shows the standard deviation Figure
false-positive rate during 6. flow
of
periods Sixteen weeks
data
when water of flow
at each
usage
time.dataItfrom
fluctuations are the
can be DMA.
seen
large.
that the water usage varies
relatively greatly during the two time periods of 5:45–8:00 and 19:45–23:00. This results in a high
false-positive rate during periods when water usage fluctuations are large. 4.5

3.5

3
Standard Deviation (m³/h)

2.5

1.5

0.5
0:00 4:00 8:00 12:00 16:00 20:00 23:45
Time

Figure 14. The standard deviation of flow data at different times.


Figure 14. The standard deviation of flow data at different times.
Finally, the simulated burst events from 6 June to 8 June 2018 were used to verify the proposed
methodology. The red 14.
Figure curve
Therepresents
standardflow data under
deviation burst
of flow events,
data and the times.
at different blue curve represents
flow data under normal water demand (see Figure 15).
Finally, the simulated burst events from 6 June to 8 June 2018 were used to verify the proposed
Abnormal probability of each time stage in leakage event (2 m³/h)

methodology. The red curve represents flow data under burst events, and the blue curve represents
Abnormal probability of each time period in leakage event (1m³/h)
1 1.0

0.9 0.9

flow data under normal water demand (see Figure 15).


0.8 0.8

0.7 0.7

0.6 0.6
Abnormal Probability

Abnormal Probability

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0
0 10:15-12:00 12:15-14:00 14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00 22:15-24:00 0:15-2:00 2:15-4:00 4:15-6:00 6:15-8:00 8:15-10:00 14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00 22:15-24:00 0:15-2:00 2:15-4:00 4:15-6:00 6:15-8:00 8:15-10:00 10:15-12:00 12:15-14:00
Time Time
Flow (T/
10

Water 2018, 10, 1765 0 14 of 16


Water 2018, 10, x 10096 14 of 16
88
80
72 2018-04-23
64
56 2018-03-26
48
40
32 2018-02-26
24
16 2018-01-29
8
Time (15min) 0 2018-01-01

Figure 6. Sixteen weeks of flow data from the DMA.

4.5

3.5

3
Standard Deviation (m³/h)

2.5

1.5

Figure
Figure 15. 15.
FlowFlow data
data
1
underburst
under burstevents
events and
and flow
flowdata
dataunder
undernormal water
normal demand.
water demand.
Figure 16a,b shows that when a burst occurs, the best detection time-period is between 2:15–4:00 0.5
0:00 4:00 8:00 12:00 16:00 20:00 23:45
Figure 16a,b shows that when a burst occurs, the best detection time-period is between 2:15–4:00 Time
at night. When the leakage rate is 10% of the average inflow per hour, the method could detect all
at night.
burstWhen
eventsthe leakage ratealarm.
is 10% of the average inflow per hour, the method
it is to could detect all
withoutFigure 14. The standard
a false Also, deviation
the largerofthe
flow data at rate,
leakage different
thetimes.
easier be detected.
burstBurst
events without a false alarm. Also, the larger the leakage rate, the easier
detection in DMA can be modeled as a binary classification, so a threshold value is neededit is to be detected.
to
Burstdistinguish
detection whether
in DMAbursts
can be modeled
have as[32].
occurred a binary
If the classification, so a threshold
threshold of abnormal probabilityvalue is needed
is set to 0.5, to
distinguish whetherrapid
almost real-time bursts have
burst occurred
detection can [32]. If the threshold of abnormal probability is set to 0.5,
be achieved.
almost real-time rapid burst detection can be achieved.
Abnormal probability of each time period in leakage event (1m³/h) Abnormal probability of each time stage in leakage event (2 m³/h)
1 1.0

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
Abnormal Probability

Abnormal Probability

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0
0 10:15-12:00 12:15-14:00 14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00 22:15-24:00 0:15-2:00 2:15-4:00 4:15-6:00 6:15-8:00 8:15-10:00 14:15-16:00 16:15-18:00 18:15-20:00 20:15-22:00 22:15-24:00 0:15-2:00 2:15-4:00 4:15-6:00 6:15-8:00 8:15-10:00 10:15-12:00 12:15-14:00
Time Time

(a) (b)
Figure 16.
Figure 16. The
The abnormal
abnormal probability
probability of different leakage
of different leakage sizes
sizes during
during different
different periods.
periods. (a) The
(a) The
abnormal probability
abnormal probabilityof ofburst
burstevents
events(1(1mm 3/h)during
3 /h) duringdifferent
differentperiods;
periods;(b)
(b)the
the abnormal
abnormal probability
probability of
of burst
burst events
events (2 m
(2 m 3 /h)/h)
(a)
3 during
during different
different periods.
periods. (b)
Figure
4. 16. The abnormal probability of different leakage sizes during different periods. (a) The
Conclusions
4. Conclusions
abnormal probability of burst events (1 m3/h) during different periods; (b) the abnormal probability
The paper proposed a new burst detection methodology and showed the effect of the actual
of burst events (2 m3/h) during different periods.
application in a DMA. According to the analysis results, it indicated that patterns of water demand and
supervised learning methods can improve the effect of burst-event detection. Here are the following
4. Conclusions
conclusions:
The This
1. paper proposed
paper studiedadifferent
new burst detection
patterns methodology
of water demand in aandDMA.showed the effect
The method of thecan
proposed actual
application in a DMA.
effectively According
remove abnormal to historical
the analysis
dataresults, it indicated
to obtain thatand
normal data, patterns of water
distinguish demand
different
and supervised
patterns learning methodsusing
of water demand can improve the considers
DTW. It fully effect of the
burst-event detection. Here
use of characteristics of DMAare the
followingflow
conclusions:
data.
2. The proposed method had a better performance on real-time burst detection using supervised
1. This paper studied different patterns of water demand in a DMA. The method proposed can
learning with patterns of water demand. It indicated that differentiating patterns of water
effectively remove abnormal historical data to obtain normal data, and distinguish different
demand could improve the accuracy of burst detection. Moreover, the performance of the burst
patterns of water
detection demand
is different using DTW.
at different It fully considers
time periods, the use of
with high accuracy at characteristics
night (2:15–4:00)ofand
DMAlow flow
data.
2. The proposed method had a better performance on real-time burst detection using supervised
learning with patterns of water demand. It indicated that differentiating patterns of water
demand could improve the accuracy of burst detection. Moreover, the performance of the burst
detection is different at different time periods, with high accuracy at night (2:15–4:00) and low
Water 2018, 10, 1765 15 of 16

accuracy around breakfast time (5:45–8:00) or during early evenings (19:45–23:00). The advantage
of burst detection based on the supervised learning method is that it can guarantee high accuracy
and a low false-alarm rate, even if the burst is small.
3. Although the methodology has a relatively low false-positive rate, some false alarms can
still occur in the detection process, particularly if the consumption of consumers is unusual.
Additional studies should be carried out in order to further decrease the false-alarm rate.

In summary, the proposed method can successfully detect bursts, with a low false-alarm rate and
high accuracy. With the development of the SCADA system for urban water distribution networks,
the proposed method shows great promise in its future application to DMAs.

Author Contributions: Conceptualization, N.Z. and D.H.; methodology, N.Z.; validation, P.H.; formal analysis,
N.Z.; investigation, Y.X.; data curation, N.Z. and J.C.; writing—original draft preparation, N.Z.; writing-review
and editing, all of the authors; project administration, D.H.
Funding: This work was funded by the National Natural Science Foundation of China (No. U1509208; 61573313;
6180333), the National Key R&D Program of China (No. 2017YFC1403801), and the Key Technology Research and
Development Program of Zhejiang Province (No. 2015C03G2010034).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Azevedo, B.B.; Saurin, T.A. Losses in water distribution systems: A complexity theory perspective.
Water Resour. Manag. 2018, 32, 2919–2936. [CrossRef]
2. Moors, J.; Scholten, L.; van der Hoek, J.P.; Besten, J.D. Automated leak localization performance without
detailed demand distribution data. Urban Water J. 2018, 15, 116–123. [CrossRef]
3. Wu, Y.; Liu, S.; Wu, X.; Liu, Y.; Guan, Y. Burst detection in district metering areas using a data driven
clustering algorithm. Water Res. 2016, 100, 28–37. [CrossRef]
4. Wu, Y.; Liu, S. A review of data-driven approaches for burst detection in water distribution systems.
Urban Water J. 2017, 14, 972–983. [CrossRef]
5. Mounce, S.R.; Day, A.J.; Wood, A.S.; Khan, A.; Widdop, P.D.; Machell, J. A neural network approach to burst
detection. Water Sci. Technol. 2002, 45, 237–246. [CrossRef]
6. Mounce, S.; Machell, J. Burst detection using hydraulic data from water distribution systems with artificial
neural networks. Urban Water J. 2006, 3, 21–31. [CrossRef]
7. Mounce, S.R.; Boxall, J.B.; Machell, J. Development and verification of an online artificial intelligence system
for detection of bursts and other abnormal flows. J. Water Resour. Plan. Manag. 2010, 136, 309–318. [CrossRef]
8. Romano, M.; Kapelan, Z.; Savić, D.A. Automated detection of pipe bursts and other events in water
distribution systems. J. Water Resour. Plan. Manag. 2012, 140, 457–467. [CrossRef]
9. Ye, G.; Fenner, R.A. Kalman filtering of hydraulic measurements for burst detection in water distribution
systems. J. Pipeline Syst. Eng. Pract. 2011, 2, 14–22. [CrossRef]
10. Ye, G.; Fenner, R.A. Weighted least squares with expectation-maximization algorithm for burst detection in
U.K. water distribution systems. J. Water Resour. Plan. Manag. 2014, 140, 417–424. [CrossRef]
11. Jung, D.; Lansey, K. Burst detection in water distribution system using the extended kalman filter.
Procedia Eng. 2014, 70, 902–906. [CrossRef]
12. Loureiro, D.; Amado, C.; Martins, A.; Vitorino, D.; Mamade, A.; Coelho, S.T. Water distribution systems
flow monitoring and anomalous event detection: A practical approach. Urban Water J. 2016, 13, 242–252.
[CrossRef]
13. Eliades, D.G.; Polycarpou, M.M. Leakage fault detection in district metered areas of water distribution
systems. J. Hydroinform. 2012, 14, 992–1005. [CrossRef]
14. Mounce, S.; Mounce, R.; Boxall, J. Novelty detection for time series data analysis in water distribution
systems using support vector machines. J. Hydroinform. 2010, 13, 672–686. [CrossRef]
15. Ma, J.; Sun, L.; Wang, H.; Zhang, Y.; Aickelin, U. Supervised anomaly detection in uncertain pseudo periodic
data streams. ACM Trans. Internet Technol. 2016, 16, 1–20. [CrossRef]
16. Berndt, D.J.; Clifford, J. Using Dynamic Time Warping to Find Patterns in Time Series. In Proceedings of the
AAAI-94 Workshop on Knowledge Discovery in Databases, Seattle, WA, USA, 31 July 1994; pp. 229–248.
Water 2018, 10, 1765 16 of 16

17. Long, X.; Fonseca, P.; Foussier, J.; Haakma, R.; Aarts, R.M. Sleep and wake classification with actigraphy and
respiratory effort using dynamic warping. IEEE J. Biomed. Health Inform. 2014, 18, 1272–1284. [CrossRef]
18. Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition.
Read. Speech Recogn. 1990, 26, 159–165.
19. Douglass, A.C.S.; Harley, J.B. Dynamic time warping temperature compensation for guided wave structural
health monitoring. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2018, 65, 851–861. [CrossRef]
20. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
21. Verikas, A.; Gelzinis, A.; Bacauskiene, M. Mining data with random forests: A survey and results of new
tests. Pattern Recogn. 2011, 44, 330–349. [CrossRef]
22. Désir, C.; Bernard, S.; Petitjean, C.; Heutte, L. One class random forests. Pattern Recogn. 2013, 46, 3490–3506.
[CrossRef]
23. Bosch, A.; Zisserman, A.; Munoz, X. Image Classification using Random Forests and Ferns. In Proceedings of
the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007;
pp. 1–8.
24. Razi, M.A.; Athappilly, K. A comparative predictive analysis of neural networks (NNs), nonlinear regression
and classification and regression tree (CART) models. Expert Syst. Appl. 2005, 29, 65–74. [CrossRef]
25. Vega, R.B.; Sánchez Valdés, C.L.; Cortiñas Abrahantes, C.J.; Castro, P.O.; González Rubio, C.D.; Castro, P.M.
Classification of dengue hemorrhagic fever using decision trees in the early phase of the disease. Rev. Cubana
Med. Trop. 2012, 64, 35–42.
26. Oliker, N.; Ostfeld, A. Network hydraulics inclusion in water quality event detection using multiple sensor
stations data. Water Res. 2015, 80, 47–58. [CrossRef]
27. Okeya, I.; Kapelan, Z.; Hutton, C.; Naga, D. Online burst detection in a water distribution system using the
Kalman filter and hydraulic modelling. Procedia Eng. 2014, 89, 418–427. [CrossRef]
28. Choi, D.Y.; Kim, S.W.; Choi, M.A.; Zong, W.G. Adaptive Kalman filter based on adjustable sampling interval
in burst detection for water distribution system. Water 2016, 8, 142. [CrossRef]
29. Jung, D.; Kang, D.; Liu, J.; Lansey, K. Improving the rapidity of responses to pipe burst in water distribution
systems: A comparison of statistical process control methods. J. Hydroinform. 2015, 17, 307. [CrossRef]
30. Buchberger, S.G.; Nadimpalli, G. Leak estimation in water distribution systems by statistical analysis of flow
readings. J. Water Resour. Plan. Manag. 2004, 130, 321–329. [CrossRef]
31. Alkasseh, J.M.A.; Adlan, M.N.; Aziz, H.A.; Hanif, A.B.M. Applying minimum night flow to estimate water
loss using statistical modeling: A case study in Kinta Valley, Malaysia. Water Resour. Manag. 2013, 27,
1439–1455. [CrossRef]
32. Hutton, C.; Kapelan, Z. Real-time burst detection in water distribution systems using a Bayesian demand
forecasting methodology. Procedia Eng. 2015, 119, 13–18. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

You might also like