A Framework For Statistical Analysis of Water Pipeline Field Performance Data
2 authors, including:
Hao Xu
Virginia Polytechnic Institute and State University
All content following this page was uploaded by Hao Xu on 27 February 2020.
ABSTRACT
Statistical models are prevalent in the failure and performance analysis of water pipelines. This
paper presents a comprehensive analytical framework for statistical analysis of water pipeline field
performance data. The proposed framework incorporates three objectives (i.e. failure, performance
and risk) and employs both exploratory and predictive statistical modelling approaches. The
procedures to implement this methodology are outlined, and preliminary analyses are conducted
on the datasets of two water utilities. As more utilities across the United States participate in the
PIPEiD project, it should become possible to conduct research at a regional or even global level
with a high level of confidence. The preliminary analysis of the case studies discovers or verifies
several local-level findings.
1. INTRODUCTION
Water pipeline infrastructure is an indispensable component of the whole family of infrastructure
assets. However, due to deterioration, the pipeline infrastructure is becoming insufficient
to satisfy society's needs. Much research has been conducted to evaluate and improve
promising innovative technologies that can reduce costs and improve the effectiveness of
operation, maintenance, and replacement of aging and failing drinking water systems (Clair and
Sinha 2012). Hundreds of statistical models have been developed in the course of this research.
Many of these models focus on pipe failure prediction using regression (Yamijala et al. 2009) or
classification (Chen et al. 2017). Although these models have proven effective in some cases,
they are mostly limited to specific circumstances and fail to generalize well. To address this
issue, both comprehensive datasets and customized advanced statistical models are needed. By
examining, customizing and systematically integrating the available statistical models, water
pipeline performance prediction can be conducted on a larger scale (e.g. regional and global levels)
without compromising accuracy.
PIPEiD has been created by the SWIM Center at Virginia Tech to help understand the critical
drinking water pipeline infrastructure. It is envisioned as a “National Database Platform for
Advanced Asset Management” that provides secure access to the aggregated data and to models
and tools that enable the synthesis, analysis, query, and visualization of the data for decision
support (Sinha and Sears 2017). With the help of PIPEiD, it becomes possible to integrate
previous models and to develop a systematic and comprehensive approach to the statistical
modeling of drinking water pipeline performance.
This paper presents a comprehensive framework that incorporates different modeling techniques
to achieve a three-objective statistical analysis covering pipe failure, performance and
risk. In the preliminary analysis section, the emphasis is on implementing the exploratory
approach, which is still helpful for data mining and knowledge discovery.
Pipelines 2019: Multidisciplinary Topics, Utility Engineering, and Surveying
https://doi.org/10.1061/9780784482506.019
2. METHODOLOGY
The comprehensive framework for statistical analysis of water pipeline data is summarized in
Figure 1 below. It consists of three major objectives, i.e. pipe failure, performance and risk. Pipe
failure analysis is the bridge that links performance and risk. Pipe failure records are the most
readily accessible performance data, and researchers usually start by analyzing them when data
sources are limited (Yamijala et al. 2009; Chen et al. 2017). Pipe performance and risk are more
complicated to analyze because the indices or criteria of performance and risk must first be
defined. Previous research has developed and validated a data standard and a definition of
performance for water pipes (Clair 2013).
[Figure 1. Framework diagram; recoverable box labels: Performance Index/Rating, Performance
Curve, Likelihood of Failure, Consequence of Failure]
The procedures needed to implement this framework can be summarized in five steps.
1. Correlation & Cluster Analysis: Understand the influences of different parameters which
affect the performance.
2. Failure Analysis: Predict failure numbers or failure/survival probability of water pipes
based on historical pipe break records.
3. Performance Analysis: Develop robust performance index/rating.
4. Performance Curve: Develop robust performance/condition curve to predict how pipes
perform with time.
5. Risk Analysis: Quantify the risk of pipe failures by incorporating probability of failure
(based on failure analysis in Step 2) and consequence of failure (based on geospatial data
of utilities).
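Step 5 can be illustrated with a minimal sketch. The paper does not give a formula, so the code below assumes the common form in which risk is the product of probability of failure and a consequence score. The function name `risk_score`, the pipe IDs and all numeric values are hypothetical.

```python
def risk_score(prob_failure: float, consequence: float) -> float:
    """Risk as probability of failure times consequence of failure (assumed form)."""
    if not 0.0 <= prob_failure <= 1.0:
        raise ValueError("prob_failure must be in [0, 1]")
    if consequence < 0.0:
        raise ValueError("consequence must be non-negative")
    return prob_failure * consequence


# Hypothetical pipes: (probability of failure from Step 2, consequence score from GIS data).
pipes = {"P-001": (0.10, 8.0), "P-002": (0.40, 1.5), "P-003": (0.25, 6.0)}

# Rank pipes from highest to lowest risk.
ranking = sorted(pipes, key=lambda pid: risk_score(*pipes[pid]), reverse=True)
print(ranking)
```

Under this assumed form, a pipe with a moderate failure probability but a high consequence can outrank a pipe that is more likely to fail but whose failure matters less.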
3. PRELIMINARY ANALYSIS
The preliminary analyses include exploratory data analysis and some parts of failure analysis (e.g.
survival analysis). The exploratory parts (Steps 1 and 2) of the proposed methodology are
implemented and elaborated through two case studies in this section. The proposed framework and
methodology are applied to two water utilities that participated in the PIPEiD project. Because
the utilities requested anonymity, they are denoted Utility A and Utility B. Due to limited
space, only some representative results are shown in each part.
Both utilities provide two GIS layers: one with information on all pipes and one with historical
work orders (repair/failure records). Using the common attribute PIPEID, these two layers
can be joined into one file. All subsequent analyses are built upon this combined file.
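The join on PIPEID can be sketched as follows. This is only an illustration using plain Python records. Apart from `PIPEID`, the field names (`material`, `diameter_in`, `install_year`, `repair_year`, `n_repairs`) and all values are hypothetical and are not taken from the utilities' schemas.

```python
from collections import Counter

# Hypothetical extracts of the two GIS attribute tables.
pipes = [
    {"PIPEID": 1, "material": "CI", "diameter_in": 6, "install_year": 1950},
    {"PIPEID": 2, "material": "DI", "diameter_in": 16, "install_year": 1985},
    {"PIPEID": 3, "material": "PVC", "diameter_in": 8, "install_year": 2001},
]
work_orders = [
    {"PIPEID": 1, "repair_year": 2007},
    {"PIPEID": 1, "repair_year": 2012},
    {"PIPEID": 3, "repair_year": 2015},
]


def join_layers(pipes, work_orders):
    """Left-join work orders onto pipes by PIPEID, adding a repair count per pipe."""
    repairs = Counter(order["PIPEID"] for order in work_orders)
    return [{**pipe, "n_repairs": repairs.get(pipe["PIPEID"], 0)} for pipe in pipes]


combined = join_layers(pipes, work_orders)
for row in combined:
    print(row)
```

A left join keeps pipes with no recorded repairs, which matters later because those pipes are the censored observations in a survival analysis.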
Utility A is a medium-sized water utility with 51,484 pipe records (distribution and transmission
pipelines) in its current database. Service pipelines are outside the scope of this study and are
excluded. The earliest installation dates back to 1885. Work orders (repair records) have been
collected since 2005. Other information provided by the utility includes road types, pressure,
friction factors, defect types, cause of failure, etc. The information of major interest is summarized
in Table 1 below.
Utility B is a large utility with 173,106 pipe records. This utility does not record pipe types,
but according to common practice pipes with diameters equal to or larger than 16 inches are treated
as transmission pipes and those with diameters less than 16 inches (typically 6, 8 or 10 inches) are
considered distribution pipes. The earliest installation dates back to 1915, and work orders (repair
records) have been collected since 2006. Additional information provided by the utility includes
pressure zone, lining date, encasement, cathodic protection and so on. Table 2 summarizes some key
attributes stratified by pipe material.
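The diameter rule used for Utility B is simple enough to state as code. The sketch below only encodes the 16-inch threshold given in the text. The function and constant names are hypothetical.

```python
TRANSMISSION_MIN_DIAMETER_IN = 16  # threshold stated in the text, in inches


def pipe_type(diameter_in: float) -> str:
    """Classify a pipe as transmission or distribution from its diameter."""
    if diameter_in <= 0:
        raise ValueError("diameter must be positive")
    if diameter_in >= TRANSMISSION_MIN_DIAMETER_IN:
        return "transmission"
    return "distribution"


for d in (6, 8, 10, 16, 24):
    print(d, pipe_type(d))
```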
The distributions of key parameters related to pipe performance are useful for exploring underlying
patterns and knowledge. For example, by stratifying the data in Utility A by pipe type, we can
see a clear distinction between the diameters of distribution pipes and transmission pipes, as shown
in Figure 2. This confirms the common practice adopted for Utility B, in which pipes with diameters
equal to or larger than 16 inches are treated as transmission pipes and those with diameters less
than 16 inches are considered distribution pipes.
Friction factor is an important indicator of internal corrosion. As the internal surface corrodes,
the friction factor decreases. In Figure 3, the distribution of friction factor stratified by pipe age
group (“0~29”, “30~59”, “60~89”, “>90”) reveals a clear relationship: older pipes tend to
have a rougher internal surface.
Using the full dataset of Utility A, a correlation analysis is conducted on several
numeric variables (pipe length, diameter, vintage, pressure, friction and number of repairs). As
shown in Figure 4, the strongest correlation occurs between pipe age and friction factor.
Their correlation coefficient is -0.72, indicating a strong negative correlation. As pipes get
older, the friction factor becomes lower, which indicates a rougher
internal surface. The other correlations are not particularly strong.
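As an illustration of the coefficient reported above, the sketch below computes a Pearson correlation from scratch. The paper does not state which correlation coefficient was used, so Pearson is an assumption. The age and friction values are made up and do not reproduce the -0.72 found for Utility A.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical pipe ages (years) and friction factors.
age = [5, 20, 35, 50, 65, 80, 95]
friction = [140, 130, 118, 110, 100, 95, 85]
r = pearson(age, friction)
print(round(r, 3))
```

A value near -1 means friction factor falls almost linearly with age. A value such as -0.72 indicates the same direction with more scatter.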
Figure 5 gives a clearer illustration via correlograms. The data are stratified into 4 age groups:
“0~29”, “30~59”, “60~89”, “>90”. The size and shade of each circle represent the strength of
each relationship, while the color represents its direction, either negative or positive.
Comparing the groups, we can conclude that in the initial stage (e.g. the first 30 years) a pipe’s
internal surface gets significantly rougher as pipe age increases, while afterwards the friction
factor is not influenced as strongly by pipe age. This finding goes one step further than the
previous parameter distribution analysis because it shows not only the trend but also how the
strength of the trend changes with age.
Take Utility B as an example. Five sub-types of pipes are selected for investigation, i.e. CI Lined,
CI Unlined, Asphaltic DI Lined, Asphaltic DI Unlined, and PCCP Unlined. Cluster analysis is
conducted for each sub-type. One significant problem with K-Means clustering is
that the results are not stable and differ from one run to another. To account for this issue,
K-Means clustering is run 25 times and the result with the least SSE (Sum of Squared
Errors) is selected as the final result.
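The restart strategy can be sketched in plain Python. This is not the authors' implementation: it is a minimal Lloyd's-algorithm K-Means with random initialization, repeated 25 times, keeping the run with the smallest SSE. The function names, the seed and the sample points are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run from random initial centers; returns (centers, labels, sse)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def kmeans_best_of(points, k, n_runs=25, seed=0):
    """Run K-Means n_runs times and keep the run with the least SSE."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[2])


# Hypothetical standardized (age, diameter) pairs forming two obvious groups.
points = [(-1.1, -0.9), (-0.9, -1.0), (-1.0, -1.2), (1.0, 0.9), (1.2, 1.1), (0.9, 1.0)]
centers, labels, sse = kmeans_best_of(points, k=2)
print(labels, round(sse, 4))
```

Selecting by SSE is only meaningful for a fixed number of clusters, since SSE always tends to fall as the number of clusters grows.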
The analysis results for CI Lined pipes are shown below. It can be seen from Figure 6 that the
failed pipes (red points) are distributed approximately evenly among all pipes (black points), but
some data points can be considered unusual because they indicate failures in pipes less
than 50 years old. Such premature failures can be spotted and investigated.
Figure 7 shows the K-Means clustering results for different designated numbers of clusters. It
should be noted that the values of pipe age and diameter are scaled and standardized before
clustering so that the two axes (age and diameter) are comparable.
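The paper does not specify the scaling used. One common choice, assumed here, is z-score standardization, which centers each variable at zero and rescales it to unit standard deviation. The sample values are hypothetical, and the population standard deviation is used for simplicity.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]


ages = [12, 35, 58, 81, 104]    # hypothetical pipe ages, years
diameters = [6, 8, 10, 16, 24]  # hypothetical diameters, inches
z_age = standardize(ages)
z_diam = standardize(diameters)
print([round(z, 2) for z in z_age])
print([round(z, 2) for z in z_diam])
```

Without this step, age (spanning roughly a century) would dominate the Euclidean distances over diameter (spanning a few tens of inches).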
The results of K-Means clustering depend not only on the choice of initial means but also
on the number of clusters. There are several methods to determine the optimal number
of clusters, such as the Elbow method, the Average Silhouette method and the Gap Statistic
method. Here the Average Silhouette method is used, and the result is shown in Figure 8. Both 2
and 3 clusters seem to be acceptable.
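For reference, the average silhouette width can be computed as in the sketch below. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster. The point's silhouette is `(b - a) / max(a, b)`. The points and labelings are hypothetical and the function name is illustrative.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        # Mean distance to the other members of the same cluster (self distance is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical standardized (age, diameter) pairs.
points = [(-1.1, -0.9), (-0.9, -1.0), (-1.0, -1.2), (1.0, 0.9), (1.2, 1.1), (0.9, 1.0)]
s_good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])  # labels follow the two groups
s_bad = mean_silhouette(points, [0, 1, 0, 1, 0, 1])   # labels cut across the groups
print(round(s_good, 3), round(s_bad, 3))
```

To choose the number of clusters, this score would be computed for the clustering obtained at each candidate number, and the numbers with the highest averages preferred.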
used distributions are Weibull distribution, exponential distribution, log-normal distribution and
log-logistic distribution. When the pipe dataset is partitioned into different groups, the influence
of a grouping covariate on the survival probability can be illustrated via a categorized Kaplan-
Meier plot.
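A Kaplan-Meier curve can be sketched as follows. At each age where failures are observed, the survival probability is multiplied by one minus the fraction of at-risk pipes that failed. The data are hypothetical. This sketch handles right censoring only and does not handle left truncation, which would need separate treatment when repair records begin long after installation.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one observed failure.

    durations: age at failure, or age at end of observation if censored.
    events: True if a failure was observed, False if the pipe was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        failures = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failed and censored pipes both leave the risk set
    return curve


# Hypothetical pipe ages (years) and failure indicators.
durations = [10, 20, 20, 30, 40]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 3))
```

A categorized plot is obtained by calling the estimator once per group, for example once per pipe material, and drawing the resulting step curves together.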
In the dataset of Utility A, the major material types include CI, DI, CU, GS, HDPE and PVC. It
should also be noted that CI pipes installed prior to 1930 are separately labeled ‘CIPre1930’
in the dataset of Utility A. In this analysis, we adopt the utility's representation and notation.
Based on the stratification by pipe material, Kaplan-Meier plots for the different material types
are created separately, as shown in Figure 9 (a). CI pipes installed prior to 1930 are the most
durable among the materials, while HDPE pipes are least likely to survive. Parametric
survival curves by subgroup are also developed. To keep the plot legible, only the
survival models fitted with the Weibull distribution are shown in Figure 9 (b).
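For orientation, the Weibull survival function has the form S(t) = exp(-(t / scale)^shape). The sketch below evaluates it for made-up parameters. The shape and scale values are hypothetical and are not the fitted values behind Figure 9 (b).

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability of surviving beyond age t under a Weibull model."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters: shape > 1 means the failure rate rises with age.
shape, scale = 2.0, 80.0
for age in (0, 25, 50, 75, 100):
    print(age, round(weibull_survival(age, shape, scale), 3))
```

With shape equal to 1 the model reduces to the exponential distribution, which has a constant failure rate.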
4. CONCLUSION
This paper presents a comprehensive analytical framework for statistical analysis of water pipeline
field performance data. The proposed framework incorporates three objectives (i.e. failure,
performance and risk) and employs both exploratory and predictive statistical modelling approaches.
Several findings are discovered or verified in the preliminary analysis. For utilities
that do not record pipe types, pipes with diameters equal to or larger than 16 inches can be treated
as transmission pipes and those with diameters less than 16 inches can be considered distribution
pipes. In the initial stage (e.g. the first 30 years) a pipe’s internal surface gets significantly
rougher as pipe age increases, while afterwards the friction factor is not influenced as strongly
by pipe age. This finding concerns not only the trend but also how the strength of the trend
changes with age. Premature failures can be detected and investigated. Both 2 and 3 clusters are
appropriate for the cluster analysis of CI Lined pipes in Utility B. According to the survival
analysis of Utility A, CI pipes installed prior to 1930 are
the most durable among the materials, while HDPE pipes are least likely to survive.
In the survival analysis, most of the recorded breaks in HDPE pipes occurred in the early stage,
while most of the breaks in CI pipes occurred in the late stage. There are few recorded breaks for
HDPE pipes after the age of 75 years. This is the main reason why the survival curve of HDPE
pipes exhibits a steep downward trend.
REFERENCES
Chen, T. Y.-J., Beekman, J. A., and Guikema, S. D. (2017). "Drinking water distribution systems
asset management: Statistical modelling of pipe breaks." Pipelines 2017, 173-186.
Clair, A. M., and Sinha, S. (2012). "State-of-the-technology review on water pipe condition,
deterioration and failure rate prediction models." Urban Water Journal, 9(2), 85-112.