A Framework For Statistical Analysis of Water Pipeline Field Performance Data

Conference Paper · July 2019

DOI: 10.1061/9780784482506.019

Pipelines 2019: Multidisciplinary Topics, Utility Engineering, and Surveying
https://doi.org/10.1061/9780784482506.019
A Framework for Statistical Analysis of Water Pipeline Field Performance Data

Hao Xu¹ and Sunil K. Sinha²
¹Graduate Research Assistant, Dept. of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA, 24061. E-mail: haoxu@vt.edu
²Professor and Director, Sustainable Water Infrastructure Management (SWIM) Center, Dept. of Civil and Environmental Engineering, Virginia Tech, 200 Patton Hall, Blacksburg, VA, 24061. E-mail: ssinha@vt.edu
ABSTRACT
Statistical models are widely used in the failure and performance analysis of water pipelines. This paper presents a comprehensive analytical framework for the statistical analysis of water pipeline field performance data. The proposed framework addresses three objectives (i.e., failure, performance, and risk) and employs both exploratory and predictive statistical modeling approaches. The procedures for implementing this methodology are outlined, and preliminary analyses are conducted on the datasets of two water utilities. As more utilities across the United States participate in the PIPEiD project, it should become possible to conduct research at a regional or even global level with a high level of confidence. The preliminary results of the case studies reveal or confirm several local-level findings.
1. INTRODUCTION
Water pipeline infrastructure is an indispensable component of the broader family of infrastructure assets. However, because of deterioration, pipeline infrastructure is becoming less able to meet society's needs. Much research has been conducted to evaluate and improve promising innovative technologies that can reduce costs and improve the effectiveness of operation, maintenance, and replacement of aging and failing drinking water systems (Clair and Sinha 2012), and hundreds of statistical models have been developed as part of this research. Many of these models focus on pipe failure prediction using regression (Yamijala et al. 2009) or classification (Chen et al. 2017). Although these models have proved effective in some cases, they are mostly limited to specific circumstances and do not generalize well. Addressing this issue requires both comprehensive datasets and customized, advanced statistical models. By examining, customizing, and systematically integrating the available statistical models, water pipeline performance prediction can be conducted on a larger scale (e.g., at regional and global levels) without compromising accuracy.
PIPEiD was created by the SWIM Center at Virginia Tech to improve understanding of critical drinking water pipeline infrastructure. It is envisioned as a “National Database Platform for Advanced Asset Management” that provides secure access to aggregated data, as well as to models and tools that enable the synthesis, analysis, query, and visualization of the data for decision support (Sinha and Sears 2017). PIPEiD makes it possible to integrate previous models and to develop a systematic and comprehensive approach to the statistical modeling of drinking water pipeline performance.
This paper presents a comprehensive framework that incorporates different modeling techniques to achieve a three-objective statistical analysis covering pipe failure, performance, and risk. The preliminary analytical results focus on the exploratory approach, which remains useful for data mining and knowledge discovery.
2. METHODOLOGY
The comprehensive framework for statistical analysis of water pipeline data is summarized in Figure 1. It consists of three major objectives: pipe failure, performance, and risk. Pipe failure analysis is the bridge that links performance and risk. Pipe failure records are the most readily accessible performance data, and researchers usually start with them when data sources are limited (Yamijala et al. 2009; Chen et al. 2017). Pipe performance and risk are more complicated to analyze because indices or criteria for performance and risk must first be defined. Previous research has developed and validated a data standard and a definition of performance for water pipes (Clair 2013).
Figure 1. Framework of Statistical Analysis (diagram components: Comprehensive Statistical Model; Performance (deterioration); Failure Analysis; Risk; Performance Index/Rating; Performance Curve; Likelihood of Failure; Consequence of Failure)
The procedures needed to implement this framework can be summarized in five steps:
1. Correlation & Cluster Analysis: Understand how different parameters influence performance.
2. Failure Analysis: Predict failure numbers or failure/survival probabilities of water pipes based on historical pipe break records.
3. Performance Analysis: Develop a robust performance index/rating.
4. Performance Curve: Develop a robust performance/condition curve to predict how pipes perform over time.
5. Risk Analysis: Quantify the risk of pipe failures by incorporating probability of failure
(based on failure analysis in Step 2) and consequence of failure (based on geospatial data
of utilities).
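Step 5 does not specify how probability and consequence are combined. A common convention, assumed here rather than taken from the paper, is to score risk as the product of the probability of failure (PoF) and a consequence-of-failure (CoF) score, then rank pipes by that score. The class, function, pipe IDs, and scores in this sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipeRisk:
    pipe_id: str  # hypothetical pipe identifier
    pof: float    # probability of failure (0..1), e.g. from the Step 2 failure analysis
    cof: float    # consequence-of-failure score, e.g. derived from geospatial data (assumed scale)

    @property
    def risk(self) -> float:
        # Assumed convention: risk = likelihood x consequence
        return self.pof * self.cof


def rank_by_risk(pipes: List[PipeRisk]) -> List[PipeRisk]:
    """Return pipes ordered from highest to lowest risk score."""
    return sorted(pipes, key=lambda p: p.risk, reverse=True)


if __name__ == "__main__":
    example = [
        PipeRisk("P-001", pof=0.10, cof=8.0),
        PipeRisk("P-002", pof=0.40, cof=1.5),
        PipeRisk("P-003", pof=0.05, cof=9.0),
    ]
    for pipe in rank_by_risk(example):
        print(pipe.pipe_id, round(pipe.risk, 2))
```

Other combinations (for example, risk matrices with ordinal likelihood and consequence classes) are equally common; the product form is used here only because it is the simplest to show.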
3. PRELIMINARY ANALYSIS
The preliminary analyses include exploratory data analysis and part of the failure analysis (e.g., survival analysis). In this section, the exploratory parts of the proposed methodology (Steps 1 and 2) are implemented and elaborated through case studies. The proposed framework and methodology are applied to two water utilities that participated in the PIPEiD project. Because the utilities requested anonymity, they are referred to as Utility A and Utility B. Owing to space limitations, only representative results are shown in each part.
3.1 Data Summary
Both utilities provide two GIS layers: one with information on all pipes and one with historical work orders (repair/failure records). Using the common attribute PIPEID, the two layers can be joined into a single file, on which all subsequent analyses are built.
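One way to perform that join, sketched under assumptions: each layer is exported as a list of attribute dictionaries, PIPEID is the shared key, and every field name other than PIPEID (including the derived n_repairs count) is hypothetical. The sketch performs a left join that keeps every pipe, including the many pipes with no work order, and counts the matching repair records per pipe; work orders whose PIPEID matches no pipe are dropped.

```python
from collections import Counter
from typing import Dict, List


def join_pipes_with_work_orders(
    pipes: List[Dict], work_orders: List[Dict], key: str = "PIPEID"
) -> List[Dict]:
    """Left-join the pipe layer with the work-order layer on a shared key.

    Each output row is a copy of a pipe record plus an 'n_repairs' field
    (hypothetical name) counting the work orders that reference that pipe.
    """
    repairs_per_pipe = Counter(wo[key] for wo in work_orders)
    joined = []
    for pipe in pipes:
        row = dict(pipe)  # copy so the input layer is not modified
        row["n_repairs"] = repairs_per_pipe.get(pipe[key], 0)
        joined.append(row)
    return joined


if __name__ == "__main__":
    pipes = [
        {"PIPEID": 1, "MATERIAL": "CI", "DIAMETER": 6},
        {"PIPEID": 2, "MATERIAL": "DI", "DIAMETER": 8},
    ]
    work_orders = [{"PIPEID": 1, "YEAR": 2007}, {"PIPEID": 1, "YEAR": 2012}]
    for row in join_pipes_with_work_orders(pipes, work_orders):
        print(row)
```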
Utility A is a medium-sized water utility with 51,484 pipe records (distribution and transmission pipelines) in its current database. Service pipelines are outside the scope of this study and are therefore excluded. The earliest installation dates back to 1885, and work orders (repair records) have been collected since 2005. Other information provided by the utility includes road type, pressure, friction factor, defect type, cause of failure, etc. The information of primary interest is summarized in Table 1.
Table 1. Water pipe information provided by Utility A

            Mileage                       Average Diameter (inch)       Number of Repair Records
Material    Distribution  Transmission    Distribution  Transmission    Distribution  Transmission
Asbestos    2.2           NA              7.3           NA              0             0
Brass       0.0           NA              1.9           NA              2             0
CI          301.9         1.4             7.6           25.9            1878          2
CIPre1930   94.2          NA              7.5           NA              249           0
CU          0.8           NA              1.9           NA              12            0
DI          317.8         46.4            8.8           26.6            354           14
DIPVC       0.1           NA              12.0          NA              0             0
FPVC        0.2           NA              3.0           NA              0             0
GS          77.2          NA              2.0           NA              1217          0
HDPE        6.7           0.0             5.8           12.0            52            0
PVC         199.7         0.0             6.6           24.0            335           0
RCP         0.0           7.6             15.7          32.6            0             0
SP          0.3           NA              2.9           NA              5             0
Unknown     768.4         0.9             6.1           27.2            255           0
All Pipes   1769.6        56.4            6.9           27.6            4359          16
Utility B is a large utility with 173,106 pipe records. This utility does not record pipe types, but following common practice, pipes with diameters equal to or larger than 16 inches are treated as transmission pipes, and those with diameters less than 16 inches (typically 6, 8, or 10 inches) are considered distribution pipes. The earliest installation dates back to 1915, and work orders (repair records) have been collected since 2006. Additional information provided by the utility includes pressure zone, lining date, encasement, cathodic protection, and so on. Table 2 summarizes key attributes stratified by pipe material.
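The 16-inch rule can be applied directly to derive a pipe-type attribute for a utility, such as Utility B, that does not record one. A minimal sketch; the function and constant names are hypothetical.

```python
TRANSMISSION_MIN_DIAMETER_IN = 16.0  # threshold stated in the text, in inches


def pipe_type_from_diameter(diameter_in: float) -> str:
    """Classify a pipe as 'transmission' (>= 16 in) or 'distribution' (< 16 in)."""
    if diameter_in < 0:
        raise ValueError("diameter must be non-negative")
    if diameter_in >= TRANSMISSION_MIN_DIAMETER_IN:
        return "transmission"
    return "distribution"


if __name__ == "__main__":
    for d in (6, 8, 10, 16, 24):
        print(d, pipe_type_from_diameter(d))
```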
3.2 Parameter Distribution Analysis
The distributions of key parameters related to pipe performance are useful for uncovering underlying patterns. For example, stratifying the data of Utility A by pipe type reveals a clear distinction between the diameters of distribution pipes and transmission pipes, as shown in Figure 2. This finding supports the common practice, adopted for Utility B, of treating pipes with diameters equal to or larger than 16 inches as transmission pipes and those with diameters less than 16 inches as distribution pipes.
Table 2. Water pipe information provided by Utility B

                 Mileage               Average Diameter (inch)   Number of Repair Records
Material         Lined     Unlined     Lined     Unlined         All      Lined    Unlined
CI               935.5     1639.1      7.8       8.4             15615    5445     10170
DI(Asphaltic)    238.5     2520.4      8.5       8.6             3599     1185     2414
DI(Zinc-Coated)  1.1       4.3         17.0      10.6            12       0        12
DI(PE-Coated)    0.0       0.7         8.0       12.9            5        2        3
PCCP             5.8       344.2       64.8      28.1            183      0        183
HDPE             0.0       0.1         NA        8.0             0        0        0
PVC              3.0       4.4         7.9       8.2             5        3        2
Steel            13.4      26.1        65.8      32.4            5        1        4
Asbestos         0.1       2.9         9.5       8.7             6        0        6
Copper           0.0       1.3         NA        1.8             5        0        5
Unknown          1.4       16.9        15.1      12.4            18       3        15
All Pipes        1198.8    4560.4      8.3       9.2             19453    6639     12814
Figure 2. Distribution of Diameters ((a) Distribution Pipes; (b) Transmission Pipes)
Friction factor is an important indicator of internal corrosion: as the internal surface corrodes, the friction factor decreases. In Figure 3, the distribution of friction factor stratified by pipe age group (“0~29”, “30~59”, “60~89”, “>90” years) shows clearly that older pipes tend to have rougher internal surfaces.
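The stratification behind Figure 3 can be sketched as binning pipe age into the four groups and summarizing the friction factor within each group. This is an illustration under assumptions: ages are whole years, ages of 90 and above go into the last group (labeled “>90” in the text), the median is used as the summary statistic, and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# Age groups from the text; ages of 90 and above are assumed to belong to the last group.
AGE_GROUPS = [(0, 29, "0~29"), (30, 59, "30~59"), (60, 89, "60~89")]
OLDEST_GROUP = ">90"


def age_group(age: int) -> str:
    """Map a pipe age in whole years to its age-group label."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    if age >= 90:
        return OLDEST_GROUP
    raise ValueError(f"invalid pipe age: {age}")


def median_friction_by_group(records: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """Median friction factor per age group; records are (age, friction factor) pairs."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for age, friction in records:
        groups[age_group(age)].append(friction)
    return {label: median(values) for label, values in groups.items()}


if __name__ == "__main__":
    hypothetical = [(5, 130.0), (20, 120.0), (45, 100.0), (50, 95.0), (70, 85.0), (95, 80.0)]
    print(median_friction_by_group(hypothetical))
```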
Figure 3. Distribution of Friction Factor stratified by age groups ((a) 0~29 years; (b) 30~59 years; (c) 60~89 years; (d) more than 90 years)
3.3 Correlation Analysis
Using the full dataset of Utility A, a correlation analysis is conducted on several numeric variables (pipe length, diameter, vintage, pressure, friction factor, and number of repairs). As shown in Figure 4, the strongest correlation is between pipe age and friction factor. Their correlation coefficient is -0.72, indicating a strong negative correlation: as pipes get older, the friction factor decreases, which indicates a rougher internal surface. The other correlations are relatively weak.
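The paper does not name the correlation coefficient; assuming it is the Pearson product-moment coefficient, a dependency-free sketch is:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Because pipe age equals a reference year minus vintage, a Pearson coefficient computed on vintage has the same magnitude and the opposite sign as one computed on age, so either variable can represent pipe age in the correlation matrix.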
Figure 5 illustrates this more clearly with correlograms for four age groups: “0~29”, “30~59”, “60~89”, and “>90” years. The size and shade of each circle represent the strength of each relationship, while the color represents its direction (negative or positive). Comparing the groups shows that in the initial stage (e.g., the first 30 years) a pipe's internal surface becomes significantly rougher as the pipe ages, whereas afterwards pipe age has a much weaker influence on the friction factor. This finding goes one step beyond the parameter distribution analysis, because it reveals not only the trends themselves but also how their strength changes with age.
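The group-wise comparison amounts to recomputing the age-friction correlation within each age group. A minimal sketch, assuming Pearson correlation, the same four age groups (with 90 years and above in the last group), and a hypothetical minimum group size below which no coefficient is reported:

```python
import math
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

EDGES = (30, 60, 90)  # group boundaries in years (assumed: 90 and above form the last group)
LABELS = ("0~29", "30~59", "60~89", ">90")


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either sample is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else math.nan


def correlation_by_age_group(
    records: Sequence[Tuple[float, float]], min_size: int = 3
) -> Dict[str, float]:
    """Age-vs-friction correlation computed separately within each age group.

    records: (age in years, friction factor) pairs; groups smaller than min_size are skipped.
    """
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for age, friction in records:
        # bisect_right counts how many boundaries the age has reached
        groups[LABELS[bisect_right(EDGES, age)]].append((age, friction))
    result = {}
    for label, pairs in groups.items():
        if len(pairs) >= min_size:
            ages, frictions = zip(*pairs)
            result[label] = pearson_r(ages, frictions)
    return result
```

A weaker coefficient in the older groups than in the youngest one would correspond to the pattern described above.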
Figure 4. Correlation chart for water pipes
3.4 Cluster Analysis

Cluster analysis is widely used in water pipe failure and performance analysis, but mostly for spatial or temporal purposes; only a few studies in the literature use it otherwise. For example, Farmani et al. (2017) used K-Means clustering to partition training data into clusters of pipe groups with similar diameters and ages, and then developed predictive models based on those clusters.
Utility B is used as an example. Five pipe sub-types are selected for investigation: CI Lined, CI Unlined, Asphaltic DI Lined, Asphaltic DI Unlined, and PCCP Unlined, and clustering analysis is conducted separately for each sub-type. One significant problem with K-Means clustering is that its results are not stable and can differ from one run to another. To address this issue, K-Means clustering is run 25 times, and the result with the smallest SSE (sum of squared errors) is selected as the final result.
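The restart strategy can be sketched with a plain implementation of Lloyd's K-Means algorithm. This is an assumption: the paper does not say which K-Means variant or software was used. Each run starts from k randomly chosen data points as initial centers, and the run with the smallest SSE is kept; the seed parameter exists only to make the sketch reproducible.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points: Sequence[Point], k: int, rng: random.Random, max_iter: int = 100):
    """One run of Lloyd's algorithm; returns (labels, centers, SSE)."""
    # Initialize with k data points chosen at random.
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # SSE: squared distance of each point to the center of its cluster.
    sse = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


def kmeans_best_of(points: Sequence[Point], k: int, n_runs: int = 25, seed: int = 0):
    """Repeat K-Means n_runs times from different random starts; keep the lowest-SSE run."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[2])
```

Library implementations expose the same idea as a setting; for example, scikit-learn's KMeans has an n_init parameter for the number of random initializations.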
The results for CI Lined pipes are presented here. Figure 6 shows that failed pipes (red points) are distributed approximately evenly among all pipes (black points), but some points are unusual because they represent failures in pipes less than 50 years old. Such premature failures can be identified and investigated.
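Flagging such premature failures is a simple filter once pipe age and repair counts are in the same record. A sketch assuming hypothetical field names 'age' (current pipe age in years) and 'n_repairs', with the 50-year threshold from the text. Since a pipe's current age bounds its age at any recorded failure, a failed pipe younger than 50 necessarily failed before age 50.

```python
from typing import Dict, List

PREMATURE_AGE_YEARS = 50  # threshold used in the text for CI Lined pipes


def premature_failures(pipes: List[Dict], age_limit: float = PREMATURE_AGE_YEARS) -> List[Dict]:
    """Return pipes with at least one recorded failure whose age is below age_limit.

    Field names 'age' and 'n_repairs' are hypothetical.
    """
    return [p for p in pipes if p["n_repairs"] > 0 and p["age"] < age_limit]


if __name__ == "__main__":
    pipes = [
        {"PIPEID": 1, "age": 30, "n_repairs": 1},
        {"PIPEID": 2, "age": 80, "n_repairs": 2},
        {"PIPEID": 3, "age": 20, "n_repairs": 0},
    ]
    print([p["PIPEID"] for p in premature_failures(pipes)])
```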
Figure 7 shows the K-Means clustering results for different specified numbers of clusters. Note that pipe ages and diameters are scaled and standardized before clustering so that the two axes (age and diameter) are comparable.
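The text does not state which scaling was used; z-score standardization (subtract the mean, divide by the standard deviation) is one common choice and is assumed in this sketch.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


def standardize_age_diameter(pairs: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Standardize the age and diameter columns independently."""
    ages, diameters = zip(*pairs)
    return list(zip(standardize(ages), standardize(diameters)))
```

Using the sample instead of the population standard deviation would multiply both standardized columns by the same constant, which does not change the K-Means partition.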
The results of K-Means clustering depend not only on the choice of initial means but also on the number of clusters. Several methods exist for determining the optimal number of clusters, such as the elbow method, the average silhouette method, and the gap statistic method. Here the average silhouette method is used, and the result is shown in Figure 8: both 2 and 3 clusters appear acceptable.
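A sketch of the average silhouette criterion using Euclidean distance (an assumption; the paper does not state the distance metric). To choose k, one would compute this score for the labels produced by K-Means at each candidate k and prefer the k with the largest average.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width over all points.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Points alone in their cluster get s(i) = 0.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [_dist(p, q) for j, (q, lab_q) in enumerate(zip(points, labels))
               if lab_q == lab and j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(own) / len(own)
        b = min(
            sum(_dist(p, q) for q, lab_q in zip(points, labels) if lab_q == other)
            / labels.count(other)
            for other in clusters
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```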
Figure 5. Correlograms for Different Age Groups of Pipes ((a) 0~29 years; (b) 30~59 years; (c) 60~89 years; (d) > 90 years)
3.5 Survival Analysis

Using the whole water pipe dataset of Utility A, non-parametric survival curves can be developed for all water pipes and visualized with a Kaplan-Meier plot. Survival curves can also be fitted with parametric distributions; the most commonly used are the Weibull, exponential, log-normal, and log-logistic distributions. When the pipe dataset is partitioned into groups, the influence of a grouping covariate on survival probability can be illustrated with a stratified Kaplan-Meier plot.
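A minimal Kaplan-Meier estimator, sketched under assumptions the paper does not spell out: each pipe contributes one duration (its age at the first recorded break, or its age at the end of observation if no break was recorded), only the first break counts as an event, and the left truncation caused by the limited work-order window (records only since 2005 for Utility A) is ignored.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the survival function.

    durations: age at first recorded failure, or at censoring, for each pipe (years).
    events:    True if the pipe failed at that age, False if it was censored.
    Returns (t, S(t)) at each distinct failure time, in increasing order.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # pipes leaving the risk set at each time (failed or censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # censored pipes at t still count as at risk at t
    return curve
```

Stratifying by material, as in the following paragraph, amounts to calling the estimator once per material subset.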
Figure 6. Distribution of Pipes in terms of Age and Diameter
Figure 7. Cluster Results with Different Numbers of Clusters
Figure 8. Optimal Number of Clusters
In the dataset of Utility A, the major material types include CI, DI, CU, GS, HDPE, and PVC. Note that CI pipes installed before 1930 are categorized separately as ‘CIPre1930’ in the dataset of Utility A; this analysis adopts the utility's representation and notation. Based on the stratification by pipe material, Kaplan-Meier curves are created separately for each material type, as shown in Figure 9 (a). CI pipes installed before 1930 are the most durable of all materials, while HDPE pipes are the least likely to survive. Parametric survival curves are also developed for each subgroup; to keep the plot legible, only the survival models fitted with the Weibull distribution are shown in Figure 9 (b).
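The paper does not state how the Weibull curves were fitted; survival software usually uses maximum likelihood. As a simpler and clearly different technique, the sketch below fits a Weibull survival function S(t) = exp(-(t/λ)^k) by least squares on the Weibull probability plot, using the linearization ln(-ln S(t)) = k ln t - k ln λ applied to (t, S(t)) points such as a Kaplan-Meier curve. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def fit_weibull_from_curve(curve: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate (shape k, scale lambda) from (t, S(t)) points by least squares.

    Fits ln(-ln S) = k * ln t - k * ln(lambda). Points with t <= 0 or S outside
    (0, 1) are skipped because the transform is undefined there.
    """
    xs, ys = [], []
    for t, s in curve:
        if t > 0 and 0.0 < s < 1.0:
            xs.append(math.log(t))
            ys.append(math.log(-math.log(s)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct times")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    shape = sxy / sxx                 # slope of the probability plot
    intercept = my - shape * mx       # equals -k * ln(lambda)
    scale = math.exp(-intercept / shape)
    return shape, scale
```

Least squares on this plot is easy to inspect visually but weights points unevenly and handles censoring only through the input curve, which is why maximum likelihood is generally preferred for final models.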
Figure 9. Survival analysis stratified by pipe materials ((a) Kaplan-Meier plot; (b) Weibull fitting of survival curves)
4. CONCLUSION
This paper presents a comprehensive analytical framework for the statistical analysis of water pipeline field performance data. The proposed framework addresses three objectives (i.e., failure, performance, and risk) and employs both exploratory and predictive statistical modeling approaches. The preliminary analysis revealed or confirmed several findings. For utilities that do not record pipe types, pipes with diameters equal to or larger than 16 inches can be treated as transmission pipes and those with diameters less than 16 inches as distribution pipes. In the initial stage (e.g., the first 30 years), a pipe's internal surface becomes significantly rougher with increasing age, whereas afterwards pipe age has a much weaker influence on the friction factor; this finding concerns not only the trends themselves but also how their strength changes. Premature failures can be detected and investigated. In the cluster analysis, both 2 and 3 clusters are acceptable. According to the survival analysis of Utility A, CI pipes installed before 1930 are the most durable of all materials, while HDPE pipes are the least likely to survive. Most recorded breaks in HDPE pipes occurred at an early age, while most breaks in CI pipes occurred at a later age, and few breaks are recorded for HDPE pipes beyond 75 years of age. This is the main reason why the survival curve of HDPE pipes exhibits a steep downward trend.
REFERENCES
Chen, T. Y.-J., Beekman, J. A., and Guikema, S. D. (2017). "Drinking water distribution systems asset management: Statistical modelling of pipe breaks." Pipelines 2017, 173-186.
Clair, A. M., and Sinha, S. (2012). "State-of-the-technology review on water pipe condition, deterioration and failure rate prediction models." Urban Water Journal, 9(2), 85-112.
Clair, A. M. (2013). "Development of a Novel Performance Index and a Performance Prediction Model for Metallic Drinking Water Pipelines." Virginia Tech.
Farmani, R., Kakoudakis, K., Behzadian Moghadam, K., and Butler, D. (2017). "Pipe failure prediction in water distribution systems considering static and dynamic factors." Procedia Engineering, 186, 117-126.
Sinha, S., and Sears, L. (2017). "Collection and Compilation of Water Pipeline Field Performance Data." Pipelines 2017, 124-135.
Yamijala, S., Guikema, S. D., and Brumbelow, K. (2009). "Statistical models for the analysis of water distribution system pipe break data." Reliability Engineering & System Safety, 94(2), 282-293.