Intelligent Approach Based On Random Forest For Safety Risk Prediction of Deep Foundation Pit in Subway Stations

Case Study
Ying Zhou 1; Shiqi Li 2; Cheng Zhou 3; and Hanbin Luo 4
Abstract: The number of safety accidents caused by excavation of deep foundation pits in subway stations has been increasing rapidly in
recent years. Thus, precisely predicting the safety risks for subway deep foundation pits is important. Existing methods, such as machine
learning models, have been established for predicting such risks. However, these methods are unable to provide accurate results for deep
foundation pits in subway stations due to small and unbalanced data samples. In this research, an intelligent model based on random forest
(RF) was established for risk prediction of deep foundation pits in subway stations. To achieve such a goal, different types of monitoring data
and risk level monitoring were introduced to the RF for training the model and estimating unknown relationships between monitoring values
and safety risks of pits. An actual deep foundation pit in a subway station of the Wuhan Metro was used to demonstrate the applicability of the
developed RF risk prediction model. The results demonstrated the superiority of the proposed RF risk prediction model, which can serve as a basis to
implement a decision-making tool for predicting safety risks of subway foundation pits. The importance evaluation function of the model
provides the ability to aid onsite engineers in determining the causes of safety risks, thus facilitating the implementation of emergency
measures in advance. DOI: 10.1061/(ASCE)CP.1943-5487.0000796. © 2018 American Society of Civil Engineers.
Author keywords: Subway station; Deep foundation pit; Safety risk prediction; Random forest.
1 Professor, School of Civil Engineering and Mechanics, Huazhong Univ. of Science and Technology, Wuhan, Hubei 430074, China. Email: ying_zhou@hust.edu.cn
2 Master's Student, School of Civil Engineering and Mechanics, Huazhong Univ. of Science and Technology, Wuhan, Hubei 430074, China. Email: 13237166717@163.com
3 Associate Professor, School of Civil Engineering and Mechanics, Huazhong Univ. of Science and Technology, Wuhan, Hubei 430074, China (corresponding author). Email: chengzhou@hust.edu.cn
4 Professor, School of Civil Engineering and Mechanics, Huazhong Univ. of Science and Technology, Wuhan, Hubei 430074, China. Email: luohbcem@hust.edu.cn
Note. This manuscript was submitted on January 23, 2018; approved on May 25, 2018; published online on September 26, 2018. Discussion period open until February 26, 2019; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801.
J. Comput. Civ. Eng., 2019, 33(1): 05018004

Introduction

Deep foundation pits in subway stations are typically characterized by long duration of construction, substantial uncertainties, and serious effects on the surrounding environment, which lead to frequent accidents (Zhou et al. 2017a; Ding et al. 2017). The Singapore Nicoll Highway collapse in 2004, Guangzhou Metro Haizhu Square accident in 2005, and Hangzhou Metro Line 1 station accident in 2008 were all caused by foundation pit collapse and resulted in a large number of casualties and economic losses. To reduce the damage caused by accidents in foundation pits and to minimize their detrimental environmental effects, the safety risks of deep foundation pits in subway stations must be effectively predicted (Ding and Jie 2017).

In the last few decades, numerous scholars have tried to predict the risks of deep foundation pits via various methods, including empirical methods (Whittle et al. 1993; Mair and Taylor 1997; Lee and Halpin 2003), numerical simulations (Chen et al. 2003, 2004; Yoo and Lee 2008), and machine learning methods (Sun and Wu 1998; Su et al. 2009; Sun 2010; Zhou and Zhang 2011). Empirical models are often difficult to apply to different kinds of ground conditions and construction techniques (Loganathan and Poulos 1998). On the other hand, when adopting numerical simulation methods, complex engineering conditions and construction parameters are constantly simplified, thus leading to large deviations in results (Li 2000). Machine learning methods, such as artificial neural networks (ANN), Bayesian networks (BN), support vector machines (SVM), and random forest (RF), have gradually become the mainstream of risk prediction in recent years. Monitoring data are limited, and only small sample data of 100–200 sets can be collected (Hasofer and Qu 2002) because the average construction period of a deep foundation pit project in a subway station is approximately 1 year, and monitoring frequency of the monitored item is generally once every 1–2 days. Among the current popular machine learning algorithms, ANN and BN are generally used for thousands of sample data sets and are unsuitable for small amounts of sample data (Kecman 2001; Zhou et al. 2017b). In comparison with other algorithms, SVM and RF are more suitable for problems with a small amount of sample data and can achieve higher prediction accuracy. However, SVM is time consuming when solving complex problems and is sensitive to missing and unbalanced data (Martens et al. 2007). In actual deep foundation pit projects, there are consistently fewer high-risk than low-risk data, thereby resulting in imbalanced sample data.

Compared with SVM, RF is easy to understand and robust to unbalanced data (Zhou et al. 2016a). RF also features a function that is superior to other algorithms: importance evaluation of predictive variables, which can determine the most important predictive variable and find the cause of risks (Kuhn and Johnson 2013). Based on the previous reasons, this study primarily aimed to explore the capability of RF for risk prediction of deep foundation pits in subway stations. Thus, a RF risk prediction model for deep foundation pits in subway stations was established. The feasibility of using monitoring data as risk predictive variables of deep foundation pits in subway stations was verified. The proposed approach was validated in a real deep foundation pit project in a subway station of the Wuhan Metro, and the importance analysis

Table 1. Review of machine learning literature

| Method | Literature | Application | Advantages | Limitations |
|---|---|---|---|---|
| ANN | Sun and Wu (1998) | Predict the deformation of deep foundation pits | Nonlinear adaptability; high accuracy can be achieved when processing large data sets | Not suitable for small sample data; difficult to design an optimal architecture; high computational cost; not robust to outliers; risk of overfitting |
| | Jan et al. (2002) | Predict the displacement of diaphragm walls in deep foundation pits | | |
| | Ma et al. (2008) | Compare three models' prediction results on foundation pit deformation | | |
| BN | Zhang et al. (2014) | Evaluate risk in deep foundation pits | Simple; high computing speed can be maintained when processing large data sets | Sensitive to missing sample data; not suitable for small sample data |
| | Wu et al. (2015) | Estimate the probability of surface damage risk | | |
| SVM | Samui (2008) | Estimate the settlement of shallow foundation in clay | Works well with high-dimensional small data sets; good at handling complex biological nonlinear data | Consumes too much computing time and space; sensitive to missing and unbalanced sample data; lack of transparency |
| | Sun (2010) | Predict deformation of deep foundation pits in soft soil areas | | |
| | Li et al. (2016) | Predict horizontal displacement of deep foundation pits | | |
| | Zhou et al. (2017c) | Forecast potential safety hazards in construction of deep foundation pits | | |
function of RF also effectively helped the exploration of risk source.

The rest of this paper is divided into four parts. The first section briefly reviews research on predicting foundation pit risks. The second part introduces the RF algorithm and the process of establishing the RF risk prediction model for deep foundation pits in subway stations. The third part applies the developed model to an actual project, a foundation pit project in a subway station of the Wuhan Metro. A SVM risk prediction model is also established for comparison. Finally, the fourth section summarizes the main achievements and conclusions of this research.

Literature Review

In recent years, several machine learning methods have been used to analyze and predict construction risks of deep foundation pits, namely ANN, BN, and SVM. Based on these methods, a number of complex problems can be addressed by learning simple relations without knowing the exact relationship among parameters. The lack of a definite expression between risk results and various influencing factors can be overcome. Table 1 shows several applications of these machine learning methods in foundation pit risk prediction. The table also summarizes the advantages and limitations of these methods. As denoted in the table, these methods are not used in cases involving complex calculations, small amounts of sample data, or unbalanced and missing sample data. Additionally, using these models for calculation is extremely time consuming. Therefore, these methods are unsuitable for safety risk prediction of deep foundation pits in subway stations because only limited and unbalanced sample data can be retrieved in real projects. Calculations should also be generated in a short time when applying the methods in actual constructions.

The advent of the RF algorithm aids in solving complex engineering problems due to its capability to discover nonlinear complex relationships between independent and dependent variables without statistical assumptions (Rodriguez-Galiano et al. 2014). A comparative experiment of 10 supervised learning methods was conducted to classify a data set of 246 rockburst events. When these methods were compared, the RF method was considered the best (Zhou et al. 2016a). Several researchers have also applied RF to other engineering problems. For example, Hu et al. (2015) applied the RF method to predict the hard rock tunnel boring machine penetration rate. Zhou et al. (2016b, c) investigated the feasibility of using RF to forecast surface movements induced by tunnel construction. Hong et al. (2017) applied logistic regression and RF to analyze the landslide susceptibility of the Wuyuan area in Jiangxi Province, China. The superiority of the RF method has been determined via comparative analyses. These studies showed promising applications of RF in engineering risk prediction problems.

Deep foundation pit projects in subway stations are constantly constructed in city centers, which are densely populated. Risk prediction of foundation pits in subway stations is a complex problem, which presents a challenge when using traditional methods. Unlike steel structures, rock soil acts as a transmission medium. Therefore, in construction, if small deformations occur in foundation pits, then such deformations will expand accordingly when effective treatments are not applied, eventually leading to large-scale collapse of foundation pits and surrounding buildings and serious damage. Therefore, an accurate prediction should be accomplished immediately to promptly adopt the appropriate remedial measure. RF is algorithmically simpler and computationally lighter than other machine learning methods (Rodriguez-Galiano et al. 2014). RF is also robust to missing and unbalanced data; thus, rescaling and modification of data are unnecessary (Hong et al. 2017). In addition, RF can identify the most important predictive variable. In theory, RF is suitable for risk prediction of deep foundation pits in subway stations. However, no research has been conducted in relation to this subject. In this work, a RF risk prediction model was introduced to predict the safety risks of deep foundation pits in subway stations.

Research Methods

This section explains the RF algorithm and establishment of a RF risk prediction model.

RF Method

The RF algorithm is an ensemble learning method proposed by Breiman in 2001. In this model, numerous trees with the same distribution are used to set up a forest to train and predict the sample data (Kuhn and Johnson 2013). The generalization error of the forest approaches the limit with the increasing number of trees (Breiman 2001a). RF is extensively used for prediction and feature selection

problems due to its strong data mining capability and high prediction accuracy (Iverson et al. 2008). This algorithm also features advantages in addressing classification and regression problems. In this study, only the classification mode was explored because the main research purpose was predicting safety risk levels of deep foundation pits in subway stations.

Classification and Regression Tree Algorithm

Breiman et al. (1984) proposed the classification and regression tree (CART) algorithm, which is a typical decision tree model. In CART, binary recursive segmentation technology is used to divide the sample data into two subsample sets. Then, each nonleaf node of a tree obtains two branches. All nonleaf nodes use an attribute selection measure to determine the optimal attribute and are divided according to this attribute. The attribute selection measure is a heuristic method. Ideally, divisions obtained from this measure should be pure, indicating that sample data at the same node belong to the same category. A node's impurity function F can thus be used to determine the performance of divisions. A higher F value indicates a higher impurity level of the node. When the value of F reaches zero, all sample data at that node belong to the same category. CART trees generally use the most popular impurity function F, the Gini index (Rodriguez-Galiano et al. 2014), as the split standard and select attributes with minimized Gini indices as optimal attributes.

CART requires no prior knowledge. Thus, this algorithm is easier to explain than neural networks and other methods. However, the established trees may become extremely complicated during recursion due to their numerous nodes, thus possibly reducing classification accuracy. Pruning can mitigate overfitting but adds complexity to model construction.

Integrated Learning Method

A single classifier often cannot achieve satisfactory classification accuracy and easily encounters overfitting, thereby resulting in weak generalization. The integrated learning method is the combination of several classifiers. Through a combination of the results of each basic classifier, sample data categories can be determined. The integrated learning method can achieve better classification performance than any single classifier and thereby effectively improve the generalization capability of the learning system.

Bagging (Breiman 1996) is an integrated learning method based on the idea of bootstrapping (Davison and Hinkley 1997) from statistics. Bootstrapping is achieved via resampling randomly with replacement. In the bagging algorithm, several training subsets can be drawn through bootstrapping, and corresponding basic classifiers can be obtained from training. Each training subset retains the original size of the sample data, and some data in the same training subset may be repeated (Han et al. 2017). When the size of the original sample data is $N$, the probability of each datum not being drawn is $(1 - 1/N)^N$. If $N$ is large enough, then the probability is $1/e \approx 0.368$, indicating that approximately 37% of the original sample data are not drawn each time. This part of the sample is called out-of-bag (OOB).

Bagging can improve the classification accuracy of data-sensitive classifiers, such as CART. Bagging can also train multiple basic classifiers in parallel, thereby saving considerable time.

RF Algorithm

RF, which is an integrated algorithm formed through the combination of CART trees and bagging, is resistant to overfitting and remedies the deficiencies of single decision trees (Breiman 2001b). In this algorithm, multiple training subsets are obtained by randomly sampling the original sample data with replacement. Each training subset is then used to build a corresponding decision tree. All decision trees are CART trees that form a forest. Each tree provides a single vote, and the category with the highest frequency is assigned to the input vector $x$. The RF prediction is $\hat{C}_{rf}^{A} = \text{majority vote}\,\{\hat{C}_a(x)\}_{1}^{A}$, where $\hat{C}_a(x)$ is the voting category of the $a$th of the $A$ CART trees; for example, $\hat{C}_5(x)$ is the voting category of the fifth CART tree (Rodriguez-Galiano et al. 2012). When the trees of the RF grow, each node divides with the best split among a random subset of predictive variables rather than the best split among all predictive variables, which decreases the correlation among CART trees and reduces generalization error. The RF algorithm was implemented as follows:
1. The bootstrapping method was used to obtain ntree training subsets from the original sample data. Each training subset featured the same size as the original sample data, indicating that some data may be repeated or left out.
2. Each training subset was used to generate a CART tree. At each tree node, a subset of attributes of predictive variables was randomly selected. The number of attributes is mtry, which is not higher than the number of predictive variables. The best split variable from the subset was used to divide the node.
3. Each CART tree contributed a single voting category to the forest, and the classification result was obtained by taking the majority of the voting categories.

Establishment of RF Risk Prediction Model

The RF risk prediction model was implemented as follows: (1) the sample data were collected and processed; (2) the monitoring data were used as input, and safety risk levels of the foundation pit in a subway station were used as output, to establish the model and analyze the correlation and importance of predictive variables; and (3) classification accuracy of the model was verified. Fig. 1 shows the specific process.

Collecting and Processing of Sample Data

Multiple monitoring items are used during the construction of deep foundation pits in subway stations, and hidden dangers are indicated by abnormal monitoring data. Therefore, safety risk levels of foundation pits in subway stations can be determined by analyzing monitoring data, and preventive measures can be implemented accordingly. Common types of monitoring include settlement, stress, groundwater level, lateral displacement, and excavation depth. Currently, selection of monitoring items for deep foundation pits in China is based on the national standard "Technical Code of Urban Rail Transit" (MHURDPRC 2009), and its monitoring rules are shown in Table 2. The table categorizes foundation pits into three types on the basis of excavation depth. Based on the standard, the monitoring items are divided into two types: compulsory and optional measurements. Selection of optional monitoring items in the actual project should be compatible with the design and construction plan. Geological hydrology, construction parameters, and the surrounding environment should also be fully considered when selecting optional monitoring items.

The traditional method of evaluating safety risks of pits typically involves analyzing one or two abnormal monitoring values. However, assessments based on information from different monitoring types and those based on different monitoring points of the same monitoring types may conflict due to the complexity of subway foundation pit projects and considerable uncertainty in evaluation. Therefore, inferring the safety risk levels based only on the abnormal values of one or two monitoring types is unreliable, and the values of multiple monitoring types should be considered jointly. In this study, a RF risk prediction model

Fig. 1. Establishment of RF risk prediction model. (Flowchart: monitoring data with risk levels 1 to 3 are preprocessed and split into training and testing sets; bootstrap subsets, each holding about 63% of the distinct training samples with about 37% left out-of-bag, are used to grow CART trees 1 to k; K-fold cross-validation is applied during training; the resulting RF classification prediction model is tested for classification accuracy and used for correlation and importance analysis.)
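
The bootstrap and voting steps of the RF algorithm described in the RF Method section can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only the standard library, grows no trees, and the seed, the number of bootstrap subsets (`n_tree = 500`), and the five example votes are assumptions chosen for illustration. It checks empirically that the out-of-bag share of a bootstrap subset is close to $(1 - 1/N)^N$ and shows how a majority vote is taken.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

def majority_vote(votes):
    """Return the most frequent category among the tree votes."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)   # fixed seed, assumed only for repeatability
n_samples = 200          # sample size reported in the case study
n_tree = 500             # assumed number of bootstrap subsets

oob_fractions = []
for _ in range(n_tree):
    in_bag, out_of_bag = bootstrap_indices(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob = sum(oob_fractions) / n_tree
expected_oob = (1 - 1 / n_samples) ** n_samples  # tends to 1/e as N grows

print(f"mean OOB fraction: {mean_oob:.3f}")
print(f"(1 - 1/N)^N:       {expected_oob:.3f}")

# Votes of five hypothetical trees for one input vector (risk levels 1 to 3).
print(majority_vote([1, 3, 3, 2, 3]))  # prints 3
```

In a full RF, each in-bag index list would be used to grow one CART tree with mtry randomly selected candidate variables per node, and the out-of-bag samples could serve as an internal validation set.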
was established via training. The monitoring values of important monitoring types and points were used as predictive variables, and safety risk levels of the foundation pit in a subway station were used as output.

Table 2. Selection of urban rail transit monitoring items

| Monitoring project | Pit category Level 1 | Pit category Level 2 | Pit category Level 3 |
|---|---|---|---|
| Retaining wall (pile) horizontal displacement | C | C | C |
| Retaining wall (pile) vertical displacement | C | C | C |
| Deep horizontal displacement | C | C | O |
| Pillar vertical displacement | C | C | O |
| Internal force of retaining wall | O | O | O |
| Internal force of support | C | C | C |
| Internal force of column | O | O | O |
| Internal force of bolt | C | C | C |
| Internal force of soil nail | O | O | O |
| Pit bottom uplift (rebound) | O | O | O |
| Retaining wall lateral force | O | O | O |
| Pore water pressure | O | O | O |
| Groundwater level | C | C | C |
| Soil layered settlement | O | O | O |
| Surface settlement | O | O | O |
| Building settlement | C | C | C |
| Building inclinometer | O | O | O |
| Building horizontal displacement | O | O | O |
| Building/surface crack | C/O | C/O | C/O |
| Underground pipeline settlement | C/O | C/O | C/O |

Note: C = compulsory measurement; O = optional measurement; Level 1 = design depth (DD) ≥ 20 m; Level 2 = 20 m > DD ≥ 10 m; and Level 3 = DD < 10 m.

On the basis of the principles of representativeness, universality, and compactness, historical monitoring data and records of safety risk levels were selected from the monitoring system. Denoising or standardization of monitoring data is unnecessary because the RF is resistant to outliers in predictive variables and can automatically handle missing values (Catani et al. 2013). To prevent overfitting, at least 80% of the samples were used to train the model, and the remaining samples were used to validate the model. To avoid densely stacking similar sample data and to guarantee effective prediction results, the orders of the sample data were randomly disrupted.

Establishment of Risk Prediction Model

Two major parameters of RF must be optimized when establishing a model (Breiman 2001a). These parameters are ntree and mtry, which refer to the numbers of trees in the forest and prediction variables at each node of the tree, respectively. If the value of mtry is small, then overfitting may occur and the accuracy of model may decrease. If the value of mtry is large, then calculating speed may decrease. If the value of ntree is small, then training may become inadequate. If the value of ntree is large, then computing complexity of the model may increase.

A k-fold cross-validation method (Kohavi 1995) was used to optimize these two major parameters. In this method, sample data are divided into k subsets, k − 1 subsets are used to train the model, and the remaining subset is used for testing. The subsets alternately act as the independent test set, whereas the others serve as training sets (Arlot and Celisse 2010). Based on the k-fold cross-validation test, the optimal parameters were determined. The parameters and training sets were used as input to generate a final pit risk classifier, and a RF risk

prediction model for a foundation pit in a subway station was established.

Validation of RF Risk Prediction Model

In addition to training data, the remaining samples were used to test and validate the capability of the model. The testing results were evaluated using the classification accuracy rate, which is a primary evaluation criterion (Grinand et al. 2008). The principle is as follows: in a $C \times C$ confusion matrix, $n$ is the total number of samples, and the diagonal entry $x_{ii}$ is the number of samples of class $i$ that are predicted to belong to class $i$, as expressed in the following formula. By comparing the predicted and actual risk levels, the classification accuracy rate was determined. A higher classification accuracy rate indicates better classification performance of the classifier:

$$\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{C} x_{ii} \times 100\%$$

In this study, another machine learning algorithm, specifically the SVM, was also used to verify the classification effects of RF. By comparing its classification accuracy rate with that of the RF, the validity of the established RF risk prediction model can be tested.

After the capability of the model has been verified, the model can be applied to an actual deep foundation pit construction in a subway station. Through this model, onsite engineers can be informed in a timely manner once a risk has been identified. Thus, measures can be implemented accordingly to prevent potential safety accidents.

Case Study

A foundation pit project in a subway station of the Wuhan Metro was selected for this case study. This station lies in the center of Wuhan City, which is a densely populated area. An open-cut method was used to construct the pit, as shown in Fig. 2. The pit was completely covered by ground buildings. The pit depth of the station was around 24–25.5 m, its excavation length measured 203.5 m, and its width spanned 22.15 m. Waterproof impermeable diaphragm walls and inner support constituted the strutting structure of the pit. The diaphragm wall was supported by concrete and reinforced by steel support. The wall depth was 60 m, and the wall thickness was approximately 1 m.

Fig. 2. Construction site in the subway station.

According to geological hydrological information, the main strata of this subway station comprise a plain fill, clay, fine sand, and grit-bearing coarse sand. Two main types of groundwater are present in the foundation pit, namely upper stagnant and intermediate confined aquifers. The upper stagnant aquifer is stored in an artificial fill layer, and water depth approximates 0.5–2.0 m. The intermediate confined aquifer, which is the main groundwater type in this area, is mainly stored in fine and gravelly coarse sand layers. The intermediate confined aquifer forms a unified confined aquifer with overlying clay and silt, with a thickness of approximately 38–42 m. The fill strength is low, and the cohesive soil is easily softened by water. The station is 300 m away from the Yangtze River, which is the main factor behind the dynamic changes in groundwater. Fluctuation of groundwater critically affects the level of confined water and thus influences the stability of the foundation pits (Ding et al. 2017). This subway station is in a commercial center. A safety accident will lead to critical consequences. Therefore, a risk prediction model must be established to forecast and analyze safety risks. On the basis of forecast results, effective suggestions can be proposed and the corresponding control and emergency measures can be implemented in time. Thus, safety risks can be reduced to an acceptable range. Fig. 3 illustrates the establishment and application of the risk prediction model for foundation pits in this subway station.

Collecting and Processing of Sample Data

The northern section of the foundation pit, which is adjacent to a large number of commercial buildings, was the key monitoring area. Because the excavation depth of the foundation pit is greater than 20 m, the selection of the monitoring items mainly referred to the compulsory measurement items of the foundation pit of Level 1 in the specification. In this study, various types of monitoring points from the northern section were used as research objects.

A web-based early warning system for subway safety risk was recently developed by the Huazhong University of Science and Technology (Ding and Zhou 2013). Considerable monitoring information for construction of different foundation pits was recorded in this system. The safety risk levels of foundation pits were determined via a hybrid data fusion model on the basis of multisource information, in which information sources of monitoring measurements, construction parameters, and field inspections were combined (Zhou et al. 2017c). The assessment process included the following: (1) primary safety risk assessment of the foundation pit was implemented by 130 experts, (2) the basic probability assignment (BPA) of safety risk assessment was calculated using the BPA function, and (3) Dempster–Shafer theory was used to determine safety risks. Safety risk levels of foundation pits were divided into three levels, namely low, medium, and high risk, which were indicated in the platform by green, yellow, and orange boxes, respectively. The monitoring types and daily safety risk levels of the foundation pit in this subway station can be found in the system.

Eight monitoring types that remarkably influence the safety status were selected on the basis of the corresponding technical specifications and experience of experts. These types included surface, building, underground pipeline, and structural settlements; diaphragm wall and structural horizontal displacements; steel support axial force; and concrete support steel stress. The maximum cumulative changing values and rates were selected to compose 16 predictive variables. The monitoring data and daily safety risk levels were collected from the subway safety early warning system. Monitoring was planned to be conducted once a day. When the data

Fig. 3. Establishment and application of RF risk prediction model. (Flowchart, establishment of the model: 200 sample sets of monitoring data and safety status are split into 80% training sets and 20% testing sets; the RF classifier is tuned by repeated 5-fold cross-validation and verified, with importance and correlation analyses, to give the optimal RF risk prediction model. Application of the model: a set of sample data is input to the optimal model, which outputs high, medium, or low risk; the listed responses include finding the abnormal reasons and taking remedial measures, checking safety hazards and strengthening protective measures, eliminating hidden dangers, contacting the relevant units to explore further solutions, and increasing monitoring frequency and strengthening site management.)
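
The data handling in Fig. 3 (shuffling 200 sample sets, an 80%/20% split, and fivefold cross-validation on the training part) and the classification accuracy rate can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the helper names `k_fold_indices` and `accuracy`, the seed, and the 3 × 3 confusion matrix are hypothetical and are not taken from the paper, and no classifier is actually trained.

```python
import random

def k_fold_indices(indices, k):
    """Split a list of sample indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def accuracy(confusion):
    """Accuracy (%) = 100 * (sum of diagonal entries) / (total samples)."""
    n = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / n

rng = random.Random(42)            # fixed seed, assumed only for repeatability
indices = list(range(200))         # 200 sample sets, as in the case study
rng.shuffle(indices)               # randomly disrupt the sample order
train, test = indices[:160], indices[160:]   # 80% / 20% split

folds = k_fold_indices(train, 5)   # fivefold cross-validation
for i, validation in enumerate(folds):
    fitting = [j for f, fold in enumerate(folds) if f != i for j in fold]
    # A classifier with candidate (ntree, mtry) values would be trained on
    # `fitting` and scored on `validation` here; that step is omitted.
    print(f"fold {i + 1}: {len(fitting)} fitting, {len(validation)} validation")

# Made-up 3 x 3 confusion matrix for 40 test samples (NOT a result of the paper).
confusion = [[17, 1, 0],
             [2, 12, 1],
             [0, 1, 6]]
print(f"accuracy: {accuracy(confusion):.1f}%")  # 35 of 40 correct
```

Repeating the loop for each candidate pair of ntree and mtry and averaging the fold accuracies gives the cross-validation score used to choose between parameter settings.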
of a monitoring item vary dramatically, monitoring of this item should be conducted several times a day. Fig. 4 displays the monitoring points in the northern section of the pit. Several monitoring types exhibited multiple monitoring points. Only the points with the highest absolute values were selected. Table 3 lists the collected monitoring data and safety risk levels.

A total of 200 sets of sample data were acquired. These sets consisted of 92 low-risk, 75 medium-risk, and 33 high-risk sets. Each set of sample data included 16 predictive variables and their corresponding safety risk levels. The category label has no effect on RF classification accuracy. Thus, the three safety risk levels were labeled 1, 2, and 3. The orders of the sample data were randomly disrupted. A total of 160 sets of sample data were used for model training, and the remaining 40 were used for model testing. Table 4 presents a part of the training sample data.

Establishment of RF Risk Prediction Model

The collated sample data were used to establish a RF risk prediction model. Inputs for the model included 16 predictive variables, and the outputs were safety risk levels. Two important parameters of the RF algorithm, ntree and mtry, were determined via fivefold
J. Comput. Civ. Eng., 2019, 33(1): 05018004

Fig. 4. Monitoring layout of northern section.
Table 3. Collected monitoring data and safety risk levels

| Predictive variables | Types | Symbol | Unit | Maximum | Minimum | Mean | Mean square displacement |
|---|---|---|---|---|---|---|---|
| Surface cumulative settlement | Cumulative changing value | A1 | mm | −29.07 | −75.98 | −51.38 | 15.83 |
| Building cumulative settlement | Cumulative changing value | A2 | mm | −21.56 | −61.30 | −46.48 | 10.70 |
| Pipe cumulative settlement | Cumulative changing value | A3 | mm | 0.19 | −187.15 | −91.85 | 70.15 |
| Diaphragm wall cumulative movement | Cumulative changing value | A4 | mm | 9.83 | −26.33 | −14.27 | 6.55 |
| Structure cumulative settlement | Cumulative changing value | A5 | mm | −1.50 | −6.40 | −4.12 | 1.03 |
| Structural cumulative horizontal displacement | Cumulative changing value | A6 | mm | 20.00 | 2.20 | 14.37 | 4.14 |
| Cumulated steel support axial force | Cumulative changing value | A7 | kN | 1,113.89 | 8.14 | 774.07 | 231.64 |
| Cumulated concrete support steel stress | Cumulative changing value | A8 | kN | 2,403.32 | −1,172.83 | 883.07 | 974.51 |
| Surface subsidence rate | Changing rate | B1 | mm/day | 2.48 | −13.01 | −0.48 | 1.33 |
| Building subsidence rate | Changing rate | B2 | mm/day | 1.35 | −3.65 | −0.57 | 0.91 |
| Pipe subsidence rate | Changing rate | B3 | mm/day | 1.05 | −118.41 | −1.23 | 8.45 |
| Diaphragm wall displacement rate | Changing rate | B4 | mm/day | 8.32 | −8.00 | −0.39 | 2.60 |
| Structural subsidence rate | Changing rate | B5 | mm/day | 4.70 | −1.40 | −0.14 | 0.49 |
| Structural horizontal displacement rate | Changing rate | B6 | mm/day | 10.00 | −16.00 | 0.55 | 2.35 |
| Steel support axial force changing rate | Changing rate | B7 | kN/day | 488.50 | −395.56 | 17.59 | 130.82 |
| Concrete support steel stress changing rate | Changing rate | B8 | kN/day | 1,537.42 | 1,296.84 | 49.67 | 337.55 |
| Safety risk level | - | - | - | High risk, medium risk, low risk | | | |
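The sample preparation described in the text (200 labeled sets, random reordering, 160 sets for training and 40 for testing) can be illustrated with a minimal Python sketch. The feature values below are random placeholders, not monitoring data, and the names `make_placeholder_samples` and `shuffle_and_split` are illustrative assumptions. The mapping 1 = low risk, 2 = medium risk, 3 = high risk is inferred from Table 5 and from the application example, where a predicted result of 3 is described as high risk.

```python
import random
from collections import Counter

# Class sizes reported in the text; label mapping 1/2/3 = low/medium/high is an inference.
CLASS_COUNTS = {1: 92, 2: 75, 3: 33}


def make_placeholder_samples(seed=0):
    """Build 200 placeholder samples as (16 dummy feature values, risk label)."""
    rng = random.Random(seed)
    samples = []
    for label, count in CLASS_COUNTS.items():
        for _ in range(count):
            features = [rng.random() for _ in range(16)]  # stand-in for A1..B8
            samples.append((features, label))
    return samples


def shuffle_and_split(samples, n_train=160, seed=0):
    """Randomly reorder the samples, then use the first n_train for training."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


samples = make_placeholder_samples()
train, test = shuffle_and_split(samples)
print(len(train), len(test))
print(Counter(label for _, label in test))
```

A plain random split of this kind does not guarantee that the 40 test sets preserve the 92/75/33 class proportions, which is one reason repeating the split several times, as the paper does later, is informative.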
cross-validation, and classification accuracy was used as the assessment criterion. Fig. 5 shows the fivefold cross-validation accuracy under different values of ntree and mtry. As displayed in the figure, the XOY plane represents the search range of ntree and mtry. The search range of mtry was set to 1–16, and the step distance was 1. The search range of parameter ntree was set to 0–500, and the step distance was 5. The Z-axis represents the change in cross-validation accuracy. Each pair of ntree and mtry corresponds to a cross-validation accuracy; different mtry and ntree values can achieve the same accuracy. However, the lowest values of the parameters were used to reduce computing complexity and time. The results showed that when ntree = 215 and mtry = 5, the accuracy rate of cross-validation was the highest at 99.38%. Based on the determined optimal parameters, the training sample data were used to generate the RF risk prediction model.
A correlation matrix was provided to illustrate the correlation among predictive variables. In comparison with the maximum changing rate of the monitoring types, the numerical relationship between maximum cumulative changing values can better reflect the changes in pit safety status and reveal the deformation trend (Lu and Zhang 2013; He et al. 2014). On the other hand, if the maximum changing rate and maximum cumulative changing values of the monitoring items were used simultaneously, duplicates would occur during calculation. Therefore, in this research, the maximum cumulative changing values of the monitoring items were used for correlation analysis. Fig. 6 illustrates the correlation matrix of the maximum cumulative changing values A1–A8.
The diagonal boxes in Fig. 6 show the distribution of each predictive variable, the boxes in the lower-left area show the scatter plots with fitted lines for each pair of predictive variables, and the boxes in the upper-right area show the correlation values and significance levels for each pair of predictive variables. Significance levels in the figure correspond one-to-one with symbols: the p-value intervals (0.0, 0.001), (0.001, 0.01), (0.01, 0.05), (0.05, 0.1), and (0.1, 1) are marked by ***, **, *, °, and ·, respectively. The eight predictive variables exhibited good correlation with one another. Among these variables, A1, A2, A5, and A6 manifested the strongest correlation. A1 and A2 showed a highly positive correlation, and both showed a highly negative correlation with A5 and a highly positive correlation with A6. A5 was also highly correlated with A6. In excavation, most monitoring data were linked with one

Table 4. Part of training sample data

| Number | A1 | B1 | A2 | B2 | A3 | B3 | A4 | B4 | A5 | B5 | A6 | B6 | A7 | B7 | A8 | B8 | Safety risk level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −31.13 | 0.47 | −32.16 | 0.49 | −16.87 | 0.29 | −6.55 | 1.64 | −2.95 | −0.21 | 8.39 | −0.96 | 771.82 | −44.06 | 1,781.34 | −664.79 | 1 |
| 2 | −49.75 | −2.73 | −49.38 | −2.89 | −46.77 | −3.95 | −15.40 | 3.23 | −4.30 | −0.75 | 15.00 | 5.50 | 809.64 | −34.24 | 1,814.50 | 394.69 | 3 |
| 3 | −30.10 | 0.71 | −32.07 | −1.06 | −32.74 | 0.76 | 5.98 | −1.60 | −3.20 | −0.65 | 9.10 | −1.25 | 828.39 | 42.50 | 1,662.07 | 664.79 | 1 |
| 4 | −29.30 | 0.31 | −32.63 | 0.31 | −17.69 | 0.33 | −9.93 | −0.90 | −2.85 | 0.03 | 9.93 | −0.40 | 817.32 | −31.56 | 2,275.36 | 270.46 | 1 |
| 5 | −71.34 | −2.35 | −58.88 | −2.20 | −183.63 | −3.65 | −19.08 | −1.28 | −5.60 | −0.38 | 16.75 | 0.89 | 359.35 | −40.62 | 17.41 | −3.53 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 156 | −73.10 | −0.71 | −57.96 | −0.60 | −184.29 | −7.10 | −20.63 | −0.40 | −5.10 | −0.35 | 18.00 | 1.00 | 414.15 | −120.18 | 15.84 | −2.40 | 2 |
| 157 | −73.66 | 0.68 | −57.52 | 0.55 | −184.36 | 0.53 | −20.42 | −0.38 | −5.20 | −0.16 | 18.50 | 1.10 | 489.33 | −90.30 | 21.82 | −10.84 | 3 |
| 158 | −46.02 | −0.64 | −47.30 | −0.64 | −46.27 | −0.62 | −10.73 | 0.73 | −4.10 | −0.25 | 15.38 | 6.25 | 802.58 | 39.79 | 16,611.02 | 263.89 | 2 |
| 159 | −45.25 | −0.91 | −46.97 | −0.71 | −45.76 | −0.63 | −8.51 | −1.25 | −3.80 | −0.22 | 14.50 | 1.25 | 904.21 | 96.22 | 1,874.97 | 152.56 | 1 |
| 160 | −42.96 | −0.35 | −44.19 | −0.8 | −43.53 | 0.35 | −5.37 | 2.90 | −3.50 | −0.30 | 15.00 | 2.00 | 884.07 | 128.74 | 1,297.88 | −224.72 | 1 |
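The parameter search described above (a grid over ntree and mtry scored by fivefold cross-validation, keeping the lowest parameter values among equally accurate pairs) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: `cv_accuracy` is a hypothetical callable that would train and score a forest, `toy_cv_accuracy` is an invented accuracy surface used only to exercise the selection rule, and the tie-break order (ntree first, then mtry) is an assumption because the text does not specify it.

```python
from itertools import product


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def parameter_grid():
    """All (ntree, mtry) pairs: mtry 1..16 step 1, ntree up to 500 step 5.

    The text gives the ntree range as 0-500; a forest with zero trees is
    undefined, so this sketch starts at 5 (assumption).
    """
    return list(product(range(5, 501, 5), range(1, 17)))


def select_parameters(cv_accuracy):
    """Return (ntree, mtry, accuracy) with the highest accuracy.

    Ties are broken toward the smallest ntree, then the smallest mtry.
    """
    best_key, best = None, None
    for ntree, mtry in parameter_grid():
        acc = cv_accuracy(ntree, mtry)
        key = (-acc, ntree, mtry)
        if best_key is None or key < best_key:
            best_key, best = key, (ntree, mtry, acc)
    return best


def toy_cv_accuracy(ntree, mtry):
    """Invented plateau-shaped surface, NOT the surface shown in Fig. 5."""
    return 0.9938 if ntree >= 215 and mtry >= 5 else 0.95


print([len(fold) for fold in k_fold_indices(160)])
print(select_parameters(toy_cv_accuracy))
```

With 160 training sets, each of the five folds holds 32 sets, so every candidate pair is scored on five 128/32 train and validation partitions.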
another. Excavation and unloading of soil in foundation pits generally cause changes in axial force and stress in the inner support, thereby leading to lateral displacement of the retaining structure and diaphragm wall. Settlement of the surface around the foundation pit may occur, and this phenomenon can lead to destruction of surrounding buildings and movement of underground pipelines.
A popular feature of the RF algorithm is its ability to analyze the importance of various predictive variables. When conducting importance analysis, the maximum changing rate of the monitoring items cannot be eliminated. Although the maximum changing rate of monitoring items remains stable during most of the period in deep pit construction, the rate will probably change remarkably when sudden risk events occur. Thus, the maximum changing rate cannot be excluded during risk prediction. Fig. 7 shows the importance rankings of the 16 predictive variables based on importance scores and Gini indices in the RF algorithm. The conclusions drawn from the two methods were generally the same, and all predictive variables played a role in risk prediction. In comparison with the values of maximum changing rate (variables B1–B8 in Table 3), those of the maximum cumulative changing values (variables A1–A8 in Table 3) affected safety risk prediction more strongly, especially for surface, building, and underground pipeline settlements. For the monitoring types, the rankings of importance in decreasing order were as follows: surface subsidence, building subsidence, underground pipeline subsidence, displacement of the diaphragm wall, concrete stress, structural settlement, structural horizontal displacement, and steel support axial force.
Fig. 5. Fivefold cross-validation accuracy under different values of ntree and mtry.
Fig. 6. Correlation matrix of A1–A8.
Fig. 7. Importance analysis of predictive variables: (a) mean decrease accuracy; and (b) mean decrease Gini.
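The idea behind the mean decrease accuracy score in Fig. 7(a) can be illustrated with a simplified permutation-importance sketch in Python. It is an approximation under stated assumptions: RF implementations typically compute this score per tree on out-of-bag samples, whereas the sketch permutes one feature at a time on a fixed evaluation set for an arbitrary `predict` function. The classifier `toy_predict` and the six two-feature rows are invented for illustration and are not taken from the paper; the mean decrease Gini score of Fig. 7(b) is not covered here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in model that looks only at feature 0."""
    return 3 if row[0] < -60 else 1


# Invented rows: feature 0 mimics a cumulative settlement, feature 1 is noise.
rows = [[-70.0, 0.5], [-30.0, 0.4], [-65.0, 0.1], [-45.0, 0.9], [-72.0, 0.3], [-35.0, 0.7]]
labels = [toy_predict(row) for row in rows]
scores = permutation_importance(toy_predict, rows, labels)
print(scores)
```

A feature the model never uses receives a score of exactly zero, while a feature that drives the prediction loses accuracy when shuffled; ranking the 16 monitoring variables by such scores is what yields an ordering like the one reported above.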
To reduce the risk in foundation pit construction, this foundation pit was dug after the supporting structure had been established. The foundation pit was well protected. Therefore, the changing values of lateral wall displacement were minimal. Moreover, changes in the postmade steel support structure caused by axial force were reduced due to early installation of concrete support. In general, surface, building, and underground pipeline subsidence are used to show the safety status of the entire external environment of a foundation pit; structural subsidence and horizontal displacement of structures are used to show the safety status of the retaining structure. Therefore, these variables considerably influence the pit safety status. The results obtained by the model were consistent with those in practical applications, and this agreement validated the reliability of the model (Zhou and Peng 2016).
Validation of RF Risk Prediction Model
To verify the classification effect of the model, the remaining 40 sets of sample data were used for testing. Fig. 8 shows a comparison chart of the risk levels obtained by the model and the actual risk levels. As presented in the figure, the X-axis represents the test sample numbers, and the Y-axis represents the category labels. The test sample included 21 low-risk, 12 medium-risk, and 7 high-risk sets. All 40 sets of test samples were classified correctly. Table 5 shows the classification probability of the first 10 sets of test data.
Given that only 200 sets of sample data were collected, methods such as ANN and BN, which require thousands of sets of sample data, were unsuitable for this project. The SVM is a popular machine learning method that is extensively used in foundation pit risk research due to its advantage in dealing with small sample problems (Li et al. 2016; Zhou et al. 2017c). To validate the superiority of the RF algorithm in predicting subway foundation pit risk, an SVM risk prediction model was established for comparison with the same sample data as those used for the proposed model. All 16 predictive variables were normalized. Then, according to the fivefold cross-validation, the optimal penalty parameter C and kernel function parameter g of the SVM were 8 and 0.574, respectively, and the corresponding classification accuracy rate was 95%. Similarly, the remaining 40 sets of test data were input for testing. As shown in Fig. 9, the test sample included 19 low-risk, 16 medium-risk, and 5 high-risk sets. Classification accuracy reached 95%, indicating that two sets of data were misclassified.
Fig. 8. Test results of RF risk prediction model.
Table 5. Classification probability of the first 10 sets of test data

| Number | Low risk | Medium risk | High risk | Predicted label | Actual label |
|---|---|---|---|---|---|
| 1 | 0.9919 | 0.0081 | 0.0000 | 1 | 1 |
| 2 | 0.9893 | 0.0107 | 0.0000 | 1 | 1 |
| 3 | 0.0000 | 0.7078 | 0.2092 | 2 | 2 |
| 4 | 0.0000 | 0.1260 | 0.8740 | 3 | 3 |
| 5 | 0.9890 | 0.0110 | 0.0000 | 1 | 1 |
| 6 | 0.0027 | 0.2520 | 0.7453 | 3 | 3 |
| 7 | 0.0161 | 0.9356 | 0.0483 | 2 | 2 |
| 8 | 1.0000 | 0.0000 | 0.0000 | 1 | 1 |
| 9 | 0.4504 | 0.5013 | 0.0483 | 2 | 2 |
| 10 | 1.0000 | 0.0000 | 0.0000 | 1 | 1 |
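The relation between the class probabilities and the labels in Table 5 can be checked with a short Python sketch: the predicted label is taken as the class with the largest probability, and the gap between the two largest probabilities gives a rough indication of how decisive each prediction is. The helper names `predicted_label` and `margin` are illustrative assumptions; the probabilities are copied from Table 5.

```python
# Rows of Table 5: (low-risk, medium-risk, high-risk) probabilities and the actual label.
TABLE_5 = [
    ((0.9919, 0.0081, 0.0000), 1),
    ((0.9893, 0.0107, 0.0000), 1),
    ((0.0000, 0.7078, 0.2092), 2),
    ((0.0000, 0.1260, 0.8740), 3),
    ((0.9890, 0.0110, 0.0000), 1),
    ((0.0027, 0.2520, 0.7453), 3),
    ((0.0161, 0.9356, 0.0483), 2),
    ((1.0000, 0.0000, 0.0000), 1),
    ((0.4504, 0.5013, 0.0483), 2),
    ((1.0000, 0.0000, 0.0000), 1),
]


def predicted_label(probabilities):
    """Return the 1-based index of the largest class probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i]) + 1


def margin(probabilities):
    """Gap between the largest and second-largest class probability."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return top - second


predicted = [predicted_label(p) for p, _ in TABLE_5]
actual = [label for _, label in TABLE_5]
least_decisive = min(range(len(TABLE_5)), key=lambda i: margin(TABLE_5[i][0])) + 1

print(predicted == actual)
print(least_decisive)
```

All ten predicted labels agree with the actual labels. Sample 9 is classified correctly but has the smallest margin, with the low-risk and medium-risk probabilities close to each other, so borderline cases of this kind may deserve a closer look on site.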

Fig. 9. Test results of SVM risk prediction model.
An investigation showed that both sets misclassified by the SVM model featured abrupt changes in their changing rates, which led to their misclassification. Therefore, the RF algorithm exhibited better learning adaptability than the SVM algorithm.
Randomness of the selected training and testing data may influence experimental results. Thus, nine other experiments were conducted. Table 6 shows the classification accuracy of the 10 experiments for the two risk prediction models. For the RF risk prediction model, the average classification accuracy for the training data reached 99.31%, and that for the test data was 98%. For the SVM risk prediction model, the average accuracy for the training data was 96.38%, and that for the test data was 94.25%.
Table 6. Optimal classification accuracy of risk prediction models

| Number | RF training data (%) | RF test data (%) | SVM training data (%) | SVM test data (%) |
|---|---|---|---|---|
| 1 | 99.38 | 100 | 98.13 | 95 |
| 2 | 98.75 | 97.50 | 96.25 | 92.50 |
| 3 | 98.75 | 97.50 | 98.75 | 97.50 |
| 4 | 100 | 100 | 96.9 | 100 |
| 5 | 99.38 | 100 | 95.63 | 92.50 |
| 6 | 98.75 | 97.50 | 95.63 | 87.50 |
| 7 | 100 | 97.50 | 93.75 | 92.50 |
| 8 | 99.38 | 97.50 | 98.75 | 95 |
| 9 | 98.75 | 95 | 96.25 | 97.50 |
| 10 | 100 | 97.50 | 93.75 | 92.50 |
| Average | 99.31 | 98 | 96.38 | 94.25 |

The previous comparative analysis confirmed that the RF risk prediction model can achieve better classification accuracy than the SVM risk prediction model, and prediction results of the RF risk prediction model were relatively stable. In addition, standardization of predictive variables had a minimal effect on RF risk prediction model results but remarkably influenced the SVM risk prediction model results (Zhou et al. 2013a, b). The RF risk prediction model consumes less time than the SVM risk prediction model, which is conducive to rapid decision-making. The importance evaluation function of RF also helps determine the causes of safety risks.
Application of RF Risk Prediction Model
The established RF risk prediction model was applied in the subsequent construction of the foundation pit in this subway station. According to the early warning system for subway safety, the values of several monitoring points were abnormal, as denoted in Fig. 10. The accumulative settlement of point A02 was −53.7 mm, and the settlement rate reached −14.98 mm/day. The accumulative settlement of point F03 reached −44.97 mm, and the settlement rate was −3.93 mm/day.
The first calculation was conducted. During processing of monitoring data, a set of predictive variables A1–B8 = {−53.7; −14.98; −44.97; −3.93; −143.85; −131.5; −15.98; −1.64; −5.4; −0.24; 16.37; 4.45; 524.7; 85.23; 17.34; 30.32} was input into the model. The predicted result was 3, that is, high risk, a level at which foundation pit collapse can occur. Based on the previous importance analysis, surface and building accumulative settlements were the primary causes of risk. Onsite engineers immediately investigated the causes for abnormalities in surface and building settlement data and discovered a wall seepage in the northern head well. The seepage caused water loss and soil erosion, which led to surface and building settlement (Zhou et al. 2017b). The risks could be reduced only by solving the seepage problem. Fig. 11 shows the pit seepage in this subway station. To address this issue, construction workers immediately backfilled the seepage area with cement and clay. Water gushing subsequently stopped. Then, the second calculation was conducted. A new set of predictive variables A1–B8 = {−56.84; −7.73; −45.68; −2.89; −156.77; −8.95; −16.4; −1.4; −5.5; −0.23; 16; 5.23; 731.26; −142.47; 13.6; 4.43} was input into the model. The predicted result was still 3, indicating that secondary seepage possibly occurred because the initial seepage was only temporarily stopped. Construction workers then contacted personnel in the mixing station to backfill the seepage area with C15 early-strength concrete, as shown in Fig. 12. Approximately 15 m³ of concrete was used.
After backfilling, changing rates of the surface and building settlement gradually stabilized. The third calculation was conducted. Another set of predictive variables A1–B8 = {−57.4; −1.23; −45.77; −1.96; −158.74; 0.76; 16.98; −1.35; −5.4; −0.23; 16.5; 3.62; 467.09; 136.6; 13.5; 2.6} was input into the model. The predicted result was 2, indicating an improvement in the safety status of the foundation pit. Fig. 13 shows the changing values of monitoring points A02 and F03. Although the accumulated settlement values of monitoring points A02 and F03 continually increased, the settlement rate noticeably slowed down. Fig. 14 shows the probability distribution of the three predictions. As denoted in the figure, after the second calculation, the probability distribution of level 1 was higher than that of the first calculation, thereby indicating that the risk still intensified, and the first measure exerted no effect. After the third calculation, the probability distribution of level 1 decreased dramatically, thereby indicating that the second measure achieved remarkable results.
On the day after the second backfilling, the construction unit held an internal meeting. Several experts were invited to analyze the causes of seepage and assess the disposal measures. The following suggestions were proposed:
1. Increase monitoring frequency for monitoring points at the northern terminal of the foundation pit.
2. Prepare the necessary materials and equipment needed for emergencies.
3. Contact the Pipeline Property Rights unit to inspect and replace the damaged pipeline.
4. Direct the construction workers to grout and reinforce the diaphragm wall joint of the foundation pit.
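The three input vectors reported in this section can be paired with variable names to make the trend easier to read. The Python sketch below assumes that the values follow the column order of Table 4 (A1, B1, A2, B2, and so on); this ordering is an inference, supported by the fact that the first four values of the first vector match the settlements and rates quoted for points A02 and F03. The names `NAMES`, `CALCULATIONS`, and `as_named` are illustrative.

```python
# Assumed column order, following Table 4: A1, B1, A2, B2, ..., A8, B8.
NAMES = [f"{kind}{i}" for i in range(1, 9) for kind in ("A", "B")]

# Input vectors and predicted risk levels as reported in the text.
CALCULATIONS = {
    "first": ([-53.7, -14.98, -44.97, -3.93, -143.85, -131.5, -15.98, -1.64,
               -5.4, -0.24, 16.37, 4.45, 524.7, 85.23, 17.34, 30.32], 3),
    "second": ([-56.84, -7.73, -45.68, -2.89, -156.77, -8.95, -16.4, -1.4,
                -5.5, -0.23, 16, 5.23, 731.26, -142.47, 13.6, 4.43], 3),
    "third": ([-57.4, -1.23, -45.77, -1.96, -158.74, 0.76, 16.98, -1.35,
               -5.4, -0.23, 16.5, 3.62, 467.09, 136.6, 13.5, 2.6], 2),
}


def as_named(values):
    """Map a 16-value input vector to a {variable name: value} dictionary."""
    if len(values) != len(NAMES):
        raise ValueError("expected 16 predictive variables")
    return dict(zip(NAMES, values))


for step, (values, predicted_level) in CALCULATIONS.items():
    named = as_named(values)
    # Surface (A1, B1) and building (A2, B2) settlement: cumulative value and rate.
    print(step, predicted_level, named["A1"], named["B1"], named["A2"], named["B2"])
```

Read this way, the cumulative surface and building settlements (A1, A2) keep growing in magnitude across the three calculations, while the corresponding rates (B1, B2) shrink, which is consistent with the slowdown described for points A02 and F03.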

Fig. 10. Monitoring data from April 3 to May 2: (a) surface accumulative settlement; and (b) building accumulative settlement.
Fig. 11. Pit leakage. Fig. 12. Pit backfilling.

Fig. 13. Changing values of monitoring data on May 1: (a) surface cumulative settlement; and (b) building cumulative settlement.
Fig. 14. Probability distribution of three predictions.
Conclusion
Accurate prediction of the safety risks of deep foundation pits is a prerequisite for effectively ensuring the construction safety of subway foundation pit projects. Most of the existing methods, such as ANN, BN, and SVM, cannot achieve good prediction results for deep foundation pits in subway stations. On the basis of these observations, in this study, the strengths and weaknesses of the current risk prediction methods were summarized, and a risk prediction model based on the RF algorithm was proposed for deep foundation pits in subway stations. Then, the model was applied to an actual foundation pit project in a subway station of the Wuhan Metro. High prediction accuracy was achieved. The main achievements and conclusions of this work are summarized as follows:
1. The proposed RF risk prediction model is a powerful tool for risk classification of deep foundation pits. The RF does not rely on large amounts of sample data and is robust to unbalanced data. With these characteristics, the proposed model is suitable for foundation pit projects and was applied to a real deep foundation pit project of the Wuhan Metro. The comparison results between the existing SVM and RF risk prediction models were analyzed in detail. Results showed that the proposed model produced more accurate and stable prediction results than the existing SVM risk prediction model, further demonstrating the good application prospects of the proposed model.
2. The importance evaluation function of the proposed RF risk prediction model can help determine the causes of safety risks. In addition, several model application programs can be developed to guide the construction work. In this case study, using the RF risk prediction model, the most important predictive variable was the value of accumulated surface subsidence. By exploring the cause of outliers, signs of hazard sources were found promptly, and remedial measures were implemented to prevent major safety incidents.
In future research, real-time automatic data acquisition systems should be considered to achieve automatic collection and extraction of sample data. Moreover, the superiority of any single method cannot be generalized to all types of problems. This study showed that the RF has good application prospects for deep pit risk prediction.

Acknowledgments
The presented work has been supported by the National Science Foundation of China (NSFC) through Grant No. 71471072. The authors gratefully acknowledge the NSFC's support.

References
Arlot, S., and A. Celisse. 2010. "A survey of cross-validation procedures for model selection." Stat. Surv. 4: 40–79. https://doi.org/10.1214/09-SS054.
Breiman, L. 1996. "Bagging predictors." Mach. Learn. 24 (2): 123–140. https://doi.org/10.1007/BF00058655.
Breiman, L. 2001a. "Random forests." Mach. Learn. 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Breiman, L. 2001b. "Using iterated bagging to debias regressions." Mach. Learn. 45 (3): 261–277. https://doi.org/10.1023/A:1017934522171.
Breiman, L., J. Friedman, R. Olshen, and C. J. Stone. 1984. Classification and regression trees. Belmont, CA: Wadsworth International Group.
Catani, F., D. Lagomarsino, S. Segoni, and V. Tofani. 2013. "Landslide susceptibility estimation by random forests technique: Sensitivity and scaling issues." Nat. Hazards Earth Syst. Sci. 13 (11): 2815–2831. https://doi.org/10.5194/nhess-13-2815-2013.
Chen, C., S. Pei, and J. Jiao. 2003. "Land subsidence caused by groundwater exploitation in Suzhou city, China." Hydrogeol. J. 11 (2): 275–287. https://doi.org/10.1007/s10040-002-0225-5.
Chen, C., S. Zhang, and Y. Yu. 2004. "Prediction of retaining structure displacement in foundation pit." [In Chinese.] Chin. J. Rock Mech. Eng. 23 (12): 2065–2068.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap methods and their application. Cambridge, UK: Cambridge University Press.
Ding, L., and X. U. Jie. 2017. "A review of metro construction in China: Organization, market, cost, safety and schedule." Front. Eng. Manage. 4 (1): 4–19. https://doi.org/10.15302/J-FEM-2017015.
Ding, L., K. Li, Y. Zhou, and P. E. D. Love. 2017. "An IFC-inspection process model for infrastructure projects: Enabling real-time quality monitoring and control." Autom. Constr. 84: 96–110. https://doi.org/10.1016/j.autcon.2017.08.029.
Ding, L. Y., and C. Zhou. 2013. "Development of web-based system for safety risk early warning in urban metro construction." Autom. Constr. 34 (2): 45–55. https://doi.org/10.1016/j.autcon.2012.11.001.
Grinand, C., D. Arrouays, B. Laroche, and M. P. Martin. 2008. "Extrapolating regional soil landscapes from an existing soil map: Sampling intensity, validation procedures, and integration of spatial context." Geoderma 143 (1): 180–190. https://doi.org/10.1016/j.geoderma.2007.11.004.
Han, T., D. Jiang, Q. Zhao, L. Wang, and K. Yin. 2017. "Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery." Trans. Inst. Meas. Control 40 (8): 2681–2693. https://doi.org/10.1177/0142331217708242.
Hasofer, A. M., and J. Qu. 2002. "Response surface modelling of Monte Carlo fire data." Fire Saf. J. 37 (8): 772–784. https://doi.org/10.1016/S0379-7112(02)00028-0.
He, Y. G., X. Liu, and S. Chen. 2014. "Correlation degree analysis of surface settlement influence factors around the pit." In Proc., Int. Conf. on Mechanics and Civil Engineering. Paris: Atlantis Press.
Hong, H., P. Tsangaratos, I. Ilia, W. Chen, and C. Xu. 2017. "Comparing the performance of a logistic regression and a random forest model in landslide susceptibility assessments, the case of Wuyaun area, China." In Workshop on World Landslide Forum, 1043–1050. Cham, Switzerland: Springer.
Hu, T., J. Wang, and L. Zhang. 2015. "Prediction of hard rock TBM penetration rate using random forests." In Proc., Control and Decision Conf., 3716–3720. New York: IEEE.
Iverson, L. R., M. Prasadam, S. N. Matthews, and M. Peters. 2008. "Estimating potential habitat for 134 eastern US tree species under six climate scenarios." For. Ecol. Manage. 254 (3): 390–406. https://doi.org/10.1016/j.foreco.2007.07.023.
Jan, J. C., S. L. Hung, S. Y. Chi, and J. C. Chern. 2002. "Neural network forecast model in deep excavation." J. Comput. Civ. Eng. 16 (1): 59–65. https://doi.org/10.1061/(ASCE)0887-3801(2002)16:1(59).
Kecman, V. 2001. Learning and soft computing: Support vector machines, neural networks, and fuzzy logic models. Cambridge, MA: MIT Press.
Kohavi, R. 1995. "A study of cross-validation and bootstrap for accuracy estimation and model selection." In Proc., Int. Joint Conf. on Artificial Intelligence, 1137–1143. Burlington, MA: Morgan Kaufmann.
Kuhn, M., and K. Johnson. 2013. Applied predictive modeling. New York: Springer.
Lee, S., and D. W. Halpin. 2003. "Predictive tool for estimating accident risk." J. Constr. Eng. Manage. 129 (4): 431–436. https://doi.org/10.1061/(ASCE)0733-9364(2003)129:4(431).
Li, H. 2000. "Grey forecast and precaution system for foundation pit deformation." [In Chinese.] Site Invest. Sci. Technol. 6: 40–44. https://doi.org/10.3969/j.issn.1001-3946.2000.06.009.
Li, W. D., M. H. Wu, and N. Lin. 2016. "Horizontal displacement prediction research of deep foundation pit based on the least square support vector machine." In Proc., 3rd Int. Conf. on Wireless Communication and Sensor Networks. Paris: Atlantis Press.
Loganathan, N., and H. G. Poulos. 1998. "Analytical prediction for tunneling-induced ground movements in clays." J. Geotech. Geoenviron. Eng. 124 (9): 846–856. https://doi.org/10.1061/(ASCE)1090-0241(1998)124:9(846).
Lu, Z. G., and J. D. Zhang. 2013. "Spatial and temporal analysis of pit deformation monitoring based on GIS." Appl. Mech. Mater. 239–240: 536–543. https://doi.org/10.4028/www.scientific.net/AMM.239-240.536.
Ma, F., Y. Zheng, and F. Yang. 2008. "Research on deformation prediction method of soft soil deep foundation pit." J. Coal Sci. Eng. 14 (4): 637–639. https://doi.org/10.1007/s12404-008-0430-5.
Mair, R. J., and R. N. Taylor. 1997. "Theme lecture: Bored tunneling in the urban environment." In Proc., 14th Int. Conf. on Soil Mechanics and Foundation Engineering. Hamburg, Germany.
Martens, D., M. D. Backer, R. Haesen, J. Vanthienen, M. Snoeck, and B. Baesens. 2007. "Classification with ant colony optimization." Trans. Evolut. Comput. 11 (5): 651–665. https://doi.org/10.1109/TEVC.2006.890229.
MHURDPRC (Ministry of Housing and Urban-Rural Development of the People's Republic of China). 2009. Technical code of urban rail transit. Beijing: China Planning Press.
Rodriguez-Galiano, V. F., B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez. 2012. "An assessment of the effectiveness of a random forest classifier for land-cover classification." J. Photogramm. Remote Sens. 67 (1): 93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002.

Rodriguez-Galiano, V. F., M. P. Mendes, M. J. Garcia-Soldado, M. Chica-Olmo, and L. Ribeiro. 2014. "Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: A case study in an agricultural setting." Sci. Total Environ. 476–477 (4): 189. https://doi.org/10.1016/j.scitotenv.2014.01.001.
Samui, P. 2008. "Support vector machine applied to settlement of shallow foundations on cohesionless soils." Comput. Geotech. 35 (3): 419–427. https://doi.org/10.1016/j.compgeo.2007.06.014.
Su, G., K. Zhang, H. Zhang, and Y. Zhang. 2009. "Deformation prediction of foundation pit using Gaussian process machine learning." In Proc., 2009 Asia-Pacific Conf. on Computational Intelligence and Industrial Applications, 99–102. New York: IEEE.
Sun, F. X. 2010. "SVM in predicting the deformation of deep foundation pit in soft soil area." In Proc., 2010 Int. Conf. on Machine Vision and Human-Machine Interface, 761–763. New York: IEEE.
Sun, H. T., and X. Wu. 1998. "Study on neural networks method of deformation prediction of foundation pit based on artificial." [In Chinese.] Rock Soil Mech. 4: 11.
Whittle, A. J., Y. M. A Hashash, and R. V. Whitman. 1993. "Analysis of deep excavation in Boston." J. Geotech. Eng. 119 (1): 69–90. https://doi.org/10.1061/(ASCE)0733-9410(1993)119:1(69).
Wu, X., H. Liu, L. Zhang, M. J. Skibniewski, Q. Deng, and J. Teng. 2015. "A dynamic Bayesian network based approach to safety decision support in tunnel construction." Reliab. Eng. Syst. Saf. 134: 157–168. https://doi.org/10.1016/j.ress.2014.10.021.
Yoo, C., and D. Lee. 2008. "Deep excavation-induced ground surface movement characteristics – A numerical investigation." Comput. Geotech. 35 (2): 231–252. https://doi.org/10.1016/j.compgeo.2007.05.002.
Zhang, L., X. Wu, M. J. Skibniewski, J. Zhong, and Y. Lu. 2014. "Bayesian-network-based safety risk analysis in construction projects." Reliab. Eng. Syst. Saf. 131: 29–39. https://doi.org/10.1016/j.ress.2014.06.006.
Zhou, H. B., and H. Zhang. 2011. "Risk assessment methodology for a deep foundation pit construction project in Shanghai, China." J. Constr. Eng. Manage. 137 (12): 1185–1194. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000391.
Zhou, J., X. Li, and H. S. Mitri. 2016a. "Classification of rockburst in underground projects: Comparison of ten supervised learning methods." J. Comput. Civ. Eng. 30 (5): 1–19. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000553.
Zhou, J., X. Li, S. Wang, and W. Wei. 2013a. "Identification of large-scale goaf instability in underground mine using particle swarm optimization and support vector machine." Int. J. Mining Sci. Technol. 23 (5): 701–707. https://doi.org/10.1016/j.ijmst.2013.08.014.
Zhou, J., X. Z. Shi, K. Du, X. Qiu, X. Li, and H. S. Mitri. 2016b. "Feasibility of random-forest approach for prediction of ground settlements induced by the construction of a shield-driven tunnel." Int. J. Geomech. 17 (6): 04016129. https://doi.org/10.1061/(ASCE)GM.1943-5622.0000817.
Zhou, J., X. Z. Shi, K. Du, X. Y. Qiu, X. B. Li, and H. S. Mitri. 2016c. "Development of the ground movements due to shield tunneling prediction model using random forests." In Proc., 4th Geo-China Int. Conf., 108–115. Reston, VA: ASCE.
Zhou, Y., L. Y. Ding, and L. J. Chen. 2013b. "Application of 4D visualization technology for safety management in metro construction." Autom. Constr. 34 (13): 25–36. https://doi.org/10.1016/j.autcon.2012.10.011.
Zhou, Y., L. Y. Ding, Y. Rao, H. Luo, B. Medjdoub, and H. Zhong. 2017a. "Formulating project-level building information modeling evaluation framework from the perspectives of organizations: A review." Autom. Constr. 81: 44–55. https://doi.org/10.1016/j.autcon.2017.05.004.
Zhou, Y., H. Luo, and Y. Yang. 2017b. "Implementation of augmented reality for segment displacement inspection during tunneling construction." Autom. Constr. 82: 112–121. https://doi.org/10.1016/j.autcon.2017.02.007.
Zhou, Y., and Y. Peng. 2016. "A case history of deep excavation above an operational metro subway." [In Chinese.] Soil Eng. Found. 30 (5): 541–543+565.
Zhou, Y., W. Su, L. Ding, H. Luo, and P. E. Love. 2017c. "Predicting safety risks in deep foundation pits in subway infrastructure projects: A support vector machine approach." J. Comput. Civ. Eng. 31 (5): 04017052. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000700.