
Natural Hazards (2020) 102:851–871

https://doi.org/10.1007/s11069-020-03927-8

ORIGINAL PAPER

Multi‑geohazards susceptibility mapping based on machine


learning—a case study in Jiuzhaigou, China

Juan Cao1 · Zhao Zhang1 · Jie Du2 · Liangliang Zhang1 · Yun Song3 · Geng Sun4

Received: 29 January 2020 / Accepted: 12 April 2020 / Published online: 5 May 2020
© Springer Nature B.V. 2020

Abstract
Jiuzhaigou, located in the transitional area between the Qinghai–Tibet Plateau and the
Sichuan Basin, is highly prone to geological hazards (e.g., rock fall, landslide, and debris
flow). High-performance hazard prediction models are therefore urgently required
to prevent related hazards and manage potential emergencies. Current research mainly
focuses on the susceptibility of a single hazard but ignores that different types of geological
hazards might occur simultaneously in a complex environment. Here, we first built a
multi-geohazard inventory from 2000 to 2015 based on a geographical information system and
satellite data in Google Earth, and then chose twelve conditioning factors and three
machine learning methods (random forest, support vector machine, and extreme gradient
boosting (XGBoost)) to generate rock fall, landslide, and debris flow susceptibility
maps. The results show that debris flow models presented the best prediction capabilities
(area under the receiver operating characteristic curve, AUC, of 0.95), followed by rock fall
(AUC 0.94) and landslide (AUC 0.85). Additionally, XGBoost outperformed the other two
methods with the highest AUC of 0.93. All three methods achieved AUC values larger than
0.84, suggesting that these models perform fairly well in assessing geological hazard
susceptibility. Finally, an evolution index was constructed based on a joint probability of these
three hazard models to predict the evolution tendency of 35 unstable slopes in Jiuzhaigou.
The results show that these unstable slopes are likely to evolve into debris flows with a
probability of 46%, followed by landslides (43%) and rock falls (29%). Higher susceptibility
areas for geohazards were mainly located in the southeast and middle of Jiuzhaigou,
implying that geohazard prevention and mitigation measures should be taken there in the near
future.

Keywords  Jiuzhaigou · Susceptibility · Machine learning · Geological hazards · Evolution tendency

Electronic supplementary material  The online version of this article (https://doi.org/10.1007/s11069-020-03927-8)
contains supplementary material, which is available to authorized users.

* Zhao Zhang
Zhangzhao@mail.bnu.edu.cn
Extended author information available on the last page of the article


1 Introduction

The occurrence of a geological hazard is a complex process, and many natural (e.g.,
earthquake and tsunami) and anthropogenic factors (e.g., deforestation and urbanization)
can trigger such hazards, including ground subsidence, landslides, rock falls, and
debris flows (Ge et al. 2013; Gutiérrez et al. 2014; Tehrany et al. 2015). Geological
hazards pose major threats to human lives and property and cause heavy
damage to the environment and resources (Corominas et al. 2013; Huang and Zhao
2018). The economic losses from such geological hazards amount to about 20
billion Yuan (CNY) every year in China (Hong et al. 2016). The geological hazards of
China are characterized by various types, wide distribution, and high frequency, especially
in its western mountainous areas (Guha-Sapir et al. 2012). Therefore, it is of great
importance for hazard relief and policy-making entities to obtain high-resolution
susceptibility maps, especially for typical hazards such as landslide, rock fall, and
debris flow. Moreover, for potential threats from unstable slopes, we should know their
evolution tendencies and take timely preventive measures. In this way, we can better
understand the spatial probability of geological hazard occurrence and monitor,
forecast, and warn of hazards more accurately. Consequently, risk assessment, hazard
prevention, and emergency management can be conducted in a timely manner.
Some traditional methods (e.g., heuristic and deterministic models) have been criticized
because of their excessive dependence on either the subjective judgments of
experts (Hong et al. 2016) or the large amounts of data required (Pourghasemi et al.
2012). Machine learning, however, can handle various types of data (e.g., ratio, interval,
nominal, or ordinal data) and is capable of identifying complex, nonlinear relationships
(Ge et al. 2013; Youssef et al. 2015; Zou et al. 2013). Therefore, machine learning
has been used widely, particularly for problems in which the characteristics of the
underlying processes are difficult to describe using physical equations (Cao et al. 2019).
Consequently, machine learning has become a potential alternative approach to study
geological hazard susceptibility. Such methods have been widely applied to hazard analysis
by learning the relationship between a certain geological hazard and conditioning
factors without first assuming a structural model (Dickson and Perry 2016;
Huang and Zhao 2018). There has been increasing interest in using machine learning
techniques to study the susceptibility of geological hazards, e.g., the boosted regression tree
(BRT), classification and regression tree (CART), generalized linear model (GLM), random
forest (RF), and support vector machine (SVM) methods. For example, SVM was
used to map landslide susceptibility in China (Huang and Zhao 2018; Yao et al. 2008).
Moreover, different machine learning methods have been compared and evaluated in
different regions of the world (Marjanović et al. 2011; Mokhtari and Abedian 2019;
Youssef et al. 2015; Zhu et al. 2017), such as in Saudi Arabia (Youssef et al. 2015),
Sichuan, China (Zhu et al. 2017), Iran (Pourghasemi and Rahmati 2018), and Western
Serbia (Marjanović et al. 2011). Nevertheless, these previous studies have focused
only on a single type of hazard and ignored that different types of geological hazards
might occur simultaneously in a complex environment. Thus, multi-hazard studies
should be systematically conducted to support scientific decisions. Additionally, very few
studies have addressed the evolution tendency of potential hazards (e.g., unstable
slopes). More urgently, decision-makers should first know the evolution tendency of
potential hazards and then implement the corresponding prevention and reduction
measures in advance (Huang and Zhao 2018).


After the 2008 Ms 8.0 Wenchuan earthquake, many unstable slopes have threatened residents
in the western mountainous areas of Sichuan Province. Jiuzhaigou has been one such hotspot
because it is affected by frequent geological hazards due to climatic, seismic,
and anthropogenic factors. For example, on August 8, 2017, an Ms 7.0
earthquake caused a great number of geological hazards in Jiuzhaigou, and researchers
have conducted a series of studies in the area (Fabbri et al. 2003; Wu et al. 2018). In this
study, based on multisource datasets together with remote sensing (RS) and geographical
information system (GIS) techniques, three typical machine learning methods (SVM, RF,
and XGBoost) were employed to assess multi-hazard (landslide, rock fall, and
debris flow) susceptibility and predict the evolution tendency of unstable slopes in
Jiuzhaigou. The main objectives were: (1) to develop RF, SVM, and XGBoost models to map the
susceptibility of the three main geological hazards, (2) to validate these hazard susceptibility
maps based on receiver operating characteristic (ROC) curves and predictive accuracy
(ACC), and identify the main conditioning factors, and (3) to predict the evolution
tendency of unstable slopes in Jiuzhaigou. Our research may help
Chinese decision-makers monitor and warn of geological hazards.

2 Materials and method

2.1 Study area

Jiuzhaigou is located in the Aba Tibetan and Qiang Autonomous Prefecture in northern
Sichuan Province, China, between 32°54′–33°19′ N and 103°46′–104°4′ E (Qiao et al.
2016). The topography is high in the northwest and low in the southeast. Tableland and
alpine valleys make up the main landforms (Fig. 1). In addition, Jiuzhaigou has a
continental plateau climate with an annual average temperature of 12.7 °C and total annual
rainfall of 550 mm. Of this rainfall, 80% occurs between May and October, triggering a
series of rainstorms and many geological hazards.
Jiuzhaigou is one of the regions in China most frequently struck by geodisasters because
of its complex geological environments, well-developed folds and faults, extensive crustal
uplift, poor stability of rock slopes, and frequent rainstorms. Furthermore, Jiuzhaigou,
recognized as a World Heritage Site in 1992 and a World Biosphere Reserve in 1997, attracts
many tourists each year (Li et al. 2006). Therefore, it is important to analyze the
spatiotemporal distribution of geological hazards and to assess multi-hazard susceptibility
in order to identify and delineate hazard-prone areas.

2.2 Geological hazard inventory

An accurate and detailed hazard inventory map was prepared. In this study, geological
hazard information (i.e., occurrence location, size, type, and occurrence time) was compiled
from official records (BLRS, the Bureau of Land and Resources of Sichuan) from
2000 to 2015. To delineate the hazard areas, satellite images
were interpreted in Google Earth Pro 7.1, covering geological hazards (landslide, rock fall,
and debris flow) and potential hazards (unstable slope). The contour and texture features of
common geohazards in satellite images are shown in Fig. 2.


Fig. 1  Location of the study area in China’s Sichuan Province (a–b); the distribution of geological
hazards in towns (c); the locations of the geological hazards in Jiuzhaigou (d)

2.3 The spatiotemporal features of hazards

To further investigate the features of hazards in Jiuzhaigou, the spatial patterns of
hazards from 2000 to 2015 were first analyzed. An inverted “Y” shape is shown in
Fig. 1d, indicating that the hazard sites were mainly located along rivers, roads, valleys,
and settlements. A total of 393 hazards were recorded, including 182 debris flows
(46.3%), 109 rock falls (27.7%), 67 landslides (17.1%), and 35 unstable slopes (8.9%).
Zhangzha Town suffered the most frequent geological hazards, with 53 rock falls,
45 debris flows, 29 landslides, and 11 unstable slopes, which together accounted for
35.03% of all hazards in the study area (Fig. 1b, d). The five towns with more than 20
hazards were Zhangzha (138), Nanping (44), Heihe (34), Yuwa (29), and Baihe (23),
which were mainly distributed in the central and northeastern areas of Jiuzhaigou.
Regarding interannual patterns of geological hazards, occurrences were
concentrated in the period 2008–2014, with a peak in 2013 (55.98%, Fig. 3b). The
hazard scale behaved similarly to the number, with a maximum of 2.94 × 10⁷ m³ in
2013, and most hazards were small in size (65.7%). For the intra-annual patterns
(Fig. 3a), geological hazards occurred more frequently during the

Fig. 2  Interpretation of geological hazards: landslide (a), rock fall (b), debris flow (c), and unstable slope (d)

Fig. 3  Temporal patterns of geological hazards, annual occurrence number (bar graph) and total scale
(blue line; unit: 10⁴ m³) (a), and seasonal changes (b) of geological hazards from 2000 to 2015 in Jiuzhaigou


period from March to July (367, 93.4%), especially in June (255, 64.8%), followed by
May (50, 12.7%) and March (23, 5.85%).

2.4 Conditioning factors

Geological hazards are caused by many potential factors (e.g., geology, topography, and
earthquake tendency) and triggering factors (e.g., earthquakes, rainfall, stream scouring,
and human activities) (Chuang and Shiu 2018; van Westen et al. 2008; Yao et al. 2008).
Considering the causes of geological hazards and the geomorphologic characteristics of
Jiuzhaigou, we selected twelve conditioning factors (i.e., altitude, aspect, slope, land use,
fault density, distance to rivers, distance to roads, normalized difference vegetation index
(NDVI), mean annual rainfall, distance to faults, distance to epicenters, and lithology)
related to triggering mechanisms (e.g., rainfall, earthquake, and human activities) and
potential variables (e.g., topographic structure, vegetation cover, and river systems) (Fig. 4).
All the data were converted to 90 × 90 m grids in a unified projection (UTM Zone
48, WGS84 datum). Pairwise correlation coefficients between the conditioning
factors were calculated to check for collinearity (Fig. S2). Kornejady et al. (2015, 2017)
suggested an absolute correlation coefficient of 0.7 as the threshold for judging
collinearity between two factors. An absolute value larger than 0.7 will bias the hazard
susceptibility map; hence, highly correlated factors
should be removed (Kornejady et al. 2015; Kornejady et al. 2017). No significant
correlations were found among the 12 conditioning factors, suggesting that all factors can
reasonably be used to develop the models. In addition, the distributions of all geohazards
with respect to the 12 conditioning factors are illustrated in Fig. S1 (see supplementary
material). The results show clearly nonlinear relationships between the conditioning factors
and geohazards. These factors are described in detail below.
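The collinearity screen described above can be sketched as follows. This is a minimal illustration, assuming the Pearson coefficient (the text does not name the coefficient used) and hypothetical factor samples; the function names `pearson` and `collinear_pairs` are illustrative and do not come from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(factors, threshold=0.7):
    """Return factor pairs whose |r| exceeds the threshold (0.7 in the text)."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(factors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical pixel samples, for illustration only (not the study's data).
factors = {
    "altitude": [2100, 2500, 3100, 3600, 4000],
    "slope": [35, 12, 28, 40, 18],
    "rainfall": [640, 600, 560, 520, 470],
}
flagged = collinear_pairs(factors)
```

In this made-up sample only the altitude and rainfall pair would be flagged; in the study itself no pair of the 12 factors was reported as significantly correlated.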

2.4.1 Triggering factors

Rainfall (mean annual rainfall), earthquakes (distance to epicenters), and certain engineering
measures related to human activities (distance to roads, land use) are the main causes
of geological hazards in Jiuzhaigou. Rainfall data were derived from China’s meteorological
data sharing service system (http://data.cma.cn). We calculated the mean annual rainfall
from 2000 to 2015 and used the inverse distance weighted (IDW) method to interpolate
rainfall per pixel (Fig. 4i). The locations of historical earthquakes (Ms ≥ 3) since 1970 were
obtained from the China Earthquake Networks Center (http://news.ceic.ac.cn). Information
on land use in Jiuzhaigou in 2015 was derived from the Resource and Environment Data
Cloud Platform (http://www.resdc.cn/). The road information was extracted from a
topographic map obtained from the China Geological Survey (http://gsd.cgs.cn/download.asp).
The Euclidean Distance tool in ArcGIS 10.3 was used to produce the maps of distance to
epicenters (Fig. 4j) and distance to roads (Fig. 4g).
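As a rough illustration of the IDW interpolation mentioned above, the sketch below estimates rainfall at one pixel from nearby stations. It assumes a distance exponent of 2 (a common default; the text does not state the value used) and hypothetical station coordinates; `idw` is an illustrative name, not the GIS tool the authors ran.

```python
from math import hypot


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (x, y, value) tuples. power=2 is an assumed default;
    the text does not state the exponent used in the study.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the pixel coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations (x, y in metres; mean annual rainfall in mm).
stations = [(0.0, 0.0, 500.0), (1000.0, 0.0, 600.0)]
midpoint = idw(stations, 500.0, 0.0)   # equidistant from both stations
at_station = idw(stations, 0.0, 0.0)   # exactly on the first station
```

A pixel equidistant from two stations receives their plain average, and a pixel on a station takes that station's value, which is the behaviour expected of IDW.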

2.4.2 Potential factors

The geomorphology (altitude, slope, and aspect), geology (lithology, distance to faults, and
fault density), vegetation cover (mean annual NDVI), and river systems (distance to riv-
ers) are the basic internal factors of geological hazards. A digital elevation model (DEM)
with 90 × 90  m resolution was derived from the Consultative Group for International


Fig. 4  Conditioning factors: a altitude, b slope, c aspect, d land use, e fault density, f distance to rivers,
g distance to roads, h NDVI, i rainfall, j distance to epicenters, k distance to faults, l lithology. Lithology
codes: Carboniferous (C), Devonian (D), Paleogene (E), Permian (P), Quaternary (Q), Triassic (T),
Ediacaran (Z), Cambrian (∈)


Agricultural Research-Consortium for Spatial Information (CGIAR-CSI, http://srtm.csi.cgiar.org).
Slope and aspect were extracted from the DEM using ArcGIS 10.3. The fault
locations and lithological units were extracted from a geological map. The lithological
maps were divided based on chronostratigraphic units (Fig. 4l), including units of eight
ages: (1) Carboniferous (C), (2) Devonian (D), (3) Paleogene (E), (4) Permian (P), (5)
Quaternary (Q), (6) Triassic (T), (7) Ediacaran (Z), (8) Cambrian (∈). The rivers were also
extracted from a topographic map. The Euclidean Distance tool was used to produce the
maps of distance to rivers (Fig. 4f) and distance to faults (Fig. 4k). The Kernel Density
tool was used to produce the fault density map (Fig. 4e). The NDVI values for 2001 to
2015 were derived from the Landsat 8-Day NDVI composites product of the Google Earth
Engine (GEE) (https://earthengine.google.com/). In addition, the maximum value composite
(MVC) method (Corominas et al. 2013; Kamp et al. 2008; Youssef et al. 2015) was used
to extract annual NDVI for 2000–2015 and calculate the mean annual NDVI (Fig. 4h).
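The MVC step can be pictured per pixel as "take the largest NDVI of each year, then average those yearly maxima". The sketch below follows that reading under stated assumptions: the NDVI values are hypothetical and `mean_annual_ndvi` is an illustrative helper, not part of Google Earth Engine.

```python
def mean_annual_ndvi(composites_by_year):
    """Maximum value composite per year, then the mean across years.

    composites_by_year maps a year to the list of NDVI values observed at
    one pixel during that year (e.g., one value per 8-day composite).
    """
    annual_max = {year: max(values) for year, values in composites_by_year.items()}
    mean_ndvi = sum(annual_max.values()) / len(annual_max)
    return annual_max, mean_ndvi


# Hypothetical NDVI series for a single pixel (not the study's data).
pixel_series = {
    2013: [0.21, 0.48, 0.71, 0.55],
    2014: [0.19, 0.52, 0.69, 0.50],
    2015: [0.25, 0.45, 0.76, 0.58],
}
annual_max, mean_ndvi = mean_annual_ndvi(pixel_series)
```

Taking the yearly maximum suppresses composites degraded by cloud or snow, which tend to lower NDVI, before the multi-year mean is formed.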

2.5 Methods

The analysis was conducted in five steps: (1) preparing landslide, rock fall, and debris flow
inventories, and conditioning factors; (2) analyzing the correlation and rescaling the
conditioning factors, and determining the relationships between the hazards and the factors;
(3) optimizing the crucial parameters and constructing RF, SVM, and XGBoost models,
and producing three geological hazard susceptibility maps; (4) evaluating and comparing
the models using ROC curves and ACC, and sorting the factors by their importance; (5)
finally, predicting the evolution tendencies of unstable slopes in Jiuzhaigou (Fig. 5).
An equal number of non-hazards were randomly selected as hazard-free sites. Then all
geological hazards and non-hazards were randomly partitioned into
training data (70%) and validation data (30%) (Cao et al. 2019; Trigila et al. 2015). To
reduce variability, tenfold cross-validation with the GridSearchCV function was applied to
optimize the hyper-parameters of each method from empirical candidates, using only the
training data (Chen et al. 2017).
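The sampling scheme above (equal numbers of hazard and non-hazard sites, a 70/30 split, and tenfold cross-validation on the training part) can be sketched with the standard library alone. This is a simplified stand-in for the scikit-learn workflow the authors describe, not their code: the candidate pixel IDs, the seed, and the helper names `balanced_split` and `k_fold_indices` are assumptions, and the split here is purely random rather than stratified.

```python
import random


def balanced_split(hazard_ids, candidate_ids, train_fraction=0.7, seed=0):
    """Pair each hazard with a random non-hazard, then split 70/30."""
    rng = random.Random(seed)
    non_hazards = rng.sample(candidate_ids, len(hazard_ids))
    samples = [(i, 1) for i in hazard_ids] + [(i, 0) for i in non_hazards]
    rng.shuffle(samples)
    cut = round(train_fraction * len(samples))
    return samples[:cut], samples[cut:]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    for fold in range(k):
        val_idx = indices[fold::k]
        held_out = set(val_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, val_idx


# 67 landslides (as in the inventory) and hypothetical candidate pixel IDs.
hazards = list(range(67))
candidates = list(range(1000, 6000))
train, validation = balanced_split(hazards, candidates)
folds = list(k_fold_indices(len(train), k=10))
```

Each hyper-parameter candidate would then be scored on the ten validation folds of the training set only, so the 30% validation data stays untouched until the final assessment.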

2.5.1 SVM

Support vector machine (SVM) is a supervised learning model based on the principle of
structural risk minimization (SRM) (Pourghasemi et al. 2012; Vapnik 2000). A kernel
function can be applied efficiently for both linear and nonlinear classification (Boser
2008; Cortes and Vapnik 1995) because it transforms the training data into a
high-dimensional feature space (Huang and Zhao 2018). The choice of kernel function, such
as sigmoid, polynomial, linear, or radial basis function (RBF), affects the prediction
accuracy. In this paper, we used the two-class (0 or 1) SVM with RBF, the kernel
most commonly used for landslide susceptibility mapping, to build the SVM model
(Marjanović et al. 2011; Pourghasemi and Rahmati 2018). The algorithm generates a
separating hyper-plane between the points of two distinct classes.
The classification of hazards from conditioning factors is a nonlinear problem, so the
SVM approach transforms the nonlinear case into a linear one by using the kernel
function. Finally, we followed previous studies (Pourghasemi and Rahmati 2018; Pradhan
2013) to determine three key parameters of SVM: the penalty factor C and, for the kernel
function, the regularization parameter (ϑ) and the kernel width (γ). The grid search
cross-validation (GridSearchCV) function was used to optimize crucial parameters (Vapnik 1999) and


Fig. 5  Analysis flowchart of methodology


“SVM” (Pedregosa et al. 2011) in scikit-learn for Python 3.5 was used to assess
multi-hazard susceptibility.
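To make the role of the kernel width γ concrete, the sketch below evaluates the standard RBF kernel, K(x, y) = exp(−γ‖x − y‖²), for two pairs of feature vectors. The vectors and the value of γ are hypothetical, and `rbf_kernel` is an illustrative helper rather than a scikit-learn call.

```python
from math import exp


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


# Hypothetical rescaled factor vectors and an assumed gamma value.
same = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=0.5)
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

Identical points give a similarity of 1, and the similarity decays toward 0 as points move apart; a larger γ makes that decay faster, which is why γ is tuned together with the penalty factor C.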

2.5.2 RF

Random forest (RF) is an ensemble-learning method (Breiman 2001). For classification
problems, RF operates by constructing a multitude of decision trees and outputs the
result by majority vote (Ho 1998). Each tree is built on a bootstrap sample
drawn from the observations and on a subset of the features, and the final
classification result is voted among all the trees in the “forest.” The misclassification
error (%) over all out-of-bag elements is called the out-of-bag (OOB) error and is used to
estimate the generalization error rate and assess the importance of variables. Random
feature selection and bagging are two powerful ideas in a random forest (Breiman 2001).
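The two mechanics named above, bootstrap sampling (which leaves some observations out of bag) and majority voting, can be sketched in a few lines. This is an illustration only: the seed, the sample size, and the votes are made up, and no decision trees are actually trained here.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n indices with replacement; return in-bag and out-of-bag sets."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
in_bag, out_of_bag = bootstrap_sample(100, rng)

# Hypothetical votes from five trees for one pixel (1 = hazard, 0 = non-hazard).
label = majority_vote([1, 0, 1, 1, 0])
```

The observations in `out_of_bag` were never seen by the tree built on `in_bag`, so predicting them gives the OOB error without a separate hold-out set.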
Similarly, the GridSearchCV function was used to optimize crucial parameters (the number
of trees and predictive variables). Moreover, “RandomForestClassifier” (Pedregosa et al.
2011) in scikit-learn was used to assess multi-hazard susceptibility. The RF method has
been widely and successfully used to generate landslide susceptibility maps all around the
world (Chen et al. 2017; Hong et al. 2016; Youssef et al. 2015) because of its high
performance in landslide susceptibility assessment (Chen et al. 2017; Pourghasemi and Rahmati
2018; Pradhan 2013).

2.5.3 XGBoost

Extreme Gradient Boosting (XGBoost) is a combination of a scalable boosting algorithm
and ensemble machine learning techniques (Chen and Guestrin 2016). The “regularized
boosting” technique has been regarded as a powerful idea to improve model accuracy
because of its role in reducing overfitting. XGBoost is used for supervised learning
problems (classification and regression). XGBoost also allows users to run a cross-validation at
each iteration of the boosting process, which helps to find the optimum number
of boosting iterations in a single run. In addition, the main advantages of XGBoost are
its scalability and fast computation. As the winner of an increasing number of Kaggle
competitions, XGBoost has become a strong all-around algorithm with excellent
performance (Chen and Guestrin 2016). However, to date, few studies have used XGBoost
to map geological hazard susceptibility. We therefore applied it here, using the GridSearchCV
function in scikit-learn for Python 3.5 to optimize the main hyper-parameters (i.e., learning
rate, tree depth, and subsample).

2.6 Model assessment

The ROC curve and the area under the curve (AUC) have been widely used to characterize the
performance of multi-hazard susceptibility models and are the most common evaluation
indices in this field (Chen et al. 2017; Hong et al. 2016; Kim et al. 2018; Mokhtari and
Abedian 2019; Youssef et al. 2015). The AUC value of a ROC curve indicates the goodness
of the model predictions. For a useful model, the AUC lies between 0.5 (no better than
random) and 1 (perfect classification); a higher AUC value indicates better predictive
ability. AUC values less than 0.7 indicate poor predictive
capability, with higher values indicating moderate (0.7–0.8), good (0.8–0.9), and
excellent (0.9–1) predictive capability (Swets 1988). Furthermore, the ACC is also a useful
tool to assess the predictive capability of these models. ACC is the proportion of hazard

and non-hazard pixels that the models correctly classified. In this study, ACC together with
AUC was used to evaluate the performance of the three hazard models.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$
where TP (true positive) and TN (true negative) are the numbers of pixels that are correctly
classified, and FP (false positive) and FN (false negative) are the numbers of pixels
incorrectly classified (Hong et al. 2016; Wang et al. 2015).
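Both metrics can be computed directly from model outputs. The sketch below implements Eq. (1) and computes the AUC through its rank interpretation (the probability that a randomly chosen hazard pixel scores higher than a randomly chosen non-hazard pixel, with ties counted as one half). The scores, labels, and the 0.5 threshold are hypothetical, and the function names are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (1): share of correctly classified pixels."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(scores, labels):
    """AUC as the probability that a hazard pixel outranks a non-hazard pixel.

    Ties count as half, which equals the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation scores and labels (1 = hazard, 0 = non-hazard).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
acc = accuracy(tp=3, tn=2, fp=1, fn=0)  # counts at a 0.5 threshold on the scores
area = auc(scores, labels)
```

Unlike ACC, the AUC does not depend on a classification threshold, which is one reason the two are reported together.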

2.7 Evolution tendency prediction of the unstable slope

Triggered by earthquakes, rainstorms, human activities, and other factors, an unstable slope
could evolve into a landslide, debris flow, rock fall, or other geohazard. To predict
the evolution tendencies of the 35 unstable slopes recorded in Jiuzhaigou, we established
an evolution index (EI) by determining a joint probability of the three machine learning
methods discussed above for landslide, rock fall, and debris flow, respectively. First, we
extracted the susceptibility values (calculated via the three machine learning methods for
landslide, rock fall, and debris flow) of the 35 unstable slopes using ArcGIS 10.3. Second, we
calculated the average susceptibility value of the 35 unstable slopes across the three
techniques for landslide, rock fall, and debris flow, respectively, and took it as the EI of
that hazard. Finally, we defined an unstable slope as likely to evolve into a given geohazard
if the susceptibility values calculated by all three
machine learning methods are larger than the EI of the corresponding hazard.
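The three steps above can be sketched for a single hazard type as follows. This is one reading of the procedure under stated assumptions: the EI is taken as the mean over all slopes and all three models (which equals the mean of the three per-model means when every model scores every slope), the susceptibility values are hypothetical, and the function names are illustrative.

```python
def evolution_index(susceptibility):
    """Mean susceptibility over all slopes and all three models."""
    values = [v for slope in susceptibility for v in slope.values()]
    return sum(values) / len(values)


def likely_to_evolve(susceptibility, ei):
    """Flag slopes whose RF, SVM, and XGBoost values all exceed the EI."""
    return [all(v > ei for v in slope.values()) for slope in susceptibility]


# Hypothetical debris flow susceptibility values for four unstable slopes.
debris_flow = [
    {"RF": 0.95, "SVM": 0.85, "XGBoost": 0.90},
    {"RF": 0.90, "SVM": 0.40, "XGBoost": 0.80},
    {"RF": 0.70, "SVM": 0.30, "XGBoost": 0.20},
    {"RF": 0.85, "SVM": 0.75, "XGBoost": 0.80},
]
ei = evolution_index(debris_flow)
flags = likely_to_evolve(debris_flow, ei)
```

Requiring all three models to exceed the EI is a conservative rule: the second slope is not flagged even though two of its three values are high. Repeating the calculation per hazard type allows one slope to be flagged for more than one hazard.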

3 Results

Three machine learning techniques (RF, SVM, and XGBoost) were employed to
produce susceptibility maps for three geohazards (landslide, rock fall, and debris flow), and
their accuracy was then compared and evaluated for each type of hazard. Finally, the
importance of the conditioning factors was analyzed, and the evolution tendency of unstable
slopes in Jiuzhaigou was predicted.

3.1 Spatial pattern of multi‑geohazards susceptibility

RF, SVM, and XGBoost techniques were used to calculate the susceptibility index values
throughout the study area. The susceptibility maps were then reclassified into five
classes using the natural breaks method (Cao et al. 2019; Chen et al. 2017; Dragicevic
et al. 2015; Kornejady et al. 2017). The three hazard susceptibility maps are illustrated
in Fig. 6, and the area percentages of each class are shown in Tables 1, 2 and 3. The maps
show inverted “Y” shapes: the highly susceptible areas for landslides
are mainly concentrated in the southeast (Fig. 6a–c), and those for rock falls and debris
flows are mainly concentrated in the southeast and middle of the study area (Fig. 6d–i).
Moreover, the highly susceptible areas (including very high and high areas) for rock falls
and debris flows are relatively clustered, and the low susceptibility areas are relatively
small. Specifically, the highly susceptible area (very high and high classes combined)
for landslides accounted for only approximately 14% of the study area, but almost 91% of

Fig. 6  Susceptibility maps of landslides (a–c), debris flows (d–f), and rock falls (g–i)

historical landslides occurred in those areas; similar results were found for rock
falls (~ 12% vs. ~ 92%) and debris flows (~ 11% vs. ~ 91%) (Tables 1, 2, 3).
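The area-versus-hazard comparison for landslides can be reproduced from the "High" and "Very high" rows of Table 1. The sketch below assumes that the quoted figures are averages over the three techniques; the text does not say how they were aggregated, so this is only one plausible reading, and `mean_high_shares` is an illustrative helper.

```python
# (area %, landslide %) for the "High" and "Very high" classes in Table 1.
table1_high_classes = {
    "SVM": [(8.46, 20.90), (7.92, 70.15)],
    "RF": [(7.18, 11.94), (7.05, 82.09)],
    "XGBoost": [(5.11, 5.97), (8.16, 82.09)],
}


def mean_high_shares(high_classes):
    """Average, over models, the summed area and hazard shares."""
    area_totals = [sum(a for a, _ in rows) for rows in high_classes.values()]
    hazard_totals = [sum(h for _, h in rows) for rows in high_classes.values()]
    n_models = len(high_classes)
    return sum(area_totals) / n_models, sum(hazard_totals) / n_models


mean_area, mean_hazards = mean_high_shares(table1_high_classes)
```

A small share of the area containing a large share of the recorded hazards is the pattern expected of a map that separates hazard-prone terrain well.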

3.2 Accuracy evaluation of modeling results

The ACC, ROC curves, and AUC values of the three techniques on the
training data are shown in Table 4 and Fig. 7 (landslide, rock fall, and debris flow). The
XGBoost model had the highest performance in terms of ACC and AUC, with mean values
of 0.92 and 0.95, respectively (Table 4 and Fig. 7). The RF and SVM techniques exhibited
slightly lower ACC and AUC values than XGBoost, with mean ACC/AUC
values of 0.90/0.94 and 0.89/0.94 for the RF and SVM models, respectively (Table 4 and
Fig. 7). All these results indicate reasonable goodness-of-fit with the training
dataset, with XGBoost performing slightly better than the other two techniques.


Table 1  Landslide susceptibility areas and landslide percent for three techniques

Model    Susceptibility level  Area (%)  Landslides (count)  Landslides (%)
SVM      Very low              45.83     2                   2.99
         Low                   22.84     0                   0.00
         Moderate              14.95     4                   5.97
         High                  8.46      14                  20.90
         Very high             7.92      47                  70.15
RF       Very low              55.03     0                   0
         Low                   19.01     1                   1.49
         Moderate              11.74     3                   4.48
         High                  7.18      8                   11.94
         Very high             7.05      55                  82.09
XGBoost  Very low              68.99     5                   7.46
         Low                   11.09     0                   0.00
         Moderate              6.66      3                   4.48
         High                  5.11      4                   5.97
         Very high             8.16      55                  82.09

Table 2  Rock fall susceptibility areas and rock fall percent for three techniques

Model    Susceptibility level  Area (%)  Rock falls (count)  Rock falls (%)
SVM      Very low              61.86     2                   0
         Low                   15.55     2                   0.92
         Moderate              8.84      4                   1.83
         High                  7.01      17                  9.17
         Very high             6.74      84                  88.07
RF       Very low              69.89     1                   0.92
         Low                   6.63      2                   1.83
         Moderate              9.74      6                   5.5
         High                  8.52      29                  26.61
         Very high             5.23      71                  65.14
XGBoost  Very low              81.2      0                   7.46
         Low                   5.53      1                   0.00
         Moderate              3.88      2                   4.48
         High                  3.33      10                  5.97
         Very high             6.06      96                  82.09

The prediction capabilities of the three models for the three hazards were
evaluated using the validation data; Fig. 8 and Table 4 show the ROC curves and ACC
of the three techniques for the three geological hazards (landslides, rock falls, and debris
flows). All three models exhibited good prediction performance for rock falls and debris
flows (AUC ≥ 0.9; ACC ≥ 0.85). For landslides, the AUCs/ACCs of RF, SVM,
and XGBoost were 0.84/0.81, 0.88/0.86, and 0.86/0.84, respectively. The
corresponding values for rock falls (debris flows) were 0.90/0.91 (0.93/0.91), 0.95/0.85
(0.96/0.86), and 0.97/0.92 (0.97/0.92), respectively. We concluded that all three
techniques exhibited reasonably good prediction capabilities in the study area. In addition,


Table 3  Debris flow susceptibility areas and debris flow percent for three techniques

Model    Susceptibility level  Area (%)  Debris flows (count)  Debris flows (%)
SVM      Very low              77.71     5                     2.75
         Low                   9.30      11                    6.04
         Moderate              4.78      14                    7.69
         High                  2.31      11                    6.04
         Very high             5.89      141                   77.47
RF       Very low              67.76     0                     0
         Low                   11.27     1                     0.55
         Moderate              6.98      2                     1.10
         High                  6.12      18                    9.89
         Very high             7.87      161                   88.46
XGBoost  Very low              81.38     1                     0.55
         Low                   5.41      9                     4.95
         Moderate              3.48      9                     4.95
         High                  3.48      20                    10.99
         Very high             6.25      143                   78.57

Table 4  ACC for the three techniques on training and validation data

ACC              Landslide  Rock fall  Debris flow  Mean
Training data
  SVM            0.90       0.86       0.90         0.89
  RF             0.82       0.92       0.95         0.90
  XGBoost        0.86       0.93       0.96         0.92
  Mean           0.86       0.90       0.94         –
Validation data
  SVM            0.86       0.85       0.86         0.86
  RF             0.81       0.91       0.91         0.88
  XGBoost        0.84       0.92       0.92         0.89
  Mean           0.84       0.89       0.90         –

Fig. 7  Success rates using the training data


Fig. 8  Prediction rates using the validation data

the best model for predicting landslides was SVM, whereas XGBoost was best for rock falls
and debris flows, with AUCs/ACCs of 0.88/0.86, 0.97/0.92, and 0.97/0.92, respectively,
although the three landslide models performed more poorly than the models for rock
falls and debris flows.

3.3 Conditioning factors analysis

XGBoost, the best-performing method, was applied to assess the importance of all conditioning
factors (Fig. 9). The results showed that altitude (89.7%, 98.3%, and 96.1% for landslides,
rock falls, and debris flows, respectively) was the most important factor, which is
consistent with previous research (Chen et al. 2017; Tien Bui et al. 2015). Our results
further indicated that geomorphic conditions are closely related to the occurrence of
geohazards. Moreover, other factors, such as distance to roads (85.8%, 91.8%, 91.6%),
rainfall (65.5%, 61.2%, 70.2%), and distance to epicenters (68.9%, 65.5%, 52.4%), also
contribute greatly to the susceptibility models, which indicates that rainfall, earthquakes,
and engineering measures related to human activities significantly affect the occurrence
of geohazards.

Fig. 9  Importance ranking of conditioning factors according to the XGBoost method


3.4 The evolution tendency for unstable slopes

Table 5 shows the susceptibility statistics of these unstable slopes; the EIs (the bold blue
numbers in Table 5) were 0.76 (debris flow), 0.69 (rock fall), and 0.63 (landslide). In this
study, we considered that an unstable slope could become the corresponding hazard (marked in red
in Table S1) if the joint probability of the three models was larger than the EI of that hazard.
A detailed summary of the susceptibility values and their distributions is shown in Table S1 and
Fig. 10. The unstable slopes could potentially develop as follows: 16 (45.7%) could become debris
flows, 15 (42.9%) landslides, and 10 (28.6%) rock falls. Because these counts sum to 41 for 35
slopes, some unstable slopes have more than one evolution tendency. In terms of spatial patterns,
unstable slopes prone to landslides are mainly located in southern towns (Fig. 10a), especially
Nanping Town (five red triangles) and Zhangzha Town (three red triangles). Unstable slopes prone
to rock falls are sparsely scattered in eastern towns, especially Nanping Town (four black
circles) (Fig. 10b). In contrast, the unstable slopes prone to debris flows form a clear belt,
with many green points in a mosaic pattern near the southeastern towns (Fig. 10c), especially
Nanping Town (six green circles), in view of the causes of geological hazards and the
geomorphologic and geological characteristics of Jiuzhaigou.
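The decision rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the joint probability is assumed here to be the mean of the RF, XGBoost, and SVM outputs (consistent with the "Mean" column of Table 5, but not confirmed by this section), the slope values are hypothetical rather than taken from Table S1, and the function names are introduced for illustration only.

```python
# EI thresholds as reported in the text (Table 5).
EI = {"debris flow": 0.76, "rock fall": 0.69, "landslide": 0.63}


def joint_probability(model_scores):
    """Assumed joint probability: mean of the RF, XGBoost, and SVM outputs."""
    return sum(model_scores.values()) / len(model_scores)


def evolution_tendencies(slope):
    """Return the hazards whose joint probability exceeds the corresponding EI."""
    return [
        hazard
        for hazard, scores in slope.items()
        if joint_probability(scores) > EI[hazard]
    ]


# Hypothetical susceptibility values for one unstable slope (not from Table S1).
slope = {
    "debris flow": {"RF": 0.90, "XGB": 0.80, "SVM": 0.70},
    "rock fall": {"RF": 0.50, "XGB": 0.60, "SVM": 0.55},
    "landslide": {"RF": 0.70, "XGB": 0.75, "SVM": 0.60},
}
print(evolution_tendencies(slope))
```

Because each hazard is tested against its own EI, a single slope can exceed more than one threshold, which is how the 35 slopes yield 41 tendencies in total.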

4 Discussion

4.1 Comparing XGBoost with RF and SVM

Each machine learning technique has its advantages and shortcomings, and performance
may vary in different areas (Huang and Zhao 2018; Pourghasemi and Rahmati 2018;
Youssef et al. 2015). The results showed that all models (XGBoost, RF, and SVM) performed
well for the three geological hazards in Jiuzhaigou County. We attributed their good

Table 5  Susceptibility statistics of unstable slope produced by three models

           Debris flow (RF)   Debris flow (XGB)   Debris flow (SVM)   Mean
Minimum    0.63               0.11                0.01                0.25
Maximum    0.98               0.94                0.89                0.94
Mean       0.88               0.73                0.66                0.76

           Rock fall (RF)     Rock fall (XGB)     Rock fall (SVM)     Mean
Minimum    0.31               0.04                0.11                0.15
Maximum    0.89               0.97                0.96                0.94
Mean       0.64               0.74                0.71                0.69

           Landslide (RF)     Landslide (XGB)     Landslide (SVM)     Mean
Minimum    0.31               0.03                0.03                0.12
Maximum    0.95               0.99                0.88                0.94
Mean       0.69               0.69                0.52                0.63

Note that the bold number (the Mean column of each Mean row) represents evolution index (EI)


Fig. 10  Evolution tendency for unstable slopes in Jiuzhaigou

performance in prediction to the relatively concentrated distribution of the three types of
geohazards. Thus, the highly susceptible areas (hazard-prone areas) are easily distinguished.
The same model may perform differently for different hazards, even in the same study area. The
results of the state-of-the-art XGBoost were not always better than those of the others (RF and
SVM). For debris flows and rock falls, XGBoost performed slightly better than SVM and RF.
However, SVM was better than XGBoost and RF for landslide susceptibility assessment. We further
compared the algorithms of the three techniques and found that the SVM model is good at analyzing
small datasets (only 67 landslide samples, the smallest among the three hazards) owing to its
kernel function and structural risk minimization scheme. Nevertheless, SVM has some defects,
such as overfitting issues (Yao et al. 2008) and a lack of user friendliness (Goetz et al. 2015).
Compared with SVM, XGBoost incorporates regularization, which helps to reduce overfitting, and
users can define custom optimization objectives and evaluation criteria (Chen and Guestrin 2016;
Cheng et al. 2018).


Compared with the RF technique, XGBoost uses gradient boosting, in which trees are added
sequentially, to improve model accuracy, whereas RF uses an ensemble of trees that vote
independently. In addition, XGBoost reduces error mainly by reducing bias, whereas RF does so
mainly by reducing variance. Although prediction accuracy may differ for different parameters and
datasets (Trigila et al. 2015), we found that XGBoost generally performed better than SVM and RF.
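The contrast between sequential boosting and independent voting can be made concrete with a toy sketch. It is a deliberately simplified illustration, not XGBoost or RF themselves: it uses one-dimensional regression with decision stumps, plain squared-loss gradient boosting without the regularization and second-order terms that XGBoost adds, and a bagged average in place of a full random forest. All names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Least-squares decision stump: the best single split on a 1-D feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds, split as x <= t
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def mse(model, xs, ys):
    """Mean squared error of a model on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bagging(xs, ys, n_trees, rng):
    """RF-style ensemble: stumps fitted independently on bootstrap samples, then averaged."""
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


def boosting(xs, ys, n_rounds, lr=0.5):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predict = lambda x: base + lr * sum(s(x) for s in stumps)
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Hypothetical smooth target that a single stump cannot represent (high bias).
xs = [i / 20 for i in range(40)]
ys = [x * x for x in xs]

bagged = bagging(xs, ys, n_trees=20, rng=random.Random(0))
boosted = boosting(xs, ys, n_rounds=20)
print("bagged stumps MSE:", mse(bagged, xs, ys))
print("boosted stumps MSE:", mse(boosted, xs, ys))
```

With weak, high-bias learners such as stumps, averaging independent copies leaves most of the bias in place, while fitting each new stump to the residuals of the previous ones steadily lowers the training error. In practice RF uses deep, low-bias trees, so averaging mainly reduces their variance.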

4.2 Suggestions and implications

According to the factor importance analysis, distance to roads ranked second after altitude,
which demonstrates the close relationship between human activities and hazard susceptibility.
Moreover, the spatial susceptibility patterns of the three geological hazards (Fig. 6) indicate
that highly susceptible areas are located along roads, rivers, settlements, and valleys because
intensive human activities (built-up areas, road construction, and the related soil erosion) can
reduce the natural stability of the original slopes and change their topographic and geological
conditions (Cao et al. 2019; Chuang and Shiu 2018; Gorsevski et al. 2006). According to
statistical records, tourism is the largest industry in Jiuzhaigou, and tourist numbers and
tourism income have risen sharply in recent years (Fig. S3). Some previous studies have shown
that Jiuzhaigou needs to construct considerable housing and many roads to accommodate the
dramatically increasing tourist arrivals, especially in Zhangzha Town, where 138 geological
hazards, accounting for 35.03% of the total and including 85 hazards in scenic areas, have
occurred (Cao et al. 2019). Therefore, how the government can balance and control eco-sustainable
tourism development in Jiuzhaigou should be investigated in the future.

4.3 Some limitations

This study also has some limitations. Firstly, the predicted evolution tendency of unstable
slopes is based on the assumption that these slopes are likely to become landslides, rock falls,
or debris flows against the background of the historical mountain climate, the earthquake-prone
setting, and intensive human activities. However, land morphology changes continually, so the
multi-hazard susceptibility maps and the evolution tendency of unstable slopes will change with
the environment. Therefore, hazard inventory maps and conditioning factors should be updated
regularly. Secondly, RF, SVM, and XGBoost are all statistical models that do not represent
hazard mechanisms. Therefore, future studies should focus on the mechanisms and processes of
geological hazards. Finally, machine learning only considers the attribute information of
spatial objects and ignores their spatial structural information, which leads to suboptimal
geohazard susceptibility mapping. In addition, the selection of conditioning factors was not
entirely objective, which may reduce the reliability of susceptibility mapping. To address these
problems, we propose combining GeoDetector (Geographical Detectors), a machine learning model,
and a spatial auto-regression (SAR) model for geohazard susceptibility mapping in further
research (Yang et al. 2019), which can make full use of both the spatial structure and the
attribute information of spatial objects.


5 Conclusion

Based on historical geological hazards and 12 conditioning factors in Jiuzhaigou County,
our study found that the three well-optimized models (RF, XGBoost, and SVM) are all
suitable for multi-hazard (landslide, rock fall, and debris flow) susceptibility mapping, but
XGBoost performed slightly better than RF and SVM. Furthermore, highly susceptible areas were
mainly concentrated in the south of Jiuzhaigou, especially along roads, rivers, and valleys.
Finally, 35 potential geological hazards (unstable slopes) should be monitored more closely;
these unstable slopes will most likely become debris flows, followed by landslides and rock
falls. These findings suggest that the relevant government agencies should prioritize these
highly susceptible areas and take positive and efficient adaptation measures, such as monitoring
and forecasting rainstorms and earthquakes, improving anti-seismic measures, and balancing
regional tourism development, to reduce the adverse effects of human activities and natural
hazards on the environment.

Acknowledgements  This study was financially supported by the National Key R&D Program of China
(2017YFC1502505) and the Jiuzhaigou Post-Disaster Restoration and Reconstruction Program Research
on Restoration and Protection of World Natural Heritage.

Compliance with ethical standards

Conflicts of interest  The authors declare no conflict of interest.

References
Boser BE (2008) A training algorithm for optimal margin classifiers. Proc Annu ACM Workshop Comput
Learn Theory 5:144–152. https://doi.org/10.1145/130385.130401
Breiman L (2001) Random forests. Mach Learn 45:5–32
Cao J, Zhang Z, Wang C, Liu J, Zhang L (2019) Susceptibility assessment of landslides triggered by
earthquakes in the Western Sichuan Plateau. CATENA 175:63–76.
https://doi.org/10.1016/j.catena.2018.12.013
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM
SIGKDD international conference on knowledge discovery and data mining, pp 785–794. ACM
Chen W, Xie X, Wang J et al (2017) A comparative study of logistic model tree, random forest, and
classification and regression tree models for spatial prediction of landslide susceptibility.
CATENA 151:147–160. https://doi.org/10.1016/j.catena.2016.11.032
Cheng S, Zhang S, Li L, Zhang D (2018) Water quality monitoring method based on TLD 3D fish tracking
and XGBoost. Math Probl Eng 7:1–12. https://doi.org/10.1155/2018/5604740
Chuang YC, Shiu YS (2018) Relationship between landslides and mountain development-integrating
geospatial statistics and a new long-term database. Sci Total Environ 622–623:1265–1276.
https://doi.org/10.1016/j.scitotenv.2017.12.039
Corominas J et al (2013) Recommendations for the quantitative analysis of landslide risk. Bull Eng Geol
Environ. https://doi.org/10.1007/s10064-013-0538-8
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Dickson ME, Perry GLW (2016) Identifying the controls on coastal cliff landslides using machine-learning
approaches. Environ Modell Softw 76:117–127. https://doi.org/10.1016/j.envsoft.2015.10.029
Dragićević S, Lai T, Balram S (2015) GIS-based multicriteria evaluation with multiscale analysis to
characterize urban landslide susceptibility in data-scarce environments. Habitat Int 45:114–125.
https://doi.org/10.1016/j.habitatint.2014.06.031
Fabbri AG, Chung CJF, Cendrero A, Remondo J (2003) Is prediction of future landslides possible with a
GIS? Nat Hazards 30:487–503. https://doi.org/10.1016/j.habitatint.2014.06.031
Ge Y, Dou W, Gu Z et al (2013) Assessment of social vulnerability to natural hazards in the Yangtze
River Delta, China. Stoch Environ Res Risk Assess 27:1899–1908.
https://doi.org/10.1007/s00477-013-0725-y


Goetz JN, Brenning A, Petschko H, Leopold P (2015) Evaluating machine learning and statistical
prediction techniques for landslide susceptibility modeling. Comput Geosci 81:1–11.
https://doi.org/10.1016/j.cageo.2015.04.007
Gorsevski PV, Gessler PE, Boll J, Elliot WJ, Foltz RB (2006) Spatially and temporally distributed
modeling of landslide susceptibility. Geomorphology 80:178–198.
https://doi.org/10.1016/j.geomorph.2006.02.011
Guha-Sapir D, Vos F, Below R, Ponserre S (2012) Annual disaster statistical review 2011: the numbers
and trends. Centre for Research on the Epidemiology of Disasters (CRED).
https://doi.org/10.13140/RG.2.2.10378.88001
Gutiérrez F, Parise M, De Waele J, Jourde H (2014) A review on natural and human-induced geohazards
and impacts in karst. Earth Sci Rev 138:61–88. https://doi.org/10.1016/j.earscirev.2014.08.002
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal
20:832–844. https://doi.org/10.1109/34.709601
Hong H, Pourghasemi HR, Pourtaghi ZS (2016) Landslide susceptibility assessment in Lianhua
County (China): a comparison between a random forest data mining technique and bivariate and
multivariate statistical models. Geomorphology 259:105–118.
https://doi.org/10.1016/j.geomorph.2016.02.012
Huang Y, Zhao L (2018) Review on landslide susceptibility mapping using support vector machines.
CATENA 165:520–529. https://doi.org/10.1016/j.catena.2018.03.003
Kamp U, Growley BJ, Khattak GA, Owen LA (2008) GIS-based landslide susceptibility mapping for the
2005 Kashmir earthquake region. Geomorphology 101:631–642.
https://doi.org/10.1016/j.geomorph.2008.03.003
Kim HG, Lee DK, Park C, Ahn Y, Kil S-H, Sung S, Biging GS (2018) Estimating landslide susceptibility
areas considering the uncertainty inherent in modeling methods. Stoch Environ Res Risk Assess
32:2987–3019. https://doi.org/10.1007/s00477-018-1609-y
Kornejady A, Heidari K, Nakhavali M (2015) Assessment of landslide susceptibility, semi-quantitative
risk and management in the Ilam dam basin, Ilam, Iran. Environ Resour Res 3:85–109.
https://doi.org/10.22069/ijerr.2015.2563
Kornejady A, Ownegh M, Bahremand A (2017) Landslide susceptibility assessment using maximum
entropy model with two different data sampling methods. CATENA 152:144–162.
https://doi.org/10.1016/j.catena.2017.01.010
Li W, Zhang Q, Liu C, Xue Q (2006) Tourism’s impacts on natural resources: a positive case from
China. Environ Manag 38:572–579. https://doi.org/10.1007/s00267-004-0299-z
Marjanović M, Kovačević M, Bajat B, Voženílek V (2011) Landslide susceptibility assessment using SVM
machine learning algorithm. Eng Geol 123:225–234. https://doi.org/10.1016/j.enggeo.2011.09.006
Mokhtari M, Abedian S (2019) Spatial prediction of landslide susceptibility in Taleghan basin, Iran.
Stoch Environ Res Risk Assess 33:1297–1325. https://doi.org/10.1007/s00477-019-01696-w
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in python. J Mach
Learn Res 12:2825–2830
Pourghasemi HR, Rahmati O (2018) Prediction of the landslide susceptibility: which algorithm, which
precision? CATENA 162:177–192. https://doi.org/10.1016/j.catena.2017.11.022
Pourghasemi HR, Pradhan B, Gokceoglu C (2012) Application of fuzzy logic and analytical hierarchy
process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat Hazards
63:965–996. https://doi.org/10.1007/s11069-012-0217-2
Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector
machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci
51:350–365. https://doi.org/10.1016/j.cageo.2012.08.023
Qiao X, Du J, Lugli S, Ren J, Xiao W, Chen P, Tang Y (2016) Are climate warming and enhanced
atmospheric deposition of sulfur and nitrogen threatening tufa landscapes in Jiuzhaigou National
Nature Reserve, Sichuan, China? Sci Total Environ 562:724–731.
https://doi.org/10.1016/j.scitotenv.2016.04.073
Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240(4857):1285–1293.
https://doi.org/10.1126/science.3287615
Tehrany MS, Pradhan B, Jebur MN (2015) Flood susceptibility analysis and its verification using a novel
ensemble support vector machine and frequency ratio method. Stoch Environ Res Risk Assess
29:1149–1165. https://doi.org/10.1007/s00477-015-1021-9
Tien Bui D, Tuan TA, Klempe H, Pradhan B, Revhaug I (2015) Spatial prediction models for shallow
landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial
neural networks, kernel logistic regression, and logistic model tree. Landslides 13:361–378.
https://doi.org/10.1007/s10346-015-0557-6


Trigila A, Iadanza C, Esposito C, Scarascia-Mugnozza G (2015) Comparison of Logistic Regression and
Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily,
Italy). Geomorphology 249:119–136. https://doi.org/10.1016/j.geomorph.2015.06.001
Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10:988–999.
https://doi.org/10.1109/72.788640
Vapnik VN (2000) The nature of statistical learning theory. Springer, New York.
https://doi.org/10.1007/978-1-4757-2440-0
Wang LJ, Guo M, Sawada K, Lin J, Zhang J (2015) Landslide susceptibility mapping in Mizunami City,
Japan: a comparison between logistic regression, bivariate statistical analysis and multivariate adaptive
regression spline models. CATENA 135:271–282. https://doi.org/10.1016/j.catena.2015.08.007
Westen CJV, Castellanos E, Kuriakose SL (2008) Spatial data for landslide susceptibility, hazard, and
vulnerability assessment: an overview. Eng Geol 102:112–131.
https://doi.org/10.1016/j.enggeo.2008.03.010
Wu CH, Peng C, Li YS, Ayala IA, Chao H, Yi SJ (2018) Seismogenic fault and topography control on the
spatial patterns of landslides triggered by the 2017 Jiuzhaigou earthquake. J Mt Sci 15:793–807.
https://doi.org/10.1007/s11629-017-4761-9
Yang J, Song C, Yang Y, Xu C, Guo F, Xie L (2019) New method for landslide susceptibility mapping
supported by spatial logistic regression and GeoDetector: a case study of Duwen Highway Basin, Sichuan
Province, China. Geomorphology 324:62–71. https://doi.org/10.1016/j.geomorph.2018.09.019
Yao X, Tham LG, Dai FC (2008) Landslide susceptibility mapping based on Support Vector Machine:
a case study on natural slopes of Hong Kong, China. Geomorphology 101:572–582.
https://doi.org/10.1016/j.geomorph.2008.02.011
Youssef AM, Pourghasemi HR, Pourtaghi ZS, Al-Katheeri MM (2015) Landslide susceptibility mapping
using random forest, boosted regression tree, classification and regression tree, and general linear
models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides
13:839–856. https://doi.org/10.1007/s10346-015-0614-1
Zhu X, Xu Q, Tang M, Nie W, Ma S, Xu Z (2017) Comparison of two optimized machine learning models
for predicting displacement of rainfall-induced landslide: a case study in Sichuan Province, China. Eng
Geol 218:213–222. https://doi.org/10.1016/j.enggeo.2017.01.022
Zou Q, Zhou J, Zhou C, Song L, Guo J (2013) Comprehensive flood risk assessment based on set pair
analysis-variable fuzzy sets model and fuzzy AHP. Stoch Environ Res Risk Assess 27(2):525–546.
https://doi.org/10.1007/s00477-012-0598-5

Publisher’s Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Affiliations

Juan Cao1 · Zhao Zhang1 · Jie Du2 · Liangliang Zhang1 · Yun Song3 · Geng Sun4

1 State Key Laboratory of Earth Surface Processes and Resource Ecology/MEM&MoE, Key
  Laboratory of Environmental Change and Natural Hazards, Beijing Normal University,
  Beijing 100875, China
2 Administration of Jiuzhaigou National Park of China, Zhangzha, China
3 Sichuan Geological and Mineral Bureau Regional Geological Survey Team, Chengdu, China
4 China‑Croatia “Belt and Road” Joint Laboratory on Biodiversity and Ecosystem Services,
  Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
