
Computers & Geosciences 81 (2015) 1–11


Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling

J.N. Goetz a,b,d,*, A. Brenning d,b, H. Petschko c, P. Leopold a

a Health and Environment Department, AIT-Austrian Institute of Technology GmbH, Konrad-Lorenz-Straße 24, 3430 Tulln, Austria
b Department of Geography and Environmental Management, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1
c Department of Geography and Regional Research, University of Vienna, Universitätsstraße 7, A-1010 Vienna, Austria
d Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany

ARTICLE INFO

Article history:
Received 6 June 2014
Received in revised form 14 January 2015
Accepted 16 April 2015
Available online 20 April 2015

Keywords:
Statistical and machine learning techniques
Landslide susceptibility modeling
Spatial cross-validation
Variable importance

ABSTRACT

Statistical and now machine learning prediction methods have been gaining popularity in the field of landslide susceptibility modeling. Particularly, these data-driven approaches show promise when tackling the challenge of mapping landslide-prone areas for large regions, which may not have sufficient geotechnical data to conduct physically-based methods. Currently, there is no best method for empirical susceptibility modeling. Therefore, this study presents a comparison of traditional statistical and novel machine learning models applied for regional scale landslide susceptibility modeling. These methods were evaluated by spatial k-fold cross-validation estimation of the predictive performance, assessment of variable importance for gaining insights into model behavior, and by the appearance of the prediction (i.e. susceptibility) map. The modeling techniques applied were logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). These modeling methods were tested for three areas in the province of Lower Austria, Austria. The areas are characterized by different geological and morphological settings.

Random forest and bundling classification techniques had the overall best predictive performances. However, the performances of all modeling techniques were for the majority not significantly different from each other; depending on the areas of interest, differences in the overall median estimated area under the receiver operating characteristic curve (AUROC) ranged from 2.9 to 8.9 percentage points (pp), and differences in the overall median estimated true positive rate (TPR), measured at a 10% false positive rate (FPR), ranged from 11 to 15 pp. The relative importance of each predictor was generally different between the modeling methods. However, slope angle, surface roughness and plan curvature were consistently highly ranked variables. The prediction methods that create splits in the predictors (RF, BPLDA and WOE) resulted in heterogeneous prediction maps full of spatial artifacts. In contrast, the GAM, GLM and SVM produced smooth prediction surfaces. Overall, it is suggested that the framework of this model evaluation approach can be applied to assist in selection of a suitable landslide susceptibility modeling technique.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Mitigating the impacts of landslides remains a great challenge for land-use planners and policy makers. Landslide susceptibility models, which are used to derive maps of locations prone to landslides, can support and enhance spatial planning decisions focused on reducing landslide hazards. Currently there is a vast selection of quantitative methods applied for spatial modeling and predicting landslide susceptibility (Chung and Fabbri, 1999; Guzzetti et al., 1999; Dai et al., 2002; van Westen et al., 2003; Brenning, 2005; Goetz et al., 2011; Pradhan, 2013). Quantitative methods for modeling landslide susceptibility can be generalized into physically-based and statistical approaches (Soeters and van Westen, 1996; van Westen et al., 1997). This study focuses on statistical and machine learning techniques, which have become common approaches for modeling landslide susceptibility over large regions (van Westen et al., 1997; Brenning, 2005; Petschko et al., 2014; Micheletti et al., 2014).

* Corresponding author at: Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany. E-mail address: jason.goetz@uni-jena.de (J.N. Goetz).

http://dx.doi.org/10.1016/j.cageo.2015.04.007
0098-3004/© 2015 Elsevier Ltd. All rights reserved.

The basic assumption of the empirical approach is that future
landslides are likely to occur in similar conditions as in the past (Varnes, 1984). Typically, a range of predictors (i.e., independent variables) is used to represent landslide preparatory conditions (van Westen et al., 2008). The exact relationship of the predictors to the response (i.e., landslide presence/absence) is not always well known a priori. In most cases, the predictors are proxies for conditions and processes that are difficult to measure across large regions (Pachauri and Pant, 1992; Guzzetti et al., 1999; Goetz et al., 2011). The susceptibility model output is a prediction surface or map that spatially represents the distribution of predicted values, usually as probabilities distributed across grid cells.

The freedom of choice to decide which modeling method is most suitable for a particular application is challenging. Numerous comparisons of susceptibility modeling methods have been conducted; yet no single best method has been identified (Brenning, 2005; Yesilnacar and Topal, 2005; Lee et al., 2007; Yilmaz, 2009, 2010; Yalcin et al., 2011; Goetz et al., 2011; Pradhan, 2013). The search for the optimal susceptibility modeling method is a complicated one and should not only consider model accuracy. Robustness to sampling variation and adequacy to describe processes associated with landslides are also crucial model properties (Frattini et al., 2010).

The simplest approach to select an optimal model for prediction is to compare the error rates estimated from cross-validation, where the modeling method with the lowest error estimate is determined as the best one to use. This assessment of the prediction performance is also viewed as essential for a model to have any practical or scientific significance (Chung and Fabbri, 2003; Guzzetti et al., 2006). There are a variety of measures to assess the accuracy of landslide susceptibility models. Common ones are derived from success rate curves, prediction rate curves (Chung and Fabbri, 2003) or receiver operating characteristic (ROC) curves (Brenning, 2005; Beguería, 2006; Gorsevski et al., 2006; Frattini et al., 2010). It is necessary to carefully select a suitable performance measure. Ideally, this measure should communicate performance in the context of the model application (Brenning, 2012a). Performance should also be assessed using test sets that are independent from the training set used to build the prediction model, resampling-based estimation methods such as cross-validation being the state of the art (Brenning, 2012a): cross-validation utilizes the entire dataset for training and testing the model.

The ability to communicate model behavior is a desirable quality for landslide susceptibility models (Brenning, 2012a). Generally speaking, users feel more comfortable in the practical application of a model if they understand how the model works. The ability of a model to adequately describe the system behavior can be assessed by determining how well the predictors represent the processes associated with landslides (Frattini et al., 2010). In statistical methods, this is relatively straightforward compared to machine learning models. The model coefficients from generalized linear models can be used to evaluate the relative importance of landslide predictors (Dai and Lee, 2002; Ayalew and Yamagishi, 2005). Variable importance has also been estimated for regression models by observing the relative frequencies of variable selection when an automatic stepwise variable selection method has been applied and tested with cross-validation (Brenning, 2009; Goetz et al., 2011; Petschko et al., 2014). In contrast, the internal mechanisms defining the representation of the response by the predictors are difficult to interpret in machine learning models because of their 'black box' nature. Micheletti et al. (2014) demonstrated how some feature selection properties of different machine learning techniques can be implemented to assess the relative importance of variables for landslide susceptibility modeling. However, since their approach applied feature selection methods only relevant to the corresponding machine learning technique, making comparisons of variable importance with other modeling techniques can be challenging. A standardized approach for comparing the relative variable importance of different statistical and machine learning modeling techniques for geospatial problems was demonstrated by Brenning et al. (2012b). They assessed variable importance using internal estimates of changes in error rates by randomly permuting predictors in out-of-bag samples (Breiman, 2001; Strobl et al., 2007).

There are many criteria that can be considered for model selection in the context of landslide susceptibility (Brenning, 2012a). This study focuses on one particular aspect, which is the predictive performance. Therefore, a rigorous assessment of prediction performance is performed on various statistical and machine learning techniques in an attempt to determine the 'best' predictive model. The modeling techniques include logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). The importance of predictor variables in each model is also analyzed to demonstrate how a standard measure of variable importance can be applied to communicate and compare model behavior, even when a model is considered 'black box'. The main objective of this paper is to demonstrate an approach to make a rigorous comparison of landslide susceptibility models for the purpose of spatial prediction.

2. Materials and methods

2.1. Study area

Multiple areas of interest (AOIs) were selected to observe model behavior under different landslide conditions. The modeling techniques were tested on AOIs that were each 50 km² and within the province of Lower Austria (Fig. 1). The Molasse AOI (Fig. 1a) is located in a relatively low lying basin (< 300 m a.s.l.). It mainly consists of sand and clay sediments, sandstones, claystones, and marls. These bedrock materials can be covered by Quaternary gravels and eolian sediments (loess). Deep-seated and shallow landslides occur in this area. The Austroalpine AOI (Fig. 1b), which includes the Upper Austroalpine lithology units, is made up of predominantly steep terrain, and has the highest elevations in Lower Austria (1000–2000 m a.s.l.). The lithology is dominated by limestone and dolomite rock, with some interbedded strata of claystone and marl. Landslides in the Austroalpine area are typically shallow. Generally, the Flysch AOI (Fig. 1c) is very susceptible to landslide activity (Petschko et al., 2014). This low mountain region has exceptionally undulating terrain, and consists of sedimentary rocks that are made up of layers of sandstones, marls and claystones. The main triggers for landslides in Lower Austria are intense rainfall and rapid snowmelt events (Schwenk, 1992; Schweigl and Hervás, 2009). For more details on the lithology and geology of Lower Austria, please refer to Gottschling (2006) and Wessely (2006).

Fig. 1. A lithological map of Lower Austria indicating the AOIs (black rectangles) and landslide inventories.

2.2. Landslide inventory

The landslides in this analysis are a subset of an inventory for Lower Austria that has been previously published by Petschko et al. (2014), which consists of mapped initiation areas for deep-seated and shallow landslides. These landslides were mapped in a geographic information system (GIS) using topographic derivatives (e.g. hillshade and slope angle maps) of an airborne laser scanner (ALS) digital terrain model (DTM) with a 1 m × 1 m spatial resolution, which was acquired between 2006 and 2009. The general procedure for mapping these landslides was similar to Schulz (2004, 2007). This inventory consists of points (single grid cells) that mark the landslide scarp. Only one point was mapped per landslide to provide equal treatment of large and small landslide occurrences, to reduce the effect of spatial autocorrelation between observations, to increase mapping effectiveness, and to avoid uncertainties in mapping landslide boundaries (Petschko et al., 2014). Further details on the landslide inventory and its quality can be found in Petschko et al. (2013, 2014). The numbers of landslides for the three AOIs used in this study are 261 landslides in the Molasse area, 193 in the Austroalpine area and 285 in the Flysch area.

2.3. Predictor variables

Terrain analysis commonly forms the basis of quantitative landslide modeling (Dai and Lee, 2002; Gorsevski et al., 2006; Van den Eeckhaut et al., 2006; Muenchow et al., 2012; Petschko et al., 2014). Terrain attributes derived from elevation models function as surrogates for surface processes and geophysical site conditions to simplify complex meaningful geomorphological relationships (Pachauri and Pant, 1992; Moore et al., 1991; Lineback Gritzner et al., 2001). Goetz et al. (2011) demonstrated that empirical models based on terrain attributes alone could outperform traditional physically based models such as SHALSTAB and the factor of safety. The terrain attributes slope angle, elevation, profile curvature, plan curvature, catchment area (C. area), catchment height (C. height), convergence index (Conv. Ind.), topographic wetness index (TWI; Beven and Kirkby, 1979), slope aspect and surface roughness (SDS) were used as predictors in this study. These were derived from a 10 m × 10 m spatial resolution ALS-DTM, which is also the resolution at which the modeling in this analysis was completed. All except surface roughness were processed from the DTM with SAGA GIS (Conrad, 2006). Catchment area, which was extremely skewed, was logarithmically transformed. Slope aspect was separated into 'east-exposedness' (sine transformation) and 'north-exposedness' (cosine transformation; Brenning and Trombotto, 2006). Surface roughness can be used as a variable that may distinguish between types of landslide activity (Glenn et al., 2006). For this study, we define surface roughness as the variation of slope angle at a given scale (Atkinson and Massari, 1998). It was calculated as the standard deviation of slope (SDS) in a 3 × 3 moving window. The predictor variables are summarized in Table 1. Predictor variables sensitive to temporal changes, e.g. land-use and rainfall, were not included in the analysis because the landslide inventory does not have information on when the landslides were triggered (Petschko et al., 2014). Therefore, a meaningful empirical relationship to such temporally sensitive predictors cannot be made.

In addition to the terrain attributes, lithological units (map scale 1:200,000; Schnabel, 2002) were included as indicator variables. The strong generalization provided by this map scale is likely to result in a reduced contribution of lithology to the predictive performance.
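The predictor transformations described above (the authors used SAGA GIS and R; the following is only an illustrative stdlib-Python sketch) amount to a log10 transform of catchment area, a sine/cosine decomposition of slope aspect, and SDS as the standard deviation of slope in a 3 × 3 window:

```python
import math

def transform_predictors(slope_aspect_deg, catchment_area_m2):
    """Illustrative sketch of the Section 2.3 transformations."""
    return {
        # log-transform of the strongly skewed catchment area
        "log10_c_area": math.log10(catchment_area_m2),
        # aspect decomposed into east- and north-exposedness
        "east_exposedness": math.sin(math.radians(slope_aspect_deg)),
        "north_exposedness": math.cos(math.radians(slope_aspect_deg)),
    }

def sds(slope_grid, row, col):
    """Surface roughness: population standard deviation of slope
    angle in a 3 x 3 moving window centred on (row, col)."""
    window = [slope_grid[r][c]
              for r in (row - 1, row, row + 1)
              for c in (col - 1, col, col + 1)]
    mean = sum(window) / 9.0
    return math.sqrt(sum((v - mean) ** 2 for v in window) / 9.0)
```

Note that the sign convention of the aspect components (e.g. whether +1 denotes north or south exposure, as labeled in Table 1) depends on the chosen transformation; the sketch simply follows the sine/cosine definition given in the text.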
4 J.N. Goetz et al. / Computers & Geosciences 81 (2015) 1–11

Table 1
Descriptive statistics (median and interquartile range, in parentheses) of the predictor variables in each AOI and the sample size n for landslide and non-landslide points. Values are given as median (IQR) for landslide/non-landslide points.

Variable                     Flysch (n = 285)             Molasse (n = 261)            Austroalpine (n = 193)
Slope angle (deg)            21 (9)/15 (8)                15 (7)/7 (5)                 28 (8)/27 (12)
Elevation (m)                507 (95)/541 (124)           291 (28)/310 (47)            537 (167)/666 (185)
Profile curv. (10⁻³ m⁻¹)     −4.8 (21.4)/0.5 (6.8)        0.4 (18.5)/0.0 (2.8)         −9.7 (23.7)/−0.6 (9.1)
Plan curv. (10⁻³ m⁻¹)        −6.1 (19.5)/0.4 (6.0)        −2.1 (12.5)/0.2 (3.2)        −11.6 (24.6)/0.7 (9.1)
C. area (log10 m²)           3.4 (0.4)/3.2 (0.5)          3.0 (0.4)/3.0 (0.5)          3.3 (0.5)/3.0 (0.5)
C. height (m)                32 (24)/21 (22)              9 (8)/8 (10)                 45 (37)/30 (40)
Conv. Ind.                   −5.1 (18.7)/1.3 (25.4)       −3.2 (23.1)/2.3 (28.5)       −5.5 (18.2)/4.2 (25.2)
TWI                          8.5 (1.3)/8.6 (1.5)          8.5 (1.5)/9.4 (2.3)          8.2 (1.1)/7.7 (1.4)
South (+1)–north (−1)        −0.45 (1.24)/−0.11 (1.50)    0.65 (1.33)/0.40 (1.49)      −0.25 (1.3)/0.23 (1.6)
East (+1)–west (−1)          0.19 (1.14)/−0.03 (1.22)     −0.15 (1.02)/−0.01 (1.21)    0.50 (0.95)/0.09 (1.27)
SDS                          2.5 (1.9)/1.5 (1.5)          2.8 (2.6)/0.86 (1.4)         2.4 (2.1)/1.6 (1.9)
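The entries of Table 1 are plain medians and interquartile ranges computed separately for the landslide and non-landslide samples. A minimal sketch (using the simple median-split quartile rule, which can differ slightly from R's default quantile algorithm):

```python
def median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def median_iqr(values):
    """(median, IQR) using the median-split rule for the quartiles:
    the IQR is the median of the upper half minus that of the lower half."""
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2:]
    return median(s), median(upper) - median(lower)
```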

2.4. Susceptibility modeling techniques

Six statistical and machine learning techniques are compared in this study: a generalized linear model (GLM) with stepwise variable selection (Hosmer and Lemeshow, 2000); the generalized additive model with stepwise variable selection (GAM; Hastie and Tibshirani, 1990); weights of evidence (WOE; Bonham-Carter, 1994); the support vector machine (SVM; Moguerza and Muñoz, 2006); random forest (RF; Breiman, 2001); and bootstrap aggregated classification trees (bundling) with penalized linear discriminant analysis (BPLDA; Hastie et al., 1995; Hothorn and Lausen, 2005). While each of these methods could potentially be used with a variety of settings and procedures for model selection, we choose configurations that we consider typical for most of their applications.

The GLM with a logistic link function, or logistic regression, is the most common statistical technique for prediction of landslide susceptibility (Brenning, 2005). It was first applied to landslide susceptibility modeling by Atkinson and Massari (1998) and Guzzetti et al. (1999). In contrast, generalized additive models (GAM) have recently been applied for landslide susceptibility mapping (Goetz et al., 2011; Muenchow et al., 2012). A GAM is a semi-parametric nonlinear extension of a GLM (Hastie and Tibshirani, 1990). In our application of the GLM and GAM we use forward–backward stepwise variable selection based on the Akaike Information Criterion (AIC; Akaike, 1974).

Weights of evidence (WOE) is a non-linear statistical technique based on the log-linear form of the Bayesian probability model (Bonham-Carter, 1994). Early application of WOE for landslide susceptibility modeling was completed by Lee et al. (2002). The weights, which represent the statistical relationship of the predictor variable to the response, are defined as the natural logarithm of the relative risk, i.e. the ratio of the conditional probabilities for the presence to absence of a response, for a set of discrete, categorical variables. The conditional probabilities are calculated using Bayes' theorem. This technique requires continuous variables to be classified into a set of categories. We use an automatic approach that 'bins' the data into quartiles, which is believed to provide a reasonable trade-off between model flexibility (more bins) and sufficient data availability for estimating the weights. The WOE method assumes predictor variables are conditionally independent (Bonham-Carter, 1994). Instead of statistically testing these assumptions (Bonham-Carter, 1994; Agterberg and Cheng, 2002), which is rarely done in practical applications and would require additional questionable assumptions such as spatial independence, we use this technique as a purely predictive method and interpret predicted 'probabilities' only as a relative index of susceptibility (Neuhäuser and Terhorst, 2007).

The support vector machine (SVM) is a machine learning technique that is based on discrimination of classes in a high-dimensional feature space that is generated through nonlinear transformations of the predictors (Vapnik, 1998). In this high-dimensional space, a decision hyperplane is computed to separate prediction classes. Brenning (2005) demonstrated the potential use of SVM for susceptibility modeling. Our implementation of SVM used the default parameter settings of the R package e1071; the regularization parameter was C = 1, and the kernel bandwidth was γ = 1/p, where p is the number of predictors. SVM parameter tuning was not performed since this process does not necessarily improve model performance and may result in poorly defined, or highly variable, optimal parameters when comparing different cross-validation repetitions (Brenning et al., 2009, 2012).

Classification trees are a nonlinear technique for predicting a response using a set of binary decision rules that determine class assignment based on the predictors (Breiman et al., 1984). Random forest (RF) is an ensemble technique that utilizes many classification trees (a 'forest') to stabilize the model predictions (Breiman, 2001). These trees are fitted to resamples of the observations that are selected randomly with replacement (bootstrap resampling). Each decision of the tree is furthermore based on randomly selected predictors. The prediction of class assignment is determined by the majority voting among all trees, and the proportion of trees in the ensemble that predict landslide presence can be used as an index of landslide susceptibility. RF was recently applied for landslide susceptibility modeling by Ließ et al. (2011).

Bundling is another ensemble classification tree technique (Hothorn and Lausen, 2005). Like RF, bundling uses a bootstrap-aggregation approach; however, an ancillary classifier is trained on the part of the training set that is not included in the bootstrap resample. This ancillary classifier is then used as an additional predictor in constructing a tree. In this study we use penalized linear discriminant analysis (PLDA) as the ancillary classifier with the bundling approach (BPLDA). PLDA is a discriminant analysis technique designed for high-dimensional data and correlated predictors. It avoids overfitting by applying smoothing constraints on the coefficients of the predictor variables (Hastie et al., 1995). Bundling with PLDA has not yet been applied for landslide susceptibility modeling; however, Brenning (2009) utilized it for another geomorphological classification context, and bundling with other ancillary classifiers was included in a model comparison by Brenning (2005).

2.4.1. Assessing prediction performance
The performances of the susceptibility models were estimated with a repeated k-fold spatial cross-validation approach (Brenning et al., 2012). This approach is similar to k-fold cross-validation,

where the data is randomly partitioned into k disjoint sets, and one set at a time is used for model testing while the combined remaining k − 1 sets are used for model training (e.g., James et al., 2013). However, instead of partitioning the dataset into k random subsets, spatial cross-validation splits the data into spatially disjoint sub-areas. In this study, these partitions were formed with the k-means clustering algorithm (Ruß and Brenning, 2010). The estimation of model performance was repeated 20 times using k = 5 spatial cross-validation folds. Each model training and testing was based on the commonly applied 1:1 sampling strategy of presence to absence of landslide initiation (Heckmann et al., 2014). Other sampling strategies have been investigated by Heckmann et al. (2014) and Regmi et al. (2014) for landslide susceptibility modeling using logistic regression; however, they both conclude that the sampling ratio of presence to absence of landslides does not significantly influence the prediction accuracies.

The performance measure estimated with spatial cross-validation was the area under the receiver operating characteristic (ROC) curve (AUROC). The ROC curve plots all possible true positive rates (TPR; sensitivity) against the corresponding false positive rates (FPR; 1 − specificity). AUROC values close to 50% indicate no discrimination, while an AUROC close to 100% indicates perfect discrimination between binary prediction classes. The AUROC is independent of a specific decision threshold (Zweig and Campbell, 1993; Beguería, 2006). In addition, to assess the ability of the models to predict the occurrence of landslide initiation while correctly classifying most non-landslide locations as stable, the TPR at a low FPR of 10% (or a high specificity of 90%) was measured (Brenning, 2012a).

The differences in model performance were tested for statistical significance to show that these differences may not only be caused by random variability. The statistical comparison of AUROC and TPR at 10% FPR performances for different susceptibility models was based on the non-parametric Kruskal–Wallis test for systematic differences among a set of variables. The Wilcoxon–Mann–Whitney rank sum test was then applied to individually detect differences in model performances. An adjustment for multiple comparisons was applied using the Benjamini–Hochberg procedure to control the false discovery rate, i.e. the expected proportion of falsely rejected null hypotheses among all rejected null hypotheses (FDR; Benjamini and Hochberg, 1995; Brenning, 2009).

The modeling and statistical analysis was conducted entirely in R, a free software environment for statistical computing (version 3.0.0; R Development Core Team, 2013), with the contributed packages 'sperrorest' (Brenning, 2012b), 'e1071' (Dimitriadou et al., 2007; Chang and Lin, 2011), 'gam' (Hastie, 2009), 'ipred' (Peters and Hothorn, 2009), 'mda' (Hastie and Tibshirani, 2009), 'randomForest' (Breiman and Cutler, 2012), 'raster' (Hijmans and van Etten, 2013), and 'ROCR' (Sing et al., 2009).

2.4.2. Estimating which predictors are important
A permutation-based variable accuracy importance approach, which computes how much a performance measure deteriorates when an individual variable is randomly permuted (i.e., 'messed up'; Strobl et al., 2007), was used to assess variable importance across all model types. As applied in a geomorphological classification study by Brenning et al. (2012), we measured variable importance in a spatial cross-validation context. This permutation-based spatial variable importance (SVI) measure was calculated for the change in median AUROC values estimated by spatial cross-validation. In this approach, one predictor variable was permuted ten times for each test partition for a total of 1000 permutations per predictor, and the AUROC of the prediction for each permutation was measured and compared to the unperturbed predictive performance. The SVI was standardized for each model by dividing it by the highest value obtained for an individual predictor. Therefore, the SVI values range from 0 to 1, 0 being low relative importance, and 1 being high relative importance.

2.4.3. Mapping landslide susceptibility
Maps of the model predictions were produced for each AOI and modeling technique to facilitate a visual comparison of model outputs. Since the models were fitted using a 1:1 sampling ratio of landslide to non-landslide points, the predicted unadjusted probabilities should be interpreted as relative scores (see Petschko et al., 2014 for possible adjustments). Since these models are typically classified into levels of susceptibility, the probabilities were classified into five classes representing the relative potential for landslide initiation. An approach based on an equal percentage of overall area for each class was applied in this study to facilitate the visual comparison of the output predictions (Chung and Fabbri, 2003). The probabilities were classified based on the 50th, 75th, 90th and 95th percentiles of the prediction values.

3. Results

3.1. The predictive performances

The variation in median AUROC model performances between the models was relatively low for the Flysch and Molasse AOIs, where differences in median AUROC between models were only up to 2.9 and 3.6 percentage points (pp; Table 2). In contrast, differences in median AUROC between models in the Austroalpine AOI were up to 8.9 pp. The tree-based ensemble techniques, RF and BPLDA, had the highest model performance based on median AUROC values for all three AOIs (Figs. 2 and 3, Table 2). RF was overall the highest ranked model based on median AUROC and median TPR at 10% FPR. It achieved the highest median AUROC for the Flysch (86.3%) and Molasse (93.0%) areas, and the lowest IQR of AUROC for the AOIs. RF also had the highest median TPR at 10% FPR estimation for all AOIs: Flysch 64.3%, Molasse 79.1% and Austroalpine 56.5%. The model performance of BPLDA was similar to RF and always achieved a top three estimation of median AUROC and median TPR at 10% FPR for all AOIs, usually just behind RF. The rank in performances of WOE, SVM, and GAM was much more variable than RF and BPLDA. GAM had the second highest median TPR at 10% FPR in the Flysch area (61.5%); however, it had the second lowest median TPR at 10% FPR for the remaining AOIs. WOE usually performed in the middle of the pack just behind SVM. The lowest model performance was consistently achieved by the GLM, which includes having the lowest estimated AUROC and the highest IQR values of all models in the AOIs.

Overall, the differences in AUROC performance were not only small, but in many cases they were insignificant (Table 2). By estimating the significance of AUROC differences with a pairwise comparison, it was observed in all areas that there were no significant systematic differences in AUROC values for RF and BPLDA, and SVM and WOE. Also, it was observed that the GLM AUROC performance was always significantly lower than the other models in all AOIs.

3.2. A ranking of predictor importance

The ranking of relative variable importance was substantially different for all modeling techniques and AOIs. However, there was some consistency in the set of highest ranked predictors for each area (Table 3). Slope angle, surface roughness (SDS) and plan curvature were the only variables that were ranked in the top five based on maximum SVI in all of the AOIs (SVI > 0.28). The highest-ranked predictor based on maximum SVI was always slope angle (SVI = 1.00). The most consistently lower-ranked variables

Table 2
Model performance estimated with 20-repeated 5-fold spatial cross-validation. The median summarizes the central tendency of the estimated performance and the interquartile range (IQR) its spread. The systematic difference (Δ), in percentage points (pp), and the associated significance of model performance (AUROC and TPR at 10% FPR) were measured for pairwise comparisons. The significance of systematic differences was based on the adjusted p-values (to control the FDR) corresponding to the Wilcoxon–Mann–Whitney tests.

           AUROC                                          TPR at 10% FPR
Model      Median (IQR)   Δ AUROC   p-value       Model   Median (IQR)   Δ TPR    p-value

FLYSCH
GLM        83.4 (6.4)                              GLM     53.2 (25.8)
SVM        84.6 (6.0)     +1.3      p = 0.030 *    WOE     54.4 (22.5)   +1.2     p = 0.025 *
WOE        84.9 (6.0)     +0.3      p = 0.468      SVM     58.9 (18.8)   +4.6     p = 0.012 *
GAM        85.1 (5.4)     +0.2      p = 0.155      BPLDA   59.2 (12.5)   +0.3     p = 0.840
BPLDA      85.3 (5.2)     +0.1      p = 0.471      GAM     61.5 (25.3)   +2.2     p = 0.980
RF         86.3 (4.1)     +1.0      p = 0.094 ·    RF      64.3 (18.7)   +2.8     p = 0.034 *

MOLASSE
GLM        89.4 (7.7)                              GLM     64.1 (28.3)
GAM        91.9 (7.3)     +2.5      p < 0.001 ***  GAM     75.2 (29.9)   +11.1    p = 0.012 *
WOE        92.1 (7.0)     +0.2      p = 0.358      SVM     76.5 (33.1)   +1.3     p = 0.911
SVM        92.3 (6.9)     +0.3      p = 0.900      BPLDA   76.7 (25.3)   +0.2     p = 0.747
BPLDA      92.5 (5.7)     +0.1      p = 0.809      WOE     77.8 (35.7)   +1.1     p = 0.911
RF         93.0 (4.9)     +0.5      p = 0.707      RF      79.1 (19.8)   +1.3     p = 0.747

AUSTROALPINE
GLM        74.7 (12.1)                             GLM     43.9 (26.1)
GAM        76.6 (9.9)     +2.0      p = 0.092 ·    GAM     46.6 (22.0)   +2.7     p = 0.499
WOE        80.1 (9.0)     +3.5      p = 0.015 *    WOE     50.0 (20.7)   +3.4     p = 0.103
SVM        80.4 (9.2)     +0.3      p = 0.878      SVM     50.0 (23.5)   +0.0     p = 0.464
RF         83.5 (5.5)     +3.2      p < 0.001 ***  BPLDA   55.9 (18.7)   +5.9     p = 0.008 **
BPLDA      83.6 (6.8)     +0.0      p = 0.977      RF      56.5 (16.2)   +0.6     p = 0.926

Significance codes for adjusted p-values: p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", p < 0.1 "·", p > 0.1 " ".

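The pairwise significance testing behind Table 2 — Wilcoxon–Mann–Whitney tests on the repeated cross-validation estimates, with Benjamini–Hochberg control of the false discovery rate — can be sketched as follows. This is a minimal Python illustration with synthetic AUROC values, not the authors' R code; the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjustment of p-values (controls the FDR)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / (np.arange(n) + 1)
    # enforce monotonicity from the largest p-value downwards
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Synthetic stand-in for AUROC estimates from 20-repeated 5-fold CV (100 values)
rng = np.random.default_rng(1)
auroc_glm = rng.normal(0.85, 0.02, size=100)
auroc_rf = rng.normal(0.87, 0.02, size=100)

# Two-sided Wilcoxon-Mann-Whitney test of the systematic difference
_, p_raw = mannwhitneyu(auroc_glm, auroc_rf, alternative="two-sided")
# Adjusted jointly with (hypothetical) p-values from other pairwise comparisons
p_adj = bh_adjust([p_raw, 0.04, 0.20])
```

The adjustment is applied across all pairwise comparisons at once, which is what keeps the family-wise false discovery rate under control when many model pairs are tested.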
(SVI ≤ 0.15) were the convergence index (Conv. Ind.), south–north slope aspect and catchment height (C. height). Slope angle (max. SVI = 1.00), catchment area (C. area; 1.00), and plan curvature (0.83) were the highest ranked variables in the Flysch area. In the Molasse area, slope (1.00), surface roughness (0.75) and elevation (0.60) were the highest ranked variables; and in the Austroalpine area they were slope (1.00), profile curvature and surface roughness (1.00).

There was no pattern related to how SVI was distributed for each modeling technique across the three AOIs. However, SVI was distributed more uniformly for the AOIs with lower AUROC values. The Austroalpine area, which had the lowest AUROC performances (mean median-AUROC = 79.8), had the largest number of variables with an SVI > 0.15 (mean = 6). The Flysch area had a mean median-AUROC of 84.9 and a mean of 5 variables with an SVI > 0.15. The highest AUROC performances (mean median-AUROC = 91.9) in the Molasse area had generally the lowest spread of variables with SVI > 0.15 (mean = 4).

Correlations between predictor variables were examined with the Spearman rank coefficient (ρSp). In each AOI, catchment area was strongly correlated with catchment height, convergence index and TWI (0.59 ≤ |ρSp| ≤ 0.85); plan and profile curvature were moderately to strongly correlated (0.59 ≤ |ρSp| ≤ 0.85); and catchment height, convergence index and topographic wetness were moderately correlated (0.41 ≤ |ρSp| ≤ 0.68). Slope angle was only moderately correlated with other predictors, TWI (ρSp = −0.65) and SDS (ρSp = 0.66), in the Molasse AOI.
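A correlation screening of predictors like the one reported above can be reproduced with the Spearman rank coefficient. The sketch below uses synthetic data with hypothetical variable names, not the study's actual terrain attributes:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 300

# Synthetic terrain attributes: catchment height constructed to co-vary with TWI
twi = rng.normal(size=n)
c_height = 0.8 * twi + 0.3 * rng.normal(size=n)
slope = rng.normal(size=n)  # independent of the other two

# Spearman's rho is rank-based, so it captures monotone (not just linear) association
rho_strong, _ = spearmanr(twi, c_height)  # strong monotone association
rho_weak, _ = spearmanr(twi, slope)       # near zero
```

Because the coefficient works on ranks, it is robust to the skewed distributions typical of terrain attributes such as catchment area, which is one reason to prefer it over Pearson correlation for this kind of screening.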
Fig. 2. Box-and-whisker plot of area under the receiver operating characteristic curve (AUROC %) estimated for each prediction technique applied to landslide susceptibility
modeling in different AOIs.
Fig. 3. Box-and-whisker plot of true positive rates (TPR %) estimated for each prediction technique at a 10% false positive rate (FPR %) applied to landslide susceptibility
modeling in different AOIs.
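The TPR at a fixed 10% FPR reported in Fig. 3 is simply the ROC curve evaluated at one operating point. A minimal sketch using scikit-learn conventions and synthetic susceptibility scores (not the study's models):

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(y_true, scores, target_fpr=0.10):
    """Read the true positive rate off the ROC curve at a fixed false positive rate."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    # linear interpolation between the ROC curve's vertices
    return float(np.interp(target_fpr, fpr, tpr))

# Synthetic scores: landslide cells (label 1) tend to score higher than stable cells
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 200)
scores = np.concatenate([rng.normal(0.3, 0.15, 200), rng.normal(0.7, 0.15, 200)])

detection = tpr_at_fpr(y, scores)  # share of landslides detected at 10% false alarms
```

Varying `target_fpr` traces out the trade-off discussed in Section 4.1: the measure weights the assessment toward detecting landslides in areas where the absence of landslides is already predicted well.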

3.3. Comparing susceptibility map appearances

Modeling techniques whose predictions are a continuous function of the predictors (GLM, GAM, SVM) had much smoother prediction surfaces in the landslide susceptibility maps (Fig. 4). WOE, RF and BPLDA, in contrast, produced more heterogeneous prediction surfaces. In particular, the RF and BPLDA models had more spatial artifacts than the other techniques, which made the prediction surface appear rather noisy. Abrupt changes in the prediction surface related to actual categorical predictors (i.e., lithology) were not present in the maps.
Table 3
Spatial variable importance (SVI) based on median AUROC values from spatial cross-validation. The values are standardized relative to the most important predictor variable for each model.

Variable       Rank   Max. SVI   GAM     GLM     WOE     SVM     RF      BPLDA

FLYSCH
Slope angle     1     1.00       0.74    0.68    1.00    1.00    1.00    1.00
C. area         2     1.00       1.00    1.00    0.11    0.46    0.19    0.52
Plan curv.      3     0.83       0.28    0.33    0.56    0.40    0.83    0.67
SDS             4     0.70       0.52    0.25    0.61    0.70    0.69    0.51
TWI             5     0.61       0.40    0.61    0.04    0.08    0.12    0.12
Profile curv.   6     0.31       0.02    0.02    0.28    0.07    0.31    0.21
East–west       7     0.17       0.00   −0.01    0.00    0.15    0.14    0.17
Elevation       8     0.10       0.08    0.10    0.05    0.08    0.03    0.09
South–north     9     0.09       0.09    0.08    0.05   −0.02   −0.01    0.00
Conv. Ind.     10     0.06       0.02    0.00   −0.07    0.01    0.06    0.05
C. height      11     0.04       0.03    0.00    0.04   −0.02    0.04   −0.01
Lithology      12     0.03       0.00    0.00    0.03    0.00    0.00   −0.01

MOLASSE
Slope angle     1     1.00       1.00    1.00    1.00    1.00    1.00    1.00
SDS             2     0.75       0.05    0.02    0.75    0.17    0.17    0.12
Elevation       3     0.60       0.25    0.17    0.60    0.25    0.26    0.31
TWI             4     0.49       0.13    0.42    0.49    0.29    0.11    0.14
Plan curv.      5     0.28       0.01    0.02    0.28    0.05    0.07    0.03
Profile curv.   6     0.26       0.01    0.00    0.26    0.02    0.03    0.08
Lithology       7     0.20       0.04    0.06    0.20    0.05    0.02    0.04
C. height       8     0.06       0.00    0.00    0.06    0.02    0.01    0.02
Conv. Ind.      9     0.04       0.00    0.02    0.02    0.04    0.01    0.00
South–north    10     0.03       0.00    0.00    0.03    0.03    0.01    0.02
C. area        11     0.02       0.02    0.02    0.00    0.00    0.01    0.01
East–west      12     0.00       0.00    0.00   −0.01   −0.01    0.00    0.00

AUSTROALPINE
Slope angle     1     1.00       1.00    1.00    0.43    0.79    0.39    0.48
Profile curv.   2     1.00       0.03    0.00    1.00    0.49    1.00    1.00
SDS             3     1.00       0.38    0.15    0.44    1.00    0.45    0.26
Plan curv.      4     0.86       0.78    0.86    0.78    0.77    0.80    0.61
Elevation       5     0.53       0.39    0.53    0.14    0.51    0.22    0.20
C. area         6     0.48       0.41    0.43   −0.02    0.48    0.10    0.17
TWI             7     0.29       0.22    0.00   −0.05    0.29    0.11    0.07
East–west       8     0.18       0.11    0.13    0.18    0.04   −0.01    0.00
Conv. Ind.      9     0.15       0.04    0.01    0.04    0.12    0.15    0.08
South–north    10     0.09       0.02    0.09   −0.04    0.04    0.03    0.06
C. height      11     0.01       0.00    0.01   −0.05   −0.16   −0.10   −0.07
Lithology      12     0.00      −0.04   −0.01   −0.03   −0.04    0.00   −0.01

The three highest SVI values for each model area are printed in boldface.
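The permutation approach behind the SVI values in Table 3 — permute one predictor at a time in the test partitions, record the resulting AUROC drop, and standardize by the maximum — can be sketched as follows. This is a minimal Python illustration with ordinary (non-spatial) k-fold partitions and synthetic data; the study itself used spatial cross-validation in R, and all variable names here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

# Synthetic stand-ins for terrain predictors; only column 0 ("slope") is informative
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

def permutation_svi(model, X, y, n_splits=5, n_perm=10):
    """Mean AUROC drop per predictor when permuted in the test partitions,
    standardized relative to the most important predictor."""
    drops = np.zeros(X.shape[1])
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        model.fit(X[train], y[train])
        base = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
        for j in range(X.shape[1]):
            for _ in range(n_perm):
                Xp = X[test].copy()
                Xp[:, j] = rng.permutation(Xp[:, j])  # break link to the response
                drops[j] += base - roc_auc_score(y[test], model.predict_proba(Xp)[:, 1])
    return drops / drops.max()

svi = permutation_svi(RandomForestClassifier(n_estimators=100, random_state=0), X, y)
```

Permuting a predictor destroys its association with the response while preserving its marginal distribution, so the AUROC drop isolates that predictor's contribution; values near zero (or slightly negative) indicate a variable the fitted model does not rely on.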
Fig. 4. An example of classified landslide susceptibility maps for each prediction technique in the Austroalpine AOI.

4. Discussion

4.1. Evaluating prediction performance

With all of the available prediction techniques and methods for model selection, it is always critical to remember that in reality no one model is correct in a suite of competing models (Elith et al., 2002). Therefore, we can only base our criteria and decision to select a model on the specific scientific goals of the study (Elith and Leathwick, 2009). In this paper, the spatial prediction ability of empirically based landslide susceptibility models was investigated while using typical model configurations. Clearly, as is the case for any complex algorithm, different model or data configurations may lead to different results, and it is beyond the scope of this study to evaluate these possible differences. Among the factors that may influence model performances are, in particular, feature selection procedures (Nguyen and de la Torre, 2010), model setup (e.g., SVM kernel choice: Moguerza and Muñoz, 2006), sampling design (Mathur and Foody, 2008) or preprocessing of predictors (Xu et al., 2014).

Apart from the GLM, or logistic regression, the AUROC performances of the prediction techniques appear to be very similar. Previous comparisons of prediction techniques also found no or only small differences that were significant in the prediction of landslide susceptibility (Yilmaz, 2009, 2010; Goetz et al., 2011). Consequently, we cannot declare which model is best solely on the
performance measures unless they had significant and practical differences. In general, the ability of many machine learning prediction techniques to represent complex nonlinear relationships and higher-order interactions is similar. This can result in statistically insignificant results when comparing their performances (Han and Kamber, 2006).

The small differences in AUROC performance can have some practical significance depending on the prediction requirements for a particular application (Beguería, 2006). In the case of landslide risk management, a higher detection rate of landslides in areas where the detection of the absence of landslides is already high can translate to an improved model prediction for high risk areas (Goetz et al., 2011; Brenning, 2012a). The TPR at 10% FPR is an example of an AUROC cutoff measure that can be used to assess such a specific prediction requirement (Goetz et al., 2011). For example, the difference between the best and worst median AUROC performances was 2.9–9 pp, depending on the AOI (Table 2). These differences in median AUROC resulted in an increase of 11–15 pp in the TPR at 10% FPR. That is, 21–29% more landslides were being detected in potential high risk areas.

The TPR at 10% FPR is only one such measure that we decided to present in this paper. The ROC curve can be utilized to weigh the assessment toward either high accuracy of predicting the presence of landslides or identifying the most stable areas (Gorsevski et al., 2006). More balanced decisions regarding class determination can be obtained utilizing the ROC curve, since the specific accuracy of various cut-off combinations can be evaluated with this curve. Therefore, the predictive costs of the susceptibility classes can be clearly defined (Beguería, 2006; Gorsevski et al., 2006).

In addition to evaluating the prediction performance with a specific accuracy measure, Guzzetti et al. (2006) suggested that the model that is least sensitive to variation in performance given different sampling conditions should be considered. We have accounted for variation in model performance by reporting the interquartile range (IQR) of the cross-validation results. Lower IQR values indicated more robust model performances. Therefore, when comparing two of the highest performing prediction techniques, random forest (RF) and bundling with penalized linear discriminant analysis (BPLDA), we can suggest that random forest was the better technique for landslide initiation prediction based on its lower IQR.

4.2. Considering other model criteria

Additional criteria can also be used to support decisions in model selection. This study focused on assessing the performance of models for spatial prediction. However, in addition to spatial prediction performance, the ability to interpret the model for statistical inference (i.e. spatial analysis) to gain insight into landslide distribution characteristics may also be important (Brenning, 2012a). In that situation a statistically valid model is critical, and good predictive properties are less important (Brenning, 2005, 2012a). Interpretation of models is more complex for machine learning algorithms, which are generally considered to be 'black-box' models (Elith and Leathwick, 2009). The GLM and GAM, and with some limitations also WOE, are prediction techniques that provide easily interpretable results that can shed light on landslide conditioning factors (Lee et al., 2002; Brenning, 2005; Regmi et al., 2010; Goetz et al., 2011). Especially the integration of physically motivated predictors or model components can facilitate the identification of possible causal mechanisms (Goetz et al., 2011). The machine learning techniques, such as SVM, RF and BPLDA, have been developed especially for prediction. They have the advantage of automatically detecting interactions between predictors; thus, their prediction accuracy typically exceeds that of more conventional techniques when complex interactions are present (Elith et al., 2006).

In addition to careful validation of model performance, there are also qualitative features that could be of importance to the end user. For example, models with similar prediction performance do not necessarily have similar prediction surfaces (Sterlacchini et al., 2011). The geographic representation of susceptibility levels may affect the way the mapped results are interpreted. For example, isolated grid cells corresponding to a heterogeneous prediction surface, which are predicted as being possibly unstable, may affect planning decisions (also for surrounding stable terrain). It is generally easier to clearly define hazardous zones for application in planning when the prediction surface is smooth. In practice, the appearance of the prediction surface can also influence the end user's perception of the method, including their trust in the model. Heterogeneous prediction surfaces are sometimes misinterpreted by end users as being associated with a poor prediction of landslide susceptibility, which is not necessarily true. In this study, the prediction performance of the RF, which does not produce a smooth prediction surface and is more prone to spatial artifacts (Brenning, 2005, 2012a), was in the majority of cases better than that of models that produced much smoother prediction surfaces, such as the GLM, GAM, and SVM. It is also important to mention that abrupt changes in the prediction surface can be caused in any of the methods investigated in this paper when a categorical variable, such as lithology, has a strong effect in the model.

4.3. Comparing the importance of predictor variables

A comparison of model behavior was roughly gauged with a standardized approach for comparing the relative importance of predictors for each prediction technique. In this paper, it was observed with the SVI measure that the importance of predictors, which in this case are based primarily on geomorphic conditions, differed for each AOI. This relationship should be expected because each AOI is associated with generally unique geomorphological conditions. Petschko et al. (2014) observed a similar finding when analyzing variable-selection frequencies of different GAMs applied individually to lithology units in Lower Austria; the importance of variables was different for each lithology unit. It has been well established that local site conditions have an important role in the prediction of landslide susceptibility (van Westen et al., 2003; Sidle and Ochiai, 2006; Lee et al., 2007; Blahut et al., 2010). The dissimilar rank of predictor variables between the sites provided additional quantitative evidence to support this relationship.

The ranking of variable importance was irregular when comparing prediction techniques. Yet, the highest ranked variables were generally consistent. Slope angle, surface roughness, and plan curvature were the most common variables ranking high in terms of relative importance in all AOIs (Table 3). The availability of geologic data was only important for the prediction using WOE in the Molasse area, which had the most heterogeneity in lithology. The lack of importance of geology may be attributed to regression dilution bias related to the thematic coarseness of the lithology (Carrol et al., 2012); more detailed geologic data, in terms of scale, may be required for the size of the AOIs. Regmi et al. (2010) suggested that only a handful of variables may be sufficient for the prediction of landslide susceptibility. The uneven distribution of the SVI values observed in this study may indicate just that.

The different ranking of variables between prediction techniques should be expected. Yesilnacar and Topal (2005) made a similar observation when comparing a GLM to artificial neural networks (ANN), a machine learning technique, for landslide susceptibility modeling. Although the prediction techniques had similar performances, they are unique in their individual approaches to model construction and to establishing the relevant relationships between the predictor variables and landslide initiation. Understanding these differences is essential to select a
suitable prediction technique for a specific study goal (Brenning, 2009, 2012a).

5. Conclusions

Our study demonstrates that there was generally little differentiation in prediction performance between the statistical and machine learning landslide susceptibility modeling techniques applied to the AOIs in Lower Austria. This result underlines that even when conducting model comparisons with a clear objective, such as prediction performance, understanding the abilities and limitations of each method remains critical for model selection. In terms of pure prediction performance, the RF and BPLDA modeling techniques were the best. The most interpretable and visually appealing method (i.e. it had a smooth prediction surface) was the GAM, which performed significantly better than the GLM. The SVM also had a smooth prediction surface, but is generally difficult to interpret. However, the SVM, RF and BPLDA may be particularly useful for high-dimensional prediction problems where a large number of highly correlated predictor variables are present.

Overall, it is recommended that model evaluation should be tied closely to the goals of the study. The framework of this paper is designed to assist in the evaluation of susceptibility modeling techniques to enhance a user's decision on which method is most suitable for a particular application.

Acknowledgements

The data for this research was provided by the Provincial Government of Lower Austria from the MoNOE project – Method development for landslide susceptibility maps for Lower Austria. The authors are grateful for contributions of students from the Department of Geography and Regional Research, University of Vienna, under the supervision of Dr. Rainer Bell and Dr. Thomas Glade, and colleagues at the Austrian Institute of Technology GmbH (AIT) in the Health and Environment Department, for construction of the landslide inventory. R. Cabrera implemented the WOE method during a stay at the AIT that was partly supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (No. 355764-2013) awarded to A. Brenning. We would also like to express our gratitude to the anonymous reviewers for their constructive comments that helped to improve the paper.

References

Agterberg, F.P., Cheng, Q., 2002. Conditional independence test for weights-of-evidence modeling. Nat. Resour. Res. 11 (4), 249–255.
Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723.
Atkinson, P.M., Massari, R., 1998. Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput. Geosci. 24 (4), 373–385.
Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko mountains, Central Japan. Geomorphology 65 (1), 15–31.
Beguería, S., 2006. Validation and evaluation of predictive models in hazard assessment and risk management. Nat. Hazards 37 (3), 315–329.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.), 289–300.
Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 24, 43–69.
Blahut, J., van Westen, C.J., Sterlacchini, S., 2010. Analysis of landslide inventories for accurate prediction of debris-flow source areas. Geomorphology 119 (1), 36–51.
Bonham-Carter, G., 1994. Geographic Information Systems for Geoscientists: Modelling with GIS. Computer Methods in the Geosciences, vol. 13. Pergamon Press, p. 398.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. CRC Press, Wadsworth.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Breiman, L., Cutler, A., 2012. Breiman and Cutler's random forests for classification and regression (randomForest). R package version 4.6-7, R port by A. Liaw & M. Wiener.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Nat. Hazards Earth Syst. Sci. 5, 853–862.
Brenning, A., Trombotto, D., 2006. Logistic regression modeling of rock glacier and glacier distribution: topographic and climatic controls in the semi-arid Andes. Geomorphology 81 (1), 141–154.
Brenning, A., 2009. Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection. Remote Sens. Environ. 113, 239–247.
Brenning, A., 2012a. Improved spatial analysis and prediction of landslide susceptibility: practical recommendations. In: Eberhardt, E., Froese, C., Turner, A.K., Leroueil, S. (Eds.), Landslides and Engineered Slopes: Protecting Society through Improved Understanding. Proceedings of the 11th International and 2nd North American Symposium on Landslides and Engineered Slopes, vol. 1, Banff, Canada, 3–8 June 2012. CRC Press/Balkema, Leiden, the Netherlands, pp. 789–794.
Brenning, A., 2012b. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23–27 July 2012, pp. 5372–5375.
Brenning, A., Long, S., Fieguth, P., 2012. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sens. Environ. 125, 227–237.
Carrara, A., Guzzetti, F., Cardinali, M., Reichenbach, P., 1999. Use of GIS technology in the prediction and monitoring of landslide hazard. Nat. Hazards 20 (2–3), 117–135.
Carrol, R.J., Rupert, D., Stefanski, L.A., Crainiceanu, C.M., 2012. Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. CRC Press, New York, p. 488.
Chang, C.C., Lin, C.J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2 (3), 1–27.
Chung, C.J.F., Fabbri, A.G., 1999. Probabilistic prediction models for landslide hazard mapping. Photogramm. Eng. Remote Sens. 65 (12), 1389–1399.
Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Nat. Hazards 30 (3), 451–472.
Conrad, O., 2006. SAGA – program structure and current state implementation. In: Böhner, J., McCloy, K.R., Strobl, J. (Eds.), SAGA — Analysis and Modelling Applications, vol. 115. Göttinger Geographische Abhandlungen, pp. 39–52.
Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 42, 213–228.
Dai, F.C., Lee, C.F., Ngai, Y.Y., 2002. Landslide risk assessment and management: an overview. Eng. Geol. 64 (1), 65–87.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A., 2007. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-1.
Elith, J., Burgman, M.A., Regan, H., 2002. Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Modell. 157 (2), 313–329.
Elith, J., et al., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Leathwick, J.R., 2009. Species distribution models: ecological explanation and prediction across space and time. Annu. Rev. Ecol. Evol. Syst. 40, 677–697.
Frattini, P., Crosta, G., Carrara, A., 2010. Techniques for evaluating the performance of landslide susceptibility models. Eng. Geol. 111, 62–72.
Glenn, N.F., Streutker, D.R., Chadwick, D.J., Thackray, G.D., Dorsch, S.J., 2006. Analysis of LiDAR-derived topographic information for characterizing and differentiating landslide morphology and activity. Geomorphology 73 (1), 131–148.
Goetz, J.N., Guthrie, R.H., Brenning, A., 2011. Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology 129 (3), 376–386.
Gottschling, P., 2006. Massenbewegungen. In: Geologie der Bundesländer – Niederösterreich, vol. 2. Geologische Bundesanstalt, Wien, pp. 335–340.
Gorsevski, P.V., Gessler, P.E., Foltz, R.B., Elliot, W.J., 2006. Spatial prediction of landslide hazard using logistic regression and ROC analysis. Trans. GIS 10 (3), 395–415.
Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., Galli, M., 2006. Estimating the quality of landslide susceptibility models. Geomorphology 81 (1), 166–184.
Han, J., Kamber, M., 2006. Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco, p. 743.
Hastie, T.J., Buja, A., Tibshirani, R., 1995. Penalized discriminant analysis. Ann. Stat. 23 (1), 73–102.
Hastie, T.J., Tibshirani, R., 1990. Generalized Additive Models. Chapman & Hall, London, p. 352.
Hastie, T., 2009. GAM: Generalized Additive Models. R package version 1.08.
Hastie, T.J., Tibshirani, R., 2009. MDA: Mixture and Flexible Discriminant Analysis. R package version 0.4-2, R port by F. Leisch, K. Hornik & B.D. Ripley. 〈http://cran.r-project.org/package=mda〉.
Heckmann, T., Gregg, K., Gregg, A., Becht, M., 2014. Sample size matters: investigating the effect of sample size on a logistic regression susceptibility model for debris flows. Nat. Hazards Earth Syst. Sci. 14, 259–278.
Hijmans, R.J., van Etten, J., 2013. raster: Geographic data analysis and modeling. R package version 2.1-25.
Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression, 2nd edition. John Wiley & Sons, New York, p. 373.
Hothorn, T., Lausen, B., 2005. Bundling classifiers by bagging trees. Comput. Stat. Data Anal. 49 (4), 1068–1078.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer, New York, p. 441.
Lee, S., Choi, J., Min, K., 2002. Landslide susceptibility analysis and verification using the Bayesian probability model. Environ. Geol. 43 (1–2), 120–131.
Lee, S., Ryu, J.H., Kim, I.S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4 (4), 327–338.
Ließ, M., Glaser, B., Huwe, B., 2011. Functional soil-landscape modelling to estimate slope stability in a steep Andean mountain forest region. Geomorphology 132 (3), 287–299.
Lineback Gritzner, M., Marcus, W.A., Aspinall, R., Custer, S.G., 2001. Assessing landslide potential using GIS, soil wetness modeling and topographic attributes, Payette River, Idaho. Geomorphology 37 (1), 149–165.
Mathur, A., Foody, G.M., 2008. Crop classification by a support vector machine with intelligently selected training data for an operational application. Int. J. Remote Sens. 29 (8), 2227–2240.
Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M., Kanevski, M., 2014. Machine learning feature selection methods for landslide susceptibility mapping. Math. Geosci. 46, 33–57. http://dx.doi.org/10.1007/s11004-013-9511-0.
Moguerza, J.M., Muñoz, A., 2006. Support vector machines with applications. Stat. Sci. 21 (3), 322–336. http://dx.doi.org/10.1214/088342306000000493.
Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrol. Process. 5 (1), 3–30.
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology 139–140, 271–284.
Neuhäuser, B., Terhorst, B., 2007. Landslide susceptibility assessment using "weights-of-evidence" applied to a study area at the Jurassic escarpment (SW-Germany). Geomorphology 86 (1), 12–24.
Nguyen, M.H., de la Torre, F., 2010. Optimal feature selection for support vector machines. Pattern Recognit. 43, 584–591.
Pachauri, A.K., Pant, M., 1992. Landslide hazard mapping based on geological attributes. Eng. Geol. 32 (1), 81–100.
Peters, A., Hothorn, T., 2009. ipred: Improved predictors. R package version 0.9-1.
Petschko, H., Bell, R., Leopold, P., Heiss, G., Glade, T., 2013. Landslide inventories for reliable susceptibility maps. In: Margottini, C., Canuti, P., Sassa, K. (Eds.), Landslide Science and Practice, vol. 1: Landslide Inventory and Susceptibility and Hazard Zoning. Springer.
Petschko, H., Brenning, A., Bell, R., Goetz, J., Glade, T., 2014. Assessing the quality of landslide susceptibility maps – case study Lower Austria. Nat. Hazards Earth Syst. Sci. 14, 95–118. http://dx.doi.org/10.5194/nhess-14-95-2014.
Pradhan, B., 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365.
R Development Core Team, 2003. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Regmi, N.R., Giardino, J.R., Vitek, J.D., 2010. Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 115 (1), 172–187.
Regmi, N.R., Giardino, J.R., McDonald, E., Vitek, J.D., 2014. A comparison of logistic regression-based models of susceptibility to landslides in western Colorado, USA. Landslides 11, 247–262.
Ruß, G., Brenning, A., 2010. Data mining in precision agriculture: management of spatial information. Lect. Notes Comput. Sci. 6178, 350–359. http://dx.doi.org/10.1007/978-3-642-14049-5_36.
Schnabel, W., 2002. Geologische Karte von Niederösterreich 1:200000. Wien, Austria (in German).
Schweigl, J., Hervás, J., 2009. Landslide Mapping in Austria. JRC Scientific and Technical Reports, European Commission Joint Research Centre, Institute for Environment and Sustainability, Italy. Available at: 〈http://eusoils.jrc.ec.europa.eu/ESDB_Archive/eusoils_docs/other/EUR23785EN.pdf〉 (last access: 01.03.11).
Schwenk, H., 1992. Massenbewegungen in Niederösterreich 1953–1990. Jahrbuch der Geologischen Bundesanstalt, vol. 135. Geologische Bundesanstalt, Wien, pp. 597–660.
Schulz, W.H., 2004. Landslides mapped using LIDAR imagery, Seattle, Washington. U.S. Geological Survey Open-File Report 2004-1396, 11 pp., 1 plate.
Schulz, W.H., 2007. Landslide susceptibility revealed by LIDAR imagery and historical records, Seattle, Washington. Eng. Geol. 89, 67–87.
Sidle, R.C., Ochiai, H., 2006. Landslides: Processes, Prediction, and Land Use. American Geophysical Union, Washington, DC, p. 312.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T., 2009. ROCR: Visualizing the performance of scoring classifiers. R package version 1.0-4. 〈http://cran.r-project.org/package=ROCR〉.
Soeters, R., van Westen, C.J., 1996. Slope instability recognition, analysis, and zonation. In: Landslides: Investigation and Mitigation, Chapter 8. Transportation Research Board Special Report 247.
Sterlacchini, S., Ballabio, C., Blahut, J., Masetti, M., Sorichetta, A., 2011. Spatial agreement of predicted patterns in landslide susceptibility maps. Geomorphology 125 (1), 51–61.
Strobl, C., Boulesteix, A.-L., Zeileis, A., Hothorn, T., 2007. Bias in random forest variable importance measures: illustrations, sources, and a solution. BMC Bioinf. 8, 25. http://dx.doi.org/10.1186/1471-2105-8-25.
Van den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., Vandekerckhove, L., 2006. Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes (Belgium). Geomorphology 76, 392–410.
van Westen, C.J., Rengers, N., Terlien, M.T.J., Soeters, R., 1997. Prediction of the occurrence of slope instability phenomena through GIS-based hazard zonation. Geol. Rundsch. 86 (2), 404–414.
van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 30 (3), 399–419.
van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: an overview. Eng. Geol. 102 (3), 112–131.
Vapnik, V., 1998. Statistical Learning Theory. John Wiley & Sons, New York, p. 736.
Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice. Natural Hazards No. 3, IAEG Commission on Landslides and other Mass Movements, UNESCO, Paris.
Wessely, G., 2006. Geologie der Österreichischen Bundesländer: Niederösterreich. Geologische Bundesanstalt, Vienna, p. 416.
Xu, L., Li, J., Brenning, A., 2014. A comparative study of different classification techniques for marine oil spill identification using RADARSAT-1 imagery. Remote Sens. Environ. 141, 14–23.
Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistic regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85.
Yesilnacar, E., Topal, T., 2005. Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng. Geol. 79 (3), 251–266.
Yilmaz, I., 2009. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat, Turkey). Comput. Geosci. 35, 1125–1138.
Yilmaz, I., 2010. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 61 (4), 821–836.
Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic (ROC) plots. Clin. Chem. 39, 561–577.