Journal Pre-Proof

Flash flood susceptibility mapping using a novel deep learning model based on deep belief
network, back propagation and genetic algorithm
Himan Shahabi, Ataollah Shirzadi, Somayeh Ronoud, Shahrokh Asadi, Binh Thai Pham,
Fatemeh Mansouripour, Marten Geertsema, John J. Clague, Dieu Tien Bui
PII: S1674-9871(20)30240-1
DOI: https://doi.org/10.1016/j.gsf.2020.10.007
Reference: GSF 1100
To appear in:
Received date: 8 May 2020
Revised date: 6 August 2020
Accepted date: 17 October 2020
Please cite this article as: H. Shahabi, A. Shirzadi, S. Ronoud, et al., Flash flood
susceptibility mapping using a novel deep learning model based on deep belief network,
back propagation and genetic algorithm, (2020), https://doi.org/10.1016/j.gsf.2020.10.007
This is a PDF file of an article that has undergone enhancements after acceptance, such
as the addition of a cover page and metadata, and formatting for readability, but it is
not yet the definitive version of record. This version will undergo additional copyediting,
typesetting and review before it is published in its final form, but we are providing this
version to give early visibility of the article. Please note that, during the production
process, errors may be discovered which could affect the content, and all legal disclaimers
that apply to the journal pertain.
© 2020 Published by Elsevier.

Flash flood susceptibility mapping using a novel deep learning model based on deep belief
network, back propagation and genetic algorithm
Himan Shahabi a,b, Ataollah Shirzadi c, Somayeh Ronoud d, Shahrokh Asadi d, Binh Thai Pham e,
Fatemeh Mansouripour d, Marten Geertsema f, John J. Clague g, Dieu Tien Bui h,i,*
a Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj
66177-15175, Iran
b Board Member of Department of Zrebar Lake Environmental Research, Kurdistan Studies
Institute, University of Kurdistan, Sanandaj 66177-15175, Iran
c Department of Rangeland and Watershed Management, Faculty of Natural Resources, University
of Kurdistan, Sanandaj, Iran
d Data Mining Laboratory, Department of Engineering, College of Farabi, University of Tehran,
Tehran, Iran
e Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
f British Columbia, Ministry of Forests, Lands, Natural Resource Operations and Rural
Development, Prince George, BC V2L 1R5, Canada
g Department of Earth Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
h Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh
City, Vietnam
i Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City,
Vietnam
* Corresponding author. E-mail address: buitiendieu@tdtu.edu.vn
Abstract
Flash floods are responsible for loss of life and considerable property damage in many countries.
Flood susceptibility maps contribute to flood risk reduction in areas that are prone to this hazard if
appropriately used by land-use planners and emergency managers. The main objective of this
study is to prepare an accurate flood susceptibility map for the Haraz watershed in Iran using a
novel modeling approach (DBPGA) based on Deep Belief Network (DBN) with Back
Propagation (BP) algorithm optimized by the Genetic Algorithm (GA). For this task, a database
comprising ten conditioning factors and 194 flood locations was created using the One-R
Attribute Evaluation (ORAE) technique. Various well-known machine learning and optimization
algorithms were used as benchmarks to compare the prediction accuracy of the proposed model.
Statistical metrics including sensitivity, specificity, accuracy, root mean square error (RMSE), and
area under the receiver operating characteristic curve (AUC) were used to assess the validity of the
proposed model. The results show that the proposed model has the highest goodness-of-fit
(AUC = 0.989) and prediction accuracy (AUC = 0.985), and based on the validation dataset it
outperforms benchmark models including LR (0.885), LMT (0.934), BLR (0.936), ADT (0.976),
NBT (0.974), REPTree (0.811), ANFIS-BAT (0.944), ANFIS-CA (0.921), ANFIS-IWO (0.939),
ANFIS-ICA (0.947), and ANFIS-FA (0.917). We conclude that the DBPGA model is an excellent
alternative tool for predicting flash flood susceptibility for other regions prone to flash floods.
Keywords: Environmental modeling; Flash flood; Deep belief network; Over-fitting; Iran
1. Introduction
Flash floods occur when channel discharge rapidly exceeds channel capacity, resulting in
overbank flow (Khosravi et al., 2019). Extreme, short-duration rainfall increases both average flow
velocity and the velocity of the peak flow (Charlton et al., 2006). Flash floods cause more than
20,000 fatalities per year and much damage to infrastructure and agricultural systems (Zhou et al.,
2017). About 90% of global flash flood fatalities are in Asia. Projections suggest that flooding in
Asia could increase by nearly 200% by 2050 (Arnell and Gosling, 2016). Causes of flash floods
include heavy rainfall, hardening of large areas of land due to urbanization, and soil degradation
(Huppert and Sparks, 2006; Schillaci et al., 2017; Wijkman and Timberlake, 2019). Risk commonly
increases with population growth and, possibly, as climate changes.
Iran is particularly prone to flash floods. In 2019 alone, flash floods killed 78 people, injured 1076
others, and displaced about 300,000 people; in total, some 10 million people were affected (UN
Office for the Coordination of Humanitarian Affairs, 2019). One of the areas most susceptible to
flash flooding is the Haraz watershed in Mazandaran Province (Tien Bui et al., 2018b; Khosravi
et al., 2019). Twenty-eight villages in this area were destroyed during flash floods (Khosravi et
al., 2016a).
Because flash floods are so deadly and damaging, researchers are making efforts to predict
hazards and risks in flood-prone areas. These efforts include development of advanced systems to
predict areas vulnerable to flooding. Because flood prediction is time-consuming and flood-prone
areas are complex, flood prediction models are particularly data-specific and require many
simplifying assumptions (Lohani et al., 2014). A key initiative is the development of advanced
flood predictive models, including physically based rainfall-runoff, regression, and "on-off"
classification models (Tien Bui and Hoang, 2017). Physically based models, for example HEC-RAS
(Brunner, 1995) and MIKE (Zhou et al., 2012), require long hydrological monitoring datasets;
consequently, their use in flood modeling is highly challenging (Kim et al., 2015; Nayak
et al., 2005). Regression models are widely used in spatial and temporal flood modeling, but they
too require long time-series datasets from hydrological stations to accurately forecast extreme
discharges. Commonly, discharge records are short or incomplete and thus rarely can be used for
accurate flood prediction. The newest approach is the "on-off" classification
model. This model does not require data from hydrological stations; instead it uses historical
flood and geo-environmental data, which are classified into flood and non-flood categories using
data-driven and machine learning models (Tien Bui et al., 2016a).
Recent progress in geographic information system (GIS) and remote sensing (RS) techniques has
driven development of new flood prediction models, but applications of new models in mountain
areas still face challenges (Tien Bui et al., 2019). To overcome these challenges, machine learning
models are increasingly being developed and used in flood modeling because of their high
performance, accuracy, and predictive capability (Ahmadlou et al., 2019; Tien Bui et al., 2019a).
The increasing popularity of machine learning algorithms stems, in part, from the fact that they
predict flood nonlinearity solely from historical flood datasets without the need for knowledge of
complex mathematical expressions of physical processes and basin behavior (Mosavi et al.,
2018). Machine learning algorithms can easily be implemented (fast training, testing, and
validation) with low computation costs. They also are less complex than physically based
and conventional models (Mekanik et al., 2013; Mosavi et al., 2017). Khosravi et al. (2019)
compared the performance of three expert knowledge-based models (Vise Kriterijumska
Optimizacija I Kompromisno Resenje (VIKOR), Technique for Order Preference by Similarity to
Ideal Solution (TOPSIS), and Simple Additive Weighting (SAW)) with two machine learning
methods (Naïve Bayes Tree (NBT) and Naïve Bayes (NB)) for flood susceptibility mapping in the
flood-prone Ningdu watershed in China. They found that the machine learning algorithms
outperformed the three expert knowledge-based models.
A survey of the literature shows that many machine learning algorithms have been developed and
formulated for flood modeling: functional algorithms (e.g., support vector machines (SVM),
logistic regression (LR), and artificial neural networks (ANN)); Bayes-based algorithms
(Bayesian logistic regression (BLR)); and decision tree algorithms (random forest (RF),
alternating decision tree (ADT), logistic model trees (LMT), naïve Bayes tree (NBT), reduced
error pruning tree (REPT), and classification and regression trees (CART)). However, differences
in the choice of flood conditioning factors by researchers and in the probability distribution
function (PDF) used in the algorithms (Tien Bui et al., 2018b) make it difficult to compare and
rate the algorithms. Nevertheless, Khosravi et al. (2019) compared four machine learning
algorithms (LMT, NBT, REPT, and ADT) that they applied to predict floods in the Haraz
watershed. They demonstrated that the ADT model had the highest prediction capability for flash
flood susceptibility assessment, followed by, respectively, the NBT, LMT, and REPT models.
Chen et al. (2020) compared three machine learning algorithms (NBT, ADT, and RF) for spatial
prediction of flooding in the Quannan area, China, and found that the RF algorithm outperformed
the NBT and ADT algorithms.
A recent development in flood prediction modeling is the use of hybrid machine learning models.
Hybrid models have been shown to improve the prediction accuracy of single
statistically based benchmark models, for example the LR model and bivariate models such as
frequency ratio (FR), evidential belief function (EBF), and weights-of-evidence (WoE) models
(Pham et al., 2018). Tehrany et al. (2014) coupled WoE and SVM and concluded that the WoE-SVM
(RBF) hybrid model outperformed the benchmark WoE and SVM models for flood
susceptibility mapping at Brisbane, Australia.
Some hybrid models have been developed for flood susceptibility mapping in the Haraz
watershed, which is the area we chose for our study. Chapi et al. (2017) developed a hybrid model
(Bagging-LMT) that combines Bagging and the LMT algorithm and compared its goodness-of-fit
and prediction accuracy with four state-of-the-art soft computing benchmark models (LMT, LR,
BLR, and RF). They found the hybrid model performed best; its performance was notably
higher than that of the LMT algorithm. Shafizadeh-Moghadam et al. (2018) coupled eight single
machine learning algorithms with seven ensemble forecasting models and concluded that the
combination of the boosted regression tree (BRT) algorithm and the ensemble forecasting model
EMmedian provided the highest performance for flood modeling in the study area. Tien Bui et al.
(2018b) coupled the adaptive neuro-fuzzy inference system (ANFIS) algorithm with the
imperialistic competitive (ANFIS-ICA) and firefly (ANFIS-FA) algorithms; they found that the
hybrid model (ANFIS-ICA) outperformed the single algorithms used alone. In contrast, Tien Bui
et al. (2019b) combined the EBF and LR algorithms in a new hybrid model (EBF-LR), and
concluded that the hybrid model failed to outperform the standalone models. Tien Bui et al.
(2018a) integrated ANFIS with the cultural algorithm (ANFIS-CA), bees algorithm (ANFIS-BA),
and invasive weed optimization (ANFIS-IWO) algorithm. They concluded that ANFIS-IWO had
the highest goodness-of-fit among the models tested, although ANFIS-BA had higher prediction
accuracy than the ANFIS-IWO and ANFIS-CA models. Finally, Shahabi et al. (2020) introduced
hybrid models of bagging based on four kernels (i.e., coarse, cosine, cubic, and weighted) of the
K-Nearest Neighbor (KNN) algorithm and concluded that the Bagging-Cubic ensemble model
had the highest ability to predict flood locations in the Haraz area.
In the early days of artificial intelligence, deep learning (DL) methods (also referred to as deep
structured learning or hierarchical learning) began to be used in artificial intelligence research
after having become popular in other scientific fields such as computer vision, big data mining,
human activity recognition, character recognition, speech recognition, digital image processing,
and natural language processing (Ball et al., 2017; Huang and Xiang, 2018). DL is a statistical
technique for classifying patterns using neural networks with multiple layers based on training
datasets (Marcus, 2018). DL models offer several advantages over other methods: (i) they are
becoming more applicable and useful as the sizes of available training datasets increase; (ii) the
size of DL models has grown over time with improvements in computer infrastructure and speed;
(iii) they can solve complex real-world problems by incrementally improving their accuracy over
time; and (iv) they can perform unsupervised and semi-supervised learning (Nielsen, 2015).
Of the different types of DL used for classification, the deep belief network (DBN) (Hinton et al.,
2006) is the most potent and efficient predictor and classifier (Ronoud and Asadi, 2019). DBN
uses a Restricted Boltzmann Machine (RBM) that can extract features from a large number of
training datasets, resulting in improvements in classification accuracy and prediction precision.
Importantly, deep learning models such as DBNs determine the optimal structure of a dataset by
selecting the best number of layers and neurons in each layer to achieve reasonable results (Guo et
al., 2016). There is no guideline or standard way to select these parameters in the literature; thus,
manual searching is generally used (Larochelle et al., 2007; Hinton, 2012; Shen et al., 2015).
In this study, we propose a new model using DBN, and evaluate and test it on different structures.
Our objective is to select an appropriate structure with the highest performance and prediction
accuracy. We also use back propagation (BP) to minimize cost functions by adapting control
weights (Dreyfus, 1973). Our long-term goal is to develop and explore new models and
techniques that improve flash flood management and mitigation. In this study, we develop a new
deep learning algorithm that combines a deep belief network (DBN) with the back propagation (BP)
algorithm and is optimized by a genetic algorithm (GA) (DBPGA) for flash flood susceptibility
mapping at Haraz in northern Iran. Although DL models have been successfully used for landslide
susceptibility mapping (Ding et al., 2016; Ghorbanzadeh et al., 2019; Xiao et al., 2018), no such
study has been conducted on flood susceptibility assessment. Our proposed model improves the
flood weights by using the BP algorithm to decrease the cost function and the GA to optimize the
topology of the network in order to increase performance and prediction accuracy during the
modeling process using a training dataset. We compare the results from the proposed deep learning
model with other state-of-the-art soft computing benchmark models such as machine
learning (REPTree, NBT, LR, LMT, BLR, and ADT) and metaheuristic algorithms (ANFIS-IWO,
ANFIS-ICA, ANFIS-FA, ANFIS-CA, and ANFIS-BAT) to check the efficiency and capability of
the developed model. The modeling process was done in MATLAB (2018b); the flash flood
susceptibility map was generated using ArcGIS 10.2 software.
2. Description of the study area
The Haraz watershed is located in the mountainous Mazandaran Province, in northern Iran (Fig.
1). Major population centers are Polur, Tashal, Tiran, Rineh, Kandovan, Abasak, Gaznak,
Baladeh, and Noor (Khosravi et al., 2016a). The Haraz watershed experiences near-annual
catastrophic flash floods that cause fatalities, damage property and infrastructure, and disrupt
traffic, commerce, and public services, notably in recent years. Flash floods result from torrential
rainfall and are exacerbated by deforestation, recent extensive replacement of orchards by residential
areas, and the lack of flood control measures (Tien Bui et al., 2018b; Khosravi et al., 2018).
The watershed has an area of 4014 km² and ranges in elevation from 300 m to about 5600 m a.s.l.,
with slopes up to 66° (Fig. 1). Average annual rainfall at Haraz is about 780 mm; the wettest
months are January, February, March, and October, with average monthly rainfall amounts of
about 160 mm. Average annual evaporation is about 1300 mm. Average temperature at Haraz
ranges from a minimum of 5°C to a maximum of 23°C. Average annual temperature is about 8°C.
Mountainous areas have a moderately cold climate, whereas the Caspian Sea shoreline has a mild
humid climate.
The study area is underlain by Mesozoic formations (56.4%), followed by Cenozoic (38.9%) and
Paleozoic (4.7%) formations. Most of the area is covered by rangeland (92%); the remainder is
forest, bare land, irrigated land, residential area, and garden land.
3. Data acquisition
3.1 Flash flood inventory map
We used data from 194 historical (1995–2015) flash floods in the Haraz watershed to map flood
distribution. We divided the data set into 155 (80%) locations used for flood modeling and 39
(20%) locations used for evaluation. Environmental experts employed by the Mazandaran
Regional Water Authority validated locations of the flash floods through field surveys.
Additionally, we randomly selected 194 non-flash flood locations within the study area and
divided them into modeling and evaluation groups using the same 80:20 ratio as we used for the
flood locations.
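The 80:20 partition described above can be sketched in a few lines of Python. This is an illustrative sketch only: the text does not state how the random split was drawn, so the shuffle, the seeds, the `split_locations` helper, the placeholder point identifiers, and the 1/0 labels are all assumptions.

```python
import random

def split_locations(locations, train_fraction=0.8, seed=0):
    """Shuffle a list of locations and split it into modeling and evaluation subsets."""
    rng = random.Random(seed)
    shuffled = list(locations)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 194 flood and 194 non-flood points.
flood = [("flood", i) for i in range(194)]
non_flood = [("non_flood", i) for i in range(194)]

flood_train, flood_test = split_locations(flood, seed=0)
non_flood_train, non_flood_test = split_locations(non_flood, seed=1)

# Assumed labeling: flood points 1, non-flood points 0 (binary classification).
train = [(p, 1) for p in flood_train] + [(p, 0) for p in non_flood_train]
test = [(p, 1) for p in flood_test] + [(p, 0) for p in non_flood_test]
print(len(flood_train), len(flood_test), len(train), len(test))
```

With 194 points per class, rounding 0.8 × 194 gives 155 modeling and 39 evaluation locations per class, which matches the counts reported above.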
3.2 Flash flood conditioning factors
The selection of flood conditioning factors has a direct impact on the accuracy of mathematical
models (Kia et al., 2012). Based on previous research in the study area (Khosravi et al., 2016a, b;
Chapi et al., 2017), we selected 11 flood conditioning factors for this study: topographic factors
(slope angle, elevation, and curvature); hydrological factors (topographic wetness index (TWI),
stream power index (SPI), distance to river, river density, and rainfall); a geological factor
(lithology); and land cover factors (Normalized Difference Vegetation Index (NDVI) and land
use). We prepared a digital elevation model (DEM) of the study area from the ASTER Global
DEM (https://gdex.cr.usgs.gov/gdex/) with a cell size of 30 m × 30 m. The DEM was used to
provide maps of primary and secondary factors such as slope angle, elevation, plan curvature,
distance to river, and river density using ArcGIS 10.3, and maps of TWI and SPI using SAGA-GIS
2.8 software. The spatial databases were constructed and then resampled using the "Resample"
tool in ArcGIS 10.3 to a matrix with 5299 columns and 3027 rows and a cell size of 20 m × 20 m
for spatial analysis and model development. We briefly discuss below the role of each factor in
the occurrence of flooding.
Topographic factors
Slope angle. Slope angle has a direct effect on flooding, and most researchers consider it to be one
of the most important factors in flood modeling (Rahmati et al., 2016; Termeh et al., 2018; Wang
et al., 2019b; Costache and Bui, 2020). It controls surface runoff, flow velocity, and
infiltration. In general, the lower the slope angle (e.g., areas around rivers or flat terrain), the
higher the rate of infiltration and the lower the flow velocity; all other things being equal, such
areas have a higher likelihood of flooding (Chapi et al., 2017). We constructed the slope angle
map with eight classes based on the natural breaks classification method (Table 2 and Fig. 2a).
Elevation. Elevation generally has an inverse relationship with flooding (Fernández and Lutz,
2010). As elevation decreases, terrain typically
becomes flatter and the amount of water carried by streams and rivers increases (Cao et al., 2016).
An elevation map was constructed with nine classes using the manual classification method
(Table 2 and Fig. 2b).
Curvature. Some flood researchers consider curvature to be an important flood conditioning factor
(Ahmadlou et al., 2019; Hong et al., 2018). Runoff accelerates or decelerates depending on slope
form: concave (negative curvature), flat (zero curvature), or convex (positive curvature). Convex
slopes accelerate overland flow and may also affect infiltration and soil saturation (Cao et al.,
2016). Concave slopes decelerate overland flow and may increase infiltration (Young and
Mutchler, 1969). The curvature map was constructed in three categories: convex, flat, and
concave (Table 2 and Fig. 2c).
Hydrological factors
Topographic wetness index (TWI). The topographic wetness index (TWI) is a hydrological
metric, defined as a ratio between specific catchment area and slope angle (Wilson and Gallant, 2000).
It provides a measure of water accumulation, saturation, and flood possibility for each pixel in a
given watershed (Beven, 2011; Manfreda et al., 2011) and is formulated as follows (Beven and
Kirkby, 1979):
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \tag{1}$$
where α is the specific catchment area (m²/m) and β is the slope angle (°). We constructed a TWI
map with six intervals using the natural breaks classification method (Table 2 and Fig. 2d).
Stream power index (SPI). The stream power index (SPI) provides a measure of the erosive
power of discharge relative to specific area within the watershed (Poudyal et al., 2010). It reflects
the power of flow at a given location in a watershed (Cao et al., 2016). The higher the SPI value,
the higher the power of the flow (Turoğlu and Dölek, 2011). SPI can be computed as follows
(Moore and Wilson, 1992):
$$\mathrm{SPI} = \alpha \tan\beta \tag{2}$$
where α denotes the specific catchment area (m²/m) and β is the slope angle (°). In this study, we
divided SPI into five classes using the natural breaks classification method (Table 2 and Fig. 2e).
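Eqs. (1) and (2) can be evaluated cell by cell once the specific catchment area and slope angle are known. The following is a minimal Python sketch, not the SAGA-GIS implementation used in the study; the function names, the `min_slope_deg` floor that guards against division by zero on flat cells, and the example values are assumptions made for illustration.

```python
import math

def twi(alpha, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(alpha / tan(beta)) for one cell.

    alpha: specific catchment area (m^2/m); slope_deg: slope angle in degrees.
    min_slope_deg is an assumed lower bound that keeps tan(beta) above zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(alpha / math.tan(beta))

def spi(alpha, slope_deg):
    """Stream power index alpha * tan(beta) for one cell."""
    return alpha * math.tan(math.radians(slope_deg))

# Hypothetical cell with a specific catchment area of 100 m^2/m on three slopes.
for slope in (2.0, 10.0, 30.0):
    print(slope, round(twi(100.0, slope), 2), round(spi(100.0, slope), 2))
```

For a fixed catchment area, TWI falls and SPI rises as the slope steepens, which is the contrast between water accumulation and erosive power that the two indices are meant to capture.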
Distance to river. Areas close to rivers are more susceptible to flooding than more distant areas
(Butler et al., 2006; Chapi et al., 2017). We extracted the river networks from the DEM with the
"ArcHydro" tool in ArcGIS 10.2 and prepared a map with eight classes using the natural breaks
classification method (Table 2 and Fig. 2f).
River density. River density is defined as the total stream length (m) within an area divided by
watershed area (km²) (Elmore et al., 2013). All other factors being equal, higher stream densities
are associated with higher likelihoods of flooding (Tehrany et al., 2015b). Fraser and Schumer
(2012) have argued that larger flood peaks and volumes are associated with higher stream
densities in perennial watersheds, but ephemeral watersheds have lower flood peaks. We prepared
a river density map using the "Line density" tool in ArcGIS 10.2, with six classes selected with
the natural breaks classification method (Table 2 and Fig. 2g).
Rainfall. Intuitively, rainfall is related to flash flooding. Short-duration torrential rainfall or
long-duration, lower intensity rainfall can cause flooding (Organization, 1994; Kron, 2002; Marchi et
al., 2010; Cao et al., 2016). We prepared a rainfall map for the Haraz watershed based on a
dataset of 20 years (1991–2011) of rainfall from 17 rain gauges. To create the map, we compared a
variety of interpolation methods: simple kriging, ordinary kriging, inverse distance weighting
(IDW) with powers of 1–5, a radial basis function (RBF) with a completely regularized spline,
and spline with a tension kernel function. We chose the IDW method with a power of 1 because
it had the lowest RMSE. The rainfall map has six classes calculated using the natural
breaks classification method (Table 2 and Fig. 2h).
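The selection of an IDW power by RMSE can be illustrated with a small Python sketch. The text does not say how the RMSE was obtained, so leave-one-out cross-validation is assumed here; the gauge coordinates and rainfall values are made up, and `idw` and `loo_rmse` are hypothetical helper names rather than ArcGIS functions.

```python
import math

def idw(x, y, gauges, power=1.0):
    """Inverse distance weighted estimate at (x, y) from (gx, gy, value) gauges."""
    num = den = 0.0
    for gx, gy, value in gauges:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return value  # the query point coincides with a gauge
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_rmse(gauges, power):
    """Leave-one-out RMSE: predict each gauge from all the others."""
    squared_error = 0.0
    for k, (gx, gy, value) in enumerate(gauges):
        others = gauges[:k] + gauges[k + 1:]
        squared_error += (idw(gx, gy, others, power) - value) ** 2
    return math.sqrt(squared_error / len(gauges))

# Made-up gauge coordinates (km) and mean annual rainfall (mm), for illustration only.
gauges = [(0, 0, 520.0), (10, 0, 610.0), (0, 10, 700.0), (10, 10, 780.0), (5, 5, 650.0)]
scores = {p: loo_rmse(gauges, p) for p in range(1, 6)}
best_power = min(scores, key=scores.get)
print(scores, best_power)
```

The same loop extends to other interpolators: each candidate is scored on held-out gauges and the one with the lowest RMSE is kept.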
Geological factor
Lithology. Flooding can be affected by lithology and geologic structures, notably porosity,
permeability, and joint and fracture spacing (Derbyshire et al., 2013). In our study area, lithology
is an indicator of water infiltration (Santos and Reis, 2018). Infiltration on highly resistant
rocks is low (Rahmati et al., 2016), resulting in a higher potential for flooding. We extracted
lithology units from a geologic map of the study area at a scale of 1:100,000 provided by the
Geological Survey and Mineral Explorations of Iran. We defined six units in ArcGIS 10.3:
Quaternary, Tertiary, Cretaceous, Jurassic, Triassic, and Permian (Table 2 and Fig.
2i).
Land cover factors

Land use. Land use can affect infiltration and thus runoff (Rahmati et al., 2016; Santos and Reis,
2018). Vegetation, especially forest, intercepts rainfall and reduces the rapidity of runoff (Tehrany
et al., 2014). We used a Landsat 8 OLI satellite image acquired in April 2013 and provided by the
Armed Forces Geographical Organization of Iran. We selected characteristic pixels for rangeland,
barren land, forest, garden, wood land, irrigated land, residential areas, and water bodies using
this image, supplemented with a field survey and Google Earth images. We used the neural
network algorithm (ANN), maximum likelihood ratio (MLR), and support vector machine (SVM)
within Environment for Visualizing Images (ENVI 5.1) software to classify all pixels into the
seven land use classes (Table 2 and Fig. 2j).
Normalized Difference Vegetation Index (NDVI). NDVI is a metric used to study the greenness
of the land surface (Rouse et al., 1974) and the presence of water bodies (Gao, 1996). Changes in
NDVI reflect changes in vegetation and surface water cover over time (Ahmed and Akter, 2017),
and can show the relationship between flooding and vegetation within a watershed (Tehrany et
al., 2013). Higher vegetation densities are assumed to have lower probabilities of flooding within
the study area (Chapi et al., 2017). The metric has values ranging from -1 (lowest vegetation
density) to +1 (highest vegetation density). The NDVI map for the Haraz watershed was
generated in ENVI 5.1 software with six classes based on the Landsat 8 OLI image acquired in
2013. Bands 3 and 4 were used to prepare the NDVI map (Table 2 and Fig. 2k). NDVI values
were computed as follows (Tucker and Sellers, 1986):
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}} \tag{3}$$
where NIR and R are the reflectances in the near-infrared and red bands, respectively.
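As a small worked example of Eq. (3), the per-pixel calculation can be sketched in Python as follows. This is not the ENVI workflow used in the study; the function name, the reflectance values, and the convention of returning 0 where both bands are zero are assumptions for illustration.

```python
def ndvi(nir, red):
    """NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / total

# Hypothetical reflectance pairs (NIR, red): dense vegetation, bare soil, open water.
for name, nir, red in (("vegetation", 0.50, 0.08), ("bare soil", 0.30, 0.25), ("water", 0.02, 0.05)):
    print(name, round(ndvi(nir, red), 3))
```

Strongly vegetated pixels reflect much more in the near-infrared than in the red band and so approach +1, whereas water absorbs near-infrared radiation and yields negative values.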
3.3. Deep Belief Network (DBN)
Artificial Neural Networks (ANNs), inspired by the human brain, were introduced in the 1960s
and gained success in a variety of artificial intelligence applications including classification,
regression, clustering, and prediction (Ahmadizar et al., 2015). ANNs are flexible mathematical
structures that learn intricate relationships between input and output data. The structure that is
most common across the different types of ANNs is the Back-Propagation Network (BP). BP,
however, suffers from the use of random weights at the beginning of the network training process.
Partly because of this problem, a new approach to deep-network pre-training, the Deep Belief Network
(DBN), was introduced in 2006 and led to significant progress in deep learning (Hinton et al.,
2006).
A DBN consists of several Restricted Boltzmann Machines (RBMs), which are undirected
generative probabilistic models that use one hidden layer to model the probability distribution of
visible variables. Our DBN uses a stack of RBMs to process information hierarchies, which
extract high-level features from the raw data. Fig. 3 graphically shows a DBN with m inputs, N
RBMs, and one output O. The biases of the visible and hidden layers are not shown in the figure for
simplicity. The numbers and letters shown in the neurons are the indexes of the neurons.
DBN training comprises three steps:
Step 1. Unsupervised and greedy layer-wise pre-training using a stack of RBMs.
Step 2. First fine-tuning step: randomly assign the connection weights matrix between the last
hidden layer and the output neuron, and then calculate the error.
Step 3. Second fine-tuning step: use error Back Propagation.
The RBM training process is described below.
3.3.1 Restricted Boltzmann Machine
The Restricted Boltzmann Machine uses an encoding-decoding pattern with an encoder that
converts inputs into a higher-level feature representation (Fig. 4). The decoder can then
reconstruct the input (Lopes and Ribeiro, 2015). RBM training through the reconstruction of input
data is a major advantage of DBN because this procedure is unsupervised and does not require
labeled data.
The RBM consists of a set of visible units $v = \{v_1, \dots, v_{g_v}\}$ and a set of hidden units
$h = \{h_1, \dots, h_{g_h}\}$, where $g_v$ and $g_h$ are, respectively, the number of visible units and
the number of hidden units. In RBM, the energy of the joint configuration $\{v, h\}$ considering bias
is (Bengio, 2009):
$$E(v, h) = -v^{T} W h - b^{T} v - c^{T} h \tag{4}$$
In Eq. (4), v is the vector of visible unit values, h is the vector of hidden unit values, W is the
weight matrix, b is the bias vector for visible units, and c is the bias vector for hidden units; v, b,
and c are column vectors, and $v^{T}$, $b^{T}$, and $c^{T}$ are their transposes. Eq. (4) can be
rewritten as follows (Tieleman and Hinton, 2009):
$$E(v, h) = -\sum_{i=1}^{g_v} \sum_{j=1}^{g_h} W_{ij} v_i h_j - \sum_{i=1}^{g_v} b_i v_i - \sum_{j=1}^{g_h} c_j h_j \tag{5}$$
in which $v_i$ and $h_j$ are binary states of, respectively, visible unit i and hidden unit j; $b_i$ and
$c_j$ are biases of, respectively, visible unit i and hidden unit j; and $W_{ij}$ is the weight between
those two units. Probabilities of each possible state $\{v, h\}$ are defined as:
$$P(v, h) = \frac{\exp(-E(v, h))}{Z} \tag{6}$$
where Z is the normalizing constant and equals:
$$Z = \sum_{v, h} \exp(-E(v, h)) \tag{7}$$
The probability of a data point, represented by the state v of the visible vector, is:
$$P(v) = \frac{1}{Z} \sum_{h} \exp(-E(v, h)) \tag{8}$$
Given the visible unit activations, the hidden unit activations are mutually independent
(and vice versa):
$$P(h \mid v) = \prod_{j=1}^{g_h} P(h_j \mid v) \tag{9}$$
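The energy-based definitions in Eqs. (5) to (9) can be checked numerically on an RBM small enough to enumerate every state. The Python sketch below is illustrative only: the 3 × 2 weight matrix and the biases are arbitrary toy values, the helper names are hypothetical, and the per-unit conditional inside `p_h_given_v_factorized` uses the logistic form that the text goes on to give as Eq. (10).

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def states(n):
    """All binary vectors of length n."""
    return list(itertools.product((0, 1), repeat=n))

def energy(v, h, W, b, c):
    """Eq. (5): E(v, h) = -sum_ij W[i][j] v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    interaction = sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -interaction - sum(bi * vi for bi, vi in zip(b, v)) - sum(cj * hj for cj, hj in zip(c, h))

def prob_visible(v, W, b, c):
    """Eqs. (7)-(8): P(v) = (1/Z) * sum_h exp(-E(v, h)), with Z summed over all states."""
    Z = sum(math.exp(-energy(vv, hh, W, b, c)) for vv in states(len(b)) for hh in states(len(c)))
    return sum(math.exp(-energy(v, hh, W, b, c)) for hh in states(len(c))) / Z

def p_h_given_v_bruteforce(h, v, W, b, c):
    """P(h | v) obtained directly from the energies."""
    num = math.exp(-energy(v, h, W, b, c))
    den = sum(math.exp(-energy(v, hh, W, b, c)) for hh in states(len(c)))
    return num / den

def p_h_given_v_factorized(h, v, W, c):
    """Eq. (9): product over hidden units of the per-unit conditionals."""
    p = 1.0
    for j, hj in enumerate(h):
        on = sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
        p *= on if hj == 1 else 1.0 - on
    return p

# Arbitrary toy parameters (3 visible units, 2 hidden units), not taken from the paper.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, -0.1, 0.0]
c = [0.2, -0.3]
total = sum(prob_visible(v, W, b, c) for v in states(3))
v_example, h_example = (1, 0, 1), (1, 0)
print(total)
print(p_h_given_v_bruteforce(h_example, v_example, W, b, c),
      p_h_given_v_factorized(h_example, v_example, W, c))
```

Enumeration is only feasible for a handful of units because the number of joint states grows as 2 to the power of the number of units, which is why practical training relies on sampling-based approximations instead.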
Note that if one layer is specified, the distribution of the other layer is factorial. Since neurons are
binary, the probability of a hidden neuron being 'on' (value = 1) is as follows:
$$P(h_j = 1 \mid v) = \sigma\left(c_j + \sum_{i} W_{ij} v_i\right) \tag{10}$$
where $\sigma$ is the logistic sigmoid function, $\sigma(x) = 1/(1 + \exp(-x))$. Similarly, the conditional
probability of a visible node with respect to the hidden vector is:
$$P(v_i = 1 \mid h) = \sigma\left(b_i + \sum_{j} W_{ij} h_j\right) \tag{11}$$
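This is a probabilistic version of the normal sigmoid activation function. The goal is to maximize
the log-likelihood of the training data or to minimize its negative log-likelihood. The negative
log-likelihood gradient for the training data with respect to the model parameters is given by Eqs. (12),
(13), and (14):
$$-\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{model}} - \langle v_i h_j \rangle_{\text{data}} \tag{12}$$
$$-\frac{\partial \log P(v)}{\partial b_i} = \langle v_i \rangle_{\text{model}} - \langle v_i \rangle_{\text{data}} \tag{13}$$
$$-\frac{\partial \log P(v)}{\partial c_j} = \langle h_j \rangle_{\text{model}} - \langle h_j \rangle_{\text{data}} \tag{14}$$
where $\langle \cdot \rangle_{a}$ is the expected value with respect to the distribution a. The learning rule for
updating the weights from the training data log-likelihood can be written as:
$$\Delta W_{ij} = \alpha \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right) \tag{15}$$
where α is the learning rate. Eqs. (16) and (17) show the corresponding update rules for the biases:
$$\Delta b_i = \alpha \left( \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}} \right) \tag{16}$$
$$\Delta c_j = \alpha \left( \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}} \right) \tag{17}$$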
3.3.2 Calculating the log-likelihood gradient of the training data

In order to use the log-likelihood gradient of the training data to update Wij based on Eq. (15), it is
necessary to calculate two values, $\langle v_i h_j \rangle_{data}$ and $\langle v_i h_j \rangle_{model}$. These two values are commonly
termed the positive and negative phases, respectively (Tieleman and Hinton, 2009). $\langle v_i h_j \rangle_{data}$ is
easily calculated by considering the visible units v, the values of which have been determined
from training data, and by assigning the value 1 to each hidden unit with a probability value
calculated by Eq. (10). Thus, if the calculated probability value is higher than a random number
drawn from a uniform distribution on the interval (0, 1), the hidden unit is set to 1.
Therefore, $\langle v_i h_j \rangle_{data}$ can be easily calculated by obtaining hj values (Tieleman and Hinton, 2009).
The negative phase is more difficult to calculate (Palm, 2012). Methods for calculating the
negative phase have been proposed by several researchers (Hinton, 2010; Keyvanrad and
Homayounpour, 2015; Le Roux and Bengio, 2008; Tieleman, 2008; Tieleman and Hinton, 2009);
they differ in how the objective function gradient is approximated. Currently, the most popular
method is CD-1 (Hinton, 2010), which includes one step of Gibbs sampling where all hidden
units are updated (in parallel) according to Eq. (10) before all visible units (in parallel) are
updated according to Eq. (11). Expressed differently, the visible units vi are determined first by
the input instance. Then, the hidden states hj are calculated from Eq. (10). The reconstructed
states of the visible and hidden units are then determined by repeating this process using
Eqs. (10) and (11) via one step of the visible and hidden unit reconstruction. The weight updating
rules can be expressed as:
$$\Delta W_{ij} = \alpha \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \right) \tag{18}$$
where $\langle \cdot \rangle_{recon}$ denotes the expectation over the one-step reconstruction. The updating rules of
the visible and hidden layer biases are as follows:
$$\Delta b_i = \alpha \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{recon} \right) \tag{19}$$
$$\Delta c_j = \alpha \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{recon} \right) \tag{20}$$
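The CD-1 procedure described above can be illustrated with a minimal sketch for a single binary training vector. This is not the authors' MATLAB implementation: the function names, the list-of-lists weight layout, and the choice to use hidden probabilities (rather than sampled states) in the reconstruction term are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """P(h_j = 1 | v) for every hidden unit, plus one Bernoulli draw per unit."""
    probs = [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
             for j in range(len(c))]
    states = [1 if p > rng.random() else 0 for p in probs]
    return probs, states


def sample_visible(h, W, b, rng):
    """P(v_i = 1 | h) for every visible unit, plus one Bernoulli draw per unit."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    states = [1 if p > rng.random() else 0 for p in probs]
    return probs, states


def cd1_update(v0, W, b, c, alpha, rng):
    """One CD-1 step for a single training vector v0; updates W, b, c in place."""
    _, h0 = sample_hidden(v0, W, c, rng)    # positive phase (data-driven)
    _, v1 = sample_visible(h0, W, b, rng)   # one-step reconstruction of v
    ph1, _ = sample_hidden(v1, W, c, rng)   # hidden probabilities given v1 (assumption)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += alpha * (v0[i] * h0[j] - v1[i] * ph1[j])
        b[i] += alpha * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += alpha * (h0[j] - ph1[j])
    return v1
```

In practice the update would be averaged over a mini-batch of training vectors; the single-vector form is kept here only to stay close to the notation of the update rules.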
3.3.3 The new proposed deep learning model of DBPGA
It is vital to determine the topology of the neural network because it affects the network's learning
capacity and ability to generalize (Ahmadizar et al., 2015). Despite the promising outcomes of
many deep learning algorithms in a variety of applications, determining the appropriate number of
layers and the number of nodes per layer for a particular task is difficult (Guo et al., 2015). There
is thus a need to find the optimal architecture of a deep belief network. Finding the optimal
topology of a DBN can be considered a search problem. The DBPGA model uses
GA, which is a search algorithm that can evolutionarily find optimal or near-optimal solutions
(Mansourypoor and Asadi, 2017; Mehmanpazir and Asadi, 2017). Borrowing from genetics, the
steps and operators of GA used to optimize the structure of the DBN are as follows:
Step 1. Parameter initialization
Determine population size, number of generations, number of chromosome genes, and minimum
and maximum of gene values.
Step 2. Chromosome encoding and population initialization
Chromosomes are depicted as positive integers representing network architecture (direct
encoding). Fig. 5 shows an example of a chromosome representation and its corresponding
network for a problem that includes 10 input features and one output O. The numbers 54 and 21
represent the number of neurons, respectively, in the first and second hidden layers. The
population size is N, and all N chromosomes are randomly initialized.
Step 3. Evaluation
The chromosome's fitness is obtained by calculating the percentage of classification accuracy on
the training data:
$$\text{Fitness} = \frac{TRP + TRN}{TRP + TRN + FAP + FAN} \times 100 \tag{21}$$
where TRP, TRN, FAP, and FAN are the True Positive, True Negative, False Positive, and False
Negative rates, respectively.
Step 4. Selection
Roulette wheel selection and random selection are used to select parents for, respectively,
crossover and mutation.
Step 5. Crossover
After the parents are selected, a single-point crossover mechanism is used to generate a new
population. The crossover point is chosen randomly. Fig. 6 shows a simple example of a
single-point crossover.
Step 6. Mutation
A parent is mutated by decreasing or increasing the value of one randomly selected gene, producing
a new chromosome. Fig. 7 shows a simple example of a mutation. In this example, the third gene,
which is selected randomly, is mutated from 9 to 25.
Step 7. Survivor Selection
The N chromosomes with the highest fitness values are selected from the union of the current
population and the population resulting from crossover and mutation; these survivors form the new
population.
Step 8. Stop criterion

If the specified number of generations has been reached, the algorithm stops, and the best
chromosome is returned from the current population. Otherwise, the algorithm goes back to Step 3
to create a new population. Fig. 8 shows the GA flowchart for finding the optimal topology of the
DBN.
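The eight GA steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `fitness_fn` is a hypothetical callback that would train a DBN with the hidden-layer sizes encoded in a chromosome and return its training accuracy (assumed non-negative), and the default population size, generation count, and offspring counts are placeholders.

```python
import random


def roulette_pick(population, fitnesses, rng):
    """Roulette wheel selection: pick a chromosome with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    r = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return chrom
    return population[-1]


def single_point_crossover(p1, p2, rng):
    """Swap the tails of two parents at a random cut point."""
    if len(p1) < 2:
        return p1[:], p2[:]
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(parent, gene_min, gene_max, rng):
    """Increase or decrease one randomly selected gene, staying within the bounds."""
    child = parent[:]
    k = rng.randrange(len(child))
    child[k] = rng.choice([g for g in range(gene_min, gene_max + 1) if g != child[k]])
    return child


def evolve_topology(fitness_fn, n_genes, gene_min, gene_max,
                    pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: random initial population of hidden-layer sizes (direct encoding).
    population = [[rng.randint(gene_min, gene_max) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    # Step 3: evaluate every chromosome once.
    scored = [(fitness_fn(ch), ch) for ch in population]
    for _ in range(generations):  # Step 8: fixed number of generations
        fits = [f for f, _ in scored]
        pop = [ch for _, ch in scored]
        offspring = []
        for _ in range(pop_size // 2):  # Steps 4-5: roulette selection + crossover
            a = roulette_pick(pop, fits, rng)
            b = roulette_pick(pop, fits, rng)
            offspring.extend(single_point_crossover(a, b, rng))
        for _ in range(pop_size // 2):  # Steps 4, 6: random selection + mutation
            offspring.append(mutate(rng.choice(pop), gene_min, gene_max, rng))
        # Step 7: keep the N fittest of parents and offspring.
        scored += [(fitness_fn(ch), ch) for ch in offspring]
        scored.sort(key=lambda fc: fc[0], reverse=True)
        scored = scored[:pop_size]
    return scored[0][1]
```

Because each fitness evaluation would involve training a network, the sketch evaluates every chromosome only once and carries the score along with it.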
4 Background of the employed algorithms
4.1 Machine learning algorithms
4.1.1 Logistic Regression
Logistic Regression (LR) is a multivariate statistical method that we used to establish
and compute the coefficient for each flash flood conditioning factor (the independent variable),
based on the dependent variable (binary: flood and non-flood classes) (Chapi et al., 2017). The
higher the coefficient of a conditioning factor, the greater the probability that a flood will
happen (Mousavi et al., 2011; Shirzadi et al., 2012). The coefficients are determined at a
confidence level (95% or 99%) and are then assigned to each significant conditioning factor
during the modeling process to prepare the flash flood susceptibility map. The following
equations are used for generating the map:
$$P = \frac{1}{1 + e^{-Z}} \tag{22}$$
$$Z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{23}$$
where P is the probability of flash flood occurrence calculated by the LR model, Z is the linear
function of the LR model, $b_0$ is the constant coefficient extracted by the training model, n is the
number of flash flood conditioning factors, $b_i$ is the weight of each flash flood conditioning factor,
and $x_i$ is the specific flash flood conditioning factor. In this study, the LR model was used as one
of the state-of-the-art computing benchmark models to assess the capability of the proposed new
deep learning model for flash flood susceptibility mapping.
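As a small illustration of how an LR model turns conditioning-factor values into a flood probability, the sketch below applies the standard logistic link, P = 1 / (1 + exp(-Z)) with Z = b0 + sum(b_i * x_i). The function name and the example coefficients are hypothetical and are not values from this study.

```python
import math


def flood_probability(x, b, b0):
    """Logistic regression probability for factor values x, weights b, and intercept b0."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))  # linear function Z
    return 1.0 / (1.0 + math.exp(-z))              # logistic link


# Hypothetical pixel with two standardized factors; the weights are made up for illustration.
p = flood_probability(x=[0.8, -1.2], b=[1.5, 0.7], b0=-0.3)
```

Applying the function to every pixel's factor values would yield the per-pixel probabilities from which a susceptibility map can be drawn.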
4.1.2 Logistic Model Tree
The Logistic Model Tree (LMT) is a decision tree algorithm that uses logistic regression together
with the C4.5 decision tree (Quinlan, 1993). In LMT, the tree is first split using a feature selection
function, the 'information gain ratio' technique. Then, a regression plan is used to replace leaf
nodes (Witten and Tibshirani, 2011). The logistic regression function using the LogitBoost (LB)
algorithm is assigned to a tree node, and then the weights are computed (Tien Bui et al., 2016b).
The Classification and Regression Tree (CART) algorithm is used to decrease or prevent
over-fitting in the pruning stage (Tien Bui et al., 2016b). The C4.5 decision tree can split flash flood
conditioning factors into flood and non-flood classes based on their probability (Chen et al., 2018,
2019). The LB algorithm is run with least-squares fits for each class Ci (flood or non-flood) as:
$$L_{C_i}(x) = \sum_{i=1}^{N} \beta_i x_i \tag{24}$$
where N is the number of flash flood conditioning factors and $\beta_i$ is the coefficient of the ith
component in the input vector x. Finally, the posterior probability $P(C \mid x)$ is obtained through
linear regression in the leaf nodes as follows (Landwehr et al., 2005):
$$P(C \mid x) = \frac{e^{L_C(x)}}{\sum_{C'=1}^{D} e^{L_{C'}(x)}} \tag{25}$$
where D is the number of classes (flood and non-flood), $L_C(x)$ is the linear regression
function, and e is the base of the natural logarithm.
4.1.3 Bayesian Logistic Regression
Bayesian Logistic Regression (BLR) is a hybrid model based on the logistic regression model and
Bayes' theorem (Taheri et al., 2019). BLR uses the prior distribution function to
analyze uncertainties in the model and can solve posterior distributions by the likelihood function
(Ghosh et al., 2007; Tien Bui et al., 2018b). The relationship between the class label (flood and
non-flood) and flash flood conditioning factors is determined in a Bayesian framework. There are
three steps in BLR: (1) the prior probability is determined for the parameters; (2) the likelihood
function is specified for data; and (3) the posterior distribution is computed for the parameters
(Avali et al., 2014). If $x = (x_1, x_2, \dots, x_n)$ are the flash flood conditioning factors of the training
dataset, and flash flood and non-flood are the class labels $y \in \{1, 0\}$, the logistic function is used to
generate the posterior probability of a sample belonging to a specific class label for categorical or
binary factors as follows:
$$P(y = 1 \mid x) = \frac{1}{1 + \exp\left(-\left(b + w_0 \log\frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{n} w_i f(x_i)\right)\right)} \tag{26}$$
where $\log\left(P(y=1)/P(y=0)\right)$ is the prior log odds ratio, b is the bias, $w_0$ and $w_i$ are
the weights that are learned during the modeling process of the training dataset, and $f(x_i)$ is a
function that can be obtained as follows:
$$f(x_i) = \log\frac{P(x_i \mid y = 1)}{P(x_i \mid y = 0)} \tag{27}$$
Additionally, for continuous data, a Gaussian prior distribution function is used to calculate the
weights for each flash flood conditioning factor as follows:
$$P(w_i \mid \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{w_i^2}{2\sigma_i^2}\right) \tag{28}$$
where $w_i$ and $\sigma_i$ are, respectively, the coefficients of conditioning factors and the standard
deviation of the Gaussian distribution.
4.1.4 Alternating Decision Tree
The Alternating Decision Tree (ADT; Freund and Mason, 1999) is a well-known and robust
decision tree algorithm that uses a boosting algorithm for classification (Tien Bui et al., 2018b).
One of the advantages of this algorithm in comparison to other machine learning methods such as
C4.5, random forest, and classification and regression trees (Breiman et al., 1984) is that it builds
a decision tree structure based on simple rules. ADT has two structural components: Decision
Node (DN) and Prediction Node (PN). The DN specifies a condition, whereas the PN contains a
single number (Khosravi et al., 2018). For numeric predictions, the tree is grown by a boosting
algorithm; final prediction scores are then assigned to each Prediction Node (Hong et al.,
2015). All contribution weights are summed to achieve the final prediction probability.
Let a base rule, which maps instances to real numbers, be defined by a precondition R1, a base
condition R2, and two real numbers α and β. If the rule predicts α when $R_1 \cap R_2$ holds
and β when $R_1 \cap \bar{R}_2$ holds ($\bar{R}$ is the negation of R), the values of α and β can be
calculated according to the following equation (Freund and Mason, 1999):
$$\alpha = \frac{1}{2} \ln \frac{W_+(R_1 \cap R_2)}{W_-(R_1 \cap R_2)}; \quad \beta = \frac{1}{2} \ln \frac{W_+(R_1 \cap \bar{R}_2)}{W_-(R_1 \cap \bar{R}_2)} \tag{29}$$
where W is the sum of the weights of the training instances that satisfy the given condition
($W_+$ and $W_-$ for positive and negative instances, respectively), and the best R1 and R2 are computed
by minimizing Zt (R1, R2), which is formulated as follows:
$$Z_t(R_1, R_2) = 2\left(\sqrt{W_+(R_1 \cap R_2)\, W_-(R_1 \cap R_2)} + \sqrt{W_+(R_1 \cap \bar{R}_2)\, W_-(R_1 \cap \bar{R}_2)}\right) + W(\bar{R}_1) \tag{30}$$
4.1.5 Naïve Bayes Tree
The Naïve Bayes Tree (NBT) combines Naïve Bayes (NB) and Decision Tree (DT) algorithms
based on the Bayes theorem (Kohavi, 1996). It enhances and improves the classification power of
individual NB and DT models (Kohavi, 1996). Among the advantages of NBT are that it requires
little computer memory, is a fast learning algorithm, is efficient and straightforward, performs
excellently, and produces results that are easily interpreted. It is, therefore, one of the most used
algorithms among environmental researchers (Wang et al., 2015). The pre-pruning technique
employed with this method uses one of the following steps: (1) the data splitting process is done
at the node; or (2) a leaf is generated on the data with a local NB model at that specific node
(Landwehr et al., 2005). The NBT model uses the entropy approach to grow trees (Khosravi et al.,
2018). Take Y to be a training dataset and |Y| to be the total number of flash flood conditioning
factors. Flash flood conditioning factors can be divided into l classes as Si (i = 1, 2, …, l). While
establishing decision trees, gain ratio (GR) values are computed to control tree growth as follows
(Quinlan, 1986):
$$GR = \frac{\text{Entropy}(Y) - \sum_{i=1}^{l} \frac{|Y_i|}{|Y|} \text{Entropy}(Y_i)}{-\sum_{i=1}^{l} \frac{|Y_i|}{|Y|} \log_2 \frac{|Y_i|}{|Y|}} \tag{31}$$
where |Yi| is the number of the flood conditioning factors belonging to class Si. The
independence assumption between the conditioning factors is included in the NBT as
class conditional independence (Shirzadi et al., 2017). The NB classifier can be computed using
the following equation (Pham et al., 2017):
$$y_{NB} = \underset{t_i}{\arg\max}\; PP(t_i) \prod_{i=1}^{n} P(r_i \mid t_i), \quad P(r_i \mid t_i) = \frac{1}{\sqrt{2\pi}\,\varepsilon} \exp\left(-\frac{(r_i - \sigma)^2}{2\varepsilon^2}\right) \tag{32}$$
where PP (ti) refers to the prior probability of the output variables ti = (1, 0), ri is the i-th attribute
in the training dataset, and σ and ε are the mean value and standard deviation of ri.
4.1.6 Reduced Error Pruning Tree
The Reduced Error Pruning Tree (REPTree) algorithm combines Decision Tree (DT) and
Reduced Error Pruning (REP), and generates a decision or regression tree based on information
gain or variance reduction approaches (Quinlan, 1987). When the output of a decision tree is
large, the DT algorithm simplifies the modeling process by using the training dataset; the
complexity of the structure of the tree is also reduced by using REP (Mohamed et al., 2012). REP
is the most popular pruning method for eliminating the leaves and branches of the tree with low
classification power (Galathiya et al., 2012). The REPTree algorithm locates the sub-tree with the
most accurate classification power (Pham et al., 2019b). The most crucial advantage of REPTree
is that it reduces the complexity of the tree structure and also prevents the over-fitting problem
during the modeling process without sacrificing accuracy (Quinlan, 1987). The performance of
the REPTree is achieved by using variance reduction and reduced error pruning techniques or
the highest information gain from entropy (Srinivasan and Mekala, 2014). The gain ratio in this
algorithm can be formulated as follows (Tien Bui et al., 2012):
$$\text{Gain ratio}(x, Y) = \frac{E(Y) - \sum_{i=1}^{n} \frac{|Y_i|}{|Y|} E(Y_i)}{-\sum_{i=1}^{n} \frac{|Y_i|}{|Y|} \log_2 \frac{|Y_i|}{|Y|}} \tag{33}$$
where E(Y) is the entropy of the training dataset Y, and the attribute x belongs to a training
dataset with subsets $Y_i$ (i = 1, 2, …, n).
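The gain ratio used for tree growth, information gain divided by split information, can be computed with a short sketch. It assumes a categorical attribute and binary flood labels; the function names are illustrative, and a zero split information (an attribute with a single value) is mapped to a ratio of 0 by assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    info_gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, an attribute that separates flood from non-flood samples perfectly into two equally sized subsets has an information gain of 1 bit and a split information of 1 bit, giving a gain ratio of 1.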
4.2 Evolutionary/optimization algorithms
4.2.1 Bat Algorithm

The Bat Algorithm (BA) is an intelligent optimization algorithm proposed by Yang (2010b) to
simulate the echolocation behavior of bats. It provides better results for optimization problems
than many popular traditional and heuristic algorithms (Srivastava and Sahana, 2017; Srivastava
et al., 2015).
Bats detect prey or avoid obstacles by emitting sound that strikes an object and is reflected to
the animals' ears. To simulate this behavior, suppose that the initial population of bats is n. At
time t − 1, the location and the flight velocity of the i-th bat are, respectively, $x_i^{t-1}$ and $v_i^{t-1}$, and
the current global optimal location is $x_*$. At time t, the velocity and position of the i-th bat are
updated using the following equations:
$$f_i = f_{min} + (f_{max} - f_{min})\,\beta \tag{34}$$
$$v_i^{t} = v_i^{t-1} + (x_i^{t-1} - x_*)\, f_i \tag{35}$$
$$x_i^{t} = x_i^{t-1} + v_i^{t} \tag{36}$$
where $f_{min}$ and $f_{max}$ are, respectively, the minimum and maximum values of bat frequencies, and
$\beta$ is a normalized random value ($\beta \in [0, 1]$).
The following equation is used to produce a new solution:
$$x_{new} = x_{old} + \varepsilon \bar{A}^{t} \tag{37}$$
where $\varepsilon$ is a random number ($\varepsilon \in [-1, 1]$), $\bar{A}^{t}$ is the average amplitude of all bats at time t, and
$x_{old}$ is a solution randomly selected from the current optimal solutions. When a bat finds prey, it
changes amplitude and sound pulse emission rate by:
$$A_i^{t+1} = \alpha A_i^{t} \tag{38}$$
$$r_i^{t+1} = r_i^{0} \left[ 1 - \exp(-\gamma t) \right] \tag{39}$$
where $\alpha$ and $\gamma$ are random values, and $A_i^{t}$ and $r_i^{t}$ are the amplitude and pulse emission rate of a bat
at time t, respectively.
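The bat update rules can be sketched in a few lines, following the standard formulation of the Bat Algorithm: a frequency drawn between f_min and f_max, a velocity pulled by the offset from the global best, a local random walk scaled by the mean loudness, and a loudness and pulse rate schedule. The function and parameter names are illustrative assumptions, and the full algorithm (acceptance test, fitness evaluation) is omitted.

```python
import math
import random


def bat_step(x, v, x_best, f_min, f_max, rng):
    """Frequency, velocity, and position update for one bat."""
    beta = rng.random()                          # normalized random value in [0, 1)
    freq = f_min + (f_max - f_min) * beta
    v_new = [vi + (xi - xb) * freq for vi, xi, xb in zip(v, x, x_best)]
    x_new = [xi + vn for xi, vn in zip(x, v_new)]
    return x_new, v_new


def local_walk(x_old, mean_loudness, rng):
    """Local search around a selected solution, scaled by the average loudness."""
    return [xi + rng.uniform(-1.0, 1.0) * mean_loudness for xi in x_old]


def update_loudness_and_rate(loudness, r0, t, alpha, gamma):
    """Loudness decays by the factor alpha; the pulse rate approaches r0 as t grows."""
    return alpha * loudness, r0 * (1.0 - math.exp(-gamma * t))
```

A bat that already sits at the global best with zero velocity does not move under `bat_step`, which is a quick sanity check on the sign convention of the velocity term.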
4.2.2 Cultural Algorithm
The Cultural Algorithm (CA) is an evolutionary algorithm introduced by Reynolds (1994). It is a
dual-inheritance algorithm consisting of two search spaces: the belief space, which models the
cultural information about the population, and the population space, which represents individuals at
a genotypic and/or phenotypic level. These two spaces are connected via a communication
protocol that defines (i) the rules for selecting groups of individuals to adapt the belief set and (ii)
the way that the beliefs influence all individuals in the population space. CA is used for solving
optimization problems that require a large amount of domain knowledge with extensive data,
numerous domain limitations, many objectives, and multiple agents in a vast distributed social
network. The population space can include any population-based computational model, for
example genetic algorithms or evolutionary programming (Reynolds, 1994). The belief space
supports the information reservoir of all experiences among individuals.
4.2.3 Invasive Weed Optimization
The Invasive Weed Optimization (IWO) algorithm, proposed by Mehrabian and Lucas (2006), is a
novel population-based, evolutionary optimization algorithm that tries to simulate the resistance,
adaptability, and randomness of a weed community. IWO searches for the general optimal
solution of the problem in the solution space. To simulate the behavior of weeds, the algorithm
operates through four steps: initialization, reproduction, spatial dispersal, and competitive exclusion.
Initialization
In IWO, weeds represent the feasible solutions of problems. The initial population with Nwo weed
individuals is randomly generated in the solution space, in which each weed consists of variables
that represent a feasible solution.
Reproduction
Each weed in the population then produces seeds, and the number of seeds
depends on its fitness. Weeds with higher fitness produce more seeds. The formula of weeds
producing seeds is:
$$weed_n = \frac{f - f_{min}}{f_{max} - f_{min}} \left( s_{max} - s_{min} \right) + s_{min} \tag{40}$$
where $f$ is the fitness value of the current weed, $f_{max}$ and $f_{min}$ are, respectively, the maximum and
the minimum fitness values of the current population, and $s_{max}$ and $s_{min}$ represent, respectively,
the maximum and the minimum number of seeds.
Spatial dispersal
Seeds are dispersed around the planting position according to a normal distribution whose mean is
the planting position and whose standard deviation is given by the following equation:
$$\sigma_{iter} = \frac{(iter_{max} - iter)^{n}}{(iter_{max})^{n}} \left( \sigma_{initial} - \sigma_{final} \right) + \sigma_{final} \tag{41}$$
where $iter_{max}$ is the number of maximum iterations, $\sigma_{iter}$ is the current standard deviation, and n is the
nonlinear modulation index.
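The reproduction and spatial dispersal rules of IWO reduce to two small formulas: a seed count that grows linearly with fitness between s_min and s_max, and a dispersal standard deviation that shrinks nonlinearly from an initial to a final value. The sketch below assumes that higher fitness is better, rounds the seed count down, and uses hypothetical parameter names.

```python
import math


def seed_count(f, f_min, f_max, s_min, s_max):
    """Number of seeds for a weed with fitness f, interpolated linearly between s_min and s_max."""
    if f_max == f_min:  # degenerate population: all weeds equally fit (assumption: give s_max)
        return s_max
    return int(math.floor((f - f_min) / (f_max - f_min) * (s_max - s_min) + s_min))


def dispersal_sigma(iteration, iter_max, sigma_initial, sigma_final, n):
    """Standard deviation of seed dispersal at a given iteration (n = nonlinear modulation index)."""
    ratio = (iter_max - iteration) ** n / iter_max ** n
    return ratio * (sigma_initial - sigma_final) + sigma_final
```

With these definitions, the fittest weed receives `s_max` seeds, the least fit receives `s_min`, and the dispersal radius equals `sigma_initial` at iteration 0 and `sigma_final` at the last iteration.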
Competitive exclusion
If the number of weeds exceeds the maximum number of weeds allowed in the colony, the weeds
with the worst fitness are removed so that a constant number of plants remains. This
process continues until the maximum number of iterations is reached, and then the minimum
colony cost function of the weeds is stored.
4.2.4 Imperialistic Competitive Algorithm
The Imperialistic Competitive Algorithm (ICA) is a population-based metaheuristic algorithm
used to solve many types of optimization problems. The goal is to find an optimal solution in an
array of variable values called a 'country'. The cost of a country is calculated by evaluating the
cost function for that country; the better solution is the one with less cost. The countries with the
lowest cost are chosen as 'imperialists'. The rest of the countries are 'colonies'.
Let $N_{im}$ be the number of imperialists and $N_{col}$ the number of colonies. Initial empires are formed by
assigning the colonies to imperialists according to the power of each
imperialist:
$$P_k = \frac{Y_k}{\sum_{v=1}^{N_{im}} Y_v} \tag{42}$$
where $P_k$ is the power of imperialist k and $Y_k = \max_v\{y_v\} - y_k$ is the
normalized cost, where $y_k$ is the cost of imperialist k.
The number of initial colonies possessed by imperialist k is calculated as round$\{P_k \cdot N_{col}\}$,
where 'round' is a function that provides the nearest integer of a fractional
number. In the assimilation process, a colony in each empire
moves in the direction toward its imperialist. The moving distance is a random number in the
interval $[0, \beta \times d]$, where $\beta > 1$ and d is the distance between the colony and the imperialist.
'Revolution' involves a change in the position of some colonies. After assimilation and revolution
are completed within an empire, the cost of each colony is compared with that of its imperialist,
and a colony is swapped with the imperialist if the colony has less cost than the imperialist.
Imperialist competition is an important step based on the total power of an empire.
Let $TC_k$ be the total cost of empire k; it is first calculated for each empire as:
$$TC_k = f(\text{imperialist}_k) + \xi \cdot \text{mean}\{ f(\text{colonies of empire}_k) \} \tag{43}$$
where f is the cost function and $\xi$ is a positive number between 0 and 1, but close to 0. We then
compute the normalized total cost of empire k and the power of empire k by:
$$NTC_k = \max_v\{TC_v\} - TC_k \tag{44}$$
$$EP_k = \frac{NTC_k}{\sum_{v=1}^{N_{im}} NTC_v} \tag{45}$$
After the vector $[EP_1 - r_1, EP_2 - r_2, \dots, EP_{N_{im}} - r_{N_{im}}]$ is defined, the weakest colony from
the weakest empire is assigned to the empire with the
largest entry in this vector, where each $r_k$ is a random number chosen from a uniform distribution in [0, 1].
4.2.5 Firefly Algorithm
The Firefly Algorithm (FA) is a population-based metaheuristic algorithm for solving
optimization problems developed by Yang (2010a). The following assumptions were made in
formulating this algorithm:
(i) Fireflies are unisexual. So, one firefly will be attracted to other fireflies regardless of its sex.
(ii) Attractiveness is proportional to firefly brightness. Therefore, the fireflies with higher
brightness have higher attractiveness to others. However, the attractiveness decreases as the
distance between the two fireflies increases.
(iii) If there is no brighter one, a bright firefly will move randomly.
According to Yang (2009), the attractiveness of a firefly is determined by its light intensity and
the attractiveness can be defined as follows:
$$\beta(r) = \beta_0 \exp(-\gamma r^2) \tag{46}$$
where $\beta_0$ is the attractiveness parameter at $r = 0$, and r is the distance between two
fireflies. The parameter $\gamma$ is the absorption coefficient, which is usually 1. The distance between
the two fireflies i and j is defined as follows:
$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{d=1}^{D} \left( x_{i,d} - x_{j,d} \right)^2} \tag{47}$$
where D is the problem dimension.
Movements of fireflies are based on their attractiveness. The movement of a less attractive
firefly i, which is attracted to a brighter firefly j, is determined by:
$$x_{i,d}(t+1) = x_{i,d}(t) + \beta_0 \exp(-\gamma r_{ij}^2) \left( x_{j,d}(t) - x_{i,d}(t) \right) + \alpha \varepsilon \tag{48}$$
where $x_{i,d}$ and $x_{j,d}$ are the d-th dimension values of firefly i and firefly j,
respectively. Besides, $\varepsilon = rand - 1/2$, where rand is a random variable
that is uniformly distributed in the range [0, 1], $\alpha \in [0, 1]$ is the step parameter, and
t indicates the iteration number. The move is applied only when
firefly j is better than firefly i in terms of its fitness value.
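A single firefly move can be sketched as below, using the standard form of the algorithm: attractiveness beta0 * exp(-gamma * r^2), Euclidean distance r, and a uniform random perturbation centered on zero and scaled by a step parameter alpha. The function names and default parameter values are assumptions for illustration, and the caller is expected to invoke `move_firefly` only when firefly j is brighter than firefly i.

```python
import math
import random


def distance(xi, xj):
    """Euclidean distance between two fireflies."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness decays exponentially with the squared distance r."""
    return beta0 * math.exp(-gamma * r * r)


def move_firefly(xi, xj, alpha, rng, beta0=1.0, gamma=1.0):
    """Move firefly i toward the brighter firefly j, plus a small random step."""
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]
```

Setting `gamma=0` and `alpha=0` removes the distance decay and the random step, in which case firefly i lands exactly on firefly j; this limiting case is a convenient check of the update.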

5 Validation and comparison of the models

5.1 Statistical measures
We determined the goodness-of-fit and performance of all the models for our flash flood mapping
using a variety of statistical metrics, including sensitivity, specificity, accuracy, MSE, and RMSE.
Sensitivity is the proportion of flood pixels that are correctly classified as flood, whereas specificity
is the proportion of non-flood locations that are correctly classified as non-flood (Chapi et al., 2017;
Shafizadeh-Moghadam et al., 2018; Khosravi et al., 2019). Accuracy refers to the proportion of flood
and non-flood locations correctly classified as, respectively, flood and non-flood. The lower the
MSE and RMSE metrics, the higher the performance of the model (Bui et al., 2018a). All
statistical metrics were computed based on true positive (TRP), true negative (TRN), false
positive (FAP), and false negative (FAN) scores. The metrics can be expressed as:
$$\text{Sensitivity} = \frac{TRP}{TRP + FAN} \tag{49}$$
$$\text{Specificity} = \frac{TRN}{TRN + FAP} \tag{50}$$
$$\text{Accuracy} = \frac{TRP + TRN}{TRP + TRN + FAP + FAN} \tag{51}$$
$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( pv_i - tv_i \right)^2} \tag{52}$$
where pv is the predictive value in the training or testing dataset, tv is the target value (actual)
from the flood susceptibility models, and n is the total number of samples.
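The statistical measures can be computed directly from the four confusion-matrix counts and from paired predicted and target values. The following sketch uses the standard definitions (sensitivity = TRP / (TRP + FAN), specificity = TRN / (TRN + FAP), accuracy = correct / total, RMSE = square root of MSE); the function names and the example counts are hypothetical.

```python
import math


def confusion_metrics(trp, trn, fap, fan):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = trp / (trp + fan)
    specificity = trn / (trn + fap)
    accuracy = (trp + trn) / (trp + trn + fap + fan)
    return sensitivity, specificity, accuracy


def mse(pv, tv):
    """Mean squared error between predicted values pv and target values tv."""
    return sum((p - t) ** 2 for p, t in zip(pv, tv)) / len(tv)


def rmse(pv, tv):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(pv, tv))


# Hypothetical counts: 8 flood pixels found, 2 missed, 9 non-flood pixels found, 1 false alarm.
sens, spec, acc = confusion_metrics(trp=8, trn=9, fap=1, fan=2)
```

Since RMSE is the square root of MSE, an MSE of about 0.05 corresponds to an RMSE of roughly 0.22.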
5.2 ROC curve and AUC analysis
The ROC curve is a graphical tool used to assess the performance of the model (Fawcett, 2006;
Gorsevski et al., 2006). It is plotted with sensitivity (TP Rate) on the y-axis and 1-specificity (FP
Rate) on the x-axis (Hanley, 1989). A specific decision criterion can be extracted for each point
on the ROC curve to predict the accuracy of the model (Shirzadi et al., 2018). Quantitatively, the
area under the ROC curve (AUC) is used to assess model performance; the higher the value of
AUC (for an accurate model, AUC is close to 1), the higher the performance of the model
(Shirzadi et al., 2019).
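One simple way to compute the AUC without tracing the curve is its rank interpretation: the AUC equals the probability that a randomly chosen flood sample receives a higher score than a randomly chosen non-flood sample, with ties counted as one half. The sketch below uses this pairwise form; it is quadratic in the number of samples and is meant only as an illustration, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, with scores 0.1, 0.4, 0.35, 0.8 and labels 0, 0, 1, 1, three of the four flood/non-flood pairs are ordered correctly, so the AUC is 0.75.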
5.3 Statistical tests (Friedman test and Wilcoxon signed-rank test)
We used the Friedman and Wilcoxon signed-rank tests in this study to validate and compare the
performance of the flood models. The Friedman test, which is a non-parametric test introduced by
Friedman (19xx), is one of the most reliable tests for documenting differences among models
(Shirzadi et al., 2019). In our study, we assume the null hypothesis that there is no difference
between the two flood models and then calculate the p-value and chi-square (χ2) value. If the
p-value is smaller than α = 0.05 (standard value) and the χ2 is higher than 3.841 (standard value), the
null hypothesis is rejected (Chen et al., 2019a) and therefore there is a significant difference
between the two models. However, the Friedman test cannot provide pairwise comparison of the
flood models; thus the Wilcoxon signed-rank test is used (Khosravi et al., 2018). The Wilcoxon
signed-rank test is based on the same null hypothesis as the Friedman test; however, two values (p and z)
are calculated for evaluation. If the p-value is smaller than α = 0.05 and the z value falls outside the
critical range of −1.96 to +1.96, the null hypothesis is rejected (Miraki et al., 2019)
and there is a significant difference in a pairwise comparison of the models.
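To make the Friedman statistic concrete, the sketch below ranks the models within each block (for example, each sample or fold) and applies the textbook formula chi2 = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), where n is the number of blocks, k the number of models, and R_j the rank sum of model j. It omits the tie correction and the p-value, and the function names are hypothetical. The critical value of 3.841 corresponds to a chi-square distribution with one degree of freedom, that is, to a comparison of two models at α = 0.05.

```python
def average_ranks(row):
    """Ranks of the values in `row` (1 = smallest), with ties sharing the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_chi2(blocks):
    """Friedman chi-square statistic; each block holds one value per model (no tie correction)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

With three models that are ranked identically in each of four blocks, the rank sums are 4, 8, and 12 and the statistic equals 8.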
5.4 Factor selection using One-R Attribute Evaluation method
The effectiveness of a flood susceptibility assessment depends significantly on the quality of the
data used, especially the factors that affect flood occurrences in the selected area (Nguyen et al.,
2019b; Nohani et al., 2019). Some factors that are initially selected may not be
important for modeling flood susceptibility. Therefore, it is essential to evaluate the importance of
each factor so that the most suitable factors can be chosen to best model flood susceptibility
(Pham et al., 2019a). We selected the One-R Attribute Evaluation method (ORAE) (Nguyen et al.,
2019a) for this study to evaluate the importance of each conditioning factor for flood
susceptibility modeling. ORAE helps to increase the quality of data used and improve the
performance of models by preventing redundancy, decreasing noise and the dimensionality of
input space, and dealing with over-fitting problems (Micheletti et al., 2014). This method ranks
the importance of conditioning factors by determining the statistical correlation between a set of
input variables and output variables (Kavitha et al., 2012). In this method, one rule (One-R) is
separately built for each input variable in the training dataset, and thereafter the rule with the smallest
error metric is selected for independently sorting all variables according to their importance to
solve flood prediction problems (Nguyen et al., 2019a).
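The One-R idea can be sketched as follows: for each factor, a single rule maps every factor class to its majority label, and the factor's merit is the percentage of training samples that rule classifies correctly. The sketch assumes the factors have already been discretized into classes and scores the rule on the training data only; the cross-validated average merit reported by the evaluation tool is not reproduced here, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r_accuracy(values, labels):
    """Training accuracy (%) of the one-rule classifier built on a single factor."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    # Each factor class predicts its majority label; count the samples that rule gets right.
    correct = sum(max(counts.values()) for counts in by_value.values())
    return 100.0 * correct / len(labels)


def rank_factors(columns, labels):
    """Sort factors by One-R accuracy, most informative first."""
    merits = {name: one_r_accuracy(vals, labels) for name, vals in columns.items()}
    return sorted(merits.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: 'slope' separates the labels perfectly, 'rain' carries no information.
toy = {"slope": ["low", "low", "high", "high"], "rain": ["a", "b", "a", "b"]}
ranking = rank_factors(toy, [1, 1, 0, 0])
```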
6 Results and analysis
6.1 Correlation between flood conditioning factors and flood locations based on SWARA weights
In this study, we used SWARA weights (SW) to determine which class of each conditioning
factor is most closely related to flood occurrence (Fig. 9). It is evident that the lower the slope
angle value, the lower the probability of flood incidence. The first class of slope angle (0°-0.5°)
has the highest SWARA weight (0.4) compared to other classes. The trend of the weights in the
elevation factor is similar to the slope angle trend: the lower the elevation, the higher the SW
value, and accordingly the lower the probability of flood occurrence. The highest and the lowest
weights were obtained for, respectively, the first (328–350 m) and the last (>4000 m) elevation
classes. Concave (SW=0.46) and flat (SW=0.43) slopes have the highest class weights and thus
are less important for flood occurrence in the study area than convex slopes. The SPI class of
2000–3000 has the highest SW (0.32) within the SPI group and thus is less susceptible to flood
occurrence than other SPI classes. Similarly, as the SW of TWI increases, the probability of flood
occurrence decreases; the last class (6.96–11.5) has the highest weight (SW=0.08) within this
group. In the case of river density, the highest SW (0.37) was obtained for the first and second
classes of river density. Distance to river shows a significant relation to flood locations that is
similar to slope angle and elevation. The shorter the distance to the river network, the higher
the SW and thus the higher the susceptibility to flooding. The first class (0–50 m) has an SW of
0.59. The class of lithology most susceptible to flooding is the Triassic formation (SW=0.31).
Areas already covered by water (water bodies) are most susceptible to flooding (SW=0.75);
weights for other land-use classes are 0.15 (residential area), 0.06 (gardens), 0.02 (forest land),
0.01 (grassland), and 0.00 (farmland and barren land). Finally, the lower the rainfall, the higher
the SW (0.40 for 183–333 mm), and therefore the lower the probability of flood
incidence.
6.2 The most important factors for flood modeling
We assessed the predictive power of each conditioning factor for flood occurrence using the
ORAE technique in 10-fold cross-validation on the training dataset. The Average Merit (AM) of
ORAE was computed to determine the importance of the factors (Fig. 10). Slope angle, with the
highest value of average merit (AM=88.848), was the most important factor and also had the
highest predictive power for flood modeling. It is followed by distance to river (AM=87.050),
drainage density (AM=85.251), TWI (AM=80.935), elevation (AM=78.057), curvature
(AM=75.899), SPI (AM=74.460), lithology (AM=56.115), rainfall (AM=55.036), and land use
(AM=51.798). The AM values show that all conditioning factors play a decisive role in flood
incidence.
6.3 Application of the novel deep learning model
We designed and developed our novel deep learning model (DBPGA) in MATLAB R2018 and
ArcGIS 10.3. The model was trained (it learned) with the training dataset, similar to other smart
learning models. We selected 80% of all data for training (the modeling process) and the rest
(20%) for validation (Fig. 11). Fig. 11a and d show how flood (target) and non-flood (output)
values compare. The smaller the distance between the target and the output, the more
successfully the model was trained. The goodness of fit and the performance of the proposed model were
checked with MSE and RMSE metrics. Fig. 11b and e show these values for, respectively, the
training and testing datasets. The values of MSE and RMSE in the modeling process using the
training dataset are, respectively, 0.053 and 0.232 (Fig. 11a, d), and for the testing dataset 0.050
and 0.224 (Fig. 11b, e). Also, standard deviation and mean are reported for the training (0.00,
0.232) and testing (–0.02, 0.225) datasets (Fig. 11c, f).
6.4 Development of the flood susceptibility map
Our proposed deep learning model (DBPGA) learned and performed well on the
training and testing datasets and outperformed the benchmark models. In the next step, the study
area was converted to CSV format, and a Flood Susceptibility Index (FSI) was calculated for each
pixel of the study area. We then prepared a flood susceptibility map (FSM) using the natural breaks
classification method, based on the FSIs of all pixels (Chapi et al., 2017; Chen et al., 2018; Bui et
al., 2019; Shirzadi et al., 2019). We tested three well-known classification methods in the GIS: the
natural breaks, quantile, and geometrical interval methods. We found that the natural breaks method
performed best. The geometrical interval method underestimated FSIs, and the quantile method placed
most areas far from the river network, as well as slopes, in the high and very high susceptibility
classes.
The FSM prepared using the method of natural breaks includes five classes of flood
susceptibility: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility
(MS), high susceptibility (HS), and very high susceptibility (VHS) (Fig. 12). High-risk flood
inundation areas are readily discernible on this map. We enlarged two places on this map to
confirm graphically the performance of the model.
Fig. 13 shows the susceptibility classes generated by the deep learning model using the three
classification methods. Using the natural breaks classification method, we find that the VLS
class covers the largest area (61.192%), followed by the VHS (24.499%), LS (6.981%), HS (3.825%),
and MS (3.503%) classes. The VHS class included the largest percentage of flood locations
(98.010%), followed by the HS class (1.990%). The very high percentage of flood locations in
the VHS class confirms the capability of the proposed model. The quantile method is the next best
classification method.
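The bookkeeping behind such class percentages can be sketched as follows. This is an illustration under stated assumptions: the break values, the FSI samples, and the helper names `classify` and `class_shares` are hypothetical, and the breaks are simply given rather than derived by a natural breaks procedure in a GIS.

```python
from bisect import bisect_right
from collections import Counter

CLASSES = ["VLS", "LS", "MS", "HS", "VHS"]


def classify(fsi, breaks):
    """Map an FSI value to one of five classes using four ascending break values."""
    return CLASSES[bisect_right(breaks, fsi)]


def class_shares(values, breaks):
    """Percentage of the given values that falls in each susceptibility class."""
    counts = Counter(classify(v, breaks) for v in values)
    return {c: 100.0 * counts[c] / len(values) for c in CLASSES}


# Hypothetical break values and hypothetical FSI samples (not from the study).
breaks = [0.2, 0.4, 0.6, 0.8]
pixel_fsi = [0.05, 0.10, 0.15, 0.30, 0.50, 0.70, 0.90, 0.95, 0.10, 0.85]
flood_fsi = [0.92, 0.88, 0.75, 0.97]

area_share = class_shares(pixel_fsi, breaks)   # share of all pixels per class
flood_share = class_shares(flood_fsi, breaks)  # share of flood locations per class
print(area_share)
print(flood_share)
```

Comparing the two dictionaries mirrors the comparison in the text: a classification is more convincing when a large share of known flood locations falls in the highest classes.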
6.5 Performance evaluation of the proposed model
We further evaluated the classification performance of our proposed deep learning model by
comparing it with several machine learning and soft computing benchmark models, as well as
optimization algorithms, that have previously been used for flood modeling in the Haraz
watershed.
6.5.1 Parameter tuning

Proper selection of the parameters used in the modeling process is critical to obtaining an
appropriate solution (Ronoud and Asadi, 2019). Table 3 shows the optimal parameters used in
the evolutionary models. These parameters were set for our model by trial and error using the
GA.
6.5.2 Classification performance

We ran our deep learning model 20 times to document its classification performance. The average
classification accuracy, sensitivity, and specificity of our model and other classification models,
which are presented in Table 4, are based on the confusion matrix shown in Fig. 14. The best
topology obtained by the GA is 10 (number of inputs)-25 (number of neurons in the
hidden layer)-1 (output) (Table 3). The optimization algorithms are compared with the proposed
model using the MSE, RMSE, SD, and mean statistical metrics (Table 5). The results in Tables 3 and
4 allow us to highlight the following key observations.
(i) The DBPGA model has the best sensitivity (100%), indicating that the model correctly
classified all 155 flood locations as flood. It performed better than all other machine learning
algorithms.
(ii) The DBPGA model did not predict non-flood locations as well as other machine learning
models. Its specificity of 87.500% indicates that the model correctly classified 87.5% of
non-flood locations as non-flood. Although this value is high, it is lower than the specificity of
all other machine learning models except the ADT model.
(iii) The accuracy of the DBPGA model is the highest (93.589%) of the models with which it is
compared. It indicates that our model correctly predicted 92.308% of flood and non-flood
locations.
(iv) The MSE and RMSE of our model are lower than those of other optimization algorithms
and are reasonable and acceptable.
In summary, the DBPGA model outperformed other optimization algorithms in terms of
sensitivity and accuracy because it has a robust topology.
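The three measures discussed in items (i) to (iii) follow directly from the four cells of a confusion matrix. The sketch below uses hypothetical counts that were chosen only so that sensitivity and specificity come out at 100% and 87.5%; they are not read from Fig. 14, and the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # flood locations classified as flood
    specificity = tn / (tn + fp)                # non-flood locations classified as non-flood
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # all correctly classified locations
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not taken from the paper).
tp, fn, tn, fp = 38, 0, 35, 5
sensitivity, specificity, accuracy = classification_metrics(tp, fn, tn, fp)
print(f"sensitivity={sensitivity:.3%} specificity={specificity:.3%} accuracy={accuracy:.3%}")
```

With these illustrative counts the accuracy evaluates to 73/78, about 93.59%. Because accuracy is a weighted combination of sensitivity and specificity, a model can lead on accuracy while trailing on specificity, as described above.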

6.5.3 ROC curves and AUC values
Figure 15 shows the ROC curves for the training (goodness-of-fit or performance) and testing
(prediction accuracy) datasets. This figure shows that our deep learning model has high
performance (AUC=0.988) and prediction accuracy (AUC=0.985). A literature review for the study
area showed that flood susceptibility has previously been mapped there using machine learning and
optimization algorithms. The results of these studies are shown in Table 6 and Fig. 16. The AUC
values of the training dataset for the LR, LMT, BLR, ADT, NBT, REPTree, ANFIS-BAT, ANFIS-CA,
ANFIS-IWO, ANFIS-ICA, and ANFIS-FA models are, respectively, 0.886, 0.967, 0.966, *, *, *, 0.946,
0.942, 0.948, 0.951, and 0.932, where the symbol "*" indicates that values for ADT, NBT, and
REPTree were not reported in previously published work. The corresponding values of AUC for the
testing dataset are 0.985 for DBPGA and 0.885, 0.934, 0.936, 0.976, 0.974, 0.811, 0.944, 0.921,
0.939, 0.947, and 0.917 for the benchmark models in the same order. These data indicate that the
DBPGA deep learning model has a higher performance and prediction accuracy than all other
models that have been used for flood modeling in the study area.
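For readers who want to see what an AUC value measures, the following sketch computes it as the probability that a randomly chosen flood location receives a higher score than a randomly chosen non-flood location, which is equivalent to the area under the ROC curve. The labels, scores, and the function name `auc_from_scores` are hypothetical and serve only as an illustration.

```python
def auc_from_scores(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = flood, 0 = non-flood) and hypothetical susceptibility scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of locations, which is acceptable for a few hundred points; rank-based formulas give the same value more efficiently on large datasets.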
We also assessed differences between the new model and benchmark models through statistical
inference. The Friedman test indicates that, at a 95% confidence level, there are differences in the
performance of the new model and all benchmark models (Table 7). We checked the pairwise
differences between the models using the Wilcoxon signed-rank test (Table 8). At the 95%
confidence level, the new model and each compared model performed differently, leading us to
reject the null hypothesis.
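The Friedman test ranks the models within each run and asks whether the rank sums differ more than chance would allow. The sketch below computes the statistic without tie correction for a small, hypothetical table of errors; the data, the function names, and the use of the chi-square approximation with a tabulated critical value are assumptions for illustration and do not reproduce Table 7.

```python
def average_ranks(row):
    """Rank the values of one block from 1 (smallest), averaging the ranks of ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction) for n blocks of k models."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical errors of three models (columns) over five runs (rows).
errors = [
    [0.05, 0.08, 0.11],
    [0.04, 0.09, 0.10],
    [0.06, 0.07, 0.12],
    [0.05, 0.10, 0.09],
    [0.03, 0.08, 0.13],
]
stat = friedman_statistic(errors)
CHI2_CRIT_DF2_05 = 5.991  # chi-square critical value, 2 degrees of freedom, alpha = 0.05
reject_null = stat > CHI2_CRIT_DF2_05
print(f"Friedman statistic = {stat:.2f}, reject null: {reject_null}")
```

With so few blocks the chi-square approximation is rough. In practice a statistics library would normally be used, for example `scipy.stats.friedmanchisquare` for the omnibus test and `scipy.stats.wilcoxon` for the pairwise follow-up.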

7. Discussion
Risks from flash floods may be increasing due to climate and land-use changes and population
growth; thus, there is a need for better flood susceptibility mapping. Many models, including
regression and rainfall-runoff models, are limited by the lack of hydrological monitoring data.
Newer "on-off" classification models, such as machine learning and other artificial intelligence
algorithms, show more promise and have more modest data needs. Artificial intelligence models can
more accurately predict and map flood-prone areas by taking into account all factors controlling
flooding. In this study, we developed a new deep learning model, DBPGA, for flash
flood susceptibility mapping in the Haraz watershed in northern Iran. We used 194 flood and 194
non-flood locations to test ten conditioning factors with the ORAE technique. We also used the
SWARA model to determine spatial relationships between flooding and conditioning factors.
The results of the SWARA model (Fig. 8) show that floods in the study area mainly occur on
low-angle slopes at lower elevations, in agreement with the results of other studies (Khosravi et
al., 2016a, 2018; Tien Bui et al., 2018a, 2019b). For example, Khosravi et al. (2018) used four
machine learning algorithms (LMT, REPT, NBT, and ADT) and 11 conditioning factors tested by
the IGR technique to model flooding in the Haraz watershed. They concluded that slope is the most
important factor in flood susceptibility mapping in the Haraz area. They further argued that there
is less time for infiltration of water on steeper ground, leading to increased runoff on these
slopes. Tien Bui et al. (2018b) used two novel hybrid algorithms (ANFIS-ICA and ANFIS-FA) to
predict the spatial distribution of floods in the Haraz watershed. They performed a sensitivity
analysis to check the importance of the ten flood conditioning factors they tested and found that
all ten factors were significant for predicting flood occurrence in the study area.

Flat and concave slope forms are more prone to flooding than convex slopes. Flat slope forms
are typically lower in elevation and thus are more likely to receive and collect overbank flows
and runoff (Tien Bui et al., 2018a). Concave slopes affect flooding by converging flows toward flat
ground. They hold more water within a smaller area during a storm or a period of snowmelt,
become more rapidly saturated than convex slopes, and thus are more prone to flooding. These
results are consistent with those of Pradhan (2010), Kazakis et al. (2015) and Cao et al. (2016),
who similarly argue that curvature affects the amount of surface runoff and infiltration.
Distance to river is another important factor for flood modeling, especially in mountainous areas.
The SWARA analysis reveals that lands bordering rivers (flood plains) are at higher risk of
flooding because they are vulnerable to overbank flows. Indeed, distance to river is considered to
be one of the most important conditioning factors in most previous flood susceptibility
assessments (Youssef et al., 2016; Tien Bui et al., 2018b; Ahmadlou et al., 2019; Tien Bui et al.,
2019b). For example, Pham et al. (2020) used Credal Decision Tree (CDT) as a base classifier
along with four ensemble models including AdaBoostM1, Bagging, Dagging, and MultiBoostAB
for flood modeling in Markazi Province, Iran. They found that distance to river was the most
important factor for flood incidence in their study area.
TWI provides a measure of water accumulation on a surface (Tien Bui et al., 2018a) through its
relations to soil moisture and topography. The SWARA analysis showed that the probability of
flooding increases at higher TWI values. In high TWI pixels, infiltration is low; when it rains,
runoff selectively pools in these areas. Rainfall is a prominent factor involved in flooding,
although some researchers do not consider it as one of the conditioning factors for flood modeling
(Tehrany et al., 2015a; Youssef et al., 2016). In this study, we considered rainfall as a factor,
although its role was not what we expected. We initially thought that the probability of
flooding would increase with increasing rainfall, but this proved not to be the case. The highest
rainfall is mainly at higher elevations in the mountains in the Haraz watershed, but most flood
locations are not in these areas. Floods, of course, will not happen without rainfall, but the
relationship between rain and flooding is more complex than one that simply equates the amount
of rain with the severity of flooding. Although many researchers consider rain as the most
important factor in the occurrence of floods, our results indicate that this is not always the case.
Khosravi et al. (2019) considered rainfall along with other factors in their modeling and
assessment of flooding in one of China's most flood-prone areas. Using the IGR technique, they
found that rainfall ranked last, below NDVI and lithology, in explaining flood incidence in their
study area. Additionally, Wang et al. (2019a) used an ensemble model (IRN-DEMATEL-ANP),
which is a combination of interval rough numbers (IRN), decision-making trial and evaluation
laboratory (DEMATEL), analytic network process (ANP), and weighted linear combination
(WLC) methods, to evaluate flood susceptibility in Shangyou County, China. Rainfall ranked
eighth among 11 flood conditioning factors, far lower than the first and second factors (elevation
and slope angle). On the other hand, Samanta et al. (2018) concluded that rainfall and TWI were
the most important flood conditioning factors in a study that used the frequency ratio technique in
the Subarnarekha River basin, India.
The SWARA analysis revealed that areas of Triassic and Quaternary formations are more
susceptible to flooding than areas with other lithologies. These formations have lower
permeability than other lithological units and hence, during heavy rainfall or a period of rapid
snowmelt, become saturated sooner and more easily transfer runoff towards rivers. SWARA
showed that most residential and agricultural areas are located in areas with a high potential for
flooding. Residential areas in the Haraz watershed are susceptible to flooding because they have
primarily hardened, impermeable surfaces. Most agriculture in the northern part of Iran, and
notably in the Haraz watershed, is rice cultivation. The water table in these areas is at or very
close to the ground surface, and therefore many of these areas are vulnerable to flooding. Although
lithology and land use proved to be important factors in this study, they ranked in eighth and tenth
place, respectively. Wang et al. (2019a) found that, of the 11 flood conditioning factors they
considered, land use ranked ninth and lithology tenth. In contrast, Khosravi et al. (2019)
concluded that lithology and land use ranked second and third among 12 flood conditioning
factors, respectively, after NDVI, which ranked first.
We used the ORAE technique to prioritize conditioning factors. The results accord with the
findings of the SWARA analysis, in that slope and distance to river are the most critical factors
for flood modeling. These factors are essential in most catchments but are accentuated in
mountainous watersheds.
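The general One-R idea behind such a ranking can be sketched in a few lines: for each factor taken alone, predict the majority class for each of its values and score the factor by how often that single-attribute rule is right. The sketch below is an illustration under stated assumptions, not the authors' procedure. The factor classes and labels are hypothetical, the factors are assumed to be already discretized, and the score is plain training accuracy without cross-validation.

```python
from collections import Counter, defaultdict


def one_r_accuracy(values, labels):
    """Training accuracy of a rule that predicts the majority class per attribute value."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    correct = sum(max(c.values()) for c in by_value.values())
    return correct / len(labels)


def rank_factors(table, labels):
    """Sort factors by their One-R accuracy, best first."""
    scores = {name: one_r_accuracy(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical labels (1 = flood, 0 = non-flood) and hypothetical factor classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = {
    "slope": ["low", "low", "low", "low", "high", "high", "high", "low"],
    "distance_to_river": ["near", "near", "near", "far", "far", "far", "near", "far"],
    "rainfall": ["high", "low", "high", "low", "high", "low", "high", "low"],
}
ranking = rank_factors(table, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy table the slope classes separate flood from non-flood locations best, so slope ranks first, while the rainfall classes carry no information and score 0.5.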
We developed a new deep learning model with a structure optimized by a BP algorithm. After
training the model, we produced a flood susceptibility map. We then compared our map with
earlier maps generated by other machine learning models, including LR, LMT, BLR, ADT, NBT,
and REPT, and by optimization algorithms, including ANFIS-BAT, ANFIS-CA, ANFIS-IWO,
ANFIS-ICA, and ANFIS-FA. Our proposed model has the highest goodness of fit and performance
and, based on the training and testing datasets, outperforms the other models.
The deep belief network (DBN) has become a promising approach in machine learning because of
the advantages it offers over other methods, including quick inference and the ability to encode
the high-order structures of a network. A DBN uses a hierarchical structure of several Restricted
Boltzmann Machines (RBMs) that are trained by a greedy layer-wise learning algorithm, one layer
at a time. Stochastic gradient descent then allows the user to fine-tune the entire network
according to the supervised training criteria. The unsupervised RBM-based pre-training step
initializes the network using only unlabeled data. This initialization has proven to be a good
starting point for the subsequent supervised fine-tuning step and significantly reduces the risk
of being trapped in local optima (Kustikova and Druzhkov, 2014).
The architecture of a neural network affects the learning capacity and generalizability of the
network. DBPGA uses the Genetic Algorithm (GA) to find the optimal or near-optimal topology of
the DBN. The GA searches the very large solution space of network topologies by means of genetic
crossover and mutation operators. DBPGA thus combines a DBN with RBM-based pre-training and a
network topology optimized by the GA.
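To make the search loop concrete, the following sketch evolves the size of a single hidden layer with tournament selection, single-point crossover, bit-flip mutation, and elitism. It is a toy under stated assumptions, not the authors' implementation: the 6-bit encoding, the population settings, and in particular the stand-in `fitness` function (which simply prefers 25 neurons) are hypothetical. In a real run the fitness of a candidate topology would come from training the network and measuring, for example, its validation error.

```python
import random

BITS = 6  # hypothetical encoding: 6 bits map to 1..64 hidden neurons


def decode(chrom):
    """Convert a list of bits to a hidden-layer size between 1 and 2**BITS."""
    return int("".join(map(str, chrom)), 2) + 1


def fitness(chrom):
    """Stand-in score (higher is better); a real run would train and validate the network."""
    return -abs(decode(chrom) - 25)


def crossover(a, b, rng):
    """Single-point crossover of two parent chromosomes."""
    point = rng.randint(1, BITS - 1)
    return a[:point] + b[point:]


def mutate(chrom, rng, rate=0.05):
    """Flip each bit independently with the given probability."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def tournament(pop, rng, size=3):
    """Pick the fittest of a few randomly sampled individuals."""
    return max(rng.sample(pop, size), key=fitness)


def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    return decode(best), history


best_size, history = evolve()
print("best hidden-layer size found:", best_size)
```

Elitism guarantees that the best score never gets worse from one generation to the next, which matters when each fitness evaluation is as expensive as training a network.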
There are, however, still many problems with flood modeling. The uncertainties arising from the
input data and from the selection of an appropriate model to fit the training data are noteworthy.
Nevertheless, by reducing these uncertainties, a suitable model for more accurate flood prediction
can be obtained. Thus, it is possible to select more conditioning factors that can be rasterized and
then use one of the factor selection methods to select the factors that are most important. Factors
that have less influence on modeling or cause over-fitting and noise problems can be removed,
which, to some extent, increases the accuracy of the model prediction.
On the other hand, the results of a given model may differ from region to region, and even within
a given region. Therefore, the best model or method should be chosen using a trial-and-error
method to achieve the greatest predictive power and save time and money. A specific
standard for flood modeling that can be generalized to all watersheds or regions is not possible
due to the differences in environmental factors, as well as the different structures of the models.
We are far from creating a standard framework for machine learning modeling that is accepted
and used by all researchers in all catchments in the same way as physically based hydrological
models. We suggest that the path forward is to seek ways to reduce uncertainties and generate
flood susceptibility maps with high accuracy. Efforts to do this will rely on the development and
combination of flood studies with GIS and data mining tools to create a powerful technique that
will increase the predictive power of flood models.
8. Conclusion
In this study, we tested a new robust deep learning model (DBPGA) to spatially predict flash
floods in the Haraz watershed, northern Iran. We used 194 flood and 194 non-flood locations to
construct training and testing databases for, respectively, the modeling and evaluation processes.
Eleven initially selected conditioning factors were assessed by the One-R Attribute Evaluation
(ORAE) method in the modeling stage using the training dataset. Our model successfully learned
from iterative inputs to recognize and detect flood-prone areas in the study area, and its
applicability was confirmed by statistical measures. The most important findings can be
summarized as follows:
(i) Although all 11 flood conditioning factors affect flood occurrence, the most important
factor is the slope angle. It is followed by distance-to-river and river density factors, reflecting the
fact that the Haraz watershed is mountainous with high, steep slopes that transfer water towards
rivers, resulting in overbank flooding.
(ii) The SWARA technique indicated that flash floods preferentially occur on flat slopes at low
elevations, near rivers, and in areas with high drainage density, high TWI values, and lower
rainfall.
(iii) DBPGA shows promise for use in other regions prone to flash floods due to its functionality
and high performance.

(iv) The goodness-of-fit and prediction accuracy of the new proposed model exceed those of other
machine learning models (LR, LMT, BLR, ADT, NBT, and REPTree), and optimization
algorithms (ANFIS-BAT, ANFIS-CA, ANFIS-IWO, ANFIS-ICA, and ANFIS-FA) that have
previously been used in the Haraz watershed.
(v) By defining a proper topology in our new proposed model, we have made a contribution
towards building a powerful flash flood susceptibility mapping tool.
Acknowledgment
This research was financially supported by the Iran National Science Foundation (INSF) through
research project No. 96004000.

Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that
could have appeared to influence the work reported in this paper.

References
Ahmadizar, F., Soltanian, K., AkhlaghianTab, F. and Tsoulos, I., 2015, Artificial neural network
development by means of a novel combination of grammatical evolution and genetic
algorithm. Engineering Applications of Artificial Intelligence 39, 1-13.
Ahmadlou, M., Karimi, M., Alizadeh, S., Shirzadi, A., Parvinnejhad, D., Shahabi, H. and Panahi,
M., 2019, Flood susceptibility assessment using integration of adaptive network-based
fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and BAT
algorithms (BA). Geocarto International 34 (11), 1252-1272.

Ahmed, K.R. and Akter, S., 2017, Analysis of landcover change in southwest Bengal delta due to
floods by NDVI, NDWI and K-means cluster with Landsat multi-spectral surface
reflectance satellite data. Remote Sensing Applications: Society and Environment 8, 168-
181.
Arnell, N.W. and Gosling, S.N., 2016, The impacts of climate change on river flood risk at the
global scale. Climatic Change 134 (3), 387-401.
Avali, V.R., Cooper, G.F. and Gopalakrishnan, V., Year, Application of bayesian logistic
regression to mining biomedical data. AMIA Annual Symposium Proceedings, 266.
Ball, J.E., Anderson, D.T. and Chan, C.S., 2017, Comprehensive survey of deep learning in
remote sensing: theories, tools, and challenges for the community. Journal of Applied
Remote Sensing 11 (4), 042609.
Bengio, Y., 2009, Learning deep architectures for AI. Foundations and trends® in Machine
Learning 2 (1), 1-127.
Beven, K.J. and Kirkby, M.J., 1979, A physically based, variable contributing area model of basin
hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin
versant. Hydrological Sciences Journal 24 (1), 43-69.
Beven, K.J., 2011, Rainfall-runoff modelling: the primer. John Wiley & Sons.
Breiman, L., Friedman, J., Olshen, R. and Stone, C., 1984, Classification and regression trees.
CRC Press, Boca Raton, Florida.
Brunner, G.W., 1995, HEC-RAS River Analysis System. Hydraulic Reference Manual. Version
1.0, Report, Hydrologic Engineering Center Davis CA

Butler, D., Kokkalidou, A. and Makropoulos, C.K., 2006, Supporting the siting of new urban
developments for integrated urban water resource management. Integrated urban water
resources management. Springer, 19-34.
Cao, C., Xu, P., Wang, Y., Chen, J., Zheng, L. and Niu, C., 2016, Flash flood hazard
susceptibility mapping using frequency ratio and statistical index methods in coalmine
subsidence areas. Sustainability 8 (9), 948.
Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T. and Khosravi, K., 2017,
A novel hybrid artificial intelligence approach for flood susceptibility assessment.
Environmental modelling & software 95, 229-245.
Charlton, R., Fealy, R., Moore, S., Sweeney, J. and Murphy, C., 2006, Assessing the Impact of
Climate Change on Water Supply and Flood Hazard in Ireland Using Statistical
Downscaling and Hydrological Modelling Techniques. Climatic Change 74 (4), 475-491.
Chen, W., Shahabi, H., Shirzadi, A., Li, T., Guo, C., Hong, H., Li, W., Pan, D., Hui, J. and Ma,
M., 2018, A novel ensemble approach of bivariate statistical-based logistic model tree
classifier for landslide susceptibility assessment. Geocarto International 33 (12),
1398-1420.
Chen, W., Pradhan, B., Li, S., Shahabi, H., Rizeei, H.M., Hou, E. and Wang, S., 2019a, Novel
hybrid integration approach of bagging-based fisher's linear discriminant function for
groundwater potential analysis. Natural Resources Research, 1-20.
Chen, W., Zhao, X., Shahabi, H., Shirzadi, A., Khosravi, K., Chai, H., Zhang, S., Zhang, L., Ma,
J. and Chen, Y., 2019b, Spatial prediction of landslide susceptibility by combining
evidential belief function, logistic regression and logistic model tree. Geocarto
International (just-accepted), 1-25.

Chen, W., Li, Y., Xue, W., Shahabi, H., Li, S., Hong, H., Wang, X., Bian, H., Zhang, S. and
Pradhan, B., 2020, Modeling flood susceptibility using data-driven approaches of naïve
bayes tree, alternating decision tree, and random forest methods. Science of The Total
Environment 701, 134979.
Costache, R. and Bui, D.T., 2020, Identification of areas prone to flash-flood phenomena using
multiple-criteria decision-making, bivariate statistics, machine learning and their
ensembles. Science of The Total Environment 712, 136492.
Derbyshire, E., Hails, J.R. and Gregory, K.J., 2013, Geomorphological processes: studies in
physical geography. Elsevier.
Ding, A., Zhang, Q., Zhou, X. and Dai, B., Year, Automatic recognition of landslide based on
CNN and texture change detection. 2016 31st Youth Academic Annual Conference of
Chinese Association of Automation (YAC), 444-448.
Dreyfus, S., 1973, The computational solution of optimal control problems with time lag. IEEE
Transactions on Automatic Control 18 (4), 383-385.
Elmore, A.J., Julian, J.P., Guinn, S.M. and Fitzpatrick, M.C., 2013, Potential stream density in
Mid-Atlantic US watersheds. PLoS One 8 (8), e74819.
Fawcett, T., 2006, An introduction to ROC analysis. Pattern recognition letters 27 (8), 861-874.
Fernández, D. and Lutz, M., 2010, Urban flood hazard zoning in Tucumán Province, Argentina,
using GIS and multicriteria decision analysis. Engineering Geology 111 (1-4), 90-98.
Fraser, N. and Schumer, R., 2012, Low stream density watersheds produce flashier floods than
high stream density watersheds in ephemeral streams across the southwestern United
States. AGUFM 2012, H41F-1240.
Freund, Y. and Mason, L., Year, The alternating decision tree learning algorithm. icml, 124-133.
Galathiya, A., Ganatra, A. and Bhensdadia, C., 2012, Improved decision tree induction algorithm
with feature selection, cross validation, model complexity and reduced error pruning.
International Journal of Computer Science and Information Technologies 3 (2), 3427-
3431.
Gao, B.-C., 1996, NDWI—A normalized difference water index for remote sensing of vegetation
liquid water from space. Remote sensing of environment 58 (3), 257-266.
Ghorbanzadeh, O., Blaschke, T., Gholamnia, K., Meena, S.R., Tiede, D. and Aryal, J., 2019,
Evaluation of different machine learning methods and deep-learning convolutional neural
networks for landslide detection. Remote Sensing 11 (2), 196.
Ghosh, J.K., Delampady, M. and Samanta, T., 2007, An introduction to Bayesian analysis: theory
and methods. Springer Science & Business Media.
Gorsevski, P.V., Gessler, P.E., Foltz, R.B. and Elliot, W.J., 2006, Spatial prediction of landslide
hazard using logistic regression and ROC analysis. Transactions in GIS 10 (3), 395-415.
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S. and Lew, M.S., 2015, Deep learning for visual
understanding: A review. Neurocomputing.
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S. and Lew, M.S., 2016, Deep learning for visual
understanding: A review. Neurocomputing 187, 27-48.
Hanley, J.A., 1989, Receiver operating characteristic (ROC) methodology: the state of the art.
Crit Rev Diagn Imaging 29 (3), 307-335.
Hinton, G., 2010, A practical guide to training restricted Boltzmann machines. Momentum 9 (1),
926.
Hinton, G.E., Osindero, S. and Teh, Y.-W., 2006, A fast learning algorithm for deep belief nets.
Neural Comput 18 (7), 1527-1554.

Hinton, G.E., 2012, A practical guide to training restricted Boltzmann machines. Neural
networks: Tricks of the trade. Springer, 599-619.
Hong, H., Pradhan, B., Xu, C. and Bui, D.T., 2015, Spatial prediction of landslide hazard at the
Yihuang area (China) using two-class kernel logistic regression, alternating decision tree
and support vector machines. CATENA 133, 266-281.
Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.-X., Chen, W., Kougias, I. and
Kazakis, N., 2018, Flood susceptibility assessment in Hengfeng area coupling adaptive
neuro-fuzzy inference system with genetic algorithm and differential evolution. Science of
The Total Environment 621, 1124-1141.
Huang, L. and Xiang, L.-y., 2018, Method for Meteorological Early Warning of
Precipitation-Induced Landslides Based on Deep Neural Network. Neural Processing Letters 48 (2),
1243-1260.
Huppert, H.E. and Sparks, R.S.J., 2006, Extreme natural hazards: population growth,
globalization and environmental change. Philosophical Transactions of the Royal Society
A: Mathematical, Physical and Engineering Sciences 364 (1845), 1875-1888.
Kavitha, A., Kavitha, R. and Viji Gripsy, J., 2012, Empirical Evaluation of Feature Selection
Technique in Educational Data Mining. ARPN Journal of Science and Technology, 2 11.
Kazakis, N., Kougias, I. and Patsialis, T., 2015, Assessment of flood hazard areas at a regional
scale using an index-based approach and Analytical Hierarchy Process: Application in
Rhodope–Evros region, Greece. Science of the Total Environment 538, 555-563.
Keyvanrad, M.A. and Homayounpour, M.M., 2015, Deep Belief Network Training Improvement
Using Elite Samples Minimizing Free Energy. International Journal of Pattern
Recognition and Artificial Intelligence 29 (05), 1551006.

Khosravi, K., Nohani, E., Maroufinia, E. and Pourghasemi, H.R., 2016a, A GIS-based flood
susceptibility assessment and its mapping in Iran: a comparison between frequency ratio
and weights-of-evidence bivariate statistical models with multi-criteria decision-making
technique. Natural Hazards 83 (2), 947-987.
Khosravi, K., Pourghasemi, H.R., Chapi, K. and Bahri, M., 2016b, Flash flood susceptibility
analysis and its mapping using different bivariate models in Iran: a comparison between
Shannon's entropy, statistical index, and weighting factor models. Environmental
monitoring and assessment 188 (12), 656.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I. and Bui,
D.T., 2018, A comparative assessment of decision trees algorithms for flash flood
susceptibility modeling at Haraz watershed, northern Iran. Science of the Total
Environment 627, 744-755.
Khosravi, K., Shahabi, H., Pham, B.T., Adamowski, J., Shirzadi, A., Pradhan, B., Dou, J., Ly, H.-B.,
Gróf, G., Ho, H.L., Hong, H., Chapi, K. and Prakash, I., 2019, A comparative
assessment of flood susceptibility modeling using Multi-Criteria Decision-Making
Analysis and Machine Learning Methods. Journal of Hydrology 573, 311-323.
Kia, M.B., Pirasteh, S., Pradhan, B., Mahmud, A.R., Sulaiman, W.N.A. and Moradi, A., 2012, An
artificial neural network model for flood simulation using GIS: Johor River Basin,
Malaysia. Environmental Earth Sciences 67 (1), 251-264.
Kim, B., Sanders, B.F., Famiglietti, J.S. and Guinot, V., 2015, Urban flood modeling with porous
shallow-water equations: A case study of model errors in the presence of anisotropic
porosity. Journal of Hydrology 523, 680-692.

Kohavi, R., Year, Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. Kdd,
202-207.
Kron, W., 2002, Keynote lecture: Flood risk= hazard× exposure× vulnerability. Flood defence,
82-97.
Kustikova, V. and Druzhkov, P., 2014, A survey of deep learning methods and software for image
classification and object detection. OGRW2014 5.
Landwehr, N., Hall, M. and Frank, E., 2005, Logistic model trees. Machine Learning 59 (1-2),
161-205.
Larochelle, H., Erhan, D., Courville, A., Bergstra, J. and Bengio, Y., Year, An empirical
evaluation of deep architectures on problems with many factors of variation. Proceedings
of the 24th international conference on Machine learning, 473-480.
Le Roux, N. and Bengio, Y., 2008, Representational power of restricted Boltzmann machines and
deep belief networks. Neural Comput 20 (6), 1631-1649.
Lohani, A.K., Goel, N. and Bhatia, K., 2014, Improving real time flood forecasting using fuzzy
inference system. Journal of hydrology 509, 25-41.
Lopes, N. and Ribeiro, B., 2015, Machine Learning for Adaptive Many-core Machines: A
Practical Approach. Springer.
Manfreda, S., Di Leo, M. and Sole, A., 2011, Detection of flood-prone areas using digital
elevation models. Journal of Hydrologic Engineering 16 (10), 781-790.
Mansourypoor, F. and Asadi, S., 2017, Development of a Reinforcement Learning-based
Evolutionary Fuzzy Rule-Based System for diabetes diagnosis. Computers in Biology and
Medicine 91, 337-352.

Marchi, L., Borga, M., Preciso, E. and Gaume, E., 2010, Characterisation of selected extreme
flash floods in Europe and implications for flood risk management. Journal of Hydrology
394 (1-2), 118-133.
Marcus, G., 2018, Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.
Mehmanpazir, F. and Asadi, S., 2017, Development of an evolutionary fuzzy expert system for
estimating future behavior of stock price. Journal of Industrial Engineering International
13 (1), 29-46.
Mehrabian, A.R. and Lucas, C., 2006, A novel numerical optimization algorithm inspired from
weed colonization. Ecological informatics 1 (4), 355-366.
Mekanik, F., Imteaz, M., Gato-Trinidad, S. and Elmahdi, A., 2013, Multiple regression and
Artificial Neural Network for long-term rainfall forecasting using large scale climate
modes. Journal of Hydrology 503, 11-21.
Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M. and
Kanevski, M., 2014, Machine learning feature selection methods for landslide
susceptibility mapping. Mathematical Geosciences 46 (1), 33-57.
Miraki, S., Zanganeh, S.H., Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H. and Pham, B.T.,
2019, Mapping groundwater potential using a novel hybrid intelligence approach. Water
resources management 33 (1), 281-302.
Moore, I.D. and Wilson, J.P., 1992, Length-slope factors for the Revised Universal Soil Loss
Equation: Simplified method of estimation. Journal of soil and water conservation 47 (5),
423-428.
Mosavi, A., Rabczuk, T. and Varkonyi-Koczy, A.R., Year, Reviewing the novel machine learning
tools for materials design. International Conference on Global Research and Education,
50-58.
Mosavi, A., Ozturk, P. and Chau, K.-w., 2018, Flood prediction using machine learning models:
Literature review. Water 10 (11), 1536.
Mousavi, S.Z., Kavian, A., Soleimani, K., Mousavi, S.R. and Shirzadi, A., 2011, GIS-based
spatial prediction of landslide susceptibility using logistic regression model. Geomatics,
Natural Hazards and Risk 2 (1), 33-50.
Nayak, P., Sudheer, K., Rangan, D. and Ramasastri, K., 2005, Short‐term flood forecasting with a
neurofuzzy model. Water Resources Research 41 (4).
Nguyen, P.T., Tuyen, T.T., Shirzadi, A., Pham, B.T., Shahabi, H., Omidvar, E., Amini, A.,
Entezami, H., Prakash, I., Phong, T.V., Vu, T.B., Thanh, T., Saro, L. and Bui, D.T.,
2019a, Development of a Novel Hybrid Intelligence Approach for Landslide Spatial
Prediction. Applied Sciences 9 (14), 2824.
Nguyen, V.V., Pham, B.T., Vu, B.T., Prakash, I., Jha, S., Shahabi, H., Shirzadi, A., Ba, D.N.,
Kumar, R. and Chatterjee, J.M., 2019b, Hybrid machine learning approaches for landslide
susceptibility modeling. Forests 10 (2), 157.
Nielsen, M.A., 2015, Neural networks and deep learning. Determination Press, San Francisco, CA,
USA.
Nohani, E., Moharrami, M., Sharafi, S., Khosravi, K., Pradhan, B., Pham, B.T., Lee, S. and
Melesse, A.M., 2019, Landslide Susceptibility Mapping Using Different GIS-Based
Bivariate Models. Water 11 (7), 1402.
Organization, W.M., 1994, Guide to hydrological practices. Secretariat of the World
Meteorological Organization
Palm, R.B., 2012, Prediction as a candidate for learning deep hierarchical models of data.
Technical University of Denmark, Palm 25.
Pham, B.T., Bui, D.T., Prakash, I. and Dholakia, M., 2017, Hybrid integration of Multilayer
Perceptron Neural Networks and machine learning ensembles for landslide susceptibility
assessment at Himalayan area (India) using GIS. Catena 149, 52-63.
Pham, B.T., Shirzadi, A., Bui, D.T., Prakash, I. and Dholakia, M., 2018, A hybrid machine
learning ensemble approach based on a radial basis function neural network and rotation
forest for landslide susceptibility modeling: A case study in the Himalayan area, India.
International Journal of Sediment Research 33 (2), 157-170.
Pham, B.T., Prakash, I., Dou, J., Singh, S.K., Trinh, P.T., Tran, H.T., Le, T.M., Van Phong, T.,
Khoi, D.K. and Shirzadi, A., 2019a, A novel hybrid approach of landslide susceptibility
modelling using rotation forest ensemble and different base classifiers. Geocarto
International, 1-25.
Pham, B.T., Prakash, I., Singh, S.K., Shirzadi, A., Shahabi, H. and Bui, D.T., 2019b, Landslide
susceptibility modeling using Reduced Error Pruning Trees and different ensemble
techniques: Hybrid machine learning approaches. CATENA 175, 203-218.
Pham, B.T., Avand, M., Janizadeh, S., Phong, T.V., Al-Ansari, N., Ho, L.S., Das, S., Le, H.V.,
Amini, A. and Bozchaloei, S.K., 2020, GIS based hybrid computational approaches for
flash flood susceptibility assessment. Water 12 (3), 683.
Poudyal, C.P., Chang, C., Oh, H.-J. and Lee, S., 2010, Landslide susceptibility maps comparing
frequency ratio and artificial neural networks: a case study from the Nepal Himalaya.
Environmental Earth Sciences 61 (5), 1049-1064.
Pradhan, B., 2010, Flood susceptible mapping and risk area delineation using logistic regression,
GIS and remote sensing. Journal of Spatial Hydrology 9 (2).
Quinlan, J., 1993, C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco.
Quinlan, J.R., 1986, Induction of decision trees. Machine learning 1 (1), 81-106.
Quinlan, J.R., 1987, Simplifying decision trees. International journal of man-machine studies 27
(3), 221-234.
Rahmati, O., Pourghasemi, H.R. and Zeinivand, H., 2016, Flood susceptibility mapping using
frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto
International 31 (1), 42-70.
Reynolds, R.G., Year, An introduction to cultural algorithms. Proceedings of the third annual
conference on evolutionary programming, 131-139.
Ronoud, S. and Asadi, S., 2019, An evolutionary deep belief network extreme learning-based for
breast cancer diagnosis. Soft Computing 23 (24), 13139-13159.
Rouse Jr, J., Haas, R., Deering, D., Schell, J. and Harlan, J., 1974, Monitoring the Vernal
Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation.[Great
Plains Corridor].
Samanta, R.K., Bhunia, G.S., Shit, P.K. and Pourghasemi, H.R., 2018, Flood susceptibility
mapping using geospatial frequency ratio technique: a case study of Subarnarekha River
Basin, India. Modeling Earth Systems and Environment 4 (1), 395-408.
Santos, P.P. and Reis, E., 2018, Assessment of stream flood susceptibility: a cross‐analysis
between model results and flood losses. Journal of Flood Risk Management 11, S1038-S1050.
Schillaci, C., Acutis, M., Lombardo, L., Lipani, A., Fantappiè, M., Märker, M. and Saia, S., 2017,
Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: The
role of land use, soil texture, topographic indices and the influence of remote sensing data
to modelling. Science of The Total Environment 601-602, 821-832.
Shafizadeh-Moghadam, H., Valavi, R., Shahabi, H., Chapi, K. and Shirzadi, A., 2018, Novel
forecasting approaches using combination of machine learning and statistical models for
flood susceptibility mapping. J Environ Manage 217, 1-11.
Shahabi, H., Shirzadi, A., Ghaderi, K., Omidvar, E., Al-Ansari, N., Clague, J.J., Geertsema, M.,
Khosravi, K., Amini, A. and Bahrami, S., 2020, Flood detection and susceptibility
mapping using sentinel-1 remote sensing data and a machine learning approach: Hybrid
intelligence of bagging ensemble based on k-nearest neighbor classifier. Remote Sensing
12 (2), 266.
Shen, F., Chao, J. and Zhao, J., 2015, Forecasting exchange rate using deep belief networks and
conjugate gradient method. Neurocomputing 167, 243-253.
Shirzadi, A., Saro, L., Joo, O.H. and Chapi, K., 2012, A GIS-based logistic regression model in
rock-fall susceptibility mapping along a mountainous road: Salavat Abad case study,
Kurdistan, Iran. Natural hazards 64 (2), 1639-1656.
Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., Shahabi, H. and
Revhaug, I., 2017, Shallow landslide susceptibility assessment using a novel hybrid
intelligence approach. Environmental Earth Sciences 76 (2), 60.
Shirzadi, A., Soliamani, K., Habibnejhad, M., Kavian, A., Chapi, K., Shahabi, H., Chen, W.,
Khosravi, K., Thai Pham, B., Pradhan, B., Ahmad, A., Bin Ahmad, B. and Tien Bui, D.,
2018, Novel GIS Based Machine Learning Algorithms for Shallow Landslide
Susceptibility Mapping. Sensors 18 (11), 3777.
Shirzadi, A., Solaimani, K., Roshan, M.H., Kavian, A., Chapi, K., Shahabi, H., Keesstra, S.,
Ahmad, B.B. and Bui, D.T., 2019, Uncertainties of prediction accuracy in shallow
landslide modeling: Sample size and raster resolution. CATENA 178, 172-188.
Srinivasan, D.B. and Mekala, P., 2014, Mining social networking data for classification using
reptree. International Journal of Advance Research in Computer Science and Management
Studies 2 (10).
Srivastava, S., Sahana, S.K., Pant, D. and Mahanti, P., 2015, Hybrid Microscopic Discrete
Evolutionary Model for Traffic Signal Optimization. Journal of Next Generation
Information Technology 6 (2), 1.
Srivastava, S. and Sahana, S.K., 2017, Nested hybrid evolutionary model for traffic signal
optimization. Applied Intelligence 46 (1), 113-123.
Taheri, K., Shahabi, H., Chapi, K., Shirzadi, A., Gutiérrez, F. and Khosravi, K., 2019, Sinkhole
susceptibility mapping: A comparison between Bayes‐based machine learning algorithms.
Land Degradation & Development 30 (7), 730-745.
Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2013, Spatial prediction of flood susceptible areas
using rule based decision tree (DT) and a novel ensemble bivariate and multivariate
statistical models in GIS. Journal of Hydrology 504, 69-79.
Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2014, Flood susceptibility mapping using a novel
ensemble weights-of-evidence and support vector machine models in GIS. Journal of
hydrology 512, 332-343.
Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2015a, Flood susceptibility analysis and its
verification using a novel ensemble support vector machine and frequency ratio method.
Stochastic Environmental Research and Risk Assessment 29 (4), 1149-1165.
Tehrany, M.S., Pradhan, B., Mansor, S. and Ahmad, N., 2015b, Flood susceptibility assessment
using GIS-based support vector machine model with different kernel types. Catena 125,
91-101.
Termeh, S.V.R., Kornejady, A., Pourghasemi, H.R. and Keesstra, S., 2018, Flood susceptibility
mapping using novel ensembles of adaptive neuro fuzzy inference system and
metaheuristic algorithms. Science of the Total Environment 615, 438-451.
Tieleman, T., Year, Training restricted Boltzmann machines using approximations to the
likelihood gradient. Proceedings of the 25th international conference on Machine learning,
1064-1071.
Tieleman, T. and Hinton, G., Year, Using fast weights to improve persistent contrastive
divergence. Proceedings of the 26th Annual International Conference on Machine
Learning, 1033-1040.
Tien Bui, D., Pradhan, B., Lofman, O. and Revhaug, I., 2012, Landslide susceptibility assessment
in vietnam using support vector machines, decision tree, and Naive Bayes Models.
Mathematical problems in Engineering, 974638.
Tien Bui, D., Pradhan, B., Nampak, H., Bui, Q.-T., Tran, Q.-A. and Nguyen, Q.-P., 2016a, Hybrid
artificial intelligence approach based on neural fuzzy inference model and metaheuristic
optimization for flood susceptibility modeling in a high-frequency tropical cyclone area
using GIS. Journal of Hydrology 540, 317-330.
Tien Bui, D., Tuan, T.A., Klempe, H., Pradhan, B. and Revhaug, I., 2016b, Spatial prediction
models for shallow landslide hazards: a comparative assessment of the efficacy of support
vector machines, artificial neural networks, kernel logistic regression, and logistic model
tree. Landslides 13 (2), 361-378.
Tien Bui, D. and Hoang, N.-D., 2017, A Bayesian framework based on a Gaussian mixture model
and radial-basis-function Fisher discriminant analysis (BayGmmKda V1.1) for spatial
prediction of floods. Geoscientific Model Development 10 (9), 3391-3409.
Tien Bui, D., Khosravi, K., Li, S., Shahabi, H., Panahi, M., Singh, V.P., Chapi, K., Shirzadi, A.,
Panahi, S. and Chen, W., 2018a, New hybrids of anfis with several optimization
algorithms for flood susceptibility modeling. Water 10 (9), 1210.
Tien Bui, D., Panahi, M., Shahabi, H., Singh, V.P., Shirzadi, A., Chapi, K., Khosravi, K., Chen,
W., Panahi, S. and Li, S., 2018b, Novel hybrid evolutionary algorithms for spatial
prediction of floods. Scientific reports 8 (1), 15364.
Tien Bui, D., Shahabi, H., Shirzadi, A., Chapi, K., Pradhan, B., Chen, W., Khosravi, K., Panahi,
M., Bin Ahmad, B. and Saro, L., 2018c, Land subsidence susceptibility mapping in south
korea using machine learning algorithms. Sensors 18 (8), 2464.
Tien Bui, D., Khosravi, K., Shahabi, H., Daggupati, P., Adamowski, J.F., M Melesse, A., Thai
Pham, B., Pourghasemi, H.R., Mahmoudi, M. and Bahrami, S., 2019a, Flood spatial
modeling in northern Iran using remote sensing and gis: A comparison between evidential
belief functions and its ensemble with a multivariate logistic regression model. Remote
Sensing 11 (13), 1589.
Tien Bui, D., Ngo, P.-T.T., Pham, T.D., Jaafari, A., Minh, N.Q., Hoa, P.V. and Samui, P., 2019b,
A novel hybrid approach based on a swarm intelligence optimized extreme learning
machine for flash flood susceptibility mapping. CATENA 179, 184-196.
Tien Bui, D., Hoang, N.-D., Martínez-Álvarez, F., Ngo, P.-T.T., Hoa, P.V., Pham, T.D., Samui, P.
and Costache, R., 2020, A novel deep learning neural network approach for predicting
flash flood susceptibility: A case study at a high frequency tropical storm area. Science of
The Total Environment 701, 134413.
Tucker, C. and Sellers, P., 1986, Satellite remote sensing of primary production. International
journal of remote sensing 7 (11), 1395-1416.
Turoğlu, H. and Dölek, İ., 2011, Floods and their likely impacts on ecological environment in
Bolaman River basin (Ordu, Turkey). Research Journal of Agricultural Science 43 (4),
167-173.
UN Office for the Coordination of Humanitarian Affairs, 2019, Islamic Republic of Iran:
Situation Overview: Floods, As of 13 April 2019.
https://reliefweb.int/report/iran-islamic-republic/islamic-republic-iran-situation-overview-floods-13-april-2019, Report
Wang, S., Jiang, L. and Li, C., 2015, Adapting naive Bayes tree for text classification. Knowledge
and Information Systems 44 (1), 77-89.
Wang, Y., Hong, H., Chen, W., Li, S., Pamučar, D., Gigović, L., Drobnjak, S., Tien Bui, D. and
Duan, H., 2019a, A hybrid GIS multi-criteria decision-making method for flood
susceptibility mapping at Shangyou, China. Remote Sensing 11 (1), 62.
Wang, Y., Hong, H., Chen, W., Li, S., Panahi, M., Khosravi, K., Shirzadi, A., Shahabi, H.,
Panahi, S. and Costache, R., 2019b, Flood susceptibility mapping in Dingnan County
(China) using adaptive neuro-fuzzy inference system with biogeography based
optimization and imperialistic competitive algorithm. Journal of environmental
management 247, 712-729.
Wijkman, A. and Timberlake, L., 2019, Natural disasters: acts of God or acts of man? Routledge
Wilson, J.P. and Gallant, J.C., 2000, Terrain analysis: principles and applications. John Wiley &
Sons
Witten, D.M. and Tibshirani, R., 2011, Penalized classification using Fisher's linear discriminant.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (5), 753-772.
Xiao, L., Zhang, Y. and Peng, G., 2018, Landslide susceptibility assessment using integrated deep
learning algorithm along the China-Nepal Highway. Sensors 18 (12), 4436.
Yang, X.-S., Year, Firefly algorithms for multimodal optimization. International symposium on
stochastic algorithms, 169-178.
Yang, X.-S., 2010a, Firefly algorithm, stochastic test functions and design optimisation. arXiv
preprint arXiv:1003.1409.
Yang, X.-S., 2010b, A new metaheuristic bat-inspired algorithm. Nature inspired cooperative
strategies for optimization (NICSO 2010). Springer, 65-74.
Young, R.A. and Mutchler, C.K., 1969, Soil movement on irregular slopes. Water Resources
Research 5 (5), 1084-1089.
Youssef, A.M., Pradhan, B. and Sefry, S.A., 2016, Flash flood susceptibility assessment in Jeddah
city (Kingdom of Saudi Arabia) using bivariate and multivariate statistical models.
Environmental Earth Sciences 75 (1), 12.
Zhou, Q., Mikkelsen, P.S., Halsnæs, K. and Arnbjerg-Nielsen, K., 2012, Framework for economic
pluvial flood risk assessment considering climate change effects and adaptation benefits.
Journal of Hydrology 414, 539-549.
Zhou, Q., Leng, G. and Feng, L., 2017, Predictability of state-level flood damage in the
conterminous United States: the role of hazard, exposure and vulnerability. Scientific
reports 7 (1), 5354.
Figure 1 Locations of floods in the study area
Figure 2 Flash flood condition factors used in this study: (a) slope angle, (b) elevation, (c)
curvature, (d) TWI, (e) SPI, (f) distance to river, (g) river density, (h) rainfall, (i) lithology, (j)
land use, and (k) NDVI
Figure 3 Schematic of a Deep Belief Network structure
Figure 4 Restricted Boltzmann Machine
Figure 5 Chromosome representation in DBPGA model
Figure 6 The single-point crossover operator
Figure 7 Mutation operator in the DBPGA model
Figure 8 The GA flowchart for finding the optimal topology of the DBN
Figure 9 SWARA weights of flood conditioning factors in the study area
Figure 10 The order of the importance of conditioning factors in flood occurrence in the study
area
Figure 11 The goodness-of-fit and prediction accuracy of the DBPGA model. (a) Trend of flood
and non-flood locations using the training dataset. (b) MSE and RMSE of the training dataset. (c)
Standard deviation and mean values of the training dataset. (d) Trend of flood and non-flood
locations using the testing dataset. (e) MSE and RMSE of the testing dataset, (f) Standard
deviation and mean values of the testing dataset.
Figure 12 Flood susceptibility map prepared by the DBPGA model
Figure 13 Histogram of flood locations and susceptibility classes for three classification models
including natural breaks (NB), geometrical interval (GI) and quantile (Q)
Figure 14 Confusion matrix of the new proposed model: (a) Training dataset. (b) Testing dataset
Figure 15 ROC curve and AUC for the novel deep learning proposed model: (a) Training dataset,
(b) Testing dataset
Figure 16 A graphical comparison between the novel deep learning model and other benchmark
models
Table 1 Lithological units of the study area
Table 2 Flash flood conditioning factors and their classes and classification methods
Table 3 The optimal values of GA parameters for flood susceptibility modeling
Table 4 Comparison of classification performance of DBPGA model with some benchmark
machine learning models using testing dataset
Table 5 Comparison of classification performance of DBPGA model and some optimization
algorithms using testing dataset
Table 6 Performance evaluation of the new deep learning proposed model with other soft
computing benchmark models
Table 7 Average rank of the flash flood susceptibility models for the study area using the
Friedman's test
Table 8 Performance of the novel deep learning model compared to other models using the
Wilcoxon signed-rank test (two-tailed)
Graphical abstract:
Highlights
• A novel deep learning model, DBPGA, is proposed for flash flood susceptibility mapping.
• The One-R Attribute Evaluation (ORAE) technique was used to select the optimal conditioning factors.
• The DBPGA model outperformed all algorithms previously used in the study area.
• The proposed model is a promising tool for predicting flash floods in other similar regions.
