
Journal Pre-proof

Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm

Himan Shahabi, Ataollah Shirzadi, Somayeh Ronoud, Shahrokh Asadi, Binh Thai Pham, Fatemeh Mansouripour, Marten Geertsema, John J. Clague, Dieu Tien Bui

PII: S1674-9871(20)30240-1
DOI: https://doi.org/10.1016/j.gsf.2020.10.007
Reference: GSF 1100

To appear in:

Received date: 8 May 2020


Revised date: 6 August 2020
Accepted date: 17 October 2020

Please cite this article as: H. Shahabi, A. Shirzadi, S. Ronoud, et al., Flash flood
susceptibility mapping using a novel deep learning model based on deep belief network,
back propagation and genetic algorithm, (2020), https://doi.org/10.1016/j.gsf.2020.10.007

This is a PDF file of an article that has undergone enhancements after acceptance, such
as the addition of a cover page and metadata, and formatting for readability, but it is
not yet the definitive version of record. This version will undergo additional copyediting,
typesetting and review before it is published in its final form, but we are providing this
version to give early visibility of the article. Please note that, during the production
process, errors may be discovered which could affect the content, and all legal disclaimers
that apply to the journal pertain.

© 2020 Published by Elsevier.



Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm

Himan Shahabi (a,b), Ataollah Shirzadi (c), Somayeh Ronoud (d), Shahrokh Asadi (d), Binh Thai Pham (e), Fatemeh Mansouripour (d), Marten Geertsema (f), John J. Clague (g), Dieu Tien Bui (h,i,*)

(a) Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
(b) Board Member of Department of Zrebar Lake Environmental Research, Kurdistan Studies Institute, University of Kurdistan, Sanandaj 66177-15175, Iran
(c) Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
(d) Data Mining Laboratory, Department of Engineering, College of Farabi, University of Tehran, Tehran, Iran
(e) Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
(f) British Columbia, Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Prince George, BC V2L 1R5, Canada
(g) Department of Earth Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
(h) Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
(i) Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam

(*) Corresponding author. E-mail: buitiendieu@tdtu.edu.vn

Abstract

Flash floods are responsible for loss of life and considerable property damage in many countries. Flood susceptibility maps contribute to flood risk reduction in areas that are prone to this hazard if appropriately used by land-use planners and emergency managers. The main objective of this study is to prepare an accurate flood susceptibility map for the Haraz watershed in Iran using a novel modeling approach (DBPGA) based on a Deep Belief Network (DBN) with the Back Propagation (BP) algorithm optimized by the Genetic Algorithm (GA). For this task, a database comprising ten conditioning factors and 194 flood locations was created using the One-R Attribute Evaluation (ORAE) technique. Various well-known machine learning and optimization algorithms were used as benchmarks to compare the prediction accuracy of the proposed model. Statistical metrics including sensitivity, specificity, accuracy, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC) were used to assess the validity of the proposed model. The results show that the proposed model has the highest goodness-of-fit (AUC = 0.989) and prediction accuracy (AUC = 0.985), and based on the validation dataset it outperforms benchmark models including LR (0.885), LMT (0.934), BLR (0.936), ADT (0.976), NBT (0.974), REPTree (0.811), ANFIS-BAT (0.944), ANFIS-CA (0.921), ANFIS-IWO (0.939), ANFIS-ICA (0.947), and ANFIS-FA (0.917). We conclude that the DBPGA model is an excellent alternative tool for predicting flash flood susceptibility in other regions prone to flash floods.

Keywords: Environmental modeling; Flash flood; Deep belief network; Over-fitting; Iran

1. Introduction

Flash floods occur when channel discharge rapidly exceeds channel capacity, resulting in overbank flow (Khosravi et al., 2019). Extreme, short-duration rainfall increases both average flow velocity and the velocity of the peak flow (Charlton et al., 2006). Flash floods cause more than 20,000 fatalities per year and much damage to infrastructure and agricultural systems (Zhou et al., 2017). About 90% of global flash flood fatalities are in Asia. Projections suggest that flooding in Asia could increase by nearly 200% by 2050 (Arnell and Gosling, 2016). Causes of flash floods include heavy rainfall, hardening of large areas of land due to urbanization, and soil degradation (Huppert and Sparks, 2006; Schillaci et al., 2017; Wijkman and Timberlake, 2019). Risk commonly increases with population growth and, possibly, as climate changes.

Iran is particularly prone to flash floods. In 2019 alone, flash floods killed 78 people, injured 1076 others, and displaced about 300,000 people; in total, some 10 million people were affected (UN Office for the Coordination of Humanitarian Affairs, 2019). One of the areas most susceptible to flash flooding is the Haraz watershed in Mazandaran Province (Tien Bui et al., 2018b; Khosravi et al., 2019). Twenty-eight villages in this area were destroyed during flash floods (Khosravi et al., 2016a).

Because flash floods are so deadly and damaging, researchers are making efforts to predict hazards and risks in flood-prone areas. These efforts include development of advanced systems to predict areas vulnerable to flooding. Because flood prediction is time-consuming and flood-prone areas are complex, flood prediction models are particularly data-specific and require many simplifying assumptions (Lohani et al., 2014). A key initiative is the development of advanced flood predictive models, including physically based rainfall-runoff, regression, and "on-off" classification models (Tien Bui and Hoang, 2017). Physically based models, for example HEC-RAS (Brunner, 1995) and MIKE (Zhou et al., 2012), require lengthy hydrological monitoring datasets; consequently, their use in flood modeling is highly challenging (Kim et al., 2015; Nayak et al., 2005). Regression models are widely used in spatial and temporal flood modeling, but they too require long time-series datasets from hydrological stations to accurately forecast extreme discharges. Commonly, discharge records are short or incomplete and thus rarely can be used for accurate flood prediction. The newest physically based approach is the "on-off" classification model. This model does not require data from hydrological stations; instead it uses historical flood and geo-environmental data, which are classified into flood and non-flood categories using data-driven and machine learning models (Tien Bui et al., 2016a).
Recent progress in geographic information system (GIS) and remote sensing (RS) techniques has driven development of new flood prediction models, but applications of new models in mountain areas still face challenges (Tien Bui et al., 2019). To overcome these challenges, machine learning models are increasingly being developed and used in flood modeling because of their high performance, accuracy, and predictive capability (Ahmadlou et al., 2019; Tien Bui et al., 2019a). The increasing popularity of machine learning algorithms stems, in part, from the fact that they capture flood nonlinearity solely from historical flood datasets, without the need for knowledge of complex mathematical expressions of physical processes and basin behavior (Mosavi et al., 2018). Machine learning algorithms can easily be implemented (fast training, testing, and validation) with low computation costs. They also are less complex than physically based and conventional models (Mekanik et al., 2013; Mosavi et al., 2017). Khosravi et al. (2019) compared the performance of three expert knowledge-based models (VIseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR), Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), and Simple Additive Weighting (SAW)) with two machine learning methods (Naïve Bayes Tree (NBT) and Naïve Bayes (NB)) for flood susceptibility mapping in the flood-prone Ningdu watershed in China. They found that the machine learning algorithms outperformed the three expert knowledge-based models.

A survey of the literature shows that many machine learning algorithms have been developed and formulated for flood modeling: functional algorithms (e.g., support vector machines (SVM), logistic regression (LR), and artificial neural networks (ANN)); Bayes-based algorithms (Bayesian logistic regression (BLR)); and decision tree algorithms (random forest (RF), alternating decision tree (ADT), logistic model trees (LMT), naïve Bayes tree (NBT), reduced error pruning tree (REPT), and classification and regression trees (CART)). However, differences in the choice of flood conditioning factors by researchers and in the probability distribution function (PDF) used in the algorithms (Tien Bui et al., 2018b) make it difficult to compare and rate the algorithms. Nevertheless, Khosravi et al. (2019) compared four machine learning algorithms (LMT, NBT, REPT, and ADT) that they applied to predict floods in the Haraz watershed. They demonstrated that the ADT model had the highest prediction capability for flash flood susceptibility assessment, followed by, respectively, the NBT, LMT, and REPT models. Chen et al. (2020) compared three machine learning algorithms (NBT, ADT, and RF) for spatial prediction of flooding in the Quannan area, China, and found that the RF algorithm outperformed the NBT and ADT algorithms.

A recent development in flood prediction modeling is the use of hybrid machine learning models. Hybrid models have been shown to improve the prediction accuracy of single statistically based benchmark models, for example the LR model and bivariate models such as frequency ratio (FR), evidential belief function (EBF), and weights-of-evidence (WoE) models (Pham et al., 2018). Tehrany et al. (2014) coupled WoE and SVM and concluded that the WoE-SVM (RBF) hybrid model outperformed the benchmark WoE and SVM models for flood susceptibility mapping at Brisbane, Australia.

Some hybrid models have been developed for flood susceptibility mapping in the Haraz watershed, which is the area we chose for our study. Chapi et al. (2017) developed a hybrid model (Bagging-LMT) that combines Bagging and the LMT algorithm and compared its goodness-of-fit and prediction accuracy with four state-of-the-art soft computing benchmark models (LMT, LR, BLR, and RF). They found that the hybrid model performed best; its performance was notably higher than that of the LMT algorithm. Shafizadeh-Moghadam et al. (2018) coupled eight single machine learning algorithms with seven ensemble forecasting models and concluded that the combination of the boosted regression tree (BRT) algorithm and the ensemble forecasting model EMmedian provided the highest performance for flood modeling in the study area. Tien Bui et al. (2018b) coupled the adaptive neuro-fuzzy inference system (ANFIS) algorithm with the imperialistic competitive (ANFIS-ICA) and firefly (ANFIS-FA) algorithms; they found that the hybrid model (ANFIS-ICA) outperformed the single algorithms used alone. In contrast, Tien Bui et al. (2019b) combined the EBF and LR algorithms in a new hybrid model (EBF-LR) and concluded that the hybrid model failed to outperform the standalone models. Tien Bui et al. (2018a) integrated ANFIS with the cultural algorithm (ANFIS-CA), bees algorithm (ANFIS-BA), and invasive weed optimization (ANFIS-IWO) algorithm. They concluded that ANFIS-IWO had the highest goodness-of-fit among the models tested, although ANFIS-BA had higher prediction accuracy than the ANFIS-IWO and ANFIS-CA models. Finally, Shahabi et al. (2020) introduced hybrid models of bagging based on four kernels (i.e., coarse, cosine, cubic, and weighted) of the K-Nearest Neighbor (KNN) algorithm and concluded that the Bagging-Cubic ensemble model had the highest ability to predict flood locations in the Haraz area.

Deep learning (DL) methods (also referred to as deep structured learning or hierarchical learning) began to be used in artificial intelligence research after having become popular in other scientific fields such as computer vision, big data mining, human activity recognition, character recognition, speech recognition, digital image processing, and natural language processing (Ball et al., 2017; Huang and Xiang, 2018). DL is a statistical technique for classifying patterns using neural networks with multiple layers based on training datasets (Marcus, 2018). DL methods offer several advantages over other methods: (i) they are becoming more applicable and useful as the sizes of available training datasets increase; (ii) the size of DL models has grown over time with improvements in computer infrastructure and speed; (iii) they can solve complex real-world problems by incrementally improving their accuracy over time; and (iv) they can perform unsupervised and semi-supervised learning (Nielsen, 2015).


Of the different types of DL used for classification, the deep belief network (DBN) (Hinton et al., 2006) is the most potent and efficient predictor and classifier (Ronoud and Asadi, 2019). A DBN uses Restricted Boltzmann Machines (RBMs) that can extract features from a large number of training datasets, resulting in improvements in classification accuracy and prediction precision. Importantly, the performance of deep learning models such as DBNs depends on their structure, that is, on selecting the best number of layers and neurons in each layer to achieve reasonable results (Guo et al., 2016). There is no guideline or standard way in the literature to select these parameters; thus manual searching is generally used (Larochelle et al., 2007; Hinton, 2012; Shen et al., 2015).

In this study, we propose a new model using DBN, and evaluate and test it on different structures. Our objective is to select an appropriate structure with the highest performance and prediction accuracy. We also use back propagation (BP) to minimize cost functions by adapting control weights (Dreyfus, 1973). Our long-term goal is to develop and explore new models and techniques that improve flash flood management and mitigation. Specifically, we develop a new deep learning algorithm, a deep belief network (DBN) with the back propagation (BP) algorithm optimized by a genetic algorithm (GA) (DBPGA), for flash flood susceptibility mapping at Haraz in northern Iran. Although DL models have been successfully used for landslide susceptibility mapping (Ding et al., 2016; Ghorbanzadeh et al., 2019; Xiao et al., 2018), no such study has been conducted on flood susceptibility assessment. Our proposed model improves the weights by using the BP algorithm to decrease the cost function and the GA to optimize the topology of the network, in order to increase performance and prediction accuracy during the modeling process using a training dataset. We compare the results from the proposed deep learning model with other state-of-the-art soft computing benchmark models, including machine learning algorithms (REPTree, NBT, LR, LMT, BLR, and ADT) and metaheuristic algorithms (ANFIS-IWO, ANFIS-ICA, ANFIS-FA, ANFIS-CA, and ANFIS-BAT), to check the efficiency and capability of the developed model. The modeling was done in MATLAB (2018b); the flash flood susceptibility map was generated using ArcGIS 10.2 software.



2. Description of the study area

The Haraz watershed is located in the mountainous Mazandaran Province, in northern Iran (Fig. 1). Major population centers are Polur, Tashal, Tiran, Rineh, Kandovan, Abasak, Gaznak, Baladeh, and Noor (Khosravi et al., 2016a). The Haraz watershed experiences near-annual catastrophic flash floods that cause fatalities, damage property and infrastructure, and disrupt traffic, commerce, and public services, notably in recent years. Flash floods result from torrential rainfall and are exacerbated by deforestation, recent extensive replacement of orchards by residential areas, and the lack of flood control measures (Tien Bui et al., 2018b; Khosravi et al., 2018).

The watershed has an area of 4014 km² and ranges in elevation from 300 m to about 5600 m a.s.l., with slopes up to 66° (Fig. 1). Average annual rainfall at Haraz is about 780 mm; the wettest months are January, February, March, and October, with average monthly rainfall amounts of about 160 mm. Average annual evaporation is about 1300 mm. Average temperature at Haraz ranges from a minimum of 5°C to a maximum of 23°C. Average annual temperature is about 8°C. Mountainous areas have a moderately cold climate, whereas the Caspian Sea shoreline has a mild humid climate.

The study area is underlain by Mesozoic formations (56.4%), followed by Cenozoic (38.9%) and Paleozoic (4.7%) formations. Most of the area is covered by rangeland (92%); the remainder is forest, bare land, irrigated land, residential area, and garden land.



3. Data acquisition

3.1 Flash flood inventory map

We used data from 194 historical (1995–2015) flash floods in the Haraz watershed to map flood distribution. We divided the data set into 155 (80%) locations used for flood modeling and 39 (20%) locations used for evaluation. Environmental experts employed by the Mazandaran Regional Water Authority validated the locations of the flash floods through field surveys. Additionally, we randomly selected 194 non-flash flood locations within the study area and divided them into modeling and evaluation groups using the same 80:20 ratio as we used for the flood locations.
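As an illustration of the 80:20 partition described above, the following minimal Python sketch splits 194 location indices into modeling and evaluation sets. It assumes a simple random split with a fixed seed; the function name `split_locations` and the seeds are hypothetical, and the text does not state how the flood locations themselves were assigned to the two groups.

```python
import random


def split_locations(n_locations, train_fraction=0.8, seed=0):
    """Randomly split location indices into modeling and evaluation sets.

    Assumption: a plain random shuffle; the study may have used another scheme.
    """
    indices = list(range(n_locations))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_locations * train_fraction)
    return indices[:n_train], indices[n_train:]


# 194 flood and 194 non-flood locations, as in the inventory described above.
flood_train, flood_test = split_locations(194, seed=1)
nonflood_train, nonflood_test = split_locations(194, seed=2)

print(len(flood_train), len(flood_test))        # 155 39
print(len(nonflood_train), len(nonflood_test))  # 155 39
```

Rounding 194 × 0.8 = 155.2 to the nearest integer reproduces the 155/39 partition reported in the text.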

3.2 Flash flood conditioning factors

The selection of flood conditioning factors has a direct impact on the accuracy of mathematical models (Kia et al., 2012). Based on previous research in the study area (Khosravi et al., 2016a, b; Chapi et al., 2017), we selected 11 flood conditioning factors for this study: topographic factors (slope angle, elevation, and curvature); hydrological factors (topographic wetness index (TWI), stream power index (SPI), distance to river, river density, and rainfall); a geological factor (lithology); and land cover factors (Normalized Difference Vegetation Index (NDVI) and land use). We prepared a digital elevation model (DEM) of the study area from the ASTER Global DEM (https://gdex.cr.usgs.gov/gdex/) with a cell size of 30 m × 30 m. The DEM was used to provide maps of primary and secondary factors such as slope angle, elevation, plan curvature, distance to river, and river density using ArcGIS 10.3, and maps of TWI and SPI using SAGA-GIS 2.8 software. The spatial databases were constructed and then resampled using the "Resample" tool in ArcGIS 10.3 to a matrix with 5299 columns and 3027 rows and a cell size of 20 m × 20 m for spatial analysis and model development. We briefly discuss below the role of each factor in the occurrence of flooding.



Topographic factors

Slope angle. Slope angle has a direct effect on flooding, and most researchers consider it to be one of the most important factors in flood modeling (Rahmati et al., 2016; Termeh et al., 2018; Wang et al., 2019b; Costache and Bui, 2020). It controls surface runoff, its velocity, and infiltration. In general, the lower the slope angle (e.g., areas around rivers or flat terrain), the higher the rate of infiltration and the lower the flow velocity; all other things being equal, such areas have a higher likelihood of flooding (Chapi et al., 2017). We constructed the slope angle map with eight classes based on the natural breaks classification method (Table 2 and Fig. 2a).

Elevation. Elevation generally has an inverse relationship with flooding (Fernández and Lutz, 2010). As elevation decreases, terrain typically becomes flatter and the amount of water carried by streams and rivers increases (Cao et al., 2016). An elevation map was constructed with nine classes using the manual classification method (Table 2 and Fig. 2b).

Curvature. Some flood researchers consider curvature to be an important flood conditioning factor (Ahmadlou et al., 2019; Hong et al., 2018). Runoff accelerates or decelerates depending on slope form: concave (negative curvature), flat (zero curvature), or convex (positive curvature). Convex slopes accelerate overland flow and may also affect infiltration and soil saturation (Cao et al., 2016). Concave slopes decelerate overland flow and may increase infiltration (Young and Mutchler, 1969). The curvature map was constructed in three categories: convex, flat, and concave (Table 2 and Fig. 2c).



Hydrological factors

Topographic wetness index (TWI). The topographic wetness index (TWI) is a hydrological metric defined in terms of the ratio between specific catchment area and slope (Wilson and Gallant, 2000). It provides a measure of water accumulation, saturation, and flood possibility for each pixel in a given watershed (Beven, 2011; Manfreda et al., 2011) and is formulated as follows (Beven and Kirkby, 1979):

$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \tag{1}$$

where $\alpha$ is the specific catchment area (m²/m) and $\beta$ is the slope angle (°). We constructed a TWI map with six intervals using the natural breaks classification method (Table 2 and Fig. 2d).

Stream power index (SPI). The stream power index (SPI) provides a measure of the erosive power of discharge relative to a specific area within the watershed (Poudyal et al., 2010). It reflects the power of flow at a given location in a watershed (Cao et al., 2016). The higher the SPI value, the higher the power of the flow (Turoğlu and Dölek, 2011). SPI can be computed as follows (Moore and Wilson, 1992):

$$\mathrm{SPI} = \alpha \tan\beta \tag{2}$$

where $\alpha$ denotes the specific catchment area (m²/m) and $\beta$ is the slope angle (°). In this study, we divided SPI into five classes using the natural breaks classification method (Table 2 and Fig. 2e).
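To make the two indices concrete, the sketch below evaluates them for a single cell using their commonly cited forms, TWI = ln(α / tan β) and SPI = α · tan β, with α the specific catchment area (m²/m) and β the slope angle. The function names, the `min_tan` guard for flat cells, and the sample values are assumptions for illustration only; the study derived both maps in SAGA-GIS, whose exact implementation may differ.

```python
import math


def twi(alpha, slope_deg, min_tan=1e-6):
    """Topographic wetness index for one cell.

    alpha: specific catchment area (m^2/m), must be positive.
    slope_deg: slope angle in degrees.
    min_tan: assumed lower bound on tan(slope) to avoid division by zero on flat cells.
    """
    tan_beta = max(math.tan(math.radians(slope_deg)), min_tan)
    return math.log(alpha / tan_beta)


def spi(alpha, slope_deg):
    """Stream power index for one cell (alpha times tan of the slope angle)."""
    return alpha * math.tan(math.radians(slope_deg))


# Hypothetical cell: 250 m^2/m of specific catchment area on a 5 degree slope.
print(twi(250.0, 5.0))
print(spi(250.0, 5.0))
```

For a fixed catchment area, gentler slopes give a higher TWI (more water accumulation) and a lower SPI (less erosive power).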

Distance to river. Areas close to rivers are more susceptible to flooding than more distant areas (Butler et al., 2006; Chapi et al., 2017). We extracted the river networks from the DEM with the "ArcHydro" tool in ArcGIS 10.2 and prepared a map with eight classes using the natural breaks classification method (Table 2 and Fig. 2f).


River density. River density is defined as the total stream length (m) within an area divided by watershed area (km²) (Elmore et al., 2013). All other factors being equal, higher stream densities are associated with higher likelihoods of flooding (Tehrany et al., 2015b). Fraser and Schumer (2012) have argued that larger flood peaks and volumes are associated with higher stream densities in perennial watersheds, but ephemeral watersheds have lower flood peaks. We prepared a river density map using the "Line density" tool in ArcGIS 10.2, with six classes selected with the natural breaks classification method (Table 2 and Fig. 2g).

Rainfall. Intuitively, rainfall is related to flash flooding. Short-duration torrential rainfall or long-duration, lower intensity rainfall can cause flooding (Organization, 1994; Kron, 2002; Marchi et al., 2010; Cao et al., 2016). We prepared a rainfall map for the Haraz watershed based on a dataset of 20 years (1991–2011) of rainfall from 17 rain gauges. To create the map, we tested a variety of interpolation methods: simple kriging, ordinary kriging, inverse distance weighting (IDW) with powers of 1–5, a radial basis function (RBF) with a completely regularized spline, and a spline with a tension kernel function. We chose the IDW method with a power of 1 because it had the lowest RMSE. The rainfall map has six classes calculated using the natural breaks classification method (Table 2 and Fig. 2h).
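The selection of an IDW power by lowest RMSE can be sketched as follows. This is a minimal illustration that assumes leave-one-out cross-validation over the gauges, which the text does not specify; the gauge coordinates and rainfall values are hypothetical, and the function names `idw` and `loo_rmse` are not from the study.

```python
import math


def idw(x, y, gauges, power):
    """Inverse distance weighted estimate at (x, y).

    gauges: list of (gx, gy, value) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for gx, gy, value in gauges:
        distance = math.hypot(x - gx, y - gy)
        if distance == 0.0:
            return value  # exactly on a gauge: return its observation
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def loo_rmse(gauges, power):
    """Leave-one-out RMSE: predict each gauge from all the others."""
    squared_error = 0.0
    for k, (gx, gy, value) in enumerate(gauges):
        others = gauges[:k] + gauges[k + 1:]
        squared_error += (idw(gx, gy, others, power) - value) ** 2
    return math.sqrt(squared_error / len(gauges))


# Hypothetical gauges: (x in km, y in km, mean annual rainfall in mm).
gauges = [
    (0.0, 0.0, 600.0),
    (10.0, 0.0, 700.0),
    (0.0, 10.0, 800.0),
    (10.0, 10.0, 900.0),
    (5.0, 5.0, 750.0),
]

# Try powers 1 to 5, as in the text, and keep the one with the lowest RMSE.
best_power = min(range(1, 6), key=lambda p: loo_rmse(gauges, p))
print(best_power, loo_rmse(gauges, best_power))
```

Lower powers weight distant gauges more heavily and give smoother surfaces, which may suit a sparse network such as the 17 gauges used here.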

Geological factor

Lithology. Flooding can be affected by lithology and geologic structures, notably porosity, permeability, and joint and fracture spacing (Derbyshire et al., 2013). In our study area, lithology is an indicator of water infiltration (Santos and Reis, 2018). Infiltration on highly resistant rocks is low (Rahmati et al., 2016), resulting in a higher potential for flooding. We extracted lithology units from a geologic map of the study area at a scale of 1:100,000 provided by the Geological Survey and Mineral Explorations of Iran. We defined six units in ArcGIS 10.3: Quaternary, Tertiary, Cretaceous, Jurassic, Triassic, and Permian (Table 2 and Fig. 2i).

Land cover factors



Land use. Land use can affect infiltration and thus runoff (Rahmati et al., 2016; Santos and Reis, 2018). Vegetation, especially forest, intercepts rainfall and reduces the rapidity of runoff (Tehrany et al., 2014). We used a Landsat 8 OLI satellite image acquired in April 2013 and provided by the Armed Forces Geographical Organization of Iran. We selected characteristic pixels for rangeland, barren land, forest, garden, wood land, irrigated land, residential areas, and water bodies using this image, supplemented with a field survey and Google Earth images. We used the artificial neural network (ANN), maximum likelihood ratio (MLR), and support vector machine (SVM) algorithms within Environment for Visualizing Images (ENVI 5.1) software to classify all pixels into the seven land use classes (Table 2 and Fig. 2j).



Normalized Difference Vegetation Index (NDVI). NDVI is a metric used to study the greenness of the land surface (Rouse et al., 1974) and the presence of water bodies (Gao, 1996). Changes in NDVI reflect changes in vegetation and surface water cover over time (Ahmed and Akter, 2017), and can show the relationship between flooding and vegetation within a watershed (Tehrany et al., 2013). Areas with higher vegetation densities are assumed to have lower probabilities of flooding within the study area (Chapi et al., 2017). The metric has values ranging from −1 (lowest vegetation density) to +1 (highest vegetation density). The NDVI map for the Haraz watershed was generated in ENVI 5.1 software with six classes based on the Landsat 8 OLI image acquired in 2013. Bands 3 and 4 were used to prepare the NDVI map (Table 2 and Fig. 2k). NDVI values were computed as follows (Tucker and Sellers, 1986):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}} \tag{3}$$

where NIR and R are the reflectance values in the near-infrared and red bands, respectively.
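A per-pixel evaluation of the standard NDVI ratio, (NIR − R) / (NIR + R), might look like the short sketch below. The function name, the handling of a zero denominator, and the sample reflectance values are assumptions for illustration; in the study the index was computed in ENVI.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir, red: non-negative reflectance values in the near-infrared and red bands.
    Assumption: a pixel with zero reflectance in both bands is assigned NDVI = 0.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical reflectances: dense vegetation versus open water.
print(ndvi(0.50, 0.10))  # strongly positive
print(ndvi(0.05, 0.40))  # negative
```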

3.3. Deep Belief Network (DBN)

Artificial Neural Networks (ANNs), inspired by the human brain, were introduced in the 1960s and gained success in a variety of artificial intelligence applications including classification, regression, clustering, and prediction (Ahmadizar et al., 2015). ANNs are flexible mathematical structures that learn intricate relationships between input and output data. The structure that is most common across the different types of ANNs is the Back-Propagation Network (BP). BP, however, suffers from the use of random weights at the beginning of the network training process. Partly due to this problem, a new approach to deep-network pre-training, the Deep Belief Network (DBN), was introduced in 2006 and led to significant progress in deep learning (Hinton et al., 2006).

A DBN consists of several Restricted Boltzmann Machines (RBMs). An RBM is an undirected generative probabilistic model that uses one hidden layer to model the probability distribution of visible variables. Our DBN uses a stack of RBMs to process information hierarchically, extracting high-level features from the raw data. Fig. 3 shows a DBN with m inputs, N RBMs, and one output O. The biases of the visible and hidden layers are not shown in the figure for simplicity. The numbers and letters shown in the neurons are the indexes of the neurons.

DBN training comprises three steps:

Step 1. Unsupervised and greedy layer-wise pre-training using a stack of RBMs.

Step 2. First fine-tuning step: randomly assign the connection weight matrix between the last hidden layer and the output neuron, and then calculate the error.

Step 3. Second fine-tuning step: use error back propagation.

The RBM training process is described below.

3.3.1 Restricted Boltzmann Machine

The Restricted Boltzmann Machine uses an encoding-decoding pattern with an encoder that converts inputs into a higher-level feature representation (Fig. 4). The decoder can then reconstruct the input (Lopes and Ribeiro, 2015). RBM training through the reconstruction of input data is a major advantage of DBN because this procedure is unsupervised and does not require labeled data.

The RBM consists of a set of visible units $\mathbf{v} = \{v_1, \ldots, v_{g_v}\}$ and a set of hidden units $\mathbf{h} = \{h_1, \ldots, h_{g_h}\}$, where $g_v$ and $g_h$ are, respectively, the number of visible units and the number of hidden units. In an RBM, the energy of the joint configuration $\{\mathbf{v}, \mathbf{h}\}$, considering biases, is (Bengio, 2009):

$$E(\mathbf{v}, \mathbf{h}) = -\mathbf{v}^{T} W \mathbf{h} - \mathbf{b}^{T} \mathbf{v} - \mathbf{c}^{T} \mathbf{h} \tag{4}$$

In Eq. (4), $\mathbf{v}$ is the vector of visible unit values, $\mathbf{h}$ is the vector of hidden unit values, $W$ is the weight matrix, $\mathbf{b}$ is the bias vector for visible units, and $\mathbf{c}$ is the bias vector for hidden units; $\mathbf{v}^{T}$, $\mathbf{b}^{T}$, and $\mathbf{c}^{T}$ are the transposes of the vectors $\mathbf{v}$, $\mathbf{b}$, and $\mathbf{c}$, respectively. Eq. (4) can be rewritten as follows (Tieleman and Hinton, 2009):

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i=1}^{g_v} \sum_{j=1}^{g_h} W_{ij} v_i h_j - \sum_{i=1}^{g_v} b_i v_i - \sum_{j=1}^{g_h} c_j h_j \tag{5}$$

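in which $v_i$ and $h_j$ are the binary states of, respectively, visible unit $i$ and hidden unit $j$; $b_i$ and $c_j$ are the biases of, respectively, visible unit $i$ and hidden unit $j$; and $W_{ij}$ is the weight between those two units. The probability of each possible state $\{\mathbf{v}, \mathbf{h}\}$ is defined as:

$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp\left(-E(\mathbf{v}, \mathbf{h})\right) \tag{6}$$

where $Z$ is the normalizing constant and equals:

$$Z = \sum_{\mathbf{v}, \mathbf{h}} \exp\left(-E(\mathbf{v}, \mathbf{h})\right) \tag{7}$$

The probability of a data point, represented by the state $\mathbf{v}$ of the visible vector, is:

$$P(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\left(-E(\mathbf{v}, \mathbf{h})\right) \tag{8}$$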
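For a very small RBM, the energy, the normalizing constant Z, and the probability of a visible vector can be evaluated by brute-force enumeration, which may help in checking the definitions above. The sketch below assumes the standard RBM energy E(v, h) = −Σ W_ij v_i h_j − Σ b_i v_i − Σ c_j h_j with binary units; the weights and biases are arbitrary illustrative values, and enumeration is only feasible for a handful of units.

```python
import itertools
import math


def energy(v, h, W, b, c):
    """Energy of a joint configuration (v, h) of a binary RBM."""
    pair_term = sum(W[i][j] * v[i] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(cj * hj for cj, hj in zip(c, h))
    return -pair_term - visible_term - hidden_term


def binary_states(n):
    """All 2**n binary vectors of length n."""
    return list(itertools.product([0, 1], repeat=n))


def partition(W, b, c):
    """Normalizing constant Z: sum of exp(-E) over all joint states."""
    return sum(math.exp(-energy(v, h, W, b, c))
               for v in binary_states(len(b))
               for h in binary_states(len(c)))


def prob_visible(v, W, b, c):
    """Marginal probability of a visible vector: sum over hidden states, divided by Z."""
    unnormalized = sum(math.exp(-energy(v, h, W, b, c))
                       for h in binary_states(len(c)))
    return unnormalized / partition(W, b, c)


# Hypothetical RBM with 2 visible and 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3]]
b = [0.0, 0.1]
c = [-0.1, 0.2]

total = sum(prob_visible(v, W, b, c) for v in binary_states(2))
print(total)  # the marginals over all visible vectors sum to 1 (up to rounding)
```

Because Z sums over 2^(g_v + g_h) states, exact evaluation becomes impractical for realistic layer sizes, which is why training relies on sampling-based approximations.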

The hidden unit activations are mutually independent given the visible unit activations (and vice versa):

$$P(\mathbf{h} \mid \mathbf{v}) = \prod_{j=1}^{g_h} P(h_j \mid \mathbf{v}) \tag{9}$$

Note that if one layer is specified, the distribution of the other layer is factorial. Since neurons are binary, the probability of a hidden neuron being "on" (value = 1) is as follows:

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\left(c_j + \sum_{i} W_{ij} v_i\right) \tag{10}$$

where $\sigma$ is the logistic sigmoid function, $\sigma(x) = 1 / (1 + \exp(-x))$. Similarly, the conditional probability of a visible node with respect to the hidden vector is:

$$P(v_i = 1 \mid \mathbf{h}) = \sigma\left(b_i + \sum_{j} W_{ij} h_j\right) \tag{11}$$

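This is a probabilistic version of the normal sigmoid activation function. The goal is to maximize the log-likelihood of the training data or, equivalently, to minimize its negative log-likelihood. The gradient of the negative log-likelihood of the training data with respect to the model parameters is given by Eqs. (12), (13), and (14):

$$\frac{\partial \left(-\log P(\mathbf{v})\right)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\mathrm{model}} - \langle v_i h_j \rangle_{\mathrm{data}} \tag{12}$$

$$\frac{\partial \left(-\log P(\mathbf{v})\right)}{\partial b_i} = \langle v_i \rangle_{\mathrm{model}} - \langle v_i \rangle_{\mathrm{data}} \tag{13}$$

$$\frac{\partial \left(-\log P(\mathbf{v})\right)}{\partial c_j} = \langle h_j \rangle_{\mathrm{model}} - \langle h_j \rangle_{\mathrm{data}} \tag{14}$$

where $\langle \cdot \rangle_{a}$ is the expected value with respect to the distribution $a$. The learning rule for updating the weights, which follows the gradient of the training data log-likelihood, can be written as:

$$\Delta W_{ij} = \alpha \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right) \tag{15}$$

where $\alpha$ is the learning rate. Eqs. (16) and (17) show the corresponding update rules for the biases:

$$\Delta b_i = \alpha \left( \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}} \right) \tag{16}$$

$$\Delta c_j = \alpha \left( \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}} \right) \tag{17}$$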

3.3.2 Calculating the log-likelihood gradient of the training data


In order to use the log-likelihood gradient of the training data to update $W_{ij}$ based on Eq. (15), it is necessary to calculate two values, $\langle v_i h_j \rangle_{data}$ and $\langle v_i h_j \rangle_{model}$. These two values are commonly termed the positive and negative phases, respectively (Tieleman and Hinton, 2009). $\langle v_i h_j \rangle_{data}$ is easily calculated by considering the visible units $v$, the values of which have been determined from training data, and by assigning the value 1 to each hidden unit with a probability calculated by Eq. (10). Thus, if the calculated probability is higher than a random number drawn from a uniform distribution on the interval (0, 1), the hidden unit is set to 1. Therefore, $\langle v_i h_j \rangle_{data}$ can be easily calculated by obtaining the $h_j$ values (Tieleman and Hinton, 2009).

The negative phase is more difficult to calculate (Palm, 2012). Methods for calculating the negative phase have been proposed by several researchers (Hinton, 2010; Keyvanrad and Homayounpour, 2015; Le Roux and Bengio, 2008; Tieleman, 2008; Tieleman and Hinton, 2009); they differ in how the objective function gradient is approximated. Currently, the most popular method is CD-1 (Hinton, 2010), which includes one step of Gibbs sampling in which all hidden units are updated (in parallel) according to Eq. (10) before all visible units are updated (in parallel) according to Eq. (11). Expressed differently, the visible units $v_i$ are determined first by the input instance. Then, the hidden states $h_j$ are calculated from Eq. (10). The reconstructed visible and hidden states are determined by repeating this process using Eqs. (10) and (11) via one step of visible and hidden unit reconstruction. The weight updating rule can be expressed as:

$$\Delta W_{ij} = \alpha \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \right) \qquad (18)$$

where $\langle \cdot \rangle_{recon}$ is the expected value with respect to the one-step reconstruction. The updating rules of the visible and hidden layer biases are as follows:

$$\Delta b_i = \alpha \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{recon} \right) \qquad (19)$$

$$\Delta c_j = \alpha \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{recon} \right) \qquad (20)$$
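The CD-1 procedure described above can be illustrated with a minimal Python sketch for a single binary training vector. It is not the authors' MATLAB implementation: the function names are hypothetical, and using hidden-unit probabilities rather than sampled states in the update statistics is an assumption (a common practical choice), not something stated in the text.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid used in Eqs. (10) and (11)."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j, Eq. (10)."""
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit i, Eq. (11)."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Set a unit to 1 when its probability exceeds a uniform random number."""
    return [1 if p > rng.random() else 0 for p in probs]


def cd1_update(v0, W, b, c, alpha, rng):
    """One CD-1 step; W, b, and c are updated in place."""
    ph0 = hidden_probs(v0, W, c)               # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one-step reconstruction
    ph1 = hidden_probs(v1, W, c)               # negative phase (assumed: probabilities)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += alpha * (v0[i] * ph0[j] - v1[i] * ph1[j])  # Eq. (18)
        b[i] += alpha * (v0[i] - v1[i])                            # Eq. (19)
    for j in range(len(c)):
        c[j] += alpha * (ph0[j] - ph1[j])                          # Eq. (20)
    return v1
```

In practice the update would be averaged over mini-batches of training vectors; the single-vector form is kept here only to mirror the equations term by term.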

3.3.3 The new proposed deep learning model of DBPGA

It is vital to determine the topology of the neural network because it affects the network's learning capacity and ability to generalize (Ahmadizar et al., 2015). Despite the promising outcomes of many deep learning algorithms in a variety of applications, determining the appropriate number of layers and the number of nodes per layer for a particular task is difficult (Guo et al., 2015). There is thus a need to find the optimal architecture of a deep belief network. Finding the optimal topology of a DBN can be considered a search problem. The DBPGA model uses GA, which is a search algorithm that can evolutionarily find optimal or near-optimal solutions (Mansourypoor and Asadi, 2017; Mehmanpazir and Asadi, 2017). Borrowing from genetics, the steps and operators of the GA used to optimize the structure of the DBN are as follows:

Step 1. Parameter initialization

Determine the population size, number of generations, number of chromosome genes, and the minimum and maximum gene values.

Step 2. Chromosome encoding and population initialization

Chromosomes are depicted as positive integers representing the network architecture (direct encoding). Fig. 5 shows an example of the representation of a chromosome and its corresponding network for a problem that includes 10 input features and one output O. The numbers 54 and 21 represent the number of neurons in, respectively, the first and second hidden layers. The population size is N, and all N chromosomes are randomly initialized.

Step 3. Evaluation

The chromosome's fitness is obtained by calculating the percentage of classification accuracy on the training data:

$$Fitness = \frac{TRP + TRN}{TRP + TRN + FAP + FAN} \times 100 \qquad (21)$$

where TRP, TRN, FAP, and FAN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Step 4. Selection

Roulette wheel selection and random selection are used to select parents for, respectively, crossover and mutation.

Step 5. Crossover

After the parents are selected, a single-point crossover mechanism is used to generate a new population. The crossover point is chosen randomly. Fig. 6 shows a simple example of a single-point crossover.

Step 6. Mutation

A parent is mutated by decreasing or increasing the value of one randomly selected gene, producing a new chromosome. Fig. 7 shows a simple example of a mutation. In this example, the third gene, which is selected randomly, is mutated from 9 to 25.

Step 7. Survivor Selection

The current population and the chromosomes resulting from crossover and mutation are pooled, and the N chromosomes with the highest fitness values are selected as survivors to create the new population.

Step 8. Stop criterion



If the condition of a certain number of generations is met, the algorithm stops, and the best

chromosome is returned from the current population. Otherwise, the algorithm goes back to step 3

to create a new population. Fig. 8 shows the GA flowchart for finding the optimal topology of the

DBN.
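Steps 1 to 8 can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the function names and default parameter values are hypothetical, the `fitness` callable stands in for training a DBN and measuring Eq. (21), mutation redraws a gene within its bounds instead of applying a specific increment, and only one crossover pair and one mutant are produced per generation. Fitness values are assumed to be positive so that roulette wheel selection is well defined.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness (Step 4)."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]


def crossover(p1, p2, rng):
    """Single-point crossover at a random cut (Step 5); needs at least two genes."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(parent, lo, hi, rng):
    """Change one randomly selected gene to a value in [lo, hi] (Step 6)."""
    child = list(parent)
    child[rng.randrange(len(child))] = rng.randint(lo, hi)
    return child


def evolve(fitness, n_pop=10, n_gen=20, n_genes=2, lo=1, hi=100, seed=0):
    """Search for the list of hidden-layer sizes that maximizes `fitness`."""
    rng = random.Random(seed)
    # Steps 1-2: direct encoding, one integer (layer size) per gene.
    pop = [[rng.randint(lo, hi) for _ in range(n_genes)] for _ in range(n_pop)]
    for _ in range(n_gen):                       # Step 8: fixed number of generations
        fits = [fitness(ch) for ch in pop]       # Step 3
        a = roulette_select(pop, fits, rng)
        b = roulette_select(pop, fits, rng)
        offspring = list(crossover(a, b, rng))
        offspring.append(mutate(rng.choice(pop), lo, hi, rng))
        pool = pop + offspring                   # Step 7: keep the N fittest
        pool.sort(key=fitness, reverse=True)
        pop = pool[:n_pop]
    return max(pop, key=fitness)
```

Because survivors are drawn from the pooled parents and offspring, the best fitness in the population never decreases from one generation to the next.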

4 Background of the employed algorithms

4.1 Machine learning algorithms

4.1.1 Logistic Regression

Logistic Regression (LR) is a multivariate statistical method that we used to compute a coefficient for each flash flood conditioning factor (the independent variables) based on the dependent variable (binary: flood and non-flood classes) (Chapi et al., 2017). The higher the coefficient of a conditioning factor, the greater the probability that a flood will happen (Mousavi et al., 2011; Shirzadi et al., 2012). The coefficients are determined at a given confidence level (95% or 99%) and are then assigned to each significant conditioning factor during the modeling process to prepare the flash flood susceptibility map. The following equations are used for generating the map:

$$P = \frac{1}{1 + e^{-Z}} \qquad (22)$$

$$Z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \qquad (23)$$

where $P$ is the probability of flash flood occurrence calculated by the LR model, $Z$ is the linear function of the LR model, $b_0$ is the constant coefficient extracted by the training model, $n$ is the number of flash flood conditioning factors, $b_i$ is the weight of each flash flood conditioning factor, and $x_i$ is the specific flash flood conditioning factor. In this study, the LR model was used as one of the state-of-the-art benchmark models to assess the capability of the proposed new deep learning model for flash flood susceptibility mapping.
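As a small worked illustration of Eqs. (22) and (23), the following Python sketch evaluates the LR probability for one pixel. The function name and the example coefficients are hypothetical and are not values from this study.

```python
import math


def flood_probability(b0, weights, factors):
    """Return P from Eq. (22), with Z computed as in Eq. (23)."""
    z = b0 + sum(b * x for b, x in zip(weights, factors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two conditioning factors.
p = flood_probability(-1.0, [0.8, 0.5], [2.0, 1.0])
```

A positive coefficient raises $Z$, and therefore $P$, as the corresponding factor value increases, which is the sense in which larger coefficients indicate a greater flood probability.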

4.1.2 Logistic Model Tree

The Logistic Model Tree (LMT) is a decision tree algorithm that uses logistic regression together with the C4.5 decision tree (Quinlan, 1993). In LMT, the tree is first split using a feature selection function, the 'information gain ratio' technique. Then, a regression plan is used to replace leaf nodes (Witten and Tibshirani, 2011). The logistic regression function using the LogitBoost (LB) algorithm is assigned to a tree node, and then the weights are computed (Tien Bui et al., 2016b). The Classification and Regression Tree (CART) algorithm is used to decrease or prevent over-fitting in the pruning stage (Tien Bui et al., 2016b). The C4.5 decision tree can split flash flood conditioning factors into flood and non-flood classes based on their probability (Chen et al., 2018, 2019). The LB algorithm is run with least-squares fits for each class $C_i$ (flood or non-flood) as:

$$L_C(x) = \sum_{i=1}^{N} \beta_i x_i + \beta_0 \qquad (24)$$

where $N$ is the number of flash flood conditioning factors and $\beta_i$ is the coefficient of the $i$th component in the input vector $x$. Finally, the posterior probability $P(C \mid x)$ is obtained through linear regression in the leaf nodes as follows (Landwehr et al., 2005):

$$P(C \mid x) = \frac{\exp\left(L_C(x)\right)}{\sum_{C'} \exp\left(L_{C'}(x)\right)} \qquad (25)$$

where the sum in the denominator runs over the classes $C'$ (flood and non-flood), $L_C(x)$ is the linear regression function of Eq. (24), and $\exp$ is the natural exponential function.

4.1.3 Bayesian Logistic Regression

Bayesian Logistic Regression (BLR) is a hybrid model based on the logistic regression model and Bayes' theorem (Taheri et al., 2019). BLR uses a prior distribution function to analyze uncertainties in the model and obtains posterior distributions through the likelihood function (Ghosh et al., 2007; Tien Bui et al., 2018b). The relationship between the class label (flood and non-flood) and flash flood conditioning factors is determined in a Bayesian framework. There are three steps in BLR: (1) the prior probability is determined for the parameters; (2) the likelihood function is specified for data; and (3) the posterior distribution is computed for the parameters (Avali et al., 2014). If $x = (x_1, x_2, \ldots, x_n)$ are the flash flood conditioning factors of the training dataset, and flash flood and non-flood are the class labels $y \in \{1, 0\}$, the logistic function is used to generate the posterior probability of a sample belonging to a specific class label for categorical or binary factors as follows:

$$\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = b + w_0 c + \sum_{i=1}^{n} w_i f(x_i) \qquad (26)$$

where $c = \log\left(P(y = 1) / P(y = 0)\right)$ is the prior log odds ratio, $b$ is the bias, $w_0$ and $w_i$ are the weights that are learned during the modeling process of the training dataset, and $f(x_i)$ is a function that can be obtained as follows:

$$f(x_i) = \log \frac{P(x_i \mid y = 1)}{P(x_i \mid y = 0)} \qquad (27)$$

Additionally, for continuous data, a Gaussian prior distribution function is used to calculate the weights for each flash flood conditioning factor as follows:

$$P(\beta_i \mid \sigma_i) = N(0, \sigma_i^2) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{\beta_i^2}{2\sigma_i^2}\right) \qquad (28)$$

where $\beta_i$ and $\sigma_i$ are, respectively, the coefficients of conditioning factors and the standard deviation of the Gaussian distribution.

4.1.4 Alternating Decision Tree

The Alternating Decision Tree (ADT; Freund and Mason, 1999) is a well-known and robust decision tree algorithm that uses a boosting algorithm for classification (Tien Bui et al., 2018b). One of the advantages of this algorithm in comparison to other machine learning methods such as C4.5, random forest, and classification and regression trees (Breiman et al., 1984) is that it builds a decision tree structure based on simple rules. ADT has two structural components: Decision Nodes (DN) and Prediction Nodes (PN). A DN specifies a condition, whereas a PN contains a single number (Khosravi et al., 2018). For numeric predictions, the tree is grown by a boosting algorithm, and the resulting prediction scores are assigned to the Prediction Nodes (Hong et al., 2015). All contribution weights are summed to achieve the final prediction probability.

Take $R_1$ and $R_2$ to be, respectively, the precondition and the base condition of a base rule that maps instances to real numbers, and $\alpha$ and $\beta$ to be two real numbers. If $\alpha$ is the prediction for $R_1 \cap R_2$ and $\beta$ is the prediction for $R_1 \cap \bar{R}_2$ ($\bar{R}$ is the negation of $R$), the values of $\alpha$ and $\beta$ can be calculated according to the following equation (Freund and Mason, 1999):

$$\alpha = \frac{1}{2} \ln \frac{W_{+}(R_1 \cap R_2)}{W_{-}(R_1 \cap R_2)}; \qquad \beta = \frac{1}{2} \ln \frac{W_{+}(R_1 \cap \bar{R}_2)}{W_{-}(R_1 \cap \bar{R}_2)} \qquad (29)$$

where $W(\cdot)$ is the sum of the weights of the training instances that satisfy the given condition ($W_{+}$ and $W_{-}$ for positive and negative instances, respectively), and the best $R_1$ and $R_2$ are computed by minimizing $Z_t(R_1, R_2)$, which is formulated as follows:

$$Z_t(R_1, R_2) = 2\left(\sqrt{W_{+}(R_1 \cap R_2)\, W_{-}(R_1 \cap R_2)} + \sqrt{W_{+}(R_1 \cap \bar{R}_2)\, W_{-}(R_1 \cap \bar{R}_2)}\right) + W(\bar{R}_1) \qquad (30)$$

4.1.5 Naïve Bayes Tree

The Naïve Bayes Tree (NBT) combines Naïve Bayes (NB) and Decision Tree (DT) algorithms based on the Bayes theorem (Kohavi, 1996). It enhances and improves the classification power of individual NB and DT models (Kohavi, 1996). Among the advantages of NBT are that it requires little computer memory, learns quickly, is efficient and straightforward, performs well, and yields easily interpreted results. It is, therefore, one of the most used algorithms among environmental researchers (Wang et al., 2015). The pre-pruning technique employed with this method uses one of the following steps: (1) the data splitting process is done at the node; or (2) a leaf is generated on the data with a local NB model at that specific node (Landwehr et al., 2005). The NBT model uses the entropy approach to grow trees (Khosravi et al., 2018). Take $Y$ to be a training dataset and $|Y|$ to be the total number of flash flood conditioning factors. Flash flood conditioning factors can be divided into $l$ classes as $S_i$ ($i = 1, 2, \ldots, l$). While establishing decision trees, gain ratio (GR) values are computed to control tree growth as follows (Quinlan, 1986):

| |

| | | |
(31)

| | | |

where $|Y_i|$ is the number of flood conditioning factors belonging to class $S_i$. The assumption of independence between the conditioning factors is included in the NBT as class conditional independence (Shirzadi et al., 2017). The NB classifier can be computed using the following equation (Pham et al., 2017):

$$t_{NB} = \underset{t_i \in \{1, 0\}}{\arg\max}\; PP(t_i) \prod_{i=1}^{n} PP(r_i \mid t_i), \qquad PP(r_i \mid t_i) = \frac{1}{\sqrt{2\pi}\,\varepsilon} \exp\left(-\frac{(r_i - \sigma)^2}{2\varepsilon^2}\right) \qquad (32)$$

where $PP(t_i)$ refers to the prior probability of the output variables $t_i = (1, 0)$, $r_i$ is the $i$-th attribute in the training dataset, and $\sigma$ and $\varepsilon$ are the mean value and standard deviation of $r_i$.

4.1.6 Reduced Error Pruning Tree

The Reduced Error Pruning Tree (REPTree) algorithm combines a Decision Tree (DT) with Reduced Error Pruning (REP); it generates a decision or regression tree based on information gain or variance reduction (Quinlan, 1987). When the output of a decision tree is large, the DT algorithm simplifies the modeling process by using the training dataset; the complexity of the structure of the tree is also reduced by using REP (Mohamed et al., 2012). REP is the most popular pruning method for eliminating the leaves and branches of the tree with low classification power (Galathiya et al., 2012). The REPTree algorithm locates the sub-tree with the highest classification power (Pham et al., 2019b). The most crucial advantage of REPTree is that it reduces the complexity of the tree structure and also prevents the over-fitting problem during the modeling process without sacrificing accuracy (Quinlan, 1987). The performance of the REPTree is achieved by using variance reduction and reduced error pruning techniques or the highest information gain from entropy (Srinivasan and Mekala, 2014). The gain ratio in this algorithm can be formulated as follows (Tien Bui et al., 2012):

$$\text{Gain ratio}(x, Y) = \frac{E(Y) - \sum_{i=1}^{n} \frac{|Y_i|}{|Y|} E(Y_i)}{-\sum_{i=1}^{n} \frac{|Y_i|}{|Y|} \log_2 \frac{|Y_i|}{|Y|}} \qquad (33)$$

where $E(Y)$ is the entropy of a training dataset $Y$ and the attribute $x$ belongs to a training dataset with subsets $Y_i$ ($i = 1, \ldots, n$).
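Eq. (33) can be checked on a toy example with the short Python sketch below. The function names and the sample data are illustrative assumptions; the subsets $Y_i$ are taken to be the groups of samples that share one value of the attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy E(Y) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(attribute_values, labels):
    """Gain ratio of one attribute with respect to the class labels, Eq. (33)."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    info_gain = entropy(labels) - sum(
        len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(
        len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    # An attribute with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


# A perfectly informative attribute: each value maps to one class.
perfect = gain_ratio(["low", "low", "high", "high"], [1, 1, 0, 0])
# An uninformative attribute: each value contains both classes equally.
useless = gain_ratio(["a", "b", "a", "b"], [1, 1, 0, 0])
```

The denominator penalizes attributes that split the data into many small subsets, which is why the gain ratio is preferred over raw information gain for attributes with many classes.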

4.2 Evolutionary/optimization algorithms

4.2.1 Bat Algorithm


The Bat Algorithm (BA) is an intelligent optimization algorithm proposed by Yang (2010b) to simulate the echolocation behavior of bats. It provides better results for optimization problems than many popular traditional and heuristic algorithms (Srivastava and Sahana, 2017; Srivastava et al., 2015).

Bats detect prey or avoid obstacles by emitting sound that strikes the object and is reflected to the animals' ears. To simulate this behavior, suppose that the initial population of bats is $n$. At time $t - 1$, the location and the flight velocity of the $i$th bat are, respectively, $x_i^{t-1}$ and $v_i^{t-1}$, and the current global optimal location is $x^{*}$. At time $t$, the velocity and position of the $i$th bat are updated using the following equations:

$$f_i = f_{min} + (f_{max} - f_{min})\,\beta \qquad (34)$$

$$v_i^{t} = v_i^{t-1} + (x_i^{t-1} - x^{*})\, f_i \qquad (35)$$

$$x_i^{t} = x_i^{t-1} + v_i^{t} \qquad (36)$$

where $f_{min}$ and $f_{max}$ are, respectively, the minimum and maximum values of bat frequencies, and $\beta$ is a normalized random value ($\beta \in [0, 1]$).

The following equation is used to produce a new solution:

$$x_{new} = x_{old} + \varepsilon \bar{A}^{t} \qquad (37)$$

where $\varepsilon$ is a random number ($\varepsilon \in [-1, 1]$), $\bar{A}^{t}$ is the average amplitude of all bats at time $t$, and $x_{old}$ is a solution randomly selected from the current optimal solutions. When a bat finds prey, it changes amplitude and sound pulse emission rate by:

$$A_i^{t+1} = \alpha A_i^{t} \qquad (38)$$

$$r_i^{t+1} = r_i^{0} \left[ 1 - \exp(-\gamma t) \right] \qquad (39)$$

where $\alpha$ and $\gamma$ are constants, and $A_i^{t}$ and $r_i^{t}$ are the amplitude and pulse emission rate of bat $i$ at time $t$, respectively.

4.2.2 Cultural algorithm (CA)

The Cultural Algorithm (CA) is an evolutionary algorithm introduced by Reynolds (1994). It is a

dual-inheritance algorithm consisting of two search spaces – the belief space, which models the

cultural information about the population; and a population space, which represents individuals at

a genotypic and/or phenotypic level. These two spaces are connected via a communication

protocol that defines (i) the rules for selecting groups of individuals to adapt the belief set and (ii)

the way that the beliefs influence all individuals in the population space. CA is used for solving

optimization problems that require a large amount of domain knowledge with extensive data,

numerous domain limitations, many objectives, and multiple agents in a vast distributed social

network. The population space can include any population-based computational model, for

example genetic algorithms or evolutionary programming (Reynolds, 1994). The belief space

supports the information reservoir of all experiences among individuals.

4.2.3. Invasive Weed Optimization

The Invasive Weed Optimization (IWO) algorithm, proposed by Mehrabian and Lucas (2006), is a novel population-based, evolutionary optimization algorithm that tries to simulate the resistance, adaptability, and randomness of a weed community. IWO searches for the global optimal solution of the problem in the solution space. To simulate the behavior of weeds, the algorithm operates through four steps: initialization, reproduction, spatial dispersal, and competitive exclusion.

Initialization

In IWO, weeds represent the feasible solutions of problems. The initial population of $N_{wo}$ weed individuals is randomly generated in the solution space, in which each weed consists of variables that represent a feasible solution.

Reproduction

Each weed in the population then produces seeds, and the number of seeds depends on its fitness: weeds with higher fitness produce more seeds. The number of seeds produced by a weed is:

$$weed_n = \frac{f - f_{min}}{f_{max} - f_{min}} \left( s_{max} - s_{min} \right) + s_{min} \qquad (40)$$

where $f$ is the fitness value of the current weed, $f_{max}$ and $f_{min}$ are, respectively, the maximum and the minimum fitness values of the current population, and $s_{max}$ and $s_{min}$ represent, respectively, the maximum and the minimum number of seeds.

Spatial dispersal

Seeds are dispersed around the parent weed following a normal distribution whose mean is the planting position of the parent and whose standard deviation is given by the following equation:

$$\sigma_{cur} = \frac{(iter_{max} - iter)^{n}}{(iter_{max})^{n}} \left( \sigma_{init} - \sigma_{final} \right) + \sigma_{final} \qquad (41)$$

where $iter_{max}$ is the maximum number of iterations, $iter$ is the current iteration, $\sigma_{cur}$ is the current standard deviation, $\sigma_{init}$ and $\sigma_{final}$ are the initial and final standard deviations, and $n$ is the nonlinear modulation index.

Competitive exclusion

If the number of weeds exceeds the maximum number of weeds allowed in the population, the weeds with the worst fitness are removed so that a constant number of plants remains. This process continues until the maximum number of iterations is reached, and then the weed with the minimum cost function value is stored.
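The reproduction and spatial dispersal rules of Eqs. (40) and (41) can be sketched in Python as follows. The function names, the choice to round the seed count down, and the default modulation index of 3 are assumptions made for illustration only.

```python
import math


def seed_count(f, f_min, f_max, s_min, s_max):
    """Number of seeds for a weed with fitness f, Eq. (40), rounded down."""
    if f_max == f_min:
        # All weeds are equally fit; assumed here to receive the maximum.
        return s_max
    share = (f - f_min) / (f_max - f_min)
    return int(math.floor(share * (s_max - s_min) + s_min))


def dispersal_sigma(iteration, iter_max, sigma_init, sigma_final, n=3):
    """Standard deviation of seed dispersal at a given iteration, Eq. (41)."""
    decay = ((iter_max - iteration) ** n) / (iter_max ** n)
    return decay * (sigma_init - sigma_final) + sigma_final
```

The standard deviation shrinks from its initial to its final value as iterations proceed, so seeds land close to their parents late in the run and the search shifts from exploration to local refinement.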

4.2.4. Imperialistic Competitive Algorithm

The Imperialistic Competitive Algorithm (ICA) is a population-based metaheuristic algorithm used to solve many types of optimization problems. The goal is to find an optimal solution in an array of variable values called a 'country'. The cost of a country is calculated by evaluating the cost function for that country; the lower the cost, the better the solution. The best solutions, those with the lowest cost, are chosen as 'imperialists'. The rest of the countries are 'colonies'.

There are $N_{col}$ colonies and $N_{im}$ imperialists. Initial empires are formed by assigning the colonies to imperialists according to the power of each imperialist:

$$P_k = \frac{Y_k}{\sum_{v=1}^{N_{im}} Y_v} \qquad (42)$$

where $P_k$ is the power of imperialist $k$ and $Y_k = \max_{v}\{c_v\} - c_k$ is the normalized cost, where $c_k$ is the cost of imperialist $k$.

The number of initial colonies possessed by imperialist $k$ is calculated as round$\{P_k \cdot N_{col}\}$, where 'round' is a function that gives the nearest integer to a fractional number. In the assimilation process, a colony in each empire moves in the direction toward its imperialist. The moving distance is a random number in the interval $[0, \beta \times d]$, where $\beta > 1$ and $d$ is the distance between the colony and its imperialist. 'Revolution' involves a change in the position of some colonies. After assimilation and revolution are completed within an empire, the cost of each colony is compared with that of its imperialist, and a colony is swapped with the imperialist if the colony has a lower cost than the imperialist.

Imperialist competition is an important step based on the total power of an empire. If $TC_k$ is the total cost of empire $k$, we first calculate $TC_k$ for each empire as:

$$TC_k = \mathrm{cost}(\mathrm{imperialist}_k) + \xi \cdot \mathrm{mean}\{\mathrm{cost}(\mathrm{colonies\ of\ empire}_k)\} \qquad (43)$$

where $\xi$ is a positive number between 0 and 1, but close to 0. We then compute the normalized total cost of empire $k$, $NTC_k$, and the power of empire $k$, $EP_k$, by:

$$NTC_k = \max_{v}\{TC_v\} - TC_k \qquad (44)$$

$$EP_k = \frac{NTC_k}{\sum_{v=1}^{N_{im}} NTC_v} \qquad (45)$$

After a vector $D = [EP_1 - r_1, EP_2 - r_2, \ldots, EP_{N_{im}} - r_{N_{im}}]$ is defined, the weakest colony from the weakest empire is assigned to the empire with the largest entry in $D$, where each $r_k$ is a random number chosen from a uniform distribution in [0, 1].

4.2.5 Firefly Algorithm

The Firefly Algorithm (FA) is a population-based metaheuristic algorithm for solving optimization problems developed by Yang (2010a). The following assumptions were made in formulating this algorithm:

(i) Fireflies are unisexual, so one firefly will be attracted to other fireflies regardless of their sex.

(ii) Attractiveness is proportional to firefly brightness. Therefore, fireflies with higher brightness are more attractive to others. However, attractiveness decreases as the distance between two fireflies increases.

(iii) If there is no brighter one, a bright firefly will move randomly.

According to Yang (2009), the attractiveness of a firefly is determined by its light intensity and can be defined as follows:

$$\beta(r) = \beta_0 e^{-\gamma r^2} \qquad (46)$$

where $\beta_0$ is the attractiveness parameter at $r = 0$, and $r$ is the distance between two fireflies. The parameter $\gamma$ is the absorption coefficient, which is usually 1. The distance between the two fireflies $i$ and $j$ is defined as follows:

$$r_{ij} = \sqrt{\sum_{d=1}^{D} \left( x_{id} - x_{jd} \right)^2} \qquad (47)$$

where $D$ is the problem dimension.

Movements of fireflies are based on their attractiveness. The movement of a less attractive firefly $i$, which is attracted to a brighter firefly $j$, is determined by:

$$x_{id}(t + 1) = x_{id}(t) + \beta_0 e^{-\gamma r_{ij}^2} \left( x_{jd}(t) - x_{id}(t) \right) + \alpha \varepsilon_i \qquad (48)$$

where $x_{id}$ and $x_{jd}$ are the $d$-th dimension values of firefly $i$ and firefly $j$, respectively. In addition, $\varepsilon_i = rand - 1/2$, where $rand$ is a random variable that is uniformly distributed in the range [0, 1], $\alpha \in [0, 1]$ is the step parameter, and $t$ indicates the iteration number. Thus, firefly $j$ being brighter than firefly $i$ means that firefly $j$ is better than firefly $i$ in terms of its fitness value.
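The firefly distance, attractiveness, and movement rules of Eqs. (46) to (48) can be sketched in Python as follows. The function names and default parameter values are hypothetical; the random term is passed in explicitly so that the update itself stays deterministic and easy to check.

```python
import math


def distance(xi, xj):
    """Euclidean distance between two fireflies, Eq. (47)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness at distance r, Eq. (46)."""
    return beta0 * math.exp(-gamma * r ** 2)


def move_towards(xi, xj, alpha, eps, beta0=1.0, gamma=1.0):
    """Move firefly i toward a brighter firefly j, Eq. (48).

    `eps` holds one random offset per dimension, e.g. random.random() - 0.5.
    """
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * e for a, b, e in zip(xi, xj, eps)]
```

With $\gamma = 0$ the attraction does not decay with distance and firefly $i$ jumps directly to firefly $j$ (plus noise); larger $\gamma$ values make distant fireflies nearly invisible to each other.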

5 Validation and comparison of the models



5.1 Statistical measures

We determined the goodness-of-fit and performance of all the models for our flash flood mapping using a variety of statistical metrics, including sensitivity, specificity, accuracy, MSE, and RMSE. Sensitivity is the proportion of flood locations that are correctly classified as flood, whereas specificity is the proportion of non-flood locations that are correctly classified as non-flood (Chapi et al., 2017; Shafizadeh-Moghadam et al., 2018; Khosravi et al., 2019). Accuracy refers to the proportion of flood and non-flood locations correctly classified as, respectively, flood and non-flood. The lower the MSE and RMSE metrics, the higher the performance of the model (Bui et al., 2018a). All statistical metrics were computed based on true positive (TRP), true negative (TRN), false positive (FAP), and false negative (FAN) scores. The metrics can be expressed as:

$$Sensitivity = \frac{TRP}{TRP + FAN} \qquad (49)$$

$$Specificity = \frac{TRN}{TRN + FAP} \qquad (50)$$

$$Accuracy = \frac{TRP + TRN}{TRP + TRN + FAP + FAN} \qquad (51)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( pv_i - tv_i \right)^2} \qquad (52)$$

where $pv$ is the value predicted by the flood susceptibility model for the training or testing dataset, $tv$ is the target (actual) value, and $n$ is the total number of samples; MSE is the square of RMSE.
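As a worked illustration of Eqs. (49) to (52), the following Python sketch computes the metrics from confusion-matrix counts and from predicted and target values. The function names and the example counts are hypothetical and are not results from this study.

```python
import math


def classification_metrics(trp, trn, fap, fan):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": trp / (trp + fan),                   # Eq. (49)
        "specificity": trn / (trn + fap),                   # Eq. (50)
        "accuracy": (trp + trn) / (trp + trn + fap + fan),  # Eq. (51)
    }


def rmse(predicted, target):
    """Root mean square error between predicted and target values, Eq. (52)."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


# Hypothetical counts: 40 flood and 45 non-flood locations classified correctly.
metrics = classification_metrics(trp=40, trn=45, fap=5, fan=10)
```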
5.2 ROC curve and AUC analysis
lP

The ROC curve is a graphical tool used to assess the performance of a model (Fawcett, 2006; Gorsevski et al., 2006). It is plotted with sensitivity (TP rate) on the y-axis and 1 − specificity (FP rate) on the x-axis (Hanley, 1989). A specific decision criterion can be extracted for each point on the ROC curve to predict the accuracy of the model (Shirzadi et al., 2018). Quantitatively, the area under the ROC curve (AUC) is used to assess model performance; the higher the value of AUC (for an accurate model, AUC is close to 1), the higher the performance of the model (Shirzadi et al., 2019).
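The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen non-flood location (ties counted as one half). The Python sketch below uses this rank-based formulation; the function name and the sample scores are illustrative assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for flood and 0 for non-flood locations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Three of the four flood/non-flood pairs are ranked correctly.
example = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; sorting-based implementations are preferable for large rasters.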

5.3 Statistical tests (Friedman test and Wilcoxon signed-rank test)

We used the Friedman and Wilcoxon signed-rank tests in this study to validate and compare the performance of the flood models. The Friedman test, which is a non-parametric test introduced by Friedman (19xx), is one of the most reliable tests for documenting differences among models (Shirzadi et al., 2019). In our study, we assume the null hypothesis that there is no difference between the two flood models and then calculate the p-value and chi-square (χ2) value. If the p-value is smaller than α = 0.05 (standard value) and the χ2 is higher than 3.841 (standard value), the null hypothesis is rejected (Chen et al., 2019a) and therefore there is a significant difference between the two models. However, the Friedman test cannot provide pairwise comparisons of the flood models; thus the Wilcoxon signed-rank test is used (Khosravi et al., 2018). The Wilcoxon signed-rank test is based on the same null hypothesis as the Friedman test; however, two values (p and z) are calculated for evaluation. If the p-value is smaller than α = 0.05 and the z value falls outside the critical range of −1.96 to +1.96, the null hypothesis is rejected (Miraki et al., 2019) and there is a significant difference in a pairwise comparison of the models.

5.4 Factor selection using One-R Attribute Evaluation method
The effectiveness of a flood susceptibility assessment depends significantly on the quality of the data used, especially the factors that affect flood occurrences in the selected area (Nguyen et al., 2019b; Nohani et al., 2019). There may be some factors that are initially selected that are not important for modeling flood susceptibility. Therefore, it is essential to evaluate the importance of each factor so that the most suitable factors can be chosen to best model flood susceptibility (Pham et al., 2019a). We selected the One-R Attribute Evaluation method (ORAE) (Nguyen et al., 2019a) for this study to evaluate the importance of each conditioning factor for flood susceptibility modeling. ORAE helps to increase the quality of data used and improve the performance of models by preventing redundancy, decreasing noise and the dimensionality of input space, and dealing with over-fitting problems (Micheletti et al., 2014). This method ranks the importance of conditioning factors by determining the statistical correlation between a set of input variables and output variables (Kavitha et al., 2012). In this method, one rule (One-R) is separately built for each input variable in the training dataset, and thereafter the rule with the smallest error metric is selected for independently sorting all variables according to their importance to solve flood prediction problems (Nguyen et al., 2019a).
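The One-R idea can be sketched in Python as follows: for each factor, a single rule assigns every factor class to its majority label, and factors are ranked by the error of that rule. This is a simplified sketch under stated assumptions, not the evaluator used in the study; the function names and the toy data are hypothetical, and factors are assumed to be already discretized into classes.

```python
from collections import Counter, defaultdict


def one_r_error(values, labels):
    """Error rate of the one-rule classifier built on a single factor."""
    by_value = defaultdict(Counter)
    for value, label in zip(values, labels):
        by_value[value][label] += 1
    # Each factor class predicts its majority label; the rest are errors.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return errors / len(labels)


def rank_factors(table, labels):
    """Factor names sorted from most to least important (smallest error first)."""
    return sorted(table, key=lambda name: one_r_error(table[name], labels))


# Toy data: 1 = flood, 0 = non-flood.
toy_labels = [1, 1, 0, 0]
toy_table = {
    "land_use": ["a", "b", "a", "b"],
    "slope": ["low", "low", "high", "high"],
}
ranking = rank_factors(toy_table, toy_labels)
```

A merit score comparable to the average merit reported by attribute evaluators can be obtained as 100 × (1 − error), averaged over cross-validation folds.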

6. Results and analysis

6.1. Correlation between flood conditioning factors and flood locations based on SWARA weights

In this study, we used SWARA weights (SW) to determine which class of each conditioning factor is most closely related to flood occurrence (Fig. 9). It is evident that the lower the slope angle value, the lower the probability of flood incidence. The first class of slope angle (0°–0.5°) has the highest SWARA weight (0.4) compared to other classes. The trend of the weights in the elevation factor is similar to the slope angle trend: the lower the elevation, the higher the SW value, and accordingly the lower the probability of flood occurrence. The highest and the lowest weights were obtained for, respectively, the first (328–350 m) and the last (>4000 m) elevation classes. Concave (SW=0.46) and flat (SW=0.43) slopes have the highest class weights and thus are less important for flood occurrence in the study area than convex slopes. The SPI class of 2000–3000 has the highest SW (0.32) within the SPI group and thus is less susceptible to flood occurrence than other SPI classes. Similarly, as the SW of TWI increases, the probability of flood occurrence decreases; the last class (6.96–11.5) has the highest weight (SW=0.08) within this group. In the case of river density, the highest SW (0.37) was obtained for the first and second classes of river density. Distance to river shows a significant relation to flood locations that is similar to slope angle and elevation. The shorter the distance to the river network, the higher the SW and thus the higher the susceptibility to flooding. The first class (0–50 m) has an SW of 0.59. The class of lithology most susceptible to flooding is the Triassic formation (SW=0.31). Areas already covered by water (water bodies) are most susceptible to flooding (SW=0.75); weights for other land-use classes are 0.15 (residential area), 0.06 (gardens), 0.02 (forest land), 0.01 (grassland), and 0.00 (farmland and barren land). Finally, the lower the rainfall, the higher the SW (0.40 for 183–333 mm), and therefore the lower the probability of flood incidence.

6.2. The most important factors for flood modeling

We assessed the predictive power of each conditioning factor for flood occurrence using the ORAE technique in 10-fold cross-validation on the training dataset. The Average Merit (AM) of ORAE was computed to determine the importance of the factors (Fig. 10). Slope angle, with the highest value of average merit (AM=88.848), was the most important factor and also had the highest predictive power for flood modeling. It is followed by distance to river (AM=87.050), drainage density (AM=85.251), TWI (AM=80.935), elevation (AM=78.057), curvature (AM=75.899), SPI (AM=74.460), lithology (AM=56.115), rainfall (AM=55.036), and land use (AM=51.798). The AM values show that all conditioning factors play a decisive role in flood incidence.

6.3 Application of the novel deep learning model

We designed and developed our novel deep learning model (DBPGA) in MATLAB R2018 and ArcGIS 10.3. The model was trained (it learned) with the training dataset, similar to other smart learning models. We selected 80% of all data for training (the modeling process) and the rest (20%) for validation (Fig. 11). Fig. 11a and d show how flood (target) and non-flood (output) values compare. The smaller the distance between the target and output, the better the model was trained. The goodness of fit and the performance of the proposed model were checked with MSE and RMSE metrics. Fig. 11b and e show these values for, respectively, the training and testing datasets. The values of MSE and RMSE in the modeling process using the training dataset are, respectively, 0.053 and 0.232 (Fig. 11a, d), and for the testing dataset 0.050 and 0.224 (Fig. 11b, e). Also, the mean and standard deviation are reported for the training (0.00, 0.232) and testing (–0.02, 0.225) datasets (Fig. 11c, f).
6.4 Development of the flood susceptibility map

Our proposed deep learning model (DBPGA) learned and performed well on the

training and testing datasets and outperformed the benchmark models. In the next step, the study area

was converted to CSV format, and a Flood Susceptibility Index (FSI) was calculated for each pixel of the

study area. We then prepared an FSM using the natural breaks classification method,

based on the FSIs of all pixels (Chapi et al., 2017; Chen et al., 2018; Bui et al., 2019; Shirzadi et

al., 2019). We tested three well-known classification methods in the GIS: the natural breaks,

quantile, and geometrical interval methods. We found that the natural breaks method performed

best. The geometrical interval method underestimated FSIs, and the quantile method placed most

areas far from the river network, as well as slopes, in the high and very high susceptibility

classes.
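Natural breaks classification chooses class limits that minimize the variance within classes. The sketch below illustrates the principle with an exact dynamic program on a small sample. It is an assumption-laden illustration, not the GIS implementation used in this study: the function name `natural_breaks` and the FSI values are hypothetical, and the cubic cost in the number of values makes it suitable only for small samples.

```python
def natural_breaks(values, k):
    """Upper limits of the first k-1 classes that minimise the total
    within-class sum of squared deviations (requires len(values) >= k)."""
    x = sorted(float(v) for v in values)
    n = len(x)
    # Prefix sums give the squared deviation of any slice x[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], start[c][j] = cand, i
    # Walk back from the last class to recover the class limits.
    breaks, j = [], n
    for c in range(k, 1, -1):
        j = start[c][j]
        breaks.append(x[j - 1])
    return breaks[::-1]

# Hypothetical FSI sample with three visible clusters.
fsi = [0.05, 0.06, 0.07, 0.30, 0.31, 0.90, 0.92, 0.95]
breaks = natural_breaks(fsi, 3)
```

For a full raster, one would typically estimate the limits on a random sample of pixels and then apply them to every pixel.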

The FSM prepared using the method of natural breaks includes five classes of flood

susceptibility: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility

(MS), high susceptibility (HS), and very high susceptibility (VHS) (Fig. 12). High-risk flood

inundation areas are readily discernible on this map. We enlarged two places on this map to

confirm graphically the performance of the model.

Fig. 13 shows the susceptibility classes generated by the deep learning model using the three

classification methods. Using the natural breaks classification method, we find that the VLS class

covers the largest area (61.192%), followed by the VHS (24.499%), LS (6.981%), HS (3.825%),

and MS (3.503%) classes. The VHS class included the largest percentage of flood locations

(98.010%), followed by the HS class (1.990%). The very high percentage of flood locations in

the VHS class confirms the capability of the proposed model. The quantile method is the next best

classification method.
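Class shares such as those above follow from assigning each FSI value to a class and counting. A minimal sketch, assuming that the class limits are inclusive upper bounds; the limits, the pixel values, and the function name `class_shares` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

CLASSES = ["VLS", "LS", "MS", "HS", "VHS"]

def class_shares(fsi_values, breaks):
    """Percentage of values per class; `breaks` holds the four upper class limits."""
    counts = Counter(CLASSES[bisect_left(breaks, v)] for v in fsi_values)
    n = len(fsi_values)
    return {name: 100.0 * counts[name] / n for name in CLASSES}

# Hypothetical class limits and FSI values.
breaks = [0.2, 0.4, 0.6, 0.8]
pixels = [0.05, 0.10, 0.15, 0.30, 0.50, 0.70, 0.90, 0.95, 0.10, 0.85]
shares = class_shares(pixels, breaks)
```

Applying the same function to the FSI values at flood locations only would give the share of floods per class.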

6.5 Performance evaluation of the proposed model

We further evaluated the classification performance of our proposed deep learning model by

comparing it with machine learning and soft computing benchmark models and with

optimization algorithms that have previously been used for flood modeling in the Haraz

watershed.

6.5.1 Parameter tuning



Proper selection of the parameters used in the modeling process is critical to obtaining an

appropriate solution (Ronoud and Asadi, 2019). Table 3 shows the optimal parameters used in

the evolutionary models. These parameters were set for our model by trial and error using the

GA.

6.5.2 Classification performance



We ran our deep learning model 20 times to document its classification performance. The average

classification accuracy, sensitivity, and specificity of our model and other classification models,

which are presented in Table 4, are based on the confusion matrix shown in Fig. 14. The best

topology obtained by the GA is 10 (number of inputs)-25 (number of neurons in the

hidden layer)-1 (output) (Table 3). The optimization algorithms are compared with the proposed

model using MSE, RMSE, SD, and Mean statistical metrics (Table 5). The results in Tables 4 and

5 allow us to highlight key observations and results.

(i) The DBPGA model has the best sensitivity (100%), indicating that the model correctly

classified all 155 flood locations as flood. It performed better than all other machine learning

algorithms.
(ii) The DBPGA model did not predict non-flood locations as well as other machine learning

models. Its specificity of 87.500% indicates that the model correctly classified 87.5% of non-

flood locations as non-flood. Although this value is high, it is lower than the specificity of all

other machine learning models except the ADT model.



(iii) The value of the accuracy metric of the DBPGA is the highest (93.589%) of the models with

which it is compared. This indicates that our model correctly predicted 93.589% of flood and

non-flood locations.

(iv) The MSE and RMSE of our model are lower than those of the other optimization algorithms

and are acceptable.

In summary, the DBPGA model outperformed the other models in terms of sensitivity and

accuracy because it has a robust topology.
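Sensitivity, specificity, and accuracy all derive from the four cells of a confusion matrix. The following minimal sketch shows the arithmetic. The counts are hypothetical and are not the values in Fig. 14; they were chosen only because they yield a sensitivity of 100% and a specificity of 87.5%.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (in percent) from confusion-matrix counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts (assumption): 38 flood and 40 non-flood test locations.
m = classification_metrics(tp=38, fn=0, tn=35, fp=5)
```

Accuracy weights both classes by their counts, so a model can lead on accuracy while trailing other models on specificity, as observed in (ii) and (iii).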



6.5.3 ROC curves and AUC values

Figure 15 shows the ROC curves for the training (goodness-of-fit or performance) and testing

(prediction accuracy) datasets. This figure shows that our deep learning model has high

performance (AUC=0.988) and prediction accuracy (AUC=0.985). We conducted a literature

review for the study area and discovered that some research has previously been done on flood

susceptibility mapping using machine learning and optimization algorithms. The results of these

studies are shown in Table 6 and Fig. 16. The AUC values of the training dataset for the LR, LMT, BLR,

ADT, NBT, REPTree, ANFIS-BAT, ANFIS-CA, ANFIS-IWO, ANFIS-ICA, and ANFIS-FA

models are, respectively, 0.886, 0.967, 0.966, *, *, *, 0.946, 0.942, 0.948, 0.951, and 0.932, where

the symbol "*" indicates that values for ADT, NBT, and REPTree were not reported in previously

published work. The AUC values for the testing dataset are 0.985 for DBPGA and, for the same

models in the same order, 0.885, 0.934, 0.936, 0.976, 0.974, 0.811, 0.944, 0.921, 0.939, 0.947, and

0.917. These data indicate that the DBPGA deep learning model has a higher performance and

prediction accuracy than all other models that have been used for flood modeling in the study area.
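The AUC can be read as the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen non-flood location. A minimal sketch of this pairwise definition follows; the function name `auc` and the scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve as the share of (flood, non-flood) pairs
    in which the flood location scores higher (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = flood, 0 = non-flood) and susceptibility scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
value = auc(labels, scores)
```

Here eight of the nine pairs are ordered correctly, so the AUC is 8/9. The pairwise form is quadratic in the number of samples; a rank-based formula gives the same value more cheaply for large datasets.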

We also assessed differences between the new model and benchmark models through statistical

inference. The Friedman test indicates that, at a 95% confidence level, there are differences in the

performance of the new model and all benchmark models (Table 7). We checked the pairwise

differences between the models using the Wilcoxon signed-rank test (Table 8). At the 95%

confidence level, the new model and each compared model performed differently, leading us to

reject the null hypothesis.
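The Friedman test ranks the models within each run and asks whether the rank sums differ more than chance would allow. The sketch below computes the statistic for three hypothetical models over five hypothetical runs, assuming no ties within a run. With three models the chi-square approximation has two degrees of freedom, for which the survival function is exp(-x/2). The pairwise Wilcoxon signed-rank test is not reproduced here.

```python
import math

def friedman_statistic(scores):
    """Friedman chi-square for `scores[run][model]` (assumes no ties within a run)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for run in scores:
        order = sorted(range(k), key=lambda j: run[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical accuracies of three models over five runs.
scores = [
    [0.93, 0.90, 0.85],
    [0.94, 0.89, 0.86],
    [0.92, 0.91, 0.84],
    [0.95, 0.90, 0.87],
    [0.93, 0.88, 0.85],
]
stat = friedman_statistic(scores)
p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 degrees of freedom
```

The chi-square approximation is rough for so few runs, so exact tables are preferable for small samples.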



7. Discussion

Risks from flash floods may be increasing due to climate and land-use changes and population

increases; thus, there is a need for better flood susceptibility mapping. Many models, including

regression and rainfall-runoff models, are limited by the lack of hydrological monitoring data.

Newer "on-off" classification models, such as machine learning artificial intelligence algorithms,

show more promise with more modest data needs. Artificial intelligence models can more

accurately predict and map flood-prone areas by taking into account all factors controlling

flooding. In this study, we developed a new deep learning intelligence model, DBPGA, for flash

flood susceptibility mapping in the Haraz watershed in northern Iran. We used 194 flood and 194

non-flood locations to test ten conditioning factors by the ORAE technique. We also used the

SWARA model to determine spatial relationships between flooding and conditioning factors.

The results of the SWARA model (Fig. 8) show that floods in the study area mainly occur on

low-angle slopes at lower elevations, in agreement with results of other studies (Khosravi et al.,

2016a, 2018; Tien Bui et al., 2018a, 2019b). For example, Khosravi et al. (2018) used four

machine learning algorithms (LMT, REPT, NBT, and ADT) and 11 conditioning factors tested by

the IGR technique to model flooding in the Haraz watershed. They concluded that slope is the most important

factor in flood susceptibility mapping in the Haraz area. They further argued that there is less

time for infiltration of water on steeper ground, leading to increased runoff on these slopes. Tien Bui et al.

(2018b) used two novel hybrid algorithms (ANFIS-ICA and ANFIS-FA) to predict the spatial

distribution of floods in the Haraz watershed. They performed a sensitivity analysis to check the

importance of the ten flood conditioning factors they tested and found that all 10 factors were

significant for predicting flood occurrence in the study area.



Flat and concave slope forms are more prone to flooding than convex slopes. Flat slope forms

are typically lower in elevation and thus are more likely to receive and collect overbank flows and

runoff (Tien Bui et al., 2018a). Concave slopes affect flooding by converging flows toward flat

ground. They hold more water within a smaller area during a storm or a period of snowmelt,

become more rapidly saturated than convex slopes, and thus are more prone to flooding. These

results are consistent with those of Pradhan (2010), Kazakis et al. (2015) and Cao et al. (2016)

who similarly argue that curvature affects the amount of surface runoff and infiltration.

Distance to river is another important factor for flood modeling, especially in mountainous areas.

The SWARA analysis reveals that lands bordering rivers (flood plains) are at higher risk of

flooding because they are vulnerable to overbank flows. Indeed, distance to river is considered to

be one of the most important conditioning factors in most previous flood susceptibility

assessments (Youssef et al., 2016; Tien Bui et al., 2018b; Ahmadlou et al., 2019; Tien Bui et al.,

2019b). For example, Pham et al. (2020) used Credal Decision Tree (CDT) as a base classifier

along with four ensemble models including AdaBoostM1, Bagging, Dagging, and MultiBoostAB

for flood modeling in Markazi Province, Iran. They found that distance to river was the most

important factor for flood incidence in their study area.



TWI provides a measure of water accumulation on a surface (Tien Bui et al., 2018a) through its

relations to soil moisture and topography. The SWARA analysis showed that the probability of

flooding increases at higher TWI values. In high TWI pixels, infiltration is low; when it rains,

runoff selectively pools in these areas. Rainfall is a prominent factor involved in flooding,

although some researchers do not consider it as one of the conditioning factors for flood modeling

(Tehrany et al., 2015a; Youssef et al., 2016). In this study, we considered rainfall as a factor,

although its role was not what we expected. We initially thought that the probability of

flooding would increase with increasing rainfall, but this proved not to be the case. The highest

rainfall is mainly at higher elevations in the mountains in the Haraz watershed, but most flood

locations are not in these areas. Floods, of course, will not happen without rainfall, but the

relationship between rain and flooding is more complex than one that simply equates the amount

of rain with the severity of flooding. Although many researchers consider rain as the most

important factor in the occurrence of floods, our results indicate that this is not always the case.

Khosravi et al. (2019) considered rainfall along with other factors in their modeling and

assessment of flooding in one of China's most flood-prone areas. Using the IGR technique, they

found that rainfall ranked last, below NDVI and lithology, in explaining flood incidence in their

study area. Additionally, Wang et al. (2019a) used an ensemble model (IRN-DEMATEL-ANP),

which is a combination of interval rough numbers (IRN), decision-making trial and evaluation

laboratory (DEMATEL), analytic network process (ANP), and weighted linear combination

(WLC) methods, to evaluate flood susceptibility in Shangyou County, China. Rainfall ranked

eighth among 11 flood conditioning factors, far lower than the first and second factors (elevation

and slope angle). On the other hand, Samanta et al. (2018) concluded that rainfall and TWI were

the most important flood conditioning factors in a study that used the frequency ratio technique in

the Subarnarekha River basin, India.

The SWARA analysis revealed that areas of Triassic and Quaternary formations are more

susceptible to flooding than areas with other lithologies. These formations have lower permeability

than other lithological units and hence, during heavy rainfall or a period of rapid snowmelt,

become saturated sooner and more easily transfer runoff towards rivers. SWARA

showed that most residential and agricultural areas are located in areas with a high potential for

flooding. Residential areas in the Haraz watershed are susceptible to flooding because they have

primarily hardened impermeable surfaces. Most agriculture in the northern part of Iran and

notably in the Haraz watershed is rice cultivation. The water table in these areas is at or very close

to the ground surface, and therefore many of these areas are vulnerable to flooding. Although

lithology and land use proved to be important factors in this study, they ranked in eighth and tenth

place, respectively. Wang et al. (2019a) found that, of the 11 flood conditioning factors they

considered, land use ranked ninth and lithology tenth. In contrast, Khosravi et al. (2019)

concluded that lithology and land use ranked second and third among 12 flood conditioning

factors, respectively, after NDVI, which ranked first.

We used the ORAE technique to prioritize conditioning factors. The results accord with the

findings of the SWARA analysis, in that slope and distance to river are the most critical factors
for flood modeling. These factors are essential in most catchments but are accentuated in

mountainous watersheds.

We developed a new deep learning model with a structure optimized by a GA and weights fine-tuned by a BP algorithm. After

training the model, we produced a flood susceptibility map. We then compared our map with

some earlier maps generated by other machine learning intelligence models, including LR, LMT,

BLR, ADT, NBT, and REPT and also some optimization algorithms, including ANFIS-BAT,

ANFIS-CA, ANFIS-IWO, ANFIS-ICA, and ANFIS-FA. Our new proposed model has the highest

goodness of fit and performance, and based on the training and testing datasets, is more powerful

and outperforms the other models.

The deep belief network (DBN) has become a promising approach in machine learning because of the

advantages it offers over other methods, including quick inference and the ability to encode the

high-order structures of a network. DBN uses a hierarchical structure with several Restricted

Boltzmann Machines that operate through a greedy layer-wise learning algorithm, one layer at a

time. Finally, stochastic gradient descent allows the user to fine-tune the entire network

according to the supervised training criteria. The unsupervised RBM-based pre-training step

initializes the network using only unlabeled data. Network initialization has proven to be a good

starting point for the next supervised fine-tuning step and significantly reduces the risk of being

trapped in local optima (Kustikova and Druzhkov, 2014).
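The greedy layer-wise scheme described above can be sketched with a small Bernoulli RBM trained by one-step contrastive divergence (CD-1). This is an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the class `RBM`, the function `pretrain`, the learning rate, the layer sizes, and the toy data are all hypothetical, and the supervised fine-tuning step is omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class RBM:
    """Bernoulli restricted Boltzmann machine trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible   # visible biases
        self.c = [0.0] * n_hidden    # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                       # positive phase
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)               # reconstruction
        h1 = self.hidden_probs(v1)                       # negative phase
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])

def pretrain(data, layer_sizes, epochs=20):
    """Greedy layer-wise pre-training: each RBM learns from the layer below it."""
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]   # feed activations upward
    return rbms

# Hypothetical binary training vectors; no labels are used at this stage.
data = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
stack = pretrain(data, layer_sizes=[3, 2])
```

The learned weights of the stack would then initialize a feed-forward network that is fine-tuned with labeled data.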

The architecture of a neural network affects the learning capacity and generalizability of the

network. DBPGA uses the Genetic Algorithm to find the optimal or near-optimal topology of the

DBN. The GA searches the very large solution space of network topologies using genetic

crossover and mutation operators. In DBPGA, the DBN is thus pre-trained with RBMs, and its

topology is optimized by the GA.
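A topology search of this kind can be sketched as a GA over a single integer, the number of hidden neurons. The operators below (tournament selection, averaging crossover, random-step mutation) and the function names are assumptions made for illustration. The fitness function `toy_error` is a hypothetical stand-in with its optimum planted at 25 neurons; a real fitness would train a DBN with the candidate topology and return its validation error.

```python
import random

def genetic_search(fitness, low, high, pop_size=10, generations=30, seed=0):
    """Search integers in [low, high] that minimise `fitness` using
    tournament selection, averaging crossover and random-step mutation."""
    rng = random.Random(seed)
    population = [rng.randint(low, high) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a = min(rng.sample(population, 2), key=fitness)   # tournament 1
            b = min(rng.sample(population, 2), key=fitness)   # tournament 2
            child = (a + b) // 2                              # crossover
            if rng.random() < 0.3:                            # mutation
                child += rng.randint(-3, 3)
            children.append(max(low, min(high, child)))
        population = children
        best = min(population + [best], key=fitness)          # keep the best so far
    return best

# Hypothetical stand-in for the validation error of a DBN with `n` hidden neurons.
def toy_error(n):
    return (n - 25) ** 2

best_hidden = genetic_search(toy_error, low=1, high=60)
```

Because the best candidate is carried forward between generations, the returned topology is never worse than the best member of the initial population.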
There are, however, still many problems with flood modeling. The uncertainties arising from the

input data as well as the selection of the appropriate model to fit the training data are noteworthy.

Nevertheless, by reducing the uncertainties, a suitable model for more accurate flood prediction

can be obtained. Thus, it is possible to select more conditioning factors that can be rasterized and

then use one of the factor selection methods to select the factors that are most important. Factors

that have less influence on modeling or cause over-fitting and noise problems can be removed,

thereby increasing, to some extent, the accuracy of the model prediction.

On the other hand, the results of a given model may differ from region to region, and even within

a given region. Therefore, the best model or method should be chosen using a trial-and-error

method to achieve the most significant predictive power and save time and money. A specific

standard for flood modeling that can be generalized to all watersheds or regions is not possible

due to the differences in environmental factors, as well as the different structures of the models.

We are far from creating a standard framework for machine learning modeling that is accepted

and used by all researchers in all catchments in the same way as physically based hydrological

models. We suggest that the path forward is to seek ways to reduce uncertainties and generate

flood susceptibility maps with high accuracy. Efforts to do this will rely on the development and

combination of flood studies with GIS and data mining tools to create a powerful technique that

will increase the predictive power of flood models.

8. Conclusion

In this study, we tested a new robust deep learning model (DBPGA) to spatially predict flash

floods in the Haraz watershed, northern Iran. We used 194 flood and 194 non-flood locations to

construct training and testing databases for, respectively, the modeling and evaluation processes.

Eleven initially selected conditioning factors were assessed by the One-R Attribute Evaluation

(ORAE) method in the modeling stage using the training dataset. Our model successfully learned

from iterative inputs, and its applicability to recognizing and detecting flood-prone areas in the

study area was confirmed by statistical measures. The most important findings can be summarized as

follows:

(i) Although all 11 flood conditioning factors affect flood occurrence, the most important

factor is the slope angle. It is followed by the distance-to-river and drainage density factors, reflecting the

fact that the Haraz watershed is mountainous, with high, steep slopes that transfer water towards

rivers, resulting in overbank flooding.

(ii) The SWARA technique indicated that flash floods preferentially occur on flat slopes at low

elevations, near rivers, and in areas with high drainage density, high TWI values, and lower

rainfall.

(iii) DBPGA shows promise for use in other regions prone to flash floods due to its functionality

and high performance.



(iv) The goodness-of-fit and prediction accuracy of the new proposed model exceed those of other

machine learning models (LR, LMT, BLR, ADT, NBT, and REPTree), and optimization

algorithms (ANFIS-BAT, ANFIS-CA, ANFIS-IWO, ANFIS-ICA, and ANFIS-FA) that have

previously been used in the Haraz watershed.

(v) By defining a proper topology in our new proposed model, we have made a contribution

towards building a powerful flash flood susceptibility mapping tool.

Acknowledgment

This research was financially supported by the Iran National Science Foundation (INSF) through

research project No. 96004000.


Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that

could have appeared to influence the work reported in this paper.



References

Ahmadizar, F., Soltanian, K., AkhlaghianTab, F. and Tsoulos, I., 2015, Artificial neural network

development by means of a novel combination of grammatical evolution and genetic

algorithm. Engineering Applications of Artificial Intelligence 39, 1-13.

Ahmadlou, M., Karimi, M., Alizadeh, S., Shirzadi, A., Parvinnejhad, D., Shahabi, H. and Panahi,

M., 2019, Flood susceptibility assessment using integration of adaptive network-based

fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and BAT

algorithms (BA). Geocarto International 34 (11), 1252-1272.



Ahmed, K.R. and Akter, S., 2017, Analysis of landcover change in southwest Bengal delta due to

floods by NDVI, NDWI and K-means cluster with Landsat multi-spectral surface

reflectance satellite data. Remote Sensing Applications: Society and Environment 8, 168-

181.

Arnell, N.W. and Gosling, S.N., 2016, The impacts of climate change on river flood risk at the

global scale. Climatic Change 134 (3), 387-401.

Avali, V.R., Cooper, G.F. and Gopalakrishnan, V., Year, Application of bayesian logistic

regression to mining biomedical data. AMIA Annual Symposium Proceedings, 266.

Ball, J.E., Anderson, D.T. and Chan, C.S., 2017, Comprehensive survey of deep learning in

remote sensing: theories, tools, and challenges for the community. Journal of Applied
Remote Sensing 11 (4), 042609.

Bengio, Y., 2009, Learning deep architectures for AI. Foundations and trends® in Machine

Learning 2 (1), 1-127.



Beven, K.J. and Kirkby, M.J., 1979, A physically based, variable contributing area model of basin

hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin



versant. Hydrological Sciences Journal 24 (1), 43-69.



Beven, K.J., 2011, Rainfall-runoff modelling: the primer. John Wiley & Sons

Breiman, L., Friedman, J., Olshen, R. and Stone, C., 1984, Classification and regression trees. CRC

Press, Boca Raton, Florida.

Brunner, G.W., 1995, HEC-RAS River Analysis System. Hydraulic Reference Manual. Version

1.0, Report, Hydrologic Engineering Center Davis CA



Butler, D., Kokkalidou, A. and Makropoulos, C.K., 2006, Supporting the siting of new urban

developments for integrated urban water resource management. Integrated urban water

resources management. Springer, 19-34.

Cao, C., Xu, P., Wang, Y., Chen, J., Zheng, L. and Niu, C., 2016, Flash flood hazard

susceptibility mapping using frequency ratio and statistical index methods in coalmine

subsidence areas. Sustainability 8 (9), 948.

Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T. and Khosravi, K., 2017,

A novel hybrid artificial intelligence approach for flood susceptibility assessment.

Environmental modelling & software 95, 229-245.

Charlton, R., Fealy, R., Moore, S., Sweeney, J. and Murphy, C., 2006, Assessing the Impact of
Climate Change on Water Supply and Flood Hazard in Ireland Using Statistical

Downscaling and Hydrological Modelling Techniques. Climatic Change 74 (4), 475-491.



Chen, W., Shahabi, H., Shirzadi, A., Li, T., Guo, C., Hong, H., Li, W., Pan, D., Hui, J. and Ma,

M., 2018, A novel ensemble approach of bivariate statistical-based logistic model tree

classifier for landslide susceptibility assessment. Geocarto International 33 (12), 1398-



1420.

Chen, W., Pradhan, B., Li, S., Shahabi, H., Rizeei, H.M., Hou, E. and Wang, S., 2019a, Novel

hybrid integration approach of bagging-based Fisher's linear discriminant function for

groundwater potential analysis. Natural Resources Research, 1-20.

Chen, W., Zhao, X., Shahabi, H., Shirzadi, A., Khosravi, K., Chai, H., Zhang, S., Zhang, L., Ma,

J. and Chen, Y., 2019b, Spatial prediction of landslide susceptibility by combining

evidential belief function, logistic regression and logistic model tree. Geocarto

International (just-accepted), 1-25.



Chen, W., Li, Y., Xue, W., Shahabi, H., Li, S., Hong, H., Wang, X., Bian, H., Zhang, S. and

Pradhan, B., 2020, Modeling flood susceptibility using data-driven approaches of naïve

bayes tree, alternating decision tree, and random forest methods. Science of The Total

Environment 701, 134979.

Costache, R. and Bui, D.T., 2020, Identification of areas prone to flash-flood phenomena using

multiple-criteria decision-making, bivariate statistics, machine learning and their

ensembles. Science of The Total Environment 712, 136492.

Derbyshire, E., Hails, J.R. and Gregory, K.J., 2013, Geomorphological processes: studies in

physical geography. Elsevier

Ding, A., Zhang, Q., Zhou, X. and Dai, B., Year, Automatic recognition of landslide based on
CNN and texture change detection. 2016 31st Youth Academic Annual Conference of

Chinese Association of Automation (YAC), 444-448.



Dreyfus, S., 1973, The computational solution of optimal control problems with time lag. IEEE

Transactions on Automatic Control 18 (4), 383-385.

Elmore, A.J., Julian, J.P., Guinn, S.M. and Fitzpatrick, M.C., 2013, Potential stream density in

Mid-Atlantic US watersheds. PLoS One 8 (8), e74819.



Fawcett, T., 2006, An introduction to ROC analysis. Pattern recognition letters 27 (8), 861-874.

Fernández, D. and Lutz, M., 2010, Urban flood hazard zoning in Tucumán Province, Argentina,

using GIS and multicriteria decision analysis. Engineering Geology 111 (1-4), 90-98.

Fraser, N. and Schumer, R., 2012, Low stream density watersheds produce flashier floods than

high stream density watersheds in ephemeral streams across the southwestern United

States. AGUFM 2012, H41F-1240.

Freund, Y. and Mason, L., Year, The alternating decision tree learning algorithm. icml, 124-133.

Galathiya, A., Ganatra, A. and Bhensdadia, C., 2012, Improved decision tree induction algorithm

with feature selection, cross validation, model complexity and reduced error pruning.

International Journal of Computer Science and Information Technologies 3 (2), 3427-

3431.

Gao, B.-C., 1996, NDWI—A normalized difference water index for remote sensing of vegetation

liquid water from space. Remote sensing of environment 58 (3), 257-266.

Ghorbanzadeh, O., Blaschke, T., Gholamnia, K., Meena, S.R., Tiede, D. and Aryal, J., 2019,

Evaluation of different machine learning methods and deep-learning convolutional neural

networks for landslide detection. Remote Sensing 11 (2), 196.

Ghosh, J.K., Delampady, M. and Samanta, T., 2007, An introduction to Bayesian analysis: theory
and methods. Springer Science & Business Media

Gorsevski, P.V., Gessler, P.E., Foltz, R.B. and Elliot, W.J., 2006, Spatial prediction of landslide

hazard using logistic regression and ROC analysis. Transactions in GIS 10 (3), 395-415.

Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S. and Lew, M.S., 2015, Deep learning for visual

understanding: A review. Neurocomputing.



Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S. and Lew, M.S., 2016, Deep learning for visual

understanding: A review. Neurocomputing 187, 27-48.

Hanley, J.A., 1989, Receiver operating characteristic (ROC) methodology: the state of the art.

Crit Rev Diagn Imaging 29 (3), 307-335.

Hinton, G., 2010, A practical guide to training restricted Boltzmann machines. Momentum 9 (1),

926.

Hinton, G.E., Osindero, S. and Teh, Y.-W., 2006, A fast learning algorithm for deep belief nets.

Neural Comput 18 (7), 1527-1554.



Hinton, G.E., 2012, A practical guide to training restricted Boltzmann machines. Neural

networks: Tricks of the trade. Springer, 599-619.

Hong, H., Pradhan, B., Xu, C. and Bui, D.T., 2015, Spatial prediction of landslide hazard at the

Yihuang area (China) using two-class kernel logistic regression, alternating decision tree

and support vector machines. CATENA 133, 266-281.

Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.-X., Chen, W., Kougias, I. and

Kazakis, N., 2018, Flood susceptibility assessment in Hengfeng area coupling adaptive

neuro-fuzzy inference system with genetic algorithm and differential evolution. Science of

The Total Environment 621, 1124-1141.

Huang, L. and Xiang, L.-y., 2018, Method for Meteorological Early Warning of Precipitation-
Induced Landslides Based on Deep Neural Network. Neural Processing Letters 48 (2),

1243-1260.

Huppert, H.E. and Sparks, R.S.J., 2006, Extreme natural hazards: population growth,

globalization and environmental change. Philosophical Transactions of the Royal Society

A: Mathematical, Physical and Engineering Sciences 364 (1845), 1875-1888.



Kavitha, A., Kavitha, R. and Viji Gripsy, J., 2012, Empirical Evaluation of Feature Selection

Technique in Educational Data Mining. ARPN Journal of Science and Technology 2 (11).

Kazakis, N., Kougias, I. and Patsialis, T., 2015, Assessment of flood hazard areas at a regional

scale using an index-based approach and Analytical Hierarchy Process: Application in

Rhodope–Evros region, Greece. Science of the Total Environment 538, 555-563.

Keyvanrad, M.A. and Homayounpour, M.M., 2015, Deep Belief Network Training Improvement

Using Elite Samples Minimizing Free Energy. International Journal of Pattern

Recognition and Artificial Intelligence 29 (05), 1551006.



Khosravi, K., Nohani, E., Maroufinia, E. and Pourghasemi, H.R., 2016a, A GIS-based flood

susceptibility assessment and its mapping in Iran: a comparison between frequency ratio

and weights-of-evidence bivariate statistical models with multi-criteria decision-making

technique. Natural Hazards 83 (2), 947-987.

Khosravi, K., Pourghasemi, H.R., Chapi, K. and Bahri, M., 2016b, Flash flood susceptibility

analysis and its mapping using different bivariate models in Iran: a comparison between

Shannon's entropy, statistical index, and weighting factor models. Environmental

monitoring and assessment 188 (12), 656.

Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I. and Bui,

D.T., 2018, A comparative assessment of decision trees algorithms for flash flood
susceptibility modeling at Haraz watershed, northern Iran. Science of the Total

Environment 627, 744-755.



Khosravi, K., Shahabi, H., Pham, B.T., Adamowski, J., Shirzadi, A., Pradhan, B., Dou, J., Ly, H.-

B., Gróf, G., Ho, H.L., Hong, H., Chapi, K. and Prakash, I., 2019, A comparative

assessment of flood susceptibility modeling using Multi-Criteria Decision-Making



Analysis and Machine Learning Methods. Journal of Hydrology 573, 311-323.



Kia, M.B., Pirasteh, S., Pradhan, B., Mahmud, A.R., Sulaiman, W.N.A. and Moradi, A., 2012, An

artificial neural network model for flood simulation using GIS: Johor River Basin,

Malaysia. Environmental Earth Sciences 67 (1), 251-264.

Kim, B., Sanders, B.F., Famiglietti, J.S. and Guinot, V., 2015, Urban flood modeling with porous

shallow-water equations: A case study of model errors in the presence of anisotropic

porosity. Journal of Hydrology 523, 680-692.



Kohavi, R., Year, Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. Kdd,

202-207.

Kron, W., 2002, Keynote lecture: Flood risk= hazard× exposure× vulnerability. Flood defence,

82-97.

Kustikova, V. and Druzhkov, P., 2014, A survey of deep learning methods and software for image

classification and object detection. OGRW2014 5.

Landwehr, N., Hall, M. and Frank, E., 2005, Logistic model trees. Machine Learning 59 (1-2),

161-205.

Larochelle, H., Erhan, D., Courville, A., Bergstra, J. and Bengio, Y., Year, An empirical

evaluation of deep architectures on problems with many factors of variation. Proceedings
of the 24th international conference on Machine learning, 473-480.

Le Roux, N. and Bengio, Y., 2008, Representational power of restricted Boltzmann machines and

deep belief networks. Neural Comput 20 (6), 1631-1649.



Lohani, A.K., Goel, N. and Bhatia, K., 2014, Improving real time flood forecasting using fuzzy

inference system. Journal of hydrology 509, 25-41.



Lopes, N. and Ribeiro, B., 2015, Machine Learning for Adaptive Many-core Machines: A

Practical Approach. Springer

Manfreda, S., Di Leo, M. and Sole, A., 2011, Detection of flood-prone areas using digital

elevation models. Journal of Hydrologic Engineering 16 (10), 781-790.

Mansourypoor, F. and Asadi, S., 2017, Development of a Reinforcement Learning-based

Evolutionary Fuzzy Rule-Based System for diabetes diagnosis. Computers in Biology and

Medicine 91, 337-352.



Marchi, L., Borga, M., Preciso, E. and Gaume, E., 2010, Characterisation of selected extreme

flash floods in Europe and implications for flood risk management. Journal of Hydrology

394 (1-2), 118-133.

Marcus, G., 2018, Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mehmanpazir, F. and Asadi, S., 2017, Development of an evolutionary fuzzy expert system for

estimating future behavior of stock price. Journal of Industrial Engineering International

13 (1), 29-46.

Mehrabian, A.R. and Lucas, C., 2006, A novel numerical optimization algorithm inspired from

weed colonization. Ecological informatics 1 (4), 355-366.

Mekanik, F., Imteaz, M., Gato-Trinidad, S. and Elmahdi, A., 2013, Multiple regression and
Artificial Neural Network for long-term rainfall forecasting using large scale climate

modes. Journal of Hydrology 503, 11-21.



Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M. and

Kanevski, M., 2014, Machine learning feature selection methods for landslide

susceptibility mapping. Mathematical Geosciences 46 (1), 33-57.



Miraki, S., Zanganeh, S.H., Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H. and Pham, B.T.,

2019, Mapping groundwater potential using a novel hybrid intelligence approach. Water

resources management 33 (1), 281-302.

Moore, I.D. and Wilson, J.P., 1992, Length-slope factors for the Revised Universal Soil Loss

Equation: Simplified method of estimation. Journal of soil and water conservation 47 (5),

423-428.

Mosavi, A., Rabczuk, T. and Varkonyi-Koczy, A.R., Year, Reviewing the novel machine learning

tools for materials design. International Conference on Global Research and Education,

50-58.

Mosavi, A., Ozturk, P. and Chau, K.-w., 2018, Flood prediction using machine learning models:

Literature review. Water 10 (11), 1536.

Mousavi, S.Z., Kavian, A., Soleimani, K., Mousavi, S.R. and Shirzadi, A., 2011, GIS-based

spatial prediction of landslide susceptibility using logistic regression model. Geomatics,

Natural Hazards and Risk 2 (1), 33-50.

Nayak, P., Sudheer, K., Rangan, D. and Ramasastri, K., 2005, Short‐term flood forecasting with a

neurofuzzy model. Water Resources Research 41 (4).
Nguyen, P.T., Tuyen, T.T., Shirzadi, A., Pham, B.T., Shahabi, H., Omidvar, E., Amini, A.,

Entezami, H., Prakash, I., Phong, T.V., Vu, T.B., Thanh, T., Saro, L. and Bui, D.T.,

2019a, Development of a Novel Hybrid Intelligence Approach for Landslide Spatial



Prediction. Applied Sciences 9 (14), 2824.

Nguyen, V.V., Pham, B.T., Vu, B.T., Prakash, I., Jha, S., Shahabi, H., Shirzadi, A., Ba, D.N.,

Kumar, R. and Chatterjee, J.M., 2019b, Hybrid machine learning approaches for landslide

susceptibility modeling. Forests 10 (2), 157.

Nielsen, M.A., 2015, Neural networks and deep learning. Determination Press, San Francisco, CA, USA.

Nohani, E., Moharrami, M., Sharafi, S., Khosravi, K., Pradhan, B., Pham, B.T., Lee, S. and

Melesse, A.M., 2019, Landslide Susceptibility Mapping Using Different GIS-Based

Bivariate Models. Water 11 (7), 1402.



Organization, W.M., 1994, Guide to hydrological practices. Secretariat of the World

Meteorological Organization

Palm, R.B., 2012, Prediction as a candidate for learning deep hierarchical models of data.

Technical University of Denmark, Palm 25.

Pham, B.T., Bui, D.T., Prakash, I. and Dholakia, M., 2017, Hybrid integration of Multilayer

Perceptron Neural Networks and machine learning ensembles for landslide susceptibility

assessment at Himalayan area (India) using GIS. Catena 149, 52-63.

Pham, B.T., Shirzadi, A., Bui, D.T., Prakash, I. and Dholakia, M., 2018, A hybrid machine

learning ensemble approach based on a radial basis function neural network and rotation

forest for landslide susceptibility modeling: A case study in the Himalayan area, India.
International Journal of Sediment Research 33 (2), 157-170.

Pham, B.T., Prakash, I., Dou, J., Singh, S.K., Trinh, P.T., Tran, H.T., Le, T.M., Van Phong, T.,

Khoi, D.K. and Shirzadi, A., 2019a, A novel hybrid approach of landslide susceptibility

modelling using rotation forest ensemble and different base classifiers. Geocarto

International, 1-25.

Pham, B.T., Prakash, I., Singh, S.K., Shirzadi, A., Shahabi, H. and Bui, D.T., 2019b, Landslide

susceptibility modeling using Reduced Error Pruning Trees and different ensemble

techniques: Hybrid machine learning approaches. CATENA 175, 203-218.

Pham, B.T., Avand, M., Janizadeh, S., Phong, T.V., Al-Ansari, N., Ho, L.S., Das, S., Le, H.V.,

Amini, A. and Bozchaloei, S.K., 2020, GIS based hybrid computational approaches for

flash flood susceptibility assessment. Water 12 (3), 683.



Poudyal, C.P., Chang, C., Oh, H.-J. and Lee, S., 2010, Landslide susceptibility maps comparing

frequency ratio and artificial neural networks: a case study from the Nepal Himalaya.

Environmental Earth Sciences 61 (5), 1049-1064.

Pradhan, B., 2010, Flood susceptible mapping and risk area delineation using logistic regression,

GIS and remote sensing. Journal of Spatial Hydrology 9 (2).

Quinlan, J., 1993, C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco.

Quinlan, J.R., 1986, Induction of decision trees. Machine learning 1 (1), 81-106.

Quinlan, J.R., 1987, Simplifying decision trees. International journal of man-machine studies 27

(3), 221-234.
Rahmati, O., Pourghasemi, H.R. and Zeinivand, H., 2016, Flood susceptibility mapping using

frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto

International 31 (1), 42-70.



Reynolds, R.G., Year, An introduction to cultural algorithms. Proceedings of the third annual

conference on evolutionary programming, 131-139.



Ronoud, S. and Asadi, S., 2019, An evolutionary deep belief network extreme learning-based for

breast cancer diagnosis. Soft Computing 23 (24), 13139-13159.

Rouse Jr, J., Haas, R., Deering, D., Schell, J. and Harlan, J., 1974, Monitoring the Vernal

Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation.[Great

Plains Corridor].

Samanta, R.K., Bhunia, G.S., Shit, P.K. and Pourghasemi, H.R., 2018, Flood susceptibility

mapping using geospatial frequency ratio technique: a case study of Subarnarekha River

Basin, India. Modeling Earth Systems and Environment 4 (1), 395-408.



Santos, P.P. and Reis, E., 2018, Assessment of stream flood susceptibility: a cross‐analysis

between model results and flood losses. Journal of Flood Risk Management 11, S1038-S1050.

Schillaci, C., Acutis, M., Lombardo, L., Lipani, A., Fantappiè, M., Märker, M. and Saia, S., 2017,

Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: The

role of land use, soil texture, topographic indices and the influence of remote sensing data

to modelling. Science of The Total Environment 601-602, 821-832.

Shafizadeh-Moghadam, H., Valavi, R., Shahabi, H., Chapi, K. and Shirzadi, A., 2018, Novel

forecasting approaches using combination of machine learning and statistical models for

flood susceptibility mapping. J Environ Manage 217, 1-11.
Shahabi, H., Shirzadi, A., Ghaderi, K., Omidvar, E., Al-Ansari, N., Clague, J.J., Geertsema, M.,

Khosravi, K., Amini, A. and Bahrami, S., 2020, Flood detection and susceptibility

mapping using sentinel-1 remote sensing data and a machine learning approach: Hybrid

intelligence of bagging ensemble based on k-nearest neighbor classifier. Remote Sensing

12 (2), 266.

Shen, F., Chao, J. and Zhao, J., 2015, Forecasting exchange rate using deep belief networks and

conjugate gradient method. Neurocomputing 167, 243-253.

Shirzadi, A., Saro, L., Joo, O.H. and Chapi, K., 2012, A GIS-based logistic regression model in

rock-fall susceptibility mapping along a mountainous road: Salavat Abad case study,

Kurdistan, Iran. Natural hazards 64 (2), 1639-1656.

Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., Shahabi, H. and

Revhaug, I., 2017, Shallow landslide susceptibility assessment using a novel hybrid

intelligence approach. Environmental Earth Sciences 76 (2), 60.



Shirzadi, A., Soliamani, K., Habibnejhad, M., Kavian, A., Chapi, K., Shahabi, H., Chen, W.,

Khosravi, K., Thai Pham, B., Pradhan, B., Ahmad, A., Bin Ahmad, B. and Tien Bui, D.,

2018, Novel GIS Based Machine Learning Algorithms for Shallow Landslide

Susceptibility Mapping. Sensors 18 (11), 3777.

Shirzadi, A., Solaimani, K., Roshan, M.H., Kavian, A., Chapi, K., Shahabi, H., Keesstra, S.,

Ahmad, B.B. and Bui, D.T., 2019, Uncertainties of prediction accuracy in shallow

landslide modeling: Sample size and raster resolution. CATENA 178, 172-188.

Srinivasan, D.B. and Mekala, P., 2014, Mining social networking data for classification using

reptree. International Journal of Advance Research in Computer Science and Management

Studies 2 (10).
Srivastava, S., Sahana, S.K., Pant, D. and Mahanti, P., 2015, Hybrid Microscopic Discrete

Evolutionary Model for Traffic Signal Optimization. Journal of Next Generation



Information Technology 6 (2), 1.



Srivastava, S. and Sahana, S.K., 2017, Nested hybrid evolutionary model for traffic signal

optimization. Applied Intelligence 46 (1), 113-123.



Taheri, K., Shahabi, H., Chapi, K., Shirzadi, A., Gutiérrez, F. and Khosravi, K., 2019, Sinkhole

susceptibility mapping: A comparison between Bayes‐based machine learning algorithms.

Land Degradation & Development 30 (7), 730-745.

Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2013, Spatial prediction of flood susceptible areas

using rule based decision tree (DT) and a novel ensemble bivariate and multivariate

statistical models in GIS. Journal of Hydrology 504, 69-79.



Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2014, Flood susceptibility mapping using a novel

ensemble weights-of-evidence and support vector machine models in GIS. Journal of

hydrology 512, 332-343.

Tehrany, M.S., Pradhan, B. and Jebur, M.N., 2015a, Flood susceptibility analysis and its

verification using a novel ensemble support vector machine and frequency ratio method.

Stochastic Environmental Research and Risk Assessment 29 (4), 1149-1165.

Tehrany, M.S., Pradhan, B., Mansor, S. and Ahmad, N., 2015b, Flood susceptibility assessment

using GIS-based support vector machine model with different kernel types. Catena 125,

91-101.

Termeh, S.V.R., Kornejady, A., Pourghasemi, H.R. and Keesstra, S., 2018, Flood susceptibility
mapping using novel ensembles of adaptive neuro fuzzy inference system and

metaheuristic algorithms. Science of the Total Environment 615, 438-451.



Tieleman, T., Year, Training restricted Boltzmann machines using approximations to the

likelihood gradient. Proceedings of the 25th international conference on Machine learning,

1064-1071.

Tieleman, T. and Hinton, G., Year, Using fast weights to improve persistent contrastive

divergence. Proceedings of the 26th Annual International Conference on Machine

Learning, 1033-1040.

Tien Bui, D., Pradhan, B., Lofman, O. and Revhaug, I., 2012, Landslide susceptibility assessment

in Vietnam using support vector machines, decision tree, and Naive Bayes Models.

Mathematical problems in Engineering, 974638.

Tien Bui, D., Pradhan, B., Nampak, H., Bui, Q.-T., Tran, Q.-A. and Nguyen, Q.-P., 2016a, Hybrid

artificial intelligence approach based on neural fuzzy inference model and metaheuristic

optimization for flood susceptibility modeling in a high-frequency tropical cyclone area

using GIS. Journal of Hydrology 540, 317-330.

Tien Bui, D., Tuan, T.A., Klempe, H., Pradhan, B. and Revhaug, I., 2016b, Spatial prediction

models for shallow landslide hazards: a comparative assessment of the efficacy of support

vector machines, artificial neural networks, kernel logistic regression, and logistic model

tree. Landslides 13 (2), 361-378.

Tien Bui, D. and Hoang, N.-D., 2017, A Bayesian framework based on a Gaussian mixture model

and radial-basis-function Fisher discriminant analysis (BayGmmKda V1.1) for spatial

prediction of floods. Geoscientific Model Development 10 (9), 3391-3409.

Tien Bui, D., Khosravi, K., Li, S., Shahabi, H., Panahi, M., Singh, V.P., Chapi, K., Shirzadi, A.,
Panahi, S. and Chen, W., 2018a, New hybrids of ANFIS with several optimization

algorithms for flood susceptibility modeling. Water 10 (9), 1210.



Tien Bui, D., Panahi, M., Shahabi, H., Singh, V.P., Shirzadi, A., Chapi, K., Khosravi, K., Chen,

W., Panahi, S. and Li, S., 2018b, Novel hybrid evolutionary algorithms for spatial

prediction of floods. Scientific reports 8 (1), 15364.



Tien Bui, D., Shahabi, H., Shirzadi, A., Chapi, K., Pradhan, B., Chen, W., Khosravi, K., Panahi,

M., Bin Ahmad, B. and Saro, L., 2018c, Land subsidence susceptibility mapping in South

Korea using machine learning algorithms. Sensors 18 (8), 2464.

Tien Bui, D., Khosravi, K., Shahabi, H., Daggupati, P., Adamowski, J.F., Melesse, A.M., Thai

Pham, B., Pourghasemi, H.R., Mahmoudi, M. and Bahrami, S., 2019a, Flood spatial

modeling in northern Iran using remote sensing and GIS: A comparison between evidential

belief functions and its ensemble with a multivariate logistic regression model. Remote

Sensing 11 (13), 1589.



Tien Bui, D., Ngo, P.-T.T., Pham, T.D., Jaafari, A., Minh, N.Q., Hoa, P.V. and Samui, P., 2019b,

A novel hybrid approach based on a swarm intelligence optimized extreme learning

machine for flash flood susceptibility mapping. CATENA 179, 184-196.

Tien Bui, D., Hoang, N.-D., Martínez-Álvarez, F., Ngo, P.-T.T., Hoa, P.V., Pham, T.D., Samui, P.

and Costache, R., 2020, A novel deep learning neural network approach for predicting

flash flood susceptibility: A case study at a high frequency tropical storm area. Science of

The Total Environment 701, 134413.

Tucker, C. and Sellers, P., 1986, Satellite remote sensing of primary production. International

journal of remote sensing 7 (11), 1395-1416.

Turoğlu, H. and Dölek, İ., 2011, Floods and their likely impacts on ecological environment in
Bolaman River basin (Ordu, Turkey). Research Journal of Agricultural Science 43 (4),

167-173.

UN Office for the Coordination of Humanitarian Affairs, 2019, Islamic Republic of Iran:

Situation Overview: Floods, As of 13 April 2019. https://reliefweb.int/report/iran-islamic-republic/islamic-republic-iran-situation-overview-floods-13-april-2019, Report

Wang, S., Jiang, L. and Li, C., 2015, Adapting naive Bayes tree for text classification. Knowledge

and Information Systems 44 (1), 77-89.

Wang, Y., Hong, H., Chen, W., Li, S., Pamučar, D., Gigović, L., Drobnjak, S., Tien Bui, D. and

Duan, H., 2019a, A hybrid GIS multi-criteria decision-making method for flood

susceptibility mapping at Shangyou, China. Remote Sensing 11 (1), 62.

Wang, Y., Hong, H., Chen, W., Li, S., Panahi, M., Khosravi, K., Shirzadi, A., Shahabi, H.,

Panahi, S. and Costache, R., 2019b, Flood susceptibility mapping in Dingnan County

(China) using adaptive neuro-fuzzy inference system with biogeography based



optimization and imperialistic competitive algorithm. Journal of environmental

management 247, 712-729.

Wijkman, A. and Timberlake, L., 2019, Natural disasters: acts of God or acts of man? Routledge

Wilson, J.P. and Gallant, J.C., 2000, Terrain analysis: principles and applications. John Wiley &

Sons

Witten, D.M. and Tibshirani, R., 2011, Penalized classification using Fisher's linear discriminant.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (5), 753-772.

Xiao, L., Zhang, Y. and Peng, G., 2018, Landslide susceptibility assessment using integrated deep

learning algorithm along the China-Nepal Highway. Sensors 18 (12), 4436.
Yang, X.-S., Year, Firefly algorithms for multimodal optimization. International symposium on

stochastic algorithms, 169-178.



Yang, X.-S., 2010a, Firefly algorithm, stochastic test functions and design optimisation. arXiv

preprint arXiv:1003.1409.

Yang, X.-S., 2010b, A new metaheuristic bat-inspired algorithm. Nature inspired cooperative

strategies for optimization (NICSO 2010). Springer, 65-74.



Young, R.A. and Mutchler, C.K., 1969, Soil movement on irregular slopes. Water Resources

Research 5 (5), 1084-1089.

Youssef, A.M., Pradhan, B. and Sefry, S.A., 2016, Flash flood susceptibility assessment in Jeddah

city (Kingdom of Saudi Arabia) using bivariate and multivariate statistical models.

Environmental Earth Sciences 75 (1), 12.



Zhou, Q., Mikkelsen, P.S., Halsnæs, K. and Arnbjerg-Nielsen, K., 2012, Framework for economic

pluvial flood risk assessment considering climate change effects and adaptation benefits.

Journal of Hydrology 414, 539-549.

Zhou, Q., Leng, G. and Feng, L., 2017, Predictability of state-level flood damage in the

conterminous United States: the role of hazard, exposure and vulnerability. Scientific

reports 7 (1), 5354.

Figure 1 Locations of floods in the study area

Figure 2 Flash flood conditioning factors used in this study: (a) slope angle, (b) elevation, (c)
curvature, (d) TWI, (e) SPI, (f) distance to river, (g) river density, (h) rainfall, (i) lithology, (j)

land use, and (k) NDVI



Figure 3 Schematic of a Deep Belief Network structure



Figure 4 Restricted Boltzmann Machine

Figure 5 Chromosome representation in DBPGA model



Figure 6 The single-point crossover operator



Figure 7 Mutation operator in the DBPGA model

Figure 8 The GA flowchart for finding the optimal topology of the DBN

Figure 9 SWARA weights of flood conditioning factors in the study area

Figure 10 The order of the importance of conditioning factors in flood occurrence in the study

area

Figure 11 The goodness-of-fit and prediction accuracy of the DBPGA model. (a) Trend of flood

and non-flood locations using the training dataset. (b) MSE and RMSE of the training dataset. (c)

Standard deviation and mean values of the training dataset. (d) Trend of flood and non-flood

locations using the testing dataset. (e) MSE and RMSE of the testing dataset, (f) Standard

deviation and mean values of the testing dataset.

Figure 12 Flood susceptibility map prepared by the DBPGA model

Figure 13 Histogram of flood locations and susceptibility classes for three classification models

including natural breaks (NB), geometrical interval (GI) and quantile (Q)

Figure 14 Confusion matrix of the new proposed model: (a) Training dataset. (b) Testing dataset

Figure 15 ROC curve and AUC for the proposed novel deep learning model: (a) Training dataset,

(b) Testing dataset

Figure 16 A graphical comparison between the novel deep learning model and other benchmark
models

Table 1 Lithological units of the study area



Table 2 Flash flood conditioning factors and their classes and classification methods

Table 3 The optimal values of GA parameters for flood susceptibility modeling

Table 4 Comparison of classification performance of DBPGA model with some benchmark

machine learning models using testing dataset



Table 5 Comparison of classification performance of DBPGA model and some optimization

algorithms using testing dataset

Table 6 Performance evaluation of the proposed deep learning model against other soft

computing benchmark models

Table 7 Average rank of the flash flood susceptibility models for the study area using Friedman's test

Table 8 Performance of the novel deep learning model compared to other models using the
Wilcoxon signed-rank test (two-tailed)

Graphical abstract:

Highlights

• A novel deep learning model, DBPGA, is proposed for flash flood susceptibility mapping.

• The One-R Attribute Evaluation (ORAE) technique was used to select optimal conditioning factors.

• The DBPGA model outperformed all algorithms previously used in the study area.

• The proposed model is a promising tool for predicting flash floods in other similar regions.
