
Earth-Science Reviews 223 (2021) 103858

Contents lists available at ScienceDirect

Earth-Science Reviews
journal homepage: www.elsevier.com/locate/earscirev

Review Article

Deep learning for geological hazards analysis: Data, models, applications, and opportunities
Zhengjing Ma, Gang Mei *
School of Engineering and Technology, China University of Geosciences (Beijing), 100083 Beijing, China

ARTICLE INFO

Keywords:
Geological hazards
Earth observation data
Deep learning
Landslide detection
Seismic phase picking

ABSTRACT

As natural disasters are induced by geodynamic activities or abnormal changes in the environment, geological hazards tend to wreak havoc on the environment and human society. Recently, the dramatic increase in the volume of various types of Earth observation ‘big data’ from multiple sources, and the rapid development of deep learning as a state-of-the-art data analysis tool, have enabled novel advances in geological hazard analysis, with the ultimate aim to mitigate the devastation associated with these hazards. Motivated by numerous applications, this paper presents an overview of the advances in the utilization of deep learning for geological hazard analysis. First, six commonly available Earth observation data sources are described, e.g., unmanned aerial vehicles, satellite platforms, and in-situ monitoring systems. Second, the deep learning background and six typical deep learning models are introduced, such as convolutional neural networks and recurrent neural networks. Third, focusing on six typical geological hazards, i.e., landslides, debris flows, rockfalls, avalanches, earthquakes, and volcanoes, the deep learning applications for geological hazard analysis are reviewed, and common application paradigms are summarized. Finally, the challenges and opportunities for the application of deep learning models for geological hazard analysis are highlighted, with the aim to inspire further related research.

1. Introduction

Geological hazards (geohazards) refer to natural disasters that are caused by geodynamic activity or abnormal changes in the geological environment, which typically include internal earth processes (e.g., earthquakes, volcanic activity, and emissions) and related geophysical processes (e.g., landslides, debris flows, rockfalls, and avalanches) (United Nations Office for Disaster Risk Reduction, 2009). Geological hazards often have severe consequences, cause disorder in society and the economy, damage the environment, and have enormous costs in terms of life and property (Brabb, 1991; Nadim et al., 2006; Badoux et al., 2016; Grahn and Jaldell, 2017; Froude and Petley, 2018). To mitigate this devastation, the analysis of geological hazards using data that are collected from various sources is necessary.

Recent advances in multiplatform remote sensing, from spaceborne and airborne to ground-based sensors, have raised the potential for comprehensive understanding and precise prediction of geological hazards to an unprecedented level (Agapiou, 2017; Casagli et al., 2017). Abundant data have become available for geological hazard analysis, including diverse imagery (e.g., optical imagery, multispectral imagery, and radar imagery) and real-time in-situ monitoring information. As typical big data, geological hazard-related data have four ‘V’ characteristics: volume (the enormity of the data), velocity (rapid change of the data), variety (diverse data sources), and veracity (the uncertainty of the data) (Reichstein et al., 2019). However, it is difficult to sensibly introduce big data into conventional methods. First, traditional methods are typically time-consuming. For example, in seismic phase picking, manually checking and identifying the waveform is labor-intensive. In landslide detection, conventional machine learning methods require cumbersome feature selection. Second, traditional methods investigate and predict geological hazards by using linear and lower-order properties; consequently, these traditional methods have difficulty extracting available information from big data and fail to represent complex geodynamic and physical processes. For example, pixel-based methods employed for landslide detection tend to ignore the geometric and contextual information in remote sensing imagery.

Over the past few years, deep learning has come to the fore in applications for geological hazard analysis. Deep learning is a subdiscipline of machine learning that consists of successive operations that progressively extract complex features by utilizing the results of

* Corresponding author.
E-mail address: gang.mei@cugb.edu.cn (G. Mei).

https://doi.org/10.1016/j.earscirev.2021.103858
Received 25 March 2021; Received in revised form 30 October 2021; Accepted 2 November 2021
Available online 8 November 2021
0012-8252/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

previous operations as input (Eraslan et al., 2019; Goodfellow et al., 2016; Chen et al., 2019). Compared with physics-based models, which are based on physical processes associated with geohazards and are restricted by strict application conditions, deep learning models have inherent and distinct advantages that stem from their data-driven paradigm and, as a result, can be more flexible in automatically obtaining high-level information from big data (Ma et al., 2021a; LeCun et al., 2015; Reichstein et al., 2019; Bergen et al., 2019; Yuan et al., 2020).

Studies have gradually demonstrated the potential of deep learning to analyze geological hazard data effectively. It has been introduced into multiple tasks for various types of geological hazards, e.g., landslide and debris flow detection, landslide susceptibility assessment, seismic data interpolation and denoising, along with earthquake detection and localization.

In this paper, we discuss various applications of deep learning in the analysis of several representative geological hazards. To the best of the authors’ knowledge, this paper is the first comprehensive survey of recent advances in deep learning-based geological hazard analysis. The main contributions of this survey can be summarized as follows:

(1) We investigate several major heterogeneous data sources. In the application of deep learning models to geological hazard analysis, these sources provide an enormous amount of data support.
(2) We present the background of deep learning models and discuss the current state-of-the-art deep learning techniques by considering several common models. Furthermore, we emphasize their potential for applications in various geological hazard analyses, which contributes to the selection of a suitable deep learning model in specified application scenarios.
(3) We review the real-world applications of deep learning models in geological hazard analysis according to various hazard types. These applications demonstrate the strength of various deep learning models in diverse geological hazard analyses.
(4) We discuss the challenges and potential solutions of deep learning models that are applied in geological hazard analysis, and highlight novel opportunities and promising future directions.

The rest of the paper is organized as follows. Section 2 introduces six major heterogeneous data sources. Section 3 reviews the deep learning background and presents in detail the six most influential deep learning models. Section 4 investigates the applications of deep learning models in six typical geological hazards. Section 5 discusses the challenges and future directions. Section 6 concludes this survey.

2. Deep learning for geological hazard analysis: data source

Currently, the data that are employed for deep learning applications in geological hazard analysis originate from many heterogeneous sources. In this section, six major categories are introduced: field surveys, unmanned aerial vehicles, aerial platforms, satellite platforms, in-situ monitoring systems, and seismic reflection.

2.1. Field surveys

After the occurrence of catastrophic damage, a field survey is commonly employed to collect as much information on the hazard as possible (e.g., visual characteristics of certain geological hazards). It refers to field observations that are conducted by humans (Guzzetti et al., 2012). The field survey is typically applied to investigate landslides or debris flows, thus assessing the potential impact of these geological hazards at the local level and providing unique information about the evolution of disasters.

2.2. Unmanned aerial vehicles (UAVs)

Recently, unmanned aerial vehicles (UAVs) have been rapidly developed, and they are characterized by high performance, low cost, operational flexibility, and high spatial resolution. These advantages make these devices powerful tools for real-time ground-based rapid survey and detailed scale analysis of large damaged areas (Colomina and Molina, 2014; Travelletti et al., 2012; James and Robson, 2012; Remondino et al., 2011). By combining multiple camera sensors (e.g., thermal sensor cameras, multispectral cameras, and hyperspectral cameras), UAVs can rapidly produce many images in geological hazard investigations (Giordan et al., 2018; Rossi et al., 2018; Chae et al., 2017).

2.3. Aerial photography

In many cases, the interpretation of aerial photographs remains the most common method for identifying and mapping geological hazards (Guzzetti et al., 2012). The primary reasons are the simplicity of the techniques and tools that are required for interpreting aerial photographs (e.g., stereoscopes) and the low cost. Aerial photographs are typically 21 cm × 21 cm, and their scale is typically between 1:5000 and 1:70000. Hence, reasonably many photographs are available to cover the large area where a hazard occurred (Keefer, 2002; Malamud et al., 2004).

Aerial photographs capture a range of characteristics that are useful for image recognition and classification, which include the surface shape of the topography, along with the color, tone, patchiness, and texture. Moreover, computer-aided stereo vision techniques facilitate the generation of stereo aerial photographs, where a suitable combination of photographs that are identified by overlapping sides and laterals is instrumental for best identifying and mapping geohazards, especially slope-related geohazards (Nichol et al., 2006). A common example is the capture of the landslide distribution from such photographs as a basis to further map landslide inventories (Harp et al., 2011).

2.4. Satellite platforms

In the past few decades, increasingly many satellites have been launched into low Earth orbit. These synthetic aperture radar systems, passive optical imaging systems, and global navigation satellite systems (GNSSs) provide a large amount of Earth observation data and, thus, enhance the monitoring capability for geological hazards (Elliott, 2020). Currently, the satellites that typically provide data that are related to the identification, detection, and mapping of geological hazards include Sentinel, RapidEye, ALOS, Landsat, GeoEye, and QuickBird. Additional details are presented in Table 1 and Table 2 (see the tables in the appendix).

The acquired satellite data of the study areas of potential geological hazards can be categorized into four groups according to the following applications: (i) multispectral optical images for optical information analysis of geohazard areas (Martha et al., 2010; Kumar et al., 2021), (ii) high-resolution digital elevation models (DEMs) for surface morphology analysis (Lan et al., 2010; Schulz, 2007; Prokop and Panholzer, 2009; Derron and Jaboyedoff, 2010), (iii) synthetic aperture radar (SAR) images and interferometric synthetic aperture radar (InSAR) images for detection of situations in hazardous areas without limitation of time and weather (Martha et al., 2010; Parker et al., 2011; Mondini et al., 2011), and (iv) global positioning system (GPS) data for measuring surface deformations in the affected area (Hu et al., 2018; Schaefer et al., 2019; Guzzetti et al., 2009; Hooper et al., 2007; Cascini et al., 2010).

First, multispectral images obtained by high-resolution or very-high-resolution optical satellite sensors are preferred for the analysis and classification of geological hazards (Guzzetti et al., 2012). One of the reasons is that changes in land cover that are triggered by geological hazards can induce variations in the spectral signature at the ground
surface. For example, the bands in the cloud-free multispectral images from RapidEye, including blue (440-510 nm), green (520-590 nm), red (630-685 nm), and near-infrared (760-850 nm), can provide optical information for the application of deep learning models in landslide detection (Ghorbanzadeh et al., 2019a).

Furthermore, in vegetated mountainous areas, the spectral indices (e.g., normalized vegetation and soil indices) from multispectral images are available to enhance the ability of deep learning models to distinguish vegetation areas from soil areas (Shi et al., 2020). Typically, the normalized difference vegetation index (NDVI) can be calculated from the bands of optical images (Piralilou et al., 2019).

Second, DEM data, especially high-resolution DEM data, are commonly utilized to extract topographic information (Pike, 1988). From the perspective of deep learning, DEM data can provide abundant surrounding spatial information in tasks related to geological hazards such as landslides. The occurrence of these hazards is frequently tightly related to the surroundings. Consequently, DEM data are effectively employed in deep learning models for either landslide identification or landslide susceptibility assessments (Jaboyedoff et al., 2012). Typically, DEM data can be obtained from radar or optical stereophotogrammetry, including SPOT, Pleiades, Worldview, airborne laser profilers, and light detection and ranging (LiDAR) sensors (Ardizzone et al., 2007; Van Den Eeckhaut et al., 2007).

Third, in adverse meteorological situations, the application of SAR images to train deep learning models is an alternative for the detection of geohazards (Elliott, 2020; Dai et al., 2020; Biggs and Wright, 2020). SAR images derived from Sentinel-1 have served as the major training dataset for most deep learning models applied to avalanche detection, where training images are generated according to various SAR features. These features tend to be derived from the difference between the reference images and the active images, involving horizontal and vertical polarization (Bianchi et al., 2021). Furthermore, SAR data are also available for the detection of landslides and debris flows. An example is the application of pre-disaster and post-disaster SAR images to train deep learning models that aim to analyze sediment hazard areas after heavy rainfalls (An et al., 2020). These SAR images can also be applied in landslide susceptibility assessments as an input data source, thus providing landslide inventory data (Ghorbanzadeh et al., 2020).

On the other hand, interferometric SAR (InSAR) images, which are also known as interferograms, are derived from the phase difference between differential interferometric SAR images (Milillo et al., 2014; Kuraoka et al., 2018; Carlà et al., 2016). Commonly, InSAR data give deep learning models insight into the deformation process of the ground surface (Chaussard et al., 2013, 2015, 2017; Elliott et al., 2016; Dumont et al., 2018; Garthwaite et al., 2019). A typical application is to recognize volcanic deformation in InSAR data by using deep learning models.

Finally, GPS is extensively applied for surface deformation monitoring due to its capability to locate topographical surface characteristics reliably and simply (Rawat et al., 2011). GNSS measures ground motion at a very high frequency and spatial accuracy in real time (Li et al., 2015; Mufundirwa et al., 2010). The monitoring data provided by GPS and GNSS enable deep learning models to be trained for predicting the future state of surface deformations in monitored areas; a typical application scenario is landslide displacement prediction (Ma et al., 2021b).

2.5. In-situ monitoring systems

To continuously collect various monitoring data, monitoring systems are typically deployed in key locations throughout hazard-prone areas. A monitoring system consists of multiple in-situ sensors.

In consideration of the limitations of remote sensing technology, for fast-moving hazards (e.g., landslides or debris flows), real-time or near-real-time monitoring systems provide a complement with more detailed information. Common monitoring targets include surface movement (Dai et al., 2020), groundwater (Comiti et al., 2014; Berti and Simoni, 2010), and rainfall (Nikolopoulos et al., 2014; Underwood et al., 2016; Guzzetti et al., 2007); the corresponding monitoring devices include inclinometers, extensometers, soil moisture probes, pore water pressure sensors, and rain gauges.

Seismic networks are designed to record ground motions that are generated by seismic waves that propagate from natural and anthropogenic sources. Depending on the scope of the investigation, seismic networks are categorized as global, regional, and local systems (Cremen and Galasso, 2020; Allen and Melgar, 2019). A global network consists of a set of ‘primary’ stations that cover most countries. Regional networks are typically located in the expected epicenter area or high seismicity area of a region. The limited number of stations in a local network are mainly located near target sites/infrastructure, as in the case of temporary deployments for capturing aftershock sequences of large earthquakes or networks for monitoring volcanic activity.

Typically, a seismic station is equipped with a receiver (e.g., seismometer) and a recorder (e.g., datalogger). Three-component broadband seismometers can monitor ground movement in three directions (e.g., vertically and horizontally east-west and north-south) and cover a wide band of frequencies. The seismic data that are collected with three channels are continually converted into seismograms and stored. Furthermore, digital seismic networks that contain digital real-time acquisition systems are in operation. The objective is to continuously monitor the seismicity and rapidly identify locations of earthquakes within minutes.

2.6. Seismic reflection

A seismic survey involves imaging of a subsurface by reflections and refractions of (artificial) seismic signals. The returned waves that are generated by suitable sources at suitable locations are measured using receivers (e.g., terrestrial seismometers or underwater hydrophones) at other locations and stored using recorders. Currently, seismic surveys are conducted in environments that range from land to ocean. The most commonly used survey method is multichannel acquisition, where a multicomponent receiver records motion components in multiple directions and produces highly detailed images (seismograms) from shallow (tens of meters) to deep (several kilometers or even tens of kilometers) intervals.

In each experiment (shot), the reflected energy that is generated by the shot is detected and recorded by detectors as a seismic trace. A seismogram is an ordered collection of seismic traces, which is termed a gather. A single-channel seismogram is recorded by a single detector or a set of detectors. A multichannel seismogram is recorded using multiple detectors or groups of detectors at various locations. Additionally, reflection seismic data can be graphically presented as a 2D or 3D grid of pixels, where each pixel contains the value of the model parameter (e.g., P-wave velocity, S-wave velocity, porosity, density, or anisotropic parameters).

3. Deep learning for geological hazard analysis: models

In this section, we present an overview of deep learning and introduce in detail six typical deep learning models that are increasingly being applied to geological hazards analysis. These models include (1) convolutional neural networks (CNNs) (Nikolopoulos et al., 2014); (2) recurrent neural networks (RNNs) (Berti and Simoni, 2010); (3) deep generative models (Turhan and Bilge, 2018), such as deep belief networks (DBNs) (Hinton et al., 2006a), autoencoders (AEs), and generative adversarial networks (GANs) (Goodfellow et al., 2014); and (4) graph neural networks (GNNs) (Scarselli et al., 2009).
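As a bridge between the data sources of Section 2 and the models of Section 3, the NDVI mentioned in Section 2.4 can be made concrete. The text states only that NDVI is calculated from the bands of optical images; the minimal sketch below assumes the standard definition, NDVI = (NIR − Red) / (NIR + Red). The function name `ndvi`, the zero-denominator handling, and the reflectance values are illustrative assumptions and are not taken from the reviewed studies.

```python
def ndvi(nir, red, eps=1e-12):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2D bands."""
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("bands must have the same shape")
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            denom = n + r
            # Assumption: pixels where both bands are zero (no data) map to 0.0.
            row.append((n - r) / denom if abs(denom) > eps else 0.0)
        out.append(row)
    return out


# Hypothetical 2x2 reflectance patches (values made up for illustration).
nir_band = [[0.50, 0.40], [0.30, 0.00]]
red_band = [[0.10, 0.20], [0.30, 0.00]]
index = ndvi(nir_band, red_band)
```

An index map of this kind could be stacked with the spectral bands as an additional input channel, which is one plausible way to supply a deep learning model with the vegetation-versus-soil contrast described above.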


3.1. Brief introduction to deep learning

Deep learning originates from artificial neural networks (ANNs) (Ma et al., 2021a). Typically, an ANN consists of a collection of connected neurons, as illustrated in Fig. 1(a).

The propagation process of neural networks is achieved through the operation of a non-linear activation function and parameters {W, b}, where W is a weight matrix and b is a bias. At each layer, using an operation W x + b, neurons obtain a weighted sum of the output of the preceding layer as input, and then pass the result through an activation function σ(⋅), e.g., Sigmoid, Tanh, and Rectified Linear Unit (ReLU) (see Fig. 1(b)).

The series of layers between the input and output is termed the hidden layers (Litjens et al., 2017). Multiple hidden layers can be stacked to create a ‘deep’ neural network. For a fully connected deep neural network, every neuron in a layer l − 1 is connected with all the neurons in the subsequent layer l. The process of data propagation is expressed in Eq. (1). Multiple fully connected layers form a multilayer perceptron (MLP), which is the simplest and most basic deep learning model.

$$x^{(l)} = \sigma\left(W^{(l)} \cdot x^{(l-1)} + b^{(l)}\right) \tag{1}$$

The training process of the deep learning model follows an empirical risk minimization paradigm, where the parameters of the model are optimized by utilizing forward and backward propagation.

More specifically, in the forward propagation process, the discrepancy between the outputs y′ of the last layer and the ground truth y is calculated as the error to be minimized, according to the defined loss function. The loss function varies according to the deep learning task (e.g., classification or regression). Typically, mean square error and mean absolute error functions are suitable for regression tasks, while cross-entropy and Kullback-Leibler divergence (KLD) loss functions are suitable for most classification tasks (Tsoumakas and Katakis, 2007). In the backpropagation process, a suitable optimization algorithm is employed to minimize the loss. Common optimization algorithms include gradient descent and its variants (e.g., RMSprop and Adam).

According to training objectives and paradigms, deep learning models are typically divided into two major categories: supervised and unsupervised learning. As illustrated in Fig. 2, supervised learning aims at training a model that accepts features as input and outputs a prediction for a target variable. Unsupervised learning aims at describing unlabeled input data by learning the valuable properties of the training set.

Furthermore, transfer learning has emerged as an improvement on the deep learning training paradigm, which extends the application of a deep learning model beyond a specific task and environment (Panigrahi et al., 2021). The advantage of transfer learning is that it can overcome the dependence on large amounts of data or labeled datasets by leveraging knowledge from models pre-trained in one environment to solve new problems in a new environment (Goodfellow et al., 2016; Zhang et al., 2019a; Wang et al., 2021a). When performing a new task, it is only necessary to fine-tune the later layers of the original model.

Owing to this property, transfer learning is particularly suitable for handling limited datasets, which is a problem that is shared by most geohazard-specific tasks (Marmanis et al., 2016). Transfer learning has been applied to several geohazard research scenarios in which the data that are collected in the study area are scarce or inadequate. For example, transfer learning has been employed in the detection of avalanches and volcanic activity, which contributes to the accurate performance of deep learning models in situations where SAR data are scarce (Bueno et al., 2020; Sinha et al., 2019a; Titos et al., 2020).

3.2. Convolutional neural networks (CNNs)

The tremendous flexibility of CNNs has rendered them among the most successful and prevalent deep learning models (Krizhevsky et al., 2012; Gu et al., 2018). Although initially designed for computer vision tasks, CNNs have been extended to various domains, including signal processing and natural language processing, and exhibit impressive performance on diverse types of data (Krizhevsky et al., 2012; He et al., 2016a; Girshick, 2015; Ren et al., 2017).

Typically, a basic CNN model consists of three types of layers: convolutional, pooling, and fully connected layers. A convolutional layer is responsible for extracting features, and thus enables a CNN model to integrate features more effectively with spatial contextual information from the input data (see Fig. 3). A convolutional layer commonly involves two main operations: convolution and activation. The details are as follows.

In a convolution operation, a filter $F_k \in \mathbb{R}^{w \times w \times d_3}$ $(1 \le k \le K^{(l)})$, which is also termed a kernel, is applied to given input data $x \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ for extracting local features, where $w$ is the fixed size of the filter $F_k$ and $K^{(l)}$ is the number of kernels.

As illustrated in Fig. 4, convolution is a simple linear operation, where each filter is overlaid on a certain location $(i, j)$ of the input data $x$ and a dot product is calculated between the elements in the local region (i.e., the receptive field) $[x]_{ij} \in \mathbb{R}^{w \times w \times d_3}$ and the filter $F_k$, thus outputting a feature map $O_{ijk} \in \mathbb{R}^{(d_1 - w + 1) \times (d_2 - w + 1) \times K^{(l)}}$ (see Eq. (2)). A set of filters is swept across the input data according to a certain stride, and the above process is repeated in each position, where a stride indicates the step size by which a filter is moved (Ma et al., 2019). The gap $i' - i$ and $j' - j$ between two receptive fields $x_{ij}$ and $x_{i'j'}$ is controlled by a

Fig. 1. The forward propagation of a neural network: (a) the operating process of neural networks and (b) a fully connected architecture of neural networks.
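The forward propagation in Fig. 1 and Eq. (1) can be sketched in a few lines of plain Python. This is a minimal illustration, not an implementation from the reviewed studies: the helper names `dense` and `forward`, the 2-3-1 layer sizes, and the hand-picked weights are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(W, b, x, activation):
    """One fully connected layer: activation(W x + b), with W given as a list of rows."""
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


def forward(layers, x):
    """Apply Eq. (1) layer by layer; `layers` is a list of (W, b, activation) tuples."""
    for W, b, activation in layers:
        x = dense(W, b, x, activation)
    return x


# Hypothetical 2-3-1 network with hand-picked weights (illustrative only).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, -1.0, 0.5], relu),
    ([[1.0, 1.0, -1.0]], [0.0], sigmoid),
]
y = forward(layers, [2.0, 1.0])
```

For the input (2, 1), the hidden layer produces (1, 0.5, 0.5) after ReLU, and the output neuron returns sigmoid(1), roughly 0.731. Stacking more `(W, b, activation)` tuples in `layers` corresponds to adding hidden layers.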


Fig. 2. The supervised learning process for a deep learning model. (a) The data are partitioned into three parts. (b) The training process is monitored by periodically
evaluating losses on the validation set. To maximize the usage of the GPU memory, only a small subset of the training set is utilized in each optimization step. (c)
Finally, the model is evaluated on the test set.
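The three steps in Fig. 2 can be illustrated with a deliberately small example. The sketch below assumes a one-parameter-pair linear model trained with mini-batch gradient descent on a mean square error loss; the 70/15/15 split, the learning rate, the batch size, and the synthetic data are hypothetical choices for illustration, and a real deep learning model would replace the hand-written gradients with backpropagation.

```python
import random


def split(data, train=0.7, val=0.15, seed=0):
    """Shuffle and partition samples into training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def mse(w, b, samples):
    """Mean square error of the linear model y' = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train_linear(train_set, val_set, epochs=200, batch_size=8, lr=0.05, seed=0):
    """Mini-batch gradient descent; the validation loss is recorded after each epoch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        pool = train_set[:]
        rng.shuffle(pool)
        for start in range(0, len(pool), batch_size):
            batch = pool[start:start + batch_size]  # small subset per optimization step
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        history.append(mse(w, b, val_set))
    return w, b, history


# Synthetic, noise-free samples of y = 3x - 1 (made up for illustration).
data = [(i / 50.0, 3.0 * (i / 50.0) - 1.0) for i in range(100)]
train_set, val_set, test_set = split(data)
w, b, val_history = train_linear(train_set, val_set)
test_loss = mse(w, b, test_set)
```

The validation history is what would be monitored to decide when to stop training or to tune hyperparameters, while the test set is touched only once at the end.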

Fig. 3. Basic architecture of a conventional CNN.
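Of the three layer types in this basic architecture, the pooling layer is the simplest to make concrete. The sketch below assumes a single-channel feature map stored as nested lists and a non-padded window; the function name `pool2d` and its parameters are hypothetical, chosen only for this example.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with non-padded max or average pooling."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        out_row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the values inside the current size x size window.
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            out_row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(out_row)
    return pooled


# Hypothetical 4x4 feature map (values made up for illustration).
fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 5.0, 6.0],
    [1.0, 2.0, 7.0, 8.0],
]
max_pooled = pool2d(fmap, mode="max")   # [[4.0, 2.0], [2.0, 8.0]]
avg_pooled = pool2d(fmap, mode="avg")   # [[2.5, 1.0], [0.75, 6.5]]
```

With a 2 × 2 window and a stride of 2, each spatial dimension is halved, which is how pooling reduces the amount of data passed to later layers.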

Fig. 4. Visualization of an example for operation on the convolutional layer, where the stride refers to the amount of movement between applied filters on the input
matrix. (a) 2D point of view. (b) 3D point of view.
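The sliding dot product shown in Fig. 4 can be written out directly. The sketch below is a minimal, unoptimized version for one filter, assuming no padding and an input stored as nested lists of shape d1 × d2 × d3; the function name `conv2d` and the toy input are hypothetical. Indices are 0-based in the code, whereas the paper's notation is 1-based.

```python
def conv2d(x, kernel, stride=1):
    """Slide one w x w x d3 filter over a d1 x d2 x d3 input without padding.

    Each output element is the dot product between the filter and the
    receptive field at that position (the operation CNN libraries call
    convolution, strictly a cross-correlation).
    """
    d1, d2, d3 = len(x), len(x[0]), len(x[0][0])
    w = len(kernel)
    out = []
    for i in range(0, d1 - w + 1, stride):
        row = []
        for j in range(0, d2 - w + 1, stride):
            acc = 0.0
            for a in range(w):
                for b in range(w):
                    for c in range(d3):
                        acc += x[i + a][j + b][c] * kernel[a][b][c]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4 x 4 x 1 input holding the values 0..15, and a 2 x 2 x 1 filter of ones.
x = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
ones = [[[1.0], [1.0]], [[1.0], [1.0]]]

stride1 = conv2d(x, ones, stride=1)  # 3 x 3 feature map
stride2 = conv2d(x, ones, stride=2)  # 2 x 2 feature map
```

With a stride of 1 the output has (d1 − w + 1) × (d2 − w + 1) elements; a larger stride s skips positions and yields ⌊(d1 − w)/s⌋ + 1 rows and ⌊(d2 − w)/s⌋ + 1 columns. A convolutional layer would apply several such filters and stack the resulting feature maps.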

stride. This set of filters results in a feature map $O_k \in \mathbb{R}^{(d_1 - w + 1) \times (d_2 - w + 1)}$, which can detect the same feature in different locations of the input data. Finally, an activation function performs the nonlinear transformation to extract features $\tilde{x}$ from the input data $x$ (see Eq. (3)).

$$O_{ijk} = \left\langle [x]_{ij}, F_k \right\rangle = \sum_{i'=1}^{w} \sum_{j'=1}^{w} \sum_{l=1}^{d_3} [x]_{i+i'-1,\, j+j'-1,\, l}\, [F_k]_{i',j',l} \tag{2}$$


$$\tilde{x}_{ijk} = \sigma\left(O_{ijk}\right), \quad \forall i \in [d_1 - w + 1],\ \forall j \in [d_2 - w + 1],\ \forall k \in [K^{(l)}] \tag{3}$$

To reduce the number of learnable parameters and increase robustness to distortions such as slight shifts and rotations of the input data, pooling layers are also typically employed to downsample the extracted features from the convolutional layer. Two main types of pooling layers are max pooling and average pooling.

Furthermore, a number of novel operations can further enhance the performance of basic CNNs, e.g., the attention mechanism (Vaswani et al., 2017; Luong et al., 2015). Similar to other concepts in deep learning, the attention mechanism is an attempt to mimic human brain actions: it selectively concentrates on specified features while ignoring others. The attention mechanism helps CNNs to improve representations and overcome computational limitations, thereby enabling CNN-based models to recognize objects from cluttered backgrounds and complex scenes.

Considering the advantages of the attention mechanism in computer vision applications, it has been incorporated into a number of deep learning models for landslide detection, thereby enhancing the capability of these models to capture important information in remotely sensed images. For example, attention mechanisms are employed to enable CNN-based models to distinguish the specific features of diverse landslides from complicated environments (Ji et al., 2020).

Another interesting operation is the skip connection. By adding or directly concatenating, a skip connection allows the output of one layer of a deep neural network to be applied as input to the next layer and to all other layers, thus avoiding information loss in deep learning models. Skip connections are also known as residual connections (He et al., 2016b). A residual block is a stack of layers, where the output of one layer can be connected to a deeper block of another layer, which effectively simplifies the training of deep learning models.

Overall, the performance of CNN-based models depends on the different architectures involving various operations. The first successful CNN model was LeNet (LeCun et al., 1998), which is the simplest and most basic CNN model. AlexNet was the first (Krizhevsky et al., 2012); VGGNet focuses on the effect of the convolutional neural network depth on its accuracy, thus stacking more convolutional layers and reducing parameters by employing smaller convolutional kernels (Simonyan and Zisserman, 2015); GoogLeNet focuses on obtaining better representations and concatenates the feature maps of filters of different sizes. Furthermore, it replaces the fully connected layer with a global average pooling operation to reduce the number of parameters (Szegedy et al., 2015); ResNet, which includes a series of residual blocks, can solve the vanishing gradient problem, allowing the training

will perform the class prediction of the pixels that correspond to that position, which facilitates the identification of information relating to hazards including landslides, mudslides, and avalanches from satellite images. In addition, in the analysis of earthquake and volcano activities, CNN-based models can also be used to identify seismic waveform features from seismograms.

Common CNN-based models for semantic segmentation include fully convolutional networks (FCNs) and their variants (Long et al., 2015). FCNs substitute fully connected layers in the original CNNs with convolutional layers, and thus involve only convolutional (subsampling or upsampling) operations. The advantages of FCNs over the classic CNNs are that (1) they avoid the loss of spatial information; (2) they significantly reduce the computational parameters involved; and (3) they increase the representational capability. Therefore, FCNs are more suitable for segmentation tasks. Their major limitation is that they are prone to ignore the relationships between pixels and are thus insensitive to the details of an image (Pan et al., 2020). This limitation may prevent FCNs from effectively capturing global context information (Minaee et al., 2021).

To overcome the limitation, various variants of FCN have emerged, for example, SegNet (Badrinarayanan et al., 2017) and U-Net (Ronneberger et al., 2015), which improved on the original FCN by employing encoder-decoder architectures (see Fig. 5). In these architectures, the encoder incrementally reduces the spatial dimension, while the decoder incrementally recovers the detail and spatial dimension of the object. The decoder receives information from the encoder part of the same level of the feature mapping by exploiting the skip connection, thus enabling finer localization (Falk et al., 2019).

As the most common variant of FCNs employed in geological hazard analysis, U-Net, initially designed for medical images, has been demonstrated to be particularly effective in satellite image segmentation. U-Net has been introduced in geological hazard analysis, where it can realize accurate segmentation of satellite images for detecting regional objects (e.g., landslides, avalanches, and volcanoes) (Bianchi et al., 2021). Since U-Net maintains the structural integrity of images, the architecture is also considered for the accurate reconstruction of damaged seismic data. In addition, other popular CNN-based models commonly applied to semantic segmentation include Mask R-CNN (He et al., 2017) and DeepLab (Chen et al., 2018).

Furthermore, in semantic segmentation tasks, the attention mechanism contributes to a more accurate segmentation by automatically identifying more relevant information. For example, CNN-based models incorporating global attention can further capture significant global spatial information from SAR images, e.g., pixel-to-pixel positional relationships (An et al., 2020). The operation of skip connections is also suitable for improving deep learning models that are applied in semantic segmentation tasks related to geological hazard analysis. For
example, residual blocks were integrated into a deep U-Net model
of an extremely deep network up to a thousand layers (He et al., 2016a);
developed for seismic data reconstruction, facilitating the model to
DenseNet, which is designed to adopt information from previous layers,
efficiently combine features from different levels of data, and improving
concatenates feature maps from different layers (Huang et al., 2017).
training efficiency (Tang et al., 2020).
With their target detection ability, CNNs can be quite advantageous
in various applications for geological hazard identification. For
example, when applied to landslide detection, CNN-based models can 3.3. Recurrent neural networks (RNN)
capture landslide-related optical information from training datasets.
These training datasets are image patches of a certain size obtained by RNNs are another type of commonly utilized deep learning model
sampling in various methods from the original remote sensing images that is designed to handle sequence data in which the current input data
(Sharma et al., 2017). Typically, the optimal patches obtained by a are constantly dependent on the previously implemented input. The
suitable sampling method and size setting are able to capture the objective of RNNs is to capture the dependency between the current time
spatially localized correlation of the central pixel with the surrounding step and the previous time step.
pixels, thus enabling more accurate landslide detection (Soares et al., In RNNs, the recurrent layers (hidden layers) consist of recurrent
2020; Ghorbanzadeh et al., 2019b; Ghorbanzadeh and Blaschke, 2019). cells, where feedback connections between recurrent cells allow for a
Semantic segmentation, a common task for CNN-based models in circular flow of information, which enables it to update its current state
computer vision, is extensively applied in geohazard analysis (Ronne­ based on past data and current input data (see Fig. 6). A basic RNN is
berger et al., 2015; Chen et al., 2018). For example, in semantic seg­ constructed by defining the transition function and the output function
mentation, given a position in the spatial dimension, CNN-based models
(see Eq. (4)). Given the input xt ∈ ℝn×d at time step t, the output ot ∈

6
Z. Ma and G. Mei Earth-Science Reviews 223 (2021) 103858

Fig. 5. Network architecture of (a) SegNet and (b) U-Net.

$\mathbb{R}^{n \times o}$ is obtained from a hidden state $h_t \in \mathbb{R}^{n \times h}$, where $n$ denotes the number of samples, $d$ denotes the number of inputs for each sample, and $h$ denotes the number of hidden units (see Eq. (5)) (Schmidt, 2019).

$$h_t = \begin{cases} 0 & \text{if } t = 0 \\ \sigma(x_t \cdot W_{xh} + h_{t-1} \cdot W_{hh} + b_h) & \text{if } t \neq 0 \end{cases} \tag{4}$$

$$o_t = h_t \cdot W_{ho} + b_o \tag{5}$$

where $\sigma(\cdot)$ denotes a non-linear activation function. The subscripts $t$, $x$, $h$, and $o$ indicate the time step, input layer, hidden layer, and output layer, respectively. Correspondingly, $b_h \in \mathbb{R}^{1 \times h}$ denotes the bias of the hidden layer; $b_o \in \mathbb{R}^{1 \times o}$ denotes the bias of the output layer; $W_{xh} \in \mathbb{R}^{d \times h}$ denotes the weight matrix between the input layer and the hidden layer; $W_{hh} \in \mathbb{R}^{h \times h}$ denotes the hidden-state-to-hidden-state matrix, which indicates how the hidden variables from the previous time step $t - 1$ are employed at the current time step $t$; and $W_{ho} \in \mathbb{R}^{h \times o}$ denotes the weight matrix between the hidden layer and the output layer.

Fig. 6. Standard RNN architecture and an unfolded structure.

There are two main improved variants of RNNs: long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) and gated recurrent units (GRUs) (Chung et al., 2014). Both variants are designed to address the vanishing gradient problem and to capture long-term dependencies in sequence data. By using gating mechanisms, both LSTMs and GRUs extract more important information from higher dimensional data. An LSTM introduces a memory cell $c$ with the same shape as the hidden state, and three gates control the input, output, and memory states in the memory cell: an input gate $i$, a forget gate $f$, and an output gate $o$ (see Eqs. (6)–(8)).

$$\begin{aligned} i_t &= \sigma(x_t \cdot W_{xi} + h_{t-1} \cdot W_{hi} + b_i) \\ f_t &= \sigma(x_t \cdot W_{xf} + h_{t-1} \cdot W_{hf} + b_f) \\ o_t &= \sigma(x_t \cdot W_{xo} + h_{t-1} \cdot W_{ho} + b_o) \end{aligned} \tag{6}$$

$$c_t = \begin{cases} 0 & \text{if } t = 0 \\ f_t \odot c_{t-1} + i_t \odot \tanh(x_t \cdot W_{xc} + h_{t-1} \cdot W_{hc} + b_c) & \text{if } t \neq 0 \end{cases} \tag{7}$$

$$h_t = \begin{cases} 0 & \text{if } t = 0 \\ o_t \odot \tanh(c_t) & \text{if } t \neq 0 \end{cases} \tag{8}$$

where all $W$ and $b$ denote the weight matrices and biases for the different gates or the cell, and $\odot$ indicates the element-wise product.

GRUs are simplified LSTMs in which the input gates are substituted by update gates $z$ and the forget gates are substituted by reset gates $r$. The output gate has been removed, which enables the number of parameters to be reduced (see Eq. (9) and Eq. (10)).

$$\begin{aligned} r_t &= \sigma(x_t \cdot W_{xr} + h_{t-1} \cdot W_{hr} + b_r) \\ z_t &= \sigma(x_t \cdot W_{xz} + h_{t-1} \cdot W_{hz} + b_z) \\ h'_t &= \tanh(x_t \cdot W_{xh} + (r_t \odot h_{t-1}) \cdot W_{hh} + b_h) \end{aligned} \tag{9}$$

$$h_t = \begin{cases} 0 & \text{if } t = 0 \\ z_t \odot h_{t-1} + (1 - z_t) \odot h'_t & \text{if } t \neq 0 \end{cases} \tag{10}$$

The structures of RNN, LSTM, and GRU are presented in Fig. 7.

Fig. 7. Structure of (a) RNN cell, (b) LSTM cell and (c) GRU cell.

Owing to their excellent performance in capturing temporal dependencies, RNN-based models can be employed to predict the evolution of geological hazards over time, and thus identify underlying risks. For example, LSTM can predict the displacement of landslides. On the other hand, as demonstrated in the field of speech recognition, RNN-based models are also adept at signal processing, which provides great potential for applications with seismic data.

Furthermore, improvements in the performance of RNN-based models can be achieved by novel operations emerging in deep learning, including attention mechanisms and skip connections. For

example, skip connections can be applied when processing seismic traces with RNN-based models to improve the extraction of information (Yoon et al., 2020). When using RNN-based models to handle seismic signals, attention mechanisms help the model focus on more relevant and important information. For example, leveraging the local attention mechanism allows the model to capture and exploit local features in the seismic waveform (Mousavi et al., 2020).

3.4. Deep generative models

3.4.1. Deep belief networks (DBNs)

Deep belief networks (DBNs) are a type of deep learning model built by stacking simple neural networks called restricted Boltzmann machines (RBMs) (Hinton et al., 2006b). An RBM is a two-layer network, i.e., a hidden layer h and a visible layer v, where the visible layer handles the input data, and the hidden layer performs feature extraction by employing dimensionality reduction and various data encoding mechanisms (Srivastava and Salakhutdinov, 2014). In a DBN, each pair of adjacent layers can be viewed as an RBM (see Fig. 8) (Wang et al., 2021b).

Typically, the training process of DBNs is divided into two phases: a pre-training phase and a discriminative fine-tuning phase. First, a DBN is pre-trained layer by layer, i.e., the RBMs it contains are trained sequentially in an unsupervised manner. This learning strategy enables DBNs to automatically extract increasingly abstract representations from massive amounts of unlabeled data. Then, in the discriminative fine-tuning stage, DBNs can perform classification or other tasks on small, labeled datasets by fine-tuning alone.

During the training process, different from most deep learning models, DBNs substitute an energy function for the loss function that is employed for minimization. Thus, the purpose of training is to identify the parameters that minimize the energy. The energy function is defined by Eq. (11), where the input data is multiplied by a specified weight $W$, a bias is added to the input value ($a_i$ for the visible units and $b_j$ for the hidden units), $i$ denotes the $i$th visible unit, $j$ denotes the $j$th hidden unit, and $v_i$ and $h_j$ represent the states of the corresponding visible and hidden units, respectively.

$$E(v, h) = -\sum_{i,j} v_i h_j W_{ij} - \sum_i v_i a_i - \sum_j h_j b_j \tag{11}$$

This distinctive training approach enables DBNs to exhibit the following advantages: (1) DBNs can avoid the vanishing gradient problem that can occur when training other deep learning models. (2) Pre-training of DBNs also helps avoid over- and under-fitting problems, while enhancing the generalization of the models. This solution is potentially appropriate for geological hazard-related tasks with a limited sample size. (3) The training is rapid, and the computational cost is low. Most importantly, DBNs have excellent feature extraction capabilities. Therefore, an RBM can extract more important features from geohazard-related data (e.g., landslide inventories), resulting in more efficient and accurate performance in specific applications (Ye et al., 2019).

Currently, DBNs are rarely employed in geohazard scenarios. On the one hand, classic DBNs are mostly designed to deal with one-dimensional inputs, whereas geohazard-related tasks involve multiple inputs (Li et al., 2019). The application of DBNs to geohazard-related tasks can be challenging due to the complexity of obtaining valuable information from various inputs. On the other hand, when dealing with image and time-series data related to geohazards, CNN-based models and RNN-based models, which are specifically designed to extract the relevant features of these data, tend to be more widely utilized. Therefore, it is worth further exploring how effective and applicable DBNs are for geohazard-related tasks.

3.4.2. Autoencoders (AEs)

AEs are common generative models with a specific encoding and decoding architecture, which is capable of revealing latent features of data without any labeling information and performs very well in dimensionality reduction and denoising of data (Hinton and Zemel, 1994). For this reason, this type of deep learning model has been introduced into geological hazard analysis, for example to achieve efficient and highly accurate landslide susceptibility assessments from landslide inventories without labeled information (Huang et al., 2020).
Typically, an AE involves two components, the encoder and the decoder, connected by a bottleneck-shaped hidden layer, as illustrated in Fig. 9. The encoder compresses the data by mapping a given input to the hidden layer, while the decoder reconstructs an approximation of the input. Specifically, the encoder is designed to map the input data $x \in \mathbb{R}^d$ to the latent representation $h \in \mathbb{R}^p$; the decoder is designed to map the latent representation $h$ to a reconstructed version $x' \in \mathbb{R}^d$. The process is expressed in Eq. (12), where $\sigma(\cdot)$ refers to the activation function.
Fig. 8. Structure of (a) RBM and (b) DBN.
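The encoder and decoder mappings of an AE described above can be illustrated with a minimal sketch. It assumes a logistic sigmoid for both activations, randomly initialised (untrained) weights, and a squared reconstruction error as the loss; the sizes d = 6 and p = 2 and all function names are illustrative choices, not taken from the reviewed works.

```python
import math
import random


def sigmoid(z):
    """Logistic activation; an assumed, illustrative choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Return W x + b for a matrix W (list of rows) and a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def encode(x, W, b):
    """Encoder: h = sigma(W x + b), mapping R^d to R^p."""
    return [sigmoid(z) for z in affine(W, x, b)]


def decode(h, W_dec, b_dec):
    """Decoder: x' = sigma'(W' h + b'), mapping R^p back to R^d."""
    return [sigmoid(z) for z in affine(W_dec, h, b_dec)]


def reconstruction_loss(x, x_rec):
    """Squared error between the input and its reconstruction."""
    return sum((x_i - r_i) ** 2 for x_i, r_i in zip(x, x_rec))


def random_matrix(rows, cols, rng):
    """Small random weights; the initialisation scheme is an assumption."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, p = 6, 2                                   # input size d, bottleneck size p < d
W, b = random_matrix(p, d, rng), [0.0] * p    # encoder parameters
W_dec, b_dec = random_matrix(d, p, rng), [0.0] * d  # decoder parameters

x = [rng.random() for _ in range(d)]          # one toy input vector
h = encode(x, W, b)                           # latent representation
x_rec = decode(h, W_dec, b_dec)               # reconstruction of the input
loss = reconstruction_loss(x, x_rec)
```

Training would adjust the encoder and decoder parameters to reduce `loss`; the sketch only evaluates one forward pass, which is enough to show how the bottleneck forces the input through a lower-dimensional representation.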


$$\begin{aligned} h &= \sigma(Wx + b) \\ x' &= \sigma'(W'h + b') \end{aligned} \tag{12}$$

Fig. 9. Structure of (a) AE and (b) VAE.

Following a training paradigm that minimizes the loss function $\mathcal{L}$, the training objective of AEs is learning to produce reconstructed data that are as similar to the original input as possible. As expressed in Eq. (13), an AE tunes its parameters ($W$ and $b$) in training by minimizing a loss function $\mathcal{L}$ (e.g., squared errors).

$$\mathcal{L}(x, x') = \lVert x - x' \rVert^2 = \lVert x - \sigma'(W'\,\sigma(Wx + b) + b') \rVert^2 \tag{13}$$

The versatility of AEs depends largely on the design of the encoder and decoder networks. For example, encoders and decoders can be MLPs, RNNs, or CNNs, and this flexibility extends the applicability of AE-based models to different data and different tasks. For example, in seismic analysis, an AE-based model consisting of convolutional layers is employed to extract features from seismograms and reduce the number of dimensions, thus performing seismic detection or other tasks (Mousavi et al., 2019b).

Indeed, improvements to AE-based models go beyond modifying the encoder and decoder networks. Currently, there are three common variants of AEs, each with its own advantages. (1) The denoising autoencoder (DAE) is designed to recover the original undistorted input and to obtain a robust representation from corrupted input data (Vincent et al., 2010). (2) The sparse autoencoder (SAE) is designed to ensure that the number of active units in the layers is minimal and to learn better representations, and it remains usable when the number of hidden units exceeds the number of inputs (Makhzani and Frey, 2014). (3) The variational autoencoder (VAE) is designed to regularize the latent space to avoid overfitting and ensure that the latent space maintains smooth properties (Kingma and Welling, 2014). In the VAE model, the encoder outputs the mean and variance for each latent dimension, and new data are generated by sampling from the resulting distribution.

Furthermore, AE-based models have the significant advantage of handling situations where labels in the dataset are imbalanced (Yang et al., 2021). Imbalanced datasets are common in geological hazard-related tasks. For example, an AE-based model with excellent generalization can be obtained by training on remote sensing datasets that have very limited avalanche events (Sinha et al., 2019b).

3.4.3. Generative adversarial networks (GANs)

GANs are a type of generative model that employs adversarial training. Analogous to an AE, a GAN consists of two parts: a generator and a discriminator. The generator $G$ is designed to mimic the real data distribution $P_{data}$ (see Fig. 10). Typically, the generator $G$ generates samples (the fake data) $G(z)$ from a noise sample $z$ following a prior noise distribution $P_z$, while the discriminator $D$, commonly a binary classifier, is designed to distinguish between fake data and a real sample by applying certain functions (e.g., a sigmoid function).

Fig. 10. Structure of GAN.

The adversarial training process for GANs is predicated on the idea of a zero-sum game, where the generator $G$ and the discriminator $D$ have opposing objectives. On the one hand, the objective of the generator $G$ is to minimize the probability of the discriminator performing a correct identification. On the other hand, the objective of the discriminator $D$ is to maximize the probability of assigning the correct labels to the real data and fake data. The ultimate objective of the learning process is to reach a point where the discriminator can no longer distinguish real data from fake data, thus obtaining a sufficiently powerful generator $G$. The process can be formulated as a mini-max problem (see Eq. (14)), where argmin is employed for the arguments of the minimum in the objective function of the generator $G$, and argmax is employed for the arguments of the maximum in the objective function of the discriminator $D$.

$$G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) \tag{14}$$

where $\mathcal{L}_{GAN}(G, D)$ is the GAN objective function.

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{15}$$

Regarding the improvement of GANs, similar to AE-based models, the two components of GANs can also employ arbitrary deep learning models, thus achieving better performance in different tasks based on different data types by exploiting the characteristics of these models. It is notable that AE-based models can also serve as generators or discriminators of GANs.

Furthermore, providing GANs with auxiliary information $y$ (e.g., class labels) contributes to improving the performance of GANs. Building on this idea, one of the most typical variants of GANs, the conditional GAN (CGAN), was developed. Both the generator and discriminator of a CGAN are constrained by the auxiliary information, which not only reduces the stochasticity of the generated data, thus producing data that adhere to a certain pattern, but also accelerates training. The objective function of a CGAN is formulated in Eq. (16), which contains the labels $y$ and the target labels $y'$ associated with a real sample $x$.

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{x \sim P_{data}}[\log D(x \mid y)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z \mid y') \mid y'))] \tag{16}$$

The superior generative capability of GANs is usually exploited to augment training datasets, especially those containing large amounts of noise or missing values, which are inevitable for geological hazard-related tasks. Consequently, the current main application for GANs in geological hazard analysis is the reconstruction of seismic data. For example, missing seismic traces are interpolated by learning approximate data distributions from the observed seismic dataset (Alwon, 2019).

3.5. Graph neural networks (GNNs)

In the last few years, GNNs have become one of the state-of-the-art families of deep learning models (Hamilton et al., 2017; Veličković et al., 2018). Different from other deep learning models that are adept at capturing high-level features hidden in Euclidean data (e.g., images, text, video), GNNs are applicable to graph-structured data generated in non-Euclidean domains that represent complex relationships and interdependencies between objects. Most data can be represented as

special graph-structured data. For example, for sources of geological hazard-related data, a monitoring system can be considered a graph where each monitoring device is a node on the graph.

A GNN aggregates a node with its neighbors and finally obtains a representation of each node combined with all its neighbors, which is then used to perform further tasks, e.g., classification or regression of node information.

Recently, GNNs have been introduced for geological hazard-related applications. For example, by representing multiple seismic stations as graph-structured data, a GNN model can conduct seismic source characterization using data from multiple seismic stations simultaneously (van den Ende and Ampuero, 2020).

4. Deep learning for geological hazard analysis: applications

In this section, we focus on deep learning applications in geological hazard analysis over the last few years for six representative geological hazards: landslides, debris flows, rockfalls, avalanches, earthquakes, and volcanic activity. For each type of geological hazard, we briefly outline its characteristics and list the adopted data and deep learning models. These models aim at solving specified problems in the general workflow of geological hazard analysis (see Fig. 11).

Fig. 11. A general workflow of deep learning application for geological hazard analysis.

4.1. Landslide prevention

Landslides are the most common geological hazards; at least 17% of natural hazard fatalities across the world can be attributed to landslides (Pourghasemi et al., 2013). Landslides are almost ubiquitous in areas with slopes and are attributed to multiple interactions between conditioning factors (e.g., soil, rock, and other geological environmental characteristics) and triggering factors (e.g., weather or seismic activity) (Froude and Petley, 2018). Conditioning factors primarily influence the location of landslides, while triggering factors determine the occurrence of landslides. They work together to control the extent of the landslide impact area and the size distribution of the slope damage.

While the major landslide conditions and triggers and the simple physics behind how they govern landslide occurrence are well known,


predicting when and where landslides will occur remains challenging. Recent studies have demonstrated that deep learning effectively enables the identification of landslide-related information from big data; hence, its combination with earth observation data is a promising potential solution (Dai et al., 2020).

Thanks to the latest advances, substantial and high-quality measurement data are available from satellite radar and in-situ sensors. The implication is that deep learning can leverage these data to realize a variety of tasks in landslide research with higher accuracy and lower cost, such as (1) landslide detection, which is used mainly to acquire landslide information of interest (e.g., landslide characteristics and size) and facilitate the development of a more detailed landslide inventory; (2) landslide susceptibility assessment, which is mainly employed to predict which areas are more susceptible to future landslides; and (3) landslide displacement prediction, which is mainly employed to predict the short-term behavior of landslides. Additional details are presented in Table 3 (see the table in the appendix).

4.1.1. Landslide detection

The application of deep learning in landslide detection focuses primarily on image analysis. The reason is that landslides severely damage the surrounding environment, especially the vegetation, thereby exposing brighter parts of the ground (e.g., rocks and soil), which changes various properties of the image data in that area. In predisaster scenarios, deep learning enables the comprehensive detection of active slopes at a regional scale to identify potential landslides. In postdisaster scenarios, deep learning enables the rapid acquisition of landslide information for emergency response.

Fig. 12. Deep learning application for landslide detection. (a) Change detection using pre- and post-landslide image data. (b) Objective detection using spectral and spatial information. (c) Objective detection by applying only optical imagery data. (d) Transfer learning by using a pre-trained model. (e) Multi-source information is stacked as multichannel images for feature extraction by a CNN-based model. (f) Feature extraction from multi-source information by employing data fusion. Landslide images are taken from Ji et al. (2020) and NASA Earth Observatory (2014).

Furthermore, landslide image analysis includes object-based image analysis (Martha et al., 2012) and pixel-based image analysis (Danneels


et al., 2007). Typically, pixel-based methods recognize landslides on the basis of the discrepancies between landslide and nonlandslide pixel features (Lu et al., 2019), and usually detect changes by leveraging spectral information. Comparatively, object-based classification methods regard individual landslides as ‘objects’ and recognize landslides by simultaneously exploiting spectral and spatial information (Stumpf and Kerle, 2011). Established deep learning models of change detection for landslides mainly utilize pixel-based binary classification between landslides and their background, namely, semantic segmentation, where common inputs are raster images with several bands and labeled pixels.

4.1.1.1. Change detection using pre- and post-landslide image data. Landslide detection is often considered a type of land cover change detection due to the impact of landslide occurrence on the ground surface (Lv et al., 2018). In conventional methods for land cover change detection, binary thresholds are required to divide the images in terms of change magnitude, where the determination of the optimal binary threshold is laborious and subjective (Zanetti and Bruzzone, 2018; Im et al., 2007). As a data-driven approach, deep learning can avoid time-consuming preprocessing; it enables direct detection of ‘change’ and ‘no change’ regions without the binary threshold (see Fig. 12a).

Currently, deep learning-based methods have achieved satisfactory performance in land cover change detection (Khan et al., 2017; Wang et al., 2019b; Zhang et al., 2016). However, change detection on landslides remains a challenge, considering that pre- and post-landslide images are significantly different from conventional land cover tasks regarding spectra, moisture, and phenomena.

To solve this, Lv et al. (2020) developed a novel dual-path fully convolutional network (DP-FCN) that consists of two cooperative modules with diverse objectives. The first module is an extension of the U-Net architecture that utilizes four convolutional layers of encoders and decoders to simultaneously extract high- and low-level features from dual temporal images. These different extracted features are concatenated to preserve the details related to landslides in very high-resolution remote sensing images. Two extracted 3D feature cubes are then fed into the second module, which utilizes two fully convolutional layers and one dense convolutional layer for joint learning. Therefore, landslide areas are detected by directly extracting landslide features from a bitemporal image, which avoids preprocessing operations, including the calculation of binary thresholds and the generation of the change magnitude image in traditional methods. Similar to other deep learning models, DP-FCN requires a large amount of labeled training data.

Another challenge of change detection for landslides is that landslide areas suffer from serious spatial uncertainty, which increases the difficulty of extracting effective landslide features from images. To further improve the accuracy of localization in landslide detection, a deep learning model (FCN-PP) was developed by Lei et al. (2019). Instead of employing concatenation, the model introduces a pyramid pooling module into the fusion of multi-scale features, thus obtaining a more powerful feature representation. Experimental results demonstrate that FCN-PP exhibits a robust capability to extract landslide-related spatial information, and thus achieves better segmentation results in landslide areas. In this work, time-consuming preprocessing is still required to filter the large amount of noise contained in the bitemporal images.

From a data-driven perspective, training deep learning models on more diverse datasets is another approach to enhancing the performance in change detection. Previous deep learning models employ only a single band or RGB bands as features, which can affect the performance of models in distinguishing ground objects (e.g., soil area) with spectral or geometrical features that are similar to those of landslides. To solve this, Ullo et al. (2019) employed five bands from satellite images and stacked the pre- and post-disaster images into a CNN model with nine convolutional layers. As a result, it was revealed that the CNN can accurately detect landslides even under cloud cover conditions by using augmented Sentinel-2 images. This finding suggests that an increase in the number of features has the potential to enhance the distinction between similar features and thus improve detection accuracy.

Similarly, Wang et al. (2019e) extracted four bands (red, green, blue, and near-infrared band values) from predisaster and postdisaster remote sensing images and fused them with NDVI indices. The fused remote sensing image with nine bands as features is input into a simple CNN with two convolutional layers and achieves accurate and fast landslide change detection. The results demonstrate that the mixed bands increase the recognition efficiency. Considering that the utilized dataset is predominantly obtained under conditions of fair weather and larger vegetation coverage, it is still a challenge to achieve accurate landslide detection by using satellite images under adverse weather conditions, e.g., rain and snow.

To further avoid false positives that are caused by unfavorable factors (e.g., low resolution, weather effects, and brightness) in optical images, SAR images can be a compelling solution. However, challenges remain. SAR images suffer from severe speckle noise, size variation of the segmentation target (landslide area), and image incompleteness, among other issues.

To address the above problem, An et al. (2020) demonstrated how different attention modules could be applied in a deep learning model for extracting information from SAR images under adverse weather conditions. In the model, four ResNet blocks are utilized as the backbone network, and the employed attention modules include (1) an attention tuning module, which is applied to the first and last ResNet blocks to collect rich channel information, and (2) a global attention module, which consists of a position attention module (PAM) and a channel attention module (CAM). The PAM extracts the global spatial information from the feature maps of the disaster area (anomaly class) and nondisaster area (normal class). The CAM efficiently classifies each pixel based on various features that are generated by the multiple channels. The model achieves accurate segmentation of the landslide area after a rainstorm by utilizing a three-channel image that is synthesized from three types of images: pre-disaster SAR, post-disaster SAR, and DEM. As a preliminary exploration, their approach advances solutions to developing reasonable models according to different data, and demonstrates that attention mechanisms can improve information capture to prevent the loss of valuable information.

Another challenge in applying pixel-based change detection is that the generation of numerous broken pixel blocks causes the model to struggle to distinguish between landslides and features that are similar to landslides. In a novel integrated approach, Shi et al. (2020) treated landslides as objects with multiple attribute information (e.g., landslide source and landslide trajectory). The researchers developed an architecture that contains two CNN models for classification and object-oriented change detection. The first CNN model, with the U-Net architecture, effectively extracts high-level features from historical landslides. By sharing the weights from the first trained CNN, the second CNN model uses a Siamese structure to compare the similarity of pre- and post-landslide images and utilizes a fully connected conditional random field to refine the boundaries. Furthermore, the framework realizes fast identification of high-resolution aerial landslide images with a parallel computing training strategy. The framework combines deep learning methods with remote sensing processing techniques to obtain high-quality landslide area segmentation results.

4.1.1.2. Objective detection using spectral and spatial information. Deep learning models are similarly introduced into objective detection, in which spectral and spatial information is the foremost data source (see Fig. 12b). The former is employed to learn features and obtain target classification results, and includes diverse satellite imagery (e.g., panchromatic band images and multiple band images); the latter is employed to target locations and capture contextual information (e.g.,

Z. Ma and G. Mei Earth-Science Reviews 223 (2021) 103858

morphological features), commonly high-resolution DEM (Guzzetti et al., 2012). Extracting sufficiently valuable landslide-related information from these two data sources is a challenge when applying deep learning models to object detection.
As an initial attempt to develop a deep learning-based method for hyperspectral image landslide detection, Ye et al. (2019) used a DBN model with three hidden layers to gradually extract high-level features from hyperspectral images and landslide inventory maps (with information on multiple predisposing factors, such as fault zones, earthquakes, soil, rivers, roads, rainfall, and vegetation coverage) and utilized a logistic regression layer that was added at the end of the model for recognition. The model was applied in the detection of earthquake-induced landslides and outperformed typical classification methods (e.g., SVM). Satisfactory experimental results demonstrated the ability of a DBN to extract deep features of complex data and thus to obtain spectral-spatial features of the data. Moreover, one advantage of DBN over other models (e.g., CNN) is that it requires a smaller amount of training data, and it is able to handle the problem of overfitting small samples.
Considering that CNNs capture the local correlation of spatial information due to convolutional operations, they remain an ideal candidate for handling high-dimensional spatial data in hyperspectral images. To evaluate the performance of CNN models that utilize spectral and topographical information as inputs for landslide detection, Ghorbanzadeh et al. (2019a) utilized CNNs of various depths and input windows of various sizes for target monitoring, where spectral information of five bands and three main topographical factors are stacked as input. Stacking implies that each topographical layer is treated as an additional channel, so that the input matrix has dimensions a × a × 8, where a × a is the window size and 8 is the number of channels. The results demonstrated that the supplementation of topographic information contributes to the distinction of subsidence areas from landslide areas with similar spectral features.
To further improve the performance of the classic CNN model for landslide detection in study areas, Ghorbanzadeh et al. (2019b) considered an improved method of obtaining training datasets as a solution, for example, by employing a fishnet tool in ArcGIS software for sampling. The CNN model achieved an increase in detection accuracy, as expected. The result implies that an appropriate sampling method can enable the input data to cover more detailed information of a landslide, which contributes to the feature extraction of the CNN-based model. Furthermore, they compared and analyzed the improved performance of basic CNN models when adopting patches of various sizes (32 × 32, 48 × 48, 64 × 64, and 80 × 80 pixels). The results reveal that the window size of the sample patch (i.e., input size of CNN-based models) can significantly influence the performance of CNN-based models for landslide detection. Accordingly, improving the sampling method and size of the training data could be considered for improving the performance of CNN-based models applied in landslide detection.
Another interesting opportunity would be to explore more architecture designs, which will facilitate the performance of CNN models in object detection. For example, Soares et al. (2020) introduced the U-Net model to perform landslide detection. They combined spectral information that was collected from high-resolution sensors and topographic information that was provided by DEMs to separate sampled images into three sizes (32 × 32, 64 × 64 and 128 × 128 pixels). The satisfactory performance demonstrates that an appropriate architecture contributes to capturing more meaningful features from optical and topographical information, achieving highly accurate detection.
For extracting distinctive features of landslides from complicated backgrounds, Ji et al. (2020) developed a novel 3D attention module that extracts an integrated spatial and channel attention map to replace the attention module that processes the channel and spatial attention maps separately. The proposed module is integrated into CNN landslide detection models with various architectures, and it was demonstrated that the attention module significantly improved the performance of the CNN model while utilizing a combination of DEM topographical information and spectral information of optical images as input, where ResNet-50 realized the most accurate detection.
Moreover, the high-resolution DEM that is derived from LiDAR provides more complementary information on the bare ground topography. For example, Prakash et al. (2020) stacked multiple landslide topographic features that were extracted from radar DEM images and multiple spectral features of landslides that were extracted from optical images and fed them into a modified U-Net architecture (with ResNet34 blocks in its downsampling path). Semantic segmentation of landslides at a regional scale was effectively performed.
In practice, landslide datasets typically suffer from an unbalanced class distribution, where more pixels belong to background objects (e.g., vegetation, urban area, and water). To reduce the class imbalance between positive and negative classes, Yu et al. (2020) designed a CNN model for contour-based semantic segmentation that consists of ResNet-101 and a pyramid pooling module by extending the pyramid scene parsing network (PSPNet). On national scales, PSPNet can effectively detect landslides in the case of an imbalanced data distribution. It provides a practical solution for the supplementation of landslide databases in countries that lack data.
As illustrated in Fig. 12e, the above studies directly stack spectral layers and topographical layers (e.g., elevation, slope, and curvature) to form a ‘multichannel image’ as regular input to a CNN-based model, which can effectively detect landslides. Furthermore, data fusion is a method that is available for further improving the model performance (see Fig. 12f). Consequently, Sameen and Pradhan (2019) proposed a novel feature-level fusion method that converts spectral data and terrain variables into an abstract representation by using a neural network and fuses them using various operations (e.g., activation functions). The fused data are fed into a CNN model with residual blocks for detection. The method implements the extraction of complementary features from spectral (RGB bands) and topographic information (altitude, slope, aspect, and curvature) to detect landslides more effectively.
During a postdisaster emergency response, it is difficult to collect high spatial resolution DEM images within a limited time. For remote mountain areas with complicated topography, this scenario is especially difficult. One solution is to perform high-precision landslide detection by applying only optical imagery data (Fig. 12c). For example, Qi et al. (2020) developed a CNN model (ResU-Net) with a U-Net architecture and 56 convolutional layers, where residual blocks are employed as an alternative to each convolutional layer in the encoding path of the original U-Net model. This modification avoids model degradation (deterioration of model predictions over time for similar data), thereby enabling ResU-Net to accurately detect landslides from optical images alone. Yi and Zhang (2020) developed a novel end-to-end CNN model, namely, LandsNet, that adds a residual block and attention module along with multiscale fusion operations. Consequently, it can accurately recognize and map landslides from single-temporal optical satellite images.
In summary, the most commonly employed deep neural networks for object detection tasks in landslides are ResNet and U-Net, which improve the accuracy of detection by extracting more valuable landslide-related information from the input data. The elegant architecture and the high-quality data contribute to the accuracy and robustness of landslide detection. However, challenges remain because high-quality data is not always immediately available.

4.1.1.3. Transfer learning to overcome limited samples. The detection of landslides via feature extraction from high-resolution images requires many samples due to large spatial-spectral variations. However, the collected image data are typically insufficient for training a deep learning model. Even trainable deep learning models are potentially affected by model degradation (deterioration of model predictions over time for the same/similar data).
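The channel stacking attributed to Ghorbanzadeh et al. (2019a) in Section 4.1.1.2 (five spectral bands plus three topographic layers, giving an a × a × 8 input) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_patch`, the toy 6 × 6 rasters, and the choice of a top-left window origin are assumptions made only for illustration, and plain Python lists stand in for real raster arrays.

```python
from typing import List

Layer = List[List[float]]  # one raster layer: rows x cols


def extract_patch(layers: List[Layer], row: int, col: int, a: int) -> List[List[List[float]]]:
    """Cut an a x a window with top-left corner (row, col) from every layer
    and stack the layers along the last axis, giving shape a x a x C."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    if any(len(lyr) != n_rows or len(lyr[0]) != n_cols for lyr in layers):
        raise ValueError("all layers must share the same grid")
    if row < 0 or col < 0 or row + a > n_rows or col + a > n_cols:
        raise ValueError("window falls outside the raster")
    return [
        [[lyr[r][c] for lyr in layers] for c in range(col, col + a)]
        for r in range(row, row + a)
    ]


# Hypothetical toy rasters (values are made up): 5 spectral bands and
# 3 topographic layers on the same 6 x 6 grid.
spectral = [[[float(b)] * 6 for _ in range(6)] for b in range(5)]
topographic = [[[10.0 + t] * 6 for _ in range(6)] for t in range(3)]

patch = extract_patch(spectral + topographic, row=1, col=2, a=4)
shape = (len(patch), len(patch[0]), len(patch[0][0]))  # (4, 4, 8)
```

Changing `a` reproduces the different window sizes (for example 32 × 32 or 64 × 64) that the reviewed studies compare, while the channel count stays at 8.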


To overcome the challenges posed by limited samples, transfer learning, which specializes in small sample problems, has also been introduced as a promising solution for landslide detection with a small number of images (Ghorbanzadeh et al., 2019a; Ding et al., 2017) (see Fig. 12d). For example, Ullo et al. (2020) employed a pretrained Mask R-CNN model (an object detection model that can draw a bounding box that surrounds the objects) to perform transfer learning to extract landslides in terrain with minimal landslide photographs. By utilizing ResNet as the backbone network of the developed models, the result demonstrates that the employment of transfer learning and an appropriate architecture effectively improves the performance of detection models that have limited input samples. A limitation of this work is that the training images still require manual annotation before training, which is a time-consuming process.
Similarly, Lu et al. (2020) utilized a CNN-based model (ImageNet) to recognize landslides from image data and used a feature transfer learning approach to obtain a pre-trained model. To address the problem that the classification results of pre-trained models are commonly inconsistent with the real boundary of the landslide area, this work combines ImageNet with object-oriented image analysis methods, which increases the accuracy of the deep learning model segmentation. For training small samples, the proposed transfer learning-based approach has achieved satisfactory performance. However, when applied to detect earthquake-induced rock landslides, some exposed rocks are incorrectly classified as landslides by the model, because the spectral and structural characteristics of rock landslides are closely similar to those of bare rocks. Furthermore, landslides are also misclassified by the model because their spectral characteristics are similar to those of bare ground. This factor will limit the accuracy of this work.
When applying transfer learning, another interesting opportunity is to explore different CNN architectures for improving the accuracy of landslide detection. For example, Catani (2021) utilized four purportedly best-performing CNN architectures as pre-trained classifiers, including GoogLeNet, GoogLeNet-places365, ResNet, and Inception.v3. The performance of these architectures demonstrated that the first two are compact and fast, with a time-saving advantage, whereas the latter two are more accurate. Despite the satisfactory performance achieved, this work suffers from classification errors, especially regarding landslide false negatives. Therefore, considering that the ultimate objective of this work is to incorporate the pre-trained models into UAV systems, for the purpose of real-time landslide mapping and investigation of specific landslide-related topographic features, it is imperative to reduce the false alarm rate and increase the reliability of the employed methods. Therefore, it remains a challenge to improve the prediction accuracy of deep learning models when employing transfer learning.
Collectively, these deep learning-based landslide detection methods demonstrate that deep learning models can achieve accurate segmentation of landslide areas, which mainly depends on model improvements and data processing improvements. In terms of model design, deep learning models based on U-Net and ResNet architectures have demonstrated excellent information extraction capabilities, thus achieving high accuracy segmentation results. Novel deep neural network components, e.g., attention blocks, are also increasingly favored, which can further enable the extraction of valuable information. In terms of data processing, it is interesting to further explore how to process data in a more time-efficient and cost-effective approach, thus allowing these data to provide more geohazard-related information for deep learning models. Moreover, the deployment of pretrained models could be a promising trend, for example, their application in UAVs.

4.1.2. Landslide susceptibility assessment
Landslide susceptibility assessment is to predict whether an area is prone to geological hazards according to given data from an inventory (Guzzetti et al., 1999). The adopted data usually includes multiple environmental factors of a geological hazard as well as a historical record of its own occurrence.
In the last few years, various basic deep learning models have been gradually introduced into landslide susceptibility assessment, including MLP, CNN, RNN, and AE. These models can automatically extract high-level features from different data representations and achieve significant performance in landslide susceptibility evaluation. The general workflow of the aforementioned models is illustrated in Fig. 13.
As one of the simplest and most basic deep learning models, the MLP model has been used to replace the traditional statistical and machine learning models in landslide susceptibility assessment. The model outperforms other traditional models in prediction and is unaffected by the landslide sampling method (Hua et al., 2021; Bui et al., 2020; Dou et al., 2020; Dao et al., 2020). As a general rule, a reasonable design of hyperparameters and optimization algorithms improves the performance of a model. From a comparison of three optimization algorithms, Nhu et al. (2020) concluded that the Adam algorithm is suitable for obtaining improved prediction results for MLP models in landslide susceptibility assessment.
Due to convolutional operations, CNNs can capture the contextual information in the input data, thus achieving a powerful feature extraction performance. For landslide sensitivity assessment, CNNs are also excellent deep learning models for extracting valuable features directly from the raw data, and they are capable of eliminating the cumbersome traditional feature selection process. However, a number of discrepancies exist between landslide susceptibility assessment and image processing. Consequently, to exploit the feature capturing capability of CNNs effectively, the processing of landslide data into appropriate data representations poses a major challenge for introducing CNN models into landslide susceptibility assessments.
As a first exploration, Wang et al. (2019d) applied three vanilla CNN architectures (1D, 2D, and 3D-CNN) to the corresponding data representations, which were transformed from the original landslide data. For example, the 3D data representation means that the input data is represented as a 3D matrix with dimensions c × n × n, where c indicates the number of landslide influencing factors, and n indicates the rows and columns of each data layer. The experimental results demonstrate that the feature extraction capability of CNNs exceeds that of conventional machine learning methods (SVM), and thus can be applied to develop reliable landslide sensitivity maps.
Similarly, Fang et al. (2020) treated multiple landslide inventory maps that were stacked together as a ‘multichannel image’, where each channel denotes a single predisposing factor, and input this ‘image’ into a 1D-CNN to capture local representations among landslide features. This is a hybrid model where CNN is simply employed to extract features, and traditional machine learning methods are employed as classifiers. The experimental results indicate that this hybrid model achieves higher classification accuracy than the single CNN model.
However, many challenges remain because the occurrence of landslides is not only related to the landslide, but also closely related to the surrounding environment of the landslide. To enhance the accuracy of landslide susceptibility assessment, the CNN-based approach requires consideration of multi-scale spatial information around landslides or non-landslides. To address this problem, Yi et al. (2020) proposed a multi-scale sampling strategy, based on the assumption of spatial autocorrelation, to fuse abundant spatial information, thereby generating a multiscale dataset. The dataset was applied to the vanilla CNN architecture, and the results confirmed that the generated multiscale data facilitated the CNN model to automatically extract various deep features, thus improving the accuracy of landslide susceptibility maps.
Furthermore, existing CNN models employed for landslide susceptibility mapping are relatively simple, and the extracted features do not involve any localization information. As an improvement to the basic CNN architecture, Hajimoradlou et al. (2020) introduced U-Net for feature extraction in landslide susceptibility assessments. The developed model can track the surface and align it with the ground contours to extract relevant spatial features. Finally, an effective susceptibility assessment map is generated.
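The c × n × n data representation attributed to Wang et al. (2019d) above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `factor_cube`, the two toy factor rasters, and the edge-replication rule for windows that extend past the raster boundary are all hypothetical choices, not details taken from the reviewed study.

```python
from typing import List

Layer = List[List[float]]  # one influencing-factor raster: rows x cols


def factor_cube(layers: List[Layer], row: int, col: int, n: int) -> List[List[List[float]]]:
    """Return a c x n x n cube centred on cell (row, col); n must be odd.
    Cells beyond the raster edge repeat the nearest edge value (an assumption)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that the window has a centre cell")
    half = n // 2
    n_rows, n_cols = len(layers[0]), len(layers[0][0])

    def clamp(index: int, size: int) -> int:
        # Map an out-of-range index onto the nearest valid row or column.
        return max(0, min(index, size - 1))

    return [
        [
            [lyr[clamp(r, n_rows)][clamp(c, n_cols)] for c in range(col - half, col + half + 1)]
            for r in range(row - half, row + half + 1)
        ]
        for lyr in layers
    ]


# Hypothetical 4 x 4 rasters for two influencing factors (values are made up).
slope = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
rainfall = [[100.0 + r for _ in range(4)] for r in range(4)]

# c = 2 factors, n = 3 window centred on the top-left corner cell.
cube = factor_cube([slope, rainfall], row=0, col=0, n=3)
```

Each grid cell then yields one such cube, with c equal to the number of influencing factors, which is the form a 3D-CNN would take as input.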


[Figure 13 panels: environmental factors (land use, rainfall, soil, TWI, lithology, aspect, slope); data representations (one-hot; 1D, 2D, and 3D data; sequence data and sorted sequence data); deep learning models (simple CNN, simple RNN); output (landslide / non-landslide; landslide susceptibility map with classes very low, low, moderate, high, very high).]

Fig. 13. A general workflow of deep learning application for landslide susceptibility assessment.
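Two of the data representations named in Fig. 13, one-hot encoding and the sorted sequence, can be sketched in a few lines. The sketch assumes that "sorted sequence data" means predisposing factors ordered by an importance score before being passed step by step to a recurrent model; the function names, factor values, and importance scores below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode a categorical factor (e.g. land use) as a 0/1 vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == cat else 0 for cat in categories]


def sorted_sequence(factors: Dict[str, float], importance: Dict[str, float]) -> List[float]:
    """Order factor values from most to least important, so that a recurrent
    model would receive them one step at a time in that order."""
    order = sorted(factors, key=lambda name: importance[name], reverse=True)
    return [factors[name] for name in order]


# Hypothetical grid cell: factor values and importance scores are made up.
cell = {"slope": 31.0, "rainfall": 1200.0, "twi": 6.5}
importance = {"slope": 0.3, "rainfall": 0.5, "twi": 0.2}

sequence = sorted_sequence(cell, importance)  # rainfall, slope, twi
land_use = one_hot("forest", ["cropland", "forest", "urban"])
```

How the importance scores are obtained is left open here; any ranking of the predisposing factors could be substituted.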

Deep learning models such as CNNs involve many hyperparameters, including batch size, learning rate, and number of hidden layers. A reasonable selection of these hyperparameters is therefore also an important challenge in improving the accuracy of landslide susceptibility assessments. Sameen et al. (2020) utilized Bayesian optimization to determine the hyperparameters. The results indicate that Bayesian optimization improves the accuracy of the employed classic CNN by 3%. Furthermore, Pham et al. (2020) introduced a moth flame optimization (MFO) algorithm into a CNN model that was designed for landslide susceptibility assessments to search for optimal hyperparameters.
Another common deep learning model, namely, RNN, has been introduced into landslide susceptibility assessments. Since the RNN model was developed for handling sequence data, the problem to be considered is how to convert the landslide data into sequential data applicable to the model. For example, Mutlu et al. (2019) provided a solution: they constituted a sequence by combining previously processed landslide occurrence information with the current location information and fed it into an RNN for further prediction. Moreover, Wang et al. (2020b) stacked predisposing factors into a sequence and sorted them according to importance. This method enabled the predisposing factors to be sequentially fed into the RNN model according to their importance; thus, critical information that contributes to landslide identification could be fully utilized. To further explore this issue, Thi Ngo et al. (2021) employed a basic CNN model and a basic RNN model for mapping landslide sensitivity on a national scale. The comparison of the experimental results revealed that the RNN obtained better performance than the CNN. More work is still expected to apply RNN-based models for landslide susceptibility assessments, for example, to develop more data representations suitable for the model or to improve the model architecture.
In addition to supervised deep learning models, various unsupervised deep learning models are applied to landslide susceptibility assessment. These models can fully utilize the high sparsity and nonlinear correlation of landslide features to extract high-level features. As an example, Huang et al. (2020) developed a model, namely, a fully connected sparse AE (FC-SAE). This model extracts the optimal features from predisposing factors without considering a priori assumptions based on the existence of a linear correlation between them. It is not necessary to provide external labels for the landslide information. Similarly, Nam and Wang (2020) integrated AE models (stacked AEs (StAEs) or sparse AEs (SpAEs)) into conventional machine learning models (random forest and support vector machine) and, thus, realized better predictive performance by capturing more meaningful feature representations. The major advantage of the AE-based approach is that it does not require external labels for landslide information. Moreover, it reduces the computational cost by reducing the dimensionality of the landslide data. These advantages are expected to expand the scope of application and thus facilitate the application of AE-based models in landslide susceptibility assessments.

4.1.3. Landslide displacement prediction
Deformation prediction is also a major application scenario for deep learning in landslide prevention. Deformation prediction focuses on the short-term behavior of landslides and provides an important basis for landslide early warning systems. Typically, early warning systems are used to provide actionable warning information in advance of a


landslide, thereby mitigating potentially significant damage (Piciullo et al., 2018).
Landslide deformation is a nonlinear dynamic process under the control of a complicated environment, in which the influencing factors and deformation behavior at the previous moment impact the deformation behavior at the next moment (Xu and Niu, 2018). Consequently, deep learning is an ideal tool for capturing high-level temporal correlations from complex, highly nonlinear, and variable field monitoring data to realize more accurate displacement prediction. A commonly applied paradigm is introduced as follows (see the workflow in Fig. 14). First, a few field-specific monitoring points are selected, and the collected time series data are denoised (e.g., via wavelet transform) and decomposed into several components according to influencing factors. Second, a deep learning model is applied to predict the displacement of the landslide in the short term. Upon completion of the prediction, various methods are employed to determine the alarm threshold.

Fig. 14. A general workflow of deep learning application for landslide displacement prediction.

Suitable models can increase the accuracy of deformation prediction. As the most popular model for predicting sequential data, RNN and its variants are preferred for displacement prediction. Typically, an LSTM model is introduced to predict dynamic landslide displacement by adopting data that are decomposed into static and dynamic components (Xu and Niu, 2018; Xie et al., 2019; Yang et al., 2019; Jiang et al., 2020; Xing et al., 2020). Furthermore, other deep learning models are available. For example, Li et al. (2020) employed a DBN model to extract reservoir landslide displacement-related features from three denoised time-series datasets (reservoir landslide displacements, monthly precipitation, and reservoir water level) without prior knowledge and realized satisfactory predictive performance. This work does not consider RNN and its related variants as a benchmark. Theoretically, although DBNs can extract relevant features from deformation data with high nonlinearity and variability, it is difficult for them to capture the temporal dependence of these time-series data. More comparisons are necessary.
Reasonable application of diverse data can further improve the performance of deep learning models in deformation prediction. For instance, Meng et al. (2020) compared the LSTM model performance when adopting single-factor (cumulative displacement time-series) versus multifactor prediction (reservoir landslide displacements, monthly precipitation, and reservoir water level) and demonstrated that the latter reduces overfitting and increases the accuracy. Other influencing factors beyond precipitation and reservoir levels still need to be introduced into the prediction of landslide displacement behavior. Fusing these features into deep learning models to improve the accuracy of predicting landslide displacements is an impending and exciting challenge.

4.2. Debris flow investigation

A ‘debris flow’ is a geological hazard in which large quantities of highly concentrated, nonplastic, saturated debris rapidly flow through a steep channel (Hungr et al., 2014). When a debris flow occurs, it destroys all objects in its path and, thus, causes substantial damage. Debris flows, which occur worldwide, often occur without any indication (Coussot and Meunier, 1996). Therefore, there are few applications of deep learning in debris flow investigation.
For debris flow investigation, the assessment of susceptibility is a typical task that contributes to regional debris flow monitoring and early warning. Inspired by a common application paradigm of deep learning models in landslide susceptibility assessment, Zhang et al. (2019b)


introduced classic MLP and CNN-based models into debris flow susceptibility assessment, where 15 external factors that are associated with debris flow hazard events were utilized as a one-dimensional vector input, and they effectively assessed the probability of debris flow occurrence in a specified region.
On the other hand, the immediate availability of information related to debris flows can contribute to debris flow investigations, including the response to debris flow emergencies, which is essential for rescue and mitigation activities. Ideally, for example, spatial information on debris flows can be obtained through CNN-based models by utilizing data from satellite images. However, the available data for model training are difficult to obtain. Multispectral imagery tends to be disturbed by weather, while high-resolution SAR images that are immune to weather effects are quite expensive.
To overcome these limitations, Yokoya et al. (2020) proposed an impressive solution, where numerical simulation techniques are exploited to alleviate the requirements for remote sensing data. In a comprehensive framework, they ingeniously combined numerical simulation methods with CNN-based models (LinkNet). In this work, numerical simulation methods generated large amounts of synthetic data by simulating debris flow hazard scenarios, which satisfy the training requirements of the CNN-based model. After sufficient training, once the remote sensing images have been fed into the model, detailed information regarding topographic deformation can be rapidly obtained. The encouraging results show promising opportunities for combining numerical simulation methods with deep learning models.
Additional details are presented in Table 4 (see the table in the appendix).

4.3. Rockfall detection

As a geological hazard that occurs with high frequency, rockfall occurs when the rock and soil mass on a steep slope experiences a local slip or break phenomenon, and its dynamics include free-falling, rolling, bouncing, and sliding motions (Dorren, 2003). Few studies have been conducted using deep learning on rockfall due to its rapidity, lack of readily identifiable precursors, sudden collapse, and complex mechanism.
Several deep learning models have been applied in rockfall detection on the Moon and Mars with the objective of elucidating the geological processes on the surfaces of the planets, which provide potential support for future applications of deep learning on earth rockfalls.
For example, Bickel et al. (2020a) introduced a CNN model (RetinaNet) with multiple convolutional layers and residual blocks that could capture invariant and distinguishing features (e.g., albedo, shadows, and light shadows) and produce bounding boxes or segregated masks around the detections. Accordingly, detection contains two major stages. First, manually labeled images are scanned, and rectangular boxes are utilized to identify candidate regions in the images where features are likely to be detected. Second, the candidate regions are classified. Furthermore, the image data must be of sufficient resolution for the detection of meter-scale damage. The method benefits from the high-quality data that are collected by reconnaissance orbiter cameras and thus effectively identifies rockfalls and generates large-scale rockfall maps with optimal efficiency and speed. The satisfactory performance of this deep learning detection paradigm has been proven for rockfalls on the Moon (Bickel et al., 2019) and Mars (Bickel et al., 2020b).
Additional details are presented in Table 5 (see the table in the appendix).

4.4. Avalanche detection

An avalanche is a rapid, massive movement of snow along a hillside or slope that occurs in snow-covered mountains throughout the world

Alps (Techel et al., 2015). It is estimated that the economic cost of road closures and infrastructure damage in Europe exceeds one billion Euros annually (Rudolf-Miklau et al., 2015).
Therefore, reliable investigation of avalanche activities has significant socioeconomic implications for people who live in snow-covered mountain areas. On the one hand, these investigations assist in avalanche risk mitigation by estimating the stability of snowpack and the variability of avalanche activity; on the other hand, they can furnish valuable information on climate change.
Substantial time consumption, high expense, impassable accessibility, and avalanche danger constrain the traditional field measurement methods available for avalanche activity investigations. Fortunately, the development of satellite remote sensing has improved the current ability to achieve continuous monitoring of avalanche activity in spatial and temporal terms. Considering that spatial and temporal data related to avalanche activity can further improve the identification of avalanche risk zones and avalanche periods, SAR is a desirable data source.
An advantage of SAR products is that they continuously cover large areas without being affected by light and meteorological conditions, and are consequently suitable for detecting avalanche debris. However, conventional segmentation methods that capture only pixel-wise features in radar backscatter tend to incur a large number of false positives, and thus are ineffective in avalanche detection.
To improve the accuracy of avalanche detection, it is desirable to explore automated methods, which can capture background information around pixels and high-level features (e.g., shape and texture) of avalanche debris. Fortuitously, this is exactly where deep learning models, especially CNN-based models, excel.
In a work that employed the transfer learning strategy, based on a pre-trained CNN-based model (ResNet) trained with a natural image dataset, Waldeland et al. (2018) demonstrated the effectiveness of deep learning techniques for classifying avalanches in SAR images. The work is divided into a two-stage process, where deep learning techniques are applied in the second stage. In the second stage, a pre-trained ResNet is responsible for identifying whether candidate regions selected by traditional methods contain avalanches. To enable SAR images to be effectively utilized by pre-trained models that specialize in standard RGB images, an interesting solution has been proposed in this work. The five channels of the input image are converted into three channels via various combinations, where the red channel corresponds to the avalanche event image and the green and blue channels to the reference image.
Furthermore, other than information from SAR images, can additional information improve the performance of CNN-based models in detecting avalanches? To explore this possibility, Sinha et al. (2019a) applied SAR data and an independent avalanche inventory as input to a pre-trained CNN-based model (VGG-16). The inventory is collected from in-situ observations of mountain corridors where avalanches have occurred, which covers the occurrence of avalanches, and both quantitative and qualitative data. This work demonstrates the effectiveness of CNN-based models for automatic avalanche debris detection, by combining multiple sources of information. It also reveals that avalanche-related characteristics affect the performance of the deep learning model, for example, the size of the avalanche debris.
The utilization of deep learning based on multisource information for effective avalanche detection is inspiring. The exploration of diverse input information is expected to continually open up further opportunities to enhance the performance of the CNN-based models in avalanche detection. For example, as one of the most important factors that contribute to the occurrence of avalanches, topography can provide multiple pieces of information to CNN-based models in the forms of slopes, slope directions, or other DEM-derived feature maps.
By exploiting this additional information, Bianchi et al. (2021) applied a CNN-based model (U-Net) for determining whether an indi­
(Eckerstorfer et al., 2019). Over the past four decades, approximately vidual pixel corresponds to an avalanche, where inputs consist of SAR
100 people have been killed in avalanches each year in the European images and two feature maps from the DEM. Two additional

topographical feature maps are obtained from the DEM image: the slope angle feature and the angle of reach. The former is provided to the model as an additional layer of the input image, while the latter is utilized to generate an attention mask from an attention block that contains three stacked convolutional layers. The utilization of the attention module enables the model to concentrate more on easy-to-overlook avalanche regions.

Overall, CNN-based models have typically served as classifiers in avalanche detection. For real-world applications, this kind of classifier tends to suffer from the extreme imbalance in the dataset, which significantly compromises its classification accuracy. As revealed by Bianchi et al. (2021), a common solution that modifies the loss function during model training cannot alleviate the class imbalance problem in avalanche detection. More approaches are expected to address this challenge.

AE-based models are an interesting alternative that can identify anomalies by comparing the compressed-reconstructed data with the original data. This capability enables them to be employed in anomaly detection of rare events, where they have achieved impressive performance. In the case of extremely imbalanced avalanche data, the avalanche can be considered the anomaly event, thus turning avalanche detection into an anomaly detection problem. Building on this idea, Sinha et al. (2019b) demonstrated that the AE-based model (VAE) can be an excellent tool for high-accuracy avalanche detection and is able to outperform CNN-based models.

Additional details are presented in Table 6 (see the table in the appendix).

4.5. Earthquake analysis

Earthquakes, which are phenomena in which the Earth's surface suddenly shakes violently, are typically caused by the sudden release of accumulated stresses within the Earth under the tectonic driving forces of plate motion and mountain building. According to the World Health Organization, earthquakes were responsible for nearly 750,000 deaths and affected more than 125 million people worldwide between 1998 and 2017.

To mitigate the social and economic consequences of a damaging earthquake, over the past 10 years, improvements in seismometer design, installation, and network density have enhanced the earthquake hazard monitoring capability. Moreover, the large amount of data that is generated lays the foundation for improved studies of the detailed structure and physical properties of the Earth's deeper interior.

Collected seismograms provide many signal records that are associated with the Earth's motion (displacement, velocity, or acceleration) over time and, thus, reflect the combined influence of the source, the propagation path, the characteristics of the recording instruments, and the environmental noise that results from the conditions at the recording location. These signals are typically ordered time series that consist of diverse local features (e.g., P-wave and S-wave) and global features (e.g., ground waves and scattered waves). Depending on the polarization patterns, P-waves are more highlighted on the vertical component seismograms, whereas S-waves are more highlighted on the horizontal component seismograms.

Ideally, each channel (seismic trace) is composed of a wavelet that corresponds to the arrival time of the signal. In practice, the arrival signal is subject to random background oscillations as a result of noise (Arora et al., 2011). All unwanted recorded energy that contaminates seismic data can be considered seismic noise. To avoid the effects of noise and to obtain clean, noise-free, high-quality seismic images, seismic data must be processed carefully. The quality of seismic data can be assessed based on the signal-to-noise ratio, the increase of which implies an improvement in data quality.

Given its substantial potential to automate the processing and analysis of large amounts of seismic data, deep learning is attracting tremendous research interest in the field of seismology; as a result, it has been rapidly developed in various research directions, such as seismic data interpolation and denoising, seismic phase picking, earthquake detection, localization, earthquake characterization, and focal mechanisms.

Additional details are presented in Table 7 (see the table in the appendix).

4.5.1. Seismic data interpolation and denoising

The raw seismic records that are collected are subject to environmental and economic constraints, which lead to irregular sampling and, thus, the presence of erroneous or missing traces. These incomplete seismic data directly affect the quality of data processing, which further affects the accuracy of downstream tasks.

To improve the quality of seismic data, the effective implementation of seismic data interpolation is a major challenge. Seismic data interpolation is an essential method for reconstructing missing traces, which is a prerequisite for improving other data processing and analysis tasks in seismic research. Another major challenge is denoising, which involves separating the noise from the seismic signal or attenuating it.

Existing traditional interpolation methods (e.g., interpolation based on sparse mathematical transformations) are limited to random missing scenarios, and struggle to handle regular missing scenarios. To address this dilemma, deep learning models serve as a solution by virtue of their advantages in automatic feature extraction. As a preliminary exploration, Wang et al. (2018) addressed seismic data interpolation as an image reconstruction problem, and employed a CNN-based model (ResNet) to perform anti-aliasing interpolation of regular missing seismic traces (see Fig. 15a). Having trained and evaluated the model with synthetic data and observed data, they demonstrated that ResNet is effective in reconstructing regularly missing data and that the model has robust generalization performance. This work confirms that CNN-based models can eliminate the linearity, sparsity, and low-order assumptions of traditional algorithms, and extract higher-order features of the training data, thus allowing the interpolated data to be closer to the true measurements.

Following the basic idea of regarding seismic data as images, a more ingenious approach is to combine CNN-based models with generative models that specialize in data reconstruction. This is exactly the approach taken by Mandelli et al. (2019). They deployed an AE-based model that combines AE with U-Net, considering that the encoding and decoding structure of U-Net typically contributes to maintaining the integrity of the data, resulting in highly accurate reconstructions. In experiments on synthetic and field data, this hybrid model reconstructed clean and densely sampled gathers in a more efficient and accurate manner.

To further improve the reconstruction accuracy of U-Net, an increase in the number of hidden layers is desirable. Nevertheless, deeper deep learning models potentially could experience gradient vanishing, which may cause ineffective model training. To avoid this problem, Tang et al. (2020) added residual blocks to the deep U-Net (ResU-Net). The model is designed mainly for processing seismic data acquired by sparse 2D acquisition. On 2D seismic data, ResU-Net demonstrates its robustness in reconstructing random and regular missing traces by capturing more information, which also simplifies training.

Furthermore, a deep learning model with high generalization performance is expected, which would preserve robust data processing capabilities in various scenarios. DeepDenoiser, developed by Zhu et al. (2019b), appears to be an example for this purpose. The model, inspired by the basic AE, consists of a series of fully convolutional layers to form a descending-then-ascending encoding-decoding architecture, which contains a large number of skip structures, allowing it to learn the features from not only seismic signals, but also various noise sources. By learning the sparse representation of seismic data in the time-frequency domain, the model decomposes the input data into signals of interest and noise. The results demonstrate that DeepDenoiser has remarkably promising generalization performance and can handle white noise, various colored noises, non-seismic signals well, and even signals other

Fig. 15. Deep learning application for seismic data interpolation and denoising. (a) The CNN-based and RNN-based model for reconstructing seismic data with
missing traces. (b) CGAN for reconstructing seismic data with missing traces.
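
To make the interpolation task in Fig. 15 concrete, the following minimal Python sketch builds the two missing-trace patterns discussed in this section (regular and random) and fills the gaps with a naive linear baseline. It is not one of the reviewed deep learning models: the function names (`regular_mask`, `random_mask`, `decimate`, `interpolate_missing`, `snr_db`) and the toy gather are assumptions introduced purely for illustration. A learned interpolator would take the place of `interpolate_missing`, and the signal-to-noise ratio offers one way to compare reconstructions.

```python
import math
import random


def regular_mask(n_traces, step):
    """Regular missing pattern: keep every `step`-th trace (True = observed)."""
    return [i % step == 0 for i in range(n_traces)]


def random_mask(n_traces, missing_fraction, seed=0):
    """Random missing pattern; the first and last traces are always kept."""
    rng = random.Random(seed)
    interior = list(range(1, n_traces - 1))
    n_drop = min(len(interior), round(missing_fraction * n_traces))
    dropped = set(rng.sample(interior, n_drop))
    return [i not in dropped for i in range(n_traces)]


def decimate(gather, mask):
    """Zero out the missing traces, mimicking an incomplete record."""
    return [list(tr) if keep else [0.0] * len(tr)
            for tr, keep in zip(gather, mask)]


def interpolate_missing(gather, mask):
    """Naive baseline (assumption, not a reviewed method): fill each missing
    trace by linear interpolation between its nearest observed neighbours."""
    observed = [i for i, keep in enumerate(mask) if keep]
    if not observed:
        raise ValueError("at least one observed trace is required")
    filled = [list(tr) for tr in gather]
    for i, keep in enumerate(mask):
        if keep:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None or right is None:
            # Only one neighbour exists: copy it instead of extrapolating.
            src = right if left is None else left
            filled[i] = list(gather[src])
            continue
        w = (i - left) / (right - left)
        filled[i] = [(1 - w) * a + w * b
                     for a, b in zip(gather[left], gather[right])]
    return filled


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of `estimate` relative to `reference`."""
    signal = sum(x * x for tr in reference for x in tr)
    noise = sum((x - y) ** 2
                for tr, te in zip(reference, estimate)
                for x, y in zip(tr, te))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    # Toy gather: 7 traces of 4 samples, varying linearly across traces.
    gather = [[float(i + t) for t in range(4)] for i in range(7)]
    mask = regular_mask(len(gather), step=3)
    incomplete = decimate(gather, mask)
    restored = interpolate_missing(incomplete, mask)
    print(snr_db(gather, incomplete), snr_db(gather, restored))
```

Because the toy gather varies linearly from trace to trace, the linear baseline recovers it almost exactly; real gathers with dipping or aliased events are where the regular-missing case becomes difficult and where the learned models reviewed here are applied.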

than seismic data. The effectiveness of DeepDenoiser portends the great potential of AE-based models in seismic data denoising. Furthermore, the performance of such models in seismic data interpolation deserves more exploration.

The strategy of treating seismic data as images sufficiently exploits the advantages of CNN-based deep learning models, and it is a motivation to investigate a similar approach. Another interesting opportunity is to consider seismic data as time series, and thus RNN-based deep learning models can be utilized to manipulate seismic data. Following this idea, Yoon et al. (2020) attempted to implement the reconstruction of seismic data by leveraging five RNN-based deep learning models (basic RNN, LSTM, bidirectional LSTM, deep BiLSTM, and DBiLSTM with skip connections). In this work, the traces of seismic data are considered time series to be fed into the model, thus completing the prediction of the missing traces between two existing input traces. Although the results are satisfactory, RNN-based deep learning models have difficulty capturing the spatial correlation of different traces from seismic data and handling a large number of input traces simultaneously.

Alternatively, can seismic data be considered a data distribution? Seismic data may have a known data distribution. As a typical generative model, GAN is exactly designed to learn the data distribution, meaning that GAN is able to generate new seismic data by fully summarizing the data distribution. The generated seismic data share mathematical distributions and features similar to the raw data. When missing data can also be generated based on a certain data distribution, GAN is very promising for seismic data interpolation (see Fig. 15b).

By experimenting on various types of datasets (including 2D short acquisition data or 3D field data), Kaur et al. (2020) achieved the reconstruction of missing traces on seismic data by leveraging the cycleGAN model. Apparently, the missing seismic data indeed have a similar mathematical distribution as the observed seismic data, and GAN can learn these latent features to reconstruct the missing seismic data. The cycleGAN as a classic GAN model suffers from limitations, such as the vulnerability to mode collapse during training (i.e., a form of failure of the GAN model). More improvements, or the introduction of more GAN models into seismic data interpolation tasks, are desirable.

The capabilities of GANs in seismic data processing have been extended by employing more state-of-the-art models. For example, Alwon (2019) developed the generative adversarial noise attenuation network (GANAN), inspired by conditional style transfer type networks. The generator network is a U-Net based architecture, and the architecture allows for the preservation of high-level details of the input shot by utilizing skip connections. The results demonstrate that GANAN can convert noisy images to clear images and thus achieve the denoising of seismic data.

A limitation of most GANs is the inability to control the class of the generated data. To improve this situation, as an extension to GANs, CGANs can perform conditional generation of seismic data based on random noise and given conditions. For common CGANs, the objective of adding random noise is to allow the generated images to vary in style given identical conditions. For CGANs intended for seismic data interpolation, a common practice is to not add random noise, considering that seismic data with missing traces require certain interpolation results.

As an initial attempt, Oliveira et al. (2018) employed CGAN to

generate missing data for portions or entire regions in seismic images. As illustrated in Fig. 15b, in this work, the generator is fed an image x with a missing seismic trace within a certain width, and then attempts to generate an image g(x). The discriminator attempts to accurately classify a set of fake images (x and g(x)) and a set of real images (x and the original complete image y). The model can eventually generate data that are highly consistent with the original seismic data.

To extend seismic data to the frequency domain and thus achieve improved fidelity in interpolation, Chang et al. (2020) proposed a GAN (DD-CGAN) to interpolate seismic data in the time and frequency domains. After enhancing the frequency domain properties by employing Fourier transform, the proposed model generates data by utilizing ResNet as a conditional network. The reconstructed seismic data share a similar amplitude in the time domain and a similar spectral distribution in the frequency domain. This implies that the reconstructed seismic traces can achieve high consistency with the actual traces, thus preserving the fidelity of the interpolated seismic data.

The challenge of utilizing GANs for seismic data is derived from the inherent characteristics of the GAN model. First, compared with other deep learning models, GAN has two sets of deep neural networks, the generator and the discriminator, which indicates that it may contain tens to hundreds of millions of parameters, and thus requires high computational costs. One solution is the utilization of low-frequency data at large spatial sampling rates and decimated trace counts (Alwon, 2019). Second, the training of GANs is prone to failures, e.g., mode collapse, and a solution is to replace the loss function with Wasserstein loss (Wei et al., 2021).

4.5.2. Seismic phase picking

Seismic phase picking, or arrival time measurement, is valuable for locating events and analyzing the source mechanism for identifying events. Typically, automatic phase picking involves identifying the arrival time of a single phase as quickly and accurately as possible, such as the first P-wave arrival (see Fig. 16a).

Traditional phase picking methods require human experts to observe the entire waveform and provide results. However, the amount of data

Fig. 16. Deep learning application for (a) seismic phase picking, (b) classification of seismic signal and noise, (c) seismic event detection, (d) earthquake characterization and (e) focal mechanisms investigation.
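
Several of the tasks in Fig. 16 share a common preprocessing step: a continuous multi-component waveform is cut into fixed-size, possibly overlapping windows, each window receives a label, and the training set is optionally enlarged by adding Gaussian noise. The pure-Python sketch below illustrates only that data preparation under simple assumptions; it does not reproduce any of the reviewed models, and the names `sliding_windows`, `label_windows`, and `add_gaussian_noise`, as well as the synthetic waveform and pick index, are hypothetical.

```python
import random


def sliding_windows(waveform, window_len, stride):
    """Cut a multi-component waveform (one sample list per channel) into
    fixed-size windows. Returns a list of (start_index, window) pairs."""
    n_samples = len(waveform[0])
    windows = []
    for start in range(0, n_samples - window_len + 1, stride):
        window = [channel[start:start + window_len] for channel in waveform]
        windows.append((start, window))
    return windows


def label_windows(windows, window_len, pick_index):
    """Label a window 1 if it contains the (hypothetical) arrival pick,
    otherwise 0 (noise)."""
    return [1 if start <= pick_index < start + window_len else 0
            for start, _ in windows]


def add_gaussian_noise(window, sigma, rng):
    """Augmentation: return a copy of the window with additive Gaussian noise."""
    return [[x + rng.gauss(0.0, sigma) for x in channel] for channel in window]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic three-component record: weak noise, with a step at sample 35.
    waveform = [[rng.gauss(0.0, 0.1) + (1.0 if t >= 35 else 0.0)
                 for t in range(100)] for _ in range(3)]
    windows = sliding_windows(waveform, window_len=20, stride=10)
    labels = label_windows(windows, window_len=20, pick_index=35)
    augmented = [add_gaussian_noise(w, sigma=0.05, rng=rng) for _, w in windows]
    print(len(windows), labels, len(augmented))
```

With overlapping windows the same arrival falls into more than one window, which is why a classifier applied in this way typically needs a further step to merge neighbouring detections into a single pick.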

that are generated by the seismic community has increased dramatically in the last decade. Hence, manual picking will not be able to handle large-scale datasets. Due to its robust capability in handling large amounts of data, deep learning is a suitable alternative for handling massive data and realizing accurate identification. Among many models, the CNN model is preferred due to its significant advantages in pattern recognition and its ability to maintain invariance under distortion (e.g., translation and scaling).

When introducing CNN-based models, the picking problem can be reformulated as a segmentation problem. The reason is that the arrival time points are considered seismic phase edges in a one-dimensional space and, therefore, the picking problem shares numerous features with the edge detection problem.

This idea is the inspiration for Zhu and Beroza (2019) to develop a PhaseNet model based on the U-Net architecture. Following substantial experiments, it has been demonstrated that seismic waveform data are compressed to the deepest level by U-Net, and a separation of P-wave, S-wave and noise features is achieved in the condensed neural network, which allows PhaseNet to achieve accurate and objective picking of the arriving P-wave and S-wave. This performance is superior to that of human experts. Furthermore, PhaseNet can perform accurate picking in scenarios with strong low-frequency background noise. Because the epicenter distances in the seismic training dataset are less than 200 km, PhaseNet can only be applied to accurately pick the arrival times of local earthquakes with short epicenter distances.

To improve the applicability of CNN-based models for long-distance seismic data, Wang et al. (2019a) designed an elegant end-to-end CNN-based model (PickNet), which was employed to acquire the first P-wave arrival times and S-wave arrival times of earthquakes with epicenter distances up to 1000 km. The model is modified from the VGG-16 model, inspired by the design of the rich side output residual network, with the addition of informative multi-scale and multi-level side outputs. Accordingly, PickNet can learn the various waveform information contained in the seismogram layer-by-layer, e.g., the impulse peak indicating the picking time, and can refine the detection results in a multi-layered manner. It has been demonstrated experimentally that this multiscale combination strategy of convolutional layers allows PickNet to achieve satisfactory picking accuracy on long-range seismograms. Noise is a possible adverse influence on the model: PickNet performance degrades when the signal-to-noise ratio is less than 2.

Considering that seismic signals and noise are fundamentally different in spectral content, one solution to avoid noise interference is to separate noise from the signals of interest in the time-frequency domain (Vaezi and van der Baan, 2015). Building on this idea, Dokht et al. (2019) incorporated the Fourier transform into a detection framework that contains a four-layer basic CNN model, exploiting the spectral contents of phase arrivals for efficient separation of signals. The automatic seismic detection process in this framework is divided into two stages. This process may increase the amount of work. More deep learning models that can automatically separate noise are desirable.

Recently, the continuously advancing fields of semantic segmentation and pose estimation have introduced more training strategies and deep neural networks, which provide more interesting opportunities in the direction of considering the seismic picking problem as a segmentation problem. For example, Chen et al. (2018) proposed an interesting DeepLab v3+ architecture where spatial pyramidal pooling captures abundant contextual information, and encoder-decoder structures can obtain sharp borders, thereby increasing segmentation accuracy. Inspired by the DeepLab v3+ architecture, Pardo et al. (2019) developed a novel deep learning model (Cospy), which utilizes two DeepLabs as sub-networks by first creating a coarse segmentation mask for locating peaks, and then culling. Finally, the results of the sub-networks are combined using Hough voting, which ensures the acquisition of long-distance information despite the noise entailed. Cospy incorporates extensive improvements in both the deep learning model and the training methods, which allow it to attain the respective advantages of PhaseNet and PickNet: its performance is immune, not only to the distance from the epicenter, but also to noise. One challenge in training this network is the large number of parameters involved, implying that it is computationally complex.

Sophisticated architectures and adequate labeled datasets achieve more accurate phase picking. However, appropriate datasets and adequate computational resources are invariably rare, which poses a major challenge in the current application of deep learning models for seismic phase picking. That is, how can the dependence of deep learning models on large labeled datasets and large amounts of computation be reduced, whilst ensuring satisfactory accuracy?

In a processing framework called CNN-based Phase-Identification Classifier (CPIC), Zhu et al. (2019a) demonstrated how a simple CNN for fast phase picking in aftershocks could be trained by employing a small training set. After training a multi-layer classic CNN model, they iteratively applied the model through overlapping windows of continuous waveforms, to perform detection and picking tasks. It has been demonstrated that CPIC guarantees highly accurate picking results, whilst also guaranteeing high efficiency for processing and high applicability for deployment. Furthermore, it is resistant to interference from noise.

To reduce requirements for the amount of training data, another solution is transfer learning. For example, Chai et al. (2020) retrained the PhaseNet model on approximately 3,600 three-component seismograms. The retrained model has been experimentally demonstrated to outperform the original PhaseNet model, and even human experts, in scenes with high background noise levels.

On the other hand, data augmentation can expand the space of sampled features by increasing the size and complexity of training samples, which can reduce the heavy dependence of deep learning models on large, labeled datasets. The effectiveness of various seismic data augmentation strategies on the performances of deep learning models was discussed in detail by Zhu et al. (2020). They compared the performance response for the corresponding PhaseNet model after adopting data that are processed by various data augmentation methods. The employed augmentation strategy is particularly well suited to seismic data and includes random shift, superimposing events, superposing noise, false-positive noise, channel dropout, resampling, and synthetic data generation.

Experiments by Woollam et al. (2019) further demonstrate that with data augmentation by, for example, adding varying degrees of Gaussian noise, a relatively simple CNN architecture (consisting of three convolutional layers and three upsampling layers) is also capable of achieving high performance on seismic phase picking tasks without a large training dataset.

4.5.3. Earthquake detection and localization

Earthquake detection and localization are fundamental for most quantitative seismological analyses, since the vast majority of seismological investigations depend on knowledge associated with seismic sources or their generated elastic waves (Mosher and Audet, 2020).

4.5.3.1. The use of CNN-based models. Small and moderate earthquakes occur more frequently than large events, and typically feature lower signal-to-noise arrivals, which tend to be misinterpreted as false arrivals. Therefore, it is complex to detect and locate small to moderate earthquakes by utilizing phase picking alone. Seismic detection and location, especially for small and moderate earthquakes, typically suffer from increased challenges.

For natural resource repositories, volcanoes, plate boundaries, and other areas highly susceptible to small and moderate earthquakes, the rapid provision of robust seismic detection and location can provide a valuable early warning and thus mitigate seismic risk. Existing seismic detection methods are sensitive to noise, and failure occurs under even moderate levels of seismic noise. Furthermore, these methods also have

large computational demands, resulting in difficulties in deploying into early warning systems.

As a preliminary work, Perol et al. (2018) first developed an eight convolutional layer classic CNN model (ConvNetQuake) for seismic detection. After classifying earthquakes into six geographic clusters by applying the K-means algorithm, ConvNetQuake is fed with a window of three-channel waveform seismogram data. The model can accurately label these inputs as seismic noise or events associated with these six geographic clusters. By comparison, the model outperforms other detection methods in regard to computational efficiency, and it is ideally suitable for areas with high seismic activity rates and adequate instrumentation, for reliable and rapid earthquake warnings. In the training stages, ConvNetQuake requires a large catalog of localized events, and such large, labeled datasets are not easily accessible.

ConvNetQuake provides a paradigm for applying CNN to earthquake detection, in which data are preprocessed by segmenting the input waveform into a set of fixed-sized windows and are fed into the CNN to classify each window into a label. Based on ConvNetQuake, Tous et al. (2020) proposed a CNN model, namely, UPC-UCV. The training paradigm of this model is similar to that of the original ConvNetQuake, except for improvements in the data preprocessing strategy and model structure, for example, the utilization of undersampled negative data, an increase in the filter size (from 3 to 20) and a reduction in the convolutional layer (from 8 to 4), which resulted in better model performance for P-wave detection. These models classify seismic signals and noise in the time domain.

Both aforementioned models exploit the K-means algorithm to geographically classify the seismic signals, thus allowing the clustering results to serve as labels for the predictions of the CNN-based models. However, the number of clustered results potentially affects the performance of these CNN-based models. For example, the performance of ConvNetQuake decreased with experiments involving a larger number of clusters (Perol et al., 2018).

Another challenge in detecting and locating small and moderate earthquakes is that the high-frequency arrival waves of these seismic events tend to occur within narrow intervals, which causes the signals collected by the seismic network to overlap. When numerous arrivals are received, it is a struggle to distinguish which arrivals are generated from the identical source.

To address this problem, McBrearty et al. (2019) fed pairs of seismic waveform arrivals between two stations into a four-layer classic CNN, thus predicting whether both waveforms originate from the same source or from different sources. The crucial idea is to leverage the advantage of CNN-based models that can directly map the input data to an arbitrary output target by exploiting convolutional operations. For this work, whether two waveforms are from the same or different sources is considered as the binary classification target, which achieved accurate classification results.

The aforementioned studies revealed that a reasonably targeted output increases the opportunities for deep learning models in earthquake investigation. Is it then possible to establish a reasonable prediction target, which enables CNN-based models to achieve rapid and accurate localization for seismic events? As an interesting preliminary work, Kriegerowski et al. (2019) exploited a three-layer classic CNN model for regression analysis for full-waveform multichannel seismic records, successfully achieving accurate detection of the Cartesian coordinates (east, north, and depth) of seismic events.

Numerous experiments have demonstrated that CNN-based deep learning models are excellent at learning the morphological patterns of most waveforms in seismic data. However, high-dimensional sparse seismic reflection signals are more complex, considering wave propagation, frequency, amplitude and polarity orientation. In this context, when employing simulated reflection signals for detection tasks, it is a

Instead of a common practice, which draws inspiration from elaborate CNN-based models in other fields, Geng and Wang (2020) exploited a data-driven neural architecture search strategy to obtain a highly resource-efficient CNN architecture for seismic data detection. By performing hundreds of thousands of random searches in a constrained structure space, they obtained a compact CNN structure (SeismicPatchNet), a model that can accurately classify multiple seismic reflection datasets simultaneously at a considerably low computational cost. Despite the substantial time and resources required for training, the availability of transfer learning will allow this highly lightweight and computationally efficient model to provide more practical value.

4.5.3.2. Use of other deep learning models. Although the CNN-based model performs well, it is not the only available deep learning model for earthquake detection and location. For example, as a general-purpose approximation, the MLP model with considerably simpler architecture is also available for seismic data and achieves satisfactory performance.

In an integrated approach combining a time-delay projection mapping method with an MLP model, Mosher and Audet (2020) demonstrated that even a basic MLP model can achieve satisfactory performance in seismic detection and localization tasks, after applying a favorable mapping approach for processing features of seismic data. In this integrated approach, the time-delayed projection mapping captures the coherence information of the seismic wavefield and then provides input features for the MLP model; the MLP model predicts the time period in the dataset that encompasses the seismic event. Without the requirement of specifying detection threshold parameters, the method performs well in identifying the locations and origin times of earthquakes on a large dataset.

On the flip side, RNN models, which specialize in handling sequential data, are applicable to capturing the time-domain features of seismic signals. In a task to identify seismic events and quarry blasts, Linville et al. (2019) employed a basic CNN model and a basic LSTM model to learn meaningful information from spectrograms, respectively. The results demonstrate, for time-dependent signal modeling, that the LSTM model is a more attractive alternative, with the ability to capture the time dependence.

Furthermore, one of the advantages of RNN-based models is that the models can be deployed for real-time seismic detection, a task that is extremely difficult for most CNN-based models that utilize the full seismic waveform as input. Considering this advantage, Chin et al. (2020) developed two LSTM models for rapid and accurate real-time seismic detection, which were deployed in a real-time earthquake early warning system. The objective of the LSTM-based seismic warning system is to obtain the occurrence of seismic events and the duration of P-wave and S-wave.

Considering that CNNs are able to capture the local correlations of spatial information in images, hybrid deep learning models combining CNNs and RNNs are promising in capturing contextual (e.g., temporal, spatial, and logical) features from multidimensional data. Following this idea, Zhou et al. (2019) considered seismic signals as a combination of nonsequential and sequential signals and achieved event detection and phase picking by leveraging a hybrid approach combining a CNN-based model and an RNN-based model. First, an eight-layer vanilla CNN is used to detect seismic events from a three-component seismogram. This classified event seismogram is then fed to a two-layer bidirectional RNN that labels events as phase arrivals and noise, similar to handling the sequential labeling problem. Although the advantages of both models are maximized, in this hybrid approach, the two models are trained in two separate steps.

A more efficient solution is to develop hybrid models in an end-to-end manner. During the training of an end-to-end hybrid model, all
challenge to design a high-performance CNN-based model, which components (i.e., various deep neural networks) in the model can be
should achieve high accuracy classification and low computational trained simultaneously instead of sequentially (Graves et al., 2016).
costs.


With the end-to-end solution, information loss in traditional multi-stage earthquake detection paradigms can be reduced and the information contained in seismic data can be exploited more fully to further improve seismic detection performance (Zhu et al., 2021). In an earthquake detector termed CNN-RNN Earthquake Detector (CRED), Mousavi et al. (2019c) demonstrated the robustness of the hybrid deep learning model in detecting seismic events based on the spectral structure of the seismic signal, for example, the ability to detect small and weak events rapidly and accurately with multiple waveform shapes and low sensitivity to background noise levels. CRED ingeniously combines RNN-based models with CNN-based models in the form of residuals, which automatically extract sparse features from seismograms by exploiting convolutional layers, and then leverage LSTM to learn the time-frequency features of seismic signals, while the residual connections allow the model to capture higher-level features in deeper layers and reduce computational complexity. This elegant and flexible architecture does not involve a large number of parameters, which indicates that it is lightweight, and it is therefore expected to be deployed in real-time early warning systems.

A more promising aspect of these end-to-end models is that they also enable the simultaneous performance of multiple tasks by developing specific deep neural networks for each task. For real-world applications, a deep learning model that is capable of simultaneously performing multiple tasks is considerably preferable to a set of independent deep learning models, each dedicated to a particular task. Such a multi-task end-to-end model allows for efficient inference on the one hand, and the sharing of informative and important features across related tasks on the other.

Recent work by Mousavi et al. (2020) presented the preliminary results of simultaneous seismic detection and phase picking by developing an end-to-end hybrid model (EQTransformer). The model achieves not only high accuracy in phase picking comparable to manual picking, but also high efficiency and sensitivity in detecting seismic events. EQTransformer consists of a deep encoder and three independent decoders, which are built from basic components (e.g., 1D convolutions and bidirectional and unidirectional LSTMs). The encoder converts the seismic signal in the time domain into higher-level representations and contextual information. Based on this information, the decoders produce three probability sequences, which give the presence of the earthquake signal, the P-wave, and the S-wave at each time point. For simultaneously conducting phase picking and earthquake detection, two attention modules are incorporated: one for selecting phases in the seismic signal at the local level and the other for detecting earthquakes by adopting a more global view of the full waveform.

To capture the full potential of available information contained in the data from multiple seismic stations, Zhu et al. (2021) developed an end-to-end hybrid model called EQNet with a joint training approach. The model consists of three components, which are mainly composed of a series of 1D CNNs, corresponding to the three tasks of feature extraction, phase picking, and event detection, respectively. During the training process, the parameters of these three components are optimized jointly, avoiding the independent hyperparameter tuning of each stage in multi-stage earthquake detection workflows. This jointly optimized training strategy allows the hybrid model to effectively retain important information from seismograms collected at multiple stations, resulting in satisfactory performance in several tasks.

The end-to-end manner is highly flexible and presents an upcoming and interesting perspective: these aforementioned hybrid models can be applied to more tasks associated with seismic data analysis, achieving generalization across multiple tasks by learning more combinations of features in seismic data.

Furthermore, there have been a few explorations in unsupervised learning for seismic detection. Compared with supervised deep learning models that favor the detection of already known signals, unsupervised deep learning models are more applicable to unlabeled seismic signals, and can thus search for new classes of seismic signals.

For example, Seydoux et al. (2020) developed a novel unsupervised deep learning framework for detecting and clustering seismic signals in continuous seismic data. The framework combines a deep scattering network (an interpretable CNN model) with a Gaussian mixture model, where the deep scattering network is employed for feature extraction of seismic data and the Gaussian mixture model is employed for clustering the data in the latent space. The framework can effectively perform cluster analysis and distinguish complicated seismic signals of various characteristics, durations, amplitudes, and frequency contents.

Another application of unsupervised learning is to alleviate the dependence of deep learning models on high-quality labeled training data. For this purpose, Wang et al. (2019c) performed a series of experiments by exploiting CGAN, aiming to solve two challenges: (1) to reliably generate synthetic data by adopting labeled data that are limited, and (2) to employ these synthetic datasets to further hone the detection models. In the developed CGAN model, the generator employs three pipelines for handling three-component waveforms, each with the identical structure of a four-layer basic CNN model. Different from a classic GAN, CGAN requires the input of additional seismic/non-seismic label information, thus overcoming the limitations of random samples generated with Gaussian noise. The results demonstrate that the synthesized high-quality waveforms (both seismic and non-seismic) augment the available training data and, hence, increase the accuracy of the earthquake event detection task.

4.5.4. Earthquake characterization

Earthquake characterization, which includes the estimation of the epicenter location, source depth, and magnitude, can facilitate timely emergency response and information dissemination in earthquake early warning systems. Advances in deep learning render the rapid estimation of earthquake characterization possible without manual analysis. A challenge in applying deep learning models for earthquake characterization is how to achieve rapid automated characterization by applying short, single-station waveforms.

As a preliminary exploration to solve this problem, inspired by ConvNetQuake (Perol et al., 2018), Lomax et al. (2019) developed a novel modified model (ConvNetQuake_ingv), consisting of a 9-layer basic CNN. Distinctively, classification in the final layer employs a fully connected layer of 127 neurons, where neurons represent different seismic features, for example, azimuth and magnitude. The final convolutional layer has only 64 neurons. The design enables the last convolutional layer to be available as a bottleneck layer to extract meaningful features as a compact representation of high-dimensional data, improving the performance of the model. It was demonstrated that the model can detect earthquake events over a wide range of distances and magnitudes and obtain detailed information on the corresponding local distances, azimuths, depths, and magnitudes (see Fig. 16d).

CNN-based models have demonstrated impressive performance in earthquake characterization. However, the convolution operation is sensitive to particular information of the input data, such as amplitude. One solution is to replace the CNN-based models with RNN-based models, such as LSTM and GRU, since the commonly used RNN variants are insensitive to this information due to the design of the gating mechanism. To this end, Mousavi and Beroza (2020) developed a hybrid deep learning model consisting of CNN and LSTM layers, where CNN aims only at reducing the dimensionality and extracting features, and LSTM aims at estimating earthquake magnitude based on amplitude information of signals. A series of tests have demonstrated that the model can estimate seismic magnitudes directly from the raw waveforms that are collected from single stations.

The above models do not consider the geographic locations of the seismic stations while focusing on the analysis of single station waveforms. Indeed, when seismic networks are sparse, microearthquakes are seldom recorded at multiple seismic stations. For such scenarios, it would be an interesting opportunity to link data from multiple stations and perform earthquake characterizations.
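One minimal way to link data from multiple stations, sketched here under strong assumptions, is to append each station's coordinates to its feature vector and then pool across stations with an operator that ignores station order and station count. The function names `attach_location` and `pool_stations`, the coordinate scaling, the choice of an element-wise maximum, and the feature values are all hypothetical and are not taken from any of the reviewed works.

```python
from typing import List, Sequence


def attach_location(features: Sequence[float], lat_deg: float, lon_deg: float) -> List[float]:
    """Append station coordinates, scaled to [-1, 1], to one station's features."""
    return list(features) + [lat_deg / 90.0, lon_deg / 180.0]


def pool_stations(station_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over stations (an assumed pooling choice).

    The output has a fixed length and does not depend on the order of the
    stations or on how many stations recorded the event.
    """
    if not station_vectors:
        raise ValueError("at least one station vector is required")
    width = len(station_vectors[0])
    if any(len(vec) != width for vec in station_vectors):
        raise ValueError("all station vectors must have the same length")
    return [max(vec[i] for vec in station_vectors) for i in range(width)]


# Hypothetical per-station features, e.g. outputs of a waveform encoder.
station_a = attach_location([0.2, 0.9], lat_deg=45.0, lon_deg=-90.0)
station_b = attach_location([0.7, 0.1], lat_deg=0.0, lon_deg=90.0)

event_vector = pool_stations([station_a, station_b])
print(event_vector)  # [0.7, 0.9, 0.5, 0.5]
```

Because the pooled vector has the same length for any number of stations, a downstream regressor for location or magnitude could, in principle, consume it whether an event was recorded by two stations or by twenty.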


Considering the non-Euclidean properties of seismic networks, van den Ende and Ampuero (2020) employed a GNN to aggregate data from multiple seismic stations. In this work, the seismic network is represented as a graph, where each node represents a single station and has both temporal and spatial attributes, including a waveform time series and a geographic location. The graph has four inherent attributes, namely the latitude, longitude, depth, and magnitude of the source of the earthquake. This model consists of several components. A CNN block is designed to process the waveform for each node, the first MLP block is designed to aggregate these processed features to obtain the corresponding graph representation, and the second MLP block predicts four graph attributes, i.e., the earthquake characterization. Although this GNN-based approach has limitations in regard to the real-time response, it can provide more inspiration for investigations on the rapid dissemination of earthquake information.

4.5.5. Focal mechanisms investigation

Crustal stress fields are fundamental to unraveling the generation of earthquakes, estimating past and present tectonic activity and exploring possible future earthquake generation. The measurement of stresses at depth throughout extended areas is a challenge, and one solution is to thoroughly investigate the focal mechanisms of earthquakes, especially microearthquakes located in areas of low seismicity. Typically, for the determination of the focal mechanisms, in addition to arrival times, the first-motion polarities of P-waves, i.e., the initial upward or downward motion of the vertical component, are indispensable information (Lentas, 2018; Stein et al., 2003).

With the increasingly large amount of seismic data, it is a challenge to automatically conduct more efficient and accurate measurements for quantities (especially first motion polarities) related to the focal mechanism. It is apparent that deep learning models are an appropriate solution. On the one hand, CNN-based models have robust pattern recognition capabilities for seismic data; on the other hand, deep learning models have excellent flexibility in allowing the mapping of given inputs to various expected outputs. For example, deep learning models can achieve not only predictions of P-wave arrival times, but also classifications of whether the first motion of a P-wave is up or down.

Building on this idea, Ross et al. (2018) developed a deep learning framework involving two structurally similar classical CNN models, to achieve first motion polarity classification. The framework consists of two steps. The first CNN model predicts the accurate arrival time of each P-wave and applies it as a label. By using the labeled samples, the second CNN model, which contains a softmax activation function as the output layer, classifies the first motion polarity into one of three categories (up, down, or unknown). The performance of this method is comparable or even better than that of human experts.

However, challenges remain in this preliminary work. The CNN model used for phase picking is only responsible for regression and cannot indicate whether a phase is present or not, which may further affect the classification performance of the second CNN model. Furthermore, their training data come from more than 2.5 million seismic waveform data painstakingly handpicked and labeled over nearly two decades in southern California, which is difficult for other researchers to obtain.

As an improved solution, to focus on the automatic determination of P-wave first motion polarity, Hara et al. (2019) abandoned the CNN-based prediction of P-wave arrival times and instead used the P-wave arrivals determined by human experts. They developed a 7-layer classic CNN model and demonstrated the high accuracy of the model in determining P-wave first motion polarity, by using waveform data observed in multiple regions. This work indicates that CNN models can be released from region dependence, which broadens their applicability. Considering the subjectivity of human experts when performing phase picking, certain human mistakes can still affect the performance of this CNN model.

Similarly, Uchide (2020) utilized a basic two-layer CNN model, where the P-wave arrival times in the input were picked by human experts, to determine whether the first motion polarity was ‘up’ or ‘down’ (see Fig. 16e). To handle the small sample dataset, the work exploited a dual-stage strategy to train the deep learning model. The first step is to train the model using a large dataset of medium to large earthquakes. The next step is to retrain the model using microseismic data. The results demonstrate the effectiveness of this work in determining microseismic focal mechanisms. However, this training strategy does not completely solve the limitations of the model in small datasets since the model performance is positively correlated with the number of stations in different regions.

The above models are mainly concerned with exploiting the individual trace information. To simultaneously consider the polarity of regular arrays, Tian et al. (2020) developed a four-layer basic CNN model (MTCNN) for simultaneously exploiting the polarity information of neighboring receivers of a station array. The MTCNN is distinctive in that the number of neurons in the final fully connected layer corresponds to the number of neighboring receivers, with labels -1, 0, and 1 on each neuron indicating that the first motion of the waveform was negative, unknown, and positive polarity, respectively. Compared with a CNN model for single trace information (STCNN), the MTCNN resulted in significantly less polarity prediction error.

Large datasets labeled by human experts provide robust discriminations of the first motion. It is desirable to explore additional solutions in the manner of deep learning, thus escaping the dependence on labeled data, manual feature engineering, and large training sets. A preliminary exploration in this direction was completed by Mousavi et al. (2019b), who developed an AE-based model consisting of eight one-dimensional convolutional layers, to determine whether the first motion polarity is an upward or downward motion. In this work, the encoder identifies the polarity of a waveform by learning its simple patterns, after mapping the time-series waveform to a reduced dimensional latent space. Experiments on small datasets have demonstrated the effectiveness of the model. Furthermore, the model is insensitive to noise. Accordingly, the model may contribute to obtaining better performance in determining the first-motion polarity or other tasks, under challenging signal-to-noise conditions.

4.6. Volcano activities detection

Volcanic unrest is known to trigger a variety of secondary hazards that threaten local populations, economies, and infrastructure and even affect global air traffic and climate change. Around the world, 800 million people live within 100 km of volcanoes (Loughlin et al., 2015). For the reduction of severe devastation that is caused by volcanic eruptions, the identification and tracking of volcanic activity are reliable approaches that have proven effective (Auker et al., 2013; Mei et al., 2013).

Additional details are presented in Table 8 (see the table in the appendix).

4.6.1. Classification of volcanic seismic events

Due to the relatively low cost and availability of real-time data with a high temporal resolution, monitoring systems are deployed at Holocene volcanoes around the world for the monitoring of volcanic seismic events, which generate an increasing number of signals in the form of continuous data streams (Lapins et al., 2020; Soubestre et al., 2018).

The challenge in discriminating volcanic seismic events is that a large number of factors (e.g., soil characteristics and station site effects) can have an impact on the monitored signal by altering the phase, amplifying, or attenuating amplitude. These signals can typically be utilized to monitor and identify volcanic activity. Therefore, it is imperative that effective pattern recognition methods be employed to analyze signals associated with volcanic activity.
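A common starting point for such pattern recognition is a time-frequency view of the recorded trace. The following is a minimal sketch, using only the Python standard library, of a magnitude spectrogram computed with a Hann-windowed short-time discrete Fourier transform. The helper name `stft_magnitude`, the frame length, the hop size, and the synthetic 4 Hz test tone are assumptions chosen for illustration, not settings from any reviewed study.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitude(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len) for n in range(frame_len)]
    frames: List[List[float]] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT sum; adequate for short illustrative frames.
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic trace (assumption): a 4 Hz tone sampled at 64 Hz for 2 s.
fs = 64
trace = [math.sin(2.0 * math.pi * 4.0 * t / fs) for t in range(2 * fs)]

spec = stft_magnitude(trace, frame_len=32, hop=16)

# Frequency resolution is fs / frame_len = 2 Hz per bin, so 4 Hz falls in bin 2.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Each row of `spec` is one time frame and each column one frequency bin; rendering such an array as an image gives the kind of input that image-based classifiers operate on. In practice a fast Fourier transform from a numerical library would replace the direct sum used here.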


Spectrograms are a common type of data in seismology. Applications in seismic analysis have demonstrated that deep learning models, especially CNN-based models, show superior performance in pattern recognition, can extract the features of signals in spectrograms, and perform accurate classification. As an extension of a similar scenario, it is promising to introduce deep learning models into the classification of volcanic seismic events.

Building on this idea, Curilem et al. (2018) converted spectrograms to 20 × 20 pixel RGB images as input and adopted a simple CNN model with two convolutional layers to effectively classify various types of seismic events, including volcano-tectonic events, long-period events, tremor events, and tectonic events. The proposed CNN model captures the differences between the two-dimensional local structures (bandwidth and amplitude) that are formed from the time-frequency dependence of each event type, and it can even identify spectrograms that contain noise.

To explore the applicability of more deep learning models in the classification of volcanic seismic events, Canário et al. (2020) compared the performances of three popular deep learning models (CNN, LSTM, and MLP) in the classification of seismic signals. Furthermore, as an improvement, they designed a novel CNN model that consists of a series of 1D convolutions and termed it SeismicNet. SeismicNet directly exploits the original signal as input without converting it to images. Eliminating the requirement of an explicit signal preprocessing step accelerates the classification of the four types of volcanic events. Indeed, expeditious classification of the collected volcanic seismic signals is essential in eruption crisis scenarios.

Similar to the dilemma experienced by deep learning models applied to other geohazard-related tasks, large labeled datasets for model training are also rare in volcanic seismic event classification tasks. Similarly, the most promising solution is to introduce transfer learning. To this end, Titos et al. (2020) attempted to extract high-level features of the spectrogram by adopting a pre-trained classic CNN model (LeNet) under a transfer learning-based strategy. This approach effectively realizes the classification of seismic volcano signals and reduces the computational cost that is required to develop a new model from scratch.

4.6.2. Detection of volcanic deformation

Another promising opportunity for relieving the detection of volcanic activity from the unavailability of data is the detection of volcanic deformation by remote sensing techniques. Satellite imagery data with large coverage raise exciting prospects, especially for volcano activity detection in remote areas where in situ monitoring is not available (Loughlin et al., 2015). The most intriguing data come from InSAR, which produces wrapped interferograms containing multiple fringes, making it a desirable input for deep learning models that are employed to detect volcanic deformation, especially for CNN-based models. A fringe in a wrapped interferogram, which is a high-frequency feature, is suitable for edge detection in the CNN model.

To explore the performance of CNN-based models in detecting volcanic surface deformation by using InSAR data, based on the transfer learning strategy, Anantrasirichai et al. (2018) accurately classified interferometric fringes in wrapped interferograms by applying a pre-trained CNN model (AlexNet). As a solution to the highly imbalanced problem in the dataset, i.e., only four out of 900 volcanoes are deformed, they used four data augmentation methods to generate more positive patches, thus balancing the training dataset to avoid creating a biased deep learning model.

A major challenge to the performance of CNN-based models in detecting volcanic surface deformation by employing InSAR data is the existence of atmospheric signals in interferograms. When atmospheric signals appear as fringes in the wrapped interferogram, CNN-based models tend to classify the signals as volcanic deformation, thus resulting in false-positive identifications.

To improve the ability of CNN-based models to distinguish between atmospheric artifacts and deformation signals, Anantrasirichai et al. (2019) further employed synthetic interferograms (e.g., deformation, stratified atmosphere, and turbulent atmosphere) to augment deformed samples. The results of fine-tuning the pretrained AlexNet model revealed that, compared with the model trained by using real interferograms alone, the performance of the model trained by using synthetic interferograms was improved. This work demonstrates that data augmentation can reduce false positives caused by stratified atmospheric effects to some extent.

An alternative solution is to improve the deep learning model, enabling automatic denoising, and further eliminate stratified atmospheric effects in InSAR products. Based on the U-Net model, Sun et al. (2020) developed an end-to-end model for detecting volcanic surface deformation directly from interferograms and undisturbed by atmospheric noise. The model has improved the capability of localization and extracting features, by substituting the combination of upsampling and convolution in the original U-Net model with deconvolution. It was demonstrated that the developed U-Net-like end-to-end model can directly detect volcanic surface deformation by exploiting interferograms and is immune to noise interference.

5. Challenges and opportunities

In the following sections, we will discuss the challenges and opportunities in the applications of deep learning models in geological hazard analysis, where large amounts of data that are collected from multiple heterogeneous sources are employed.

5.1. Challenges

5.1.1. The burden of training data collection

Deep learning approaches are driven by enormous amounts of data, and their effectiveness depends on the availability of training data. In other words, deficiencies in the quantity or quality of the training data could strongly affect the performance of the model. However, in many cases, satisfactory training data are inevitably in scarce supply.

Specifically, challenges in the quantity of training data are commonly associated with the expense of data collection, where prohibitive costs limit the coverage of monitoring devices that can collect in-situ information, or prevent high-resolution remote sensing data from being available. Challenges in training data quality are predominantly associated with noise and missing values. For example, in seismic records, noise is present in the entire signal bandwidth; in high-resolution remote sensing images, deficiencies in the spectral domain typically cause salt-and-pepper noise. As mentioned in the above section, the challenges associated with noise can typically be addressed by improvements in deep learning models.

Additionally, even if high-dimensional, high-quality, large-scale datasets can be collected, more challenges remain. First, for different geohazard-related tasks, more appropriate data enable the application of deep learning models to achieve more efficient and accurate performance. The question is how to select more appropriate data in a more automatic manner. Second, most classic deep learning models need to handle a cumbersome data cleaning step. More automatic and adaptive data cleaning by deep learning models is expected. Finally, supervised deep learning models are currently still the simplest and most applicable models for most tasks related to geohazard analysis, which implies that labeling is still required for large-scale collected datasets. It is a tremendously labor-intensive and time-consuming process.

A convenient solution is to exploit publicly available labeled datasets. Several works have been initiated to provide the corresponding open-source datasets, which could facilitate the development of deep learning models for multiple geological hazard-related tasks. For example, in the task of landslide prevention, Ji et al. (2020) contributed a large publicly available landslide dataset, including landslide/non-landslide images,


files with marked landslide boundaries, and relevant DEM data. The landslide inventory that was served as input into sub-images of the
dataset is available at http://study.rsgis.whu.edu.cn/pages/download/. corresponding dimensions required for different CNN structures, the
For the earthquake analysis task, Mousavi et al. (2019a) contributed a prediction results obtained are different. As demonstrated by Wang et al.
large-scale, high-quality dataset recorded by seismic instruments around (2020a), they converted each one-dimensional factor vector into a
the globe, including local earthquake waveforms and seismic noise two-dimensional matrix input to the basic 2D CNN model through an
waveforms without earthquake signals. The dataset can be accessed ingenious method, and the model captured more valuable features from
through https://github.com/smousavi05/STEAD. Indeed, these pub­ that data representation, resulting in the most accurate landslide
licly available datasets can also be utilized as benchmarks, thus estab­ sensitivity maps. Similarly, seismograms from multiple stations are
lishing an equitable assessment environment, for the challenges posed considered image data, sequence data, and even graph-structured data
by comparing different deep learning models. that integrate station information.
However, this kind of public dataset is extremely uncommon. While Furthermore, it is desirable to obtain good representations through
calling for more similarly standard large, labeled datasets, more solu­ improvements in deep learning models, which in turn improve the
tions to increasing the diversity and volume of training data are desired. performance of the models. Several good representation properties
Data augmentation is a reasonable alternative. Low-level data include smoothness, temporal and spatial consistency, sparsity, and
augmentation techniques involve operating on the data itself, including natural clustering, among others (Bengio et al., 2013). For example, in
flipping, panning, rotating, and adding noise to images (Zhang et al., earthquake analysis, the elaborate model construction developed by
2021). As mentioned in the above section, several works have been Mousavi et al. (2019c) allows for a sparser representation from the
conducted to explore the effectiveness of different data augmentations. seismic signal.
For example, for data augmentation of seismic signals, Curilem et al. Ideally, most scenarios would achieve more satisfactory performance
(2018) utilized methods involving amplitude deformation and fre­ by utilizing data from a single source in one format. Indeed, most ap­
quency deformation, adding random standard deviation as noise to the plications of deep learning models related to geohazard analysis involve
signal, and horizontal shifts. Zhou et al. (2019) utilized methods data that are heterogeneous from multiple sources. For example, in
involving the addition of different levels of Gaussian noise. More landslide detection, a model may require information from optical im­
low-level data augmentation methods are pending comparison and ages, DEM data, and various landslide inventory maps. The challenge
analysis. remains to fuse two or more relevant data involved in a certain geo­
Correspondingly, another potential solution is the use of high-level hazard task to enhance the performance of the deep learning model.
data augmentation methods, i.e., data synthesis based on generative Typically, there are three different levels for fusion: data (e.g., data
models, especially GANs. This type of data augmentation method was transformation to harmonize data sources), features (e.g., the fusion of
initially applied in earthquake analysis. Considering its advantageous data representations by leveraging deep learning models), and decision
performance, more exploration is warranted. Furthermore, such data making (e.g., fusion when decision making). Considering that deep
augmentation methods are desirable in other geohazard related tasks. learning models are inherently competent in data fusion, feature fusion
For example, can GAN models be applied to achieve data augmentation, is an efficient alternative. In this regard, Baltrusaitis et al. (2019)
to achieve the expected performance in the detection of hazard areas (e. reviewed how to achieve multi-source data fusion, by developing deep
g., landslides), when satellite image data are missing due to cloud cover learning models that can learn multiple data representations. On the
or sparse ground stations? other hand, Salcedo Sanz et al. (2020) reviewed fusion associated with
On the other hand, transfer learning, a typical solution in deep Earth observation data. These fusion methods from other domains could
learning for limited labeled datasets, has been applied in deep learning be regarded as inspiration, and thus extended to applications in
related to various geohazard analyses. Corresponding implementations geohazard-related analysis.
are predominantly in detection tasks involving the utilization of remote
sensing imagery, e.g., landslide and avalanche detection, where pre- 5.1.3. Interpreting the deep learning models
trained models are obtained by training mainly on ImageNet datasets For deep learning models, the most critical challenge is interpret­
(Sinha et al., 2019a; Lu et al., 2020). These works have demonstrated the ability, meaning that it is difficult to explain how these outstandingly
applicability of transfer learning techniques, including pre-training and performing models achieve these results. This lack of interpretability
fine-tuning, in addressing the absence of geohazard-related available undermines the applicability of deep learning models, even in situations
datasets. where they outperform human experts (Reichstein et al., 2019). A
The dependence of deep learning models on large labeled datasets convenient and expedient solution is to adapt a number of common
can also be alleviated by combining deep learning models with physics- techniques from the field of computer vision. For example, differenti­
based models. For example, methods such as numerical simulations are ating the contributions of different components in a deep learning model
employed to learn the governing laws behind physical phenomena, by exploiting ablation experiments, or explaining the functionality of
which then generates the training data required by deep learning model components by visualization.
models. As attempted in the work by Yokoya et al. (2020), the dynamics Several attempts have now been made to demonstrate the effec­
of debris flow erosion and deposition were considered, and simulations tiveness of these interpretative methods from the field of computer
based on the corresponding governing equations were computed, vision, mostly by using visualization to explain the role of attention
culminating in sufficient training data for the proposed deep learning mechanisms in deep learning models. For example, for landslide
model. Many more similar attempts are expected. detection, Ji et al. (2020) applied the Grad-CAM visualization method to
analyze the function of the different attention modules, revealing that
5.1.2. Representation and fusion of multi-source data the heat map of the proposed attention module covered the actual
In deep learning, representing data into a format that a model can landslide area more accurately. For earthquake detection and phase
handle has consistently been a challenge (Baltrusaitis et al., 2019). A picking, Mousavi et al. (2020) also observed how deep learning models
data representation refers to the vector or tensor representation for an focus on different parts of the waveform at different attention levels, by
entity, which can be from an image, an audio sample, a time series visualizing the weights of attention modules.
recording, etc. Appropriately crafted inputs that represent data in a However, merely interpreting how deep learning model results are
meaningful way to obtain more high-level features, are essential to the obtained is insufficient for the increase in the availability of geohazard
performance of deep learning models for applications in geohazard analysis. Fortunately, a large number of physics-based models based on
analysis. the physical theory of systems related to geohazards have been available
For example, in landslide susceptibility assessments, converting the from the past. Theoretically, these physics-based models are directly


interpretable. Therefore, the introduction of physics-based models into consumption, for example, high-performance computing (HPC)
deep learning models, enabling data-driven and theory-driven syn­ resources.
ergies, would be a particularly promising opportunity (Reichstein et al., Several solutions that are potentially available include exploring
2019). This combination of incorporating physical implications into more efficient deep learning models and dimensionality reduction or
deep learning models is currently being applied in the field of natural fusion of data. Another promising solution is to improve the concurrency
disasters. of deep learning models. More powerful parallel and distributed
As an example, regarding tsunamis caused by earthquakes, Cheng frameworks for deep learning models are desirable, to speed up the
et al. (2020) developed a CGAN model that provides a rapid and reliable training process and adapt these scalable models to the processing and
spatiotemporal prediction for nonlinear fluid flows, where both the analysis of geohazard related data.
generator and discriminator employ a CNN model for spatiotemporal For example, partitioning a deep learning model across several GPU
feature extraction. The method simulates a tsunami with a numerical devices reduces the memory consumption on each GPU with the extra
computational model (Fluidity), where two series of waves with advantage of parallel speedup (Dean et al., 2012). For complex deep
different wave phases and wave heights are set as boundary conditions, learning models involving a large number of operational parameters,
and their processed outputs are fed into the CGAN model. This method tensor partitioning is an interesting solution that enables parallel
facilitates the elucidation of the uncertainties that are associated with speedups in different dimensions (Gao et al., 2017).
hazards (e.g., tsunamis) due to variability in factors (e.g., waves). Few studies are currently exploring scalable deep learning models for
Overall, the combination of physics-based models and deep learning geohazard related analysis, considering that the application of deep
models can not only exploit the theoretical advantages of physical learning models in the field is inherently in its infancy. More advances
modeling for interpretability but also exploit the advantages of deep are expected in the era of big data.
learning models for capturing high-level features of the data.
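The partitioning strategy mentioned for scalable models, in which a model or its tensors are split across several GPU devices, can be illustrated with a minimal sketch. This is a toy under stated assumptions and not the scheme of Dean et al. (2012) or Gao et al. (2017): the weight matrix of a single dense layer is split column-wise into shards, plain Python lists stand in for devices, and the helper names `matvec`, `split_columns`, and `partitioned_forward` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Return the row-vector product x @ w for an n-by-m weight matrix w."""
    n_out = len(w[0]) if w else 0
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(n_out)]


def split_columns(w: Matrix, n_parts: int) -> List[Matrix]:
    """Split w column-wise into n_parts contiguous shards, one per device."""
    n_cols = len(w[0])
    base, extra = divmod(n_cols, n_parts)
    shards, start = [], 0
    for part in range(n_parts):
        width = base + (1 if part < extra else 0)
        shards.append([row[start:start + width] for row in w])
        start += width
    return shards


def partitioned_forward(x: List[float], w: Matrix, n_parts: int) -> List[float]:
    """Compute x @ w shard by shard and concatenate the partial outputs."""
    outputs: List[float] = []
    # In a real system each shard would be stored and multiplied on its own device.
    for shard in split_columns(w, n_parts):
        outputs.extend(matvec(x, shard))
    return outputs


# Illustrative values only: a 2-input, 3-output layer split over two "devices".
x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
full = matvec(x, w)
sharded = partitioned_forward(x, w, n_parts=2)
```

Each shard holds only a fraction of the weights, which is the memory saving the text refers to. The loop here runs sequentially, so the sketch shows only that the partitioned result matches the unpartitioned one, not the parallel speedup itself.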
5.2. Opportunity
5.1.4. Reliability of deep learning models
Considering that most deep learning models applied to geohazard In the future, it is envisaged that there will be a dramatic prolifera­
analysis are expected to be deployed in disaster warning systems, reli­ tion of applications of deep learning models in geological hazard anal­
ability is a prerequisite to ensure the applicability of these models in ysis. As available data continue to accumulate and hardware devices
real-world applications. A critical challenge is how to enable trusted become more powerful, deep learning methods are constantly being
results from deep learning models, thereby increasing their reliability. evolved, and novel and interesting applications come along for the ride.
Enhancing interpretability is a preliminary solution, but it cannot solve Two interesting and exciting opportunities are recommended herewith,
the underlying problem; improvements in the generalization and involving the following emerging concepts: knowledge graphs and
repeatability of models are still required. digital twins. The former allows the construction of knowledge bases
Improving the generalization of a model involves the question, is the and inference engines for geological hazard-related analysis involving
model predictive of unknown data? For example, a unique seismic deep learning, while the latter facilitates a broader environment for the
source or a landslide map or a seismogram from a region that has not deployment of deep learning models.
been utilized for training. Most of the aforementioned work has On the one hand, knowledge graphs represent the integration of
demonstrated that both model improvements and data enhancements human knowledge in a structured form, contributing to the learning and
can increase the generalization of deep learning models, and that the inference capabilities of deep learning models, enabling the solutions of
high generalization of these models can be validated by using geohazard complex tasks (Nickel et al., 2016; Wang et al., 2017; Ji et al., 2021).
related data from different regions (Zhu et al., 2020). Specifically, it is a simplified view that represents entities, concepts, and
Another important concern that affects the reliability of deep the relationships between them in certain domains. The major advan­
learning models is repeatability. However, deep learning models applied tages of knowledge graphs are that knowledge graphs: (1) improve the
to different geohazard-related tasks often struggle to replicate the results flexibility of deep learning models, where related knowledge can be
of similar tasks, due to an absence of consistency in the experimental amended and restructured depending on applicative requirements, (2)
environment and a standardized benchmark dataset. Several potential enable semantic inference and retrieval by learning knowledge embed­
solutions include (1) performing training by employing benchmark ding representations, and (3) achieve knowledge sharing, thus ensuring
datasets, (2) providing the open-source code and training models, (3) repeatability and scalability of knowledge. A detailed description of
adequately providing all hyperparameters used during training, and (4) knowledge graphs can be found in the work of Ji et al. (2021).
presenting the statistical significance from the results. Recently, knowledge graphs have been extended to and are exten­
To further increase the reliability of deep learning models, a more sively applied in geoscience. For example, multi-source spatio-temporal
interesting perspective is synergy with physics-based models. For data from Earth observations are integrated with maps, texts and other
example, cross-correction of results with a reliable physics-based model, knowledge from the geoscience literature, to construct a comprehensive
or the design of physical regularization constraints into the loss function dynamic geoscience-related knowledge graph (Zhou et al., 2021).
of the model (Willcox et al., 2021). Similar data sources and knowledge contexts will inspire significant
initiatives for the integration of knowledge graphs, deep learning, and
5.1.5. Scalability for deep learning models geological hazard analysis. For geohazard analysis, an interesting
An important challenge in extending the applicability of deep knowledge graph could include (1) establishing relationships between
learning applications for geological hazard analysis is the scalability of hazard-related factors or different hazards (entities), (2) understanding
the models, which refers to the capability of the model to efficiently geological hazard evolutionary features, and (3) reasoning and pre­
handle increasing workloads (Salcedo Sanz et al., 2020). dicting by leveraging deep learning models.
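The geohazard knowledge graph outlined above, with entities, relationships between them, and reasoning over those relationships, can be sketched as a small store of (head, relation, tail) triples. The example is a minimal sketch under stated assumptions: the entity and relation names are hypothetical illustrations rather than entries from any published knowledge base, and a rule-based graph traversal stands in for the embedding-based inference that a deep learning model would provide.

```python
from collections import defaultdict, deque

# Hypothetical triples (head, relation, tail); illustrative only.
TRIPLES = [
    ("heavy_rainfall", "triggers", "landslide"),
    ("earthquake", "triggers", "landslide"),
    ("earthquake", "triggers", "tsunami"),
    ("landslide", "triggers", "debris_flow"),
    ("landslide", "observed_by", "optical_imagery"),
]


def build_index(triples):
    """Index tails by (head, relation) so that lookups are direct."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def tails(index, head, relation):
    """Return all entities linked to `head` through `relation`."""
    return list(index.get((head, relation), []))


def downstream(index, start, relation="triggers"):
    """Breadth-first traversal: every entity reachable from `start` via `relation`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in index.get((node, relation), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


INDEX = build_index(TRIPLES)
hazard_chain = downstream(INDEX, "earthquake")
```

The traversal recovers a simple hazard chain from the stored relationships. Because the knowledge lives in the triple list rather than in model weights, it can be amended or restructured without retraining, which is the flexibility the text attributes to knowledge graphs.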
Typical deep learning models involve large amounts of data, for On the other hand, a digital twin, especially a digital twin of Earth,
example, in earthquake analysis, where common training datasets refers to an information system where a digital replication of the dy­
include millions of samples. For geological hazard detection tasks, the namic Earth system is achieved, and this replication remains under the
high spectral, spatial, and temporal dimensionalities of high-resolution conditions of real-world observations and physical laws (Bauer et al.,
satellite imagery pose computational challenges for the application of 2021). More specifically, it is an evolving digital world, and by utilizing
deep learning models in geological hazards. The massive training data massive real-time measurements available from real-world sensors,
samples also multiply the complexity of deep learning models, which digital twins can not only portray the historical and current state of the
subsequently require a larger computational burden and higher memory Earth system, but also model and simulate hypothetical scenarios


(various phenomena on Earth) without influencing the real physical world (Saddik et al., 2021).
A digital twin is closely connected to the ever-advancing technologies of the Internet of Things (IoT) and big data (Jing et al., 2012; Mei et al., 2020; Zhang et al., 2018; Hong et al., 2017). As IoT systems become ubiquitous, they can contribute in a timely manner to the generation of higher-precision, high-resolution observations, thus enabling the digital twin to continuously simulate more realistic Earth phenomena in real time. This exciting innovation raises considerable opportunities for the synergistic combination of physics-based models and data-driven deep learning in geological hazard analysis, thus solving a large number of current challenges. Conceptually, a digital twin could be envisaged where physics-based models and deep learning models synergistically demonstrate the evolution of geological hazards and enable the exploration of hazard chains through multiple scenario simulations and, most importantly, achieve reliable and interpretable predictions.
Indeed, to increase the applicability of current digital twin techniques, an application to a small area would be sufficient. A number of studies from other fields can serve as inspiration. For example, Barbie et al. (2021) deployed a collaborative underwater network of ocean observation systems based on a digital twin approach, which has the potential to be applied to the early warning of tsunamis.

6. Conclusion

In this paper, we summarized the recent advances in the applications of deep learning models to geological hazard analysis, along with the future challenges and opportunities. First, we surveyed the six main heterogeneous data sources that are available for the application of deep learning models to geological hazard analysis. Second, we introduced the background knowledge of deep learning and six common models. Third, we reviewed the applications of deep learning for hazard analysis of six typical geological hazards (landslides, debris flows, rockfalls, avalanches, earthquakes, and volcanoes) and summarized common application paradigms. Finally, we identified open challenges and discussed future directions.
The literature survey reveals that (1) deep learning has been applied to the aforementioned geological hazards and is primarily utilized for landslides and earthquakes; (2) deep learning can automatically learn high-level features from various types of data and is less dependent on domain knowledge; (3) the most common application is semantic segmentation of image data by using CNN models to perform binary classification to determine whether a pixel corresponds to a geological hazard; and (4) due to the complexity of deep learning and the uncertainty of geological hazards, the application of deep learning in geological hazard analysis poses numerous challenges while generating broader opportunities. We expect this survey to inspire researchers to develop more creative and computationally practical deep learning models for various geological hazard analyses.

Declaration of Competing Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This research was jointly supported by the Natural Science Foundation of China (Grant No. 11602235), the Fundamental Research Funds for China Central Universities (2652018091), and Major Program of Science and Technology of Xinjiang Production and Construction Corps (2020AA002). The authors would like to thank the editor and the reviewers for their comments.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.earscirev.2021.103858.

References

Agapiou, A., 2017. Remote sensing heritage in a petabyte-scale: satellite data and heritage earth engine© applications. Int. J. Digit. Earth 10, 85–102. https://doi.org/10.1080/17538947.2016.1250829.
Allen, R., Melgar, D., 2019. Earthquake early warning: advances, scientific challenges, and societal needs. Annu. Rev. Earth Planet. Sci. 47, 361–388. https://doi.org/10.1146/annurev-earth-053018-060457.
Alwon, S., 2019. Generative adversarial networks in seismic data processing. In: 2018 SEG International Exposition and Annual Meeting, SEG 2018, pp. 1991–1995. https://doi.org/10.1190/segam2018-2996002.1.
An, Y., Long, J., Mabu, S., 2020. A segmentation network with multiattention and its application to sar image analysis. IEEJ Trans. Electr. Electron Eng. 15, 570–576. https://doi.org/10.1002/tee.23090.
Anantrasirichai, N., Biggs, J., Albino, F., Bull, D., 2019. A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets. Remote Sens. Environ. 230, 1–11. https://doi.org/10.1016/j.rse.2019.04.032.
Anantrasirichai, N., Biggs, J., Albino, F., Hill, P., Bull, D., 2018. Application of machine learning to classification of volcanic deformation in routinely generated insar data. J. Geophys. Res. Solid Earth 123, 6592–6606. https://doi.org/10.1029/2018JB015911.
Ardizzone, F., Cardinali, M., Galli, M., Guzzetti, F., Reichenbach, P., 2007. Identification and mapping of recent rainfall-induced landslides using elevation data collected by airborne lidar. Nat. Hazard. Earth Syst. Sci. 7, 637–650. https://doi.org/10.5194/nhess-7-637-2007.
Arora, K., Cazenave, A., Engdahl, E.R., Kind, R., Manglik, A., Roy, S., Sain, K., Uyeda, S., 2011. Encyclopedia of Solid Earth Geophysics. Springer Science & Business Media.
Auker, M., Sparks, R., Siebert, L., Crosweller, H., Ewert, J., 2013. A statistical analysis of the global historical volcanic fatalities record. J. Appl. Volcanol. 2, 1–24. https://doi.org/10.1186/2191-5040-2-2.
Badoux, A., Andres, N., Techel, F., Hegg, C., 2016. Natural hazard fatalities in switzerland from 1946 to 2015. Nat. Hazard. Earth Syst. Sci.s 16, 2747–2768. https://doi.org/10.5194/nhess-16-2747-2016.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.
Baltrusaitis, T., Ahuja, C., Morency, L.P., 2019. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443. https://doi.org/10.1109/TPAMI.2018.2798607.
Barbie, A., Pech, N., Hasselbring, W., Flogel, S., Wenzhofer, F., Walter, M., Shchekinova, E., Busse, M., Turk, M., Hofbauer, M., Sommer, S., 2021. Developing an underwater network of ocean observation systems with digital twin prototypes - a field report from the baltic sea. IEEE Internet Comput. 3, 1–7. https://doi.org/10.1109/MIC.2021.3065245.
Bauer, P., Stevens, B., Hazeleger, W., 2021. A digital twin of earth for the green transition. Nat. Clim. Change 11, 80–83. https://doi.org/10.1038/s41558-021-00986-y.
Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50.
Bergen, K., Johnson, P., De Hoop, M., Beroza, G., 2019. Machine learning for data-driven discovery in solid earth geoscience. Science 363, 1–10. https://doi.org/10.1126/science.aau0323.
Berti, M., Simoni, A., 2010. Field evidence of pore pressure diffusion in clayey soils prone to landsliding. J. Geophys. Res. Earth Surf. 115, 1–20. https://doi.org/10.1029/2009JF001463.
Bianchi, F., Grahn, J., Eckerstorfer, M., Malnes, E., Vickers, H., 2021. Snow avalanche segmentation in sar images with fully convolutional neural networks. IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. 14, 75–82. https://doi.org/10.1109/JSTARS.2020.3036914.
Bickel, V., Aaron, J., Manconi, A., Loew, S., Mall, U., 2020a. Impacts drive lunar rockfalls over billions of years. Nat. Commun. 11, 1–7. https://doi.org/10.1038/s41467-020-16653-3.
Bickel, V., Conway, S., Tesson, P.A., Manconi, A., Loew, S., Mall, U., 2020b. Deep learning-driven detection and mapping of rockfalls on mars. IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. 13, 2831–2841. https://doi.org/10.1109/JSTARS.2020.2991588.
Bickel, V., Lanaras, C., Manconi, A., Loew, S., Mall, U., 2019. Automated detection of lunar rockfalls using a convolutional neural network. IEEE Trans. Geosci. Remote Sens. 57, 3501–3511. https://doi.org/10.1109/TGRS.2018.2885280.
Biggs, J., Wright, T., 2020. How satellite insar has grown from opportunistic science to routine monitoring over the last decade. Nat. Commun. 11, 1–4. https://doi.org/10.1038/s41467-020-17587-6.
Brabb, E.E., 1991. The world landslide problem. Episodes J. Int. Geosci. 14, 52–61.
Bueno, A., Benitez, C., De Angelis, S., Diaz Moreno, A., Ibanez, J., 2020. Volcano-seismic transfer learning and uncertainty quantification with bayesian neural networks. IEEE Trans. Geosci. Remote Sens. 58, 892–902. https://doi.org/10.1109/TGRS.2019.2941494.


Bui, D., Tsangaratos, P., Nguyen, V.T., Liem, N., Trinh, P., 2020. Comparing the Dao, D., Jaafari, A., Bayat, M., Mafi-Gholami, D., Qi, C., Moayedi, H., Phong, T., Ly, H.B.,
prediction performance of a deep learning neural network model with conventional Le, T.T., Trinh, P., Luu, C., Quoc, N., Thanh, B., Pham, B., 2020. A spatially explicit
machine learning models in landslide susceptibility assessment. Catena 188, 1–14. deep learning neural network model for the prediction of landslide susceptibility.
https://doi.org/10.1016/j.catena.2019.104426. Catena 188, 1–13. https://doi.org/10.1016/j.catena.2019.104451.
Canário, J., Mello, R., Curilem, M., Huenupan, F., Rios, R., 2020. In-depth comparison of Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q., Mao, M., Ranzato, M.,
deep artificial neural network architectures on seismic events classification. Senior, A., Tucker, P., Yang, K., Ng, A., 2012. Large scale distributed deep networks.
J. Volcanol. Geotherm. Res. 401, 1–16. https://doi.org/10.1016/j. Advances in Neural Information Processing Systems, pp. 1223–1231.
jvolgeores.2020.106881. Derron, M.H., Jaboyedoff, M., 2010. Preface “lidar and dem techniques for landslides
Carlà, T., Raspini, F., Intrieri, E., Casagli, N., 2016. A simple method to help determine monitoring and characterization”. Nat. Hazard. Earth Syst. Sci.s 10, 1–3. https://doi.
landslide susceptibility from spaceborne insar data: the montescaglioso case study. org/10.5194/nhess-10-1877-2010.
Environ. Earth Sci. 75, 1–12. https://doi.org/10.1007/s12665-016-6308-8. Ding, A., Zhang, Q., Zhou, X., Dai, B., 2017. Automatic recognition of landslide based on
Casagli, N., Frodella, W., Morelli, S., Tofani, V., Ciampalini, A., Intrieri, E., Raspini, F., cnn and texture change detection. In: Proceedings - 2016 31st Youth Academic
Rossi, G., Tanteri, L., Lu, P., 2017. Spaceborne, uav and ground-based remote Annual Conference of Chinese Association of Automation, YAC 2016, pp. 444–448.
sensing techniques for landslide mapping, monitoring and early warning. https://doi.org/10.1109/YAC.2016.7804935.
Geoenviron. Disaster 4, 1–9. https://doi.org/10.1186/s40677-017-0073-1. Dokht, R., Kao, H., Visser, R., Smith, B., 2019. Seismic event and phase detection using
Cascini, L., Fornaro, G., Peduto, D., 2010. Advanced low- and full-resolution dinsar map time-frequency representation and convolutional neural networks. Seismol. Res.
generation for slow-moving landslide analysis at different scales. Eng. Geol. 112, Lett. 90, 481–490. https://doi.org/10.1785/0220180308.
29–42. https://doi.org/10.1016/j.enggeo.2010.01.003. Dorren, L., 2003. A review of rockfall mechanics and modelling approaches. Progress
Catani, F., 2021. Landslide detection by deep learning of non-nadiral and crowdsourced Phys. Geogr. 27, 69–87. https://doi.org/10.1191/0309133303pp359ra.
optical images. Landslides 18, 1025–1044. https://doi.org/10.1007/s10346-020- Dou, J., Yunus, A., Merghadi, A., Shirzadi, A., Nguyen, H., Hussain, Y., Avtar, R.,
01513-4. Chen, Y., Pham, B., Yamagishi, H., 2020. Different sampling strategies for predicting
Chae, B.G., Park, H.J., Catani, F., Simoni, A., Berti, M., 2017. Landslide prediction, landslide susceptibilities are deemed less consequential with deep learning. Sci.
monitoring and early warning: a concise review of state-of-the-art. Geosci. J. 21, Total Environ. 720, 1–16. https://doi.org/10.1016/j.scitotenv.2020.137320.
1033–1070. https://doi.org/10.1007/s12303-017-0034-4. Dumont, S., Sigmundsson, F., Parks, M., Drouin, V., Pedersen, G., Jónsdóttir, I.,
Chai, C., Maceira, M., Santos-Villalobos, H., Venkatakrishnan, S., Schoenball, M., Höskuldsson, A., Hooper, A., Spaans, K., Bagnardi, M., Gudmundsson, M.,
Zhu, W., Beroza, G., Thurber, C., 2020. Using a deep neural network and transfer Barsotti, S., Jónsdóttir, K., Högnadóttir, T., Magnússon, E., Hjartardóttir, R.,
learning to bridge scales for seismic phase picking. Geophys. Res. Lett. 47, 1–9. Dürig, T., Rossi, C., Oddsson, B., 2018. Integration of sar data into monitoring of the
https://doi.org/10.1029/2020GL088651. 2014-2015 holuhraun eruption, iceland: contribution of the icelandic volcanoes
Chang, D., Yang, W., Yong, X., Zhang, G., Wang, W., Li, H., Wang, Y., 2020. Seismic data supersite and the futurevolc projects. Front. Earth Sci. 6, 1–19. https://doi.org/
interpolation using dual-domain conditional generative adversarial networks. IEEE 10.3389/feart.2018.00231.
Geosci. Remote Sens. Lett. 1–5. https://doi.org/10.1109/LGRS.2020.3008478. Eckerstorfer, M., Vickers, H., Malnes, E., Grahn, J., 2019. Near-real time automatic snow
Chaussard, E., Amelung, F., Aoki, Y., 2013. Characterization of open and closed volcanic avalanche activity monitoring system using sentinel-1 sar data in norway. Remote
systems in indonesia and mexico using insar time series. J. Geophys. Res. Solid Earth Sens. 11, 1–23. https://doi.org/10.3390/rs11232863.
118, 3957–3969. https://doi.org/10.1002/jgrb.50288. Elliott, J., 2020. Earth observation for the assessment of earthquake hazard, risk and
Chaussard, E., Brgmann, R., Fattahi, H., Johnson, C., Nadeau, R., Taira, T., Johanson, I., disaster management. Surv. Geophys. 41, 1323–1354. https://doi.org/10.1007/
2015. Interseismic coupling and refined earthquake potential on the hayward- s10712-020-09606-4.
calaveras fault zone. J. Geophys. Res. Solid Earth 120, 8570–8590. https://doi.org/ Elliott, J., Walters, R., Wright, T., 2016. The role of space-based observation in
10.1002/2015JB012230. understanding and responding to active tectonics and earthquakes. Nat. Commun. 7,
Chaussard, E., Milillo, P., Brgmann, R., Perissin, D., Fielding, E., Baker, B., 2017. Remote 1–16. https://doi.org/10.1038/ncomms13844.
sensing of ground deformation for monitoring groundwater management practices: van den Ende, M., Ampuero, J.P., 2020. Automated seismic source characterization using
application to the santa clara valley during the 2012-2015 california drought. deep graph neural networks. Geophys. Res. Lett. 47, 1–17. https://doi.org/10.1029/
J. Geophys. Res. Solid Earth 122, 8566–8582. https://doi.org/10.1002/ 2020GL088690.
2017JB014676. Eraslan, G., Avsec, A., Gagneur, J., Theis, F., 2019. Deep learning: new computational
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A., 2018. Deeplab: Semantic modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403. https://doi.org/
image segmentation with deep convolutional nets, atrous convolution, and fully 10.1038/s41576-019-0122-6.
connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. https://doi.org/ Falk, T., Mai, D., Bensch, R., Çiçek, O., Abdulkadir, A., Marrakchi, Y., Böhm, A.,
10.1109/TPAMI.2017.2699184. Deubner, J., Jäckel, Z., Seiwald, K., Dovzhenko, A., Tietz, O., Dal Bosco, C.,
Chen, Q., Wang, W., Wu, F., De, S., Wang, R., Zhang, B., Huang, X., 2019. A survey on an Walsh, S., Saltukoglu, D., Tay, T., Prinz, M., Palme, K., Simons, M., Diester, I.,
emerging area: deep learning for smart city data. IEEE Trans. Emerg. Topic Comput. Brox, T., Ronneberger, O., 2019. U-net: deep learning for cell counting, detection,
Intell. 3, 392–410. https://doi.org/10.1109/TETCI.2019.2907718. and morphometry. Nat. Methods 16, 67–70. https://doi.org/10.1038/s41592-018-
Cheng, M., Fang, F., Pain, C., Navon, I., 2020. Data-driven modelling of nonlinear spatio- 0261-2.
temporal fluid flows using a deep convolutional generative adversarial network. Fang, Z., Wang, Y., Peng, L., Hong, H., 2020. Integration of convolutional neural network
Comput. Method Appl. Mech. Eng. 365, 1–18. https://doi.org/10.1016/j. and conventional machine learning classifiers for landslide susceptibility mapping.
cma.2020.113000. Comput. Geosci. 139, 1–15. https://doi.org/10.1016/j.cageo.2020.104470.
Chin, T.L., Chen, K.Y., Chen, D.Y., Lin, D.E., 2020. Intelligent real-time earthquake Froude, M., Petley, D., 2018. Global fatal landslide occurrence from 2004 to 2016. Nat.
detection by recurrent neural networks. IEEE Trans. Geosci. Remote Sens. 58, Hazard. Earth Syst. Sci.s 18, 2161–2181. https://doi.org/10.5194/nhess-18-2161-
5440–5449. https://doi.org/10.1109/TGRS.2020.2966012. 2018.
Chung, J., Gülçehre, Çaglar, Cho, K., Bengio, Y., 2014. Empirical Evaluation of Gated Gao, M., Pu, J., Yang, X., Horowitz, M., Kozyrakis, C., 2017. Tetris: Scalable and efficient
Recurrent Neural Networks on Sequence Modeling. ArXiv preprint abs/1412.3555, neural network acceleration with 3d memory. In: International Conference on
1-9. URL: http://arxiv.org/abs/1412.3555. Architectural Support for Programming Languages and Operating Systems - ASPLOS,
Colomina, I., Molina, P., 2014. Unmanned aerial systems for photogrammetry and pp. 751–764. https://doi.org/10.1145/3037697.3037702.
remote sensing: a review. ISPRS J. Photogram. Rem. Sens. 92, 79–97. https://doi. Garthwaite, M., Miller, V., Saunders, S., Parks, M., Hu, G., Parker, A., 2019. A simplified
org/10.1016/j.isprsjprs.2014.02.013. approach to operational insar monitoring of volcano deformation in low-and middle-
Comiti, F., Marchi, L., Macconi, P., Arattano, M., Bertoldi, G., Borga, M., Brardinoni, F., income countries: case study of rabaul caldera, papua new guinea. Front. Earth Sci. 6
Cavalli, M., D’Agostino, V., Penna, D., Theule, J., 2014. A new monitoring station for https://doi.org/10.3389/feart.2018.00240.
debris flows in the european alps: first observations in the gadria basin. Nat. Hazard. Geng, Z., Wang, Y., 2020. Automated design of a convolutional neural network with
73, 1175–1198. https://doi.org/10.1007/s11069-014-1088-5. multi-scale filters for cost-efficient seismic data classification. Nat. Commun. 11,
Coussot, P., Meunier, M., 1996. Recognition, classification and mechanical description of 1–11. https://doi.org/10.1038/s41467-020-17123-6.
debris flows. Earth Sci. Rev. 40, 209–227. https://doi.org/10.1016/0012-8252(95) Ghorbanzadeh, O., Blaschke, T., 2019. Optimizing sample patches selection of cnn to
00065-8. improve the miou on landslide detection. In: GISTAM 2019 - Proceedings of the 5th
Cremen, G., Galasso, C., 2020. Earthquake early warning: recent advances and International Conference on Geographical Information Systems Theory, Applications
perspectives. Earth Sci. Rev. 205, 1–15. https://doi.org/10.1016/j. and Management, pp. 33–40. https://doi.org/10.5220/0007675300330040.
earscirev.2020.103184. Ghorbanzadeh, O., Blaschke, T., Gholamnia, K., Meena, S., Tiede, D., Aryal, J., 2019a.
Curilem, M., Canário, J., Franco, L., Rios, R., 2018. Using cnn to classify spectrograms of Evaluation of different machine learning methods and deep-learning convolutional
seismic events from llaima volcano (chile). Proc. Int. Jt. Conf. Neural Netw. 1–8. neural networks for landslide detection. Remote Sens. 11, 1–21. https://doi.org/
https://doi.org/10.1109/IJCNN.2018.8489285. 10.3390/rs11020196.
Dai, K., Peng, J., Zhang, Q., Wang, Z., Qu, T., He, C., Li, D., Liu, J., Li, Z., Xu, Q., Ghorbanzadeh, O., Didehban, K., Rasouli, H., Kamran, K.V., Feizizadeh, B., Blaschke, T.,
Burgmann, R., Milledge, D., Tomas, R., Fan, X., Zhao, C., Liu, X., 2020. Entering the 2020. An application of sentinel-1, sentinel-2, and gnss data for landslide
era of earth observation-based landslide warning systems: a novel and exciting susceptibility mapping. ISPRS Int. J. Geo-Inf. 9, 1–31. https://doi.org/10.3390/
framework. IEEE Geosci. Remote Sens. Mag. 8, 136–153. https://doi.org/10.1109/ ijgi9100561.
MGRS.2019.2954395. Ghorbanzadeh, O., Meena, S., Blaschke, T., Aryal, J., 2019b. Uav-based slope failure
Danneels, G., Pirard, E., Havenith, H.B., 2007. Automatic landslide detection from detection using deep-learning convolutional neural networks. Remote Sens. 11,
remote sensing images using supervised classification methods. In: Int. Geosci. 1–24. https://doi.org/10.3390/rs11172046.
Remote Sens. Symposium (IGARSS), pp. 3014–3017. https://doi.org/10.1109/ Giordan, D., Hayakawa, Y., Nex, F., Remondino, F., Tarolli, P., 2018. The use of remotely
IGARSS.2007.4423479. piloted aircraft systems (rpass) for natural hazards monitoring and management.

Nat. Hazard. Earth Syst. Sci. 18, 1079–1096. https://doi.org/10.5194/nhess-18- susceptibility prediction. Landslides 17, 217–229. https://doi.org/10.1007/s10346-
1079-2018. 019-01274-9.
Girshick, R., 2015. Fast r-cnn. In: Proceedings of the IEEE International Conference on Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K., 2017. Densely connected
Computer Vision, pp. 1440–1448. https://doi.org/10.1109/ICCV.2015.169. convolutional networks. In: Proceedings - 30th IEEE Conference on Computer Vision
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. vol. 1. MIT and Pattern Recognition, CVPR 2017, pp. 2261–2269. https://doi.org/10.1109/
press, Cambridge. CVPR.2017.243.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Hungr, O., Leroueil, S., Picarelli, L., 2014. The varnes classification of landslide types, an
Courville, A., Bengio, Y., 2014. Generative adversarial nets. In: Ghahramani, Z., update. Landslides 11, 167–194. https://doi.org/10.1007/s10346-013-0436-y.
Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q. (Eds.), Advances in Neural Im, J., Rhee, J., Jensen, J., Hodgson, M., 2007. An automated binary change detection
Information Processing Systems. Curran Associates, Inc, pp. 2672–2680. URL: model using a calibration approach. Remote Sens. Environ. 106, 89–105. https://
https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afcc doi.org/10.1016/j.rse.2006.07.019.
f3-Paper.pdf. Jaboyedoff, M., Oppikofer, T., Abellán, A., Derron, M.H., Loye, A., Metzger, R.,
Grahn, T., Jaldell, H., 2017. Assessment of data availability for the development of Pedrazzini, A., 2012. Use of lidar in landslide investigations: a review. Nat. Hazard.
landslide fatality curves. Landslides 14, 1113–1126. https://doi.org/10.1007/ 61, 5–28. https://doi.org/10.1007/s11069-010-9634-2.
s10346-016-0775-6. James, M., Robson, S., 2012. Straightforward reconstruction of 3d surfaces and
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., topography with a camera: accuracy and geoscience application. J. Geophys. Res.
Colmenarejo, S., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A., Hermann, K., Earth Surf. 117, 1–17. https://doi.org/10.1029/2011JF002289.
Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P., 2021. A survey on knowledge graphs:
Kavukcuoglu, K., Hassabis, D., 2016. Hybrid computing using a neural network with representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst.
dynamic external memory. Nature 538, 471–476. https://doi.org/10.1038/ 1–27. https://doi.org/10.1109/TNNLS.2021.3070843.
nature20101. Ji, S., Yu, D., Shen, C., Li, W., Xu, Q., 2020. Landslide detection from an open satellite
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., imagery and digital elevation model dataset using attention boosted convolutional
Cai, J., Chen, T., 2018. Recent advances in convolutional neural networks. Pattern neural networks. Landslides 17, 1337–1352. https://doi.org/10.1007/s10346-020-
Recogn. 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013. 01353-2.
Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard Jiang, H., Li, Y., Zhou, C., Hong, H., Glade, T., Yin, K., 2020. Landslide displacement
evaluation: a review of current techniques and their application in a multi-scale prediction combining lstm and svr algorithms: a case study of shengjibao landslide
study, central italy. Geomorphology 31, 181–216. https://doi.org/10.1016/S0169- from the three gorges reservoir area. Appl. Sci. (Switzerland) 10, 1–21. https://doi.
555X(99)00078-1. org/10.3390/app10217830.
Guzzetti, F., Manunta, M., Ardizzone, F., Pepe, A., Cardinali, M., Zeni, G., Jing, S.Y., Fei, W., Bao, J.F., Ting, Y., 2012. Geological disaster monitoring system based
Reichenbach, P., Lanari, R., 2009. Analysis of ground deformation detected using the on wsn and gsm dual-network integration technology. In: International Conference
sbas-dinsar technique in umbria, central italy. Pure Appl. Geophys. 166, 1425–1459. on Communication Technology Proceedings. ICCT, pp. 374–379. https://doi.org/
https://doi.org/10.1007/s00024-009-0491-4. 10.1109/ICCT.2012.6511246.
Guzzetti, F., Mondini, A., Cardinali, M., Fiorucci, F., Santangelo, M., Chang, K.T., 2012. Kaur, H., Pham, N., Fomel, S., 2020. Seismic data interpolation using cyclegan. In: SEG
Landslide inventory maps: new tools for an old problem. Earth Sci. Rev. 112, 42–66. International Exposition and Annual Meeting 2019, pp. 2202–2206. https://doi.org/
https://doi.org/10.1016/j.earscirev.2012.02.001. 10.1190/segam2019-3207424.1.
Guzzetti, F., Peruccacci, S., Rossi, M., Stark, C., 2007. Rainfall thresholds for the Keefer, D., 2002. Investigating landslides caused by earthquakes - a historical review.
initiation of landslides in central and southern europe. Meteorol. Atmos. Phys. 98, Surv. Geophys. 23, 473–510. https://doi.org/10.1023/A:1021274710840.
239–267. https://doi.org/10.1007/s00703-007-0262-7. Khan, S., He, X., Porikli, F., Bennamoun, M., 2017. Forest change detection in incomplete
Hajimoradlou, A., Roberti, G., Poole, D., 2020. Predicting landslides using locally aligned satellite images with deep neural networks. IEEE Trans. Geosci. Remote Sens. 55,
convolutional neural networks. In: IJCAI International Joint Conference on Artificial 5407–5423. https://doi.org/10.1109/TGRS.2017.2707528.
Intelligence, pp. 3342–3348. https://doi.org/10.24963/ijcai.2020/462. Kingma, D., Welling, M., 2014. Auto-encoding variational bayes. In: 2nd International
Hamilton, W., Ying, R., Leskovec, J., 2017. Inductive representation learning on large Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings,
graphs. Advances in Neural Information Processing Systems, pp. 1025–1035. pp. 1–14. URL: http://arxiv.org/abs/1312.6114.
https://doi.org/10.5555/3294771.3294869. Kriegerowski, M., Petersen, G., Vasyura-Bathke, H., Ohrnberger, M., 2019. A deep
Hara, S., Fukahata, Y., Iio, Y., 2019. P-wave first-motion polarity determination of convolutional neural network for localization of clustered earthquakes based on
waveform data in western japan using deep learning. Earth Planets Space 71, 1–11. multistation full waveforms. Seismol. Res. Lett. 90, 510–516. https://doi.org/
https://doi.org/10.1186/s40623-019-1111-x. 10.1785/0220180320.
Harp, E., Keefer, D., Sato, H., Yagi, H., 2011. Landslide inventories: the essential part of Krizhevsky, A., Sutskever, I., Hinton, G., 2012. Imagenet classification with deep
seismic landslide hazard analyses. Eng. Geol. 122, 9–21. https://doi.org/10.1016/j. convolutional neural networks. Neural Inf. Process. Syst. 25, 1097–1105. https://
enggeo.2010.06.013. doi.org/10.1145/3065386.
He, K., Gkioxari, G., Dollar, P., Girshick, R., 2017. Mask r-cnn. In: Proceedings of the Kumar, P., Debele, S., Sahani, J., Rawat, N., Marti-Cardona, B., Alfieri, S., Basu, B.,
IEEE International Conference on Computer Vision, pp. 2980–2988. https://doi.org/ Basu, A., Bowyer, P., Charizopoulos, N., Jaakko, J., Loupis, M., Menenti, M.,
10.1109/ICCV.2017.322. Mickovski, S., Pfeiffer, J., Pilla, F., Pröll, J., Pulvirenti, B., Rutzinger, M.,
He, K., Zhang, X., Ren, S., Sun, J., 2016a. Deep residual learning for image recognition. Sannigrahi, S., Spyrou, C., Tuomenvirta, H., Vojinovic, Z., Zieher, T., 2021. An
In: Proceedings of the IEEE Computer Society Conference on Computer Vision and overview of monitoring methods for assessing the performance of nature-based
Pattern Recognition, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90. solutions against natural hazards. Earth Sci. Rev. 217, 1–26. https://doi.org/
He, K., Zhang, X., Ren, S., Sun, J., 2016b. Identity mappings in deep residual networks. 10.1016/j.earscirev.2021.103603.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Kuraoka, S., Nakashima, Y., Doke, R., Mannen, K., 2018. Monitoring ground deformation
Intelligence and Lecture Notes in Bioinformatics) 9908 LNCS, pp. 630–645. https:// of eruption center by ground-based interferometric synthetic aperture radar (gb-
doi.org/10.1007/978-3-319-46493-0_38. insar): a case study during the 2015 phreatic eruption of hakone volcano. Earth
Hinton, G., Osindero, S., Teh, Y.W., 2006a. A fast learning algorithm for deep belief nets. Planets Space 70, 1–9. https://doi.org/10.1186/s40623-018-0951-0.
Neural Comput. 18, 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527. Lan, H., Martin, C., Zhou, C., Lim, C., 2010. Rockfall hazard analysis using lidar and
Hinton, G., Osindero, S., Teh, Y.W., 2006b. A fast learning algorithm for deep belief nets. spatial modeling. Geomorphology 118, 213–223. https://doi.org/10.1016/j.
Neural Comput. 18, 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527. geomorph.2010.01.002.
Hinton, G., Zemel, R., 1994. Autoencoders, minimum description length and helmholtz Lapins, S., Roman, D., Rougier, J., De Angelis, S., Cashman, K., Kendall, J.M., 2020. An
free energy. Adv. Neural Information Processing Systems 6, 3–10. https://doi.org/ examination of the continuous wavelet transform for volcano-seismic spectral
10.5555/2987189.2987190. analysis. J. Volcanol. Geotherm. Res. 389 https://doi.org/10.1016/j.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9, jvolgeores.2019.106728.
1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735. LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. https://
Hong, H., Xian, G., Yan, L., Qi, L., Dong, L., Ling, Y., Jia, X., 2017. Research of the doi.org/10.1038/nature14539.
hardware architecture of the geohazards monitoring and early warning system based LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to
on the iot. In: Procedia Comput. Sci., pp. 111–116. https://doi.org/10.1016/j. document recognition. Proc. IEEE 86, 2278–2323. https://doi.org/10.1109/
procs.2017.03.065. 5.726791.
Hooper, A., Segall, P., Zebker, H., 2007. Persistent scatterer interferometric synthetic Lei, T., Zhang, Y., Lv, Z., Li, S., Liu, S., Nandi, A., 2019. Landslide inventory mapping
aperture radar for crustal deformation analysis, with application to volcán alcedo, from bitemporal images using deep convolutional neural networks. IEEE Geosci.
galápagos. J. Geophys. Res. Solid Earth 112, 1–21. https://doi.org/10.1029/ Remote Sens. Lett. 16, 982–986. https://doi.org/10.1109/LGRS.2018.2889307.
2006JB004763. Lentas, K., 2018. Towards routine determination of focal mechanisms obtained from first
Hu, X., Lu, Z., Pierson, T., Kramer, R., George, D., 2018. Combining insar and gps to motion p-wave arrivals. Geophys. J. Int. 212, 1665–1686. https://doi.org/10.1093/
determine transient movement and thickness of a seasonally active low-gradient gji/ggx503.
translational landslide. Geophys. Res. Lett. 45, 1453–1462. https://doi.org/ Li, H., Xu, Q., He, Y., Fan, X., Li, S., 2020. Modeling and predicting reservoir landslide
10.1002/2017GL076623. displacement with deep belief network and ewma control charts: a case study in
Hua, Y., Wang, X., Li, Y., Xu, P., Xia, W., 2021. Dynamic development of landslide three gorges reservoir. Landslides 17, 693–707. https://doi.org/10.1007/s10346-
susceptibility based on slope unit and deep neural networks. Landslides 18, 281–302. 019-01312-6.
https://doi.org/10.1007/s10346-020-01444-0. Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P., Benediktsson, J., 2019. Deep learning for
Huang, F., Zhang, J., Zhou, C., Wang, Y., Huang, J., Zhu, L., 2020. A deep learning hyperspectral image classification: an overview. IEEE Trans. Geosci. Remote Sens.
algorithm using a fully connected sparse autoencoder neural network for landslide 57, 6690–6709. https://doi.org/10.1109/TGRS.2019.2907932.

Li, X., Ge, M., Dai, X., Ren, X., Fritsche, M., Wickert, J., Schuh, H., 2015. Accuracy and Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., Terzopoulos, D., 2021.
reliability of multi-gnss real-time precise positioning: Gps, glonass, beidou, and Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach.
galileo. J. Geod. 89, 607–635. https://doi.org/10.1007/s00190-015-0802-8. Intell. 2, 1–20. https://doi.org/10.1109/TPAMI.2021.3059968.
Linville, L., Pankow, K., Draelos, T., 2019. Deep learning models augment analyst Mondini, A., Guzzetti, F., Reichenbach, P., Rossi, M., Cardinali, M., Ardizzone, F., 2011.
decisions for event discrimination. Geophys. Res. Lett. 46, 3643–3651. https://doi. Semi-automatic recognition and mapping of rainfall induced shallow landslides
org/10.1029/2018GL081119. using optical satellite images. Remote Sens. Environ. 115, 1743–1757. https://doi.
Litjens, G., Kooi, T., Bejnordi, B., Setio, A., Ciompi, F., Ghafoorian, M., van der Laak, J., org/10.1016/j.rse.2011.03.006.
van Ginneken, B., Sánchez, C., 2017. A survey on deep learning in medical image Mosher, S., Audet, P., 2020. Automatic detection and location of seismic events from
analysis. Med. Image Anal. 42, 60–88. https://doi.org/10.1016/j. time-delay projection mapping and neural network classification. J. Geophys. Res.
media.2017.07.005. Solid Earth 125, 1–18. https://doi.org/10.1029/2020JB019426.
Lomax, A., Michelini, A., Jozinović, D., 2019. An investigation of rapid earthquake Mousavi, S., Beroza, G., 2020. A machine-learning approach for earthquake magnitude
characterization using single-station waveforms and a convolutional neural network. estimation. Geophys. Res. Lett. 47, 1–23. https://doi.org/10.1029/2019GL085976.
Seismol. Res. Lett. 90, 517–529. https://doi.org/10.1785/0220180311. Mousavi, S., Ellsworth, W., Zhu, W., Chuang, L., Beroza, G., 2020. Earthquake
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic transformer-an attentive deep-learning model for simultaneous earthquake detection
segmentation. In: Proceedings of the IEEE Computer Society Conference on and phase picking. Nat. Commun. 11, 1–12. https://doi.org/10.1038/s41467-020-
Computer Vision and Pattern Recognition, pp. 431–440. https://doi.org/10.1109/ 17591-w.
CVPR.2015.7298965. Mousavi, S., Sheng, Y., Zhu, W., Beroza, G., 2019a. Stanford earthquake dataset (stead): a
Loughlin, S., Sparks, S., Brown, S., Jenkins, S., Vye-Brown, C., 2015. Global Volcanic global data set of seismic signals for ai. IEEE Access 7, 179464–179476. https://doi.
Hazards and Risk. https://doi.org/10.1017/CBO9781316276273. org/10.1109/ACCESS.2019.2947848.
Lu, H., Ma, L., Fu, X., Liu, C., Wang, Z., Tang, M., Li, N., 2020. Landslides information Mousavi, S., Zhu, W., Ellsworth, W., Beroza, G., 2019b. Unsupervised clustering of
extraction using object-oriented image analysis paradigm based on deep learning seismic signals using deep convolutional autoencoders. IEEE Geosci. Remote Sens.
and transfer learning. Remote Sens. 12, 1–22. https://doi.org/10.3390/rs12050752. Lett. 16, 1693–1697. https://doi.org/10.1109/LGRS.2019.2909218.
Lu, P., Qin, Y., Li, Z., Mondini, A., Casagli, N., 2019. Landslide mapping from multi- Mousavi, S., Zhu, W., Sheng, Y., Beroza, G., 2019c. Cred: a deep residual network of
sensor data through improved change detection-based markov random field. Remote convolutional and recurrent units for earthquake signal detection. Sci. Rep. 9, 1–14.
Sens. Environ. 231, 1–17. https://doi.org/10.1016/j.rse.2019.111235. https://doi.org/10.1038/s41598-019-45748-1.
Luong, M.T., Pham, H., Manning, C., 2015. Effective approaches to attention-based Mufundirwa, A., Fujii, Y., Kodama, J., 2010. A new practical method for prediction of
neural machine translation. In: Conference Proceedings - EMNLP 2015: Conference geomechanical failure-time. Int. J. Rock Mech. Mining Sci. 47, 1079–1090. https://
on Empirical Methods in Natural Language Processing, pp. 1412–1421. https://doi. doi.org/10.1016/j.ijrmms.2010.07.001.
org/10.18653/v1/d15-1166. Mutlu, B., Nefeslioglu, H., Sezer, E., Ali Akcayol, M., Gokceoglu, C., 2019. An
Lv, Z., Liu, T., Kong, X., Shi, C., Benediktsson, J., 2020. Landslide inventory mapping experimental research on the use of recurrent neural networks in landslide
with bitemporal aerial remote sensing images based on the dual-path fully susceptibility mapping. ISPRS Int. J. Geo Inf. 8, 1–21. https://doi.org/10.3390/
convolutional network. IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. 13, ijgi8120578.
4575–4584. https://doi.org/10.1109/JSTARS.2020.2980895. Nadim, F., Kjekstad, O., Peduzzi, P., Herold, C., Jaedicke, C., 2006. Global landslide and
Lv, Z., Shi, W., Zhang, X., Benediktsson, J., 2018. Landslide inventory mapping from avalanche hotspots. Landslides 3, 159–173. https://doi.org/10.1007/s10346-006-
bitemporal high-resolution remote sensing images using change detection and 0036-1.
multiscale segmentation. IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. Nam, K., Wang, F., 2020. An extreme rainfall-induced landslide susceptibility assessment
11, 1520–1532. https://doi.org/10.1109/JSTARS.2018.2803784. using autoencoder combined with random forest in shimane prefecture, Japan.
Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G., Johnson, B., 2019. Deep learning in remote Geoenviron. Disasters 7. https://doi.org/10.1186/s40677-020-0143-7.
sensing applications: A metaanalysis and review. ISPRS J. Photogram. Rem. Sens. NASA Earth Observatory, 2014. Before and After the Sunkosi Landslide. Available online
152, 166–177. https://doi.org/10.1016/j.isprsjprs.2019.04.015. at https://earthobservatory.nasa.gov/images/84406/before-and-after-the-sunkosi-
Ma, W., Liu, Z., Kudyshev, Z., Boltasseva, A., Cai, W., Liu, Y., 2021a. Deep learning for landslide.
the design of photonic structures. Nat. Photonics 15, 77–90. https://doi.org/ Nhu, V.H., Hoang, N.D., Nguyen, H., Ngo, P., Thanh Bui, T., Hoa, P., Samui, P., Tien
10.1038/s41566-020-0685-y. Bui, D., 2020. Effectiveness assessment of keras based deep learning with different
Ma, Z., Mei, G., Prezioso, E., Zhang, Z., Xu, N., 2021b. A deep learning approach using robust optimization algorithms for shallow landslide susceptibility mapping at
graph convolutional networks for slope deformation prediction based on time-series tropical area. Catena 188, 1–13. https://doi.org/10.1016/j.catena.2020.104458.
displacement data. Neural Comput. Appl. 5, 1–17. https://doi.org/10.1007/s00521- Nichol, J., Shaker, A., Wong, M.S., 2006. Application of high-resolution stereo satellite
021-06084-6. images to detailed landslide hazard assessment. Geomorphology 76, 68–75. https://
Makhzani, A., Frey, B., 2014. k-sparse autoencoders. In: 2nd International Conference on doi.org/10.1016/j.geomorph.2005.10.001.
Learning Representations, ICLR 2014 - Conference Track Proceedings, pp. 1–9. URL: Nickel, M., Murphy, K., Tresp, V., Gabrilovich, E., 2016. A review of relational machine
http://dblp.uni-trier.de/db/conf/iclr/iclr2014.html. learning for knowledge graphs. Proc. IEEE 104, 11–33. https://doi.org/10.1109/
Malamud, B., Turcotte, D., Guzzetti, F., Reichenbach, P., 2004. Landslide inventories and JPROC.2015.2483592.
their statistical properties. Earth Surf. Process Landf. 29, 687–711. https://doi.org/ Nikolopoulos, E., Crema, S., Marchi, L., Marra, F., Guzzetti, F., Borga, M., 2014. Impact
10.1002/esp.1064. of uncertainty in rainfall estimation on the identification of rainfall thresholds for
Mandelli, S., Lipari, V., Bestagini, P., Tubaro, S., 2019. Interpolation and Denoising of debris flow occurrence. Geomorphology 221, 286–297. https://doi.org/10.1016/j.
Seismic Data using Convolutional Neural Networks, pp. 1–17. ArXiv preprint abs/ geomorph.2014.06.015.
1901.07927, URL: http://arxiv.org/abs/1901.07927. Oliveira, D., Ferreira, R., Silva, R., Vital Brazil, E., 2018. Interpolating seismic data with
Marmanis, D., Datcu, M., Esch, T., Stilla, U., 2016. Deep learning earth observation conditional generative adversarial networks. IEEE Geosci. Remote Sens. Lett. 15,
classification using imagenet pretrained networks. IEEE Geosci. Remote Sens. Lett. 1952–1956. https://doi.org/10.1109/LGRS.2018.2866199.
13, 105–109. https://doi.org/10.1109/LGRS.2015.2499239. Pan, Y., Zhang, G., Zhang, L., 2020. A spatial-channel hierarchical deep learning network
Martha, T., Kerle, N., Jetten, V., van Westen, C., Kumar, K., 2010. Characterising for pixel-level automated crack detection. Autom. Constr. 119, 1–16. https://doi.
spectral, spatial and morphometric properties of landslides for semi-automatic org/10.1016/j.autcon.2020.103357.
detection using object-oriented methods. Geomorphology 116, 24–36. https://doi. Panigrahi, S., Nanda, A., Swarnkar, T., 2021. A survey on transfer learning. Smart Innov.
org/10.1016/j.geomorph.2009.10.004. Syst. Technol. 194, 781–789. https://doi.org/10.1007/978-981-15-5971-6_83.
Martha, T., Kerle, N., van Westen, C., Jetten, V., Vinod Kumar, K., 2012. Object-oriented Pardo, E., Garfias, C., Malpica, N., 2019. Seismic phase picking using convolutional
analysis of multi-temporal panchromatic images for creation of historical landslide networks. IEEE Trans. Geosci. Remote Sens. 57, 7086–7092. https://doi.org/
inventories. ISPRS J. Photogram. Rem. Sens. 67, 105–119. https://doi.org/10.1016/ 10.1109/TGRS.2019.2911402.
j.isprsjprs.2011.11.004. Parker, R., Densmore, A., Rosser, N., De Michele, M., Li, Y., Huang, R., Whadcoat, S.,
McBrearty, I., Delorey, A., Johnson, P., 2019. Pairwise association of seismic arrivals Petley, D., 2011. Mass wasting triggered by the 2008 wenchuan earthquake is
with convolutional neural networks. Seismol. Res. Lett. 90, 503–509. https://doi. greater than orogenic growth. Nat. Geosci. 4, 449–452. https://doi.org/10.1038/
org/10.1785/0220180326. ngeo1154.
Mei, E., Lavigne, F., Picquout, A., de Bélizal, E., Brunstein, D., Grancher, D., Sartohadi, J., Perol, T., Gharbi, M., Denolle, M., 2018. Convolutional neural network for earthquake
Cholik, N., Vidal, C., 2013. Lessons learned from the 2010 evacuations at merapi detection and location. Sci. Adv. 4, 1–8. https://doi.org/10.1126/sciadv.1700578.
volcano. J. Volcanol. Geotherm. Res. 261, 348–365. https://doi.org/10.1016/j. Pham, V., Nguyen, Q.H., Nguyen, H.D., Pham, V.M., Vu, V., Bui, Q.T., 2020.
jvolgeores.2013.03.010. Convolutional neural network - optimized moth flame algorithm for shallow
Mei, G., Xu, N., Qin, J., Wang, B., Qi, P., 2020. A survey of internet of things (iot) for landslide susceptible analysis. IEEE Access 8, 32727–32736. https://doi.org/
geohazard prevention: applications, technologies, and challenges. IEEE Internet 10.1109/ACCESS.2020.2973415.
Things J. 7, 4371–4386. https://doi.org/10.1109/JIOT.2019.2952593. Piciullo, L., Calvello, M., Cepeda, J., 2018. Territorial early warning systems for rainfall-
Meng, Q., Wang, H., He, M., Gu, J., Qi, J., Yang, L., 2020. Displacement prediction of induced landslides. Earth Sci. Rev. 179, 228–247. https://doi.org/10.1016/j.
water-induced landslides using a recurrent deep learning model. Eur. J. Environ. earscirev.2018.02.013.
Civil Eng. 06, 1–15. https://doi.org/10.1080/19648189.2020.1763847. Pike, R., 1988. The geometric signature: Quantifying landslide-terrain types from digital
Milillo, P., Fielding, E., Shulz, W., Delbridge, B., Burgmann, R., 2014. Cosmo-skymed elevation models. Math. Geol. 20, 491–511. https://doi.org/10.1007/BF00890333.
spotlight interferometry over rural areas: the slumgullion landslide in colorado, usa. Piralilou, S., Shahabi, H., Jarihani, B., Ghorbanzadeh, O., Blaschke, T., Gholamnia, K.,
IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. 7, 2919–2926. https://doi. Meena, S., Aryal, J., 2019. Landslide detection using multi-scale image segmentation
org/10.1109/JSTARS.2014.2345664. and different machine learning models in the higher himalayas. Remote Sens. 11,
1–26. https://doi.org/10.3390/rs11212575.

Pourghasemi, H., Pradhan, B., Gokceoglu, C., Mohammadi, M., Moradi, H., 2013. Soares, L.P., Dias, H.C., Grohmann, C.H., 2020. Landslide segmentation with u-net:
Application of weights-of-evidence and certainty factor models and their comparison evaluating different sampling methods and patch sizes, pp. 1–13. ArXiv preprint abs/
in landslide susceptibility mapping at haraz watershed, iran. Arab. J. Geosci. 6, 2007.06672, URL: https://arxiv.org/abs/2007.06672.
2351–2365. https://doi.org/10.1007/s12517-012-0532-7. Soubestre, J., Shapiro, N., Seydoux, L., de Rosny, J., Droznin, D., Droznina, S.,
Prakash, N., Manconi, A., Loew, S., 2020. Mapping landslides on eo data: performance of Senyukov, S., Gordeev, E., 2018. Network-based detection and classification of
deep learning models vs. traditional machine learning models. Remote Sens. 12, seismovolcanic tremors: example from the klyuchevskoy volcanic group in
1–24. https://doi.org/10.3390/rs12030346. kamchatka. J. Geophys. Res. Solid Earth 123, 564–582. https://doi.org/10.1002/
Prokop, A., Panholzer, H., 2009. Assessing the capability of terrestrial laser scanning for 2017JB014726.
monitoring slow moving landslides. Nat. Hazard. Earth Syst. Sci. 9, 1921–1928. Srivastava, N., Salakhutdinov, R., 2014. Multimodal learning with deep boltzmann
https://doi.org/10.5194/nhess-9-1921-2009. machines. J. Mach. Learn. Res. 15, 2949–2980. https://doi.org/10.5555/
Qi, W., Wei, M., Yang, W., Xu, C., Ma, C., 2020. Automatic mapping of landslides by the 2627435.2697059.
resu-net. Remote Sens. 12, 1–14. https://doi.org/10.3390/RS12152487. Stein, S., Wysession, M., Houston, H., 2003. Review of ‘an introduction to seismology,
Rawat, M., Joshi, V., Rawat, M., Kumar, K., 2011. Landslide movement monitoring using earthquakes, and earth structure’. Phys. Today 56, 66–68. https://doi.org/10.1063/
gps technology: a case study of bakthang landslide, gangtok, east sikkim, india. 1.1629009.
J. Dev. Agric. Econ. 3, 194–200. Stumpf, A., Kerle, N., 2011. Object-oriented mapping of landslides using random forests.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., Remote Sens. Environ. 115, 2564–2577. https://doi.org/10.1016/j.rse.2011.05.013.
Prabhat, 2019. Deep learning and process understanding for data-driven earth Sun, J., Wauthier, C., Stephens, K., Gervais, M., Cervone, G., La Femina, P., Higgins, M.,
system science. Nature 566, 195–204. https://doi.org/10.1038/s41586-019-0912-1. 2020. Automatic detection of volcanic surface deformation using deep learning.
Remondino, F., Barazzetti, L., Nex, F., Scaioni, M., Sarazzi, D., 2011. Uav J. Geophys. Res. Solid Earth 125, 1–17. https://doi.org/10.1029/2020JB019840.
photogrammetry for mapping and 3d modeling - current status and future Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
perspectives. In: International Archives of the Photogrammetry, Remote Sensing and Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. In:
Spatial Information Sciences - ISPRS Archives, pp. 25–31. https://doi.org/10.5194/isprsarchives- Proceedings of the IEEE Computer Society Conference on Computer Vision and
XXXVIII-1-C22-25-2011. Pattern Recognition, pp. 1–9. https://doi.org/10.1109/CVPR.2015.7298594.
Ren, S., He, K., Girshick, R., Sun, J., 2017. Faster r-cnn: towards real-time object Tang, S., Ding, Y., Zhou, H., Zhou, H., 2020. Reconstruction of sparsely sampled seismic
detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, data via residual u-net. IEEE Geosci. Remote Sens. Lett. 1–5. https://doi.org/
1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031. 10.1109/LGRS.2020.3035835.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for Techel, F., Zweifel, B., Marty, C., 2015. Schnee und lawinen in den schweizer alpen.
biomedical image segmentation. In: Lecture Notes in Computer Science (including hydrologisches jahr 2014/15. WSL Berichte 37, 1–90. https://doi.org/10.3929/ethz-
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) a-000008971.
9351, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28. Thi Ngo, P., Panahi, M., Khosravi, K., Ghorbanzadeh, O., Kariminejad, N., Cerda, A.,
Ross, Z., Meier, M.A., Hauksson, E., 2018. P wave arrival picking and first-motion Lee, S., 2021. Evaluation of deep learning algorithms for national scale landslide
polarity determination with deep learning. J. Geophys. Res. Solid Earth 123, susceptibility mapping of iran. Geosci. Front. 12, 505–519. https://doi.org/10.1016/
5120–5129. https://doi.org/10.1029/2017JB015251. j.gsf.2020.06.013.
Rossi, G., Tanteri, L., Tofani, V., Vannocci, P., Moretti, S., Casagli, N., 2018. Tian, X., Zhang, W., Zhang, X., Zhang, J., Zhang, Q., Wang, X., Guo, Q., 2020.
Multitemporal uav surveys for landslide mapping and characterization. Landslides Comparison of single-trace and multiple-trace polarity determination for surface
15, 1045–1052. https://doi.org/10.1007/s10346-018-0978-0. microseismic data using deep learning. Seismol. Res. Lett. 91, 1794–1803. https://
Rudolf-Miklau, F., Sauermoser, S., Mears, A., 2015. The Technical Avalanche Protection doi.org/10.1785/0220190353.
Handbook. https://doi.org/10.1002/9783433603840. Titos, M., Bueno, A., García, L., Benítez, C., Segura, J., 2020. Classification of isolated
Saddik, A., Laamarti, F., Alja’Afreh, M., 2021. The potential of digital twins. IEEE volcano-seismic events based on inductive transfer learning. IEEE Geosci. Remote
Instrum. Meas. Mag. 24, 36–41. https://doi.org/10.1109/EEEIC/ Sens. Lett. 17, 869–873. https://doi.org/10.1109/LGRS.2019.2931063.
ICPSEurope49358.2020.9160810. Tous, R., Alvarado, L., Otero, B., Cruz, L., Rojas, O., 2020. Deep neural networks for
Salcedo Sanz, S., Ghamisi, P., Piles, M., Werner, M., Cuadra, L., Moreno-Martínez, A., earthquake detection and source region estimation in north-central venezuela. Bull.
Izquierdo-Verdiguier, E., Muñoz Marí, J., Mosavi, A., Camps Valls, G., 2020. Seismol. Soc. Am. 110, 2519–2529. https://doi.org/10.1785/0120190172.
Machine learning information fusion in earth observation: a comprehensive review Travelletti, J., Delacourt, C., Allemand, P., Malet, J.P., Schmittbuhl, J., Toussaint, R.,
of methods, applications and data sources. Inf. Fusion 63, 256–272. https://doi.org/ Bastard, M., 2012. Correlation of multi-temporal ground-based optical images for
10.1016/j.inffus.2020.07.004. landslide monitoring: application, potential and limitations. ISPRS J. Photogram.
Sameen, M., Pradhan, B., 2019. Landslide detection using residual networks and the Rem. Sens. 70, 39–55. https://doi.org/10.1016/j.isprsjprs.2012.03.007.
fusion of spectral and topographic information. IEEE Access 7, 114363–114373. Tsoumakas, G., Katakis, I., 2007. Multi-label classification: an overview. Int. J. Data
https://doi.org/10.1109/ACCESS.2019.2935761. Warehous Mining 3, 1–13. https://doi.org/10.4018/jdwm.2007070101.
Sameen, M., Pradhan, B., Lee, S., 2020. Application of convolutional neural networks Turhan, C., Bilge, H., 2018. Recent trends in deep generative models: a review. In: UBMK
featuring bayesian optimization for landslide susceptibility assessment. Catena 186, 2018-3rd International Conference on Computer Science and Engineering,
1–13. https://doi.org/10.1016/j.catena.2019.104249. pp. 574–579. https://doi.org/10.1109/UBMK.2018.8566353.
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G., 2009. The graph Uchide, T., 2020. Focal mechanisms of small earthquakes beneath the japanese islands
neural network model. IEEE Trans. Neural Netw. 20, 61–80. https://doi.org/ based on first-motion polarities picked using deep learning. Geophys. J. Int. 223,
10.1109/TNN.2008.2005605. 1658–1671. https://doi.org/10.1093/gji/ggaa401.
Schaefer, L., Di Traglia, F., Chaussard, E., Lu, Z., Nolesini, T., Casagli, N., 2019. Ullo, S., Langenkamp, M., Oikarinen, T., Delrosso, M., Sebastianelli, A., Iccirillo, F.,
Monitoring volcano slope instability with synthetic aperture radar: a review and new Sica, S., 2019. Landslide geohazard assessment with convolutional neural networks
data from pacaya (guatemala) and stromboli (italy) volcanoes. Earth Sci. Rev. 192, using sentinel-2 imagery data. In: International Geoscience and Remote Sensing
236–257. https://doi.org/10.1016/j.earscirev.2019.03.009. Symposium (IGARSS), pp. 9646–9649. https://doi.org/10.1109/
Schmidt, R.M., 2019. Recurrent neural networks (rnns): a gentle introduction and IGARSS.2019.8898632.
overview, pp. 1–16. ArXiv preprint abs/1912.05911, URL: http://arxiv.org/abs/ Ullo, S.L., Mohan, A., Sebastianelli, A., Ahamed, S.E., Kumar, B., Dwivedi, R., Sinha, G.
1912.05911. R., 2020. A New Mask R-Cnn Based Method for Improved Landslide Detection,
Schulz, W., 2007. Landslide susceptibility revealed by lidar imagery and historical pp. 1–9. ArXiv preprint abs/2010.01499, URL: http://arxiv.org/abs/2010.01499.
records, seattle, washington. Eng. Geol. 89, 67–87. https://doi.org/10.1016/j. Underwood, S., Schultz, M., Berti, M., Gregoretti, C., Simoni, A., Mote, T., Saylor, A.,
enggeo.2006.09.019. 2016. Atmospheric circulation patterns, cloud-to-ground lightning, and locally
Seydoux, L., Balestriero, R., Poli, P., Hoop, M., Campillo, M., Baraniuk, R., 2020. intense convective rainfall associated with debris flow initiation in the dolomite alps
Clustering earthquake signals and background noises in continuous seismic data of northeastern Italy. Nat. Hazard. Earth Syst. Sci.s 16, 509–528. https://doi.org/
with unsupervised deep learning. Nat. Commun. 11, 1–12. https://doi.org/10.1038/ 10.5194/nhess-16-509-2016.
s41467-020-17841-x. United Nations Office for Disaster Risk Reduction, 2009. 2009 UNISDR Terminology on
Sharma, A., Liu, X., Yang, X., Shi, D., 2017. A patch-based convolutional neural network Disaster Risk Reduction. Technical Report. United Nations Office for Disaster Risk
for remote sensing image classification. Neural Netw. 95, 19–28. https://doi.org/ Reduction.
10.1016/j.neunet.2017.07.017. Vaezi, Y., van der Baan, M., 2015. Comparison of the sta/lta and power spectral density
Shi, W., Min, Z., Ke, H., Fang, X., Zhan, Z., Chen, S., 2020. Landslide recognition by deep methods for microseismic event detection. Geophys. J. Int. 203, 1896–1908. https://
convolutional neural network and change detection. IEEE Trans. Geosci. Remote doi.org/10.1093/gji/ggv419.
Sens. PP 1–19. https://doi.org/10.1109/TGRS.2020.3015826. Van Den Eeckhaut, M., Poesen, J., Verstraeten, G., Vanacker, V., Nyssen, J.,
Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale Moeyersons, J., van Beek, L., Vandekerckhove, L., 2007. Use of lidar-derived images
image recognition. In: 3rd International Conference on Learning Representations, for mapping old landslides under forest. Earth Surf. Process. Landf. 32, 754–769.
ICLR 2015 - Conference Track Proceedings, pp. 1–14. URL: http://arxiv.org/abs/ https://doi.org/10.1002/esp.1417.
1409.1556. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L.,
Sinha, S., Giffard-Roisin, S., Karbou, F., Deschatres, M., Karas, A., Eckert, N., Coléou, C., Polosukhin, I., 2017. Attention is All You Need, pp. 1–15. ArXiv preprint abs/
Monteleoni, C., 2019a. Can avalanche deposits be effectively detected by deep 1706.03762, URL: https://arxiv.org/abs/1706.03762.
learning on sentinel-1 satellite sar images? Clim. Informat., pp. 1–6. Veličković, P., Casanova, A., Liò, P., Cucurull, G., Romero, A., Bengio, Y., 2018. Graph
Sinha, S., Giffard-Roisin, S., Karbou, F., Deschatres, M., Karas, A., Eckert, N., attention networks. In: 6th International Conference on Learning Representations,
Monteleoni, C., 2019b. Detecting avalanche deposits using variational autoencoder ICLR 2018 - Conference Track Proceedings, pp. 1–12. https://doi.org/10.17863/
on sentinel-1 satellite imagery. In: NeurIPS 2019 Workshop: Tackling Climate CAM.48429.
Change with Machine Learning NeurIPS workshop, pp. 1–5.

32
Z. Ma and G. Mei Earth-Science Reviews 223 (2021) 103858

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A., 2010. Stacked denoising with constrains. IEEE J. Selected Topic Appl. Earth Observ. Remote Sens. 12,
autoencoders: learning useful representations in a deep network with a local 5047–5060. https://doi.org/10.1109/JSTARS.2019.2951725.
denoising criterion. J. Mach. Learn. Res. 11, 3371–3408. https://doi.org/10.5555/ Yi, Y., Zhang, W., 2020. A new deep-learning-based approach for earthquake-triggered
1756006.1953039. landslide detection from singleoral rapideye satellite imagery. IEEE J. Selected Topic
Waldeland, A., Reksten, J., Salberg, A.B., 2018. Avalanche detection in sar images using Appl. Earth Observ. Remote Sens. 13, 6166–6176. https://doi.org/10.1109/
deep learning. In: International Geoscience and Remote Sensing Symposium JSTARS.2020.3028855.
(IGARSS), pp. 2386–2389. https://doi.org/10.1109/IGARSS.2018.8517536. Yi, Y., Zhang, Z., Zhang, W., Jia, H., Zhang, J., 2020. Landslide susceptibility mapping
Wang, B., Li, J., Luo, J., Wang, Y., Geng, J., 2021a. Intelligent deblending of seismic data using multiscale sampling strategy and convolutional neural network: a case study in
based on u-net and transfer learning. IEEE Trans. Geosci. Remote Sens. 1–10. jiuzhaigou region. Catena 195, 1–14. https://doi.org/10.1016/j.
https://doi.org/10.1109/TGRS.2020.3048746. catena.2020.104851.
Wang, B., Zhang, N., Lu, W., Zhang, P., Geng, J., 2018. Seismic data interpolation using Yokoya, N., Yamanoi, K., He, W., Baier, G., Adriano, B., Miura, H., Oishi, S., 2020.
deep learning based residual networks. In: 80th EAGE Conference and Exhibition Breaking limits of remote sensing by deep learning from simulated data for flood and
2018: Opportunities Presented by the Energy Transition, pp. 1–5. https://doi.org/ debris-flow mapping. IEEE Trans. Geosci. Remote Sens. PP 1–15. https://doi.org/
10.3997/2214-4609.201801394. 10.1109/TGRS.2020.3035469.
Wang, G., Jia, Q.S., Qiao, J., Bi, J., Zhou, M., 2021b. Deep learning-based model Yoon, D., Yeeh, Z., Byun, J., 2020. Seismic data reconstruction using deep bidirectional
predictive control for continuous stirred-tank reactor system. IEEE Trans. Neural long short-term memory with skip connections. IEEE Geosci. Remote Sens. Lett. 1–5.
Netw. Learn. Syst. 32, 3643–3652. https://doi.org/10.1109/TNNLS.2020.3015869. https://doi.org/10.1109/LGRS.2020.2993847.
Wang, J., Xiao, Z., Liu, C., Zhao, D., Yao, Z., 2019a. Deep learning for picking seismic Yu, B., Chen, F., Xu, C., 2020. Landslide detection based on contour-based deep learning
arrival times. J. Geophys. Res. Solid Earth 124, 6612–6624. https://doi.org/ framework in case of national scale of Nepal in 2015. Comput. Geosci. 135, 1–8.
10.1029/2019JB017536. https://doi.org/10.1016/j.cageo.2019.104388.
Wang, Q., Mao, Z., Wang, B., Guo, L., 2017. Knowledge graph embedding: a survey of Yuan, Q., Shen, H., Li, T., Li, Z., Li, S., Jiang, Y., Xu, H., Tan, W., Yang, Q., Wang, J.,
approaches and applications. IEEE Trans. Knowl Data Eng. 29, 2724–2743. https:// Gao, J., Zhang, L., 2020. Deep learning in environmental remote sensing:
doi.org/10.1109/TKDE.2017.2754499. achievements and challenges. Remote Sens. Environ. 241, 1–24. https://doi.org/
Wang, Q., Yuan, Z., Du, Q., Li, X., 2019b. Getnet: a general end-to-end 2-d cnn 10.1016/j.rse.2020.111716.
framework for hyperspectral image change detection. IEEE Trans. Geosci. Remote Zanetti, M., Bruzzone, L., 2018. A theoretical framework for change detection based on a
Sens. 57, 3–13. https://doi.org/10.1109/TGRS.2018.2849692. compound multiclass statistical model of the difference image. IEEE Trans. Geosci.
Wang, S., Cao, J., Yu, P., 2020a. Deep learning for spatio-temporal data mining: a survey. Remote Sens. 56, 1129–1143. https://doi.org/10.1109/TGRS.2017.2759663.
IEEE Trans. Knowl. Data Eng. 1–21. https://doi.org/10.1109/TKDE.2020.3025580. Zhang, H., Li, Y., Jiang, Y., Wang, P., Shen, Q., Shen, C., 2019a. Hyperspectral
Wang, T., Trugman, D., Lin, Y., 2019c. Seismogen: seismic waveform synthesis using classification based on lightweight 3-d-cnn with transfer learning. IEEE Trans.
generative adversarial networks, pp. 1–31. ArXiv preprint abs/1911.03966, URL: Geosci. Remote Sens. 57, 5813–5828. https://doi.org/10.1109/
https://arxiv.org/abs/1911.03966. TGRS.2019.2902568.
Wang, Y., Fang, Z., Hong, H., 2019d. Comparison of convolutional neural networks for Zhang, P., Gong, M., Su, L., Liu, J., Li, Z., 2016. Change detection based on deep feature
landslide susceptibility mapping in yanshan county, China. Sci. Total Environ. 666, representation and mapping transformation for multi-spatial-resolution remote
975–993. https://doi.org/10.1016/j.scitotenv.2019.02.263. sensing images. ISPRS J. Photogram. Rem. Sens. 116, 24–41. https://doi.org/
Wang, Y., Fang, Z., Wang, M., Peng, L., Hong, H., 2020b. Comparative study of landslide 10.1016/j.isprsjprs.2016.02.013.
susceptibility mapping with different recurrent neural networks. Comput. Geosci. Zhang, R., Jing, X., Wu, S., Jiang, C., Mu, J., Richard Yu, F., 2021. Device-free wireless
138, 1–18. https://doi.org/10.1016/j.cageo.2020.104445. sensing for human detection: the deep learning perspective. IEEE Internet Things J.
Wang, Y., Wang, X., Jian, J., 2019e. Remote sensing landslide recognition based on 8, 2517–2539. https://doi.org/10.1109/JIOT.2020.3024234.
convolutional neural network. Math. Problems Eng. 2019, 1–12. https://doi.org/ Zhang, X., Chen, N., Chen, Z., Wu, L., Li, X., Zhang, L., Di, L., Gong, J., Li, D., 2018.
10.1155/2019/8389368. Geospatial sensor web: a cyber-physical infrastructure for geoscience research and
Wei, Q., Li, X., Song, M., 2021. De-aliased seismic data interpolation using conditional application. Earth Sci. Rev. 185, 684–703. https://doi.org/10.1016/j.
wasserstein generative adversarial networks. Comput. Geosci. 154, 1–13. https:// earscirev.2018.07.006.
doi.org/10.1016/j.cageo.2021.104801. Zhang, Y., Ge, T., Tian, W., Liou, Y.A., 2019b. Debris flow susceptibility mapping using
Willcox, K.E., Ghattas, O., Heimbach, P., 2021. The imperative of physics-based machine-learning techniques in shigatse area, china. Remote Sens. 11, 1–26. https://
modeling and inverse theory in computational science. Nat. Comput. Sci. 1, doi.org/10.3390/rs11232801.
166–168. https://doi.org/10.1038/s43588-021-00040-z. Zhou, C., Wang, H., Wang, C., Hou, Z., Zheng, Z., Shen, S., Cheng, Q., Feng, Z., Wang, X.,
Woollam, J., Rietbrock, A., Bueno, A., De Angelis, S., 2019. Convolutional neural Lv, H., et al., 2021. Prospects for the research on geoscience knowledge graph in the
network for seismic phase classification, performance demonstration over a local big data era. Sci. China Earth Sci. 1–11. https://doi.org/10.1007/s11430-020-9750-
seismic network. Seismol. Res. Lett. 90, 491–502. https://doi.org/10.1785/ 4.
0220180312. Zhou, Y., Yue, H., Zhou, S., Kong, Q., 2019. Hybrid event detection and phase-picking
Xie, P., Zhou, A., Chai, B., 2019. The application of long short-term memory(lstm) algorithm using convolutional and recurrent neural networks. Seismol. Res. Lett. 90,
method on displacement prediction of multifactor-induced landslides. IEEE Access 7, 1079–1087. https://doi.org/10.1785/0220180319.
54305–54311. https://doi.org/10.1109/ACCESS.2019.2912419. Zhu, L., Peng, Z., McClellan, J., Li, C., Yao, D., Li, Z., Fang, L., 2019a. Deep learning for
Xing, Y., Yue, J., Chen, C., 2020. Interval estimation of landslide displacement prediction seismic phase detection and picking in the aftershock zone of 2008 mw7.9 wenchuan
based on time series decomposition and long short-term memory network. IEEE earthquake. Phys. Earth Planet. Int. 293, 1–16. https://doi.org/10.1016/j.
Access 8, 3187–3196. https://doi.org/10.1109/ACCESS.2019.2961295. pepi.2019.05.004.
Xu, S., Niu, R., 2018. Displacement prediction of baijiabao landslide based on empirical Zhu, W., Beroza, G., 2019. Phasenet: a deep-neural-network-based seismic arrival-time
mode decomposition and long short-term memory neural network in three gorges picking method. Geophys. J. Int. 216, 261–273. https://doi.org/10.1093/gji/
area, China. Comput. Geosci. 111, 87–96. https://doi.org/10.1016/j. ggy423.
cageo.2017.10.013. Zhu, W., Mousavi, S., Beroza, G., 2019b. Seismic signal denoising and decomposition
Yang, B., Yin, K., Lacasse, S., Liu, Z., 2019. Time series analysis and long short-term using deep neural networks. IEEE Trans. Geosci. Remote Sens. 57, 9476–9488.
memory neural network to predict landslide displacement. Landslides 16, 677–694. https://doi.org/10.1109/TGRS.2019.2926772.
https://doi.org/10.1007/s10346-018-01127-x. Zhu, W., Mousavi, S., Beroza, G., 2020. Seismic signal augmentation to improve
Yang, D., Qin, J., Pang, Y., Huang, T., 2021. A novel double-stacked autoencoder for generalization of deep neural networks. Adv. Geophys. 61, 151–177. https://doi.
power transformers dga signals with imbalanced data structure. IEEE Trans. Ind. org/10.1016/bs.agph.2020.07.003.
Electron. 1–10. https://doi.org/10.1109/TIE.2021.3059543. Zhu, W., Tai, K.S., Mousavi, S.M., Bailis, P., Beroza, G.C., 2021. An End-To-End
Ye, C., Li, Y., Cui, P., Liang, L., Pirasteh, S., Marcato, J., Goncalves, W., Li, J., 2019. Earthquake Detection Method for Joint Phase Picking and Association using Deep
Landslide detection of hyperspectral remote sensing data based on deep learning Learning, pp. 1–16. ArXiv preprint abs/2109.09911, URL: http://arxiv.org/abs/
2109.09911.
