ISPRS Journal of Photogrammetry and Remote Sensing 158 (2019) 63–75

Multi-modal deep learning for landform recognition


Lin Du a, Xiong You a, Ke Li a,⁎, Liqiu Meng b, Gong Cheng c, Liyang Xiong d,e, Guangxia Wang a

a Zhengzhou Institute of Surveying and Mapping, Longhai Road 66, Zhengzhou 450052, China
b Department of Cartography, Technical University of Munich, Arcisstr. 21, 80333 Munich, Germany
c School of Automation, Northwestern Polytechnical University, Youyi West Road 127, Xi'an 710072, China
d Key Laboratory of Virtual Geographic Environment of Ministry of Education, Nanjing Normal University, Nanjing 210023, China
e Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China

ARTICLE INFO

Keywords:
Landform recognition
Multi-modal geomorphological data fusion
Deep learning
Convolutional neural networks (CNN)

ABSTRACT

Automatic landform recognition is considered to be one of the most important tools for landform classification and for deepening our understanding of terrain morphology. This paper presents a multi-modal geomorphological data fusion framework which uses deep learning-based methods to improve the performance of landform recognition. It leverages a multi-channel geomorphological feature extraction network to generate different characteristics from multi-modal geomorphological data, such as shaded relief, DEM and slope, and then harvests joint features via a multi-modal geomorphological feature fusion network in order to effectively represent landforms. A residual learning unit is used to mine deep correlations from visual and physical modality features to achieve the final landform representations. Finally, it employs three fully-connected layers and a softmax classifier to generate a label for each sample. Experimental results indicate that this multi-modal data fusion-based algorithm obtains much better performance than conventional algorithms. The highest recognition rate was 90.28%, showing a great potential for landform recognition.

1. Introduction

Continuous land surfaces consist of multiple landforms, each having different visual and physical characteristics. These landforms have been shaped by various long-term influences and various intensities of environmental processes (MacMillan and Shary, 2009). Each landform bears an immense amount of information about the ecological environment, natural resources and natural conditions. This information is crucial to earth sciences and earth-related disciplines. Automatic recognition as well as interpretation of landforms could advance the development of many fields ranging from environmental research to socio-economic issues and, as a result, has attracted increasing attention.

Traditionally, landform recognition and classification is based on the visual interpretation of topographic maps and aerial photographs combined with fieldwork. These manual approaches rely on expert knowledge and implicit experience of the area being studied (Argialas, 1995; Smith, 2011). They are time-consuming, labour intensive, and costly (Smith, 2011). These manual approaches (Dikau et al., 1991; Hammond, 2005) have increasingly been giving way to automated analysis and understanding of landforms that use various geospatial data, enabled by the increased availability of geospatial data covering the Earth's surface and the rapid growth of geographic information systems (GIS) technology.

A number of landform recognition approaches use single-modal data. Some of them recognize landforms by using a digital elevation model (DEM) or geomorphological variables, which provide information about landform physical attributes such as elevation, aspect, slope and curvature. For example, Prima et al. propose a supervised landform classification method using four geomorphological variables including DEM, slope, topographic openness and thematic maps (Prima et al., 2006). Pipaud and Lehmkuhl employ a support vector machine (SVM) to delineate and classify alluvial fans from Shuttle Radar Topography Mission (SRTM) DEM and geomorphological variables (i.e., aspect, slope and curvature) (Pipaud and Lehmkuhl, 2017). Drăguţ and Eisank present an object-based image analysis (OBIA) approach to classify landforms into eight topographic classes using the 1 km resolution global SRTM DEM (Drăguţ and Eisank, 2012). Algorithms utilizing visual properties such as texture, contextual and shape features (Benz et al., 2004), which are extracted from terrain texture or remote sensing images to represent landforms, have also been used for landform recognition tasks. For instance, Kai et al. use a back propagation neural network to recognize landforms from terrain texture (grey level co-occurrence matrix) (Kai et al., 2013). Frohn et al. use a machine


Corresponding author.
E-mail address: like19771223@163.com (K. Li).

https://doi.org/10.1016/j.isprsjprs.2019.09.018
Received 4 April 2019; Received in revised form 14 September 2019; Accepted 30 September 2019
Available online 09 October 2019
0924-2716/ © 2019 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

learning algorithm which can extract spectral, contextual and shape characteristics from remote sensing images to recognize landforms (Frohn et al., 2011; Duro et al., 2012).

Two kinds of single-modal data, which contain information about the visual attributes and the physical characteristics of landforms, respectively, are widely used in landform recognition and classification (Zhao et al., 2017). Generally, DEM data and geomorphological variables are more suitable for hilly and mountainous areas, while image and texture data achieve better results for plains and tablelands (Kai et al., 2016). Algorithms based on single-modal geomorphological data can improve landform recognition and classification accuracy only to some extent.

Some algorithms combine different modalities of geomorphological data to improve landform recognition and classification. Zhao et al. (2017) use an OBIA approach to extract terraces from high resolution DEM and images on the Loess Plateau. Based on a machine learning method, they construct classification features and rules considering terrain, geometric and texture characteristics. Iwahashi and Pike (2007) employ an unsupervised nested-means algorithm for landform classification utilizing slope gradient, local convexity as well as surface texture. The study shows that slope and texture are important in distinguishing mountains, whereas surface convexity is helpful to distinguish among low-relief terrain, such as flood plains, river terraces, and alluvial fans. Zhao et al. (2017) utilize the Random Forest (RF) method to automatically choose and fuse features for landform recognition based on terrain derivative extraction and texture derivative extraction. The above algorithms usually consist of a feature extractor or classifier designed based on empirical evidence and expert knowledge. They are unlikely to efficiently mine highly nonlinear relationships between different modality features because of their relatively simple models (Srivastava and Salakhutdinov, 2012). Thus, they reach the limit of their ability to effectively improve landform recognition accuracy.

Conventional machine-learning techniques, which require careful engineering and remarkable expert knowledge to design feature extractors, have limited ability to deal with raw forms of natural data (LeCun et al., 2015). However, deep learning based approaches have performed very well in mining complex structures in high-dimensional features (Marmanis et al., 2015; Cheng et al., 2016, 2019; Li et al., 2018). They exploit multiple processing layers to learn robust, generic, and hierarchical deep features and thus can effectively reflect essential object characteristics. The advantage of deep learning is that hierarchical deep features are not designed by domain experts, but rather automatically learned from the data using a common learning process (LeCun et al., 2015). Algorithms which can effectively extract different characteristics from multi-modal geomorphological data to construct powerful joint representations are therefore highly desirable. Inspired by the work of Ngiam et al. (2011), Bu et al. (2014) and Li et al. (2018), we propose a novel, end-to-end, deep-learning-based landform recognition framework. It consists of three function modules (see Fig. 1): (1) a multi-channel geomorphological feature extraction module, (2) a multi-modal geomorphological feature fusion module, and (3) a landform classifier. These sequentially arranged modules form a bottom-up evolutionary flow process for landform recognition.

The study aims to offer an effective solution to landform recognition problems by using deep learning. It investigates the applicability and performance of convolutional neural networks (CNN) in geomorphological feature construction, and tests the feasibility of a multi-modal fusion strategy for landform recognition. The aims are achieved through the following objectives: (1) Multi-modal geomorphological feature construction. A specialized multi-channel geomorphological feature extraction network is designed to extract the physical features and visual features from DEM, slope, and shaded relief data for landform recognition. (2) Multi-modal geomorphological feature fusion. Compared to previous fusion models (Ngiam et al., 2011; Bu et al., 2014; Li et al., 2018) which used multiple-layer restricted Boltzmann machines (RBM) to fuse different features, a multi-modal geomorphological feature fusion model based on residual learning is proposed to construct highly discriminative joint characteristics. (3) Publicly available landform recognition benchmark dataset construction. A landform recognition benchmark dataset named ZISM50m was built. The ZISM50m dataset contains 8400 samples which are obtained from 8 sample regions of diverse geomorphological classes in China. This benchmark dataset and our implementation code will be made publicly available to promote the development of related fields.

2. Methods

The proposed landform recognition algorithm framework is shown in Fig. 1. First, given the input shaded relief, DEM and slope, a three-channel feature extraction network is used to generate physical features from DEM and slope data as well as visual features from shaded relief data. Second, a feature fusion network is employed to construct a joint representation by fusing both physical features and visual features. Finally, a softmax classifier is used to output the score for each class. The descriptions of each individual module of the proposed framework are as follows.

2.1. Multi-channel geomorphological feature extraction

Deep neural networks can mine inherent semantic characteristics of raw data in a hierarchical manner (Farabet et al., 2013; Cheng et al., 2018). To construct a powerful hierarchical semantic feature of landform, we utilize CNNs to extract low-level physical and visual characteristics. The overall structure of the geomorphological feature extraction network is illustrated in Fig. 1(a). It contains three parallel CNN channels: texture, elevation and slope. For simplification, the same network structure (see Fig. 2) is used in all channels. A detailed description of the construction of the single-modal feature extraction is shown in Fig. 2. The process begins with a classic convolutional neural network in which deep features are extracted across various network layers.

A typical deep CNN is usually organized in several stages for image representation. The inputs and outputs of each stage are sets of arrays regarded as feature maps. The CNN also possesses the ability of information abstraction, and the output feature map of each layer is regarded as a further refinement of the input. As illustrated in Fig. 2, each stage of the multi-channel feature extraction network consists of multiple layers of operations: a convolutional operation, an activation function such as a ReLU, and a pooling operation. The geomorphological deep features consist of the texture, elevation and slope information, which are extracted from the pool6 layer.

The feature maps of the convolutional and pooling layers are represented by $C_i$ and $P_j$, respectively, where $i = 1, \ldots, 6$ and $j = 1, 2, 6$, so the feature maps of the pool6 layer can be obtained by:

$P_6 = \mathrm{pool}(f(W_6 * C_5 + b_6))$  (1)

where $*$ denotes the convolutional operation; $b_6$ and $W_6$ are the bias parameter and the convolutional kernel of the conv6 layer, respectively, and they are the learning parameters; $f$ is an activation function, i.e. the ReLU function:

$f(x) = \max(0, x)$  (2)

where $x$ is the input data. Here, a max-pooling operator (Krizhevsky et al., 2012) is used to take the maximum activation within the neighborhood. Specially, Eq. (1) is used to construct three feature maps named $F_t$, $F_e$ and $F_s$ from shaded relief, DEM and slope data, respectively.

Overall, our geomorphological feature extraction network is a multi-layer network that learns a highly discriminative feature representation, mapping the multi-modal geomorphological data of each sample area to a tensor. The detailed parameter configurations are shown in Table 1.
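As a concrete illustration, one branch of the feature extraction network with the Table 1 configuration can be sketched in PyTorch as below. This is a minimal sketch under assumptions: the input resolution (here 227 × 227 per modality) and the exact padding values are not stated in the paper, which only reports the per-layer feature-map sizes.

```python
# Minimal sketch of one branch of the multi-channel feature extraction network,
# following the layer configuration reported in Table 1 (assumed input 1 x 227 x 227).
import torch
import torch.nn as nn

class SingleChannelExtractor(nn.Module):
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=11, stride=4, padding=2),  # Conv1 -> 56x56
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                            # Pool1 -> 27x27
            nn.Conv2d(64, 192, kernel_size=5, stride=1, padding=2),           # Conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                            # Pool2 -> 13x13
            nn.Conv2d(192, 384, kernel_size=3, stride=1, padding=1),          # Conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),          # Conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),          # Conv5
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),          # Conv6
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                            # Pool6 -> 6x6
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, in_channels, H, W) single-modality raster (shaded relief, DEM or slope)
        return self.features(x)  # (N, 128, 6, 6) feature map, i.e. F_t, F_e or F_s
```

The same module would be instantiated three times, once per channel, as described for Fig. 2.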


Fig. 1. Proposed multi-modal landform recognition framework.

Table 1
The parameter configuration of the geomorphological feature extraction network.

        Patch Size/Stride   Feature Map Size   Padding Size   Channels
Conv1   11 × 11/4           56 × 56            2 × 2 × 2      64
Pool1   3 × 3/2             27 × 27            –              64
Conv2   5 × 5/1             27 × 27            2 × 2 × 2      192
Pool2   3 × 3/2             13 × 13            –              192
Conv3   3 × 3/1             13 × 13            1 × 1 × 1      384
Conv4   3 × 3/1             13 × 13            1 × 1 × 1      256
Conv5   3 × 3/1             13 × 13            1 × 1 × 1      256
Conv6   3 × 3/1             13 × 13            1 × 1 × 1      128
Pool6   3 × 3/2             6 × 6              –              128

Fig. 2. Multi-channel geomorphological feature extraction network architecture, which can generate visual texture features and physical features from shaded relief, DEM and slope data. F_t, F_e and F_s represent the three feature maps acquired by the feature extraction network from shaded relief, DEM and slope data, respectively.

Fig. 3. The feature maps of selected layers, generated from grey shaded relief, DEM, and slope using the multi-channel geomorphological feature extraction network.

In Fig. 3, some feature maps of selected layers are visualized, and it is interesting to observe that some semantic contents are activated. For example, most filters are activated by the line shape of a ridge or valley. These shapes contained in the input data activate the feature maps at their corresponding positions. The first and second convolutional feature maps can be directly visualized and easily understood: they capture oriented edges and spatial structures. The subsequent layers are not as easy to interpret. The former layers learn low-level shape feature representations, whereas the latter layers capture high-level semantic properties. Essentially, the feature maps of the pooling layers are more abstract than the feature maps of the convolutional layers. This is because the pooling operation has the ability to learn small translation invariance over a small neighborhood and resistance to noise. Pooling, which makes the upper layer cover a larger region, can generate hierarchically intrinsic semantic features.

2.2. Multi-modal geomorphological feature fusion and landform recognition

To match the required feature map shape, the three feature maps F_t, F_e and F_s are concatenated along the channel axis to produce a merged feature vector V, which combines the different modalities of geomorphological features. It has the following form:

$V = [F_t, F_e, F_s]$  (3)

Works using ResNet (He et al., 2016; Dai et al., 2016; Zhang et al., 2017) have shown that a few stacked residual units are good at mining high-level non-linear characteristics (He et al., 2016). So in this work, residual learning (He et al., 2016) is employed to further discover a high-level joint geomorphological feature from the merged vector V, which expresses the physical and visual information of landforms sufficiently.

In the multi-modal geomorphological feature fusion network (see Fig. 4(a)), a residual unit is stacked upon the merged feature vector V as follows:

$X_c^{(1)} = h(V) + \mathcal{F}(V, \theta_c^{(0)})$  (4)

where the residual function $\mathcal{F}$ represents a combination of Convolution + Batch Normalization + ReLU, and $\theta_c$ is the learnable parameter in the residual unit. As shown in Fig. 4(a), the dimensions of V and the output of $\mathcal{F}$ are not equal, so we use the function $h$ to perform a linear projection to match the dimensions in Eq. (4). To balance efficiency and quality, the linear projection is done by 1 × 1 convolutions. Then, we append a convolutional layer upon the residual unit $X_c^{(1)}$. Finally, the joint geomorphological feature vector J can be obtained by:

$J = \mathrm{ReLU}(W_8 * X_c^{(1)} + b_8)$  (5)

The construction process of the joint feature is shown in Fig. 4(a). The detailed parameter configurations of the multi-modal geomorphological feature fusion network are presented in Table 2. The input of the residual unit is a 6 × 6 × 384-dimensional feature vector V, and our multi-modal feature fusion network is used to reduce the redundant information and improve the discrimination ability of the deep hierarchical features, finally producing a 6 × 6 × 64-dimensional joint geomorphological feature vector J.

Table 2
The parameter configuration of the multi-modal geomorphological feature fusion network.

        Patch Size/Stride   Feature Map Size   Padding Size   Channels
Conv7   3 × 3/1             6 × 6              1 × 1 × 1      256
Conv8   3 × 3/1             6 × 6              1 × 1 × 1      128
Conv9   1 × 1/1             6 × 6              –              64

As illustrated in Fig. 4(b), the landform recognition network includes three fully-connected (FC) layers and a softmax classifier. The input of the first FC layer is the 6 × 6 × 64-dimensional joint geomorphological feature vector J, and the third FC layer outputs a 6-dimensional vector S which can be regarded as the score of each landform class. After obtaining the score S, a multi-class cross-entropy loss function measures the deviation between the target class distribution c_i and the predicted class distribution in the training process. The loss function is as follows:

$\mathrm{Loss} = -\sum_{i \in \mathrm{classes}} c_i \ln \frac{e^{S_i}}{\sum_{j \in \mathrm{classes}} e^{S_j}}$  (6)

where c_i = 1 if the input is labeled i, and 0 otherwise. During the test process, an argmax function is used to predict the class. For each input, the final classification result l is given by

$l = \arg\max_{i \in \mathrm{classes}} \frac{e^{S_i}}{\sum_{j \in \mathrm{classes}} e^{S_j}}$  (7)
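The fusion and classification stages described by Eqs. (3)-(7) can be sketched in PyTorch as follows. This is a minimal illustration, assuming 6 × 6 × 128 inputs per channel and the layer widths of Table 2; the fully-connected layer sizes and the exact placement of Conv8/Conv9 around Eq. (5) are assumptions on our part, not taken from the paper.

```python
# Sketch of the feature fusion (residual unit) and classification head.
import torch
import torch.nn as nn

class FusionAndClassifier(nn.Module):
    def __init__(self, num_classes: int = 6):
        super().__init__()
        # Residual branch F(V): Convolution + Batch Normalization + ReLU (Conv7)
        self.residual = nn.Sequential(
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(384, 256, kernel_size=1)   # h(V): 1x1 projection to match dims
        self.conv8 = nn.Conv2d(256, 128, kernel_size=3, padding=1)   # Conv8
        self.conv9 = nn.Conv2d(128, 64, kernel_size=1)                # Conv9 -> 6x6x64 joint feature J
        self.classifier = nn.Sequential(                     # three FC layers producing scores S
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),                      # 6-dimensional score vector S
        )

    def forward(self, f_t, f_e, f_s):
        v = torch.cat([f_t, f_e, f_s], dim=1)                 # Eq. (3): V = [F_t, F_e, F_s]
        x = self.project(v) + self.residual(v)                # Eq. (4): X_c = h(V) + F(V, theta_c)
        j = torch.relu(self.conv9(torch.relu(self.conv8(x)))) # Eq. (5) followed by Conv9 -> J
        return self.classifier(j)                             # scores S; softmax/argmax applied outside
```

During training, the scores S would be passed to a softmax cross-entropy loss as in Eq. (6); at test time the argmax of Eq. (7) gives the predicted class.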
Fig. 4. Multi-modal geomorphological feature fusion and landform recognition network architecture.

3. Study site and data

The study area is located in central China. Fig. 5 shows the spatial distribution of terrestrial geomorphology in China (Zhou et al., 2011). Central China contains a rich variety of landform types. Eight sampling areas, which are distributed evenly from north to south, were selected for the experiment (see Fig. 5). We adopted Zhou's 1:1,000,000 digital land geomorphology classification scheme (Zhou et al., 2011) as the landform category principle. There are 10 landform types present in China (Fig. 5). Very few marine, lacustrine, glacial, or volcanic-lava landforms exist in the study area, so six typical landform categories, including aeolian, arid, loess, karst, periglacial, and fluvial, were selected.

Fig. 5. Geomorphology in China (Zhou et al., 2011), and eight sample areas of different landform types indicated by red rectangles. Specially, three sample regions (indicated by yellow rectangles) containing 220 sample data were not involved in training but were all used as test data.

We introduce a new geomorphological dataset named ZISM50m which contains six different landform types. In total, the ZISM50m dataset contains 8,400 samples obtained from the eight sample regions of diverse geomorphological classes in China (see Fig. 5), with 1200 samples from each of the following seven data channels: DEM, slope, flow accumulation (FA), curvature, topographic wetness index (TWI), grey-shaded relief (GSR), and RGB-shaded relief (RSR). The size of each sample is 600 × 600 pixels with a spatial resolution of 50 m. In order to ensure the authenticity of the data and results, training and testing samples do not overlap spatially with each other. The distribution of landforms has strong regional characteristics, thus a sample area generally includes only one or two types of landforms. The number of landform classes and the number of samples contained in each sample area are not the same. To create a balanced dataset, each landform category includes 200 sample locations which are distributed in different sample areas. Fig. 6 shows some examples of these seven kinds of data in the dataset.

The landform class of each sample is manually determined by experts based on the Geomorphological Atlas of the People's Republic of China (Li Jijun and Chenghu, 2009), the topographic map, and the relevant geomorphological variables. Fig. 7(a) shows an example of the Geomorphological Atlas of the People's Republic of China. Note that different colors represent different classes of landforms in this atlas, and the red rectangle in Fig. 7(a) represents a sample area. Fig. 7(b) and (c)
show the topographic map and grey shaded relief corresponding to the sample area, respectively. We can see that the atlas only identifies the geomorphological class of a region and describes the spatial scope of each landform, but it does not visualize the geomorphological features of the region. So, we employ the atlas to determine the location of each sample, and the experts then use the atlas, topographic maps (see Fig. 7(b)) and geomorphological variables (see Fig. 7(c)) of the same region to accurately determine the landform class.

A DEM consists of intricate patterns of elevation values which describe the surface morphological characteristics (Jasiewicz and Stepinski, 2013). It is widely applied in landform classification. The DEM with a 50 m grid was interpolated from the contours of 1:250,000 topographic maps with a 50 m contour interval. Slope data extracted from the DEM is an efficient geomorphological variable that quantitatively describes landforms. It provides rich physical characteristics of landforms and is significantly related to topographic relief and landscape development stages (Li et al., 2016). Slope is calculated for each grid cell from the 3 × 3 matrix of neighboring elevations by the algorithm of Horn (1981). Curvature data entails the convexity measures of landforms: while convexity represents peaks, concavity represents sinks and valleys (Moore et al., 1991). Topographic wetness indicates the potential to be water-saturated and is calculated from the slope gradient as well as the upslope drainage area (Moore et al., 1991). Flow accumulation raster values represent the number of upslope cells flowing through each cell.
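For reference, the sketch below illustrates how the slope and TWI channels can be derived from the DEM grid. It is only an illustration under assumptions: a simple central-difference gradient stands in for Horn's 3 × 3 kernel, and the TWI is written in the common ln(a/tan β) form consistent with Moore et al. (1991); the exact GIS procedures used in the paper are not specified.

```python
# Illustrative derivation of slope and TWI from a 50 m DEM grid (assumed helper code).
import numpy as np

def slope_radians(dem: np.ndarray, cellsize: float = 50.0) -> np.ndarray:
    dz_dy, dz_dx = np.gradient(dem, cellsize)        # finite-difference surface derivatives
    return np.arctan(np.hypot(dz_dx, dz_dy))         # slope angle per grid cell

def twi(flow_accumulation: np.ndarray, slope_rad: np.ndarray, cellsize: float = 50.0) -> np.ndarray:
    a = (flow_accumulation + 1.0) * cellsize         # upslope area per unit contour length
    beta = np.clip(slope_rad, 1e-6, None)            # avoid division by zero on flat cells
    return np.log(a / np.tan(beta))                  # TWI = ln(a / tan(beta))
```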

Fig. 6. Examples of RSR, GSR, DEM, Slope, Curvature, TWI, and FA (here Ln(FA) is applied on FA data sets to enhance the visibility for further analysis) in the
ZISM50m dataset. Note that RSR, GSR, TWI, and FA represent RGB-shaded relief, grey-shaded relief, topographic wetness index, and flow accumulation, respectively.


Fig. 7. ZISM50m data set required in the process of collecting ground truth.

It is closely related to the characteristics of streams as well as to the processes of soil erosion and soil aggradation (Jenson and Domingue, 1988; Hutchinson and Gallant, 1999).

Unlike the aforementioned physical variables, the shaded relief derived from the DEM provides accentuated visual features, including landform texture, contextual and shape characteristics (Benz et al., 2004; Zhao et al., 2017). In this paper, we use grey and RGB color shaded relief, respectively, to depict landforms and thus construct extrinsic visual texture properties of the landforms. In making the RGB shaded relief, the RGB colors differ according to altitude. In the grey shaded relief, by contrast, the different grey values represent the different shades of the geomorphology and do not represent different altitudes. The direction and angle of the light source are key parameters for obtaining shaded relief from a DEM. We used conventional illumination parameters, i.e., azimuth angle 315° and altitude angle 45°, when building the shaded relief. Compared with remote sensing images, shaded relief is a grid-based representation of the land surface from which vegetation and buildings have been removed, representing bare-ground terrain (Anders et al., 2011). Thus, shaded relief has purer geomorphological significance than remote sensing images (Zhao et al., 2017).

4. Results

This section presents the experimental design and discusses the experimental results. Several experiments using different geomorphological data were conducted to evaluate the performance of the proposed approach.

4.1. Evaluation metrics and experimental setup

There exist three widely used, standard evaluation metrics in supervised classification, namely overall accuracy (OA), average accuracy (AA), and the confusion matrix. The overall accuracy is defined as the number of correctly recognized samples, regardless of which class they belong to, divided by the total number of samples. The AA is defined as the average of the recognition accuracy of each class, regardless of the number of samples in each class. The confusion matrix is an informative table used to analyze the errors and confusions between different classes; it is generated by counting each type of correct and incorrect recognition of the test samples and accumulating the results in the table. Since our ZISM50m dataset has the same number of samples for each landform class, we obtain the same values for the OA and AA metrics. In our work, we used the AA metric and the confusion matrix to evaluate all recognition methods.

In each evaluation, the ZISM50m dataset is divided into two parts, a training set and a test set. This division is done randomly by selecting 75% of the samples of each category as the training set and using the remaining 25% as the test set. To prevent overfitting, the parameters of the proposed network were tuned using sixfold cross-validation. In addition, in order to obtain reliable results for the AA and confusion matrix metrics, we repeated the experiment six times for each training-test ratio and report the mean and standard deviation of the results.

The proposed network is trained with stochastic gradient descent (SGD), and the training is performed over 10 K iterations. We set the momentum to 0.9, the batch size to 50, and the learning rate to 0.00001. In this paper, we use the open-source CNN library PyTorch (Paszke et al., 2017) to implement the proposed deep-learning-based landform recognition algorithms introduced in Section 2. Both the source code and the ZISM50m dataset can be obtained online at http://www.adv-ci.com/download/geomorphology/.
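A compact sketch of the training and evaluation loop implied by this setup (SGD, momentum 0.9, batch size 50, learning rate 1e-5, cross-entropy loss over six classes) is given below. The data loading, device handling and model wiring are assumptions on our part, not code taken from the released implementation.

```python
# Minimal training/evaluation sketch matching the reported hyperparameters.
import torch
import torch.nn as nn

def train_and_evaluate(extractors, fusion_head, train_loader, test_loader,
                       iterations=10_000, device="cuda"):
    params = [p for m in extractors.values() for p in m.parameters()]
    params += list(fusion_head.parameters())
    optimizer = torch.optim.SGD(params, lr=1e-5, momentum=0.9)
    criterion = nn.CrossEntropyLoss()                     # Eq. (6): softmax cross-entropy
    step = 0
    while step < iterations:
        for gsr, dem, slope, labels in train_loader:      # batches of 50 multi-modal samples
            scores = fusion_head(extractors["gsr"](gsr.to(device)),
                                 extractors["dem"](dem.to(device)),
                                 extractors["slope"](slope.to(device)))
            loss = criterion(scores, labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= iterations:
                break
    # Confusion matrix and average accuracy (AA) on the held-out 25% test split.
    cm = torch.zeros(6, 6, dtype=torch.long)
    with torch.no_grad():
        for gsr, dem, slope, labels in test_loader:
            scores = fusion_head(extractors["gsr"](gsr.to(device)),
                                 extractors["dem"](dem.to(device)),
                                 extractors["slope"](slope.to(device)))
            preds = scores.argmax(dim=1).cpu()
            for t, p in zip(labels, preds):
                cm[t, p] += 1                             # row = true class, column = predicted class
    aa = (cm.diag().float() / cm.sum(dim=1).clamp(min=1).float()).mean().item()
    return cm, aa
```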

4.2. Recognition based on single geomorphological data

This set of experiments was designed to compare the advantages and disadvantages of single geomorphological data, i.e., grey shaded relief (GSR), DEM, slope, curvature, flow accumulation (FA), and topographic wetness index (TWI). As shown in Fig. 8(a), landform recognition based on single geomorphological data uses an architecture including a single-channel geomorphological feature extraction network

Fig. 8. Different architectures for landform recognition. (a) The architecture includes single-channel geomorphological feature extraction network and classification network (named SC); (b) the architecture includes multi-channel geomorphological feature extraction network, proposed geomorphological feature fusion network and classification network (named MFC); (c) the architecture includes single-channel geomorphological feature extraction network and SVM classifier (named SS); (d) the architecture includes single-channel geomorphological feature extraction network, geomorphological feature fusion network (RBM) and Softmax classifier (named SRS).


Table 3
Recognition accuracies based on six different geomorphological data. Bold numbers are the highest values for each column. Note that GSR, TWI, and FA represent
grey-shaded relief, topographic wetness index, and flow accumulation, respectively.
Aeolian Arid Loess Karst Fluvial Periglacial AA. Std Acc.

GSR 0.8067 0.7083 0.7750 0.6533 0.6967 0.6650 0.7175 ±1.716%


DEM 0.8233 0.6850 0.8600 0.6367 0.7550 0.5817 0.7236 ±1.704%
Slope 0.8083 0.7833 0.7850 0.7850 0.5250 0.5200 0.7011 ±1.713%
Curvature 0.8050 0.7467 0.8417 0.5850 0.6300 0.5567 0.6942 ±1.719%
FA 0.3817 0.2850 0.3017 0.3400 0.2650 0.1267 0.2834 ±1.728%
TWI 0.9000 0.7233 0.7117 0.6533 0.6367 0.6117 0.7061 ±1.715%

and classification network (named SC), which is a branch of the framework described in Fig. 1.

Table 3 presents the results of six experiments using single geomorphological data. Landform recognition using GSR, DEM, slope or TWI reveals similar accuracies, which are higher than those obtained using curvature or FA. Specifically, the recognition result from FA has very low accuracy. This indicates that GSR, DEM, TWI and slope can better reflect the essential features of landforms than curvature and FA, if only single geomorphological data are used. Different geomorphological data can obtain good results for some kinds of landforms. For example, slope data is more suitable to precisely describe the arid and karst landforms.

Fig. 9 presents the confusion matrices obtained by averaging the recognition results over six experiments. The entry of row X and column Y denotes the average number of test data from class X that are classified as class Y. It can be observed that the recognition accuracy is higher than 60% for most categories (4/6) except when using FA data. Especially for the aeolian landform, the recognition accuracies are higher than 80% using any single geomorphological data except FA. As shown in Fig. 9, there is no guarantee that one single geomorphological data will always be robust or suitable for all six landform classes. However, a certain single geomorphological data may be good at distinguishing some specific geomorphological classes. For example, DEM or GSR data is important in distinguishing the loess landform. For the aeolian landform, the accuracy of using TWI data is higher than that of using other geomorphological data. From Fig. 9, the greatest confusion is between aeolian and arid landforms. This is because of their similar intrinsic physical attributes and extrinsic visual texture properties. For example, in some areas both aeolian and arid landforms are characterized by small elevation fluctuation and smooth slope. Moreover, aeolian and arid landforms are often intermingled in the same area. So, these two landforms are easily misclassified as each other.

4.3. Recognition based on two different geomorphological data fusion

This section presents the evaluation of the proposed multi-modal geomorphological data fusion-based algorithm. Recognition with the experimental setting of the architecture shown in Fig. 8(b) was performed. In the experiment, the input is two different geomorphological data, including single-modality geomorphological data fusion and multi-modal geomorphological data fusion. Table 4 lists the recognition accuracies of 11 geomorphological data fusion schemes on the ZISM50m dataset. Rows 1-5 show the recognition results using two different modalities of geomorphological data, and Rows 6-11 show the recognition results using a single modality of geomorphological data.

Comparing Table 3 with Table 4, the fusion schemes using two different geomorphological data achieve much better recognition performance than single geomorphological data for all six landform classes in terms of AA. Specifically, for the karst, periglacial and fluvial landform classes the recognition accuracies using only a single dataset were very low, while fusing two different geomorphological data leads to a significant accuracy improvement of about 10%. This indicates that the proposed geomorphological data fusion network plays an important role in boosting the discrimination ability. Compared with Rows 1-5

Fig. 9. Confusion matrices produced by single-modal geomorphological data. Note that GSR, TWI, and FA represent grey-shaded relief, topographic wetness index,
and flow accumulation, respectively.


Table 4
Recognition accuracies based on the fusion of two different geomorphological data. Bold numbers are the highest values for each column. Note that GSR, TWI, and FA
represent grey-shaded relief, topographic wetness index, and flow accumulation, respectively.
Aeolian Arid Loess Karst Fluvial Periglacial AA Std Acc.

GSR + DEM 0.9217 0.8150 0.9233 0.7467 0.8617 0.9800 0.8747 ±1.7149%
GSR + Slope 0.9067 0.8217 0.9367 0.7750 0.8167 0.9783 0.8725 ±1.7134%
GSR + Curvature 0.9150 0.7967 0.8950 0.9100 0.7333 0.9800 0.8716 ±1.7165%
GSR + TWI 0.8900 0.7750 0.9383 0.7800 0.8050 0.9567 0.8575 ±1.7149%
GSR + FA 0.8433 0.5000 0.8867 0.5617 0.5617 0.9083 0.7103 ±1.7367%
Slope + Curvature 0.8400 0.8617 0.9117 0.9050 0.7300 0.8083 0.8428 ±1.7092%
Slope + FA 0.8217 0.4850 0.8450 0.7217 0.5867 0.4467 0.6511 ±1.7019%
Curvature + FA 0.7383 0.5150 0.4467 0.5800 0.4400 0.2883 0.5014 ±1.7159%
Curvature + TWI 0.8467 0.7617 0.9800 0.6467 0.7633 0.7667 0.7942 ±1.7005%
DEM + TWI 0.8750 0.7617 0.9417 0.8167 0.8567 0.9017 0.8589 ±1.6981%
TWI + Slope 0.8317 0.7767 0.9733 0.8050 0.7767 0.8100 0.8289 ±1.7186%

and Rows 6-11 of Table 4, it is obvious that multi-modal data fusion has much better recognition accuracy than using single-modality data. This shows that intrinsic physical attributes (i.e., elevation, slope, and curvature, etc.) and extrinsic visual texture properties (i.e., GSR) are complementary. Visual and physical geomorphological data fusion can significantly boost the performance of landform recognition.

Fig. 10 shows the confusion matrices produced by fusing two different kinds of geomorphological data. For the periglacial and loess landform classes, the number of correctly identified samples is more than 46 when using the three multi-modal data fusion schemes (i.e., GSR + DEM, GSR + Slope, and GSR + TWI). This indicates that the joint representation constructed by the multi-modal geomorphological feature fusion increases intra-class similarities while reducing inter-class similarities. Comparing Fig. 9 to Fig. 10, it is not difficult to find that the proposed multi-modal data fusion based algorithm lessens the confusion between aeolian and arid landforms as well as between karst and fluvial landforms. This indicates that the proposed geomorphological data fusion network plays an important role in boosting the discrimination ability.

From Table 4, we can see that the fusions of Slope + FA, Curvature + FA, and Curvature + TWI achieved average accuracies of 0.6511, 0.5014, and 0.7942, respectively. As demonstrated in Table 5, the correlations of Slope vs. FA, Curvature vs. FA, and Curvature vs. TWI are very low: they are 0.0038, 0.0025 and 0.0129, respectively. The fusion of TWI + Slope obtains an average accuracy of 0.8289, and the correlation of TWI vs. Slope is the highest, with a value of 0.3049, in the correlation analysis between the six geomorphological data. It is not difficult to find that the average accuracy of TWI + Slope is only higher than that of the fusion schemes with very low correlations (i.e., Slope vs. FA, FA vs. Curvature, and Curvature vs. TWI). Although the fusion of TWI + Slope obtains the best accuracy for two landform classes, its average accuracy over the six landform classes is lower than that of the fusion schemes with moderately correlated data (i.e., DEM vs. GSR, DEM vs. TWI and GSR vs. Curvature, etc.). The results show that the fusion of geomorphological data that are significantly correlated (i.e., TWI + Slope) or irrelevant (i.e., Slope + FA, FA + Curvature, etc.) cannot achieve high recognition accuracy in terms of average accuracy. Conversely, moderately correlated data (i.e., DEM vs. GSR, DEM vs. TWI and GSR vs. Slope, etc.) can be fused to boost the recognition accuracy and are more suitable for landform recognition.

For periglacial landforms, there are three fusion schemes with an accuracy of 100%. From Table 6, we can see that the accuracies of almost all fusion schemes are higher than 0.9 for periglacial landforms. The results also show that the periglacial landform has low inter-class similarity with the other landforms and is easy to recognize. In addition, it is not difficult to find that the three fusion schemes with 100% accuracy all contain DEM, Slope, and FA. This demonstrates that the fusion of DEM, Slope, and FA is suitable for periglacial landform recognition.

4.4. Recognition based on three or more different geomorphological data fusion

This set of experiments aims at obtaining the best scheme of geomorphological data fusion. Table 6 shows the quantitative comparison of 14 schemes using more than two geomorphological data. The multi-modal fusion scheme GSR + DEM + Slope outperforms all other schemes using three geomorphological data in terms of AA. Specifically, in comparison with the fusion of GSR + DEM + Slope, adding one more geomorphological data item (TWI, Curvature, or FA) does not lead to a higher accuracy of landform recognition (see the first and the 7th-9th rows of Table 6). This demonstrates that too much geomorphological data can result in redundant information due to the high correlation between different geomorphological data, which does not bring about additional gain. The fusion of GSR + DEM + Slope + Curvature + TWI data achieves an AA of 90.28%, which is the best performance among all 14 schemes. It is only 0.81 points higher than the fusion of GSR + DEM + Slope data, although two more geomorphological data are involved. Specially, training a single epoch takes 31 s when the training set consists of 900 samples from each of the following three data channels: GSR, DEM and slope, while the training time increases to 76 s/epoch when using five geomorphological data (e.g., GSR, DEM, slope, Curvature and TWI). The training time more than doubles, but the accuracy is only slightly improved. The results suggest that adding more geomorphological data could slightly improve the performance, but there is a trade-off between complexity and accuracy.

From Tables 3, 4 and 6, it is interesting to observe that: (1) the fusion of similar modalities of geomorphological data may improve landform recognition accuracy, but a multi-modal fusion scheme achieves much better recognition performance than single-modal data; (2) too much geomorphological data greatly increases the complexity of the algorithm and the computation time, while the accuracy is only slightly improved; and (3) the recognition accuracies using a single dataset are very low, because single geomorphological data can only reflect some features of landforms. Based on the 31 geomorphological data fusion comparison results (see Tables 3, 4 and 6), the fusion scheme of two morphological characteristics (e.g., DEM + Slope) and one texture characteristic (e.g., GSR) is the best solution among the various fusion schemes for landform recognition.

In order to verify the generalization of the proposed landform recognition algorithm, three sample regions were added (indicated by yellow rectangles in Fig. 5). 220 sample data were selected from these three regions, and these data were not involved in training but were all used as test data. Note that the ground truth of each sample in the three new regions was labeled in the same way as in the eight sample regions (see Section 3). In the experiments, we used the geomorphological data of GSR, DEM and slope for landform recognition. Fig. 11 shows the recognition results on the 3 newly added areas and the 8 sample areas. The comparison of the recognition results between


Fig. 10. Confusion matrices produced by fusing two different kinds of geomorphological data. Note that GSR, TWI, and FA represent grey-shaded relief, topographic
wetness index, and flow accumulation, respectively.

them shows that the accuracies only fluctuate slightly. This indicates that the proposed landform recognition model trained on one region can also transfer well to another region.

5. Discussions

5.1. Recognition based on RGB or grey shaded relief

Four additional experiments were performed to compare RGB shaded relief and grey shaded relief more comprehensively. The input data of the first experiment was RGB shaded relief, and the input data of the second to fourth experiments was RGB shaded relief + DEM, RGB shaded relief + slope, and RGB shaded relief + DEM + slope, respectively. In this paper, the RGB colors differ according to altitude in making the RGB shaded relief, whereas the different grey values of the grey shaded relief represent the different shades of the geomorphology and do not represent different altitudes. So, RGB shaded relief is actually a combination of grey shaded relief and DEM.

As illustrated in Fig. 12, the recognition accuracies using only RGB shaded relief are significantly higher than those using only grey shaded relief, but the accuracy of the fusion of RGB shaded relief + DEM is comparable to the accuracy of the fusion of grey shaded relief + DEM. The results also show that RGB shaded relief is not a single geomorphological variable. From Fig. 12, the recognition accuracy of the fusion of RGB shaded relief + other geomorphic data (i.e., DEM or slope data) is slightly higher than that of grey shaded relief + other geomorphic data. The results also verify that RGB shaded relief has more visual characteristics than the grey shaded relief.

The direction and angle of the light source are key parameters for obtaining shaded relief from a DEM. We used conventional illumination parameters (i.e., azimuth angle: 315°/altitude angle: 45°) in the paper. This set of experiments was designed to investigate whether using


Table 5
Correlation analysis between six geomorphological data. Note that GSR, TWI, and FA represent grey-shaded relief, topographic wetness index, and flow accumu-
lation, respectively.
Aeolian Arid Loess Karst Fluvial Periglacial mean.

DEM vs. Slope 0.1016 0.1330 0.0368 0.0805 0.0639 0.1190 0.0891
DEM vs. GSR 0.0499 0.0526 0.0209 0.0787 0.0408 0.0551 0.0497
DEM vs. Curvature 0.0504 0.0213 0.0639 0.0459 0.0006 0.0462 0.0381
Slope vs. GSR 0.1747 0.1184 0.1007 0.1631 0.1248 0.1676 0.1416
Slope vs. Curvature 0.0048 0.0065 0.0014 0.0043 0.0006 0.0018 0.0032
GSR vs. Curvature 0.0509 0.0368 0.0545 0.0912 0.0468 0.0383 0.0531
FA vs. DEM 0.0049 0.0054 0.0099 0.0076 0.0090 0.0010 0.0078
FA vs. Slope 0.0027 0.0019 0.0027 0.0028 0.0069 0.0061 0.0038
FA vs. GSR 0.0003 0.0005 0.0010 0.0016 0.0023 0.0015 0.0012
FA vs. Curvature 0.0032 0.0021 0.0017 0.0024 0.0028 0.0024 0.0025
TWI vs. DEM 0.1441 0.0985 0.0570 0.1411 0.1006 0.1424 0.1139
TWI vs. Slope 0.2837 0.2831 0.3112 0.3710 0.2743 0.3062 0.3049
TWI vs. GSR 0.0452 0.0513 0.1029 0.1539 0.0999 0.0946 0.0913
TWI vs. Curvature 0.0193 0.0048 0.0054 0.0055 0.0352 0.0075 0.0129
TWI vs. FA 0.1818 0.1635 0.1984 0.1856 0.2391 0.2346 0.2005

different illumination angles (including azimuth angle: 315°/altitude angle: 45°; azimuth angle: 135°/altitude angle: 45°; azimuth angle: 45°/altitude angle: 45°; as well as the combination of azimuth angle: 315°/altitude angle: 45° and azimuth angle: 135°/altitude angle: 45°) can boost the landform recognition performance. The recognition accuracies are comparable, and the AA is in the range of 0.8914 to 0.8958. This demonstrates that using different illumination angles can hardly improve the landform recognition accuracy effectively.

5.2. Comparisons with other landform recognition methods

The proposed method was compared with two other popular machine learning algorithms, an SVM and the RBM-based feature fusion method (Bu et al., 2014), in order to obtain a quantitative evaluation. The input data of these three methods are grey shaded relief, slope and DEM. For a fair comparison, the same training data set and test data set were adopted for the proposed method and the comparison methods. For the two comparison methods, the features were extracted by the single-channel geomorphological feature extraction network, which is a branch of the framework described in Fig. 1. Specially, the network is used three times to construct three feature maps from grey shaded relief, slope, and DEM data, respectively.

The SVM experimental process is illustrated in Fig. 8(c), and the architecture includes the single-channel geomorphological feature extraction network and an SVM classifier (named SS). The single-channel geomorphological feature extraction network generates different modalities of features. After that, the three feature maps are stacked and an SVM classifier is employed to recognize landforms. In the SVM classifier, only one free parameter was used to adjust the trade-off between the number of acceptable errors and the maximum of the margin, and it was set to 10 in accordance with experimental results.

As shown in Fig. 8(d), an RBM-based feature fusion method was used to recognize landforms; its architecture includes the single-channel geomorphological feature extraction network, a geomorphological feature fusion network (RBM) and a Softmax classifier, and is named SRS. Differently from the architecture of SS, SRS uses a multiple-layer RBM to mine correlations between different modalities, and then utilizes a softmax classifier to predict landform classes. We refer the reader to Bu et al. (2014) for more details about the RBM based feature fusion method.

Table 7 shows the quantitative comparison results of the three different methods. It is not difficult to find that the recognition accuracy of SS is relatively low, with a value of 81.81%, as it merely combines different modality data. This is less effective than fusing data when mining highly nonlinear relationships between different modality features. The proposed method and SRS employ residual learning and a multiple-layer RBM, respectively, to fuse intrinsic physical attributes and extrinsic visual texture properties, and they achieve higher recognition performance than SS. Specifically, the proposed method achieves the highest value of 89.47%.

Fig. 13 shows the confusion matrices of the three different geomorphological data fusion methods. It is not difficult to find that the proposed method outperforms all other approaches for all six landform classes. Specifically, the proposed method is more effective than the other two methods in alleviating the confusion of aeolian and arid landforms as well as karst and fluvial landforms. This indicates that the

Table 6
Recognition accuracies based on the fusion of three or more different geomorphological data. Note that GSR, TWI, and FA represent grey-shaded relief, topographic
wetness index, and flow accumulation, respectively.
Aeolian Arid Loess Karst Fluvial Periglacial AA Std Acc.

GSR + DEM + Slope 0.8617 0.8967 0.9833 0.8350 0.8333 0.9583 0.8947 ±1.7057%
GSR + DEM + TWI 0.8733 0.7350 0.9800 0.8567 0.8550 0.9867 0.8811 ±1.7087%
GSR + TWI + Curvature 0.8800 0.7783 0.9550 0.8167 0.8117 0.9817 0.8706 ±1.7149%
DEM + Slope + TWI 0.8617 0.8017 0.9600 0.8167 0.8450 0.9617 0.8745 ±1.7033%
Slope + TWI + Curvature 0.9183 0.8117 0.9767 0.8517 0.7717 0.8533 0.8639 ±1.7074%
DEM + Slope + FA 0.8300 0.6733 0.9567 0.6883 0.7467 1.0000 0.8158 ±1.7216%
GSR + DEM + Slope + TWI 0.9011 0.7617 0.9883 0.7650 0.8667 0.9883 0.8785 ±1.7132%
GSR + DEM + Slope + FA 0.8450 0.6600 0.9117 0.6867 0.6967 1.0000 0.8000 ±1.7301%
GSR + DEM + Slope + Curvature 0.8767 0.8750 0.9617 0.8367 0.8233 0.9767 0.8917 ±1.7096%
GSR + Slope + TWI + Curvature 0.8667 0.7800 0.9567 0.9400 0.8683 0.9617 0.8956 ±1.7123%
DEM + Slope + TWI + Curvature 0.8533 0.8200 0.9450 0.8917 0.8800 0.9767 0.8945 ±1.7127%
Slope + Curvature + DEM + FA 0.8600 0.5983 0.8883 0.6200 0.6600 0.9967 0.7706 ±1.7315%
GSR + DEM + Slope + Curvature + TWI 0.9000 0.8100 0.9483 0.9217 0.8767 0.9600 0.9028 ±1.7067%
GSR + DEM + Slope + Curvature + TWI + FA 0.8650 0.7417 0.9000 0.8000 0.6183 1.0000 0.8208 ±1.7293%


Fig. 11. Recognition accuracies on the 3 newly added areas and 8 sample areas.

proposed approach can efficiently fuse both physical and visual feature information, boost the discrimination ability and reduce redundant information.

In order to provide more comprehensive evaluations, several additional experiments using different sample sizes with the three different methods (i.e., SS, SRS, and the proposed method) were performed to study how the number of training samples affects the recognition accuracy (Dietterich, 1998). We used four different training sample ratios. To be specific, we randomly selected 45%, 60%, 75% and 90% of the samples of each category as the training sets, and the remaining samples were used as test sets. These experiments were conducted using the same geomorphological data fusion scheme, i.e., GSR + DEM + Slope.

Fig. 14 shows the recognition results produced by the three different methods using different training sample ratios. It is not difficult to find that the recognition accuracies improve with the increase of the number of training samples. Moreover, the recognition accuracy of the different methods varies within 6%. Specifically, the accuracies of the SS and SRS methods vary by about 6%, whereas the accuracy difference of the proposed method is about 4%. This shows that the proposed method is more robust than the other two methods.

Fig. 12. Performance comparisons of different shaded relief (i.e., grey and RGB) in terms of AA values.

5.3. Qualitative analysis

In Fig. 15, the landform recognition map produced by the proposed approach is shown. Fig. 15(a) represents the geomorphological map of the sample region indicated by the yellow rectangular area on the top left of Fig. 5. Fig. 15(b) shows the landform recognition map of the sample region, produced by sliding a 600 × 600 pixel (30 × 30 km) window within the sample region and recording the recognized geomorphological class in each window. The incorrectly identified samples and correctly identified samples are marked in red and black rectangles, respectively. The landform categories of our data set do not include glacial landforms, so we did not classify these samples; they are marked in cyan rectangles.

As shown in Fig. 15(b), the proposed algorithm correctly recognizes most of the samples. However, incorrectly recognized ones often occur in samples which include multiple landform classes in the same sample, such as Row 3/Column 12, Row 3/Column 13, Row 4/Column 8, and Row 4/Column 9 of Fig. 15(b). This may be because the different categories of landforms in these samples are intermingled with each other and do not have prominent and obvious landform characteristics. In fact, it is also not easy for landform experts to distinguish the landform classes of these complex geomorphological samples. In addition, some easily confused landforms are also easily misidentified, such as aeolian and arid landforms. See Row 5/Column 4 and Row 8/Column 9 of Fig. 15(b); these samples were misidentified as the arid landform.
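The sliding-window mapping used for Fig. 15(b) can be sketched as follows. This is illustrative only: the window stride, the resizing of each 600 × 600 crop to the network input size, and the tensor preparation are assumptions not stated in the paper.

```python
# Hedged sketch of sliding-window landform mapping over a sample region.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_map(gsr, dem, slope, extractors, fusion_head,
                       win=600, stride=600, net_size=227):
    # gsr/dem/slope: (1, 1, H, W) tensors covering the whole sample region
    _, _, h, w = dem.shape
    def crop_and_resize(t, top, left):
        patch = t[:, :, top:top + win, left:left + win]
        return F.interpolate(patch, size=(net_size, net_size),
                             mode="bilinear", align_corners=False)
    labels = []
    for top in range(0, h - win + 1, stride):
        row = []
        for left in range(0, w - win + 1, stride):
            f_t = extractors["gsr"](crop_and_resize(gsr, top, left))
            f_e = extractors["dem"](crop_and_resize(dem, top, left))
            f_s = extractors["slope"](crop_and_resize(slope, top, left))
            scores = fusion_head(f_t, f_e, f_s)
            row.append(int(scores.argmax(dim=1)))   # Eq. (7): predicted landform class
        labels.append(row)
    return labels                                    # grid of per-window class labels
```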

Table 7
Performance comparisons of three different methods. The bold numbers denote the highest values in each column. Note that SS represents the architecture with
single-channel geomorphological feature extraction network and SVM classifier. SRS also represents the architecture with single-channel geomorphological feature
extraction network, geomorphological feature fusion network (RBM) and Softmax classifier.
Aeolian Arid Loess Karst Fluvial Periglacial AA

SS 0.8583 0.7117 0.9183 0.7667 0.7533 0.9000 0.8181
SRS 0.8200 0.8167 0.9400 0.8067 0.7883 0.9483 0.8533
The proposed method 0.8617 0.8967 0.9833 0.935 0.8333 0.9583 0.8947
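For concreteness, the SS baseline compared in Table 7 stacks the per-channel CNN features and classifies them with an SVM whose single free parameter is set to 10. A minimal sketch is given below; the feature flattening and the kernel choice are assumptions on our part and not specified in the paper.

```python
# Hedged sketch of the SS baseline: stacked CNN features + SVM with C = 10.
import numpy as np
from sklearn.svm import SVC

def fit_ss_baseline(train_feats, train_labels, C: float = 10.0) -> SVC:
    # train_feats: dict of per-modality CNN features, each of shape (N, 128, 6, 6)
    x = np.concatenate([train_feats[k].reshape(len(train_labels), -1)
                        for k in ("gsr", "dem", "slope")], axis=1)
    clf = SVC(C=C)          # single free parameter controlling the error/margin trade-off
    clf.fit(x, train_labels)
    return clf
```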


Fig. 13. Confusion matrices produced by three different geomorphological data fusion methods. Note that SS represents the architecture with single-channel geomorphological feature extraction network and SVM classifier. SRS represents the architecture with single-channel geomorphological feature extraction network, geomorphological feature fusion network (RBM) and Softmax classifier.

Fig. 14. Performance comparisons of three different methods using different training sample ratios. Note that SS represents the architecture with single-channel geomorphological feature extraction network and SVM classifier. SRS represents the architecture with single-channel geomorphological feature extraction network, geomorphological feature fusion network (RBM) and Softmax classifier.

Fig. 15. Landform recognition map.

6. Conclusions

This paper proposed a deep-learning-based landform recognition framework using two different modalities of geomorphological data (i.e., intrinsic physical attributes and extrinsic visual texture properties). The framework combines the methods of multi-channel geomorphological feature extraction, multi-modal feature fusion and landform recognition to construct a joint feature and predict landform classes. Our experiments have revealed a convincing performance on a six-class landform dataset, and we have the following observations. (1) Intrinsic physical attributes (i.e., elevation, slope, and curvature, etc.) and extrinsic visual texture properties (i.e., GSR) are complementary. Visual and physical geomorphological data fusion can significantly boost the performance of landform recognition. (2) Adding more geomorphological data could slightly improve the performance, but there is a trade-off between complexity and accuracy. (3) Compared with strongly correlated (such as TWI vs. Slope) and weakly correlated (such as Slope vs. FA) geomorphological data fusion, moderately correlated data fusion is more suitable for landform recognition. (4) The proposed deep-learning-based algorithms have the potential to efficiently extract and fuse features from different geomorphological data, thus significantly improving the accuracy of landform recognition.

Although good results have been achieved using the proposed method, further improvements are possible when other factors are considered. For example, other modalities of geomorphological data, such as soil information and hydrologic data, are not yet considered in the current study. These data play an important role in landform recognition (Hengl and Rossiter, 2003; Zhou et al., 2011), and we will extend the proposed multi-modal geomorphological data fusion framework to verify the applicability of different combinations of multi-modal geomorphic data. Related research by Zheng et al. (2015) and Chen et al. (2018) has revealed the potential of enhancing the robustness of our approach via structural learning. In upcoming research, adopting Conditional Random Fields (CRF) will be considered to learn structural relationships (e.g., contextual information) of landforms.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 41871322, 61772425 and 41930102)


References

Anders, N.S., Seijmonsbergen, A.C., Bouten, W., 2011. Segmentation optimization and stratified object-based analysis for semi-automated geomorphological mapping. Remote Sens. Environ. 115 (12), 2976–2985.
Argialas, D., 1995. Towards structured-knowledge models for landform representation. Zeitschrift für Geomorphologie N.F. Supplementband 101, 85–108.
Benz, U.C., Hofmann, P., Willhauck, G., Lingenfelder, I., Heynen, M., 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS J. Photogram. Remote Sens. 58 (3), 239–258.
Bu, S., Cheng, S., Liu, Z., Han, J., 2014. Multimodal feature fusion for 3D shape recognition and retrieval. IEEE MultiMedia 21 (4), 38–46.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848.
Cheng, G., Zhou, P., Han, J., 2016. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 54 (12), 7405–7415.
Cheng, G., Li, Z., Han, J., Yao, X., Guo, L., 2018. Exploring hierarchical convolutional features for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 56 (11), 6712–6722.
Cheng, G., Han, J., Zhou, P., Xu, D., 2019. Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection. IEEE Trans. Image Process. 28 (1), 265–278.
Dai, J., Li, Y., He, K., Sun, J., 2016. R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387.
Dietterich, T.G., 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10 (7), 1895–1923.
Dikau, R., Brabb, E.E., Mark, R.M., 1991. Landform classification of New Mexico by computer. Tech. rep. US Dept. of the Interior, US Geological Survey.
Drăguţ, L., Eisank, C., 2012. Automated object-based classification of topography from SRTM data. Geomorphology 141–142, 21–33.
Duro, D.C., Franklin, S.E., Dubé, M.G., 2012. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sens. Environ. 118, 259–272.
Farabet, C., Couprie, C., Najman, L., LeCun, Y., 2013. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35 (8), 1915–1929.
Frohn, R.C., Autrey, B.C., Lane, C.R., Reif, M., 2011. Segmentation and object-oriented classification of wetlands in a karst Florida landscape using multi-season Landsat-7 ETM+ imagery. Int. J. Remote Sens. 32 (5), 1471–1489.
Hammond, E.H., 2005. Analysis of properties in land form geography: an application to broad-scale land form mapping. Ann. Assoc. Am. Geogr. 54, 11–19.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hengl, T., Rossiter, D.G., 2003. Supervised landform classification to enhance and replace photo-interpretation in semi-detailed soil survey. Soil Sci. Soc. Am. J. 67 (6), 1810–1822.
Horn, B.K., 1981. Hill shading and the reflectance map. Proc. IEEE 69 (1), 14–47.
Hutchinson, M.F., Gallant, J.C., 1999. Representation of terrain. Geographical Information Systems: Principles and Technical Issues, vol. 2, pp. 105–124.
Iwahashi, J., Pike, R.J., 2007. Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature. Geomorphology 86 (3), 409–440.
Jasiewicz, J., Stepinski, T.F., 2013. Geomorphons - a pattern recognition approach to classification and mapping of landforms. Geomorphology 182, 147–156.
Jenson, S.K., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogram. Eng. Remote Sens. 54 (11), 1593–1600.
Kai, L., Guoan, T., Sheng, J., 2013. Research on the classification of terrain texture from DEMs based on BP neural network. Geomorphometry 2013, 1–4.
Kai, L., Guo'an, T., Xiaoli, H., Sheng, J., 2016. Research on the difference between textures derived from DEM and remote-sensing image for topographic analysis. J. Geo-inform. Sci. 18 (3), 386–395 (in Chinese).
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Li, F., Tang, G., Wang, C., Cui, L., Zhu, R., 2016. Slope spectrum variation in a simulated loess watershed. Front. Earth Sci. 10 (2), 328–339.
Li, J., Cheng, W., Zhou, C., 2009. Geomorphologic atlas of the People's Republic of China. Tech. rep. Science Press, Beijing (in Chinese).
Li, K., Cheng, G., Bu, S., You, X., 2018. Rotation-insensitive and context-augmented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 56 (4), 2337–2348.
Li, K., Zou, C., Bu, S., Liang, Y., Zhang, J., Gong, M., 2018. Multi-modal feature fusion for geographic image annotation. Pattern Recogn. 73, 1–14.
MacMillan, R., Shary, P., 2009. Landforms and landform elements in geomorphometry. Developments Soil Sci. 33, 227–254.
Marmanis, D., Fathalrahman, A., Datcu, M., Esch, T., Stilla, U., 2015. Deep neural networks for above-ground detection in very high spatial resolution digital elevation models 2 (3), 103.
Moore, I.D., Grayson, R., Ladson, A., 1991. Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrol. Process. 5 (1), 3–30.
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y., 2011. Multimodal deep learning. In: International Conference on Machine Learning, pp. 689–696.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A., 2017. Automatic differentiation in PyTorch. In: Advances in Neural Information Processing Systems, pp. 1–4.
Pipaud, I., Lehmkuhl, F., 2017. Object-based delineation and classification of alluvial fans by application of mean-shift segmentation and support vector machines. Geomorphology 293 (Part A), 178–200.
Prima, O.D.A., Echigo, A., Yokoyama, R., Yoshida, T., 2006. Supervised landform classification of northeast Honshu from DEM-derived thematic maps. Geomorphology 78 (3), 373–386.
Smith, M.J., 2011. Digital mapping: visualisation, interpretation and quantification of landforms. Geomorphol. Mapp. 15, 225–251.
Srivastava, N., Salakhutdinov, R.R., 2012. Multimodal learning with deep Boltzmann machines. In: Advances in Neural Information Processing Systems, pp. 2222–2230.
Zhang, J., Zheng, Y., Qi, D., 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In: AAAI, pp. 1655–1661.
Zhao, W.-F., Xiong, L.-Y., Ding, H., Tang, G.-A., 2017. Automatic recognition of loess landforms using random forest method. J. Mount. Sci. 14 (5), 885–897.
Zhao, H., Fang, X., Ding, H., Strobl, J., Xiong, L., Na, J., Tang, G., 2017. Extraction of terraces on the Loess Plateau from high-resolution DEMs and imagery utilizing object-based image analysis. ISPRS Int. J. Geo-Inform. 6, 157.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H., 2015. Conditional random fields as recurrent neural networks. In: International Conference on Computer Vision, pp. 1529–1537.
Zhou, C., Cheng, W., Qian, J., Li, B., Zhang, B., 2011. Structure and contents of layered classification system of digital geomorphology for China. J. Geog. Sci. 21, 771–790.

