A Multiclass TrAdaBoost Transfer Learning Algorithm for the Classification of Mobile Lidar Data

ABSTRACT:
6
7 A major challenge in the application of state-of-the-art deep learning methods to the classification of
8 mobile lidar data is the lack of sufficient training samples for different object categories. The transfer
9 learning technique based on pre-trained networks, which is widely used in deep learning for image
10 classification, is not directly applicable to point clouds, because pre-trained networks trained by a large
11 number of samples from multiple sources are not available. To solve this problem, we design a
12 framework incorporating a state-of-the-art deep learning network, i.e. VoxNet, and propose an extended
13 Multiclass TrAdaBoost algorithm, which can be trained with complementary training samples from
14 other source datasets to improve the classification accuracy in the target domain. In this framework, we
15 first train the VoxNet model with the combined dataset and extract the feature vectors from the fully
16 connected layer, and then use these to train the Multiclass TrAdaBoost. Experimental results show that
17 the proposed method achieves both improvement in the overall accuracy and a more balanced
18 performance in each category.
19
KEYWORDS: VoxNet; TrAdaBoost; Multiclass Classification; Point Cloud; 3DCNN; Deep Learning; Transfer Learning
22
23
1. INTRODUCTION

25 Mobile lidar data have been widely used in 3D mapping, city modeling and road inventory surveying.
26 While visual interpretation of objects in point clouds by a human expert is relatively straightforward, it
27 is a time-consuming and labour-intensive task. Conversely, automatic semantic analysis of mobile lidar
28 data in real time, or at a large scale, is a significant challenge. To realize automatic semantic
29 segmentation of point clouds, approaches based on hand-designed features, including local features
30 (Bayramoglu and Alatan, 2010; Guo et al., 2016; Krig, 2014; Lian et al., 2010; Lo and Siebert, 2009;
31 Taati and Greenspan, 2011; Tombari et al., 2010; Wu et al., 2010), and global features (Bisheng Yang
32 et al., 2016; Khoshelham, 2007; Puttonen et al., 2011; Yang et al., 2015) have been proposed. The
33 performance of these methods in object recognition and object classification is highly dependent on the
34 quality of the point clouds. The application of these feature-based methods is limited to indoor and
35 urban environments where the lidar scans have a high resolution and are reasonably complete (Cabo et
36 al., 2014; Fukano and Masuda, 2015; Jing and Suya, 2015; Khoshelham, 2007; Lalonde et al., 2005;
37 Lam et al., 2010; Lehtomäki et al., 2010; Li and Elberink, 2013; Yokoyama et al., 2011; Yokoyama et
38 al., 2013). With mobile lidar, however, the extraction of high-level semantic information is more
39 challenging. Occlusions and shape variance caused by the scanning angle make instances of the same
40 object category exhibit different appearances, thus increasing the complexity of the recognition task.
41 Compared with indoor lidar data, the disturbance and noise in mobile lidar data make the direct
42 extraction of local point features a challenge. Another disadvantage of traditional feature-based methods
43 is that these features are not optimal and vary from one dataset to another.
44
45 To overcome the disadvantages of the traditional feature-based approaches, various deep learning
46 methods have been proposed (Maturana and Scherer, 2015b; Qi et al., 2017a; Qi et al., 2017b; Zhou
47 and Tuzel, 2017). Compared with the traditional supervised machine learning methods, deep
48 convolutional neural networks (CNNs) can learn high-level representations through compositions of
49 low-level point information from large numbers of training samples. However, the generation of
50 training samples by manual semantic labelling of a large variety of object categories in mobile lidar is
51 difficult and time-consuming. In order to solve this problem, Piewak et al. (2018) proposed a framework
52 for boosting LiDAR-based semantic labeling by cross-modal training data generation. In this method,
53 the image data is required as a reference for the generation of training data in point cloud format.
54 However, this approach would still require manual verification of the training samples to ensure their
55 correctness.
56
57 A more feasible solution is to take advantage of the previously labelled training samples from other
58 available datasets. Such samples can be incorporated as complementary training data by using transfer
59 learning methods. Deep transfer learning methods were first introduced in image classification tasks
60 where a pre-trained model is applied in a new, slightly different domain (Long et al., 2015; Sun and
61 Saenko, 2016). Compared to the application of transfer learning methods in image classification, the
62 introduction of these methods to point cloud classification is limited by the fact that no generic input
63 feature is available in deep networks for point clouds collected in different formats or from different
64 environments. Another bottleneck in the traditional deep transfer learning method is the requirement
65 for abundant training samples with similar shape morphologies in each category. Among the different
66 existing approaches, instance-based deep transfer learning methods provide the possibility to take
67 advantage of the source data in mobile lidar classification by weight-adjusting methods, such as
68 boosting algorithms when the source data and target data share similar distributions. The underlying
69 assumption of instance-based transfer learning is that part of the instances in the source domain can be
70 utilized by the target domain with appropriate weights. TrAdaBoost is an instance-based transfer

71 learning algorithm which was first proposed by Dai et al. (2007) for two-class classification with an
72 additional dataset. It has been demonstrated that TrAdaBoost can significantly improve classification
73 performance on the target domain when supplemented with sufficient samples of similar distribution.
74
75 In this paper, we propose an instance-based transfer learning method by extending TrAdaBoost into a
76 multiclass classifier and adapting it for point cloud data. We then evaluate its performance for the
77 transfer learning and classification of mobile lidar data. We adopt VoxNet as the basic feature extraction
78 approach as it has been demonstrated to perform successfully on lidar data, computer graphics models
79 and depth (RGB-D) data with different resolutions (Maturana and Scherer, 2015b). To evaluate the
80 classification performance of the proposed method trained with a complementary dataset, we carry out
81 several comparison experiments. We compare the performance of VoxNet with and without the
82 complementary dataset. Then, we apply the proposed Multiclass TrAdaBoost algorithm to the feature
83 vector extracted from the fully connected layer of the VoxNet trained with the combined dataset and
84 compare its performance to VoxNet and AdaBoost trained with and without the complementary dataset.
85
2. RELATED WORK

2.1 Deep Learning-based Object Recognition of Mobile Lidar Data


88 Compared to the outstanding performance of deep learning methods in 2D image classification, their
89 application to 3D mobile lidar has presented several challenges. Firstly, the irregular distribution of the
90 points makes their organization for traditional CNN filters more difficult. Secondly, the permutation
91 invariance and point number differences of identical shapes prevent the direct extension of traditional
92 2D deep learning methods into the 3D domain. To avoid the influence of permutation variance of points
93 in the input, point-based 3D nets (Qi et al., 2017a; Qi et al., 2017b), local and global geometric 3D nets
94 (Komarichev et al., 2019; Wang et al., 2018; Zhang et al., 2019), tree-based 3D nets (Klokov and
95 Lempitsky, 2017; Wang et al., 2017), depth-image-based 2D nets (Roveri et al., 2018), multi-view based
96 2D nets (Restrepo and Mundy, 2012), and Voxel-based 3D nets (Le and Duan, 2018; Li et al., 2018;
97 Maturana and Scherer, 2015b) have been proposed. As a typical deep learning network directly
98 consuming the original points, PointNet (Qi et al., 2017a) adopted a T-net and multi-layer perceptron
99 networks to both account for geometric transformations and achieve permutation invariance. To
100 overcome a lack of knowledge about local structures in the metric space in PointNet, a hierarchical
101 neural network named PointNet++ (Qi et al., 2017b) using PointNet as the local feature extractor with
102 multi-kernel and multi-scale was designed. In PointNet++, centroid sampling by the farthest point
103 sampling (FPS) algorithm and neighbourhood ball are adopted for partitioning the point set in the metric
104 space. Similar to PointNet and PointNet++, EdgeConv (Wang et al., 2018) has extended the possibility
105 of CNN-based networks in point cloud classification by introducing a dynamic neighbourhood graph
106 updated after each layer. Other deep networks involve the reorganization of points in the tree-based
107 deep nets, or the mapping of points to 2D or 2.5D in multi-view-based and depth-image-based methods.
108 These methods, however, would increase the distribution dissimilarity of samples from multiple
109 sources. VoxNet on the other hand reorganizes the points in a 3D voxel space and shows more flexibility
110 and convenience when combining multi-source mobile lidar datasets together for the purpose of transfer
111 learning (Maturana and Scherer, 2015b).
112
2.2 Deep Transfer Learning
114 Deep transfer learning methods can be divided into three categories: network-based, instance-based
115 and adversarial-based deep transfer learning (Tan et al., 2018). In network-based deep transfer learning,
116 the network trained in the source domain is adapted to be part of the new network designed for the target
117 domain (Tan et al., 2018). Ranzato et al. (2007) realized unsupervised learning from large unlabelled
118 samples by using the encoder-decoder system to pre-train the lower layers. Huang and LeCun (2006)
119 and Oquab et al. (2014) demonstrated that CNNs trained with large-scale annotated image datasets can
120 be efficiently adopted as the mid-level image representation for other recognition tasks with a limited
121 dataset. The pre-trained basic model built in the general source domain could be fine-tuned with samples
122 in the target domain to generate the domain-specific model. Although network-based deep transfer
123 learning is widely used in image processing and natural language processing, researchers have found
124 that the transferability of features can be influenced by factors such as the layers from which the pre-
125 trained model is adopted, and the dissimilarity between the base task and target task (Yosinski et al.,
126 2014).
127
128 With 3D mobile lidar, it is impractical to directly use network-based deep transfer learning for
129 classification because of the limitation of training samples from the source domain (Haeusser et al.,
130 2017). Instance-based and adversarial-based deep transfer learning provide a possible solution for
131 transfer learning when the samples from the source domain are insufficient for a network-based transfer
132 learning. The key idea of adversarial-based deep transfer learning is to increase the similarity of the
133 feature distributions while keeping the features discriminative for classification. To reduce the distribution divergence
134 between the source data and target data, Xu et al. (2017) introduced a metric transfer learning
135 framework, using the Mahalanobis distance, to encode metric learning in transfer learning. To reinforce
136 associations between source and target data in embedding space, Haeusser et al. (2017) proposed
137 associative domain adaptation with neural networks in deep transfer learning. The domain-adversarial
138 neural network (DANN) was introduced in the work of Ajakan et al. (2014), Ganin and Lempitsky
139 (2014) and Ganin et al. (2016). Energy distances, maximum mean discrepancies (MMD) (Sejdinovic et
140 al., 2013) and a multiple kernel variant of MMD (MK-MMD) were proposed together with DANN for

141 the learning of transferable features in deep adaptation networks (Cai et al., 2018; Long et al., 2015).
142 In contrast to the complexity of adversarial deep transferable networks, instance-based transfer learning
143 is more straightforward as it involves adapting the classifier to samples with different distributions from
144 different sources of mobile lidar data. A practical approach to achieve this is boosting.
145
146
2.3 Boosting for Classification and Transfer Learning
148 Boosting is a general concept to improve the performance of learning algorithms (Freund et al., 1999;
149 Schapire et al., 1998). Freund and Schapire (1997) first introduced AdaBoost, in which the relative
150 weights of incorrectly classified samples are increased in each iteration. The goal of the learner in this
151 algorithm is to find a hypothesis with a small prediction error. The requirement is that the error rate in
152 each iteration is less than that of random guessing, which is ½ in the two-class case, and 1 − 1/k in the multiclass
153 case, where k is the number of classes. To meet this requirement, the traditional AdaBoost algorithm,
154 i.e. AdaBoost.M1, was extended to AdaBoost.M2 by replacing the error rate in each iteration by a
155 “pseudo loss”. The goal of the learner is thus changed to finding a hypothesis with minimum pseudo
156 loss. Inspired by the error-correcting output codes (ECOC) proposed by Dietterich and Bakiri (1994),
157 Schapire (1997) explored the possibility of combining AdaBoost and ECOC, and proposed
158 AdaBoost.OC to solve multiclass learning problems. Comparative experiments between AdaBoost.OC
159 and AdaBoost.M2 with different weak learners, however, did not show much improvement (Schapire,
160 1997). Moreover, similar improved algorithms, involving a simplification of the multiclass
161 classification into multiple one-to-all or one-to-one problems, did not yield much improvement in the
162 classification accuracy, but did increase the computation burden (Allwein et al., 2000; Friedman et al.,
163 2000; Schapire and Singer, 1999). In order to reduce the computation time, Hastie et al. (2009) directly
164 extended the AdaBoost algorithm by Stagewise Additive Modeling (SAMME) using a multiclass
165 exponential loss function.
166
167 Boosting-based transfer learning algorithms are instance-based transfer learning methods which utilize
168 labelled examples from the source domain to improve the classification performance in the target
169 domain via weighting-based knowledge transfer. The popular boosting transfer learning algorithm
170 TrAdaBoost was first proposed by Dai et al. (2007), where AdaBoost was adapted with SVM as the
171 basic learner for two-class classification. Li et al. (2017) extended the traditional TrAdaBoost method
172 for the classification of sandstone microscopic images by applying the one-to-all method. Liu et al.
173 (2018) extended TrAdaBoost by combining boosting and bagging for resampling and reweighting.
174 Compared to other transfer learning algorithms adapted for SVM classifiers (Wu and Dietterich, 2004;
175 Yang et al., 2007), boosting-based transfer learning is easy to implement by adjusting the weights of

176 samples. The main principle of TrAdaBoost is to utilize available source data that shares some
177 similarity with the target data but may differ in distribution or representation, so as to boost the
178 learning of the classifier for the target data. However, instance-based transfer learning can easily
179 lead to negative transfer if the source data and initial weights are not properly chosen. Another
180 shortcoming of TrAdaBoost is that it discards the first half of the estimators in order to preserve an error-
181 convergence property similar to that of AdaBoost. Additionally, once the weights of the samples from the source
182 domain decrease in the early stage, they cannot be recovered in later iterations. Moreover, the imbalance
183 of samples in each class could also influence the performance of TrAdaBoost to the extent that all
184 instances could be classified into one single category. To overcome these limitations, Al-Stouhi and
185 Reddy (2011) introduced a dynamic correction factor into TrAdaBoost, which significantly improved
186 the classification performance in two-class tasks. The application of boosting-based transfer learning to
187 point clouds is still limited for two reasons. On the one hand, the dynamic TrAdaBoost algorithm is
188 only applicable to two-class classification problems. On the other hand, multiclass classification
189 algorithms, such as SAMME, do not have transfer learning ability.
190
3. VOXNET-BASED MULTICLASS TRADABOOST CLASSIFICATION

192 In this section, we introduce a new framework for the classification of point clouds using VoxNet for
193 feature learning and boosting for transfer learning. We start with data pre-processing of the source and
194 target datasets, then address feature extraction from point segments by the VoxNet network, and finally
195 we cover transfer learning using a new Multiclass TrAdaBoost algorithm. The architecture of both the
196 VoxNet and the Multiclass TrAdaBoost algorithm are described.
197
3.1 Data Preprocessing
199 MLS data might include additional information such as scan line, scan angle, number of echoes and
200 intensity. However, since such information may not always be available, the proposed framework is
201 based on 3D point coordinates only. We follow a segment-based approach, where we first extract 3D
202 point segments representing potential objects from the raw point clouds, and then label these segments
203 by applying a classification method (He et al., 2017; Khoshelham et al., 2013). To achieve a complete
204 segmentation, we follow the pipeline proposed by Golovinskiy et al. (2009). The first step is to remove
205 the ground points identified as belonging to large horizontal planes. Then, a connected component
206 segmentation is applied to group the points into individual hypotheses to obtain the locations of potential
207 objects. Since our focus is on the extraction of traffic-related objects, we remove the ground, buildings
208 and tree canopies, which appear as large connected components. The performance of the connected
209 component segmentation is dependent upon two thresholds, namely the minimum number of points in
210 a segment and the maximum distance between the points. To refine overgrown segments connecting
211 the objects to the background, we apply the connected component segmentation twice with different
212 parameters. The resulting segments are then manually labelled for the training.
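
For illustration, a minimal Python sketch of this connected-component grouping is given below. It is not the authors' implementation: the point cloud is assumed to be an (N, 3) NumPy array with the ground and other large components already removed, the helper name connected_components is hypothetical, and the parameter values anticipate those reported in Section 4.1.

import numpy as np
from scipy.spatial import cKDTree

def connected_components(points, max_dist, min_points):
    """Label points whose mutual distance is below max_dist; groups smaller than
    min_points are marked with -2 and treated as unsegmented."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=int)   # -1 = not yet visited
    next_label = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        queue, members = [seed], [seed]
        labels[seed] = next_label
        while queue:
            for nb in tree.query_ball_point(points[queue.pop()], max_dist):
                if labels[nb] == -1:
                    labels[nb] = next_label
                    queue.append(nb)
                    members.append(nb)
        if len(members) < min_points:
            labels[members] = -2                   # too few points: discard the segment
        else:
            next_label += 1
    return labels

# Coarse pass, then a refined pass used to re-split overgrown segments
# (segment selection for the second pass is only indicated here):
# labels = connected_components(points, max_dist=0.2, min_points=500)
# refined = connected_components(points[labels == overgrown_id], max_dist=0.15, min_points=100)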
213
3.2 VoxNet
215 VoxNet was adapted from ShapeNet by Maturana and Scherer (2015a) for landing zone detection from
216 lidar data. Its architecture is provided in Figure 1. VoxNet consists of the input layer, two convolutional
217 layers, one pooling layer, one fully connected layer and the output layer. The input layer accepts a fixed-
218 size grid of 32×32×32 voxels. The value for each grid cell is updated depending upon the occupancy
219 mode: 1 for occupied, otherwise 0. The convolutional layers accept four-dimensional input in which
220 three of the dimensions are spatial, and the fourth contains the feature values. The convolutional layers
221 create new feature values by convolving the input with 32 filters in each layer. The pooling layer
222 downsamples the input after convolutional layers by a size of 2×2×2 units. The fully connected layer
223 consists of 128 output neurons as a learned linear combination of all the outputs from the pooling layer.
224 In the output layer, ReLU function and a softmax nonlinear model are applied to generate the
225 probabilistic output, where the number of outputs corresponds to the number of class labels K.
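
A minimal sketch of how a point segment could be mapped to this binary occupancy grid is shown below. The function name is hypothetical, the centring strategy is an assumption, and the 0.2 m resolution anticipates the value reported in Section 4.1.

import numpy as np

def voxelize(points, grid=(32, 32, 32), resolution=0.2):
    """Map an (N, 3) segment to a binary occupancy grid: 1 = occupied, 0 = empty."""
    occupancy = np.zeros(grid, dtype=np.float32)
    centered = points - points.mean(axis=0)                 # centre the segment in the grid
    idx = np.floor(centered / resolution).astype(int) + np.array(grid) // 2
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    occupancy[tuple(idx[inside].T)] = 1.0                   # hit grid: any point marks the cell
    return occupancy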

Figure 1 The architecture of VoxNet.
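
For reference, the following PyTorch sketch mirrors the architecture described above. It is not the original implementation (VoxNet was released with Theano/Lasagne); the kernel sizes and strides follow the published VoxNet configuration and should be treated as assumptions here. The 128-dimensional output of the fully connected layer is the feature vector used later by the boosting classifiers.

import torch
import torch.nn as nn

class VoxNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=5, stride=2),   # convolutional layer 1: 32 filters
            nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, stride=1),  # convolutional layer 2: 32 filters
            nn.ReLU(),
            nn.MaxPool3d(2),                             # 2x2x2 pooling
        )
        self.fc1 = nn.Linear(32 * 6 * 6 * 6, 128)        # fully connected feature layer (128)
        self.fc2 = nn.Linear(128, num_classes)           # output layer, FC(K); softmax via the loss

    def forward(self, x):                                # x: (batch, 1, 32, 32, 32)
        x = self.features(x).flatten(1)
        feat = torch.relu(self.fc1(x))                   # 128-D feature vector
        return self.fc2(feat), feat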

3.3 Multiclass TrAdaBoost


229 The proposed new Multiclass TrAdaBoost algorithm has low computational complexity as compared
230 to the one-to-all approach. The key idea of TrAdaBoost is to update the sample weights of the target
231 domain and the source domain separately. The assumption of TrAdaBoost is that the wrongly predicted
232 instances from the source domain are those which are most dissimilar to the distribution of the target
233 data, while the correctly predicted instances share more similarity to the target data. TrAdaBoost keeps
234 the same weight updating mechanism of AdaBoost for the target data, but for the training data from the
235 source domain it assigns smaller weights to the wrongly predicted instances by applying a fixed
236 multiplier:
$$
w_i^{t+1} =
\begin{cases}
w_i^{t} \cdot \beta^{\,\mathrm{I}(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i))}, & 1 \le i \le m \\
w_i^{t} \cdot \beta_t^{\,\mathrm{I}(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i))}, & m+1 \le i \le m+n
\end{cases}
\qquad (1)
$$

where I is the indicator function defined as:

$$
\mathrm{I}\big(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i)\big) =
\begin{cases}
1, & \text{if } h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i) \\
0, & \text{if } h_t(\mathbf{x}_i) = y(\mathbf{x}_i)
\end{cases}
\qquad (2)
$$

Here m is the number of source samples, n is the number of target samples, $w_i^t$ is the weight for sample i at iteration t, $\mathbf{x}_i$ is the feature vector for sample i extracted from the trained VoxNet model, $h_t(\mathbf{x}_i)$ is the predicted label and $y(\mathbf{x}_i)$ is the ground truth label. The multiplier for source samples is defined as $\beta = 1/(1 + \sqrt{2 \ln m / N})$, where N is the maximum number of iterations. For target samples, the multiplier is defined as $\beta_t = (1 - \varepsilon_t)/\varepsilon_t$, where $\varepsilon_t$ is the overall error of $h_t$ on all target samples at iteration t. Since these multipliers are defined for binary classification, where the maximum overall error is 0.5, they effectively increase the weight of wrongly predicted target samples and decrease the weight of wrongly predicted source samples, while keeping the weight of correctly predicted source and target samples unchanged. To extend this weight updating mechanism to multiclass classification, we adopt the forward stagewise additive modelling of SAMME proposed by Hastie et al. (2009), which uses an exponential loss function:

$$
w_i^{t+1} =
\begin{cases}
w_i^{t} \cdot e^{-\frac{K-1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) = y(\mathbf{x}_i) \\
w_i^{t} \cdot e^{\frac{1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i)
\end{cases}
\qquad (3)
$$

Here, $\alpha_t$ is the weight updating parameter based on the multiclass loss, defined as $\alpha_t = \log\big((1 - \varepsilon_t)/\varepsilon_t\big) + \log(K - 1)$, with K the number of classes and $\varepsilon_t$ the overall error of $h_t$ on all samples at iteration t. For transfer learning, Equation (3) results in a rapid weight drop for correctly predicted source samples. To avoid this so-called weight drift effect, we adopt the adaptive boosting method for transfer learning proposed by Al-Stouhi and Reddy (2011) to keep the weight ratio of the whole source data to the whole target data constant. This is achieved by applying a correction factor $C_t$ extended for K classes:

$$
C_t = K(1 - \varepsilon_t) \cdot e^{-\frac{K-1}{K}\alpha_t}
\qquad (4)
$$

where $\varepsilon_t$ is the overall error of $h_t$ on all target samples at iteration t. By combining the weight multipliers for target samples from Equation (3) and those for source samples from Equation (1), corrected by the correction factor in Equation (4), we obtain the complete weight updating mechanism as follows:

$$
w_i^{t+1} =
\begin{cases}
w_i^{t} \cdot K(1 - \varepsilon_t) \cdot e^{-\frac{K-1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) = y(\mathbf{x}_i),\; 1 \le i \le m \\
w_i^{t} \cdot K(1 - \varepsilon_t) \cdot e^{\alpha} \cdot e^{-\frac{K-1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i),\; 1 \le i \le m \\
w_i^{t} \cdot e^{-\frac{K-1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) = y(\mathbf{x}_i),\; m+1 \le i \le m+n \\
w_i^{t} \cdot e^{\frac{1}{K}\alpha_t}, & \text{if } h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i),\; m+1 \le i \le m+n
\end{cases}
\qquad (5)
$$

where $\alpha = \log\big(1/(1 + \sqrt{2 \ln m / N})\big)$ and $e^{\alpha} = \beta$. Dividing the four cases in (5) by $e^{-\frac{K-1}{K}\alpha_t}$ and using the indicator function I, the equations can be further simplified and expressed in a more compact form as follows:

$$
w_i^{t+1} =
\begin{cases}
w_i^{t} \cdot K(1 - \varepsilon_t) \cdot e^{\alpha \cdot \mathrm{I}(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i))}, & 1 \le i \le m \\
w_i^{t} \cdot e^{\alpha_t \cdot \mathrm{I}(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i))}, & m+1 \le i \le m+n
\end{cases}
\qquad (6)
$$
272 This weight updating mechanism keeps the weight of correctly predicted target samples unchanged but
273 increases the weight of incorrectly predicted target samples, in line with the AdaBoost principle of
274 focusing more on difficult samples during the training. For the source data, however, the weight of
275 incorrectly predicted samples is significantly decreased, as these are identified as having a distribution
276 dissimilar to that of target samples, and the weight of correctly predicted source samples is slightly
277 decreased such that they make a smaller contribution to the training as compared to the target samples.
278 By incorporating the above weight updating mechanism in the classification we obtain the Multiclass
279 TrAdaBoost algorithm as follows:
280
Algorithm 1: Multiclass TrAdaBoost

Input: labelled source dataset $T_{src}$ with m samples, labelled target dataset $T_{tar}$ with n samples, unlabelled test dataset S, the maximum number of iterations N, and a base classifier Learner.

Initialize the weight vector $\mathbf{w}^1 = (w_1^1, \ldots, w_{n+m}^1)$. The user can specify the initial values $\mathbf{w}^1$ according to the ratio of samples in the two datasets.

For t = 1, ..., N:
  1. Set $\mathbf{p}^t = \mathbf{w}^t / \sum_{i=1}^{n+m} w_i^t$.
  2. Call Learner with the combined training set $T_c = T_{src} \cup T_{tar}$ weighted by $\mathbf{p}^t$ and the unlabelled test set S to obtain a hypothesis $h_t: X \rightarrow Y$.
  3. Compute the error of $h_t$ on $T_{tar}$: $\varepsilon_t = \dfrac{\sum_{i=m+1}^{m+n} w_i^t \cdot \mathrm{I}(h_t(\mathbf{x}_i) \neq y(\mathbf{x}_i))}{\sum_{i=m+1}^{m+n} w_i^t}$.
  4. Set $\alpha_t = \log\big((1 - \varepsilon_t)/\varepsilon_t\big) + \log(K - 1)$ and $\alpha = \log\big(1/(1 + \sqrt{2 \ln m / N})\big)$.
  5. Update the weight vector according to Equation (5).
End

Output the hypothesis:

$$H(\mathbf{x}) = \arg\max_k \sum_{t=1}^{N} \alpha_t \cdot \mathrm{I}(h_t(\mathbf{x}) = k)$$
281
282 In Algorithm 1, the base classifier Learner can be any simple multiclass classifier. In our experiments, we
283 adopted decision trees as the base classifier, as sketched below.
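
A compact Python sketch of Algorithm 1 is given below, using a depth-limited decision tree from scikit-learn as the base learner. It is an illustrative implementation of Equation (6) with the 2(1 − ε_t) correction discussed later in this section, not the authors' code; the function names are hypothetical, the default hyper-parameters mirror those reported in Section 4.4, and the error is clipped only for numerical stability.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def multiclass_tradaboost(X_src, y_src, X_tar, y_tar, n_iter=400, correction=2.0):
    """X_*: feature vectors extracted from the trained VoxNet; y_*: class labels."""
    X = np.vstack([X_src, X_tar])
    y = np.concatenate([y_src, y_tar])
    m, n = len(y_src), len(y_tar)
    K = len(np.unique(y))
    w = np.ones(m + n) / (m + n)          # initial weights; the ratio can be user-specified
    alpha = np.log(1.0 / (1.0 + np.sqrt(2.0 * np.log(m) / n_iter)))  # fixed source multiplier, log(beta)
    learners, alphas = [], []
    for _ in range(n_iter):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=p)
        wrong = (h.predict(X) != y)
        eps = np.clip(np.dot(w[m:], wrong[m:]) / w[m:].sum(), 1e-10, 1 - 1e-10)  # error on target samples
        alpha_t = np.log((1 - eps) / eps) + np.log(K - 1)
        # Equation (6): target weights grow on mistakes; source weights shrink on mistakes
        # and the whole source pool is rescaled by the correction factor 2*(1 - eps).
        w[m:] *= np.exp(alpha_t * wrong[m:])
        w[:m] *= correction * (1 - eps) * np.exp(alpha * wrong[:m])
        learners.append(h)
        alphas.append(alpha_t)
    return learners, alphas, np.unique(y)

def tradaboost_predict(learners, alphas, classes, X):
    """H(x) = argmax_k sum_t alpha_t * I(h_t(x) = k)."""
    votes = np.zeros((len(X), len(classes)))
    for h, a in zip(learners, alphas):
        votes[np.arange(len(X)), np.searchsorted(classes, h.predict(X))] += a
    return classes[votes.argmax(axis=1)]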
284
285 In theory, the weight ratio of individual useful samples in the source domain to the individual correctly
286 classified samples in the target domain can vary dramatically. Although the relative weight ratio of the
287 whole target dataset and the whole source dataset is kept constant, the weight of positive instances in
288 the source domain adjusts K times faster than that of the correctly classified samples in the target domain
289 in each iteration. After several iterations, the wrongly predicted target samples will have the greatest
290 weights, followed by the correctly predicted source samples. The correctly predicted target samples will
291 have smaller weights, and the wrongly predicted source samples will have the smallest weights.
292 Consequently, the weights of correctly predicted source samples can become significantly larger than
293 those of correctly predicted target samples. In principle, the target data should always have larger
294 weights than the source data. In practice, however, to avoid this so-called weight imbalance problem,
295 we found empirically that the correction factor K(1 − ε_t) in Equation (5) can be set to 2(1 − ε_t) in
296 order to slow the weight increase of the correctly predicted source samples. With this setting, we can
297 avoid negative transfer and slow down the weight drift towards the target data at the same time.
298
3.4 The Transfer Learning Framework
300 The main challenge in the recognition of objects in mobile lidar data is the limitation of training
301 samples. In classification with transfer learning, a complementary dataset is utilized to boost the
302 performance of the classification. In this case, the dataset available in the original task is named the
303 target dataset, and the complementary dataset related to the original data is named the source dataset.
304 In order to benefit from the available datasets collected with different sensors in different environments,
305 and at the same time minimize the negative influence of distribution dissimilarity, we design a
306 framework to incorporate the source dataset into the training of the classification model, as illustrated
307 in Figure 2. Dataset B is the dataset in the source domain. Dataset C in the target domain is split into a
308 training dataset C1 and a testing dataset C2. The two main steps of transfer learning are VoxNet and
309 Multiclass TrAdaBoost. In the training of the VoxNet network, samples in the target and source
310 domains are voxelized into grids of 32×32×32 cells to achieve equal input size. The Multiclass
311 TrAdaBoost is then trained by extracted feature vectors of dataset B and C1 from the trained VoxNet

312 model, where the algorithm adjusts the weights of extracted feature vectors from the source and target
313 domains.

318 Figure 2. The proposed transfer learning framework. In the training phase (a), VoxNet is trained using
319 samples from both the source and target domains (B+C1) and the extracted feature vectors are used to
320 train the Multiclass TrAdaBoost algorithm. In the classification phase (b), the trained classifier is
321 evaluated using samples from the target domain (C2).
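
The two phases in Figure 2 can be summarised in code roughly as follows. This sketch reuses the hypothetical VoxNet, multiclass_tradaboost and tradaboost_predict sketches from Sections 3.2 and 3.3; train_voxnet (the training loop on the combined dataset B + C1) is assumed rather than shown.

import numpy as np
import torch

def extract_features(model, voxel_grids):
    """Pass voxelized segments through the trained VoxNet and keep the 128-D FC features."""
    model.eval()
    with torch.no_grad():
        x = torch.from_numpy(voxel_grids).unsqueeze(1).float()   # (N, 1, 32, 32, 32)
        _, feats = model(x)
    return feats.numpy()

# Training phase:
#   model = train_voxnet(np.concatenate([vox_B, vox_C1]), np.concatenate([y_B, y_C1]))
#   learners, alphas, classes = multiclass_tradaboost(
#       extract_features(model, vox_B), y_B, extract_features(model, vox_C1), y_C1)
# Classification phase (held-out target split C2):
#   y_pred = tradaboost_predict(learners, alphas, classes, extract_features(model, vox_C2))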
322
4. EXPERIMENTS AND ANALYSIS

324 We use two mobile lidar datasets to evaluate the performance of the proposed transfer learning
325 framework, a benchmark dataset, collected in Sydney, Australia, as the target dataset, and a
326 complementary dataset collected in Enschede, the Netherlands, as the source dataset. We compare the
327 performance of the proposed framework with and without the complementary dataset. To evaluate the
328 classification accuracy, we compute precision, recall, F1-score, macro-average F1-score, and weighted
329 macro-average F1-score. While precision, recall, and F1-score are common measures for the evaluation
330 of classification performance per category, the latter measures are recommended in the scikit-learn
331 library (Pedregosa et al., 2011) for the evaluation of overall classification performance, especially for
332 unbalanced datasets. The F1-score is defined for a single class as follows:
$$F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} \qquad (7)$$

334 In the multiclass case, the macro-average F1-score is defined as the arithmetic mean of F1 scores for
335 the individual classes, ignoring the data imbalance:
$$\mathrm{Macro\_}F1 = \frac{F1_{classA} + F1_{classB} + \dots + F1_{classN}}{K} \qquad (8)$$

where K is the number of classes. The weighted macro-average F1-score measures the overall classification performance, whereby the contribution of each class to the average is weighted by the relative number of samples available for it. It is calculated as follows:

$$\mathrm{Weighted\_}F1 = \frac{n_{classA} \cdot F1_{classA} + n_{classB} \cdot F1_{classB} + \dots + n_{classN} \cdot F1_{classN}}{N} \qquad (9)$$

where $n_{classA}$, $n_{classB}$, ..., $n_{classN}$ denote the numbers of samples in class A, class B, ..., class N respectively, and N is the total number of samples.
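
The reported measures correspond to scikit-learn's per-class and averaged F1 scores (Pedregosa et al., 2011); a short illustrative sketch of how they could be computed:

from sklearn.metrics import f1_score, precision_recall_fscore_support

def evaluation_report(y_true, y_pred):
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred)
    return {
        "per_class_precision": precision,
        "per_class_recall": recall,
        "per_class_f1": f1,
        "macro_f1": f1_score(y_true, y_pred, average="macro"),        # Equation (8)
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),  # Equation (9)
    }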
343
4.1 Datasets and Preprocessing
345 The Sydney dataset contains a variety of common urban road objects scanned with a Velodyne HDL-
346 64E Lidar in the city’s central business district (De Deuge et al., 2013). The dataset was already
347 segmented, and the segments were manually labelled by the data providers. The data was collected in
348 non-ideal sensing conditions with a large variability in viewpoint and occlusions, resulting in many
349 duplicate objects. To avoid the influence of duplicate objects, we use the samples preselected by the
350 data providers (De Deuge et al., 2013). These samples were randomly divided into 4 folds, F0, F1, F2
351 and F3, and contain the following 14 classes: 4wd (four-wheel drive vehicle), wall, bus, car, person,
352 pillar, pole, traffic lights, traffic sign, tree, truck, trunk, ute (utility vehicle) and van. The Enschede
353 dataset was collected with an Optech LYNX Mobile Mapper system by the German company TopScan
354 in 2008, and covers several urban roads containing object categories similar to the Sydney dataset. To
355 obtain the training samples from the raw point clouds collected in Enschede, we applied the
356 segmentation and labelling steps described in Section 3. In the first segmentation step, the maximum
357 distance between the points and the minimum number of points were set to 0.2 m and 500, respectively.
358 In the refined segmentation step, these parameters were set to 0.15 m and 100 to reduce segmentation
359 errors. The segmentation result for several strips of the Enschede dataset is shown in Figure 3.
362 Figure 3 The segmentation result for the Enschede dataset.

363 In the experiments, we use cross-validation to evaluate the performance of the proposed method. In four
364 iterations, we select one fold from the Sydney dataset as the test set and the other three folds as the
365 training set from the target domain, plus the samples from the Enschede dataset as the complementary
366 training set from the source domain. The evaluation result is then the average of the four iterations, each
367 using a different fold as the test set.
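
A sketch of this four-fold protocol, with a hypothetical train_and_eval callable standing in for the full training and scoring pipeline of Section 3:

import numpy as np

def cross_validate(sydney_folds, enschede, train_and_eval):
    """sydney_folds: list of four (X, y) tuples; enschede: (X, y); returns mean scores."""
    scores = []
    for i, test_fold in enumerate(sydney_folds):
        train_folds = [f for j, f in enumerate(sydney_folds) if j != i]
        X_tar = np.vstack([f[0] for f in train_folds])
        y_tar = np.concatenate([f[1] for f in train_folds])
        scores.append(train_and_eval(source=enschede, target=(X_tar, y_tar), test=test_fold))
    return np.mean(scores, axis=0)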
368
369 Table 1 shows the number of samples in each category and each fold. F4 consists of the samples
370 collected in Enschede. This is used as the complementary dataset. Note that from Enschede only those
371 categories were selected which had similar instances to those in Sydney. It can be seen in Table 1 that
372 the number of test samples for truck, ute, four-wheel vehicle, pillar, pole, trunk and building is quite
373 low, which can result in large variance in the classification results for these classes. However, the aim
374 of the experiments is to observe possible improvement in the four classes with complementary samples
375 (pedestrian, traffic sign, traffic light, and tree), for which we have a larger number of test samples across
376 the four folds. Example instances from several categories in the Sydney dataset are shown in Figure 4.
377 Example samples of similar categories in Enschede are shown in Figure 5.
378

Figure 4 Example samples from the Sydney dataset (truck, car, 4wd, building, bus, tree, pedestrian, pole, traffic light, traffic sign, pillar).

381 For the training, we augmented the segments by creating 18 rotated copies around the z axis at equal
382 intervals. We set the voxel grid size to p = (32, 32, 32) and the resolution to 0.2 m, which was empirically
383 found appropriate for both datasets. The hit grid model is applied to determine the occupancy of grid
384 cells, where occupied cells receive a value of 1, otherwise 0. For the optimization, we selected
385 the Adam optimizer, with the learning rate set to 0.0008, the batch size to 32, the number of batches in one
386 epoch to 5000, and the number of epochs to 8.
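
The rotation augmentation can be sketched as follows; the function name is illustrative, and the optimizer line assumes the hypothetical PyTorch sketch of Section 3.2.

import numpy as np

def augment_rotations(points, n_copies=18):
    """Yield n_copies versions of an (N, 3) segment rotated around the z axis at equal intervals."""
    for k in range(n_copies):
        theta = 2.0 * np.pi * k / n_copies
        c, s = np.cos(theta), np.sin(theta)
        rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        yield points @ rot_z.T          # each copy is then voxelized as in Section 3.2

# Optimizer settings from the text:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.0008)   # batch size 32, 8 epochs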
387

Figure 5 Example samples from the Enschede dataset (tree, traffic light, pedestrian, traffic sign).

390
391 Table 1 Distribution of samples collected in Sydney and Enschede

(Dataset 1, Sydney: folds F0–F3; Dataset 2, Enschede: F4)

Category        F0     F1     F2     F3     F4
4wd              5      6      4      6      -
building         5      5      5      5      -
bus              5      3      3      5      -
car             23     20     21     24      -
pedestrian      37     36     34     45     27
pillar           6      5      4      5      -
pole             6      5      4      6      -
traffic light   10     18      8     11     21
traffic sign    11     18     11     11     50
tree             8      8      8     10    186
truck            3      3      3      3      -
trunk           14     13     15     13      -
ute              4      4      4      4      -
van              9     11      8      7      -
Total          146    155    132    155    284
392
4.2 Baseline performance of VoxNet model trained with and without complementary data
394 In the first experiment, we trained the VoxNet model without TrAdaBoost. The VoxNet model was
395 then separately trained with the training samples from Sydney and with the combined dataset, and each
396 trained network was tested on the Sydney test dataset. The combined dataset comprised both the Sydney
397 and Enschede datasets. Precision and recall values for the VoxNet models trained with and without the
398 complementary dataset are provided in Figure 6. The F1-score for each category is shown in Figure 7.

Figure 6 The precision and recall values for VoxNet models trained with and without the complementary dataset.
399
400 From the results, it is evident that adding complementary samples from Enschede yields a slight
401 improvement in classes with complementary data. Specifically, the precision, recall and F1 values are
402 higher for traffic sign and tree when complementary samples are introduced. However, for other
403 categories, such as building, bus, pole and ute, the precision, recall and F1 values decrease. This

Figure 7 The F1-score per category for the VoxNet models trained with and without the complementary dataset.

Table 2 The Macro_F1 and Weighted_F1 scores for VoxNet trained with and without the complementary dataset

                                                Macro_F1    Weighted_F1
VoxNet trained by Sydney Dataset                  0.501        0.668
VoxNet trained by Sydney+Enschede Dataset         0.499        0.673

404 indicates that supplying VoxNet with the complementary dataset directly does not improve the overall
405 accuracy of the trained model. Table 2 shows the Macro_F1 and Weighted_F1 scores for VoxNet
406 trained with and without complementary samples from Enschede. These scores also reveal that the
407 complementary dataset does not improve the overall accuracy of the trained VoxNet model.
408

4.3 Classification performance of Multiclass TrAdaBoost


410 To evaluate the performance of Multiclass TrAdaBoost, we use the VoxNet model trained with the
411 combined dataset as feature extractor and then train the Multiclass TrAdaBoost with training samples
412 from both Sydney and Enschede. The precision and recall measures obtained by the Multiclass
413 TrAdaBoost as compared to the VoxNet model are shown in Figure 8. The F1-score per category is
414 shown in Figure 9, and the Macro_F1 and Weighted_F1 scores are listed in Table 3. It can be seen that
415 the Multiclass TrAdaBoost outperforms the VoxNet model in most categories and achieves higher
416 precision, recall and F1 scores. The Multiclass TrAdaBoost algorithm shows better tolerance to the
417 unbalanced dataset and performs well in minority categories too. The complementary training samples
418 of traffic light, traffic sign, and tree have a larger contribution to the transfer learning performance of
419 the Multiclass TrAdaBoost algorithm. The Macro_F1 and Weighted_F1 scores presented in Table 3
420 also show that the Multiclass TrAdaBoost achieves a higher overall accuracy as compared with VoxNet.
421

Figure 8 The precision and recall values for the Multiclass TrAdaBoost trained with the combined dataset compared with the VoxNet model trained with the Sydney dataset.
422

423

Figure 9 The F1-score per category for the Multiclass TrAdaBoost algorithm trained with the combined dataset compared with the VoxNet model trained with the Sydney dataset.

Table 3 The Macro_F1 and Weighted_F1 scores for the Multiclass TrAdaBoost algorithm trained with the combined dataset compared with the VoxNet model trained with the Sydney dataset

                                                            Macro_F1    Weighted_F1
VoxNet trained by Sydney Dataset                              0.501        0.668
Multiclass TrAdaBoost trained by Sydney+Enschede Dataset      0.640        0.742

426
4.4 AdaBoost vs Multiclass TrAdaBoost with the combined dataset
428 To examine the transfer learning ability of TrAdaBoost we compare its performance with the
429 conventional AdaBoost. Using features extracted by the VoxNet model trained with the combined
430 dataset, we train AdaBoost and Multiclass TrAdaBoost separately using samples from both Sydney and
431 Enschede. For both classifiers, the maximum depth of the boosting algorithms was set at 2, the number
432 of trees at 400, and the learning rate at 1. The precision and recall values obtained for AdaBoost and
433 Multiclass TrAdaBoost are shown in Figure 10. The F1-score per category is shown in Figure 11. The
434 Macro_F1 and Weighted_F1 scores are provided in Table 4.
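
The conventional AdaBoost baseline with these settings corresponds to scikit-learn's SAMME variant; a minimal sketch is given below (note that the base-learner argument is named base_estimator in older scikit-learn releases).

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

baseline = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),  # depth-2 trees as the base learner
    n_estimators=400,                               # number of trees
    learning_rate=1.0,
    algorithm="SAMME",                              # multiclass exponential loss (Hastie et al., 2009)
)
# baseline.fit(train_features, train_labels); predictions = baseline.predict(test_features)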
435
436
437 A comparison shows that the AdaBoost algorithm has limited transfer learning ability compared to the
438 Multiclass TrAdaBoost, which achieves higher accuracies in most of the categories in terms of precision,
439 recall and F1-score. In particular, the precision, recall and F1 values for tree, traffic light and traffic
440 sign are higher for TrAdaBoost. The Macro_F1 and Weighted_F1 values shown in Table 4 also indicate
441 the better overall performance of TrAdaBoost as compared to AdaBoost. This demonstrates that the
442 Multiclass TrAdaBoost can take advantage of complementary data and at the same time avoid the
443 negative influence of samples with dissimilar distributions.

444

Figure 10 The precision and recall values for the Multiclass TrAdaBoost algorithm compared with the conventional AdaBoost algorithm both trained with the combined dataset.
445

Figure 11 The F1-score per category for the Multiclass TrAdaBoost algorithm compared with the conventional AdaBoost algorithm both trained with the combined dataset.

Table 4 The Macro_F1 and Weighted_F1 scores for the Multiclass TrAdaBoost algorithm compared with the AdaBoost algorithm both trained with the combined dataset

                                                            Macro_F1    Weighted_F1
AdaBoost trained by Sydney+Enschede Dataset                   0.614        0.713
Multiclass TrAdaBoost trained by Sydney+Enschede Dataset      0.640        0.742
446
447 Considering that the AdaBoost performance could be influenced by the complementary data, we also
448 compared the performance of Multiclass TrAdaBoost to that of AdaBoost trained with the Sydney
449 dataset only. Figure 12 shows the overall performance of Multiclass TrAdaBoost compared to VoxNet
450 and AdaBoost trained with and without complementary samples. The Macro_F1 and Weighted_F1
451 scores show that considerable improvement is achieved by Multiclass TrAdaBoost over VoxNet and
452 the conventional AdaBoost algorithm trained with and without the complementary dataset. More
453 significant improvement can be expected when more source datasets and a larger number of
454 complementary samples are available.
455

Figure 12 The comparison of Macro_F1 and Weighted_F1 scores for the different trained algorithms.

459
460
5. CONCLUSIONS

462 In this paper, we proposed a transfer learning method for the classification of mobile lidar data, which
463 combined the VoxNet network with a new multiclass transfer learning algorithm named Multiclass
464 TrAdaBoost. To evaluate the performance of this framework, we implemented a series of comparison
465 experiments with two datasets, one designated as the target domain and the other as the source domain.
466 We evaluated the performance and transfer learning ability of our proposed algorithm through
467 comparisons with the original VoxNet and AdaBoost algorithms with and without complementary data.
468
469 The results of the comparisons show that our proposed framework outperforms the other models.
470 Specifically, the Multiclass TrAdaBoost achieves the highest overall accuracy with an unbalanced
471 dataset and it can effectively avoid negative transfer learning as compared to AdaBoost. Considering
472 that the instance-based transfer learning method is based on a weighting mechanism, the common
473 limitation for dynamic weight updating approaches is the adjustment of the weights of samples from
474 the source and target domain. Without a standard distance measure of distribution dissimilarity of the
475 source and target domain, it is difficult to decide both the initial weight and the weight updating factor.
476 Another limitation of the Multiclass TrAdaBoost algorithm is that transfer learning happens in the
477 classification layer only and not during feature learning. The limited transferability of high-level
478 features extracted from the trained VoxNet has been pointed out in previous works (Yosinski et al.,
479 2014). A potential approach to overcome this limitation is to use a distance measure such that the
480 network could be designed to extract features with minimum domain distance.
481
482 The transferability of the proposed Multiclass TrAdaBoost based framework can be improved in two
483 aspects: i) Our source dataset and the number of complementary samples were limited, and future
484 testing of the proposed framework will involve more datasets with a larger number of complementary
485 samples. ii) The proposed framework involves separate training steps for VoxNet and the Multiclass
486 TrAdaBoost classifier. Future work will thus focus on developing an end-to-end transfer learning
487 network to reduce the domain discrepancy at the feature-level.
488
REFERENCES

490 Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., 2014. Domain-adversarial neural networks. arXiv
491 preprint arXiv:1412.4446.

492 Al-Stouhi, S., Reddy, C.K., 2011. Adaptive boosting for transfer learning using dynamic updates, Joint European Conference
493 on Machine Learning and Knowledge Discovery in Databases. Springer, pp. 60-75.

494 Allwein, E.L., Schapire, R.E., Singer, Y., 2000. Reducing multiclass to binary: A unifying approach for margin classifiers.
495 Journal of machine learning research 1, 113-141.

496 Bayramoglu, N., Alatan, A.A., 2010. Shape index SIFT: Range image recognition using local features, Pattern Recognition
497 (ICPR), 2010 20th International Conference on. IEEE, pp. 352-355.

498 Bisheng Yang, Yuan Liu, Fuxun Liang, Dong, Z., 2016. Using Mobile Laser Scanning Data for Features extraction of High
499 Accuracy Driving Maps, ISPRS, Czech Republic.

500 Cabo, C., Ordoñez, C., García-Cortés, S., Martínez, J., 2014. An algorithm for automatic detection of pole-like street furniture
501 objects from Mobile Laser Scanner point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 87, 47-56.

502 Cai, G., Wang, Y., Zhou, M., He, L., 2018. Unsupervised Domain Adaptation with Adversarial Residual Transform Networks.
503 arXiv preprint arXiv:1804.09578.

504 Dai, W., Yang, Q., Xue, G.-R., Yu, Y., 2007. Boosting for transfer learning, Proceedings of the 24th international conference
505 on Machine learning. ACM, pp. 193-200.

506 De Deuge, M., Quadros, A., Hung, C., Douillard, B., 2013. Unsupervised feature learning for classification of outdoor 3d
507 scans, Australasian Conference on Robotics and Automation, p. 1.

508 Dietterich, T.G., Bakiri, G., 1994. Solving multiclass learning problems via error-correcting output codes. Journal of artificial
509 intelligence research 2, 263-286.

510 Freund, Y., Schapire, R., Abe, N., 1999. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence
511 14, 1612.

512 Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal
513 of computer and system sciences 55, 119-139.

514 Friedman, J., Hastie, T., Tibshirani, R., 2000. Additive logistic regression: a statistical view of boosting (with discussion and
515 a rejoinder by the authors). The annals of statistics 28, 337-407.

516 Fukano, K., Masuda, H., 2015. Detection and Classification of Pole-Like Objects from Mobile Mapping Data. ISPRS Annals
517 of Photogrammetry, Remote Sensing and Spatial Information Sciences 2, 57-64.

518 Ganin, Y., Lempitsky, V., 2014. Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495.

519 Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V., 2016. Domain-
520 adversarial training of neural networks. The Journal of Machine Learning Research 17, 2096-2030.

521 Golovinskiy, A., Kim, V.G., Funkhouser, T., 2009. Shape-based recognition of 3D point clouds in urban environments,
522 Computer Vision, 2009 IEEE 12th International Conference on. IEEE, pp. 2154-2161.

523 Guo, Y., Bennamoun, M., Sohel, F., Lu, M., Wan, J., Kwok, N.M., 2016. A comprehensive performance evaluation of 3D
524 local feature descriptors. International Journal of Computer Vision 116, 66-89.

525 Haeusser, P., Frerix, T., Mordvintsev, A., Cremers, D., 2017. Associative domain adaptation, Proceedings of the IEEE
526 International Conference on Computer Vision, pp. 2765-2773.

527 Hastie, T., Rosset, S., Zhu, J., Zou, H., 2009. Multi-class adaboost. Statistics and its Interface 2, 349-360.

528 He, H., Khoshelham, K., Fraser, C., 2017. A two-step classification approach to distinguishing similar objects in mobile
529 LiDAR point clouds. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences 4.

530 Huang, F.-J., LeCun, Y., 2006. Large-scale learning with svm and convolutional nets for generic object categorization, Proc.
531 Computer Vision and Pattern Recognition Conference (CVPR’06).

532 Jing, H., Suya, Y., 2015. Pole-like object detection and classification from urban point clouds, 2015 IEEE International
533 Conference on Robotics and Automation (ICRA), pp. 3032-3038.

534 Khoshelham, K., 2007. Extending generalized Hough transform to detect 3D objects in laser range data, ISPRS Workshop on
535 Laser Scanning and SilviLaser 2007, 12-14 September 2007, Espoo, Finland. International Society for Photogrammetry and
536 Remote Sensing.

537 Khoshelham, K., Oude Elberink, S.J., Xu, S., 2013. Segment-based classification of damaged building roofs in aerial laser
538 scanning data. IEEE Geoscience and Remote Sensing Letters 10, 1258-1262.

539 Klokov, R., Lempitsky, V., 2017. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models, Computer
540 Vision (ICCV), 2017 IEEE International Conference on. IEEE, pp. 863-872.

541 Komarichev, A., Zhong, Z., Hua, J., 2019. A-CNN: Annularly convolutional neural networks on point clouds, Proceedings of
542 the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7421-7430.

543 Krig, S., 2014. Local Feature Design Concepts, Classification, and Learning, Computer Vision Metrics. Springer, pp. 131-
544 189.

545 Lalonde, J.-F., Unnikrishnan, R., Vandapel, N., Hebert, M., 2005. Scale selection for classification of point-sampled 3D
546 surfaces, 3-D Digital Imaging and Modeling, 2005. 3DIM 2005. Fifth International Conference on. IEEE, pp. 285-292.

547 Lam, J., Kusevic, K., Mrstik, P., Harrap, R., Greenspan, M., 2010. Urban scene extraction from mobile ground based lidar
548 data, Proceedings of 3DPVT, pp. 1-8.

549 Le, T., Duan, Y., 2018. Pointgrid: A deep network for 3d shape understanding, Proceedings of the IEEE conference on
550 computer vision and pattern recognition, pp. 9204-9214.

551 Lehtomäki, M., Jaakkola, A., Hyyppä, J., Kukko, A., Kaartinen, H., 2010. Detection of Vertical Pole-Like Objects in a Road
552 Environment Using Vehicle-Based Laser Scanning Data. Remote Sensing 2, 641.

553 Li, D., Elberink, S.O., 2013. Optimizing detection of road furniture (pole-like objects) in mobile laser scanner data. ISPRS
554 Ann. Photogramm. Remote Sens. Spat. Inf. Sci 1, 163-168.

555 Li, N., Hao, H., Gu, Q., Wang, D., Hu, X., 2017. A transfer learning method for automatic identification of sandstone
556 microscopic images. Computers & Geosciences 103, 111-121.

557 Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B., 2018. Pointcnn: Convolution on x-transformed points, Advances in neural
558 information processing systems, pp. 820-830.

559 Lian, Z., Godil, A., Sun, X., 2010. Visual Similarity Based 3D Shape Retrieval Using Bag-of-Features, Shape Modeling
560 International Conference (SMI), 2010, pp. 25-36.

561 Liu, X., Liu, Z., Wang, G., Cai, Z., Zhang, H., 2018. Ensemble transfer learning algorithm. IEEE Access 6, 2389-2396.

562 Lo, T.-W.R., Siebert, J.P., 2009. Local feature extraction and matching on range images: 2.5 D SIFT. Computer Vision and
563 Image Understanding 113, 1235-1250.

564 Long, M., Cao, Y., Wang, J., Jordan, M.I., 2015. Learning transferable features with deep adaptation networks. arXiv preprint
565 arXiv:1502.02791.

566 Maturana, D., Scherer, S., 2015a. 3d convolutional neural networks for landing zone detection from lidar, Robotics and
567 Automation (ICRA), 2015 IEEE International Conference on. IEEE, pp. 3471-3478.

568 Maturana, D., Scherer, S., 2015b. VoxNet: A 3D convolutional neural network for real-time object recognition, Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, pp. 922-928.

569 Oquab, M., Bottou, L., Laptev, I., Sivic, J., 2014. Learning and transferring mid-level image representations using
570 convolutional neural networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1717-
571 1724.

572 Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R.,
573 Dubourg, V., 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research 12, 2825-2830.

574 Piewak, F., Pinggera, P., Schafer, M., Peter, D., Schwarz, B., Schneider, N., Enzweiler, M., Pfeiffer, D., Zollner, M., 2018.
575 Boosting LiDAR-based semantic labeling by cross-modal training data generation, Proceedings of the European Conference
576 on Computer Vision (ECCV).

577 Puttonen, E., Jaakkola, A., Litkey, P., Hyyppä, J., 2011. Tree classification with fused mobile laser scanning and hyperspectral
578 data. Sensors 11, 5158-5182.

579 Qi, C.R., Su, H., Kaichun, M., Guibas, L.J., 2017a. Pointnet: Deep learning on point sets for 3d classification and segmentation,
580 Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, pp. 77-85.

581 Qi, C.R., Yi, L., Su, H., Guibas, L.J., 2017b. Pointnet++: Deep hierarchical feature learning on point sets in a metric space,
582 Advances in Neural Information Processing Systems, pp. 5099-5108.

583 Ranzato, F.-J.H., Boureau, Y.-L., LeCun, Y., 2007. Unsupervised learning of invariant feature hierarchies with applications to
584 object recognition, Proc. Computer Vision and Pattern Recognition Conference (CVPR’07). IEEE Press.

585 Restrepo, M.I., Mundy, J.L., 2012. An Evaluation of Local Shape Descriptors in Probabilistic Volumetric Scenes, BMVC, pp.
586 1-11.

587 Roveri, R., Rahmann, L., Oztireli, C., Gross, M., 2018. A network architecture for point cloud classification via automatic
588 depth images generation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4176-4184.

589 Schapire, R.E., 1997. Using output codes to boost multiclass learning problems, ICML. Citeseer, pp. 313-321.

590 Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S., 1998. Boosting the margin: A new explanation for the effectiveness of
591 voting methods. The annals of statistics 26, 1651-1686.

592 Schapire, R.E., Singer, Y., 1999. Improved boosting algorithms using confidence-rated predictions. Machine learning 37, 297-
593 336.

594 Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K., 2013. Equivalence of distance-based and RKHS-based statistics
595 in hypothesis testing. The Annals of Statistics 41, 2263-2291.

596 Sun, B., Saenko, K., 2016. Deep coral: Correlation alignment for deep domain adaptation, European Conference on Computer
597 Vision. Springer, pp. 443-450.

598 Taati, B., Greenspan, M., 2011. Local shape descriptor selection for object recognition in range data. Computer Vision and
599 Image Understanding 115, 681-694.

600 Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C., 2018. A Survey on Deep Transfer Learning. arXiv preprint
601 arXiv:1808.01974.

602 Tombari, F., Salti, S., Di Stefano, L., 2010. Unique signatures of histograms for local surface description, European Conference
603 on Computer Vision. Springer, pp. 356-369.

604 Wang, P.-S., Liu, Y., Guo, Y.-X., Sun, C.-Y., Tong, X., 2017. O-cnn: Octree-based convolutional neural networks for 3d shape
605 analysis. ACM Transactions on Graphics (TOG) 36, 72.

606 Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M., 2018. Dynamic graph CNN for learning on point
607 clouds. arXiv preprint arXiv:1801.07829.

608 Wu, H.-Y., Zha, H., Luo, T., Wang, X.-L., Ma, S., 2010. Global and local isometry-invariant descriptor for 3D shape
609 comparison and partial matching, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, pp.
610 438-445.

611 Wu, P., Dietterich, T.G., 2004. Improving SVM accuracy by training on auxiliary data sources, Proceedings of the twenty-
612 first international conference on Machine learning. ACM, p. 110.

613 Xu, Y., Pan, S.J., Xiong, H., Wu, Q., Luo, R., Min, H., Song, H., 2017. A unified framework for metric transfer learning. IEEE
614 Transactions on Knowledge and Data Engineering 29, 1158-1171.

615 Yang, B., Dong, Z., Zhao, G., Dai, W., 2015. Hierarchical extraction of urban objects from mobile laser scanning data. ISPRS
616 Journal of Photogrammetry and Remote Sensing 99, 45-57.

617 Yang, J., Yan, R., Hauptmann, A.G., 2007. Cross-domain video concept detection using adaptive svms, Proceedings of the
618 15th ACM international conference on Multimedia. ACM, pp. 188-197.

619 Yokoyama, H., Date, H., Kanai, S., Takeda, H., 2011. Pole-like objects recognition from mobile laser scanning data using
620 smoothing and principal component analysis, ISPRS Workshop, Laser scanning, pp. 115-121.

621 Yokoyama, H., Date, H., Kanai, S., Takeda, H., 2013. Detection and classification of pole-like objects from mobile laser
622 scanning data of urban environments. International Journal of CAD/CAM 13, 1-10.

623 Yosinski, J., Clune, J., Bengio, Y., Lipson, H., 2014. How transferable are features in deep neural networks?, Advances in
624 neural information processing systems, pp. 3320-3328.

625 Zhang, Z., Hua, B.-S., Yeung, S.-K., 2019. Shellnet: Efficient point cloud convolutional neural networks using concentric
626 shells statistics, Proceedings of the IEEE International Conference on Computer Vision, pp. 1607-1616.

627 Zhou, Y., Tuzel, O., 2017. Voxelnet: End-to-end learning for point cloud based 3d object detection. arXiv preprint
628 arXiv:1711.06396.