Real-time beacon identification using linear and kernel
(non-linear) Support Vector Machine, Multiple Kernel
Learning (MKL) and Light Detection and Ranging (LIDAR)
3D data
Tasmia Reza^a, Lucas Cagle^b, Pan Wei^c, and John E. Ball^b

^a Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA
^b Department of Electrical and Computer Engineering, Mississippi State University, 406 Hardy Rd., Mississippi State, MS 39762, USA
^c Amazon, 101 Main St., Cambridge, MA 02142, USA

ABSTRACT
The goal of this research is to develop a machine-learning classification system for object detection based on three-dimensional (3D) Light Detection and Ranging (LiDAR) sensing. The proposed real-time system operates a LiDAR sensor on an industrial vehicle as part of upgrading the vehicle to provide autonomous capabilities. We have developed 3D features which allow a linear Support Vector Machine (SVM), a kernel (non-linear) SVM, and Multiple Kernel Learning (MKL) to determine whether objects in the LiDAR's field of view are beacons (an object designed to delineate a no-entry zone) or other objects (e.g., people, buildings, equipment, etc.). Results from multiple data collections are analyzed and presented. Moreover, the feature effectiveness and the pros and cons of each approach are examined.
Keywords: Advanced Driver Assistance Systems, industrial vehicle control, support vector machine, object detection, LiDAR, multiple-kernel learning, machine learning, real-time system

1. INTRODUCTION
In an industrial environment, it is often necessary to keep certain areas off limits to vehicles for the protection of people and valuable assets. Herein, a system is developed that uses an eight-beam Quanergy M8 LiDAR sensor on an industrial vehicle to quarantine certain areas via a highly reflective passive device (called a beacon) used
to delineate a no-entry area. A LiDAR can readily detect these beacons but suffers from false positives due to
other reflective surfaces such as worker safety vests or vehicles. The detection system is applied in a real–time
scenario on a LiDAR attached to an industrial vehicle. The data used for training and testing the system were
collected by graduate students from the Electrical and Computer Engineering department at Mississippi State
University.
The contributions of this paper include: (1) Simple features that can be used to distinguish beacons from
other objects, (2) validation and evaluation of a real–time implementation of this system on an industrial vehicle
in an industrial setting.
The rest of the paper is organized as follows: Section 2 discusses background. Section 3 discusses the project
methodology. Section 4 discusses results, and Section 5 draws conclusions.
Further author information: (Send correspondence to J.E.B.)
J.E.B.: E-mail: jeball@ece.msstate.edu, Telephone: 1 662 325 4169

Automatic Target Recognition XXIX, edited by Riad I. Hammoud, Timothy L. Overman, Proc. of SPIE
Vol. 10988, 1098815 · © 2019 SPIE · CCC code: 0277-786X/19/$18 · doi: 10.1117/12.2518714

2. BACKGROUND
LiDAR detection offers some advantages as well as some disadvantages in classification tasks. The main obstacle is the sparse 3D point cloud, which makes it difficult to classify and detect objects reliably, since an object's appearance changes with its position relative to the LiDAR. There are several examples from the literature where LiDAR 3D point cloud data is processed for classification. One approach1 introduced an improved eigen-feature analysis of weighted covariance matrices with a Support Vector Machine (SVM) classifier to classify data in urban areas from airborne LiDAR data. Golovinskiy et al. classified feature vectors for each candidate object with respect to a training set of manually labeled object locations.2 Wellhausen et al.3 use an environmental change detection pipeline that operates in real time on distorted 3D point clouds with slow acquisition rates in cluttered environments. Most of the available methods for LiDAR data classification are implemented on a high-density (high beam count) LiDAR. We use an eight-beam Quanergy M8 LiDAR, which collects comparatively sparse point cloud data. Due to the heightened sparsity, these methods could not be successfully implemented in our system with an eight-beam LiDAR.

3. METHODOLOGY
In general, objects appear at different scales based on their distance from the LiDAR, and their angular position can play a significant role in detection. Our method detects beacons based on their particular geometry. Herein, we developed a passive barrier, termed a beacon, that delineates a no-entry area. The beacon is constructed from a standard 28-inch orange traffic cone with vertical strips of reflective tape affixed, topped by a 2-inch diameter highly reflective pole that extends the beacon to two meters in height. The beacon is shown in fig. 1.

Figure 1: Beacon.

The beacon presents in a LiDAR scan as a set of high-intensity values, providing a high-reflectivity return compared to most background objects. However, other reflective objects can also have bright returns, such as people wearing safety vests with retro-reflective tape, or other industrial vehicles. To implement a real-time beacon detection system, the following steps are performed: (1) ground points are removed from the LiDAR data, (2) density-based and bright-return-based clustering is performed, (3) features are extracted from each dense cluster, and (4) a Support Vector Machine (SVM) classifies each cluster as beacon or non-beacon. The following sections discuss each of these steps in detail.

3.1 Ground Point Removal
The major issue with ground points is that they create false positives from reflections off highly reflective paint or other reflective objects on the ground. Moreover, ground points waste processing time, since the beacon presents as an above-ground set of points in the point cloud. In addition, ground points can interfere with feature extraction. There are many examples of ground point removal from areas with varying ground conditions. Since this industrial application has a smooth, flat ground surface, we employed a simple vertical threshold to remove ground points. Other algorithms could be used if the environment were more complex.4-7
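As an illustration, the following is a minimal sketch of this vertical-threshold filter. The (N, 4) [x, y, z, intensity] array layout and the Python/NumPy implementation are our assumptions for illustration, not the deployed code:

import numpy as np

def remove_ground(points, t_ground=-1.18):
    # points: (N, 4) array of [x, y, z, intensity] rows, with z measured
    # relative to the LiDAR origin, so the ground sits below zero.
    # Keep only points whose z coordinate is above t_ground (meters).
    return points[points[:, 2] > t_ground]

# Example: the third point lies on the ground plane and is removed.
pts = np.array([[1.0, 0.0, 0.2, 40.0],
                [2.0, 0.5, -0.3, 12.0],
                [1.5, 0.1, -1.3, 90.0]])
print(remove_ground(pts))

Listing 1: Hypothetical vertical-threshold ground point removal.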

3.2 Clustering
In this approach, after ground point removal, the bright (i.e., high-intensity) points are clustered. The idea is that most surfaces return very low-intensity points, while the beacon, with its tall, thin, highly reflective pole, returns many high-intensity points. In this manner, the clutter is significantly reduced, except for certain other objects such as people wearing safety vests with retro-reflective markers. The point intensities are compared to an empirically determined intensity threshold TH = 15. This value was determined by examining intensity levels in thousands of both beacon and non-beacon returns. The ground threshold was set to TG = -1.18 m, due to the LiDAR mounting height on the industrial vehicle; the negative sign indicates below the LiDAR.

Input: LiDAR point cloud P = {x_j, y_j, z_j, i_j, r_j} with N_P points.
Input: High-intensity threshold: T_H.
Input: Cluster distance threshold: ε (meters).
Input: Ground Z threshold: T_G (meters).
Remove all non-return points (NaN's).
Remove all points with intensity < T_H.
Remove all ground points (Z-values below T_G).
Cluster bright points:
for each point P_j in the modified point cloud do
    if P_j does not belong to a cluster then
        Increment the number of clusters: cl ← cl + 1.
        Assign P_j to cluster cl and set the centroid of cluster cl to P_j.
        Scan through all remaining unclustered points P_m:
        if distance from P_m to the centroid of cluster cl < ε then
            Add P_m to cluster cl and recalculate the centroid of cluster cl.
        end
    end
end
Algorithm 1: LiDAR bright pixel clustering.

For our data processing, we used a modified DBSCAN clustering algorithm8 that clusters the bright points based on point cloud density as well as intensity, as shown in Algorithm 1. The cluster distance parameter ε was experimentally determined to be 0.5 m based on the structure of the beacon; larger values could group two nearby beacons (or a beacon and another nearby reflective object) into one cluster, which is undesirable. In Algorithm 1, distances are computed as Euclidean distances using only the x (front-to-back) and y (left-to-right) coordinates; that is, the algorithm clusters bright LiDAR points by projecting them onto the x-y plane. This approach is more computationally efficient than using all three coordinates in the clustering algorithm.
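A simplified Python sketch of the greedy, centroid-based grouping in Algorithm 1 follows. It uses the same assumed array layout as Listing 1 and is a sketch of the x-y projected clustering, not the full modified-DBSCAN implementation:

import numpy as np

def cluster_bright_points(points, t_high=15.0, t_ground=-1.18, eps=0.5):
    # Keep bright, above-ground points only.
    keep = (points[:, 3] >= t_high) & (points[:, 2] > t_ground)
    bright = points[keep]
    centroids, clusters = [], []
    for j, p in enumerate(bright):
        xy = p[:2]
        # Find the nearest existing centroid in the x-y projection.
        best, best_d = -1, eps
        for c, cen in enumerate(centroids):
            d = float(np.hypot(xy[0] - cen[0], xy[1] - cen[1]))
            if d < best_d:
                best, best_d = c, d
        if best < 0:
            # No centroid within eps meters: start a new cluster.
            centroids.append(xy.copy())
            clusters.append([j])
        else:
            # Join the cluster and recalculate its centroid.
            clusters[best].append(j)
            centroids[best] = bright[clusters[best], :2].mean(axis=0)
    return clusters, bright

Listing 2: Hypothetical sketch of bright-point clustering in the x-y plane.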

3.3 Feature Extraction


The point intensities are compared to T_H. The beacon provides brighter LiDAR returns than most other objects in the industrial setting. However, some objects also have high returns, such as other industrial vehicles with retro-reflective markings, shiny vehicle surfaces, or workers wearing safety vests with retro-reflective stripes.
In order to classify objects as beacons and non-beacons, special hand-crafted features are utilized. Due to processing limitations and the system requirement to run the LiDAR processing at 5 Hz, the feature extraction and subsequent LiDAR processing must be not only computationally simple but also effective.
After clustering, beacons appear as tall, thin objects, whereas all other objects are either not as tall, or are tall and wider. People wearing reflective vests have a structure closer to a beacon's, but a beacon is (usually) much taller and does not have the horizontal extent that people do. Features are extracted in a small rectangular volume centered at each object's centroid, and again in a larger rectangular volume centered on the same centroid. The idea is that the beacon mainly has points in the inner rectangular region, while other objects have points extending into the areas outside of the inner region. These features include the number of bright points in each region, the x, y, and z extents (i.e., max value minus min value) of the points in each region, etc. Beacons mainly have larger values in the smaller region, while other objects also have values in the larger regions. Reference Figure 2 for a top-down illustration of the inner and outer regions. The inner analysis region has a depth (x coordinate) of 0.5 meters and a width (y coordinate) of 0.5 meters, and its height includes all points with z coordinate values of -1.18 meters and above. The outer region extends 2.0 meters in both the x and y directions and has the same height restriction as the inner region. These values were empirically determined based on the dimensions of the beacon and the LiDAR height.

Figure 2: LiDAR detection regions (inner and outer) visualized from a top–down view.
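To make the region features concrete, the sketch below counts points and computes coordinate extents inside an axis-aligned box around a cluster centroid. The dictionary of outputs is illustrative only; the actual 20 features are listed in Table 1:

import numpy as np

def region_features(cloud, center, dx, dy, z_min=-1.18):
    # cloud: (N, 4) array of [x, y, z, intensity]; center: (x, y) of the
    # cluster centroid; dx, dy: box extents in meters (0.5 m for the
    # inner region, 2.0 m for the outer region in this work).
    inside = ((np.abs(cloud[:, 0] - center[0]) <= dx / 2.0) &
              (np.abs(cloud[:, 1] - center[1]) <= dy / 2.0) &
              (cloud[:, 2] >= z_min))
    box = cloud[inside]
    if box.shape[0] == 0:
        return {"count": 0, "x_extent": 0.0, "y_extent": 0.0, "z_extent": 0.0}
    # Extent = max value minus min value, per coordinate.
    ext = box[:, :3].max(axis=0) - box[:, :3].min(axis=0)
    return {"count": int(box.shape[0]), "x_extent": float(ext[0]),
            "y_extent": float(ext[1]), "z_extent": float(ext[2])}

Listing 3: Hypothetical inner/outer region feature computation.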

Table 1 describes the extracted features. The features are extracted using Algorithm 2, with parameters T_L = 0, T_H = 15, and T_G = -1.18 m, and Algorithm 3, with parameters Δx_I = 0.5 m, Δy_I = 0.5 m, Δx_O = 2.0 m, Δy_O = 2.0 m, and Z_L = 1.4 m. Herein, extent means the maximum value minus the minimum value, e.g., the Z extent is max{Z} - min{Z}. It is noted that many features were examined, and they each had different abilities to discriminate the beacons from non-beacons. In the table, the features are derived from either the High Threshold (HT) or Low Threshold (LT) point clouds. In this work, the Low Threshold point cloud consists of all points with an intensity higher than T_L, basically allowing all detected LiDAR points in the analysis regions to be analyzed. The High Threshold point cloud uses T_H as the intensity threshold. The rationale is that some objects, such as humans wearing retro-reflective safety vests, present to the LiDAR as points with both low and high intensities. For the LiDAR, assuming it is level with the ground, beam 7 is the upper-most beam, pointed about 3 degrees above the horizontal plane; beam 6 lies in the horizontal plane; and beam 0 is the lowest, at roughly 18 degrees below the horizontal. The beam spacing is approximately 3 degrees per beam.
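For reference, a one-line approximation of the beam geometry just described (beam 6 horizontal, roughly 3-degree spacing) is:

def beam_elevation_deg(beam):
    # Approximate Quanergy M8 beam elevation: beam 6 is horizontal,
    # beam 7 points ~3 degrees up, beam 0 ~18 degrees down.
    return 3.0 * (beam - 6)

print([beam_elevation_deg(b) for b in range(8)])  # [-18.0, ..., 0.0, 3.0]

Listing 4: Approximate beam index to elevation mapping.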
To determine the best features, a simple (but sub–optimal) feature selection process was performed. Each
feature was evaluated on the training set for its ability to distinguish beacons from non–beacons using a measure
of classifier performance, which is a score from 0 to 1,000, where higher numbers indicate better performance.
This score is calculated as score = 500 · TP/(TP + FN) + 500 · TN/(TN + FP), where TP, FP, TN, and FN are the number of true positives, false positives, true negatives, and false negatives, respectively.9

Input: LiDAR point cloud P = {x_j, y_j, z_j, i_j, r_j}.
Input: Low and high-intensity thresholds: T_L and T_H.
Input: Ground Z threshold: T_G (meters).
Output: Feature vector f.
Remove all non-return points (NaN's).
Remove ground points: remove points with Z < T_G.
Create thresholded point clouds:
Set P_HT = ∅ and P_LT = ∅.
for each point P_j in the modified point cloud do
    if point P_j has intensity ≥ T_H then
        Add P_j to P_HT.
    end
    if point P_j has intensity ≥ T_L then
        Add P_j to P_LT.
    end
end
Extract features f using Algorithm 3.
Algorithm 2: LiDAR high-level feature extraction preprocessing.

Input: LiDAR high-intensity point cloud P_HT = {x_j, y_j, z_j, i_j, r_j}.
Input: LiDAR low-intensity point cloud P_LT = {x_j, y_j, z_j, i_j, r_j}.
Input: Inner region x and y extents: Δx_I and Δy_I (meters).
Input: Outer region x and y extents: Δx_O and Δy_O (meters).
Input: LiDAR height above ground: Z_L (meters).
Output: Feature vector f.
Cluster the high-intensity point cloud.
Calculate features:
for each cluster center point c = (x_C, y_C, z_C) in the point cloud do
    Determine points in P_HT in the inner region and calculate features 1, 13, and 17 from Table 1.
    Determine points in P_HT in the outer region and calculate feature 4 from Table 1.
    Determine points in P_LT in the inner region and calculate features 6, 7, 9, 10, 11, 14, 16, and 18 from Table 1.
    Determine points in P_LT in the outer region and calculate features 2, 3, 5, 8, 12, 15, 19, and 20 from Table 1.
end
Return f = [f_1, f_2, f_3, ..., f_20].
Algorithm 3: LiDAR feature extraction.

Table 1: Feature descriptions. HT/LT = High/Low Threshold data. Region: I = Inner, O = Outer (refer to fig. 2). Pts = Points.

Feat.  Subset  Region  Description
1      HT      I       Extent of Z in cluster.
2      LT      O       Max X, Y extents in cluster, beam 7.
3      LT      O       Max X, Y extents in cluster, beam 5.
4      HT      O       Max Z in cluster - LiDAR height.
5      LT      O       Extent of Z in cluster.
6      LT      I       Pts. in cluster, beam 7.
7      LT      I       Max X, Y extents in cluster, beam 6.
8      LT      O       Pts. in cluster, beam 5.
9      LT      I       Extent of X in cluster.
10     LT      I       Pts. in cluster, beam 4.
11     LT      I       Pts. in cluster, beam 5.
12     LT      O       Pts. in cluster, beam 6.
13     HT      I       Pts. in cluster, beam 6.
14     LT      I       Max of X and Y extents, beam 5.
15     LT      O       Pts. in cluster / cluster radius, beam 5.
16     LT      I       Extent of X in cluster.
17     HT      I       Pts. in cluster, beam 7.
18     LT      I       Pts. in cluster.
19     LT      O       Pts. in cluster.
20     LT      O       Extent of Z in cluster.

The score increases as TP and TN increase, and decreases with bad decisions, which cause FP or FN to increase. It is worth noting that using only overall accuracy caused the system to produce poor results; the score heavily penalizes both false alarms (non-beacon objects being classified as beacons) and misses (missed beacons). After the features are scored, they are sorted in descending order of score and concatenated one at a time until the desired level of performance is achieved.
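As a worked example of the scoring measure, under hypothetical counts of 100 beacon and 400 non-beacon training clusters:

def feature_score(tp, fp, tn, fn):
    # Balanced score in [0, 1000]: 500*sensitivity + 500*specificity.
    return 500.0 * tp / (tp + fn) + 500.0 * tn / (tn + fp)

# A feature that misses 2 of 100 beacons and raises 5 false alarms
# on 400 non-beacons scores 500*0.98 + 500*0.9875 = 983.75.
print(feature_score(tp=98, fp=5, tn=395, fn=2))

Listing 5: Feature scoring measure with hypothetical counts.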

3.4 Classification
Herein, three methods are proposed to determine beacon presence. The first method is a linear Support Vector Machine (SVM); the Matlab toolbox liblinear10 is used for this processing. The linear SVM learns an optimal separating hyperplane from the training data, and this decision boundary is then applied to the testing data. If the data are not linearly separable, slack variables can be used; however, if the data are highly nonlinear, the linear SVM will provide sub-optimal results. The second and third approaches utilize kernel machines, which project the data into a nonlinear space (when designed correctly, this space will make the data nearly linearly separable) and then apply a linear SVM in the projected space. Herein, multiple kernel learning (MKL) was utilized as the non-linear supervised classifier.11 These methods are each applied to the extracted features described in Table 1. MKL has various forms; in this work, single-kernel MKL and MKL with Group Lasso (MKLGL) were used. In single-kernel MKL, Radial Basis Function (RBF) kernels were utilized, and the system optimized the kernel parameter, the RBF standard deviation. In MKLGL, the kernel weights and the classification surface are solved simultaneously by iteratively optimizing a min-max problem until convergence. The online library of Anderson et al.12 was used for both of the kernel SVM methods.
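The sketch below illustrates the linear and kernel classifiers using scikit-learn: LinearSVC is backed by the same liblinear library used here, while the RBF-kernel SVC merely stands in for the single-kernel MKL and MKLGL toolboxes, which we do not reproduce. The data are synthetic stand-ins, not the collected features:

import numpy as np
from sklearn.svm import LinearSVC, SVC

# Synthetic stand-in for the 20-dimensional feature vectors of Table 1;
# labels: 1 = beacon, 0 = non-beacon.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

linear = LinearSVC(C=1.0).fit(X, y)               # liblinear-backed linear SVM
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)  # RBF kernel machine
print(linear.score(X, y), rbf.score(X, y))

Listing 6: Linear and RBF-kernel SVM training sketch (scikit-learn stand-ins).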

4. RESULTS AND DISCUSSION


The results for each method (using 20 features) are shown in Table 2 and in fig. 3. In this figure, if the number of features is k, then we use the concatenation of the k best-performing features (as ranked by the scores). That is, when k = 1, only the best-performing feature is used; when k = 2, only the top two features are utilized, etc.

From the figure, the MKLGL and single-kernel MKL results are almost the same, and both are better than the linear SVM in all cases. The results show MKLGL as the best-performing solution of those tried. It is worth noting that 134 features in total were examined, but only the top 20 are reported in this paper, since the real-time system was implemented on an NVIDIA Jetson TX2 processor using only the top 20 features. With about 6-8 features, the MKLGL and single-kernel methods achieve performance similar to the linear SVM with 20 features. Fig. 4 shows PDFs of the linear SVM discriminant values; the decision threshold is zero. This figure shows that the data are not quite linearly separable, although there is very little overlap at the decision boundary. In this figure, beacon-like objects have more negative discriminant values, while non-beacon-like objects have more positive values.

Table 2: Overall Accuracy (OA) results.

Method   Linear SVM   Single Kernel   MKLGL
OA (%)   98.23        98.34           99.71

Figure 3: Overall accuracy with respect to number of features for the linear SVM, single-kernel MKL, and MKLGL.

5. CONCLUSIONS
The Quanergy M8 LiDAR collects very sparse point cloud data, as it has only eight beams compared to higher beam-count LiDARs. Another challenge is implementing the system in real time. Only the top twenty features are used with the SVM, as this achieves accuracy similar to using all of the features with the linear SVM. The linear SVM is the quickest of the methods tried to evaluate, as the overall discriminant function is only a dot product of the SVM weights with the feature vector plus a bias value. Both kernel methods outperformed the linear method and achieved similar performance using a smaller set of features. Thus, all three methods are viable. We note that different results will be found for different datasets, as the method of using linear or kernel SVMs is general. In this case, the data were nearly linearly separable with about 20 features, while better results were achieved using a much smaller number of features with the kernel SVMs. In our case, we were able to implement the real-time system and operate the LiDAR and all of the detection processing at 5 Hz when using a linear SVM and 20 features.
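To illustrate why the linear SVM is so cheap at run time, its discriminant reduces to a single dot product; in the sketch below, the weights w and bias b are hypothetical trained values:

import numpy as np

def linear_svm_discriminant(f, w, b):
    # f: 20-element feature vector; w: trained SVM weights; b: bias.
    # The sign of the result gives the beacon / non-beacon decision
    # (threshold zero, as in fig. 4).
    return float(np.dot(w, f) + b)

Listing 7: Linear SVM discriminant evaluated as a dot product.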
Future work includes expanding this work to classify multiple object types (e.g., people, vehicles). In these experiments, the system lacked the computational power to implement a deep learning system for LiDAR-based classification. Future work will examine hardware and Graphics Processing Units (GPUs) capable of implementing deep learning methods for LiDAR point clouds.

Figure 4: PDF of the linear SVM discriminant values.

REFERENCES
[1] Lin, C.-H., Chen, J.-Y., Su, P.-L., and Chen, C.-H., "Eigen-feature analysis of weighted covariance matrices for lidar point cloud classification," ISPRS Journal of Photogrammetry and Remote Sensing 94, 70-79 (2014).
[2] Golovinskiy, A., Kim, V. G., and Funkhouser, T., "Shape-based recognition of 3D point clouds in urban environments," in [Computer Vision, 2009 IEEE 12th International Conference on], 2154-2161, IEEE (2009).
[3] Wellhausen, L., Dubé, R., Gawel, A., Siegwart, R., and Cadena, C., "Reliable real-time change detection and mapping for 3D lidars," in [2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR)], 81-87, IEEE (2017).
[4] Meng, X., Currit, N., and Zhao, K., "Ground filtering algorithms for airborne LiDAR data: A review of critical issues," Remote Sensing 2(3), 833-860 (2010).
[5] Rummelhard, L., Paigwar, A., Nègre, A., and Laugier, C., "Ground estimation and point cloud segmentation using SpatioTemporal Conditional Random Field," in [Intelligent Vehicles Symposium (IV), 2017 IEEE], 1105-1110, IEEE (2017).
[6] Rashidi, P. and Rastiveis, H., "Ground filtering lidar data based on multi-scale analysis of height difference threshold," ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W4, 225-229 (2017).
[7] Chang, Y., Habib, A., Lee, D., and Yom, J., "Automatic classification of lidar data into ground and non-ground points," International Archives of Photogrammetry and Remote Sensing 37, 463-468 (2008).
[8] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X., "A density-based algorithm for discovering clusters in large spatial databases with noise," in [KDD], 96, 226-231 (1996).
[9] Lillywhite, K., Lee, D.-J., Tippetts, B., and Archibald, J., "A feature construction method for general object recognition," Pattern Recognition 46(12), 3300-3314 (2013).
[10] Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J., "LIBLINEAR: A library for large linear classification," Journal of Machine Learning Research 9, 1871-1874 (2008).
[11] Pinar, A. J., Rice, J., Hu, L., Anderson, D. T., and Havens, T. C., "Efficient multiple kernel classification using feature and decision level fusion," IEEE Transactions on Fuzzy Systems 25(6), 1403-1416 (2017).
[12] Anderson, D., Keller, J., and Chan, C. S., "Fuzzy integral and computer vision library," (2017).
