Autonomous Design Report ASURT FS AI 2021 v4
net/publication/353314517
15 authors, including:
Salma Ibrahim
Ain Shams University
All content following this page was uploaded by Michael Samy Hannalla on 17 July 2021.
Michael Hannalla, Haidy Sorial, Ahmed Tarek, Salma Ibrahim, Ahmed Hesham, Islam Ayman,
Laila Elgenedi, Abdelrahman Ayman, Ahmed Samy, Omar Fadl, Kareem Alsawah,
Aly Alkady, Nesma Walid, Hassan Amr, Moataz Elashery
Abstract—The field of self-driving cars is a very active area of research that impacts and benefits everyday life on the roads. This document describes the phases our team went through to develop an autonomous vehicle to compete in the 2021 Formula Student AI competition. We include the state-of-the-art schemes we studied and the diverse methods we tried in approaching our problem.

DISCLAIMER
This paper is written as an autonomous design report to be submitted to the FS-AI judges in the Formula Student competition. Content may mention trademarks and manufacturers of sensors, computers, or software. These manufacturers do not supervise or endorse any of the content in this paper.

I. INTRODUCTION
The aim of this work is to provide state-of-the-art approaches to the problems facing autonomous driving, starting from generalized perception that works in most weather conditions and is robust to the scale of the image semantic content, and moving through to providing the car with conscious strategic maneuvers when required. The work would be tested against standard benchmarks for measuring the performance of self-driving vehicles and partnered with the industrial pioneers in the field of autonomous vehicles.

II. SYSTEM OVERVIEW
This section provides an overview of the hardware and sensors used and of the high-level architecture of our autonomous system, from the very high-level perception down to the lowest level of control and the commands sent to the ADS-DV¹ via the CAN bus. Fig. 1 shows the high-level architecture of our software. We use a Velodyne VLP-16 LiDAR, a FLIR Blackfly S monocular camera, and a ZED stereo camera for perception, along with an IMU, GPS, and feedback from wheel and steering encoders for state estimation. Perception sensors are used in different pipelines for cone detection and drivable space estimation. State estimates and cone detections are fed to the SLAM module, which fuses them to output an optimal state estimate and a map. The map and pose estimate are then taken by the planning module to output waypoints, and the navigation module sends control commands via the CAN bus to track and follow these waypoints.
All of our software is based on ROS installed on Ubuntu. Our computing devices are the prefitted InCarPC and an NVIDIA Jetson TX2. Computers and sensors are connected together so that they can communicate over an Ethernet network.

¹ ADS-DV is the shared formula vehicle that FS-AI DDT (Dynamic Driving Task) competitors use to deploy their autonomous systems on during the competition. This vehicle has a prefitted ZED camera and an InCarPC computer.

III. PERCEPTION
This section describes the different cone detection pipelines and utilities we used in our system, covering both laser-based and vision-based methods.

A. LiDAR Pointcloud Pre-processing
1) Field of View Trimming: Due to the LiDAR's placement on the vehicle and the wide surroundings, there are many irrelevant points that we need to discard. We start by applying pass-through filters to trim the field of view: a longitudinal one to remove the vehicle points and anything far away, and a lateral one to limit the side field.
2) Adaptive Ground Removal: After the FOV trimming, we need to filter out the ground points. We apply an adaptive algorithm that adjusts to changes in the inclination of the ground. We use a RANSAC algorithm to segment the ground plane from the cones: the points that fit the ground-plane model are removed, leaving us with the non-ground points.
3) Clustering: For point cloud clustering, we chose the Euclidean distance clustering algorithm, using a K-D tree as the base spatial locator class for nearest-neighbor estimation. This approach is implemented by subdividing the space into boxes of fixed widths or, in the more general case, an octree data structure. This method is very fast to build and gives us a useful representation of the data in every resultant 3D box.
4) Cone Reconstruction: The next step is to reconstruct the clusters obtained, as the ground removal process eliminates some cone points along with the ground points. Unfortunately, this reduces the already modest number of points used in cone detection. We solve this by restoring, around each cluster, a cylinder-shaped region of points with a diameter equal to the cone width, using points from the preceding point cloud (before ground removal). This way, we ensure that most of the mistakenly removed points are retrieved, so that our detection and colour pattern estimation processes run smoothly and more accurately.
Fig. 1. High-level software architecture
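The field-of-view trimming and RANSAC ground removal steps of the LiDAR pre-processing can be illustrated with a minimal pure-Python sketch. This is not the team's implementation: the function names (`trim_fov`, `plane_from_points`, `remove_ground`), the axis convention (x forward, y lateral, z up), and every threshold and range value are assumptions chosen only for the example.

```python
import random

def trim_fov(points, x_range=(0.5, 20.0), y_range=(-5.0, 5.0)):
    """Pass-through filters: keep points inside the longitudinal (x) and
    lateral (y) limits. The limits here are hypothetical values."""
    return [p for p in points
            if x_range[0] <= p[0] <= x_range[1]
            and y_range[0] <= p[1] <= y_range[1]]

def plane_from_points(p1, p2, p3):
    """Plane (a, b, c, d) with unit normal through three points,
    or None if the points are (nearly) collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm < 1e-9:
        return None
    a, b, c = a / norm, b / norm, c / norm
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d

def remove_ground(points, threshold=0.05, iterations=100, seed=0):
    """RANSAC plane fit: the plane with the most inliers is taken as the
    ground, and its inliers are removed from the cloud."""
    rng = random.Random(seed)
    best_inliers = set()
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = {i for i, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) < threshold}
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return [p for i, p in enumerate(points) if i not in best_inliers]

# Synthetic scene: a flat ground grid, one small "cone", and two clutter points.
ground = [(float(x), float(y), 0.0) for x in range(1, 11) for y in range(-3, 4)]
cone = [(5.0, 1.0, 0.1), (5.0, 1.0, 0.2), (5.0, 1.0, 0.3)]
clutter = [(50.0, 0.0, 0.0), (0.1, 0.0, 0.2)]  # far-away point, point on the vehicle

trimmed = trim_fov(ground + cone + clutter)
non_ground = remove_ground(trimmed)
```

In this synthetic scene the trimming drops the two clutter points and the plane fit removes the flat grid, so only the cone points remain. A real scan would also need the adaptive handling of ground inclination mentioned above, which this sketch does not attempt.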
5) Filtration: After cluster reconstruction, each cluster is passed through two filters to make sure that all clusters used for detection are truly cones: the Rule-Based Filter and the Z-Centroid Filter. The former calculates the expected number of points in a cone from its distance from the vehicle, the cone dimensions, and the LiDAR specifications, then compares this value to the actual number of detected points in the cluster. If the difference between the two numbers does not exceed a certain threshold, the cluster is passed on to the latter filter. Since all cones are of equal dimensions, they should all have the same centroid position. However, the variance in their distances from the vehicle results in a range of centroid values instead of one specific number. This filter ensures that all cone cluster centroids lie within that range. Otherwise, the cluster is not regarded as a cone and does not undergo the colour estimation process.

B. Image Object Detection
As the tracks are delineated by blue, yellow, and differently sized orange cones, our goal is to localize the cones and differentiate between their colours with high accuracy and low latency for safe maneuvering through the track. We tackle the detection challenge using the state-of-the-art YOLOv3 [1] [2] architecture. To fit the model to our specific problem, we train it on an open-source racing-cones dataset, which we modified to also differentiate between the different cone colours.

C. Mono-LiDAR Cone Detection Pipeline
In this pipeline, we fuse two sensors: a camera and a LiDAR. The LiDAR as a stand-alone sensor provides accurate depth and localization, while the camera as a stand-alone sensor provides accurate colours. By fusing the two sensors, we leverage the strengths of both measurements. We mainly rely on projecting the LiDAR points onto the camera's image plane to get the pose of the cones and their colours.
Let us denote by $C$ the incoming clusters from the LiDAR pre-processing step, as a stacked matrix of column vectors, with each column $C_i$ being the $i$-th cluster.
Here ${}^{L}C$ denotes the clusters given in the LiDAR's coordinate frame and $N$ represents the number of points in the point cloud. We then need to transform the points from the LiDAR frame to the camera frame using a simple homogeneous transformation as follows:
\[ {}^{C}C = {}^{C}T_{L} \, {}^{L}C \]
Where ${}^{C}C$ are the clusters transformed to the camera's coordinate frame and ${}^{C}T_{L} \in SE(3)$ represents the homogeneous transformation between the LiDAR frame and the
camera frame.
We then need to project these points from the 3D camera frame to the image plane as follows (subject to matrix broadcasting and vectorization):
\[ \begin{bmatrix} U \\ V \end{bmatrix} = \frac{1}{Z} \begin{bmatrix} f_x & 0 \\ 0 & f_y \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix} \]
Where $U$ is a stacked row vector of the u-coordinates of the clusters projected on the image plane, $V$ is a stacked row vector of the v-coordinates of the clusters projected on the image plane, $f_x$ is the x-axis focal length, $f_y$ is the y-axis focal length, and $c_x$ and $c_y$ are the u-coordinate and v-coordinate of the center pixel of the image plane.
Since we already have both the bounding box coordinates (from the object detector module) and the projected LiDAR clusters on the image plane, we use the bounding box coordinates to find which LiDAR cluster belongs to each bounding box. At this point, we have the 3D coordinates of the object inside the bounding box.

D. Stereo Cone Detection Pipeline
In order to localize cones in the world frame, this pipeline has two main components: the cones' bounding boxes and the disparity map. Since the bounding boxes, together with their colours, are already obtained from the object detection module, this section focuses on obtaining the 3D coordinates of the cones.
We start by rectifying the left and right images to remove any distortion. Disparity matching is then computed with reference to the left frame, with a confidence score for every disparity pixel. For every cone, we obtain a disparity region by indexing the disparity map using the cone's bounding box coordinates. The disparity region may contain different values for the cone, so we choose the pixel with the highest confidence score as the most reliable one. Using the disparity value, the bounding box center coordinates, and the intrinsic camera parameters, we calculate the cone's extrinsic parameters using the stereopsis equations.

IV. STATE ESTIMATION

A. Motion Estimation
A standard extended Kalman filter (EKF) is used to predict and correct the system states. The proposed state vector is $x = [P_x, P_y, \psi, v_x, v_y, \dot{\psi}]^T \in \mathbb{R}^6$, where $P_x$ and $P_y$ are the vehicle's position, $v_x$ and $v_y$ are the longitudinal and lateral velocities respectively, and $\psi$ is the vehicle's yaw angle.
In our proposed system, a constant acceleration process model is used since jerk is close to a zero-mean Gaussian distribution. Hence, the positions and velocities are propagated using the acceleration and yaw rate from the IMU, resulting in the following process model.
\[ \begin{aligned} [\dot{v}_x, \dot{v}_y]^T &= [a_x, a_y]^T + \dot{\psi}\,[v_y, -v_x]^T + n_v \\ \dot{\psi} &= n_{\dot{\psi}} \\ \dot{a} &= n_a \end{aligned} \tag{1} \]
For the correction step of the EKF, multiple sensors were used to correct the state estimates. GPS was used to correct the position as well as the local longitudinal and lateral velocities of the vehicle with the following measurement model.
\[ \begin{bmatrix} P_x^{gps} \\ P_y^{gps} \\ v_x^{gps} \\ v_y^{gps} \end{bmatrix} = \begin{bmatrix} P_x \\ P_y \\ v_x \cos\psi - v_y \sin\psi \\ v_x \sin\psi + v_y \cos\psi \end{bmatrix} + n_{gps} \tag{2} \]
A magnetometer was used to correct the vehicle yaw estimate. Before correction, a calibration is run for a variable duration to register the initial heading measurement by averaging all the readings within this duration; the result is used as the reference for later estimates. This ensures that the magnetometer heading is always zero at the starting point.
LiDAR scans were also used to obtain odometry through an Advanced LOAM implementation based on the paper [3] by J. Zhang and S. Singh. Additionally, visual odometry from the ZED stereo camera was used to correct the vehicle's position and yaw angle, providing an overall reliable and redundant system for estimating the vehicle states.
In order to achieve sensor redundancy and protect the autonomous system from sensor failure, wheel encoders were used to correct the local longitudinal and lateral vehicle velocities as well as the yaw rate through the implementation of the tire slip model [4], but with a constant slip assumption.

B. SLAM
As our system has to navigate autonomously in an unseen environment, we do not know the shape or length of the race track. This leads us to deploy SLAM algorithms in our system. Our system needs to map the cones of the track and estimate the vehicle position relative to these landmarks of interest (cones). Our algorithm is based on the FastSLAM 2.0 algorithm proposed in [5], which is a probabilistic approach to SLAM based on the concept of Rao-Blackwellization. Rao-Blackwellization is the splitting of the SLAM problem into a pose estimation problem and independent landmark estimation using $N$ Kalman filters, one for each landmark as its position state estimator, where $N$ is the number of mapped landmarks (cones).
Due to the challenges imposed by FS-AI, we modified the base FastSLAM in the literature to suit our needs and requirements.
First, we made our landmark estimator estimate not only the landmark's position but also its color (a yellow cone or a blue cone, for example) and account for perception uncertainties. The landmark's color matters in our system because
we then pass this map to the planning module that decides which path to follow.
Also, for SLAM to provide correct estimates of landmarks and poses, the vehicle should be able to decide whether its perception system is seeing a previously mapped landmark again. We used the Mahalanobis distance metric to decide whether an observation is a new landmark or a previously mapped landmark being seen again; we chose the Mahalanobis distance to account for measurement uncertainties.
For loop closure, we exploit the prior knowledge that the race track starts and ends with a special type of cone (mentioned in the FS-AI rule-book). Once we detect and localize these cones, we know with full certainty that this is where the zero position of the vehicle started. This triggers our loop closure algorithm to optimize and refine the map, which had been drifting incrementally before loop closure due to odometry and measurement inaccuracies.
As the vehicle moves along the track for the first time, we take the outputs from FastSLAM and construct a pose graph that gets optimized once the loop closure conditions are met. We used a pose graph optimization algorithm to optimize the map after loop closure. This optimized map is then used in subsequent race laps (in challenges that have two or more laps) by the vehicle to only localize within it, without performing map updates, thus shutting down the SLAM process and only performing localization.
The optimized map is then used by a Monte Carlo based localization algorithm to determine the vehicle's location within this map. This gives us a faster pose estimate update rate, thus allowing faster navigation.

triangulation is also used in constructing an edges matrix, which is a matrix consisting of all pairs of cones making the midpoints. The midpoints matrix and the edges matrix are synchronized together and used in both the cost function and the constraints.

C. Cost Parameters
Those midpoints are then given costs and passed through a cost function so that the midpoints with the lowest costs are chosen as goal waypoints. Those waypoints are then passed to the navigation module.
1) Distances: The first cost parameter is the distance between each midpoint and the vehicle pose at the current time step. This is calculated as a Euclidean distance. All the distances between the vehicle and the midpoints are then passed through a softmax function so that all costs lie between zero and one.
2) Heading Angles: Secondly, the angle between the vehicle heading vector and the vector from the vehicle to each midpoint is calculated. Thus, midpoints that require a small steering angle or no steering at all are given a low cost. All angle costs are passed through a softmax function as well.
3) Voronoi meshing: Voronoi meshing is a geometric meshing technique that constructs polygons around the cones given certain geometric constraints. Some of these polygons are unbounded. All midpoints that are positioned inside or on the edge of bounded polygons are given a zero cost, and all midpoints that are placed outside the bounded polygons are given a high cost (unity).
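The distance and heading-angle cost terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the pose layout `(x, y, yaw)`, and in particular the equal-weight sum used to combine the two terms (`w_dist`, `w_angle`) are hypothetical, since the report does not state how the individual costs are combined. The Voronoi term is omitted.

```python
import math

def softmax(values):
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def heading_error(pose, point):
    """Absolute angle between the vehicle heading and the vector
    from the vehicle to the point, wrapped to [0, pi]."""
    x, y, yaw = pose
    bearing = math.atan2(point[1] - y, point[0] - x)
    diff = bearing - yaw
    return abs(math.atan2(math.sin(diff), math.cos(diff)))

def midpoint_costs(pose, midpoints, w_dist=1.0, w_angle=1.0):
    """Softmax-normalised distance and heading costs for each midpoint.
    The weighted sum is an assumption made for this sketch."""
    x, y, _ = pose
    dist_cost = softmax([math.hypot(mx - x, my - y) for mx, my in midpoints])
    angle_cost = softmax([heading_error(pose, m) for m in midpoints])
    return [w_dist * d + w_angle * a for d, a in zip(dist_cost, angle_cost)]

# Vehicle at the origin heading along +x; three candidate midpoints.
pose = (0.0, 0.0, 0.0)
midpoints = [(2.0, 0.0), (2.0, 2.0), (-1.0, 3.0)]
costs = midpoint_costs(pose, midpoints)
best = min(range(len(midpoints)), key=costs.__getitem__)
```

Here the midpoint straight ahead of the vehicle is both the closest and the one needing no steering, so it receives the lowest combined cost and is selected as the goal waypoint.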
2) Prediction Model:
Three models were used: a kinematic model, a dynamic model,
and a fusion of both models according to the vehicle's