
FPGA SoC Fiducial System for Unmanned Aerial Vehicles

By

Raymond Zhang

A Research Paper Submitted

in

Partial Fulfillment

of the

Requirements for the Degree of

MASTER OF SCIENCE

in

Electrical Engineering

Approved by:

(Dr. Daniel S. Kaputa, Research Advisor)

(Dr. Sohail A. Dianat, Department Head)

DEPARTMENT OF ELECTRICAL AND MICROELECTRONIC ENGINEERING

KATE GLEASON COLLEGE OF ENGINEERING

ROCHESTER INSTITUTE OF TECHNOLOGY

ROCHESTER, NEW YORK

May 2018
Contents

1 Introduction 1
1.1 Unmanned Aerial Vehicle Applications . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Challenges of Autonomous Navigation . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Artifact Independent Localization . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Artifact Dependent Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Background 6
2.1 AprilTags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 AprilTag Detection Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Unity Environment Development for Drone Simulation . . . . . . . . . . . . . . . 11

3 Design Methodology 19
3.1 Basic Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Snickerdoodle SOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Xilinx Vivado . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 AprilTag HLS Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Results 36
4.1 HLS Block Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 IP core Speedup Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3 FPGA Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5 Conclusion 44
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6 Appendix 48
6.1 AprilTag Baseline Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.2 AprilTag Detection from 0.25 Meters . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 AprilTag Angled Detection from 0.342 Meters . . . . . . . . . . . . . . . . . . . 54
6.4 AprilTag Detection from 0.5 Meters . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.5 AprilTag Angled Detection from 0.57 Meters . . . . . . . . . . . . . . . . . . . . 63

List of Figures

1.1 Translational and Rotational Axis of a Quadcopter Drone in Space . . . . . . . . . 3

2.1 AprilTag 0 From tag36h11 Family . . . . . . . . . . . . . . . . . . . . . . . . . . 7


2.2 Kernel for Gaussian Blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Window to Calculate Magnitude and Phase . . . . . . . . . . . . . . . . . . . . . 9
2.4 High Level Implementation of AprilTag Localization Implementation . . . . . . . 11
2.5 Unity Developmental Environment for AprilTag Test Case . . . . . . . . . . . . . 12

3.1 Methodology of Speedup for AprilTag Localization Algorithm . . . . . . . . . . . 21


3.2 AprilTag Image for Image Processing . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 fim - AprilTag Image after Gaussian Blur . . . . . . . . . . . . . . . . . . . . . . 22
3.4 fimMag - AprilTag Image After Gradient Magnitude . . . . . . . . . . . . . . . . 23
3.5 fimTheta - AprilTag Image After Gradient Theta . . . . . . . . . . . . . . . . . . 23
3.6 Modified AprilTag Detection Algorithm . . . . . . . . . . . . . . . . . . . . . . . 25
3.7 Snickerdoodle Processing Chip Design . . . . . . . . . . . . . . . . . . . . . . . 26
3.8 Snickerdoodle Development Board . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.9 Example Vivado Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.10 Line Buffer Operation from Input Stream to Window . . . . . . . . . . . . . . . . 31
3.11 fim - AprilTag Image after Gaussian Blur from Vivado HLS . . . . . . . . . . . . 32
3.12 fimMag - AprilTag Image after Gradient Magnitude from Vivado HLS . . . . . . . 33
3.13 fimTheta - AprilTag Image after Gradient Theta from Vivado HLS . . . . . . . . . 33
3.14 fim - Pixel Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.15 fimMag - Pixel Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.16 fimTheta - Pixel Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1 Vivado Generated Gaussian Blur IP Block . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Vivado Generated Gradient Magnitude IP Block . . . . . . . . . . . . . . . . . . . 38
4.3 Vivado Generated Gradient Theta IP Block . . . . . . . . . . . . . . . . . . . . . 40

List of Tables

2.1 AprilTag Detection and Position Calculation Process . . . . . . . . . . . . . . . . 8


2.2 Baseline - Camera Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Relationship Between Unity and AprilTag Orientation . . . . . . . . . . . . . . . . 13
2.4 Camera View from 0.25 Meters - Camera Center . . . . . . . . . . . . . . . . . . 14
2.5 Camera View from 0.25 Meters - Camera Right . . . . . . . . . . . . . . . . . . . 15
2.6 Camera View from 0.25 Meters - Camera Top . . . . . . . . . . . . . . . . . . . . 15
2.7 Angled Camera View from 0.342 Meters - Pitch, Roll, and Yaw . . . . . . . . . . 16
2.8 Camera View from 0.5 Meters - Camera Center . . . . . . . . . . . . . . . . . . . 16
2.9 Camera View from 0.5 Meters - Camera Right . . . . . . . . . . . . . . . . . . . . 17
2.10 Camera View from 0.5 Meters - Camera Top . . . . . . . . . . . . . . . . . . . . . 17
2.11 Angled Camera View from 0.57 Meters - Pitch, Roll, and Yaw . . . . . . . . . . . 18
2.12 Translational And Rotational Error for AprilTag System . . . . . . . . . . . . . . . 18

3.1 Specification of the Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


3.2 AprilTag Detection Algorithm Benchmark from a Computer . . . . . . . . . . . . 20
3.3 Specification of the Snickerdoodle Development Board . . . . . . . . . . . . . . . 27
3.4 Specification of the Snickerdoodle Black Development Board . . . . . . . . . . . 27
3.5 AprilTag Detection Algorithm Benchmark from Snickerdoodle Processor . . . . . 28

4.1 Vivado Gaussian Blur IP Performance Metrics . . . . . . . . . . . . . . . . . . . 37


4.2 Vivado Gaussian Blur IP Utilization . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Vivado Gradient Magnitude IP Performance Metrics . . . . . . . . . . . . . . . . 38
4.4 Vivado Gradient Magnitude IP Utilization . . . . . . . . . . . . . . . . . . . . . . 39
4.5 Vivado Gradient Theta IP Performance Metrics . . . . . . . . . . . . . . . . . . . 40
4.6 Vivado Gradient Theta IP Utilization . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.7 Vivado Generated IP Block Timing Performance . . . . . . . . . . . . . . . . . . 41
4.8 Timing Benchmark between a Computer, Snickerdoodle, and HDL . . . . . . . . 42
4.9 AprilTag Detection Algorithm Speedup from Snickerdoodle and FPGA Hybrid . . 42
4.10 Projected FPGA Resource Utilization from Gaussian Blur and Gradient Magnitude 43

6.1 Baseline - Camera Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


6.2 Camera View from 0.25 Meters - Camera Center . . . . . . . . . . . . . . . . . . 49
6.3 Camera View from 0.25 Meters - Camera Right . . . . . . . . . . . . . . . . . . . 50
6.4 Camera View from 0.25 Meters - Camera Left . . . . . . . . . . . . . . . . . . . . 50
6.5 Camera View from 0.25 Meters - Camera Top . . . . . . . . . . . . . . . . . . . . 51
6.6 Camera View from 0.25 Meters - Camera Top Left . . . . . . . . . . . . . . . . . 51
6.7 Camera View from 0.25 Meters - Camera Top Right . . . . . . . . . . . . . . . . . 52
6.8 Camera View from 0.25 Meters Away - Camera Bottom . . . . . . . . . . . . . . . 52
6.9 Camera View from 0.25 Meters - Camera Bottom Left . . . . . . . . . . . . . . . 53
6.10 Camera View from 0.25 Meters - Camera Bottom Right . . . . . . . . . . . . . . . 53
6.11 Angled Camera View from 0.342 Meters - Roll . . . . . . . . . . . . . . . . . . . 54
6.12 Angled Camera View from 0.342 Meters - Pitch . . . . . . . . . . . . . . . . . . . 55
6.13 Angled Camera View from 0.342 Meters - Yaw . . . . . . . . . . . . . . . . . . . 55
6.14 Angled Camera View from 0.342 Meters - Pitch and Yaw . . . . . . . . . . . . . . 56
6.15 Angled Camera View from 0.342 Meters - Roll and Pitch . . . . . . . . . . . . . . 56
6.16 Angled Camera View from 0.342 Meters - Yaw and Roll . . . . . . . . . . . . . . 57
6.17 Angled Camera View from 0.342 Meters - Pitch, Roll, and Yaw . . . . . . . . . . 57
6.18 Camera View from 0.5 Meters - Camera Center . . . . . . . . . . . . . . . . . . . 58
6.19 Camera View from 0.5 Meters - Camera Right . . . . . . . . . . . . . . . . . . . . 59
6.20 Camera View from 0.5 Meters - Camera Left . . . . . . . . . . . . . . . . . . . . 59
6.21 Camera View from 0.5 Meters - Camera Top . . . . . . . . . . . . . . . . . . . . . 60
6.22 Camera View from 0.5 Meters - Camera Top Left . . . . . . . . . . . . . . . . . . 60
6.23 Camera View from 0.5 Meters - Camera Top Right . . . . . . . . . . . . . . . . . 61

6.24 Camera View from 0.5 Meters - Camera Bottom . . . . . . . . . . . . . . . . . . . 61
6.25 Camera View from 0.5 Meters - Camera Bottom Left . . . . . . . . . . . . . . . . 62
6.26 Camera View from 0.5 Meters - Camera Bottom Right . . . . . . . . . . . . . . . 62
6.27 Angled Camera View from 0.57 Meters - Roll . . . . . . . . . . . . . . . . . . . . 63
6.28 Angled Camera View from 0.57 Meters - Pitch . . . . . . . . . . . . . . . . . . . 64
6.29 Angled Camera View from 0.57 Meters - Yaw . . . . . . . . . . . . . . . . . . . . 64
6.30 Angled Camera View from 0.57 Meters - Pitch and Yaw . . . . . . . . . . . . . . 65
6.31 Angled Camera View from 0.57 Meters - Roll and Pitch . . . . . . . . . . . . . . . 65
6.32 Angled Camera View from 0.57 Meters - Yaw and Roll . . . . . . . . . . . . . . . 66
6.33 Angled Camera View from 0.57 Meters - Pitch, Roll, and Yaw . . . . . . . . . . . 66

Nomenclature

ARM Advanced RISC Machine

AXI Advanced Extensible Interface

BRAM Block RAM

CLB Configurable Logic Block

CORDIC Coordinate Rotation Digital Computer

CPU Central Processing Unit

DDR Double Data Rate

FF Flip Flop

FPGA Field Programmable Gate Array

FPS Frames Per Second

GPIO General Purpose Input/Output

GPS Global Positioning System

HDL Hardware Description Language

HLS High Level Synthesis

IMU Inertial Measurement Unit

IP Intellectual Property

LUT Look-up Table

RANSAC Random Sample Consensus

RISC Reduced Instruction Set Computer

RTL Register Transfer Level

SLAM Simultaneous Localization and Mapping

SoC System on a Chip

SOM System on a Module

UAV Unmanned Aerial Vehicle

Chapter 1
Introduction

1.1 Unmanned Aerial Vehicle Applications

Aircraft today are widely used for many purposes, spanning civilian, commercial, and military applications. Several types of aircraft designs exist to fill these applications, including fixed wing and rotating propeller designs. A fixed wing design relies on high forward speed for stability, while a propeller design relies on rotors to remain airborne. Advances in technology have allowed these aircraft to fly without a pilot in an unmanned configuration. These unmanned configurations are known as Unmanned Aerial Vehicles (UAVs) or drones [1]. Drones range in size from small toys with maneuverable quadcopter designs to long-range Predator drones with fixed wing configurations. UAVs are also used for recreational video recording, and are now being used as a platform for developers [2, 3].

1.2 Challenges of Autonomous Navigation

To maneuver from one location to another, drones depend on either remote user control or autonomous navigation algorithms that use the Global Positioning System (GPS) [4]. The GPS constellation consists of many satellites orbiting Earth that broadcast signals used by receivers to compute position. With remote user control, a human operator can navigate a UAV within an environment. This becomes a problem if the system flies out of range, causing the user to lose control over the unit. Drones that utilize GPS can autonomously and accurately travel within a few meters of their defined position if the system is in an outdoor environment with sufficient sky visibility. However, GPS can be compromised by obstructions such as forests, mountains, large building structures, or closed indoor environments [2, 5].
Inertial Measurement Units (IMUs) are electronic devices that measure a system's accelerations and rotational velocities. These essential components are used to keep the drone stable in the air, and they are widely used because of their accurate performance and inexpensive implementation. Without GPS, a drone is susceptible to drift, where the system slowly loses its position estimate over time from a buildup of noisy IMU measurements, resulting in imprecise localization. Drift can be difficult to detect since there is no absolute position signal. The proposed solution to the drift that occurs in GPS-denied navigation is to implement navigation based on machine vision. Through machine vision, cameras are used to process features within the environment, so a drone can remain local to a defined position by focusing on specific features. In such a system, localization is executed through image processing algorithms. A successful implementation requires the algorithm to process a video input at a minimum of 10 Frames Per Second (FPS). If the image processing is optimized to execute at around 100 FPS, the algorithm can be used for both localization and stabilization. Image processing algorithms can extract precise translational and rotational orientations of the drone relative to features in the environment, and this relative position can be used to enhance IMU-based stabilization and keep the drone stable. The translational and rotational orientation of a drone is shown in Figure 1.1 [6].

Figure 1.1: Translational and Rotational Axis of a Quadcopter Drone in Space

1.3 Artifact Independent Localization

Visual localization uses images captured from a camera to determine the relative location of a system. Within a captured image, there must be enough features in the environment for the system to process. To narrow down the number of features available, image processing can be used to focus on specific feature points produced by an algorithm. Random Sample Consensus (RANSAC) is an iterative method for detecting image features in the presence of outliers, data points that do not agree with the consensus of the set. The iteration executes multiple rounds to reach an accurate approximation, concluding when the number of inliers reaches a consensus threshold. This method is performed on two image sets captured from a stereo camera implementation. From the two images, triangulation is used to map the environment of the captured images, and from the resulting 3D environment, the distance and orientation of the system can be accurately determined [7]. Due to the iterative process, this algorithm produces very high precision results that model the surroundings of a drone. However, the algorithm can only be utilized with high-performance processors, since it requires long computing times due to extensive calculations that tax low-end processors [8, 9].
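
To make the iterative consensus idea concrete, the following is a minimal, self-contained sketch of RANSAC applied to fitting a 2D line to noisy points. It is an illustrative example only, not the stereo localization pipeline described above; the data structures, threshold, and iteration count are assumptions chosen for clarity.

    // Minimal RANSAC sketch: repeatedly sample a minimal set (two points),
    // form a candidate line, count inliers within a distance threshold, and
    // keep the candidate with the largest consensus.
    #include <cmath>
    #include <cstdlib>
    #include <vector>

    struct Point { double x, y; };
    struct Line  { double a, b, c; };   // ax + by + c = 0, with a^2 + b^2 = 1

    static Line lineFromPoints(const Point& p, const Point& q) {
        double a = q.y - p.y, b = p.x - q.x;
        double n = std::sqrt(a * a + b * b);
        a /= n; b /= n;                              // unit normal: distance = |ax + by + c|
        return { a, b, -(a * p.x + b * p.y) };
    }

    Line ransacLine(const std::vector<Point>& pts, int iterations, double threshold) {
        Line best = { 0.0, 0.0, 0.0 };
        int bestInliers = -1;
        for (int i = 0; i < iterations; ++i) {
            size_t i1 = std::rand() % pts.size();    // 1. random minimal sample
            size_t i2 = std::rand() % pts.size();
            if (i1 == i2 || (pts[i1].x == pts[i2].x && pts[i1].y == pts[i2].y))
                continue;                            // skip degenerate samples
            Line cand = lineFromPoints(pts[i1], pts[i2]);
            int inliers = 0;                         // 2. count inliers
            for (const Point& p : pts)
                if (std::fabs(cand.a * p.x + cand.b * p.y + cand.c) < threshold)
                    ++inliers;
            if (inliers > bestInliers) {             // 3. keep the best consensus
                bestInliers = inliers;
                best = cand;
            }
        }
        return best;
    }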

1.4 Artifact Dependent Localization

Visual localization can also be used to detect an artificial feature within an image. These
artificial features can be designed to be any size or shape. When a specific artifact is detected by the
camera, the localization algorithm can process and extract the orientation data. Fiducial markers
are a form of artificial features created to assist localization by representing the orientation data
with respect to the system. The markers are designed to be unique in order to minimize the possi-
bility of matching with a pre-existing marker within the environment. To be distinct from potential
environmental disruption, fiducial designs consist of simple geometric shape combinations with a
specified color set. Examples include bar codes, QR codes, and circles with binary encoded data.
Environmental disruptions are further mitigated by detecting specific design combinations. Valid
detection is determined if the design matches a library of specific combinations. Since the system
is detecting a unique artificial feature, orientation data with respect to the system can be precisely
extracted by measuring the known features of the artifact [10].

1.5 Literature Review

As localization becomes essential to UAVs, meeting the demand for a fast, robust, high performance system is a growing challenge for system developers. The purpose
of localization is to provide the ability for a robotic system to navigate in an unknown environment
by tracking the position and orientation of the system. Researchers have determined that visual
localization using cameras is a robust and accurate method of implementation [7]. One implementation of localization uses a stereo camera. Mah demonstrated a successful localization technique that determined the position of the system relative to a defined color object. The downside of this algorithm was its slow processing speed: on a Raspberry Pi, the processing achieved a maximum rate of 10 FPS, which is applicable for real time localization on a drone but not fast enough for stabilization [11]. The speed of localization is important for the precision and accuracy of the system, where such factors are proportional to the complexity of the algorithm. As the algorithm becomes
more sophisticated, power consumption and computation time also increases. To meet faster pro-
cessing and precision requirements, researchers are pushing towards hybrid applications between
Central Processing Units (CPUs) and Field Programmable Gate Arrays (FPGAs). Computational
tasks that require extensive CPU processing time can be optimized by allocating localization tasks
to FPGAs. Programs written in Hardware Description Languages (HDLs) can be optimized to
pipeline a process where localization data is computed and delivered to the CPU from an image
input [9]. Localization algorithms can then extract the position and orientation of an imaged feature with respect to the camera. Orientation data can then be broken down into its constituent components,
such as translational and rotational positions. Given the orientation of the system, the data can then
be used to track the movement of an aerial system in an open environment in conjunction with a
Simultaneous Localization and Mapping (SLAM) algorithm. With a SLAM implementation, a 3D
representation of the environment can be created for navigation [5, 7].
In this research, the performance of visual localization is analyzed with the implementation
of an AprilTag. An AprilTag is a form of an artifact dependent visual fiducial system, where posi-
tion and orientation data are determined from the defined features within a tag. The accuracy of the detection algorithm is evaluated to verify its performance. The speed of the detection process was benchmarked to determine the achievable frame rate for a real time implementation. Slow portions of the computational algorithm were extracted and converted to FPGA hardware in order to increase the speed of the detection algorithm.

Chapter 2
Background

2.1 AprilTags

An AprilTag is a form of artifact dependent visual fiducial system developed by Edwin


Olson, an Associate Professor of Computer Science and Engineering at University of Michigan.
The system can be used for a variety of applications ranging from the development of augmented
reality to tracking and navigation with autonomous aerial systems. In this application, the visual
localization focuses on detecting AprilTags, shown in Figure 2.1. AprilTags are designed to be
conceptually similar to QR codes, where data is stored in a 2D array [12]. Compared to QR codes, the bits encoded within an AprilTag represent smaller values using larger squares. Large block combinations make the system more robust and allow the tags to be detected from longer distances. The tag system uses a black and white color combination, which helps make the tags less sensitive to different lighting conditions. Tags can be encoded in many different combinations and are separated into different families. The three families are Tag36h11, Tag25h9, and Tag16h5. In Tag36h11, the data is encoded by a combination of a six-by-six array for a 36-bit tag. In Tag25h9, the data is encoded by a combination of a five-by-five array for a 25-bit tag. In Tag16h5, the data is encoded by a combination of a four-by-four array for a 16-bit tag. It is desirable to choose a family with a high Hamming distance, which represents the minimum number of positions at which two tag codes differ. A higher Hamming distance leads to a smaller chance of confusing one tag with another. In Tag36h11, the "h11" represents a Hamming distance of 11; Tag25h9 and Tag16h5 have Hamming distances of 9 and 5, respectively. Tag36h11 was chosen for this research because it has the smallest chance of recognizing one tag as a different tag. The specific tag shown in Figure 2.1 represents a value of 0 from the family Tag36h11 [10, 13].

Figure 2.1: AprilTag 0 From tag36h11 Family

2.2 AprilTag Detection Process

The fundamental approach to visual localization is to extract data sets from a captured image
through image processing. There are a few conditions that the image should meet to ensure that
detection will work. The captured image should contain the full tag on a flat surface with no
object obstruction. The background can contain other objects or additional AprilTags, which the algorithm can detect and choose to process or ignore. Table 2.1 shows the
detection process for an AprilTag.

Steps Process Operation
1 Gaussian Blur
2 Gradient Extraction
3 Edge Extraction
4 Cluster Formation
5 Cluster Segmentation
6 Segment Connection
7 Quad Formation
8 Quad Decode
9 Overlap Check
10 Position Calculation

Table 2.1: AprilTag Detection and Position Calculation Process

The detection process can be described in several distinct phases. In the first phase, the image is condensed by converting it from color to grayscale. This process reduces the data size for each pixel from a 16-bit to an 8-bit representation. For an RGB pixel, the 16-bit representation can be broken down as R5:G6:B5: the most significant 5 bits represent the red intensity, the middle 6 bits the green intensity, and the least significant 5 bits the blue intensity. The conversion from color to grayscale uses Equation 2.1.

Gray = 0.299 ∗ Red + 0.587 ∗ Green + 0.114 ∗ Blue (2.1)
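
A minimal sketch of this conversion is shown below. The expansion of the 5- and 6-bit channels to the 0-255 range before applying Equation 2.1 is an assumption for illustration; the exact channel scaling used by the detection code is not stated in the text.

    #include <cstdint>

    // Convert a 16-bit R5:G6:B5 color pixel to an 8-bit grayscale value.
    uint8_t rgb565ToGray(uint16_t pixel) {
        uint8_t r5 = (pixel >> 11) & 0x1F;        // most significant 5 bits: red
        uint8_t g6 = (pixel >> 5)  & 0x3F;        // middle 6 bits: green
        uint8_t b5 =  pixel        & 0x1F;        // least significant 5 bits: blue
        double red   = r5 * 255.0 / 31.0;         // expand 5-bit channel to 0-255
        double green = g6 * 255.0 / 63.0;         // expand 6-bit channel to 0-255
        double blue  = b5 * 255.0 / 31.0;
        double gray  = 0.299 * red + 0.587 * green + 0.114 * blue;   // Equation 2.1
        return (uint8_t)(gray + 0.5);             // round to nearest integer
    }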

To increase the accuracy, filtering is used to reduce noise within an image. In this algorithm,
a Gaussian blur is applied to the grayscale image to average out the noise within the pixels. Gaussian
blur can be represented as a convolution of the image with a matrix kernel. The design of the matrix
kernel is defined by equation 2.2.

G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (2.2)

The equation represents a normal distribution in two dimensions, where x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ is the standard deviation of the Gaussian distribution. An example of a simple Gaussian blur kernel is shown in Figure 2.2. A kernel is a 2D array of filter values used for pixel processing.

Figure 2.2: Kernel for Gaussian Blur
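
The kernel values follow directly from Equation 2.2. The sketch below generates a square kernel of radius k and normalizes the weights so they sum to one; the kernel radius and sigma are illustrative parameters rather than the values used in the AprilTag implementation.

    #include <cmath>
    #include <vector>

    // Build a (2k+1) x (2k+1) Gaussian kernel from Equation 2.2 and normalize it.
    std::vector<std::vector<double>> gaussianKernel(int k, double sigma) {
        const double PI = 3.14159265358979323846;
        int size = 2 * k + 1;
        std::vector<std::vector<double>> kernel(size, std::vector<double>(size));
        double sum = 0.0;
        for (int y = -k; y <= k; ++y) {
            for (int x = -k; x <= k; ++x) {
                // G(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2))
                double g = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                           / (2.0 * PI * sigma * sigma);
                kernel[y + k][x + k] = g;
                sum += g;
            }
        }
        for (auto& row : kernel)
            for (double& w : row)
                w /= sum;                  // normalization preserves image brightness
        return kernel;
    }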

In the second stage, the image is then further processed by computing the gradient of the
tag. This is achieved by initially computing two data sets, which are magnitude and phase. Figure
2.3 shows a window holding nine pixels. To determine the magnitude, the differences between adjacent pixels are calculated. Equations 2.3 and 2.4 give the difference along the horizontal and vertical axes, respectively, for pixel E. Using these differences, the magnitude and phase for the pixel are determined with Equations 2.5 and 2.6.

Figure 2.3: Window to Calculate Magnitude and Phase

I_x(E) = F - D \qquad (2.3)

I_y(E) = H - B \qquad (2.4)

\text{magnitude}(E) = I_x^2 + I_y^2 \qquad (2.5)

\text{phase}(E) = \tan^{-1}\!\left(\frac{I_y}{I_x}\right) \qquad (2.6)
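
A short sketch of the per-pixel gradient computation is given below, assuming the nine window pixels are labeled A through I row by row, consistent with the labels used in Equations 2.3 and 2.4. The two-argument arctangent is used here to avoid division by zero when I_x is zero; Equation 2.6 writes the same quantity as a plain inverse tangent of the ratio.

    #include <cmath>

    struct Gradient { double magnitude, phase; };

    // Gradient at the center pixel E of a 3x3 window:
    //   A B C
    //   D E F
    //   G H I
    Gradient pixelGradient(const double window[3][3]) {
        double Ix = window[1][2] - window[1][0];   // F - D   (Equation 2.3)
        double Iy = window[2][1] - window[0][1];   // H - B   (Equation 2.4)
        Gradient g;
        g.magnitude = Ix * Ix + Iy * Iy;           // Equation 2.5
        g.phase     = std::atan2(Iy, Ix);          // Equation 2.6, result in (-pi, pi]
        return g;
    }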

In the third stage, an edge is determined when there is a grouping of pixels with the same
angle from the phase data set. After a list of potential edges is generated, the list is compared with
the magnitude value at the pixel location. An edge is then determined to exist if the magnitude
of the group of pixels is sufficiently large. This method creates an array of edge data storing the
weight of potential edges for an AprilTag. From the edge data, clusters can be created by comparing
the phase of surrounding pixels. By checking over the clusters, segment lines are defined over the
cluster edges by checking for pixels with similar weight. From the defined segments, the beginning
and end of each segment are stored to define potential connection points. To determine connection points, a loop iterates to check whether a sequence of four connected segments can be made. The process is repeated until a quad is defined. The quad is then verified against a defined library set to determine whether the detected quad is valid. The final step is to check whether the same tag has been detected more than once as an overlapping tag; using the Hamming value from both detections, the tag with the lower Hamming value is concluded to be the main tag [10].
The AprilTag detection algorithm was further improved by Michael Kaess, an assistant research professor at Carnegie Mellon University. He extended the detection algorithm with a localization step that determines the location of a system relative to a detected tag and returns the translational and rotational data of the AprilTag. In a GPS-denied environment, the algorithm can be used to find position data for a drone, so the system can compute and perform the necessary adjustments to remain stable. The design approach of the AprilTag localization algorithm is represented in Figure 2.4. The goal of this research was to use this algorithm to achieve real-time drone localization.

Figure 2.4: High Level Implementation of AprilTag Localization Implementation

2.3 Unity Environment Development for Drone Simulation

To create an ideal test case, Unity was used to develop an environment in which a drone
would view an AprilTag from different positions. Unity is a cross-platform game engine, widely
used to develop simulated environments. The software is capable of supporting both 2D and 3D
simulations with realistic high performance physics applications. In Unity, a test setup was created
using a plane and a camera. The plane was made to represent the AprilTag and the camera was
placed in different positions to simulate a drone looking at the tag. The objects within Unity can
be placed in different directions and orientations using translational and rotational settings. An
example of the test environment is captured in Figure 2.5.

Figure 2.5: Unity Developmental Environment for AprilTag Test Case

To support a wide range of compatibility, Unity does not define a unit for certain user con-
trols. A rotation for an object is defined in degrees, but the translation for an object is unitless. To
define a baseline scale for translation, AprilTag localization software was executed with a defined
tag size of 0.1 meters by 0.1 meters. From Table 2.2, the distance between the tag and the camera
was calculated to be 0.1254 meters for a Unity translation of 11 units.

\frac{0.1254\ \text{meters}}{11\ \text{units}} = \frac{x\ \text{meters}}{1\ \text{unit}} \qquad (2.7)

From the proportional scaling defined in Equation 2.7, one unit of translation in Unity is therefore approximately 0.0114 meters. The axes of the AprilTag system are oriented differently from the axes defined in Unity. Table 2.3 shows the relationship in axis orientation between the two systems. Observing Table 2.2 and Table 2.3, a translation of -11 units on the z axis in Unity corresponds to a translation of 0.1254 meters on the x axis of the AprilTag system, and translations on the x and y axes in Unity represent translations on the y and z axes of the AprilTag system, respectively.
After defining the baseline, a series of test measurements was developed to check the detection algorithm's accuracy.

Unity AprilTag Detection


x 0 0.1254 m
y 0 0.0001 m
z -11 0.0001 m
pitch 0º -0.0006º
roll 0º 0.000º
yaw 0º -0.000º
Comment: 0.1254 meters / 11 units = x meters / 1 unit; 1 unit ≈ 0.0114 meters

Table 2.2: Baseline - Camera Center

Unity Orientation AprilTag Orientation

x y
y z
z x
pitch rotation in x pitch rotation in y
roll rotation in z roll rotation in x
yaw rotation in y yaw rotation in z

Table 2.3: Relationship Between Unity and AprilTag Orientation

Using the determined unit value as a baseline for Unity, an analysis can be performed to measure the accuracy of the detection system. Table 2.4, Table 2.5, and Table 2.6 capture the translational position of the camera with respect to the center of the AprilTag from 0.25 meters (Unity distance of 22). For vertical translation, the measured position is off by 0.0032 meters; for horizontal translation, it is off by 0.0058 meters. Table 2.7 captures a series of camera rotations simulating a slightly offset view of the AprilTag from 0.342 meters (Unity distance of 30). Referencing the axes of the AprilTag system, pitch represents a rotation about the y axis, roll a rotation about the x axis, and yaw a rotation about the z axis. The tests used small rotation angles to simulate realistic drone attitudes when facing an AprilTag. If only one axis of rotation is changed, the detected angle is off by at most 1°. With two axes of rotation, the detected angle is off by at most 2.3°. With three axes of rotation, the maximum error is 1.25°. Table 2.8, Table 2.9, and Table 2.10 capture the translational position of the camera with respect to the center of the AprilTag from 0.5 meters (Unity distance of 44). For vertical translation, the measured position is off by 0.0064 meters; for horizontal translation, it is off by 0.0118 meters. Table 2.11 captures a series of camera rotations simulating a slightly rotated view of the AprilTag from 0.57 meters (Unity distance of 50). Similar to the test cases captured in Table 2.7, small rotation angles were used. When only a single axis was rotated, the detected angle was off by at most 3°. With two axes of rotation, the detected angle was off by at most 2.9°. With three axes of rotation, the maximum error was 1.1°. Table 2.12 summarizes the errors for the AprilTag detection system.

Unity AprilTag Detection

x 0 0.2507 m
y 0 0.0002 m
z -22 0.0002 m
pitch 0º -0.029º
roll 0º -0.011º
yaw 0º 0.023º
Comment: 22 ∗ 0.0114 meters = 0.2508 meters

Table 2.4: Camera View from 0.25 Meters - Camera Center

Unity AprilTag Detection

x 5 0.2507 m
y 0 0.0628 m
z -22 0.0002 m
pitch 0º 0.006º
roll 0º -0.000º
yaw 0º 0.011º
Comment: 5 ∗ 0.0114 meters = 0.057 meters

Table 2.5: Camera View from 0.25 Meters - Camera Right

Unity AprilTag Detection

x 0 0.2507 m
y 3 0.0002 m
z -22 0.0374 m
pitch 0º -0.040º
roll 0º 0.006º
yaw 0º -0.029º
Comment: 3 ∗ 0.0114 meters = 0.0342 meters

Table 2.6: Camera View from 0.25 Meters - Camera Top

Unity AprilTag Detection

x 0 0.3337 m
y 0 0.0779 m
z -30 0.0301 m
pitch 10º 8.924º
roll -30º 31.25º
yaw 8º -6.966º
Comment:

Table 2.7: Angled Camera View from 0.342 Meters - Pitch, Roll, and Yaw

Unity AprilTag Detection

x 0 0.5019 m
y 0 0.0004 m
z -44 0.0004 m
pitch 0º 0.132º
roll 0º 0.006º
yaw 0º 0.040º
Comment:44 ∗ 0.0114 meters = 0.5016 meters

Table 2.8: Camera View from 0.5 Meters - Camera Center

Unity AprilTag Detection

x 10 0.5020 m
y 0 0.1258 m
z -44 0.0004 m
pitch 0º 0.029º
roll 0º -0.006º
yaw 0º 0.034º
Comment:10 ∗ 0.0114 meters = 0.114 meters

Table 2.9: Camera View from 0.5 Meters - Camera Right

Unity AprilTag Detection

x 0 0.5019 m
y 6 0.0004 m
z -44 -0.0748 m
pitch 0º 0.074º
roll 0º -0.017º
yaw 0º -0.183º
Comment:6 ∗ 0.0114 meters = 0.0684 meters

Table 2.10: Camera View from 0.5 Meters - Camera Top

Unity AprilTag Detection

x 0 0.5514 m
y 0 0.1499 m
z -50 0.0629 m
pitch 12º 10.08º
roll -30º 31.59º
yaw 9º -7.312º
Comment:

Table 2.11: Angled Camera View from 0.57 Meters - Pitch, Roll, and Yaw

Distance Translational Error Rotational Error


Unity Distance    Camera Distance (m)    Vertical (m)    Horizontal (m)    Single Axis of Rotation
30 0.342 0.0032 0.0058 0.928º
50 0.570 0.0064 0.0118 2.25º

Table 2.12: Translational And Rotational Error for AprilTag System

The position detection algorithm was much more accurate when the camera was closer to the tag, for both translational and rotational orientation. Due to this higher detection accuracy, the algorithm should be used when the drone is within about 0.25 meters of the AprilTag. From Table 2.12, the error in the angle of rotation becomes larger as the camera moves farther from the tag. The accuracy of the orientation results was also affected by the resolution of the image, where lower resolutions provided less accurate pixel representations of the environment. More test combinations are shown in the Appendix. The conditions defined in this analysis are not exhaustive, since there are many more possible combinations of AprilTag orientations. Output errors from the measurements captured in the Appendix may be larger than those in the baseline analysis presented here.

Chapter 3
Design Methodology

3.1 Basic Design Flow

In previous research, Mah implemented a successful algorithm that detected a red circle and then determined the location of the system relative to the object. However, the implementation requires a stereo camera system running on a CPU to perform the necessary position calculations, and benchmark results showed that the algorithm required a lengthy amount of time to process an image [11]. Santos-Castro improved the concept of localization by increasing the image processing speed. His speedup implementation involved image processing algorithms written in HDL and executed on an FPGA [14]. His processing algorithms utilized simple implementations of grayscale, inverting, and thresholding operations.
Modeling his design approach, the same speedup strategy can also be applied to the AprilTag localization algorithm. By analyzing the steps within the detection algorithm, a performance benchmark was created and analyzed for potential speedup. The localization algorithm was executed on a computer (specified in Table 3.1), and the timing results were recorded in Table 3.2.

Sony Vaio S Series

Processing System
Processor Intel(R) Quadcore(TM) i7-3632QM
Speed 2.2 GHz
RAM 8GB @ 20 GHz
Wireless 802.11n

Hardware
Weight 3.8 pounds
Dimension 8.0” (width) x 14.2” (height) x 14.6” (depth)

Table 3.1: Specification of the Computer

Steps Process Operation Execution Time
1 Gaussian Blur 38.413 ms
2a Gradient Extraction Magnitude 17.952 ms
2b Gradient Extraction Theta 26.263 ms
3 Edge Extraction 31.983 ms
4 Cluster Formation 3.7579 ms
5 Cluster Segmentation 0.3139 ms
6 Segment Connection 0.0970 ms
7 Quad Formation 0.0432 ms
8 Quad Decode 0.2288 ms
9 Overlap Check 0.0009 ms
10 Position Calculation 5.8190 ms
Total Execution time 124.723 ms
Processing Speed 8.02 FPS

Table 3.2: AprilTag Detection Algorithm Benchmark from a Computer

With an overall execution time of 124.723 milliseconds, the maximum frequency the system can achieve is 8.02 Hertz, or about 8 FPS. When the time to compute each section was observed, Gaussian blur, gradient extraction, and edge extraction required the longest periods of time. A Gaussian blur filters an image to reduce image noise. Gradient extraction uses a mathematical sequence that calculates the magnitude and the phase of an image pixel based on the adjacent pixels. The edge extraction process determines the potential edges within an image and stores the data into an array. Since the Gaussian blur and gradient extraction take a long time to process, those algorithms were chosen to be implemented on an FPGA. Figure 3.1 portrays a breakdown of the analysis process used to optimize the execution of the localization algorithm. From a grayscale image input, the execution time to perform steps 1 and 2 was measured on both the CPU and the FPGA. The output image results were also compared to determine how accurate the FPGA results were. The sample image used for AprilTag detection, shown in Figure 3.2, is the view of the camera looking at a tag rotated 45°.

Figure 3.1: Methodology of Speedup for AprilTag Localization Algorithm

Figure 3.2: AprilTag Image for Image Processing

Throughout the processing algorithm, each intermediate image has a specific name. The first step processed the image with a Gaussian filter, resulting in an output image called the filtered image, or fim. The second step computed the magnitude of the gradient, producing an output called the filtered image magnitude, or fimMag. Determining the phase of the gradient was a similar process, where the output was called the filtered image theta, or fimTheta. Figure 3.3 shows the output result fim. It can be seen that the edges in and around the AprilTag are slightly less sharp than in the original image.

Figure 3.3: fim - AprilTag Image after Gaussian Blur

After the image was filtered, the output was processed to determine the gradient. In this section, fim was processed to generate two image results. Image fimMag stores the gradient magnitude of each pixel, as shown in Figure 3.4; the stored magnitudes show the outline of the AprilTag along with its interior features. The fim output was also used to calculate the phase of each pixel with respect to the surrounding pixels. Observing fimTheta in Figure 3.5, the output for each pixel shows the calculated phase. It can be seen that the edges of the AprilTag are outlined with uniform theta values. Compared to Figure 3.2, both the dark and light surface regions have defined phase values.

Figure 3.4: fimMag - AprilTag Image After Gradient Magnitude

Figure 3.5: fimTheta - AprilTag Image After Gradient Theta

The processed images were standardized to utilize the full grayscale spectrum (8 bits, or 1 byte). Image fim was scaled up because each pixel was originally stored as a floating-point value between 0 and 1; since the pixels are limited to a maximum value of 1, the result was multiplied by 255. The same scaling was used for the fimMag pixels. The calculation for fimTheta uses the inverse tangent, whose output range is between negative and positive pi. To account for the negative values, the range was shifted to lie between zero and two pi by adding pi to every pixel. The pixels were then normalized by dividing by two pi and multiplied by 255 to fill the 8-bit range. The scale conditioning for each pixel is shown in Equation 3.1.

\text{Scaled Output} = \frac{\text{pixel} + \pi}{2\pi} \times 255 \qquad (3.1)
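
A brief sketch of this scale conditioning is shown below. It assumes, as stated above, that fim and fimMag pixels lie in the 0 to 1 range and that fimTheta pixels lie between negative and positive pi.

    #include <cmath>
    #include <cstdint>

    const double PI = 3.14159265358979323846;

    // Scale a 0-1 floating-point pixel (fim, fimMag) to the 8-bit range.
    uint8_t scaleUnitRange(double pixel) {
        return (uint8_t)std::round(pixel * 255.0);
    }

    // Scale a phase pixel in (-pi, pi] to the 8-bit range (Equation 3.1).
    uint8_t scaleTheta(double pixel) {
        return (uint8_t)std::round((pixel + PI) / (2.0 * PI) * 255.0);
    }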

In the original AprilTag program flow, an image is read in once and then processed through the ten steps. In the modified program flow, additional steps were added between steps 2 and 3. Instead of passing the data directly between the two steps, the data was extracted and saved as an image file. The output images were then read back into step 3, where the process continued to calculate position. These modifications were implemented to extract the expected outputs from steps 1 and 2 as a baseline to compare with the HDL outputs. The modified program flow was also used to prove that the data stored within each pixel remained valid when the image was read back into the algorithm. With this modification, the algorithm successfully calculated the position of the camera with respect to the tag. The test implementation is shown in Figure 3.6.
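
The sketch below illustrates the save-and-read-back idea using a simple binary PGM (P5) image file. The file format and function names are assumptions made for illustration; the thesis does not specify the exact image format or I/O routines used.

    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    // Write an 8-bit grayscale image (e.g., scaled fim or fimMag) to disk.
    void writePgm(const std::string& path, const std::vector<uint8_t>& pixels,
                  int width, int height) {
        std::ofstream out(path, std::ios::binary);
        out << "P5\n" << width << " " << height << "\n255\n";
        out.write(reinterpret_cast<const char*>(pixels.data()), pixels.size());
    }

    // Read the image back before handing it to step 3 of the algorithm.
    std::vector<uint8_t> readPgm(const std::string& path, int& width, int& height) {
        std::ifstream in(path, std::ios::binary);
        std::string magic;
        int maxval = 0;
        in >> magic >> width >> height >> maxval;   // header written above: "P5 w h 255"
        in.get();                                   // consume the single whitespace byte
        std::vector<uint8_t> pixels((size_t)width * height);
        in.read(reinterpret_cast<char*>(pixels.data()), pixels.size());
        return pixels;
    }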

Figure 3.6: Modified AprilTag Detection Algorithm

3.2 Snickerdoodle SOM

The target platform for the AprilTag detection algorithm was the Snickerdoodle development board. The Snickerdoodle is produced by krtkl inc. and is based on a Xilinx ZYNQ System on a Chip containing an Advanced RISC Machine (ARM) processor. An advantage of the Snickerdoodle compared to other development boards is that its processor works with an FPGA integrated on the same chip, as shown in Figure 3.7. Figure 3.8 shows the Snickerdoodle development board, and Table 3.3 shows its specifications. The ZYNQ 7000 family chips are designed around a dual-core ARM Cortex-A9 processor integrated with a programmable FPGA. With Wi-Fi capability, the Snickerdoodle can operate as an independent system running its own operating system, with support for file sharing and remote desktop access. Through the remote desktop, the AprilTag detection algorithm was executed and a performance benchmark was recorded, as shown in Table 3.5. It is important to note that the measured operation times are larger because the board uses slower hardware components than the computer. For this research, a standard Snickerdoodle was used to develop and test the design. Future research designs will be applied to the Snickerdoodle Black and potentially the UltraZed-EV.

Figure 3.7: Snickerdoodle Processing Chip Design

Figure 3.8: Snickerdoodle Development Board

Snickerdoodle
SoC Components Z-7010-1C

Processing System
Processor Dual-Core ARM Cortex-A9 MPCore
Speed 667 MHz
DRAM 1GB @ 400MHz LPDDR2
Wireless 2.4 GHz SISO 802.11b/g/n

Programmable Logic
Logic Cells 28k
Look Up Tables 17600
Block RAM 2.1 Mb
DSP Slices 80
I/O Pins 100

Hardware
Weight 0.0683 Pounds
Dimension 2 in x 3.5 in

Table 3.3: Specification of the Snickerdoodle Development Board

Snickerdoodle Black
SoC Components Z-7020-3E

Processing System
Processor Dual-Core ARM Cortex-A9 MPCore
Speed 867 MHz
DRAM 1GB @ 400MHz LPDDR2
Wireless 2.4 GHz / 5 GHz MIMO 802.11b/g/n

Programmable Logic
Logic Cells 85k
Look Up Tables 53200
Block RAM 4.9 Mb
DSP Slices 220
I/O Pins 125

Hardware
Weight 0.0701 Pounds
Dimension 2 in x 3.5 in

Table 3.4: Specification of the Snickerdoodle Black Development Board

Steps Process Operation Computer Execution Time Snickerdoodle Execution Time
1 Gaussian Blur 38.413 ms 102.382 ms
2a Gradient Extraction Magnitude 17.952 ms 21.137 ms
2b Gradient Extraction Theta 26.263 ms 78.964 ms
3 Edge Extraction 31.983 ms 83.064 ms
4 Cluster Formation 3.7579 ms 19.183 ms
5 Cluster Segmentation 0.3139 ms 3.263 ms
6 Segment Connection 0.0970 ms 0.376 ms
7 Quad Formation 0.0432 ms 0.078 ms
8 Quad Decode 0.2288 ms 7.438 ms
9 Overlap Check 0.0009 ms 0.011 ms
10 Position Calculation 5.819 ms 7.297 ms
Total Execution time 124.723 ms 323.193 ms
Processing Speed 8.02 FPS 3.09 FPS

Table 3.5: AprilTag Detection Algorithm Benchmark from Snickerdoodle Processor

Observing the detection algorithm performance in Table 3.5, the overall execution time was slower on the Snickerdoodle processor. With a total execution time of 323.193 milliseconds, the maximum frequency the system can achieve is 3.09 Hertz, or about 3 FPS. For drone applications, 3 FPS is not fast enough to perform real time tracking. The Gaussian blur alone requires 102.382 milliseconds to filter an image, which is much slower than on the computer. Even though the performance specifications of a PC are much better than those of the Snickerdoodle processor, the size and weight of a computer are not feasible for a system that flies on a limited source of power. Therefore, the Snickerdoodle is better suited for a drone due to its much lighter design and smaller dimensions.

3.3 Xilinx Vivado

To discuss the implementation of steps 1 and 2 on an FPGA, it is important to introduce the design tools developed by Xilinx. Vivado is a software platform that allows designers to create, analyze, and synthesize HDL designs. An HDL is a specialized computer language used to describe the structure of digital logic circuits. Using Vivado, the FPGA on the Snickerdoodle can be programmed with a synthesized design. Vivado also offers a wide variety of analysis tools and can be used to perform timing analysis to optimize a design for fast performance. Designers can also examine the model at the Register Transfer Level (RTL), where signals can be traced throughout the program execution to debug any potential errors.
In a Vivado project, the System on a Chip (SoC) design revolves around a ZYNQ processor. The block design in Figure 3.9 shows an example where the board blinks an LED at a constant rate. In Figure 3.9, the main processor is shown with a set of input and output ports. The GPIO_0 port interacts with the General Purpose Input/Output (GPIO) pins on the physical development board, allowing the design to control an assigned pin at runtime. Double Data Rate (DDR) is a port that allows the system to interface with the memory on the board. M_AXI_GP0 works with the Advanced Extensible Interface (AXI) to facilitate the connections between the functional blocks. The output FCLK_CLK0 feeds a clock to the blink_0 and system_ila0 blocks and to the processor to drive the system. The design generates a blink IP block that drives a signal to an LED pin on the board.

Figure 3.9: Example Vivado Block Design

A specific Xilinx tool used in this research was Vivado High Level Synthesis (HLS). This tool compiles high-level language code into RTL for Xilinx devices without the need to manually write RTL code. The high-level languages that Vivado HLS supports are C, C++, and SystemC. Models designed in these languages can also be packaged as Intellectual Property (IP) blocks, which allows the designer to create unique functions. In Figure 3.9, blink_v1_0 is a specialized IP block that generates a blink pulse.
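
The sketch below shows the general shape of an HLS C++ function that can be exported as such an IP block, with AXI-Stream data ports and an AXI-Lite control interface. The function body, port names, and bundle name are illustrative assumptions and do not reproduce the blocks generated in this work.

    #include <ap_int.h>
    #include <hls_stream.h>

    // Pass-through streaming kernel: reads npixels bytes from the input AXI
    // stream and writes them unchanged to the output AXI stream.
    void passthrough(hls::stream<ap_uint<8> >& instream,
                     hls::stream<ap_uint<8> >& outstream,
                     int npixels) {
    #pragma HLS INTERFACE axis port=instream
    #pragma HLS INTERFACE axis port=outstream
    #pragma HLS INTERFACE s_axilite port=npixels bundle=CTRL_BUS
    #pragma HLS INTERFACE s_axilite port=return bundle=CTRL_BUS
        for (int i = 0; i < npixels; ++i) {
    #pragma HLS PIPELINE II=1
            ap_uint<8> pixel = instream.read();   // one pixel per clock when pipelined
            outstream.write(pixel);
        }
    }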

3.4 AprilTag HLS Design

Using Vivado HLS, steps 1 and 2 of the AprilTag localization algorithm were designed in C. Due to the limited Block RAM (BRAM) size, a 640-by-480 resolution was used. The image is not stored in its entirety; instead, the pixels are streamed individually into the IP block. To create the pixel stream, the header file hls_video.h must be included. The functions within this library are developed to accelerate image processing on an FPGA, and the library supports common data types such as integer, character, short, and float. Using the library, objects specific to image and video processing can be instantiated, including streams, line buffers, and windows. A stream object defines the ports through which pixels are read into and sent out of an IP block. A line buffer object pipelines the design by storing recent rows of the stream and passing a specific pixel value to the window when its address is called; this is needed because the input arrives as a linear stream. A window object stores the pixels surrounding the central pixel to perform the necessary computation. This object is a 2D array that stores pixels of a defined data type; the number of rows and columns can be any size, but the chosen implementation requires a three-by-three matrix. These objects are used in all the IP cores developed to perform the Gaussian blur and to determine the gradient for the localization algorithm. The design of the buffer and window is shown in Figure 3.10.

Figure 3.10: Line Buffer Operation from Input Stream to Window
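
The following is a simplified model of this line buffer and window operation, written with plain arrays rather than the hls::LineBuffer and hls::Window classes from hls_video.h (whose exact method names vary between library versions). Each incoming pixel is pushed into the bottom row of a three-row line buffer, and a three-by-three window slides across the buffered rows.

    #include <cstdint>

    const int WIDTH = 640;   // image width used in this design

    struct LineBuffer3 {
        uint8_t rows[3][WIDTH];
        // Shift the column up by one row and insert the new pixel at the bottom.
        void push(uint8_t pixel, int col) {
            rows[0][col] = rows[1][col];
            rows[1][col] = rows[2][col];
            rows[2][col] = pixel;
        }
    };

    struct Window3x3 {
        uint8_t w[3][3];
        // Shift the window left and load the newest column from the line buffer.
        void push(const LineBuffer3& lb, int col) {
            for (int r = 0; r < 3; ++r) {
                w[r][0] = w[r][1];
                w[r][1] = w[r][2];
                w[r][2] = lb.rows[r][col];
            }
        }
    };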

In the Gaussian blur design, two windows were instantiated: one to hold the filter kernel and one to store pixels from the stream. Each element of the filter kernel was multiplied with the respective element of the pixel window, the products were accumulated, and the accumulated result was then normalized by the sum of the kernel weights. Using Figure 3.2 as the image input, the output of the Gaussian blur operation in HLS is shown in Figure 3.11. Comparing the output to Figure 3.3, the filtered result produced a minor line disruption in the background. Similar to the Gaussian blur from the original system, the edges are also slightly less sharp than in the original AprilTag image.
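
The per-pixel operation can be sketched as the normalized multiply-accumulate below. The integer kernel weights shown are an assumed example of a simple Gaussian kernel (in the spirit of Figure 2.2), not necessarily the exact coefficients used in the implemented block.

    #include <cstdint>

    // Multiply every window element by the corresponding kernel weight,
    // accumulate the products, and normalize by the sum of the weights.
    uint8_t gaussianBlur3x3(const uint8_t window[3][3]) {
        static const int kernel[3][3] = { { 1, 2, 1 },
                                          { 2, 4, 2 },
                                          { 1, 2, 1 } };   // weights sum to 16
        int acc = 0, weightSum = 0;
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) {
                acc += window[r][c] * kernel[r][c];
                weightSum += kernel[r][c];
            }
        }
        return (uint8_t)(acc / weightSum);
    }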

Figure 3.11: fim - AprilTag Image after Gaussian Blur from Vivado HLS

The next design determines the gradient of each pixel. In this design, one window was instantiated to store the pixel stream. Only a single window was needed to determine the magnitude or phase, since no convolution operation is necessary for this image process; only the adjacent pixel values in the window are needed. As discussed in the background, the magnitude is determined using Equation 2.5 and the phase is determined using Equation 2.6. The output of the gradient operation in HLS is shown in Figure 3.12 for fimMag and Figure 3.13 for fimTheta. Observing the results, fimMag produced an accurate result when compared to the expected output produced by the original AprilTag algorithm. The outlined edges along the inner tag highlight the features. The only minor difference is in portions of the interior features, where the magnitude is slightly smaller (darker). Comparing the fimTheta generated from Vivado HLS to the fimTheta from the original program, the output is darker. This difference is caused by using a different inverse tangent algorithm: a faster approximation was used because its resource utilization on the board is significantly less than that of the original C function [15]. The approximation of the inverse tangent used for HLS is shown in Equation 3.2.

\tan^{-1}(x) = \frac{\pi}{4}x - x\left(|x| - 1\right)\left(0.2447 + 0.0663\,|x|\right) \qquad (3.2)
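
In code, the approximation is a handful of multiplications and an absolute value, which is why it maps to far fewer FPGA resources than a full-precision inverse tangent. The sketch below implements Equation 3.2 directly; it is accurate only for |x| ≤ 1, and a complete phase computation would additionally fold the ratio Iy/Ix into the correct octant (not shown).

    #include <cmath>

    // Fast arctangent approximation of Equation 3.2, valid for |x| <= 1.
    double fastAtan(double x) {
        const double PI = 3.14159265358979323846;
        return PI / 4.0 * x
               - x * (std::fabs(x) - 1.0) * (0.2447 + 0.0663 * std::fabs(x));
    }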

Figure 3.12: fimMag - AprilTag Image after Gradient Magnitude from Vivado HLS

Figure 3.13: fimTheta - AprilTag Image after Gradient Theta from Vivado HLS

From the image results, a comparison between the program output and the Vivado HLS output can be made. Since the pixel values are grayscale, a binary comparison would not be accurate. Instead, a pixel difference is used, where the difference in magnitude is visualized by how light or dark the output pixels are: lighter pixels represent a larger difference and darker pixels represent a smaller difference. Observing Figure 3.14, the pixel differences between the two fim images are very small. Figure 3.15 shows the difference between the fimMag images; some sections of the AprilTag outline appear as gray and light pixels. Investigating the lighter outlines, the pixel values are close to the correct values, but their stored positions were offset by one pixel. Because of this offset, the difference in the magnitude result appears significantly large. Figure 3.16 shows a significant pixel difference between the fimTheta produced by the program and the fimTheta produced from Vivado HLS. Due to this larger difference in pixel values, implementing fimTheta on the FPGA is not suitable if the pixel values are inaccurate. As previously explained, this difference comes from the faster inverse tangent implementation, where accuracy was traded for a faster calculation that utilized fewer resources.
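
The comparison images themselves are straightforward to produce: each output pixel is the absolute difference between the corresponding program and Vivado HLS pixels, so lighter pixels mark larger disagreement. A minimal sketch, assuming both images are stored as flat 8-bit arrays of equal size, is shown below.

    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Per-pixel absolute difference between the reference output and the HLS output.
    std::vector<uint8_t> pixelDifference(const std::vector<uint8_t>& reference,
                                         const std::vector<uint8_t>& hlsOutput) {
        std::vector<uint8_t> diff(reference.size());
        for (size_t i = 0; i < reference.size(); ++i)
            diff[i] = (uint8_t)std::abs((int)reference[i] - (int)hlsOutput[i]);
        return diff;
    }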

Figure 3.14: fim - Pixel Comparison

Figure 3.15: fimMag - Pixel Comparison

Figure 3.16: fimTheta - Pixel Comparison

Chapter 4
Results

4.1 HLS Block Generation

Using Vivado HLS, the design for each block was synthesized and generated as an IP block. The IP block for the Gaussian blur is shown in Figure 4.1. The block was designed to have an input stream, instream, and an output stream, fimStream. Control of the block's operation was handled by s_axi_CRTL_BUS, which lets the block know when the input stream begins so the function can start processing the data. The block is driven by ap_clk, and the function can be reset with ap_rst_n. The block is also capable of generating an interrupt signal to another IP block, but this feature was not used in the AprilTag localization algorithm.

Figure 4.1: Vivado Generated Gaussian Blur IP Block

After a successful synthesis, a report was generated for the IP block. Table 4.1 shows the performance metrics. The clock used for the block is ap_clk, and the design was optimized for a target clock period of 10 ns. The final design synthesized with an estimated clock period of 7.97 nanoseconds, potentially increasing to 9.22 nanoseconds with the stated uncertainty. To compute the Gaussian blur, the minimum number of clock cycles required was 922,949. Using the minimum latency and the estimated clock period, it takes 7.356 milliseconds to perform the Gaussian blur on an image.
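
This execution time follows directly from the minimum latency and the estimated clock period:

t_{\text{blur}} = 922{,}949\ \text{cycles} \times 7.97\ \text{ns/cycle} \approx 7.356\ \text{ms}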

Clock ap_clk
Target (ns) 10.00
Estimated (ns) 7.97
Uncertainty (ns) 1.25
Minimum Latency (clock cycles) 922949
Maximum Latency (clock cycles) 922953
Minimum Interval (clock cycles) 922953
Maximum Interval (clock cycles) 922953

Table 4.1: Vivado Gaussian Blur IP Performance Metrics

Resources within the hardware system are composed of Configurable Logic Blocks (CLBs) and a memory system. For a ZYNQ device, the reported components are BRAM_18K, DSP48E, Flip-Flops (FF), and Look-up Tables (LUT). BRAM_18K defines the block RAM available to the design. DSP48E slices are predefined digital signal processing elements included on FPGA devices; they perform widely used operations such as addition, subtraction, and multiplication. FFs are registers in an IP core and are used to store values that are frequently used throughout the program. LUTs are used to determine the outputs of common digital logic functions for fast reference and computation.
Table 4.2 captures the resources utilized to perform the Gaussian blur. Of the available 17,600 LUTs, the block utilized a total of 6296 elements. The core uses 6% of the available BRAM for basic storage.

Name BRAM_18K DSP48E FF LUT


DSP - - - -
Expression - - 0 1295
FIFO - - - -
Instance 0 18 1500 3354
Memory 8 - 0 0
Multiplexer - - - 1135
Register 0 - 2936 512
Total 8 18 4436 6296
Available 120 80 35200 17600
Utilization (%) 6 22 12 35

Table 4.2: Vivado Gaussian Blur IP Utilization

Gradient magnitude was the second block synthesized and generated as an IP block, and is shown in Figure 4.2. The block was designed to have an input stream, fimstream, and an output stream, fimMagStream. Like the Gaussian blur, control of the block's operation was handled by s_axi_CRTL_BUS, which lets the block know when the input stream begins so the function can start processing the data. The block is driven by ap_clk, with a functional reset provided by ap_rst_n.

Figure 4.2: Vivado Generated Gradient Magnitude IP Block

After a successful synthesis, a report was generated for the IP block. Table 4.3 shows the performance metrics. With an input clock of ap_clk, the design was optimized for a target clock period of 10 nanoseconds. The final design synthesized with an estimated clock period of 8.63 nanoseconds, or at most 9.88 nanoseconds with the stated uncertainty. To compute the magnitude, the minimum number of clock cycles required was 922,917. Using the minimum latency and the estimated clock period, it takes 7.965 milliseconds to determine the magnitude for all the pixels.

Clock ap_clk
Target (ns) 10.00
Estimated (ns) 8.63
Uncertainty (ns) 1.25
Minimum Latency (clock cycles) 922917
Maximum Latency (clock cycles) 922917
Minimum Interval (clock cycles) 922917
Maximum Interval (clock cycles) 922917

Table 4.3: Vivado Gradient Magnitude IP Performance Metrics

Table 4.4 captures the resources utilized to determine the magnitude at each pixel. Of
the 17,600 available LUTs, the block utilized a total of 5476 elements. The core used 5% of the
available BRAM for basic storage. Compared to the Gaussian blur design, the Vivado generated
design used fewer resources to perform the necessary magnitude calculations. In particular,
DSP48E utilization decreased greatly (from 18 slices to 5), largely because fewer multiplications
are required per pixel: the Gaussian blur multiplies every pixel within its window by a kernel
coefficient, whereas the magnitude calculation uses only four pixels per window.
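The difference in per-pixel arithmetic can be stated explicitly. For a $k \times k$ Gaussian kernel
(the kernel used in this work is shown in Figure 2.2), each output pixel requires on the order of
$k^2$ multiplications, whereas the gradient magnitude requires only two:

\[
\text{blur:}\ k^{2}\ \text{multiplications per pixel}, \qquad
\text{magnitude:}\ I_x^{2} + I_y^{2} \ \Rightarrow\ 2\ \text{multiplications per pixel},
\]

with $I_x$ and $I_y$ each obtained from a single subtraction of two window pixels (plus a square
root if the true, rather than squared, magnitude is required).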

Name BRAM_18K DSP48E FF LUT


DSP - - - -
Expression - - 0 1168
FIFO - - - -
Instance 0 5 1474 3265
Memory 6 - 0 0
Multiplexer - - - 755
Register 0 - 1623 288
Total 6 5 3097 5476
Available 120 80 35200 17600
Utilization (%) 5 6 8 31

Table 4.4: Vivado Gradient Magnitude IP Utilization

Gradient theta was the third block synthesized and generated as an IP block for our test. The
generated IP block design is shown in Figure 4.3. Similar to gradient magnitude, the block was
designed with an input stream, fimstream, and an output stream, fimThetaStream. Control of the
block's operation was handled by s_axi_CRTL_BUS. As with the Gaussian blur and fimMag blocks,
this port signaled the block when the input stream began so the function could begin processing
the data. The block was driven by ap_clk with a reset port, ap_rst_n.

Figure 4.3: Vivado Generated Gradient Theta IP Block

After a successful synthesis, a report was generated for the IP block. Table 4.5 shows the
performance metrics for the IP block. The design of this block was optimized for a target clock
period of 10 ns, but the final design synthesized with an estimated clock period of 19 nanoseconds,
or up to 20.25 nanoseconds once the 1.25 nanosecond uncertainty is included. To compute the
phase at each pixel, the minimum number of clock cycles required was 11982095. Using the
minimum latency and the worst-case clock period, determining the phase direction at all pixel
locations took approximately 242.6 milliseconds.

Clock ap_clk
Target (ns) 10.00
Estimated (ns) 19.00
Uncertainty (ns) 1.25
Minimum Latency (clock cycles) 11982095
Maximum Latency (clock cycles) 11982095
Minimum Interval (clock cycles) 11982095
Maximum Interval (clock cycles) 11982095

Table 4.5: Vivado Gradient Theta IP Performance Metrics

Table 4.6 captures the resources utilized to determine the phase at each pixel. Of the
17,600 available LUTs, the block utilized a total of 10222 elements. The core used 5% of the
available BRAM for basic storage. Compared to the other IP designs, gradient theta used the
most LUTs, with an overall usage of 58%.

Name BRAM_18K DSP48E FF LUT
DSP - - - -
Expression - - 0 2332
FIFO - - - -
Instance 0 21 2884 6905
Memory 6 - 0 0
Multiplexer - - - 985
Register 0 - 930 -
Total 6 21 3814 10222
Available 120 80 35200 17600
Utilization (%) 5 26 10 58

Table 4.6: Vivado Gradient Theta IP Utilization

4.2 IP core Speedup Analysis

From the three designs synthesized in Vivado, a timing summary for processing a single image
was compiled and is shown in Table 4.7. The Gaussian blur and gradient magnitude designs proved
to be faster implementations of their respective steps of the AprilTag localization algorithm.

IP Blocks Time of Execution (ms)


Gaussian Blur 7.360
Gradient Magnitude 7.965
Gradient Theta 242.60

Table 4.7: Vivado Generated IP Block Timing Performance

A comparison of the algorithm's performance on a computer, on the Snickerdoodle processor, and
in HDL is shown in Table 4.8. Observing the benchmark performance for the Gaussian blur, the
HDL implementation resulted in the shortest processing time: 7.36 milliseconds per image, compared
with 38.413 milliseconds on the computer and 102.382 milliseconds on the Snickerdoodle processor.
If the logic block were implemented on the Snickerdoodle, the Gaussian blur could therefore be
performed up to 13.9 times faster than on its processor. The time required to determine the
magnitude of each pixel was also improved in HDL; compared to the performance on the
Snickerdoodle, the speedup was about 2.65 times. The gradient theta block, however, failed to meet
the desired performance on the FPGA: comparing the performance metrics, it took about 3 times
longer to compute the phase for each pixel of an image in HDL. Examining the design of the IP
block, an essential element of the phase calculation was floating-point division. In HDL,
floating-point division is performed as an iterative series of subtractions that requires many clock
cycles to complete, and in this block every pixel required this process to determine its phase
direction. This also consumed a large amount of resources on the FPGA and could potentially limit
the design of other subsystem components for a drone.
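For reference, the per-pixel phase computation can be summarized by the expression below. This
is an illustrative restatement with hypothetical names rather than the thesis source; the key point
is that an arctangent of the ratio of the two gradients is evaluated at every pixel, and a direct
HLS translation of that expression instantiates floating-point divide and arctangent cores.

#include <cmath>

// Gradient direction (phase) at one pixel, from the same four-neighbour
// gradients used for the magnitude. std::atan2 evaluates the ratio Iy/Ix
// internally, which is where the costly floating-point division appears when
// the expression is synthesized directly.
inline float gradientTheta(float ix, float iy) {
    return std::atan2(iy, ix);   // result in (-pi, pi]
}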

Steps Process Computer (ms) Snickerdoodle Processor (ms) FPGA (ms)


1 Gaussian Blur 38.413 102.382 7.36
2a Gradient Magnitude 17.952 21.137 7.965
2b Gradient Theta 26.263 78.964 242.60

Table 4.8: Timing Benchmark between a Computer, Snickerdoodle, and HDL

To optimize the performance of AprilTag localization, the Gaussian blur and gradient magnitude
IP blocks can be implemented on the FPGA. Instead of performing the Gaussian blur and the
magnitude calculation on the processor, these steps would be processed on the FPGA for faster
performance, and the processed images would then be sent to the processor for the remaining
steps. From Table 4.9, the projected implementation decreases the total execution time from
323.193 milliseconds to 214.999 milliseconds, a reduction of 108.194 milliseconds and a
frame-rate improvement of 1.56 FPS.
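The projected totals follow directly from substituting the two FPGA execution times into the
Snickerdoodle profile:

\[
323.193 - (102.382 - 7.36) - (21.137 - 7.965) = 214.999\ \text{ms},
\]
\[
\frac{1}{0.323193\ \text{s}} \approx 3.09\ \text{FPS}, \qquad
\frac{1}{0.214999\ \text{s}} \approx 4.65\ \text{FPS}.
\]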

Steps Process Operation Old Execution Time New Execution Time


1 Gaussian Blur 102.382 ms 7.36 ms
2a Gradient Extraction Magnitude 21.137 ms 7.965 ms
2b Gradient Extraction Theta 78.964 ms 78.964 ms
3 Edge Extraction 83.064 ms 83.064 ms
4 Cluster Formation 19.183 ms 19.183 ms
5 Cluster Segmentation 3.263 ms 3.263 ms
6 Segment Connection 0.376 ms 0.376 ms
7 Quad Formation 0.078 ms 0.078 ms
8 Quad Decode 7.438 ms 7.438 ms
9 Overlap Check 0.011 ms 0.011 ms
10 Position Calculation 7.297 ms 7.297 ms
Total Execution Time 323.193 ms 214.999 ms
Processing Speed 3.09 FPS 4.65 FPS

Table 4.9: AprilTag Detection Algorithm Speedup from Snickerdoodle and FPGA Hybrid

4.3 FPGA Utilization

By projecting the implementation of the two IP cores, a potential hardware resource utilization
is shown in Table 4.10. The largest resource consumed by the two IP cores was the LUTs. The
two blocks could potentially be optimized further if they were combined: this would eliminate
some of the overlapping resources needed for the streaming interfaces and the buffering of
incoming pixels, and could further decrease the amount of BRAM utilized by the two-core
implementation.

Name BRAM_18K DSP48E FF LUT


DSP - - - -
Expression - - 0 2463
FIFO - - - -
Instance 0 23 2974 6619
Memory 14 - 0 0
Multiplexer - - - 1890
Register 0 - 4559 800
Total 14 23 7533 11772
Available 120 80 35200 17600
Utilization (%) 12 29 21 67

Table 4.10: Projected FPGA Resource Utilization from Gaussian Blur and Gradient Magnitude
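The projected figures in Table 4.10 are the element-wise sums of the individual Gaussian blur and
gradient magnitude reports in Tables 4.2 and 4.4. For the LUTs, for example:

\[
6296 + 5476 = 11772, \qquad \frac{11772}{17600} \times 100\% \approx 67\%.
\]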

Chapter 5
Conclusion

This work has introduced the implementation of AprilTag detection as a technique for drone
localization. The algorithm applies a series of pixel-level image processing steps, and a successful
evaluation of an image results in detection of the fiducial marker. Using Unity, a test environment
was set up to capture a baseline set of test images of an AprilTag in different positions, simulating
the range of viewpoints from which a drone could observe an AprilTag in 3D space. After
determining that the detection algorithm was accurate and suitable for the application, it was
benchmarked for timing performance. The benchmark analysis on a computer and a Snickerdoodle
showed that certain parts of the algorithm were lengthy and could be improved for a real-time
image processing application. The selected steps were then designed in Vivado with the goal of
faster performance when implemented on an FPGA. Of the three IP core designs, two were chosen
for their significant time improvements, while the third failed to produce a faster implementation.
A projection of the speedup for a combined Snickerdoodle and FPGA implementation was
calculated, resulting in a modestly faster AprilTag detection algorithm.

5.1 Future Work

The next step of this research is to integrate the constructed IP cores into the FPGA system
of the Snickerdoodle. This will require the development of a hardware interface between the ZYNQ
processor, the BRAM, and the AXI controller. Previous research by Santos-Castro developed an
interface between IP cores and the ZYNQ chip that transfers data between the processor and the
FPGA [14]. The IP cores developed within this research were also designed to support streaming
data, which helps alleviate the issue of limited BRAM resources. Additional speedup could be
achieved by utilizing a CORDIC IP block in Vivado. This block performs trigonometric operations
and could be used to implement the inverse tangent calculation. The use of a CORDIC block would
require a communication scheme with the theta gradient block to send and receive data, and the
synchronization scheme between the two blocks would also need to account for the inherent latency
of the CORDIC block when processing the input data. From the image result comparison, it can be
seen that the developed IP cores will also require some calibration to align the output values with
the correct pixel positions. On the software side, the AprilTag algorithm will require a substantial
amount of work to interface with the BRAM and retrieve the processed images, because a
communication scheme is needed between the FPGA and the processor to control the start and
end of image processing in hardware and to determine when a processed image is ready to be
extracted from the BRAM to the processing core.
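As a starting point for that future work, the sketch below shows a vectoring-mode CORDIC
computation of the arctangent in plain C++. It is only a behavioral illustration under assumed
names and iteration count, not the Vivado CORDIC IP itself: a hardware version would use
fixed-point arithmetic, replace the multiplications by powers of two with shifts, and read the
atan(2^-i) constants from a small ROM.

#include <cmath>

// Vectoring-mode CORDIC sketch: rotates (x, y) onto the positive x-axis while
// accumulating the applied rotation, which converges to atan2(y, x). No
// division is required, which is the property that motivates replacing the
// floating-point arctangent in the theta gradient block.
float cordicAtan2(float y, float x, int iterations = 16) {
    const float PI = 3.14159265358979f;
    float angle = 0.0f;

    // Pre-rotate by +/- 90 degrees so the remaining vector lies in the right
    // half-plane, where the CORDIC iteration converges.
    if (x < 0.0f) {
        float xt = x;
        if (y >= 0.0f) { x = y;  y = -xt; angle =  PI / 2.0f; }
        else           { x = -y; y =  xt; angle = -PI / 2.0f; }
    }

    float pow2 = 1.0f;                        // represents 2^-i
    for (int i = 0; i < iterations; ++i) {
        float xNew, yNew;
        if (y >= 0.0f) {                      // rotate clockwise by atan(2^-i)
            xNew = x + y * pow2;
            yNew = y - x * pow2;
            angle += std::atan(pow2);         // hardware would read this from a ROM table
        } else {                              // rotate counter-clockwise by atan(2^-i)
            xNew = x - y * pow2;
            yNew = y + x * pow2;
            angle -= std::atan(pow2);
        }
        x = xNew;
        y = yNew;
        pow2 *= 0.5f;
    }
    return angle;                             // approximately atan2(y, x)
}

Because each iteration uses only additions, subtractions, and scaling by a power of two, the phase
could be computed without the iterative floating-point division that limited the gradient theta
block.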
From a larger perspective, the application for non-GPS-based UAVs could be extended to
an UltraZed-EV development board after being fully tested on the Snickerdoodle. The UltraZed-EV
is a high-performance SOM based on the Zynq® UltraScale+™ MPSoC that is fully capable of
large-scale computation. The system contains two processor subsystems: a quad-core ARM Cortex-
A53 that can operate at 1.5 GHz and a dual-core ARM Cortex-R5 that can operate at 600 MHz.
Compared to the Snickerdoodle, the UltraZed-EV also offers a programmable logic fabric that is 18
times larger, with 504k logic cells. Implementing such a powerful development board will allow
future research to improve AprilTag detection for localization on non-GPS-based UAVs.

Bibliography

[1] J.-E. Gomez-Balderas, G. R. F. Colunga, L.-R. G. Carrillo, and R. Lozano, “Tracking a ground
moving target with a quadrotor using switching control,” Journal of Intelligent and Robotic
Systems, 2014.

[2] W. Mao, Z. Zhang, L. Qiu, J. He, Y. Cui, and S. Yun, “Indoor follow me drone,” in Pro-
ceedings of the 15th Annual International Conference on Mobile Systems, Applications, and
Services, 2017.

[3] S. A. M. Lajimi and J. McPhee, “A comprehensive filter to reduce drift from euler angles,
velocity, and position using an imu,” in 2017 IEEE 30th Canadian Conference on Electrical
and Computer Engineering (CCECE), IEEE, 2017.

[4] B. R. Stojkoska, J. Palikrushev, K. Trivodaliev, and S. Kalajdziski, “Indoor localization of
unmanned aerial vehicles based on rssi,” in 17th International Conference on Smart
Technologies, IEEE, 2017.

[5] T. Iulian, Ciocoiu, Florin, and D. Moldoveanu, “Vision based localization stereo rig cali-
bration and rectification,” in Optimization of Electrical and Electronic Equipment (OPTIM),
IEEE, 2017.

[6] N. Ferry, “Quadcopter plant model and control system development with matlab/simulink
implementation,” in Rochester Institute of Technology, 2017.

[7] H. Lategahn, M. Schreiber, J. Ziegler, and C. Stiller, “Urban localization with camera and
inertial measurement unit,” in IEEE Intelligent Vehicles Symposium, IEEE, 2013.

[8] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, V. P. K. Wan, L. Ma, and X. Tan, “An im-
provement algorithm on ransac for image-based indoor localization,” in International Wire-
less Communications and Mobile Computing, IEEE, 2016.

[9] E. P. Abdelkader Ben Amara and M. Atri, “Sobel edge detection system design and integra-
tion on an fpga based hd video streaming architecture,” in 11th International Design & Test
Symposium, 2016.

[10] J. Wang and E. Olson, “Apriltag 2: Efficient and robust fiducial detection,” in RSJ Interna-
tional Conference on Intelligent Robots and Systems, 2016.

[11] B. Mah, “Model-based design for visual localization via stereoscopic video processing,” in
Rochester Institute of Technology, 2017.

[12] P. Samarin, K. B. Kent, R. Herpers, and T. Saitov, “Fiducial marker detection using fpgas,”
in University of New Brunswick, 2013.

[13] E. Olson, “Apriltag: A robust and flexible visual fiducial system,” in 2011 IEEE International
Conference on Robotics and Automation, 2011.

[14] A. Santos-Castro, “A novel real-time image processing verification framework targeted to the
zynq soc for drone localization applications,” in Rochester Institute of Technology, 2017.

[15] S. Rajan, S. Wang, R. Inkol, and A. Joyal, “Efficient approximations for the arctangent func-
tion,” in IEEE Signal Processing Magazine, IEEE, 2006.

Chapter 6
Appendix

6.1 AprilTag Baseline Detection

Unity AprilTag Detection

x 0 0.1254 m
y 0 0.0001 m
z -11 0.0001 m
pitch 0º -0.0006º
roll 0º 0.000º
yaw 0º -0.000º
Comment: 0.1254 meters / 11 units = x meters / 1 unit ; 1 unit = 0.0114 meters

Table 6.1: Baseline - Camera Center

6.2 AprilTag Detection from 0.25 Meters

Unity AprilTag Detection

x 0 0.2507 m
y 0 0.0002 m
z -22 0.0002 m
pitch 0º -0.029º
roll 0º -0.011º
yaw 0º 0.023º
Comment: 22 ∗ 0.0114 meters = 0.2508 meters

Table 6.2: Camera View from 0.25 Meters - Camera Center

Unity AprilTag Detection

x 5 0.2507 m
y 0 0.0628 m
z -22 0.0002 m
pitch 0º 0.006º
roll 0º -0.000º
yaw 0º 0.011º
Comment: 5 ∗ 0.0114 meters = 0.057 meters

Table 6.3: Camera View from 0.25 Meters - Camera Right

Unity AprilTag Detection

x -5 0.2507 m
y 0 -0.0624 m
z -22 0.0002 m
pitch 0º -0.057º
roll 0º -0.011º
yaw 0º 0.040º
Comment:

Table 6.4: Camera View from 0.25 Meters - Camera Left

Unity AprilTag Detection

x 0 0.2507 m
y 3 0.0002 m
z -22 0.0374 m
pitch 0º -0.040º
roll 0º 0.006º
yaw 0º -0.029º
Comment: 3 ∗ 0.0114 meters = 0.0342 meters

Table 6.5: Camera View from 0.25 Meters - Camera Top

Unity AprilTag Detection

x -5 0.2507 m
y 3 -0.0624 m
z -22 -0.0374 m
pitch 0º 0.046º
roll 0º 0.011º
yaw 0º -0.040º
Comment:

Table 6.6: Camera View from 0.25 Meters - Camera Top Left

Unity AprilTag Detection

x 5 0.2507 m
y 3 0.0628 m
z -22 -0.0373 m
pitch 0º -0.011º
roll 0º 0.006º
yaw 0º 0.000º
Comment:

Table 6.7: Camera View from 0.25 Meters - Camera Top Right

Unity AprilTag Detection

x 0 0.2507 m
y -3 0.0002 m
z -22 0.0378 m
pitch 0º -0.011º
roll 0º -0.006º
yaw 0º 0.017º
Comment:

Table 6.8: Camera View from 0.25 Meters - Camera Bottom

Unity AprilTag Detection

x -5 0.2507 m
y -3 -0.0624 m
z -22 0.0378 m
pitch 0º -0.040º
roll 0º -0.011º
yaw 0º 0.011º
Comment:

Table 6.9: Camera View from 0.25 Meters - Camera Bottom Left

Unity AprilTag Detection

x 5 0.2507 m
y -3 0.0628 m
z -22 0.0378 m
pitch 0º 0.006º
roll 0º -0.006º
yaw 0º 0.011º
Comment:

Table 6.10: Camera View from 0.25 Meters - Camera Bottom Right

6.3 AprilTag Angled Detection from 0.342 Meters

Unity AprilTag Detection

x 0 0.3414 m
y 0 0.0003 m
z -30 0.0003 m
pitch 0º 0.019º
roll 45º -44.99º
yaw 0º -0.004º
Comment:

Table 6.11: Angled Camera View from 0.342 Meters - Roll

Unity AprilTag Detection

x 0 0.3366 m
y 0 0.0003 m
z -30 -0.0649 m
pitch -10º -9.076º
roll 0º 0.012º
yaw 0º -0.042º
Comment:

Table 6.12: Angled Camera View from 0.342 Meters - Pitch

Unity AprilTag Detection

x 0 0.3366 m
y 0 -0.0649 m
z -30 0.0003 m
pitch 0º 0.053º
roll 0º -0.004º
yaw -10º 9.078º
Comment:

Table 6.13: Angled Camera View from 0.342 Meters - Yaw

Unity AprilTag Detection

x 0 0.3319 m
y 0 0.0656 m
z -30 0.0646 m
pitch 10º 8.878º
roll 0º 1.555º
yaw 10º -8.663º
Comment:

Table 6.14: Angled Camera View from 0.342 Meters - Pitch and Yaw

Unity AprilTag Detection

x 0 0.3397 m
y 0 -0.0193 m
z -30 0.0343 m
pitch 6º 5.443º
roll 30º -30.00º
yaw 0º -0.003º
Comment:

Table 6.15: Angled Camera View from 0.342 Meters - Roll and Pitch

Unity AprilTag Detection

x 0 0.3367 m
y 0 0.0568 m
z -30 -0.0323 m
pitch 0º 0.010º
roll -30º 30.00º
yaw 10º -9.005º
Comment:

Table 6.16: Angled Camera View from 0.342 Meters - Yaw and Roll

Unity AprilTag Detection

x 0 0.3337 m
y 0 0.0779 m
z -30 0.0301 m
pitch 10º 8.924º
roll -30º 31.25º
yaw 8º -6.966º
Comment:

Table 6.17: Angled Camera View from 0.342 Meters - Pitch, Roll, and Yaw

6.4 AprilTag Detection from 0.5 Meters

Unity AprilTag Detection

x 0 0.5019 m
y 0 0.0004 m
z -44 0.0004 m
pitch 0º 0.132º
roll 0º 0.006º
yaw 0º 0.040º
Comment: 44 ∗ 0.0114 meters = 0.5016 meters

Table 6.18: Camera View from 0.5 Meters - Camera Center

Unity AprilTag Detection

x 10 0.5020 m
y 0 0.1258 m
z -44 0.0004 m
pitch 0º 0.029º
roll 0º -0.006º
yaw 0º 0.034º
Comment: 10 ∗ 0.0114 meters = 0.114 meters

Table 6.19: Camera View from 0.5 Meters - Camera Right

Unity AprilTag Detection

x -10 0.5020 m
y 0 -0.1250 m
z -44 0.0004 m
pitch 0º -0.011º
roll 0º -0.006º
yaw 0º -0.023º
Comment:

Table 6.20: Camera View from 0.5 Meters - Camera Left

Unity AprilTag Detection

x 0 0.5019 m
y 6 0.0004 m
z -44 -0.0748 m
pitch 0º 0.074º
roll 0º -0.017º
yaw 0º -0.183º
Comment: 6 ∗ 0.0114 meters = 0.0684 meters

Table 6.21: Camera View from 0.5 Meters - Camera Top

Unity AprilTag Detection

x -10 0.5020 m
y 6 -0.1250 m
z -44 -0.0748 m
pitch 0º -0.011º
roll 0º -0.000º
yaw 0º -0.046º
Comment:

Table 6.22: Camera View from 0.5 Meters - Camera Top Left

Unity AprilTag Detection

x 10 0.5020 m
y 6 0.1258 m
z -44 -0.0748 m
pitch 0º 0.017º
roll 0º -0.006º
yaw 0º -0.006º
Comment:

Table 6.23: Camera View from 0.5 Meters - Camera Top Right

Unity AprilTag Detection

x 0 0.5020 m
y -6 0.0004 m
z -44 0.0756 m
pitch 0º -0.023º
roll 0º -0.006º
yaw 0º 0.011º
Comment:

Table 6.24: Camera View from 0.5 Meters - Camera Bottom

Unity AprilTag Detection

x -10 0.5020 m
y -6 -0.1250 m
z -44 0.0757 m
pitch 0º -0.011º
roll 0º 0.006º
yaw 0º -0.006º
Comment:

Table 6.25: Camera View from 0.5 Meters - Camera Bottom Left

Unity AprilTag Detection

x 10 0.5021 m
y -6 0.1258 m
z -44 0.0757 m
pitch 0º 0.023º
roll 0º -0.000º
yaw 0º -0.000º
Comment:

Table 6.26: Camera View from 0.5 Meters - Camera Bottom Right

6.5 AprilTag Angled Detection from 0.57 Meters

Unity AprilTag Detection

x 0 0.5694 m
y 0 0.0005 m
z -50 0.0005 m
pitch 0º 0.020º
roll 60º -59.99º
yaw 0º -0.033º
Comment:

Table 6.27: Angled Camera View from 0.57 Meters - Roll

Unity AprilTag Detection

x 0 0.5536 m
y 0 0.0005 m
z -50 0.1523 m
pitch 14º 11.75º
roll 0º 0.000º
yaw 0º -0.020º
Comment:

Table 6.28: Angled Camera View from 0.57 Meters - Pitch

Unity AprilTag Detection

x 0 0.5512 m
y 0 0.1629 m
z -50 0.0005 m
pitch 0º 0.009º
roll 0º 0.003º
yaw 15º -12.37º
Comment:

Table 6.29: Angled Camera View from 0.57 Meters - Yaw

Unity AprilTag Detection

x 0 0.5396 m
y 0 -0.1621 m
z -50 -0.1257 m
pitch -12º -9.911º
roll 0º 2.606º
yaw -15º 11.79º
Comment:

Table 6.30: Angled Camera View from 0.57 Meters - Pitch and Yaw

Unity AprilTag Detection

x 0 0.5513 m
y 0 -0.0927 m
z -50 0.1335 m
pitch 15º 12.44º
roll 35º -35.01º
yaw 0º 0.013º
Comment:

Table 6.31: Angled Camera View from 0.57 Meters - Roll and Pitch

Unity AprilTag Detection

x 0 0.5488 m
y 0 0.0997 m
z -50 0.1422 m
pitch 0º -0.001º
roll 55º -54.99º
yaw 16º -13.08º
Comment:

Table 6.32: Angled Camera View from 0.57 Meters - Yaw and Roll

Unity AprilTag Detection

x 0 0.5514 m
y 0 0.1499 m
z -50 0.0629 m
pitch 12º 10.08º
roll -30º 31.59º
yaw 9º -7.312º
Comment:

Table 6.33: Angled Camera View from 0.57 Meters - Pitch, Roll, and Yaw

