
Design Lane Detector System for Autonomous Vehicles based on Hardware using Xilinx System Generator


Pham Thien Long Dinh, Thi Ngoc Diem Nguyen, Duc Khai Lam
University of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam
Email: 18521021,18520597@gm.uit.edu.vn, khaild@uit.edu.vn

Abstract—The Hough Transform (HT) algorithm is a method for extracting straight lines from an edge image. In the HT, the parameters of edge pixels, i.e. points in an image with sharp intensity changes, are treated as "votes". These votes are accumulated in the Hough-space variables (ρ, θ) to find the cells with the maximum number of votes; in other words, pixels that have the same value of Gatan and the same distance ρ lie on the same line. However, this algorithm requires a large amount of memory and has high computational complexity. In this paper, we propose an HT architecture that uses a Look Up Table (LUT) to store trigonometric values and uses the orientation θ calculated in the Sobel Edge Detection algorithm instead of stepping through small angles as the standard HT does. This reduces the processing time for each image frame so that the architecture can be applied in real-time processing. Our design runs at 170MHz, and the processing time per 1024x1024 frame is 6.17ms.
Index Terms—Hough Transform, Matlab Simulink, Xilinx System Generator, Look Up Table, PCIe.

I. INTRODUCTION
Lane detection is a particularly significant task in image processing and computer vision. It is applied in industry for vehicle guidance and in Advanced Driver Assistance Systems (ADAS).
Although many alternative algorithms have been used in lane recognition, the Hough Transform method has found widespread use because it consistently handles issues including non-contiguous lines, the appearance of numerous broken lines, the impact of traffic, and interference caused by the environment. However, because of the numerous trigonometric calculations and the substantial memory requirements, this technique has a high computational cost and operates slowly in real-world situations. Implementing a lane detection system for driverless vehicles based on the Hough Transform method is thus still difficult.
In this paper, we suggest a new Hough Transform architecture for straight lane detection that is better suited to an FPGA implementation, in order to achieve a real-time lane detection system.
The paper is organized as follows. Section II reviews previously reported architectures. Sections III and IV discuss the suggested architecture and its FPGA implementation. Finally, Section V provides the testing and results of our architecture.

II. RELATED WORK
Among previous studies, [2]–[3] built a hardware architecture for the parallel Hough Transform algorithm and used a lookup table (LUT), which helped to reduce the computation time of the Hough Transform. Additionally, [2] proposed solutions for parallel computation and parallel voting to increase processing speed.
Based on [2], [3] further increases processing speed and reduces memory requirements by selecting the locations with the largest number of votes locally (local maximum).
In [4], the authors built a new algorithm for the Hough Parameter Space (HPS). This reduced the memory requirements compared to the standard Hough Transform while maintaining the accuracy of detected lines and the overall pixel information.
Similar to [4], the authors of [5] modified the basic HT algorithm by using lane-specific properties, while also breaking down the modified algorithm into a collection of Hardware (HW) and Software (SW) components. Directly lowering HT computation and memory usage significantly improves operational performance.
However, all reported solutions primarily optimized the HT and addressed the post-processing cost of Inverse Hough Transform (IHT) detection, which has not yet been implemented in real time. In this paper, we rebuild a Hough Transform architecture based on [4]–[5] to form a full system of image processing algorithms starting from the original video, and at the same time improve the execution speed and the processing resolution of the image frame. The resulting lane detection system was designed and simulated in Matlab Simulink combined with the Xilinx System Generator [6] library to check the system's functionality. After testing the functionality, we generate an HDL netlist for Vivado and encapsulate it as a Block Design so that it can be implemented on the PCIe interface.

III. PROPOSED ARCHITECTURE
As shown in Fig.1, our proposed architecture has 03 parts:
• Matlab: Matlab scripts that load the image to be processed, set the parameters, perform pre-processing and post-processing, and display the results.
• Simulink and Xilinx System Generator: modules designed with Matlab Simulink and the Xilinx System Generator tool, combined with the parameters declared in the Matlab part. All the modules that we design are described in subsections A to D.
• Hardware: the design is downloaded to a Virtex-7 FPGA, which uses a PCIe interface and executes in real time.
Fig. 1: Proposed architecture

A. Masking module
The main responsibility of the Masking module is to inform subsequent modules about which pixel regions of the image may contain lanes. This makes the follow-up processing simpler. Based on our experiments, lane markings are commonly white or yellow. The module therefore determines whether a pixel in the frame is white or yellow by comparing its value in each of the 03 color channels (Red, Green and Blue) with an experimentally chosen constant. If a pixel is likely to belong to a lane, its RGB value is converted to (255, 255, 255) via AND and Mux blocks that combine the 3 color channels. Fig.2 shows the architecture of this Masking module.

Fig. 2: Masking block diagram

B. Gray Scale module
The Gray Scale module converts the three RGB color channels into three corresponding gray matrices, one each for the red, green and blue channels. Fig.3 shows our module, which uses Eq.(1). The general equation used to convert an image to gray scale is:

$gray = 0.299 \cdot red + 0.587 \cdot green + 0.114 \cdot blue$  (1)

Fig. 3: Gray Scale block diagram

C. Sobel Edge Detection module
The Sobel Edge Detection module performs lane edge detection. Fig.4 shows the architecture of this module.
• Line Buffer: holds the pixel values of the gray frame.
• Gx/Gy Operation: apply the vertical and the horizontal templates below to the gray image to obtain the gradient components Gx and Gy:

$\text{vertical} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \qquad \text{horizontal} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$

• Cordic atan: calculate the gradient direction Gatan and the gradient magnitude Gmag according to Eq.(2) and Eq.(3) with the Xilinx CORDIC ATAN block.

$G_{atan} = \operatorname{atan}(G_y / G_x)$  (2)

$G_{mag} = \sqrt{G_x^2 + G_y^2}$  (3)

• Coordinate Counter: store the coordinates (x, y) of the pixels.
• Compare: select an appropriate threshold (TH). If the new pixel value ≥ TH, the pixel is regarded as an image edge point. Additionally, we apply a Region of Interest (ROI) to find candidate pixels that are likely to belong to white or yellow lines in the image.

Fig. 4: Sobel Edge Detection block diagram

D. Hough Transform module
The Hough Transform operates on binary images obtained after applying edge detection operators such as Sobel, in order to assess the collinearity of edge pixels and subsequently detect lines. Each edge pixel (x, y) has a corresponding magnitude of displacement ρ and orientation of displacement θ from the image origin. According to [1], the standard Hough Transform formula is as follows:

$\rho = x \cos\theta + y \sin\theta, \qquad 0^\circ \le \theta \le 179^\circ$  (4)
The Hough Parameter Space (HPS) is represented by an accumulator array A(ρ, θ) that is allocated in memory beforehand. The edge pixels inside the image are processed using Eq.(4) over a discrete range of values [0: ∆θ: maxθ], where ∆θ denotes the discretization step and maxθ denotes the highest value of θ. The generated parameters form an address, and a vote is applied to the associated location in the HPS. When the voting is over, the HPS is checked for peaks. Parameter coordinates with a sizable number of votes are noted, as these relate to linear elements in the image. The lines are then reconstructed in the spatial image domain using the inverse of Eq.(4).
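The voting procedure described above can be illustrated with a minimal software sketch. This is not the authors' hardware design: the function names (`hough_vote`, `find_peaks`, `y_on_line`), the dictionary used as the accumulator, and the `rho_max` bound are all assumptions made for illustration.

```python
import math

def hough_vote(edge_pixels, rho_max, d_theta=1, max_theta=179):
    """Accumulate votes A[(rho, theta)] for each edge pixel using Eq. (4).

    The dictionary stands in for the pre-allocated accumulator array;
    rho_max is an assumed bound (e.g. the image diagonal).
    """
    acc = {}
    for x, y in edge_pixels:
        for theta in range(0, max_theta + 1, d_theta):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            if -rho_max <= rho <= rho_max:
                # (rho, theta) acts as the address of the vote
                acc[(rho, theta)] = acc.get((rho, theta), 0) + 1
    return acc

def find_peaks(acc, min_votes):
    """Return the (rho, theta) cells whose vote count reaches min_votes."""
    return sorted(cell for cell, votes in acc.items() if votes >= min_votes)

def y_on_line(rho, theta, x):
    """Inverse of Eq. (4): y for a given x (valid when sin(theta) != 0)."""
    t = math.radians(theta)
    return (rho - x * math.cos(t)) / math.sin(t)

# Ten collinear edge pixels on the horizontal line y = 5
pixels = [(x, 5) for x in range(10)]
acc = hough_vote(pixels, rho_max=20)
peaks = find_peaks(acc, min_votes=10)
```

In this toy case all ten pixels vote for the cell (ρ = 5, θ = 90°), and `y_on_line(5, 90, x)` recovers y = 5 for any x. Neighbouring angles also collect full votes for such a short line, which is one reason a peak-selection step is needed.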

Fig. 5: Hough Transform block diagram

The remainder of this section describes our Hough Transform architecture and the functionality of each of the blocks in Fig.5.
• Calculate rho: calculate ρ by using Eq.(4).
• Coordinate Counter: get the coordinates (x, y) of the edge pixel to calculate ρ.
• Generate Address HPS: compute addresses in order to vote and determine (ρ, θ).
• Initial Address: initialize the address of the Hough Transform space.
• HPS Memory: includes 02 Dual-port RAMs that perform the voting for the left and right regions. The next step is to identify the (ρ, θ) pairs with the highest vote totals in the left and right regions. The memory has two operational modes: Vote Accumulation, and Read-out and Reset. During vote accumulation, port A of the RAM reads the HPS vote so that it can be increased by one. Port B updates the vote and writes it back to memory. The HPS Memory is defined experimentally, as we explain in the following.
• Valid Control: create control values for the selection procedure of the HPS Memory block.
• Peak Detection: draw a straight line and identify the lane in which the car can travel based on the (ρ, θ) discovered.
Our architectures are developed using Xilinx System Generator. Simulations correctly produced Hough parameters corresponding to lines in several test images. These were reconstructed and superimposed on the original images for inspection. The output of our system is shown in Fig.6.

Fig. 6: Our results on Matlab Simulink: (a) the original image, (b) Hough lines image, (c) lane segmentation image

IV. FPGA IMPLEMENTATION
According to the methodology described above, the proposed architecture can be easily synthesized on an FPGA platform. The system consists of two components: a host computer with an AMD Ryzen 3 2200G x64 CPU and 16GB of DDR4 RAM, and an FPGA board that runs the lane detection system and is attached to the host through the PCIe interface. The overall system is shown in Fig.7. Throughout the remainder of this section, we focus primarily on the ROI and memory aspects of our hardware design.
In the Sobel module, to calculate the norm and the direction of the gradient, we use the CORDIC algorithm in its vectoring mode. In this mode, the CORDIC block calculates the magnitude and phase of the gradient from Gx and Gy. CORDIC is
an appropriate algorithm for FPGA implementation because it only requires shifters and adders/subtractors. The outputs of this stage are the magnitude and the angle formed by the gradient direction, as in Eq.(2) and Eq.(3). If a contour point's gradient strength is larger than a predetermined empirical threshold, it is pipelined to the HT block. However, many of these edge points do not belong to a lane contour. As seen in Fig.8, pixels whose gradient orientation lies outside the lane intervals can be disregarded and are not processed at the HT stage. This is accomplished by creating a ROI surrounding the left and right lanes, with the left limit varying between atan_left_min and atan_left_max and the right limit varying between atan_right_min and atan_right_max (Fig.8). From there, a trapezoid encompasses the region where the lane needs to be determined. Comparators are used to implement the filtering procedures, as seen in Fig.9.

Fig. 7: Overview of the system architecture

Fig. 9: Implementation of magnitude and gradient block
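The gradient and filtering steps described above can be approximated in software as follows. This is a hedged sketch rather than the CORDIC-based hardware: it assumes that the "vertical" template yields Gx and the "horizontal" template yields Gy, it uses floating-point `math` functions in place of CORDIC, and the threshold and angle windows passed to `classify` are hypothetical values, not the paper's.

```python
import math

# Sobel templates as given in Section III-C
VERTICAL = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
HORIZONTAL = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def gradient(gray, x, y):
    """Return (Gx, Gy) at interior pixel (x, y) of a 2-D list `gray`.

    Assumption: the vertical template gives Gx, the horizontal one Gy.
    """
    gx = gy = 0
    for j in range(3):
        for i in range(3):
            p = gray[y + j - 1][x + i - 1]
            gx += VERTICAL[j][i] * p
            gy += HORIZONTAL[j][i] * p
    return gx, gy

def classify(gx, gy, th, left_win, right_win):
    """Return 'left', 'right' or None for one pixel.

    th, left_win and right_win are hypothetical stand-ins for the
    empirical threshold and the atan_left_*/atan_right_* limits.
    """
    mag = math.hypot(gx, gy)                      # Eq. (3)
    if mag < th:
        return None                               # not an edge point
    # Eq. (2); a zero Gx is treated as a 90 degree orientation
    ang = 90.0 if gx == 0 else math.degrees(math.atan(gy / gx))
    if left_win[0] <= ang <= left_win[1]:
        return "left"
    if right_win[0] <= ang <= right_win[1]:
        return "right"
    return None                                   # outside both intervals

# 5x5 test image with a vertical step edge between columns 1 and 2
gray = [[0, 0, 255, 255, 255] for _ in range(5)]
gx, gy = gradient(gray, 2, 2)
label = classify(gx, gy, th=100, left_win=(20, 70), right_win=(-70, -20))
```

For this purely vertical edge the orientation is 0°, which falls outside both illustrative windows, so the pixel is dropped before the HT stage even though its gradient magnitude is large.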
Fig. 8: Definition of the region of interest (ROI) for the left and right lane boundaries

In the HT module, we employ two LUTs, two multipliers, and one adder to implement Eq.(4) for the mapping operation from the image space (x, y) to the Hough space (ρ, θ). These LUTs employ a fixed-point format and sample θ every 1° in this implementation. The voting method, which requires simultaneous reads from two separate addresses for the right and the left regions, is implemented by two Dual-port RAMs. Each region has two Dual-port RAMs. One port of the RAM is used for HT voting, while the other port is set aside for the HT mapping operations. The HW implementation for this stage is shown in Fig.10. A Dual-port RAM is employed as the accumulator for the various (ρ, θ) pairs in order to provide simultaneous read-write operations. The two ports can execute the accumulation and voting operations simultaneously since they operate in separate modes (read/write operations) and have a programmable clock frequency. Following the mapping and accumulation of the candidate (ρ, θ) pairs, the largest vote of the accumulator is identified using a simple comparison that requires little processing effort. A simple comparator makes up the parallel peak detecting unit. When the corresponding vote is above a predetermined empirical threshold, the (ρ, θ) pair is made available at the output. The stored vote value for this address is immediately reset to zero, and the other RAM begins voting for the next frame.
Fig. 10: Implementation of the accumulator and vote stage

V. TESTING AND RESULTS
The hardware-based lane detection system for autonomous vehicles was created using the Xilinx System Generator and Matlab Simulink tools. We then generate the HDL code files to run the post-implementation timing simulation. By synthesizing and implementing the Block Design with the Xilinx Vivado Design Suite for various image resolutions, we demonstrated that it can attain a clock frequency of 170MHz on a Virtex-7 VC707 board. The processing time for one pixel in our implementation is merely 5.88ns. This result applies to a 1024x1024 pixel image. The size of the processed image was selected in order to achieve the optimal balance of HS accuracy (resolution), processing speed, and FPGA resource use. Table I lists the resources necessary to process a video with a 1024x1024 pixel resolution. Only 33.06% of the FPGA's BRAM tiles are needed by the improved HPS memory. Furthermore, only 5.83% of the LUTs of the Virtex-7 VC707 are used, whereas 25% of the PCIe resource is used. The model is then written through the PCIe link to the Virtex-7 VC707 FPGA board. The Virtex-7 board is shown connected to the host PC's PCIe slot in Fig.11.

TABLE I: FPGA resource requirements

Resource   Utilization   Available   Utilization %
LUT        17689         303600      5.83
LUTRAM     2820          130800      2.16
FF         19204         607200      3.16
BRAM       340.50        1030        33.06
DSP        2             2800        0.07
IO         4             700         0.57
GT         1             28          3.57
BUFG       7             32          21.88
MMCM       2             14          14.29
PCIe       1             4           25.00

Fig. 11: The Virtex-7 board is shown connected to the host PC

The prototype system's three primary components make up its core circuit, which uses the FPGA to implement line detection in the input video sequence. The first component, implemented in Python, carries out the preparation for image edge extraction; it can read videos and adapt their size to the IP of the proposed architecture. The second component is the FPGA, which receives the data and returns the results in a table that is read back. Following post-processing, the detected lanes are plotted and the results are shown on the screen. The proposed implementation was tested on numerous videos with different lighting and road scene conditions, including road type (urban street, highway), road condition, occlusion, poor line paint, day and night. We should point out that the voting threshold was the same for all images. Some frames from videos under different conditions are shown in Fig.12. By visual comparison, this figure demonstrates that the implemented architecture successfully recognizes the straight lane lines.
The performance comparison of various HT implementations from the literature is shown in Table II. The frame rate is influenced by the architecture, the hardware platform being used, the size of the image and the number of symbols in it, as well as other factors. Since the reported works employ various resolutions, pre-processing techniques, and platforms, we adopt a normalized speed as a merit factor. The works [2] and [3] built processing systems for 1024x768 and 640x480 images with frequencies of 200MHz and 50MHz, respectively. In [4], the authors built a processing system for 640x480 images with an execution speed of 1.47ms/frame. In [5], the authors processed image frames of size 1024x1024 at a frequency of 145MHz, with a processing speed of 9.03ms/frame. The architecture we built handles the same input image size as [5]; the system achieves a frequency of 170MHz and a processing time of 6.17ms/frame.
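The normalized speed used as the merit factor can be reproduced with simple arithmetic. The sketch below only restates figures already given in the text and in Table II; the helper names `ns_per_pixel` and `clock_period_ns` are made up for this illustration.

```python
def ns_per_pixel(ms_per_frame, width, height):
    """Normalized speed: frame time divided by the number of pixels."""
    return ms_per_frame * 1e6 / (width * height)

def clock_period_ns(f_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1e3 / f_mhz

ours = ns_per_pixel(6.17, 1024, 1024)    # about 5.88 ns/pixel
ref5 = ns_per_pixel(9.03, 1024, 1024)    # about 8.61 ns/pixel
ref4 = ns_per_pixel(1.47, 640, 480)      # about 4.79 ns/pixel
period = clock_period_ns(170)            # about 5.88 ns
```

The normalized speed of our design is essentially equal to one clock period at 170MHz, which is consistent with the architecture consuming about one pixel per clock cycle.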
TABLE II: Comparison of our work with different architectures

                               [2]                 [3]               [4]              [5]                Our architecture
Image Resolution               1024x768            640x480           640x480          1024x1024          1024x1024
Fmax (MHz)                     200                 50                200              145                170
Processing Speed (ms/frame)    5.4                 7.4               1.47             9.03               6.17
Normalized Speed (ns/pixel)    6.8                 24.08             4.78             8.61               5.88
Device                         Altera Stratix IV   Cyclone II FPGA   Virtex-5 ML505   Xilinx xc7z001-1   Virtex-7 VC707

Fig. 12: Lane detection results in various conditions. Original image (left). Lane detector system results (right)

VI. CONCLUSIONS
In this paper, we presented a hardware architecture for real-time road-lane detection based on Matlab Simulink and Xilinx System Generator. In contrast to other work of a comparable nature, the hardware architecture was created and built using Xilinx tools. Following that, we established a PCIe interface with the FPGA. This raises the detection accuracy to match the standards of real-time applications for road lane detection. Our FPGA architecture processes a 1024x1024 frame in around 6.17ms at a frequency of 170MHz. In addition, when implemented on an FPGA board, the system achieves a real-time video processing speed of about 97FPS.

REFERENCES
[1] J. Illingworth and J. Kittler, "A survey of the Hough transform," Computer Vision, Graphics, and Image Processing, vol. 44, no. 1, pp. 87-116, 1988.
[2] J. Guan, F. An, X. Zhang, L. Chen, and H. J. Mattausch, "Real-Time Straight-Line Detection for XGA-Size Videos by Hough Transform with Parallelized Voting Procedures," Sensors, vol. 17, 270, 2017. https://doi.org/10.3390/s17020270
[3] J. Guan, F. An, X. Zhang, L. Chen, and H. Mattausch, "Energy-Efficient Hardware Implementation of Road-Lane Detection Based on Hough Transform with Parallelized Voting Procedure and Local Maximum Algorithm," IEICE Trans. Inf. Syst., vol. 102-D, pp. 1171-1182, 2019.
[4] I. E. Hajjouji, S. Mars, Z. Asrih, and A. E. Mourabit, "A novel FPGA implementation of Hough Transform for straight lane detection," Engineering Science and Technology, an International Journal, vol. 23, pp. 274-280, 2020.
[5] D. Northcote, L. H. Crockett, and P. Murray, "FPGA Implementation of a Memory-Efficient Hough Parameter Space for the Detection of Lines," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1-5, doi: 10.1109/ISCAS.2018.83511.
[6] Vivado Design Suite Reference Guide: Model-Based DSP Design Using System Generator, UG958 (v2018.1), April 4, 2018.
