FPGA Design
eBook
By Adam Taylor
Contents
• Introduction
• What is a Field Programmable Gate Array (FPGA)?
• Benefits of Programmable Logic (PL)
• FPGA Building Blocks
• FPGA Design
• Device families
• High end solutions
• Toolchains
• Vivado Design Suite
• Application Software Creation
• Acceleration
• AI and Beyond
• Embedded Processing
• Embedded Processors
• Soft-Core Processors
• Big - Little Approach
• Communicating Internally and Externally
• Internal Data Movement
• Programmable Logic Applications
• Test Equipment
• Automotive and ADAS
• Cloud Computing
• Industrial
• Deep Dive Application - Creating Embedded Vision Systems
• Elements of a PL Image Processing System
• Software Defined Image Processing
• Conclusion
• About the author
Mouser Electronics eBook
Since their introduction in the mid-1980s, Field Programmable Gate Array devices have moved from providing the developer with the ability to integrate several discrete logic functions, e.g. glue logic, to becoming truly the heart of the system.

Modern FPGA devices and tools are unrecognizable to those first introduced; they contain not only traditional programmable logic resources but also high-performance embedded processors, dedicated interfaces and memory controllers, and IO structures capable of providing multi-gigabit data rates.

This eBook explores modern FPGAs, how they are programmed and developed, and examines various applications, including a deep-dive into image processing.

Introduction
The world we inhabit is analogue. However, digital processing enables us to experience and interact with the world in new ways: from satellite navigation, to autonomous vehicles, augmented reality, and the smartphones we carry in everyday life.

Being able to process this information in real and near real-time requires significant processing capability and, of course, this processing capability has benefited from Moore’s law. Design engineers also have several processing technologies from which to choose when selecting the most appropriate for the application at hand. These processing technology choices range from traditional processors to Graphics Processing Units (GPU) and Programmable Logic (PL).

Of all processing technologies, programmable logic is probably the least well known and often considered one of the more challenging to use when implementing solutions.

Benefits of PL
Programmable logic enables users to implement truly parallel implementations of their algorithms and applications. This parallel implementation enables a more deterministic and responsive solution. As such, they are used where real-time processing and responses are required.

... and RADAR.

Traditionally programmable logic devices have been available in two device classes: Complex Programmable Logic Devices (CPLD) or Field Programmable Gate Arrays (FPGA). CPLDs offer a simple device structure of registers and logic functions using a sea-of-gate approach.

FPGAs offer a more complex structure than CPLDs and often include dedicated hardware elements such as block memory, digital signal processing, clock management, gigabit serial transceivers, and IO blocks.

FPGA Building Blocks
The basic building blocks of FPGAs are the lookup table (LUT), registers, and the flexible IO cell structures. LUTs enable the implementation of logic equations while registers provide the storage element necessary to implement sequential logic designs. LUTs and registers are combined to provide what is often called a logic slice, a simple example of which is shown in Figure 1. In modern devices, these slices contain many options which allow implementation of combinatorial or sequential logic circuits including local distributed memory and the ability to use the LUT as a shift register depending upon configuration.
[Figure 1: a simple logic slice, with a four-input LUT (inputs A, B, C and D) feeding a D-type register]
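To make the LUT-plus-register idea concrete, the following is a minimal behavioural sketch in Python, not a description of any specific device. It assumes a four-input LUT stored as a 16-bit truth table and a single D-type register; the names `Lut4`, `Register` and `make_init`, and the example logic equation, are illustrative only.

```python
class Lut4:
    """Four-input lookup table: a 16-entry truth table indexed by the inputs."""

    def __init__(self, init: int):
        # init is a 16-bit value; bit i holds the output for input pattern i.
        self.init = init & 0xFFFF

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        # Input A is the least significant index bit, input D the most significant.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.init >> index) & 1


class Register:
    """D-type flip-flop: output q takes the value of d on each clock edge."""

    def __init__(self):
        self.q = 0

    def clock(self, d: int) -> int:
        self.q = d & 1
        return self.q


def make_init(func) -> int:
    """Build the 16-bit truth table for any Boolean function of four inputs."""
    init = 0
    for index in range(16):
        a = index & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        init |= (func(a, b, c, d) & 1) << index
    return init


# Illustrative logic equation: out = (A AND B) XOR (C OR D)
lut = Lut4(make_init(lambda a, b, c, d: (a & b) ^ (c | d)))
reg = Register()

combinatorial = lut(1, 1, 0, 0)        # output of the LUT alone
registered = reg.clock(combinatorial)  # same value, captured on a clock edge
print(combinatorial, registered)
```

Changing only the truth table changes the logic function, which is the sense in which the same physical LUT can implement any equation of its inputs.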
[Figure 2: CLBs interconnected through switch matrixes]
Within the FPGA device, it is common to group together two slices to form a Configurable Logic Block (CLB). These CLBs are interconnected to implement the necessary functionality using routing and switching matrixes as shown in Figure 2.

... to model the parallel architecture of the FPGA architecture. It is also increasingly common to develop FPGA IP blocks using high level synthesis (HLS) with languages such as C, C++ or OpenCL. While these languages do not support parallelism, compiler directives can be used by the ...

... Processing System (PS) delays, and even serial de-serializer structures. This means FPGAs offer any-to-any interfacing and are able to interface with any standard, bespoke or legacy interface. This flexibility also frees up the system designers from becoming pin bound when using Applications Specific ...
• Place – The logic resources determined by the synthesis tool are placed at available locations within the target device.
• Routing – The placed logic resources in the design are interconnected using routing and switch matrixes to implement the final application.
• Bit File – The generation of the final programming file for the target FPGA.

Simulation is used to ensure that the implemented design functions in accordance with the design requirements. Engineers create test benches which stimulate the inputs of the Register-Transfer Level (RTL) module and monitor the resulting outputs. The behavior of the module can be verified by viewing the simulation waveform as shown in Figure 3 or, alternatively, by writing a more complex test bench which can check and verify the outputs.

While FPGAs offer significant performance and interfacing benefits, development of FPGA-based solutions could be considered more complex than traditional software development. However, modern design tools, especially high-level synthesis coupled with the availability of a range of freely available IP together with the capabilities of modern devices, mean this is not the case.

Device families
If you are unfamiliar with the history of FPGAs, they were invented by Ross Freeman and Bernard Vonderschmitt in 1985 with the release of the XC2064. This first FPGA had 64 configurable logic blocks. Today’s modern Xilinx devices offer the user 8,938,000 system logic cells, 3,840 DSP elements, 76 Mb of Block RAM (BRAM) and 90 Mb of UltraRAM. This is quite a capability step up from the original offering.

Of course, the device presented above is currently the largest Xilinx FPGA, which would be overkill for many applications. To help guide engineers in selecting a suitable FPGA for their application, Xilinx offers a range of FPGA and System on Chip devices capable of supporting a wide array of solutions across several different families.

The cost-optimized portfolio developed around the 28 nm node provides three different device families, each optimized for different user needs.

• Spartan-7 FPGAs – The Spartan-7 FPGA family is the successor to the extremely popular Spartan-6 range of devices and offers developers increased performance and lower power than the older 45 nm technology node. The Spartan-7 devices are I/O optimized, offering the highest pin count within
the cost-optimized portfolio.
• Artix-7 FPGAs – A new family for the Xilinx 7 series which is transceiver optimized, offering 6.6 Gbps transceivers.
• Zynq-7000 SoCs – A revolutionary family when first debuted, Zynq-7000 SoCs introduced a new class of devices which combine hard core Arm Cortex-A9 processors with FPGA fabric. This enables a new class of device which can provide integrated system solutions, along with the associated benefits of integration including reduced power consumption, a smaller overall solution, and significantly reduced EMI.

Devices within this portfolio can support a range of applications from sensor fusion, to precision control, image processing, and cloud computing.

High end solutions
For ultra-high performance and more specialized applications, Xilinx provides the Kintex and Virtex families across three technology nodes at 28 nm, 20 nm and 16 nm. This progression of devices provides significant increases in performance and capability with the UltraScale and UltraScale+ family of devices.

Kintex devices offer increasing performance, logic resources, and transceivers across the three technology nodes. From 65,500 logic cells in the Kintex devices to 1,143,000 in Kintex UltraScale+ devices, they offer both GTH and GTY transceivers which support data rates at up to 16.3 Gbps and 32.75 Gbps respectively.

The highest-performance FPGAs are within the Virtex family. These devices provide not only logic resources of up to 8,938,000 system logic cells and transceivers capable of operating at 58 Gbps, but also support for high bandwidth memory (HBM). These devices provide between 4 GB and 16 GB of on-chip DRAM, with up to 460 GB/s bandwidth or approximately 20 times more than provided by a DDR4 DIMM. Virtex HBM devices are used to accelerate network and storage applications.

Toolchains
All devices from the smallest Spartan-7 to the largest Virtex UltraScale+ are supported by Xilinx development tools. These development tools cover every aspect of the design life cycle from RTL capture, to simulation, and software development for use with processor cores.

• Vivado Design Suite – Vivado enables the capture of the design and RTL simulation, along with the implementation process of synthesis, place and route, and bit file generation.
• Vivado HLS – High-level synthesis which enables IP development using C or C++.
• Vitis Unified Software Platform – Vitis enables software development for embedded processors and also enables acceleration using OpenCL.
• PetaLinux Tools – PetaLinux is an embedded Linux solution for embedded processors.

This technology stack enables us to develop solutions for both traditional FPGAs and heterogeneous systems on chip which combine programmable logic with high-performance Arm processor cores. Interestingly, as we will see, this technology stack enables implementation using traditional hardware design language (HDL) capture or a higher-level system optimizing compiler approach, depending upon the user’s desired entry point.

The inter-relationship between the design tools can be seen below in Figure 4. Each element of the technology stack provides a specific capability.

Vivado Design Suite
The lowest level of the technology stack is Vivado. Vivado enables us to capture designs using VHDL or Verilog as well as synthesise the HDL design to the target device before placing and routing and generating the programming file.

At the heart of Vivado is the IP integrator which enables designers to capture designs quickly and easily using IP provided by Xilinx, third parties or custom developed. This IP can be defined using HDL or, alternatively, a higher-level approach can be used with Vitis HLS which enables the development of IP blocks using C and C++.

While implementation is Vivado’s focus, Vivado offers a complete development ecosystem and provides several different capabilities which aid the overall programmable logic development.

One of the key features of any design is being able to guarantee functional performance of the HDL prior to implementation. To verify the HDL functionality, Vivado includes an HDL simulator which enables the developer to stimulate the HDL. Depending upon the stage of implementation, the test bench can be applied against the RTL, synthesized netlist, or the implemented netlist with associated timing information.

It is through Vivado that we can also debug designs on the hardware thanks to its support for integrated logic analysers (ILA), Virtual IO (VIO), and JTAG to AXI IP. This allows the designer to instrument the programmable logic design and observe behaviour at run time in the actual system.

Both Vivado’s implementation and simulation capabilities are used by higher-level tools in the development stack as we will see.
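The idea of a test bench that checks outputs automatically, rather than relying on inspection of a waveform, can be sketched in a few lines. This is a conceptual Python illustration only: real test benches are written in an HDL such as VHDL or Verilog and run in the simulator. The function `adder_dut` is a hypothetical stand-in for an RTL module, and `run_test_bench` is an assumed name.

```python
import itertools


def adder_dut(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Hypothetical stand-in for an RTL module: a width-bit adder with carry out."""
    total = a + b
    mask = (1 << width) - 1
    return total & mask, (total >> width) & 1


def run_test_bench(dut, width: int = 4) -> list[str]:
    """Drive every input combination and compare outputs with expected values."""
    failures = []
    for a, b in itertools.product(range(1 << width), repeat=2):
        result, carry = dut(a, b, width)       # stimulate inputs, capture outputs
        expected = a + b                        # independent reference calculation
        observed = (carry << width) | result
        if observed != expected:
            failures.append(f"a={a} b={b}: got {observed}, expected {expected}")
    return failures


failures = run_test_bench(adder_dut)
print("PASS" if not failures else f"FAIL ({len(failures)} mismatches)")
```

The same stimulus-and-check structure can be reused at each stage, which mirrors applying one test bench against the RTL, the synthesized netlist and the implemented netlist.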
[Figure 4: design tool technology stack, with layers for AI / ML solution implementation, application prototyping and rapid development, and embedded and accelerated software development]
Figure 7 - Zynq MPSoC Processing System Interfacing with PL Image Processing Chain
Both heterogeneous SoC and soft-core embedded solutions have a range of use cases across several exciting applications as we will see. It is not unusual to implement additional soft-core processors in the programmable logic of heterogeneous SoCs to create a Big-Little architecture, enabling the off-loading of time-critical tasks.

Embedded Processors
In the Xilinx suite of devices, embedded processors are provided in the Zynq-7000 SoC and Zynq MPSoC product lines. These devices offer true heterogeneous processing systems on the same silicon. Architecturally in these devices, the processor system boots first like a traditional processor and then configures the programmable logic.

The Zynq-7000 SoC was the first to be introduced and offers dual or single-core 32-bit Arm Cortex-A9 processors combined with programmable logic. As would be expected, the processing system provides peripherals used for both volatile and non-volatile memory along with several interfacing peripherals such as Ethernet, UART and CAN.

To support high-performance applications, each Cortex-A9 core also includes a floating-point unit and a NEON engine. The NEON engine allows processing of large data sets in parallel using a single instruction against multiple data (SIMD). This is especially useful for applications like image and audio processing, where algorithms require data sets to be processed using simple instructions (e.g. multiply and add) multiple times with little control code. In applications such as this, leveraging the SIMD unit can result in a significant increase in performance.

Data transfer between the processing system and the programmable logic is implemented using several Advanced eXtensible Interfaces (AXI). Using this interface, either the processing system or the programmable logic can be the initiator of the transaction. This allows transfer of data easily to and from the processor system DDR memory.

This combination of processing system and programmable logic makes the Zynq-7000 series excellent for implementing applications which require both serial and parallel processing (for example image processing, robotics, and augmented reality). To ease connectivity and leverage the large support of frameworks and applications, embedded Linux solutions can be deployed on the processing system, while the programmable logic accelerates key elements or algorithms. This combination of PS and PL provides for a more responsive and deterministic solution. In the table below, you will find a simple demonstration implementing AES encryption.

Operating System                     Linux
Processor System Clocks              36662
PS Clocks with Programmable Logic    15644
Reduction in Processing Time         54.8%

With the evolution of the Zynq-7000 SoC into the next-generation Zynq MPSoC, a significant step change in processing capabilities was introduced along with the latest logic fabric. For the first time, heterogeneous processors were introduced within the processing system, enabling the developer to address several challenges within the same device.

The processing system within the Zynq MPSoC contains the following processor cores:

• Application Processing Unit – Consists of quad or dual 64-bit Arm Cortex-A53 processors
• Real Time Processing Unit – Dual lockstep 32-bit Arm Cortex-R5 processors
• Platform Management Unit – Silicon implementation of a Triple Modular Redundant 32-bit MicroBlaze processor
• Graphics Processor Unit – Arm Mali-400 MP GPU

In addition to the four processing groups which can be programmed by the developer, the MPSoC processing system also contains a configuration security processor to implement safety and security processing and security event responses.

This diverse range of processing solutions enables the creation of single-chip solutions for many applications (e.g. automotive) where both high-level algorithms and user
Figure 9 – Big-Little Approach with the Zynq MPSoC and Arm Cortex-M3
These transceivers enable the ultra-fast transfer of data, allowing the programmable logic to work with some of the fastest serial interface standards such as PCIe, SATA, 100G Ethernet, SDI, JESD204A/B, USB 3.0 and DisplayPort.

Xilinx identifies transceivers as GTx, where x indicates the specific standards. UltraScale and UltraScale+ devices provide for data rates between 6 Gbps (GTR) and 58 Gbps (GTM). The specific mix of GTx depends upon the family. Across the UltraScale and UltraScale+ families of devices, this means we get an excellent range of ultrafast IO and, consequently, a significant peak bandwidth.

Device family         GTx            Data rate (Gbps)
Virtex UltraScale+    GTY/GTM        32.75/58.0
Kintex UltraScale+    GTH/GTY        16.3/32.75
Zynq UltraScale+      GTR/GTH/GTY    6.0/16.3/32.75
Virtex UltraScale     GTH/GTY        16.3/30.5
Kintex UltraScale     GTH            16.3

Of course, being able to interface with such data volumes provided by the HD, HP, HR, and GTx interfacing capabilities means we need to be just as efficient, if not more so, within the device.

Internal Data Movement
A key internal feature of the programmable logic device is the ability to move data: between IP blocks, even between programmable logic and processing systems, if desired. Within the Xilinx environment, the primary protocol used for data movement is the Advanced eXtensible Interface (AXI): a subset of the Arm AMBA bus, developed specifically to support implementation in programmable logic.

To provide scalability for different use cases, AXI itself offers three different interfacing standards.

• AXI Full / Memory Map – A higher-performance memory-mapped interface that supports independent read and write channels. Both channels enable bursts to optimize throughput. In programmable logic designs, AXI Full is often used to implement direct memory transfers between the programmable logic and an external DDR memory, for example.
• AXI Lite – A stripped down version of AXI Full to provide a memory-mapped interface which can be used for configuration and control of IP blocks. AXI Lite does not support burst accesses.
• AXI Stream – A unidirectional stream of data from a producer to a consumer. This stream is point-to-point and contains no addressing information.

Both AXI Full and AXI Lite consist of independent read and write channels. Of course, the complexity of the channel varies between the two flavours. The read channel consists of two sub-channels: the read address and control sub-channel and the read data and response sub-channel. The write channel consists of three sub-channels: the write address, write data, and write response sub-channels.

AXI Stream interfaces are commonly used to transfer information from a single producer to a consumer, typically between IP blocks as part of a processing chain. Example processing chains could be image processing or signal processing, where the signal is received and processed by each IP block before being passed on to another.

When AXI interfaces are implemented in programmable logic, we can benefit from wide data bus widths to increase the bandwidth. Data bus widths can vary from 32 to 256 bits; assuming a clock of 400 MHz, this gives data rates between 12.8 Gbps and 102.4 Gbps. This arrangement of AXI Full, AXI Lite, and AXI Stream is shown in Figure 11.

AXI interfaces can be locked down using Arm TrustZone software to support design security when working with Zynq UltraScale+ MPSoC devices. This capability is increasingly important both in the cloud and at the edge. Keeping these software worlds orthogonal prevents lower-security, higher-risk applications from being able to access registers and peripherals defined as secure.

When we are working with heterogeneous SoCs like the Zynq UltraScale+ MPSoC device, system-level cache coherency becomes increasingly important. AXI also provides the additional sideband signals to provide IO cache coherence and complete cache coherence with the ACE, ACP and HP(C) ports available on the Zynq MPSoC processing system.
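The relationship between AXI data bus width, clock frequency and peak data rate is a simple multiplication, sketched below in Python. The function name `axi_peak_gbps` is illustrative, and the calculation assumes one full-width transfer on every clock cycle, so it gives an upper bound rather than a measured throughput.

```python
def axi_peak_gbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak throughput in Gbps, assuming one full-width transfer per clock cycle."""
    return bus_width_bits * clock_mhz * 1e6 / 1e9


# Bus widths from 32 to 256 bits at the 400 MHz clock assumed in the text.
for width in (32, 64, 128, 256):
    print(f"{width:>3}-bit bus at 400 MHz: {axi_peak_gbps(width, 400):.1f} Gbps")
```

In practice, handshaking stalls and burst overheads reduce the achieved rate below this figure.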
Programmable Logic
Test Equipment
as part of the STTE enables the
along with advanced triggering capabilities on the digitized data. The Arm processor core contained within the SoC runs the operating system and scope application and provides the USB interfacing capabilities. Such an approach enables a more tightly integrated solution.

Automotive and ADAS
In line with the Society of Automotive Engineers’ five-level capability matrix, the automotive world is on a mission to increase the level of assisted and automated driving capabilities in vehicles.

In order to safely interact with the external world, automotive solutions must use several diverse sensor modalities and communications systems, which include the following:

• Vision Systems – Including Infrared
• 4D RADAR
• LIDAR
• Accelerometers
• Global Positioning Systems
• Vehicle-to-Vehicle and Infrastructure Communications

These sensors and systems generate significant data volumes which need to be aggregated and processed before decisions on vehicle actions can be implemented, presenting several challenges to the system designer. With such a diverse range of sensors comes a diverse range of sensor interfaces, ranging from high performance multi-gigabit serial links (e.g. MIPI) to lower-speed interfaces such as SPI and I2C as used by lower-speed sensors. Traditional system-on-chip solutions provide the user with a limited number of fixed-function interfaces of varying types. Using a programmable logic device enables the system designer to leverage the flexible IO structure to implement the specific number and type of interfaces required, removing previous IO limitations.

The parallel structure of programmable logic enables the implementation of parallel image/signal processing chains. This parallel implementation of algorithms provides a more responsive solution, which is required to implement safe interaction with other vehicles and the environment.

Cloud Computing
One of the hottest topics in the programmable logic world is data centre acceleration. Deploying programmable logic in data centres combines large FPGAs with x86 processors connected using PCIe. Such an approach enables the x86 software application to offload highly parallel functions to the FPGA, accelerating the performance of the system. To be able to deploy accelerators in the cloud or on premises, Xilinx offers a range of accelerator cards called Alveo. These cards are designed to interface over PCIe and are programmed using the Vitis unified software platform and OpenCL. This enables the developer to use C/C++ and OpenCL to accelerate algorithms. Typical data centre applications include quantitative finance, database and data analytics, machine learning, and network acceleration.

Industrial
Programmable logic plays a large part in Industry 4.0, where one of the main challenges is to implement a converged network. Converged networks merge the Information Technology (IT) and Operation Technology (OT) networks. Traditionally the IT network is where the Enterprise Resource Planning (ERP) is located, while the OT network contains the sensors and drives used to manufacture the product. Connections between the two bring about challenges using gateways, bridges and protocol convertors, which can limit the scalability of the OT network.

As such, the IT and OT networks have different requirements. The IT network needs to be able to access multiple systems and databases etc., while the OT network needs to be real time and deterministic to be able to control its sensors and drives.

One of the increasingly popular solutions is to implement a solution using Time-Sensitive Networking (TSN). The most popular TSN standard is Ethernet, which is defined by several IEEE 802.1 standards. Time awareness across the network is implemented in TSN by allocating scheduled traffic in time-defined slots, while also supporting cyclic data transmission and providing pre-emption for higher priority packets.

Correctly implementing TSN requires a solution which can provide a low latency and deterministic response at TSN end points and switches. This is where Xilinx Zynq SoC and Zynq MPSoC devices come into play because they enable the implementation of TSN ports with programmable logic.

We can implement TSN ports in the Xilinx ecosystem using their TSN Ethernet Endpoint MAC LogiCORE IP within a Zynq-7000 SoC or Zynq UltraScale+ MPSoC. Each device utilises both the Processing System (PS) and the Programmable Logic (PL). The LogiCORE IP consists of FPGA logic for MAC, TSN Bridge, and TSN Endpoint, along with software components for network synchronization, initialization, and interfacing with network configuration controllers for Stream Reservation as defined in P802.1Qcc. The software is designed to run on PetaLinux and will be published as Yocto patches.

The logic IP core provides deterministic behavior in the PL for synchronization (IEEE 802.1AS), scheduled traffic (IEEE 802.1Qbv), and seamless redundancy (P802.1CB) while helping to offload the processing unit. It is also possible to implement an optional integrated time-aware L2 switch to enable either chain or tree topology.

Once implemented, the TSN can be combined with custom applications like motor control or sensor interfacing within the PL to be able to act under the control of the TSN.

Deep Dive Application - Creating Embedded Vision Systems
There is something special about seeing an image that you have created on a display. That display could be demonstrating a simple, transparent image showcasing an
image sensor’s capability. Alternatively, it could be implementing an advanced image processing solution that identifies and classifies objects or tracks movement in the image. Of course, with the correct sensor selection, we can even extend the range of vision beyond the visible range of the EM spectrum into the infrared or X-ray regions of the EM spectrum.

Implementing image processing algorithms is computationally intensive, especially as image resolutions increase beyond HD and move to 4K. A color HD image of 1920 pixels by 1080 lines using a 30-bit pixel requires a data rate of 3.73 Gbps to achieve 60 frames per second. Moving to 4K resolution, which has 3840 pixels and 2160 lines with a 30-bit pixel and 60 frames per second, requires a considerable increase to 14.93 Gbps. Each stage of the image processing algorithm must, therefore, be able to support this data rate to achieve the desired frame rates, even when doing complex calculations on each pixel.

The truly parallel nature of programmable logic provides an ideal technology to implement image processing pipelines. The parallel nature frees the developer from the sequential software world where each stage of the image processing algorithm must be implemented in sequence. In programmable logic, the algorithm’s elements run in parallel, enabling an increased throughput and a more deterministic performance. This can be critical for many image processing applications that use embedded vision to interact safely with the environment. ADAS or vision-guided robotics are two good examples.

Developers can get the best of both worlds by using a heterogeneous SoC such as the Zynq-7000 SoC or Zynq UltraScale+ MPSoC. These devices combine programmable logic with high-performance Arm processors. This provides significant flexibility because the image processing algorithms can be implemented within the programmable logic. While the processing system can provide the image processing algorithm configuration to allow easy adaption to new image sensors or requirements, it can also implement high-level algorithms that take the output from the image processing system.

Elements of a PL Image Processing System
Implementing an image processing system in programmable logic is not as daunting as it first may seem. The image processing pipeline can be broken down into three distinct elements: image capture, algorithm, and output pipeline.

Image Capture – The image capture pipeline interfaces directly with the image sensor or camera. As such, the image capture interfaces externally to the programmable logic using interfaces such as HDMI, SD/HD/UHD-SDI, MIPI or Parallel/LVDS. Thanks to the flexible nature of programmable logic IOs, most standards can be implemented using the IO structures without the need for an external PHY. To help capture the image, Xilinx provides a range of IP cores in the Vivado IP library that will enable the image to be captured and made ready for further processing. Once the image has been captured, further processing might be required to obtain a useable image for the image processing algorithm. This additional image processing may require color filtering (debayer) to convert raw pixel values to RGB pixels. The image capture phase may also include gamma correction, noise filtering, and color space conversion. In adaptive systems, the input video timing and resolution will be detected to enable the image processing system to automatically configure itself for the video format received. An example image capture pipeline can be seen in Figure 13 below, which includes a MIPI CSI-2 interface, demosaic (debayer), and a frame buffer to write to PS DDR memory.

Algorithm – This is the actual implementation of the image processing algorithm. In many cases it will consist of several stages of image processing algorithms, each one connected to the next stage using an AXI Stream. These IP blocks may be provided by the Xilinx Vivado IP library, which includes IP blocks that can scale images up or down and layer video layers on top of each other as demonstrated in Figure 14.

Alternatively, they can be implemented using a hardware description language or high-level synthesis, which enables higher-level languages such as C/C++ to implement image processing algorithms. Using a higher-level language enables developers to leverage the vision domain Xilinx Vitis accelerated libraries. These libraries provide several advanced image processing functions like filters, bounding boxes, bad pixel correction, warp transformation, and
Figure 14 – Image Processing Video Mixer mixing live video with a Head Up Display.
stereo block matching. If the image needs to be made available to the processor system, for higher-level algorithms for example, Video Direct Memory Access (VDMA) can be used to transfer the video stream to the PS DDR. This transfer can also operate in the other direction, moving data from the processor system to the programmable logic. Such PS-PL transfers can be used to provide an overlay on the image, presenting information on the display if required.

Output Pipeline – Once the image has completed the algorithm pipeline, the processed image needs to be output to the appropriate display. This could be MIPI-DSI, HDMI, SD/HD/UHD-SDI or traditional parallel video with pixel clock, V Sync and H Sync. In the programmable logic, the developer needs to convert the processed image, which is in an AXI Stream format, into the correct output format. Along with converting the AXI Stream into the appropriate format, the video must also be re-timed for output. Just like with the image capture and algorithm pipeline, the Vivado IP library contains the necessary IP cores to generate the output video in the correct format. Figure 15 below shows a typical output pipeline with a frame read from the PS DDR passing data to the AXI Stream to Parallel Video output, operating under the control of a video timing controller.

It is possible to move the image into the processing system DDR memory during the algorithmic processing. This allows the processing system to perform high-level algorithms on the processed image contents. The software can further process the image and output it back into the image processing stream if so desired.

Software Defined Image Processing
There are several approaches which can be undertaken when it comes to working with the image in the processing system. Regardless of the approach taken, the image processing implemented within the programmable logic is highly configurable by the developed application software.
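When configuring a pipeline for a given video format, it helps to know the data rate each stage must sustain. The Python sketch below reproduces the arithmetic behind the HD and 4K figures quoted earlier in this section (roughly 3.73 Gbps and 14.9 Gbps). The function name `video_rate_gbps` is illustrative, and the calculation counts active pixels only, ignoring blanking intervals.

```python
def video_rate_gbps(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Raw active-pixel data rate in Gbps (blanking intervals ignored)."""
    return width * height * bits_per_pixel * fps / 1e9


hd = video_rate_gbps(1920, 1080, 30, 60)
uhd = video_rate_gbps(3840, 2160, 30, 60)
print(f"HD: {hd:.2f} Gbps, 4K: {uhd:.2f} Gbps, ratio: {uhd / hd:.0f}x")
```

Doubling both dimensions quadruples the pixel count, which is why the 4K rate is four times the HD rate at the same pixel depth and frame rate.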
Bare Metal – Bare metal developments are often used as an initial stage in the development of the image processing system. They enable developers to demonstrate quickly and easily that the design in the programmable logic works and that the image sensor can be correctly configured. This allows for the creation of a transparent path which displays the captured image on the selected display. The bare metal application does not include the complexity of an embedded Linux stack. As such, it is very useful for debugging and commissioning the design using the Integrated Logic Analysers (ILA) and memory views, as well as the debugging capabilities to inspect register contents.

PYNQ – The open source PYNQ framework enables developers to leverage the power of Python to control IP within the programmable logic thanks to several PYNQ APIs, drivers and libraries. Thanks to these PYNQ provisions, developers can focus on the algorithm development because the PYNQ framework includes drivers for most AXI connected IP blocks on the programmable logic. PYNQ runs on an Ubuntu-like distribution and enables developers to start focusing on the image processing algorithms using OpenCV and other popular Python frameworks. Using PYNQ enables us to focus on the algorithm development using the real-world sensors, which includes being able to see the limitations of the sensor under different conditions and the impacts on the implemented algorithm. Once we know what the algorithm is, we can implement the functionality in the programmable logic using Xilinx IP, HLS or the Vitis accelerated library functions.

... aids the acceleration of the machine learning inference into the programmable logic using the DPU and supporting frameworks such as TensorFlow, Caffe and PyTorch.

Conclusion
In this eBook we have introduced the basics of what an FPGA is and the tools we use to work with AMD FPGAs, looked in depth at the structures, elements and features of a modern FPGA, and demonstrated several different applications which can benefit from FPGA technology.

If you want to know more about FPGAs, a range of resources are available at https://resources.mouser.com/programmable-logic.

About the author
Adam Taylor is a world-recognised expert in design and development of embedded systems and FPGAs for several end applications. Throughout his career, Adam has used FPGAs to implement a wide variety of solutions from RADAR to safety critical control systems (SIL4) and satellite systems. He also had interesting stops in image processing and cryptography along the way.

Adam is a Chartered Engineer, Senior Member of the IEEE, Fellow of the Institute of Engineering and Technology, Arm Innovator, and Edge Impulse Ambassador.

He is also the owner of the engineering and consultancy company Adiuvo Engineering and Training, which develops embedded solutions for high reliability, mission critical and space applications. Current projects include ESA Plato, Lunar Gateway, Generic Space Imager, UKSA TreeView and several other clients across the world.

FPGAs are Adam’s first love. He is the author of numerous articles and papers on electronic design and FPGA design, including over 440 blogs with more than 30 million views on how to use the Zynq and Zynq MPSoC for Xilinx.