

Zynq UltraScale+ Architecture

Stephanie Soldavini and Andrew Ramsey

CMPE-550

Dec 2017

Soldavini, Ramsey (CMPE-550) Zynq UltraScale+ Architecture Dec 2017 1 / 17


Agenda

Heterogeneous Computing
Zynq UltraScale+
History
Architecture
Applications





Problem: Flexibility/Performance Trade Off

Computing elements, from most programmable/flexible (software) to most specialized (hardware):
General Purpose Processors (GPPs)
Application-Specific Processors (ASPs)
Configurable Hardware, e.g. FPGAs
Co-Processors
Application Specific Integrated Circuits (ASICs)

Moving toward hardware increases specialization, development cost/time, and performance per chip area per watt (computational efficiency).

Selection Factors:
- Type and complexity of computational algorithms (general purpose vs. specialized)
- Desired level of flexibility
- Performance
- Development cost
- System cost
- Power requirements
- Real-time constraints

Solution: Use some of each in a single system



Heterogeneous Computing

Combine the use of different devices, for example:


Hardware accelerator used to speed up one function in a program
Offload matrix calculations to a GPU
Cloud system with GPP, GPU, and/or FPGA resources
Allows for each part of a task to run on the device it is best suited for
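
The idea of running each part of a task on the device it is best suited for can be sketched as a small lookup. This is only an illustration: the cost table, the task names, and the helper functions are hypothetical and not taken from any real scheduler or from the slides.

```python
# Hypothetical relative cost (lower is better) of running each kind of work
# on each device. The numbers are made up for illustration only.
COST = {
    "control_flow": {"GPP": 1.0, "GPU": 8.0, "FPGA": 5.0},
    "matrix_math": {"GPP": 10.0, "GPU": 1.0, "FPGA": 2.0},
    "bit_level_filter": {"GPP": 12.0, "GPU": 6.0, "FPGA": 1.0},
}


def best_device(kind):
    """Return the device with the lowest assumed cost for this kind of work."""
    costs = COST[kind]
    return min(costs, key=costs.get)


def plan(task_parts):
    """Map each part of a task to the device it is best suited for."""
    return {part: best_device(kind) for part, kind in task_parts.items()}


# A made-up program split into three parts of different character.
program = {
    "parse_input": "control_flow",
    "multiply_matrices": "matrix_math",
    "filter_stream": "bit_level_filter",
}
assignment = plan(program)
print(assignment)
```

In a real system the costs would also have to account for moving data between devices, which is the overhead a single-chip design tries to reduce.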



Zynq UltraScale+ History

Made by Xilinx
“Microheterogeneous”
Integrates GPP, GPU, FPGA, Co-Processor, & ASIC in one SoC
Increases speed by reducing off-chip data transfer
Predecessors
Kintex-UltraScale and Virtex-UltraScale (20/16nm FPGA fabric)
Zynq-7000 (Dual-core ARM Cortex-A9 & 28nm FPGA fabric)



General Architecture

Processing System (PS)
Application Processing Unit (APU): 64-bit quad-core or dual-core ARM Cortex-A53
Real-time Processing Unit (RPU): 32-bit dual-core ARM Cortex-R5
Graphics Processing Unit (GPU): ARM Mali-400
On-Chip Memory (OCM): 256 kB RAM with Error-Correcting Codes (ECC)

Programmable Logic (PL)
16nm FinFET+ programmable logic
Configurable Logic Blocks (CLB)
36 kb Block RAMs
UltraRAM
DSP Blocks



[Figure: device block diagram showing the Processing System (PS), the Programmable Logic (PL), and the Interconnects & I/O]



Application Processing Unit (APU)

64-bit quad-core or dual-core ARM Cortex-A53
Up to 1.5 GHz
ARMv8-A Architecture
64-bit mode: A64 instruction set
32-bit mode: A32/T32 instruction set
Single/double precision floating point unit (FPU)
Cache
IL1: 32 kB 2-way set-assoc with parity (independent for each CPU)
DL1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
L2: 1 MB 16-way set-assoc with ECC (shared between CPUs)
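
As a worked example of what the cache figures above imply, the number of sets in each cache follows from capacity / (associativity × line size). The slide does not give a line size; the 64-byte line used here is an assumption, and the helper name `num_sets` is only illustrative.

```python
LINE_BYTES = 64  # assumed cache line size; not stated on the slide
KB = 1024


def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


# (capacity in bytes, associativity) as listed for the APU caches
caches = {
    "IL1": (32 * KB, 2),
    "DL1": (32 * KB, 4),
    "L2": (1024 * KB, 16),
}

sets = {name: num_sets(size, ways) for name, (size, ways) in caches.items()}
# Every set count here is a power of two, so log2 gives the index width.
index_bits = {name: n.bit_length() - 1 for name, n in sets.items()}

for name in caches:
    print(name, sets[name], "sets,", index_bits[name], "index bits")
```

Under that assumption, the higher associativity of DL1 means it has half as many sets as IL1 even though both hold 32 kB.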



Real-time Processing Unit (RPU)

32-bit dual-core ARM Cortex-R5


Up to 600 MHz
ARMv7-R Architecture: A32/T32 instruction set
Single/double precision FPU
Caches/Tightly Coupled Memory (TCM)
L1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
TCM: 128 kB (independent, but can be combined into one 256 kB)



Graphics Processing Unit (GPU)

ARM Mali-400
Up to 667 MHz
One geometry processor
Two pixel processors
Supports OpenGL ES 1.1 & 2.0, OpenVG 1.1
Advanced anti-aliasing support
Cache: L2: 64 kB



Programmable Logic (PL)
16nm FinFET+ programmable logic
Configurable Logic Blocks (CLB)
Look Up Tables (LUT)
Flip flops (FF)
Cascadable adders
36 kb Block RAMs
True dual-port
Up to 72 bits wide
Configurable as dual 18 kb
UltraRAM
288 kb
72 bits wide
ECC
DSP Blocks
27×18 signed multiply
48-bit adder/accumulator
27-bit pre-adder
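
To make the DSP block figures concrete, here is a simplified behavioral sketch of one multiply-accumulate step using the listed widths (27-bit pre-adder, 27×18 signed multiply, 48-bit accumulator). It is not a model of the actual hardware: pipelining and operating modes are ignored, wrap-around on overflow is assumed, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into the two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc, a, d, b):
    """One multiply-accumulate step: acc + (a + d) * b.

    a, d: signed inputs to the 27-bit pre-adder
    b:    18-bit signed multiplier input
    acc:  48-bit signed accumulator value
    """
    pre = to_signed(to_signed(a, 27) + to_signed(d, 27), 27)  # pre-adder
    product = pre * to_signed(b, 18)                          # 27x18 multiply
    return to_signed(acc + product, 48)                       # accumulate


# Symmetric FIR filter tap folding: samples that share a coefficient are
# added in the pre-adder first, so each pair costs one multiply.
samples = [1, 2, 3, 4]
coeffs = [5, 6]  # applied to (x0 + x3) and (x1 + x2)

acc = 0
for i, c in enumerate(coeffs):
    acc = dsp_mac(acc, samples[i], samples[-1 - i], c)
print(acc)  # (1 + 4) * 5 + (2 + 3) * 6 = 55
```

The pre-adder is what lets a symmetric filter use half as many multipliers as it has taps.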

Vivado Design Suite

Bright green shows configurable components



Vivado Design Suite

Customize components, for instance the DDR controller



Applications

Data Center: Networked Storage/Service Platform[2]


Multimedia, video encoding/decoding[1]
Particle physics[4]
Automotive driver assistance, driver information, and infotainment.
LTE radio and baseband.
Medical diagnostics and imaging.
Video and night vision equipment.
Wireless radio.
Single-chip computer.



Application: Data Center

A sample configuration used for a networked storage platform


4.5X performance speed-up & 20X power reduction over x86 implementations
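
A quick piece of arithmetic shows what the two figures would mean together. This assumes both numbers describe the same workload and the same x86 baseline, which the slide does not state; the variable names are illustrative.

```python
speedup = 4.5           # performance relative to x86 (from the slide)
power_reduction = 20.0  # x86 power divided by Zynq power (from the slide)

relative_time = 1 / speedup           # run time as a fraction of x86
relative_power = 1 / power_reduction  # power draw as a fraction of x86

# Energy = power * time, so the fractions multiply.
relative_energy = relative_power * relative_time

# Performance per watt improves by the product of the two factors.
perf_per_watt_gain = speedup * power_reduction

print(perf_per_watt_gain)  # 90.0
print(relative_energy)     # about 1/90 of the x86 energy per task
```

If the assumption holds, the combined effect is roughly a 90X gain in performance per watt.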



Questions?



References

[1] Gosain, Y. and A. Gupta. 2017. “Xilinx Advanced Multimedia Solutions with Video Codec/Graphics Engines,” Zynq UltraScale+ MPSoC. Xilinx, October 23.
https://www.xilinx.com/support/documentation/white_papers/wp497-multimedia.pdf
[2] Hansen, L. 2016. “Unleash the Unparalleled Power and Flexibility of Zynq UltraScale+ MPSoCs,” Zynq UltraScale+ MPSoC. Xilinx, June 15.
https://www.xilinx.com/support/documentation/white_papers/wp470-ultrascale-plus-power-flexibility.pdf
[3] Shaaban, M. “Basics of Computer Design.” Lecture, CMPE-550, Rochester, NY, August 29, 2017.
[4] Stamen, R. “The Development of the Global Feature eXtractor (gFEX) for the ATLAS Level 1 Calorimeter Trigger at the LHC.” Presented at TWEPP 2017, Santa Cruz, CA, 2017.
[5] Xilinx, “Overview,” Zynq UltraScale+ MPSoC Data Sheet, July 2017.
https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf
[6] Xilinx, “Zynq UltraScale+ Device,” Technical Reference Manual, November 2017.
https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
