Professional Documents
Culture Documents
Giulio Corradi Presentation PDF
Giulio Corradi Presentation PDF
Giulio Corradi Presentation PDF
1 = 10 Billion
Water drop Water molecules
Page 3
© Copyright 2017 Xilinx
.
25 Water molecules
7 nanometers
Page 4
© Copyright 2017 Xilinx
.
Advancement in Memory technology
10nm technology
Data transfer rate of 3,200 megabits per second (Mbps),
30 % faster than the 2,400Mbps rate of 20nm DDR4 DRAM
36 Water molecules
Page 5 © Copyright 2017 Xilinx
.
Semiconductor Industry Consolidation
2007 2017 Reasons
Diminishing returns from:
– Moore’s Law and,
– Dennard scaling
Consequences
Huge computing parallelism
– Multicores, Manycores
– Heterogeneous computing
Cobots Industrial
Transport
Enterprise Cloud
rf sensors
Page 9
© Copyright 2017 Xilinx
.
Edge and Cloud applications integrated
Transportation/Infrastructure Autonomous Cars &Trucks Transportation & Grid Control Traffic & Network Analytics
Field Drones & Robots Climate, Water, Field Sensor Data Analytics
Oil & Gas/Agriculture Energy & Flow Control
Edge Node
PLC
IT Network
Button
Database Display
ERP Sensor
MES Drive
SCADA
Temp
PLC sensor Volt-
IO/Drive meter
I/O Light
sensor
Page 11
© Copyright 2017 Xilinx
.
Networking time awareness for different traffic classes
All product names, logos, and brands are property of their respective owners
Page 15
© Copyright 2017 Xilinx
.
Some Industrial Use case for Heterogeneous SoC
Manipulation
1ms
HARD REAL TIME
Data
IEC61131
I/O
Safety Layers
Communication
Motion Layers
100ns
62us
Diagnostic
HMI Layers
1us
Layers
1ms
I/O Layers
31.25us
Layers
Industrial Ethernet
BURST MODES –
Wireless STREAMING MODES
Communication SYNCHRONIZATION MODES
Page 18
© Copyright 2017 Xilinx
.
The basic high level ingredients of heterogeneous SoC
SAFE SAFE
APP APP APP RTP Peripherals
APP RTP
Switches / Crossbar
Switches / Crossbar
Switches / OCM TCM External
Memory
Switches / Crossbar
Page 19
© Copyright 2017 Xilinx
.
Main take away….
Diversified Application Processors
– Interference on caches, memory, pheriperals
– Highest performance realtime (some applications cannot use R-class) seek solutions
Real time processors
– Automatic partition of real-time classes (soft, firm, and hard real-time)
– Competition with application processors for shared resources
– Safety integrity among application processors and real-time processors
Programmable Logic exploitation
– scheduling offloading
– Data movers
– Zero latency synchronizers
– Direct access (ACE, ACP) to caches helps predictability
Different memory traffic
– Reservation, realtime, stream and best-effort
– Purpose specific memory to reduce bottlenecks
Page 20 © Copyright 2017 Xilinx
.
All Programmable Platforms
MB
S_AXI_GP0/1 M_AXI_GP0/1
MB
XADC
Lockstep
MB = MicroBlaze™ is Xilinx’s full-featured, FPGA optimized 32-bit Reduced Instruction Set Computer (RISC) soft processor, single, dual core lock step and TMR
Page 23
© Copyright 2017 Xilinx
.
Zynq® UltraScale+™ Connection Diagram
Performance bottleneck
Wireless
APU WiFi
Dom 0 (Guest OS) Virtual Machine 1 Virtual Machine 2 Virtual Machine 3 Virtual Machine 4
Middleware
Process
Safety
(OPC/UA Server)
Machine Learning
SoftPLC and Diagnostic Security HMI
RPU
Linux Kernel
GPOS
RTOS GPOS GPOS
Physical Driver IPC
I/O split I/O split IPC RPU
device device
driver driver
Safety Loop
Hypervisor (XEN Type)
PL
API API (t
Industrial Ethernet TSN (Time Sensitve ML Proprietary IP
HW Plug-in Network)
Network Processor
(Link Layer)
Network Processor
(Link Layer)
3-ports Switch
Dedicated
Industrial ERP/MRP DDR Field
networking
1Gb
100M / 1G
Page 26
© Copyright 2017 Xilinx
.
Performance challenges for the Heterogeneous systems
Application Processors classic challenges (the old problem)
– AMP (Asymmetric Multiprocessing) resource sharing, still an hard problem to solve
– Resource contention at L2 cache
• Caches lock, coloring, and other schemes……cache lock no more available in many new AP clusters
– Resource contention at Memory Controller
• Access policy, different quality of service
– Bus competition
• SCU (Snoop Control Unit)
• Word length (64bits,128bits,256bits)
Page 32
© Copyright 2017 Xilinx
.
How to take advantage of HLS in SoC ZYNQ7000®
Processing System
Static Memory Controller Dynamic Memory Controller Programmable
Quad-SPI, NAND, NOR DDR3, DDR2, LPDDR2 Logic:
System Gates,
AMBA® Switches DSP, RAM
2x SPI AMBA® Switches
S_AXI_HP0
2x I2C MB
ARM® CoreSight™ Multi-core & Trace Debug S_AXI_HP1
2x CAN
NEON™/ FPU Engine NEON™/ FPU Engine S_AXI_HP2 MB
2x UART
I/O Cortex™-A9 MPCore™ Cortex™-A9 MPCore™ S_AXI_HP3
MIO
MUX GPIO 32/32 KB I/D Caches 32/32 KB I/D Caches
Workers
S_AXI_ACP
2x SDIO 512 KB L2 Cache Snoop Control Unit (SCU)
with DMA
Timer Counters 256 KB On-Chip Memory
HLS code
2x USB General Interrupt Controller DMA Configuration
with DMA
HLS code
2x GigE
with DMA AMBA ® Switches
® Switches
AMBA
modules
S_AXI_GP0/1 M_AXI_GP0/1
XADC
Module
Module
Workers
MB
MB
Workers
Dual Core A9
Dual Core Microblaze (MB)
Shared segment in DDR between the A9s and MB
A9 Subsystem
MicroBlaze subsystem
V* - V or BE if DP is not used
BE* - BE or V if GDMA not used
Dual Channel A9
Device Domains
Diverse Channel PS/PL
Device Domains
PS Processing Units
– Applications Processor Unit (APU) = A53 Complex – Configuration Security Unit (CSU): Configuration &
– Real-Time Processing Unit (RPU) = R5 Complex Security
– Graphic Processing Unit (GPU) = Mali-400MP – Platform Management Unit (PMU): Power & Safety
Complex
Page 48 © Copyright 2017 Xilinx
.
Zynq Ultrascale + Low Power Domain Functional Safety
Coverage
1. Lockstep for R5s
2. Triple Modular Redundancy (TMR) for Platform
Management Unit (PMU) and Configuration &
Security Unit (CSU)
3. ECC for TCM, OCM, CSU and
PMU RAMs
4. Memory & Peripheral Protection Units provide
functional isolation
5. CCF coverage by clock, voltage, and temperature
monitors
6. Logic Built In Self Test (LBIST) for checkers &
monitors at power-on
– Peripherals coverage by end-to-end software
protocols
7. Software Test Library (STL) for GIC, interconnect,
SLCRs & error injection
Page 49 © Copyright 2017 Xilinx
.
Functional Safety Design Support and Artifacts
DDR2
128 bit @ 250 MHz
Display
changes
Memory Output
Released “flat” version first
– Two decoders Partial Reconfiguration Region
per channel
128 bit @ 250 MHz
Decoder
Expanded functionality with existing 64 bits @ [MPEG2, DV, VC3, ProRes 422]
250 MHz DDR2
hardware 32 bits @ 125 MHz
RM_allc RM_orig
RM_gray RM_rev
RP_pos
RP_color
RM_sepia
implemented
implement implemented
implement
Page 54
© Copyright 2017 Xilinx
.
Models in the Cloud for who would like to connect the Edges
and the Cloud processing systems
Page 60
© Copyright 2017 Xilinx
.
Contact
Giulio.Corradi@Xilinx.com
Put in the email header ECRTS17 (otherwise you will be Bayesian filtered)
Expect delayed response, if none within reasonable time insist, if still none probably you have been filtered
http://www.pynq.io/ (How to get a full many core Zynq based Python enabled framework)
https://github.com/Xilinx/HLx_Examples (Highly interesting HLS examples)
– Memcached implementation in HLS fully in hardware
– TCP/IP implementation in HLS fully in hardware
– Video streams
– Matrix multiplications offloaded in hardware
https://www.xilinx.com/products/design-tools/software-zone/sdsoc.html
– Full in software environment for what if analysis and performance measurement
facebook.com/XilinxInc
twitter.com/XilinxInc
youtube.com/XilinxInc
linkedin.com/company/Xilinx
plus.google.com/+Xilinx