Networks: InfiniBand
IBA
• The InfiniBand Architecture (IBA) is an
industry-standard architecture for server I/O
and inter-server communication.
– Developed by the InfiniBand Trade Association (IBTA).
• It defines a switch-based, point-to-point
interconnection network that enables
– High-speed
– Low-latency
communication between connected devices.
InfiniBand uses RDMA-based communication
RDMA – How Does it Work
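[Figure: two hosts, one in Rack 1 and one in Rack 2, each drawn as Application (user), OS (kernel), and HCA (hardware). The TCP/IP path copies Buffer 1 through the kernel and OS buffers on each host; the RDMA path moves it between the application buffers through the HCAs.]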
Mellanox Training Center Training Material
• Infiniband architecture overview
IB Architecture Layers
Transport:
- Delivers packets to the appropriate Queue Pair (QP); message assembly/de-assembly (segmentation and reassembly, SAR), access rights, etc.
Network:
- How packets are routed between different partitions/subnets (inter-subnet routing)
Data Link (symbols and framing):
- From source to destination on the same partition subnet (LID-based L2 switching)
- Flow control (credit-based); how packets are routed
Physical:
- Signal levels and frequency, media, connectors
[Figure: the layer stack drawn across End Node, Switch (L2), and End Node, with labels for IBA operations, messages, Queue Pairs, SAR, inter-subnet routing, LID-based L2 switching, packet relay, encoding, and media access control.]
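The credit-based flow control named for the link layer can be illustrated with a toy model. This is only a sketch: the class `CreditedLink` and its methods are hypothetical names, and one credit per packet is a simplification of how real links account for buffer space.

```python
from collections import deque


class CreditedLink:
    """Toy model: the sender may transmit only while it holds receiver credits."""

    def __init__(self, receiver_buffers: int):
        # Assumption: one credit corresponds to one free receive buffer.
        self.credits = receiver_buffers
        self.rx_buffer = deque()

    def send(self, packet) -> bool:
        """Try to send; return False when the sender has to wait for credits."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_drain(self):
        """Receiver consumes one packet and returns a credit to the sender."""
        packet = self.rx_buffer.popleft()
        self.credits += 1
        return packet


link = CreditedLink(receiver_buffers=2)
print([link.send(p) for p in ("p0", "p1", "p2")])  # third send must wait
link.receiver_drain()                               # frees one buffer
print(link.send("p2"))                              # now succeeds
```

Because the sender never transmits without a credit, the receiver's buffers cannot overflow, so packets are not dropped for lack of buffer space.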
                         Ethernet                     InfiniBand
Commonly used in         Local area network (LAN)     Interprocess communication
what kinds of network    or wide area network (WAN)   (IPC) network
Transmission medium      Copper/optical               Copper/optical
Bandwidth                1Gb/10Gb                     2.5Gb~120Gb
Latency                  High                         Low
Popularity               High                         Low
Cost                     Low                          High
InfiniBand Components Overview
InfiniBand Devices
Host Channel Adapter (HCA)
• Device that terminates an IB link,
executes transport-level functions, and
supports the verbs interface
Switch
• A device that moves packets from one
link to another of the same IB Subnet
Router
• A device that transports packets
between different IBA subnets
Bridge/Gateway
• InfiniBand to Ethernet
IBA Subnet
Endnodes
• IBA endnodes are the ultimate sources and
sinks of communication in IBA.
– They may be host systems or devices.
• Ex. network adapters, storage subsystems, etc.
Links
• IBA links are bidirectional point-to-point
communication channels, and may be either
copper or optical fibre.
– The base signalling rate on all links is 2.5 Gbaud.
• Link widths are 1X, 4X, and 12X.
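The link widths scale the base signalling rate linearly, which a few lines of arithmetic make concrete. This sketch assumes 8b/10b line encoding (8 data bits per 10 transmitted bits) at the base rate; the function name `link_rates` is illustrative only.

```python
# Base signalling rate per lane, as stated above.
BASE_GBAUD = 2.5
# Assumption: 8b/10b encoding, so 8 of every 10 transmitted bits carry data.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(width: int) -> tuple[float, float]:
    """Return (signalling rate, data rate) in Gb/s for a link of the given width."""
    if width not in (1, 4, 12):
        raise ValueError("IBA link widths are 1X, 4X, and 12X")
    signalling = BASE_GBAUD * width
    return signalling, signalling * ENCODING_EFFICIENCY


for width in (1, 4, 12):
    signalling, data = link_rates(width)
    print(f"{width}X: {signalling:g} Gb/s signalling, {data:g} Gb/s data")
```

Under these assumptions a 4X link signals at 10 Gb/s and carries 8 Gb/s of data.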
Channel Adapter
• Channel Adapter (CA) is the interface between
an endnode and a link
• There are two types of channel adapters
– Host channel adapter(HCA)
• For inter-server communication
• Has a collection of features that are defined to be
available to host programs, defined by verbs
– Target channel adapter(TCA)
• For server IO communication
• No defined software interface
Addressing
• LIDs
– Local Identifiers, 16 bits
– Used within a subnet by switches for routing
– Dynamically assigned at runtime
• GUIDs
– Globally Unique Identifiers
– Assigned by vendor (just like a MAC address)
– 64-bit EUI-64 IEEE-defined identifiers for elements in a subnet
• GIDs
– Global IDs, 128 bits (same format as IPv6)
– Used for routing across subnets
GID: Routing across subnets
GID - Global Identifier
Usage
• A 128 bit field in the Global Routing Header (GRH) used to route packets between different IB
subnets
• Identifies multicast groups and ports for IB and IPoIB
Structure
• GUID: 64-bit identifier provided by the manufacturer
• IPv6-type header
• Subnet Prefix: a 0- to 64-bit identifier used to uniquely identify a set of end-ports which are managed by a common Subnet Manager
port GUID: 0x0002c90300455fd1
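Since a GID has the same 128-bit format as an IPv6 address, it can be sketched as a subnet prefix in the upper 64 bits and the port GUID in the lower 64 bits. The example below uses the port GUID shown above and assumes the default link-local prefix `0xfe80000000000000`; a Subnet Manager may assign a different prefix, and `make_gid` is a hypothetical helper name.

```python
import ipaddress

# Assumption: default (link-local) subnet prefix; the Subnet Manager may assign another.
DEFAULT_SUBNET_PREFIX = 0xFE80000000000000


def make_gid(port_guid: int,
             subnet_prefix: int = DEFAULT_SUBNET_PREFIX) -> ipaddress.IPv6Address:
    """Combine a 64-bit subnet prefix and a 64-bit port GUID into a 128-bit GID."""
    if not (0 <= port_guid < 2**64 and 0 <= subnet_prefix < 2**64):
        raise ValueError("subnet prefix and GUID must each fit in 64 bits")
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)


gid = make_gid(0x0002C90300455FD1)
print(gid.exploded)  # fe80:0000:0000:0000:0002:c903:0045:5fd1
```

Printing the GID in IPv6 notation makes the two halves easy to read: the first four groups are the subnet prefix and the last four are the port GUID.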
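[Figure: a subnet of nodes, each running an Agent, with LIDs such as 0x0001 and 0x0009. A flow chart for a packet addressed to LID 0x0009 asks whether the destination has been reached; the remaining labels are "Yes", "No", "Send Packet", and "Wait for Credits".]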
InfiniBand Routing
Linear Forwarding Table Establishment (Path Establishment)
The best path is chosen using the Shortest Path First (SPF) algorithm.
[Figure: example fabric with nodes labelled LID 72, LID 75, and LID 82.]
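To illustrate how shortest paths turn into per-switch forwarding tables, here is a minimal sketch that uses breadth-first search over hop counts as a stand-in for SPF. The topology, node names, port numbers, and the mapping of LIDs 72, 75, and 82 to hosts are all invented for illustration; a real Subnet Manager uses more elaborate routing engines.

```python
from collections import deque

# Hypothetical fabric: node -> {port number: neighbour}.
TOPOLOGY = {
    "sw1": {1: "hostA", 2: "sw2", 3: "sw3"},
    "sw2": {1: "sw1", 2: "sw3", 3: "hostB"},
    "sw3": {1: "sw1", 2: "sw2", 3: "hostC"},
    "hostA": {1: "sw1"},
    "hostB": {1: "sw2"},
    "hostC": {1: "sw3"},
}
# Hypothetical LID assignment for the end nodes.
LIDS = {"hostA": 72, "hostB": 75, "hostC": 82}


def hop_counts(topology, destination):
    """Breadth-first search: number of hops from every node to the destination."""
    dist = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbour in topology[node].values():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def build_lfts(topology, lids):
    """For each switch, map destination LID -> output port on a shortest path."""
    switches = [node for node in topology if node not in lids]
    lfts = {sw: {} for sw in switches}
    for host, lid in lids.items():
        dist = hop_counts(topology, host)
        for sw in switches:
            # Choose the port whose neighbour is closest to the destination.
            lfts[sw][lid] = min(topology[sw], key=lambda p: dist[topology[sw][p]])
    return lfts


for switch, table in build_lfts(TOPOLOGY, LIDS).items():
    print(switch, table)
```

Each resulting table is "linear" in the sense that a switch only needs to index it by destination LID to find the output port.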
Send and Receive
[Figure sequence: a Process and a Remote Process connected through the Fabric, each with a Queue Pair (QP, with Send and Recv queues) and a Completion Queue (CQ). The frames show, in order: the idle QPs and CQs; a WQE posted to a QP on each side; a data packet crossing the fabric from a Send queue to a Recv queue; a CQE placed on the CQ on each side.]
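The Send and Receive sequence can be mimicked with a toy model of queue pairs. This is a sketch of the semantics only, not the verbs API: `QueuePair`, `post_send`, `post_recv`, and `fabric_deliver` are hypothetical names, and the "fabric" is a plain function call.

```python
from collections import deque


class QueuePair:
    """Toy QP: a send queue, a receive queue, and a completion queue (CQ)."""

    def __init__(self, name: str):
        self.name = name
        self.send_queue = deque()  # WQEs describing data to send
        self.recv_queue = deque()  # WQEs describing buffers to fill
        self.cq = deque()          # CQEs reporting completed work

    def post_send(self, data: bytes):
        self.send_queue.append(data)

    def post_recv(self, buffer: bytearray):
        self.recv_queue.append(buffer)


def fabric_deliver(sender: QueuePair, receiver: QueuePair):
    """Carry one message across the 'fabric' and add a CQE on each side."""
    if not sender.send_queue:
        raise RuntimeError("sender has no posted send WQE")
    if not receiver.recv_queue:
        raise RuntimeError("receiver has no posted receive WQE")
    if len(sender.send_queue[0]) > len(receiver.recv_queue[0]):
        raise ValueError("receive buffer is too small for the message")
    data = sender.send_queue.popleft()      # consume the send WQE
    buffer = receiver.recv_queue.popleft()  # consume the receive WQE
    buffer[:len(data)] = data
    sender.cq.append(("send complete", len(data)))
    receiver.cq.append(("recv complete", len(data)))


process = QueuePair("process")
remote = QueuePair("remote process")
rx = bytearray(16)
remote.post_recv(rx)        # the receiver must post a buffer first
process.post_send(b"hello")
fabric_deliver(process, remote)
print(bytes(rx[:5]), process.cq[0], remote.cq[0])
```

The point of the model is that both sides take part: the receiver has to post a WQE before the data arrives, and both sides see a CQE afterwards.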
RDMA Read / Write
[Figure sequence: the same Process and Remote Process, with a Target Buffer on the remote side. The frames show, in order: the idle QPs and CQs; a WQE posted on one side; the Read / Write accessing the Target Buffer; a data packet crossing the fabric; a single CQE.]
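RDMA Read / Write differs from Send and Receive in that only the initiating side posts work, which a second toy model can show. All names here (`Endpoint`, `register`, `rdma_write`, `rdma_read`) are hypothetical, the integer `rkey` is a simplified stand-in for the remote access key, and the assumption that only the initiator receives a CQE is stated in the code.

```python
from collections import deque


class Endpoint:
    """Toy endpoint: a completion queue plus memory regions registered by key."""

    def __init__(self):
        self.cq = deque()
        self.regions = {}  # rkey -> registered bytearray (the target buffer)

    def register(self, rkey: int, buffer: bytearray):
        self.regions[rkey] = buffer


def _region(target: Endpoint, rkey: int, offset: int, length: int) -> bytearray:
    """Check access rights and bounds before touching remote memory."""
    region = target.regions.get(rkey)
    if region is None:
        raise PermissionError("no memory region registered under this rkey")
    if offset < 0 or length < 0 or offset + length > len(region):
        raise ValueError("access outside the registered region")
    return region


def rdma_write(initiator: Endpoint, target: Endpoint, rkey: int, offset: int, data: bytes):
    region = _region(target, rkey, offset, len(data))
    region[offset:offset + len(data)] = data
    # Assumption: only the initiator is notified; the target's CQ stays empty.
    initiator.cq.append(("rdma write complete", len(data)))


def rdma_read(initiator: Endpoint, target: Endpoint, rkey: int, offset: int, length: int) -> bytes:
    region = _region(target, rkey, offset, length)
    data = bytes(region[offset:offset + length])
    initiator.cq.append(("rdma read complete", length))
    return data


process, remote = Endpoint(), Endpoint()
remote.register(rkey=7, buffer=bytearray(16))
rdma_write(process, remote, rkey=7, offset=4, data=b"data")
print(rdma_read(process, remote, rkey=7, offset=4, length=4))
print(len(process.cq), len(remote.cq))
```

The remote process posts nothing and sees no completion in this model; it only has to register the target buffer in advance.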