Interconnection Networks

Ned Nedialkov

McMaster University
Canada

CS/SE 4F03
January 2016

Outline

Shared-memory interconnects

Distributed-memory interconnects

Bisection width

Bandwidth

Hypercube

Indirect interconnects

From TOP 500



Shared-memory interconnects

- Most widely used are buses and crossbars
- Bus
  - Devices are connected to a common bus
  - Low cost and flexibility
  - Increasing the number of processors decreases performance, since the wires are shared
- As the number of processors increases, buses are replaced by switched interconnects
- Switched interconnects use switches to control the routing



Crossbar

[Figure: a 4 x 4 crossbar connecting processors P1-P4 to memories M1-M4, with panels (a)-(c) and switch settings (i), (ii)]

- Lines: bidirectional links; squares: cores or memory; circles: switches
- Allows simultaneous communication between nodes
- Faster than buses
- Cost of switches and links is relatively high

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011



Distributed-memory interconnects

- Direct interconnects: each switch is connected to a processor-memory pair
- (a) Ring: each switch has 3 links; p processors give 2p links
- (b) Toroidal mesh: each switch has 5 links; p processors give 3p links (see the count after the figure)

[Figure: (a) a ring and (b) a two-dimensional toroidal mesh; processors labelled P1, P2, P3, ...]

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011
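
A quick sanity check of these totals (a sketch, assuming each switch has exactly one link to its processor-memory pair and each switch-to-switch link is counted once):

  ring (p nodes):           p switch-to-switch links + p processor links = 2p
  toroidal mesh (p nodes):  2p switch-to-switch links + p processor links = 3p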



- IBM’s Blue Gene/L and Blue Gene/P, Cray XT3: three-dimensional torus
- IBM’s Blue Gene/Q: five-dimensional torus
- Fujitsu K: six-dimensional torus interconnect called Tofu



Bisection width
- The minimum number of links that must be removed to split the nodes into two equal halves
- The bisection width is a measure of how many communications can be done simultaneously between the two halves
- (a) 2, (b) 2 (see the figure below; a brute-force check of the definition is sketched after it)
[Figure: two bisections, (a) and (b); nodes are labelled A or B according to their half]

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011
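
The definition can be checked by brute force on small networks (an illustrative sketch, not from the slides; the ring example and the helper name bisection_width are assumptions): try every split of the nodes into two equal halves and take the minimum number of links crossing a split.

```python
# Brute-force bisection width: minimal illustrative sketch, not from the slides.
from itertools import combinations

def bisection_width(nodes, links):
    """Minimum number of links crossing any split of the nodes into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    widths = []
    for group_a in combinations(nodes, half):
        a = set(group_a)
        # count links with one endpoint in each half
        widths.append(sum(1 for (u, v) in links if (u in a) != (v in a)))
    return min(widths)

# Assumed example: a ring of 8 nodes has bisection width 2
p = 8
ring = [(i, (i + 1) % p) for i in range(p)]
print(bisection_width(range(p), ring))  # prints 2
```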



Example

- Two-dimensional toroidal mesh with p = q^2 nodes
- Remove the middle horizontal links and the middle wraparound links
- Bisection width: 2q = 2√p (a numeric instance follows the figure)

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011
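
As a concrete instance (illustrative numbers): with q = 4 the mesh has p = 16 nodes, and the bisection removes 2q = 8 = 2√16 links.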



Bandwidth

- Bandwidth is the rate at which data is transmitted
- Bisection bandwidth:
  (bisection width) × (bandwidth of a link)
  (example below)
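
For example (illustrative numbers): a network with bisection width 2 and 1 GB/s links has bisection bandwidth 2 × 1 GB/s = 2 GB/s.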



Hypercube

[Figure: hypercubes of dimension (a) 1, (b) 2, (c) 3]

- A hypercube of dimension d has p = 2^d nodes (a construction sketch follows the figure credit)
- Bisection width is p/2

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011
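
A small construction sketch (an illustration, not from the slides; the function name hypercube_links is assumed): the nodes are the integers 0 .. 2^d - 1, and two nodes are linked when their labels differ in exactly one bit.

```python
# Hypercube construction: minimal illustrative sketch, not from the slides.
def hypercube_links(d):
    """Links of a d-dimensional hypercube on nodes 0 .. 2^d - 1."""
    p = 2 ** d
    links = []
    for u in range(p):
        for bit in range(d):
            v = u ^ (1 << bit)      # neighbor: flip one bit of the label
            if u < v:               # list each link only once
                links.append((u, v))
    return links

d = 3
links = hypercube_links(d)
print(2 ** d, len(links))   # 8 nodes, 12 links (each node has d = 3 links)
# Feeding these links to the brute-force bisection_width sketch above gives p/2 = 4.
```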



Indirect interconnects

- The switches may not be connected directly to processors
- Crossbar with unidirectional links:

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011



Omega network

- The switches are 2 × 2 crossbars
- 2p log2 p switches, versus p^2 switches in a crossbar (a quick count for p = 8 follows the figure)

Figure from P. Pacheco, An Introduction to Parallel Programming, 2011
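
As a quick check of these counts: for p = 8, an omega network uses 2 · 8 · log2 8 = 48 switches, while an 8 × 8 crossbar uses 8^2 = 64.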



Blue Gene: 3D Torus



From TOP 500

- http://www.top500.org/list/2013/11/, November 2013
- Performance is measured with the Linpack benchmark
- Petaflops (Pflops): 10^15 floating-point operations per second



no.  supercomputer                                              Pflops       cores
1    Tianhe-2, National University of Defense                    33.86   3,120,000
     Technology, China
2    Titan, Cray XK7, Department of Energy (DOE)                 17.59     560,640
     Oak Ridge National Laboratory, USA
3    Sequoia, IBM BlueGene/Q, Lawrence Livermore                 17.17   1,572,864
     National Laboratory, USA
4    Fujitsu K, RIKEN Advanced Institute for                     10.51     705,024
     Computational Science (AICS), Japan
5    Mira, BlueGene/Q, Argonne National Laboratory, USA           8.59     786,432



no.  supercomputer                                              Pflops       cores
6    Piz Daint, Cray XC30, Swiss National                         6.27     115,984
     Supercomputing Centre (CSCS), Switzerland
89   BGQ - BlueGene/Q, Southern Ontario Smart Computing           0.36      32,768
     Innovation Consortium / University of Toronto



Highlights

- Tianhe-2 uses Intel Xeon Phi processors to speed up its computational rate
- No. 2 Titan and No. 6 Piz Daint use NVIDIA GPUs to accelerate computation
- A total of 53 systems on the list use accelerator/co-processor technology:
  - 38 use NVIDIA chips
  - 3 use ATI Radeon
  - 13 use Intel MIC technology (Xeon Phi)



- Intel provides the processors for 82.4% of the TOP500 systems
- 94% use processors with 6 or more cores
- 75% use processors with 8 or more cores
- Biggest users of HPC:
  1. USA
  2. China
  3. Japan

  Systems out of 500:

  USA      265
  Asia     115
  Europe   102
  UK        23
  France    22
  Germany   20



TOP500 started in June 1993


Number 1:

year      supercomputer                                      Gflops    cores
Nov/1993  CM-5/1024, Thinking Machines Corp.,                  59.7     1,024
          Los Alamos, USA
Nov/2003  Earth Simulator, NEC, Japan Agency for           35,860.0     5,120
          Marine-Earth Science and Technology


