How Computer Architecture Trends May Affect Future Distributed Systems
Mark D. Hill
• Motivation
• Internet Components
– Clients -- mobile, wireless
– “On Ramp” -- LANs/DSL/Cable Modems
– WAN Backbone -- IPv6, massive BW
– and ...
• SERVICES
– Scale Storage
– Scale Bandwidth
– Scale Computation
– High Availability
• Motivation
[Figure: processors and memory on a memory interconnect, connected through a memory bridge to an i/o bus]
• Queue pair (QP) setup system call
– Connect with process
– Connect with remote QP (not shown here)
• QP placed in “pinned” virtual memory
• User directly accesses QP
– E.g., sends, receives & remote DMA reads/writes
[Figure: HCA and main memory holding queue pairs (send1/receive1, send2/receive2), with DMA read and write steps]
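The queue-pair idea above can be sketched as a toy model. This is not the InfiniBand API: `QueuePair`, `post_send`, `post_receive`, `connect`, and `adapter_step` are hypothetical names chosen for illustration, and the sketch ignores memory pinning, remote DMA, and completion reporting. It only shows the split between a one-time setup step and user-level posting of work afterwards.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueuePair:
    """Toy queue pair: one send queue and one receive queue (hypothetical model)."""
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)
    remote: Optional["QueuePair"] = None  # filled in once, at setup time

    def post_send(self, payload: bytes) -> None:
        # User code enqueues work directly; no system call on this path.
        self.send_queue.append(payload)

    def post_receive(self, size: int) -> None:
        # User code posts an empty buffer for a future incoming message.
        self.receive_queue.append(bytearray(size))


def connect(a: QueuePair, b: QueuePair) -> None:
    """Stands in for the one-time setup system call that links two QPs."""
    a.remote, b.remote = b, a


def adapter_step(qp: QueuePair) -> Optional[bytes]:
    """Stands in for the adapter: move one pending send into a posted remote buffer."""
    if not qp.send_queue or qp.remote is None or not qp.remote.receive_queue:
        return None  # nothing to send, not connected, or no buffer posted
    payload = qp.send_queue.popleft()
    buffer = qp.remote.receive_queue.popleft()
    if len(payload) > len(buffer):
        raise ValueError("posted receive buffer is too small")
    buffer[: len(payload)] = payload
    return bytes(buffer[: len(payload)])
```

In this model the only "privileged" step is `connect`; every later `post_send` and `post_receive` is an ordinary user-level queue operation, which is the property the slide highlights.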
• Roadmap
– NGIO/FIO merger in ‘99
– Spec in ‘00
– Products in ‘03-’10
• My Assessment
– PCI needs successor
– InfiniBand has the necessary features (but also many others)
– InfiniBand has considerable industry buy-in (but it is recent)
– Gigabit Ethernet will be the only competitor
• Good name with backing from Cisco et al.
• But TCP/IP is a killer
– InfiniBand for storage will be key
• Motivation
• Use
– PCs -- cheap but small
– Workgroup servers -- medium cost; medium size
– Large servers -- premium cost & size
[Figure: processors and memories attached to an interconnection network]
[Figure: P1:GETX and P2:GETX requests reach Mem, P0, P1, and P2 over an Address Network; data moves over a separate Data Network]
[Figure: P1:GETX and P2:GETX requests are sent to Dir/Mem; data moves among Dir/Mem, P0, P1, and P2 over a Data Network]
• But
– Cache-to-cache transfers common in demanding apps
(55-62% sharing misses for OLTP [Barroso ISCA ‘98])
– Many applications can’t use 100s of processors
– Must also “scale down” well
• On cache miss
– Predict "multicast mask" (e.g., bit vector of processors)
– Issue transaction on multicast address network
• Networks
– Address network that totally-orders address multicasts
– Separate point-to-point data network
• Processors snoop all incoming transactions
– If it's your own, it "occurs" now
– If another's, then invalidate and/or respond
• Simplified directory (at memory)
– Purpose: Allows masks to be wrong (explained later)
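The role of the simplified directory can be sketched as follows. The slides do not say how a wrong mask is handled, so this sketch assumes the directory rejects an insufficient mask and reports the mask that was needed, after which the requester retries. The names `SimplifiedDirectory`, `check_getx`, and `getx` are hypothetical, and the model is single-threaded, so one retry always succeeds here.

```python
def bit(processor: int) -> int:
    """Bit-vector mask with only the given processor's bit set."""
    return 1 << processor


class SimplifiedDirectory:
    """Tracks, per block, which processors may hold a copy (hypothetical model)."""

    def __init__(self):
        self.holders = {}  # block address -> bit vector of processors with a copy

    def check_getx(self, addr, requester, mask):
        """Return (ok, needed): ok is True if mask covers every other holder."""
        needed = self.holders.get(addr, 0) & ~bit(requester)
        ok = (mask & needed) == needed
        if ok:
            # Exclusive request succeeded: the requester is now the only holder.
            self.holders[addr] = bit(requester)
        return ok, needed


def getx(directory, addr, requester, predicted_mask):
    """Issue a GETX with a predicted mask; return how many multicasts it took."""
    ok, needed = directory.check_getx(addr, requester, predicted_mask)
    if ok:
        return 1
    # Assumed recovery: retry with the mask the directory reported.
    directory.check_getx(addr, requester, needed)
    return 2
```

A correct prediction costs one multicast to a subset of processors; a wrong one costs a retry, which is why predictor accuracy matters for this design.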
• Techniques
– Many straightforward cases (e.g., stack, code,
space-sharing)
– Many options (network load, PC, software, local/global)
• Address Network
– Must create the illusion of total order of multicasts
– May deliver a multicast to destinations at different times
• Wish List
– High throughput for multicasts
– No centralized bottlenecks
– Low latency and cost (~ pipelined broadcast tree)
– ...
• Sample Solutions
– Isotach Networks [Reynolds et al., IEEE TPDS 4/97]
– Indirect Fat Tree [ISCA ‘99]
– Direct Torus
(C) 2000 Mark D. Hill PODC00: Computer Architecture Trends
Indirect Fat Tree, cont.
• Basic Idea
– Processors send transactions up to roots
– Roots send transactions down with logical timestamp
– Switches stall transactions to keep in order
– Null transaction sent to avoid deadlock
• Assessment
– Viable & high cross-section bandwidth
– Many "backplane" ASICs means higher cost
– Often stalls transactions
• Want
– Lower cost of direct connections
– Always deliver transactions as soon as possible (ASAP)
– Sacrifice some cross-section bandwidth
Direct 2-D Torus (work in progress)
• Features
– Each processor is a switch
– Switches directly connected
– E.g., network of Compaq 21364
[Figure: 2-D torus of processor nodes numbered 0 through 15]
• Network order?
– Broadcasts unordered
– Snooping needs total order
• Solution
– Create order with logical timestamps
instead of network delivery order
– Called Timestamp Snooping [ASPLOS ‘00]
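To make "each processor is a switch" concrete, the sketch below computes the four direct neighbors of a node in a 2-D torus. The 4x4 size and row-major numbering are assumptions (the slide's figure only shows node labels running from 0 to 15), and `torus_neighbors` is a hypothetical helper name.

```python
def torus_neighbors(node, width=4, height=4):
    """Return the four directly connected neighbors of a node in a 2-D torus.

    Assumes row-major numbering: node = row * width + col.
    Links wrap around at the edges, which is what makes the mesh a torus.
    """
    row, col = divmod(node, width)
    return {
        "north": ((row - 1) % height) * width + col,
        "south": ((row + 1) % height) * width + col,
        "west": row * width + (col - 1) % width,
        "east": row * width + (col + 1) % width,
    }
```

Because a broadcast reaches different nodes over paths of different lengths, arrival order differs from node to node, which is why delivery order alone cannot give snooping its total order.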
• Timestamp Snooping
– Snooping with order determined by logical timestamps
– Broadcast (not multicast) in ASPLOS ‘00
• Basic Idea
– Assign timestamp to coherence transactions at sender
– Broadcast transactions over unordered network ASAP
– Transactions carry timestamp (2 bits)
– Processors process transactions in timestamp order
• Other
– Priority queue at processor to order transactions
– Flow control and buffering issues
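The per-processor priority queue can be sketched as below. This is a minimal illustration, not the ASPLOS ‘00 design: it uses unbounded integer timestamps rather than 2-bit ones, breaks ties by sender id, and takes a `safe_time` argument standing in for whatever mechanism tells a processor that no earlier-stamped transaction can still arrive. `TimestampOrderer`, `receive`, `release`, and `safe_time` are hypothetical names.

```python
import heapq


class TimestampOrderer:
    """Reorders transactions that arrive in any order into timestamp order."""

    def __init__(self):
        self._heap = []  # min-heap of (timestamp, sender, transaction)

    def receive(self, timestamp, sender, transaction):
        # Transactions may arrive in any order from the unordered network.
        heapq.heappush(self._heap, (timestamp, sender, transaction))

    def release(self, safe_time):
        """Pop every queued transaction stamped at or before safe_time, in order."""
        ready = []
        while self._heap and self._heap[0][0] <= safe_time:
            ready.append(heapq.heappop(self._heap))
        return ready
```

Two processors that receive the same transactions in different arrival orders release them in the same order, which is the total order snooping needs.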
Initial Multifacet Results
• Multicast Snooping
– What program property are mask predictors exploiting?
– Why is there no good model of locality
or the “90-10” rule in general?
– How does one build multicast networks?
– What about fault tolerance?
• Timestamp Snooping
– What is an optimal network topology?
– What about buffering, deadlock, etc.?
– Implementing switches and priority queues?
• Motivation
[Figure: four SMPs]
Multiprocessor Servers, cont.
• Traditionally
– Good error isolation
– Poor communication performance (especially latency)
– LANs are not optimized for clusters
• Clusters
– High communication performance
• Servers
– Better error isolation
– Multi-box solutions
• Use same hardware & configure in the field
• Issues
– How do we model these hybrids?
– Should PODC & SPAA also converge?
Three Questions