Reliability and State Machines in an Advanced Network Testbed
Mac Newbold
School of Computing
University of Utah
MS Thesis Defense
April 5, 2004
Advisor: Prof. Jay Lepreau
Distributed Systems
• Distributed systems are complex
– Many components
– Distributed across multiple systems
• Component failures are relatively common
– But should not cause system breakdown
• “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” – Leslie Lamport, quoted in CACM, June 1992
April 5th, 2004 Mac Newbold - MS Thesis Defense 2
Our Context: Emulab
• Emulab is an advanced network testbed
• Complex time- and space-shared system
• System dynamically reconfigures nodes and
network links to create “experiments”
• Key architectural feature: Central Database
– System uses the DB for both storage and communication
• Complex system with many different scripts
and programs on clients and servers
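The central-database pattern on this slide, where separate programs share state by writing to and reading from a common database instead of messaging each other directly, can be sketched as follows. This is a minimal illustration only, not Emulab's implementation. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the `nodes` table, its columns, the state strings, and the function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the central database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (node_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    return db


def report_state(db, node_id, state):
    """A client-side program records a node's status in the DB (storage)."""
    with db:  # commits the transaction on success
        db.execute(
            "INSERT OR REPLACE INTO nodes (node_id, state) VALUES (?, ?)",
            (node_id, state),
        )


def read_state(db, node_id):
    """A server-side program learns that status by querying the DB (communication)."""
    row = db.execute("SELECT state FROM nodes WHERE node_id = ?", (node_id,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db()
    report_state(db, "pc1", "booting")  # hypothetical node name and state
    report_state(db, "pc1", "up")
    print(read_state(db, "pc1"))  # prints "up"
```

The two programs never talk to each other; the database row is the only channel, which is what makes the DB both the storage layer and the communication layer.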
Emulab Background
• First prototype in April 2000 (10 nodes)
• In production since Oct. 2000 (40 nodes)
• Early versions weren’t perfect
– Reliability problems
– Experiments of limited size
– Inefficient use of resources
• The problem is becoming harder as the testbed grows
– 200 nodes, 400 remote nodes, 2000 virtual nodes