Networking Class Notes UW Spring 2013
Say the network supports 100 Mbps. Each user subscribes to 5 Mbps and uses the n
etwork 1/2 of the time. So it looks like the network can support 20 users. But t
he chance of all 20 using their bandwidth at once is 1/2^20, so most of the band
width is wasted. We can have 30 users and most of the time they'll still get eno
ugh bandwidth given their usage. 30/20 = 1.5 is the statistical multiplexing gai
n.
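The oversubscription argument above can be checked with a binomial model, a sketch assuming each of the 30 users is independently active half the time (numbers from the notes):

```python
# Probability that bursty users overload the link: statistical multiplexing
# argument with the numbers from the notes.
from math import comb

capacity_users = 20   # 100 Mbps / 5 Mbps per active user
subscribers = 30      # oversubscribed user count
p_active = 0.5        # each user active half the time

# P(more than 20 of the 30 users are active at once), binomial model
p_overload = sum(
    comb(subscribers, k) * p_active**k * (1 - p_active)**(subscribers - k)
    for k in range(capacity_users + 1, subscribers + 1)
)
print(f"P(overload) = {p_overload:.4f}")   # about 0.02
```

So roughly 2% of the time some users get less than their 5 Mbps, which is why "most of the time they'll still get enough bandwidth".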
Content delivery: to deliver the same content to n users, instead of routing it
n times through almost the same path (different only at the end when it goes to
different clients), route it to a replica machine close to all the users, then d
istribute to each user. If the common path has m machines, the old way takes mn
hops, the new one takes m + n hops -> way more efficient.
Large networks have more economic value than small ones because of Metcalfe's law: the value of an n-node network is proportional to n^2, e.g. a 6-node network has 6*5 = 30 connections, but a 12-node network has 12*11 = 132 connections.
Host = edge device = sink
Link: 3 kinds
full duplex: bidirectional
half duplex: bidirectional but not at the same time
simplex: unidirectional, not common
Wireless links: messages are broadcast to all nodes in range
Note that when node A is sending to node B
B can't send to any other node at the same time because a node can't sen
d and receive at the same time
no other node that's connected to B can send to B because that'd interfe
re with B's reception
The Socket API: used for clients and servers to talk to each other, supports str
eams (reliable) and datagrams (unreliable)
Apps can attach to the network at different ports.
Primitive calls:
SOCKET: create new communication endpoint
SEND/RECEIVE
CONNECT: try to establish a connection
BIND: associate local address with socket
LISTEN: announce readiness to accept connection
ACCEPT: establish incoming connection
CLOSE
Example app: client and server connect, client sends request, server sends respo
nse, both disconnect.
client.socket
server.socket
server.bind
server.listen
server.accept*
client.connect*
client.send
server.receive*
server.send
client.receive*
client.close
server.close
*: blocking calls: wait for other side to do something
Program code:
Client
socket()
getaddrinfo() // translates human friendly names to addresses eg IP addr
esses
connect()
...
send()
receive()
...
close()
Server
socket()
getaddrinfo()
bind()
listen()
accept() // loops from here to end to serve many clients
...
receive()
...
send()
close()
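The call sequences above can be sketched with Python's socket module. This is a minimal loopback echo-style exchange, not a full server loop; getaddrinfo() is skipped because a literal IP is used, and the message contents are made up:

```python
# Sketch of the client/server Socket API call sequence, TCP stream sockets.
import socket
import threading

def server(sock):
    conn, addr = sock.accept()        # accept(): blocks until a client connects
    request = conn.recv(1024)         # receive()
    conn.sendall(b"response to " + request)   # send()
    conn.close()                      # close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
srv.bind(("127.0.0.1", 0))                                # bind(); port 0 = any free port
srv.listen(1)                                             # listen()
port = srv.getsockname()[1]
t = threading.Thread(target=server, args=(srv,))
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
cli.connect(("127.0.0.1", port))                          # connect() (blocking)
cli.sendall(b"request")                                   # send()
reply = cli.recv(1024)                                    # receive() (blocking)
cli.close()                                               # close()
t.join()
srv.close()
print(reply)
```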
Traceroute: looks inside network as a packet travels. Returns message from host
at each hop, thus tracing the packet.
Protocol: instances of a protocol on different nodes can talk to each other indi
rectly, by having instance of a lower layer protocol on the same nodes talk to e
ach other. The lower protocol uses yet lower protocols to talk all the way down to the physical medium layer.
Protocol stack: set of protocols used on a node.
Example: browser connected to the internet wirelessly
HTTP
TCP
IP
802.11
Encapsulation: where lower layers wrap content from higher layers and add their
own info to make a new message for delivery. Like the postal service, lower laye
rs don't look inside the contents of higher layers.
Lower layers usually add their info to the front, called the header. Sometimes t
hey also add trailers, or encrypt or compress contents, or segmentation and reas
sembly.
A host running multiple apps using multiple protocols (eg browser and Skype) wil
l need to route packets from the same lower layer protocol to the right higher l
ayer protocol, called demultiplexing. This is done with demultiplexing keys in t
he headers. The key in each layer says what the next higher protocol is, eg the
Ethernet header may point to IP instead of ARP, the IP header points to TCP inst
ead of UDP. The TCP header has the port number to point to the right app.
Advantages of layering
Info hiding and reuse: when the lower layers differ (eg Ethernet and 802.11)
the top layers can still be the same -> reuse.
Different same-layer protocols (eg Ethernet and 802.11) can talk by having a
middle device that takes off the header of one protocol and attach the header f
or the other, then pass the message along. This is a router.
Disadvantages:
Overhead, but only a few bytes, very insignificant for large messages
Hides info, eg when you need to know if you're wired or wireless.
Reference models
Physical media
Wires
Twisted pair: LANs and phones. Twisting reduces radiated signal
Cat 5 UTP (unshielded twisted pair) cable: 4 pairs of twisted ca
ble.
Coaxial cable: from the inside out: copper core, insulating material
, braided outer conductor, protective plastic covering. Faster.
Fiber: long thin pure strand of glass
multimode: shorter links, cheaper
single mode: up to 100km
Wireless
can interfere -> must moderate
How to carry digital signals through analog cables
Signals can be decomposed into a sum of harmonic frequencies using the Fourier transform.
If the bandwidth (width of frequency band - EE definition) is reduced, t
he signal is degraded until it's no longer clear if it's a 0 or a 1. We can reco
ver the signals up to a certain level of degradation.
Effects of traveling over wire on a signal
delayed
attenuated
frequencies above a cutoff are highly attenuated
noise added
Fiber: very little attenuation over very long distances
Wireless:
very fast attenuation at rate of 1/d^2 where d is distance from transmis
sion point
interference and confusion if can receive multiple signals on the same c
omputer
Spatial reuse: can use the same frequency if distance is long enough, be
cause one of the two signals would be too weak at any given place
Multipath fading: when a receiver receives 2 signals from the same trans
mitter, but one bounced off a reflector (eg filing cabinet) so it's delayed and
shifted a bit. When the two signals are added they cancel each other out -> mult
ipath fading. It's possible for reflectors to be positioned such that the reflec
ted signal is as strong as the original, so when added we have a signal twice as
strong.
Modulation: for carrying digital signals over analog physical media
Baseband modulation: for physical media like wires
NRZ (Non-Return to Zero) scheme: high voltage = 1, low = 0
Can use more signal levels, eg 4 levels to transmit 2 bits at a time
(level 4 to transmit 11, 3 for 10, 2 for 01, 1 for 00, for example)
Engineering considerations:
Clock recovery: When there's a long string of repeated bits eg 0
00000...0, it's hard for an ordinary clock to tell one 0 from another, so we wan
t to introduce some variation in the bitstream.
multiple coding schemes to make this happen, eg Manchester coding. A simple one, 4B/5B: for every possible 4-bit sequence, send an alternative 5-bit sequence with no long run of 0s, eg instead of 0000, send 11110. Combined with invert-on-1 signaling (NRZI: when there's a 1, invert the signal; on a 0 the signal stays where it is, no matter if it had been high or low), this guarantees regular transitions for clock recovery.
Passband modulation: for fiber/wireless: need higher frequencies which p
ass well through the media
amplitude shift keying: carrier at full amplitude for 1, low/zero amplitude for 0
frequency shift keying: high frequency for 1, low for 0
phase shift keying: up then down for 1, down then up for 0
Network limits: expressed in terms of bandwidth B, signal strength S and noise N at the receiver; Shannon capacity C = B log2(1 + S/N) bits/s
Add check bits to the message bits to detect and correct errors
Naive approach: send 2 copies
can detect as many errors as there are bits in the original
can't correct any (don't know which copy is wrong)
takes 2 errors (in the same bit position) to fail (2 copies matc
h but both are wrong)
takes up 50% of bandwidth
-> terrible
Send codewords: D data bits + R check bits (aka systematic block cod
e) where R = fn(D)
sender computes R check bits from the D bits
receiver recomputes R' check bits based on the D bits it receive
d. If R != R', there was an error.
There are 2^(D + R) possible codewords, but there's only 2^D cor
rect (valid) codewords -> randomly chosen codeword only has 1/(2^R) chance of be
ing a correct one.
Hamming distance: how many bits need to flip to turn D1 to D2
Hamming distance of a code: minimum distance between any pair of
codewords
Observe that if we duplicate a pair of bit sequences, their
Hamming distance doubles. eg HD(0, 1) = 1, HD(00, 11) = 2.
For a code of d + 1 distance, up to d errors can be detected
eg 000 and 111 -> d = 2. More than 2 errors means 000 be
comes 111 and vice versa -> still valid -> can't be detected.
For a code of 2d + 1 distance, up to d errors can be correct
ed by mapping to the nearest codeword.
eg 000 and 111 -> d = 1. If receiving 001, can map to 00
0.
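The 000/111 example above can be sketched directly: a distance-3 code detects up to 2 errors and corrects 1 by mapping to the nearest codeword.

```python
# Hamming distance and nearest-codeword correction for the {000, 111} code.

def hamming(a, b):
    """Number of positions where the two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

codewords = ["000", "111"]

def correct(received):
    """Map a received word to the closest valid codeword."""
    return min(codewords, key=lambda c: hamming(c, received))

print(hamming("000", "111"))   # code distance 3
print(correct("001"))          # single error maps back to "000"
```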
Error detection schemes:
Parity Bit: add 1 check bit that's the sum of the data bits (same as XORing
all the data bits)
Distance of code is 2 because if I flip 2 bits the parity bit is right a
gain.
Can detect all odd numbers of errors
Checksums: sum data in N-bit words, eg 16 bits in TCP/IP
example: in TCP/IP, the checksum for D bits is a 16 bit value made by: t
hose D bits broken up into 16-bit numbers, added up, any carry bits (excess bit
on the left) added to the right, bit flipped (one's complement)
eg to transmit 0001 f203 f4f5 f6f7
sum = 2ddf0
carry bit added = ddf2
bit flipped = 220d -> checksum
On receiving end: add received bits in groups of 16 bits, add carry bit,
flip. If result is 0, data is correct.
Distance of code: 2 (can change 0 to 1, 1 to 0 -> same sum)
Can detect max 1, can't correct any
Can detect burst errors (series of errors) of up to 16 bits. There are 2^16 possible 16-bit sequences, so a random corruption has only a 1/2^16 chance of passing the check -> much better than parity.
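The worked example above (0001 f203 f4f5 f6f7 -> 220d) can be reproduced with a short one's-complement sum:

```python
# One's-complement 16-bit checksum, reproducing the worked example above.

def internet_checksum(words):
    total = sum(words)
    while total >> 16:                       # add carry bits back on the right
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                   # flip bits (one's complement)

data = [0x0001, 0xF203, 0xF4F5, 0xF6F7]
cksum = internet_checksum(data)
print(hex(cksum))                            # 0x220d

# Receiver side: summing data + checksum folds to 0xFFFF, which is
# "negative zero" in one's complement, i.e. the data checks out.
total = sum(data) + cksum
while total >> 16:
    total = (total & 0xFFFF) + (total >> 16)
print(hex(total))
```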
Cyclic Redundancy Check (CRC): given n data bits, generate k check bits
such that the n+k bits are evenly divisible by a generator C
eg n = 302, k = 1, C = 3, check bits = 1 because 3021 % 3 = 0
Has to be done in binary -> has to use modulo 2
arithmetic
Send:
extend the n data bits with k zeros
divide by the generator value C
keep remainder, ignore quotient
adjust k check bits by remainder
Receive:
divide and check for zero remainder
Example:
n = 1101011111, k = 4, C = 10011
11010111110000
- 10011
1001
- 10011
1001
etc
10
-> remainder is 10, so check bits are 0010
Standard CRC-32 is 0x82608edb
HD = 4 -> can detect up to 3 bit errors
can catch all odd numbers of errors
can catch bursts of up to k bits in error
not vulnerable to systematic errors like checksum
CRCs are widely used on links: Ethernet, wifi, ADSL, cable
Checksums used in Internet: IP, TCP, UDP
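The mod-2 long division above (data 1101011111, generator 10011, k = 4 -> check bits 0010) can be sketched as repeated XOR:

```python
# CRC check bits via mod-2 (XOR) long division, reproducing the example above.

def crc_check_bits(data, gen):
    k = len(gen) - 1
    dividend = [int(b) for b in data] + [0] * k   # extend data with k zeros
    for i in range(len(data)):
        if dividend[i] == 1:                      # XOR the generator in wherever
            for j, g in enumerate(gen):           # the leading bit is 1
                dividend[i + j] ^= int(g)
    return "".join(map(str, dividend[-k:]))       # remainder = check bits

print(crc_check_bits("1101011111", "10011"))      # 0010
```

The receiver runs the same division over data + check bits and accepts when the remainder is zero.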
Error correction
For a code of HD 2d+1, with <= d errors, can correct by mapping to closest c
odeword
Hamming code: method for constructing a code with a distance of 3.
with k check bits: codeword length n = 2^k - 1, data bits = 2^k - k - 1
put check bit in positions p that are powers of 2, starting with 1.
at position p, check bit is parity of positions with a p term in their v
alues (ie the pth digit in their binary expression is turned on)
Example: data = 0101, 3 check bits
7 bit code -> check bits at 1, 2 and 4
__0_101
at 1, covers parity of positions 1, 3 (1+2), 5 (1+4), 7 (1 + 2 + 4)
-> par(0, 1, 1) = 0
at 2, covers parity of positions 2, 3, 6, 7 -> par(0, 0, 1) = 1
at 4, covers parity of positions 4, 5, 6, 7 -> par(1, 0, 1) = 0
-> 0100101
To decode:
recompute check bits (with parity sum including the check bit)
arrange as binary number
value (aka syndrome) tells error position
0 means no error
otherwise, flip bit to correct
Example: 0100101
p1 = par(0011) = 0
p2 = par(1001) = 0
p3 = par(0101) = 0
syndrome = 000 -> no error
data = 0101
if error: 0100111
p1 = par(0011) = 0
p2 = par(1011) = 1
p3 = par(0111) = 1
syndrome = 110 = 6 -> 6th bit is wrong -> flip 6th bit -> data 0
101
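The encode/decode procedure above, sketched for the Hamming (7,4) case used in the example (data 0101 -> codeword 0100101; an error in bit 6 gives syndrome 6):

```python
# Hamming (7,4) encode/decode, reproducing the example above.

def parity(bits):
    return sum(bits) % 2

def encode(data):                       # data bits land in positions 3, 5, 6, 7
    d3, d5, d6, d7 = data
    p1 = parity([d3, d5, d7])           # position 1 covers 1, 3, 5, 7
    p2 = parity([d3, d6, d7])           # position 2 covers 2, 3, 6, 7
    p4 = parity([d5, d6, d7])           # position 4 covers 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def decode(code):
    c = [None] + code                   # 1-indexed positions
    s1 = parity([c[1], c[3], c[5], c[7]])
    s2 = parity([c[2], c[3], c[6], c[7]])
    s4 = parity([c[4], c[5], c[6], c[7]])
    syndrome = 4 * s4 + 2 * s2 + s1     # 0 = no error, else error position
    if syndrome:
        c[syndrome] ^= 1                # flip the bad bit to correct
    return [c[3], c[5], c[6], c[7]], syndrome

code = encode([0, 1, 0, 1])
print(code)                             # [0, 1, 0, 0, 1, 0, 1]
print(decode([0, 1, 0, 0, 1, 1, 1]))    # bit 6 flipped -> syndrome 6, data 0101
```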
Other correction codes:
Convolutional codes
Low density parity check: state of the art
Is detection or correction better?
depends on error pattern
eg 1000 bit messages, error rate 1/10000
correction: need 10 check bits per message
can't use CSMA/CD in wireless because nodes can't hear while sending -> they keep sending and collisions keep happening for a long time undetected.
potential solution: Multiple Access with Collision Avoidance (MACA):
sender sends Request to Send (RTS) first, receiver sends Clear to Send (CTS) ba
ck. CTS states how long the packet is, so all nodes hearing the CTS can stay sil
ent.
in A B C scenario, A sends RTS to B, B sends CTS to A and C, C k
nows B is receiving so C doesn't send -> problem solved.
in A B C D scenario, B sends RTS to A, C sends RTS to D, B gets
C's RTS and C gets B's, but they're busy sending so they won't hear it. A and D
send CTSs back, but D and A won't hear the other's CTS, so B and C are clear to
send.
Physical layer of wifi:
use 20/40 MHz channels in ISM bands that the government opened up for free u
se.
Link layer:
frame format:
destination address
source address
access point address
frames are ACKed and retransmitted with ARQ
errors are detected with CRC-32
to avoid collision, use CSMA/CA:
sender inserts small random gaps, ie instead of sending as soon as the l
ine is clear, wait a random amount of time. This way RTS/CTS are optional.
Contention-free MA:
token ring: nodes are wired in a circle and a token is passed around. The no
de with the token gets to send.
pros
prevents collisions under high load
predictable demand
cons
higher overhead at low load
token can get lost
can use time references: if token is lost for some time, assume
loss.
Switched Ethernet: modern Ethernet, has a central switch instead of all nodes co
nnecting to a wire
Devices:
Hub: physical layer, connects all hosts so any host can talk to any othe
r host. Equivalent to a shared wire.
Switch: link layer. Each host coming in branches out into as many branches as there are other nodes, each of which crosses the other nodes coming in, resulting in an n*n grid. This way independent pairs of nodes can communicate at the same time by taking different paths in the grid. The wires are bidirectional so that the links can be full duplex. Switches have buffers at either input or output to store frames when there are more frames going to the same host than the links can handle.
Hubs and switches have replaced wires because wiring to a single place i
s more convenient, and if
the hub/switch fails, we know to just replace it in
stead of hunting down the wire. They're also scalable: all ports can use max ban
dwidth instead of sharing the cable's bandwidth.
How do switches figure out what port is what address? By using backwards lea
rning: when a host sends, the switch knows its address and the port it's coming
from, so it makes a table mapping the port to the address. If it needs to forwar
d to an address whose port it doesn't know, it just sends to all ports. The wron
g ports just ignore it.
This works when multiple switches and hubs are connected together too. A
ssuming no loops, if switch 1 doesn't know the destination port, it'll broadcast
to all its nodes including any hub/switch it's connected to. Switch 2 receives
the broadcast and rebroadcasts to all its nodes, one of which is the destination
, so the frame now arrives at the destination port. So now both switches have le
arned the address associated with the source port.
If there's a loop, eg switches C and D have 2 links to each other. Say A
, which belongs to switch C, wants to send to F, which belongs to switch D. A se
nds to C, C broadcasts to D twice (because of the 2 links), D broadcasts back to
C twice, C broadcasts to D twice again as well as back to A, etc -> fail.
Solution: switches together find a spanning tree (a tree, meaning no loo
ps, that connects all switches) and treat this spanning tree as the topology.
Theory:
the switch with the lowest address is the root
grow tree as shortest distances from root (if 2 links have same
distance, use lower address one) (in practice, 2 happens at the same time as 1)
turn off ports for forwarding if they're not in the spanning tre
e
Practice
initially, all switches assume it's the root
each switch sends periodic updates to its neighbors with its add
ress, root address, and its distance in hops to the root. Format: A, C, 1 means
I'm A, the root is C, my distance to root is 1.
the switches favor ports with shortest distance to lowest root,
using lowest address to break ties.
Example: fig 1.
Network layer:
Why need one when we have the link layer? Link layer doesn't scale (every sw
itch has to maintain millions of entries in the host table), every send to an un
known links will be broadcast to millions of hosts, and different kinds of link
layers don't work together (eg wifi, Ethernet).
Routing vs forwarding:
routing: deciding which way a packet should go
forwarding: sending the actual packet according to the found route. A local action at each node -> fast.
2 kinds of services the network layer provides to the transport layer:
Datagrams aka connectionless service: each unit is self-contained, like the
post office (IP)
Virtual circuits: connection-oriented, like the phone.
Both are implemented with store-and-forward packet switching: routers receive a full packet and may store it temporarily before forwarding. Mostly they only store if there's contention for bandwidth -> we use statistical multiplexing to share the bandwidth over time
Datagrams:
Router output ports have buffering for when it gets input from several s
ources. The buffer is typically FIFO and there's a discard policy when congestio
n happens.
Each router has a forwarding table keyed by address that gives next hop
for each destination address, but may change, eg if A needs to go to C first to
go to F, the F entry in the A router has the value C. Packets may take different
routes and arrive out of order.
Each packet contains the destination address.
Easier to mask failure - just resend to different route.
Hard to add quality of service to individual packets.
VC:
3 phases: set up, data transfer and teardown. During the teardown, the r
outers along the circuit delete the circuit state.
Packets only contain a circuit identifier.
Each router has a forwarding table keyed by circuit that gives the outpu
t line and next label to place on packet as it moves along the circuit. Each ent
ry has 'in' and 'out' parts to identify where a packet is coming from and what c
onnection label to put on it next, eg
H1 1 C 5
H3 1 C 2
means if the packet comes from H1 on connection 1, send to C on conn
ection 5; if from H3, send on connection 2. Notice the network can handle multip
le connections from and to the same pair of hosts.
Application: Multi-Protocol Label Switching (MPLS): used in ISPs, set up
connections inside their backbone, adds MPLS label to IP packets at ingress, re
move at egress.
Hard to mask failure: have to replace router if failing.
Easy to add quality of service.
Internetworking
Hard because many things are different between networks: service model, addr
essing, quality of service (eg some packets are priority and need better service
), packet sizes, etc
Example: wifi datagram network to MPLS VC network to Ethernet datagram netwo
rk
IPv4 packet format:
Version (4)
IP Header Length (IHL, length of everything before payload)
Total length
Protocol: what protocol is inside the IP, eg TCP
Header checksum
Source address
Destination address (both 32 bits)
IP Addresses:
written in 'dotted quad' notation
Blocks of IP addresses share a common prefix, which is a group of bits at th
e beginning of the address.
Classful addressing (still embedded in addresses but ignored):
Class A: start with 0, prefix 8 -> 128 blocks (126 usable), each supporting 2^24 hosts
Class B: start with 10, prefix 16
Class C: start with 110, prefix 24
Routers have a table with prefixes. The prefixes may overlap. The longest ma
tching prefix, ie most specific (fewest addresses in the block), wins.
Example: router has 2 entries:
    prefix          next hop
    192.24.0.0/18   D   # from 192.24.0.0 to 192.24.63.255 (last 16 bits 00/000000 00000000 to 00/111111 11111111)
    192.24.12.0/22  B   # from 192.24.12.0 to 192.24.15.255 (last 16 bits 000011/00 00000000 to 000011/11 11111111) -> lies within the D range, but is more specific
Addresses:
    192.24.6.0    -> last 16 bits 00000110 00000000 -> fits D but not B -> D
    192.24.14.32  -> last 16 bits 00001110 00100000 -> fits both B and D -> B because B is more specific
    192.24.54.0   -> last 16 bits 00110110 00000000 -> fits D but not B -> D
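The lookups above can be reproduced with the stdlib ipaddress module, picking the most specific matching prefix:

```python
# Longest-matching-prefix lookup over the two example table entries.
import ipaddress

table = {
    ipaddress.ip_network("192.24.0.0/18"): "D",
    ipaddress.ip_network("192.24.12.0/22"): "B",
}

def lookup(addr):
    addr = ipaddress.ip_address(addr)
    matches = [net for net in table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)   # most specific wins
    return table[best]

print(lookup("192.24.6.0"))     # D (fits /18 only)
print(lookup("192.24.14.32"))   # B (fits both; /22 is more specific)
print(lookup("192.24.54.0"))    # D
```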
Hosts also have a forwarding table. Only 2 entries needed:
network's prefix -> means on local network, so goes to that IP addre
ss directly
0.0.0.0 -> catches everything else, which is the router's job, so se
nd to router
Routers can do this too, eg send all addresses not in their network to t
heir ISP
DHCP (Dynamic Host Configuration Protocol): helps nodes get their IP addresses w
hen they just woke up (otherwise nodes only have their Ethernet address)
leases IP addresses to nodes
provides other params: network prefix, address of local router, DNS server, etc.
ICMP (Internet Control Message Protocol): error and testing messages, identified by Type/Code:
    Time Exceeded: 11/0 -> used by traceroute
    Echo Request or Reply: 8 or 0/0 -> used by ping
There's a TTL (Time to live) field in the IP header that's decremented each hop. ICMP error when it reaches 0. This protects against forwarding loops.
Traceroute works by sending messages with increasing TTL, starting at 1. Thu
s, at each subsequent step, it gets a message back from the host at that hop. Th
at's how it knows where the packets are going.
IPv6:
128 bits, 8 groups of 4 hex digits
Can write shorthand by skipping initial 0s in each group and skipping groups
of 0000 altogether
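The shorthand rules can be demonstrated with the stdlib ipaddress module (the address itself is just a documentation-prefix example):

```python
# IPv6 shorthand: drop leading zeros in each group, collapse one run of
# all-zero groups into "::".
import ipaddress

full = "2001:0db8:0000:0000:0000:0000:0000:0001"
addr = ipaddress.ip_address(full)
print(addr.compressed)      # 2001:db8::1
print(addr.exploded)        # back to the full 8-group form
```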
Will need to be compatible with IPv4 for a long time.
Solutions:
Dual stack: hosts speak both IPv4 and IPv6
Translators: translate packets back and forth
Tunnels: to cross IPv4 links that connect islands of IPv6 networks, by w
rapping the IPv6 packet
NAT boxes:
present in network devices
turns 1 external IP address into many internal ones, using one of the intern
al prefixes eg 192.168.0.0/16 or 10.0.0.0/8
works by keeping an internal/external table with IP address and ports, eg 2
internal addresses can go to different ports but same external address. Internal
ly, 2 hosts can have the same address but different ports.
the NAT box rewrites the IP header to change the internal address to the ext
ernal one for outgoing packets and vice versa for incoming.
The table is populated on a host's first attempt to send a packet. The NAT w
ill then copy down the host's internal address and port, and the external addres
s, and assign the host a new random port.
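The table mechanics above can be sketched with two dicts; the external IP, internal addresses, and the port allocator starting at 5000 are all hypothetical:

```python
# NAT translation sketch: first outgoing packet populates the table, then
# headers are rewritten in both directions.

class Nat:
    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.out_map = {}               # (internal ip, internal port) -> external port
        self.in_map = {}                # external port -> (internal ip, internal port)
        self.next_port = 5000           # hypothetical port allocator

    def outgoing(self, int_ip, int_port):
        key = (int_ip, int_port)
        if key not in self.out_map:     # first packet sets up the entry
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return (self.external_ip, self.out_map[key])

    def incoming(self, ext_port):
        # None = no entry yet, so unsolicited inbound traffic is dropped
        return self.in_map.get(ext_port)

nat = Nat("44.25.80.3")
print(nat.outgoing("192.168.1.10", 3345))   # rewritten to (external ip, 5000)
print(nat.incoming(5000))                    # rewritten back to the internal host
print(nat.incoming(9999))                    # no entry -> drop
```

This also makes the first weakness below concrete: incoming() fails until outgoing() has run once.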
Weaknesses:
traffic can't come in without first having outgoing traffic so the NAT e
ntry can be set up
apps that expose the IP address can't be used
Routing:
one way is the spanning tree algorithm, but it's inefficient because the lin
ks removed to avoid loops could be the shortest path.
Delivery models:
unicast: one sender to one receiver
broadcast: 1 sender to all receivers
multicast: 1 sender to several but not all receivers
anycast: to nearest receiver
Properties of a good routing scheme:
correctness: found paths should work
efficiency: use bandwidth well
fairness: all nodes should have the ability to send/receive
fast convergence: recover from failures quickly
scalability: as networks grow
Finding best unicast routes:
define a cost function to compute the cost of each path
factors the cost function can take in: distance, data cost, number of ho
ps, etc
subsets of the shortest path will also be shortest paths.
sink tree: union of all shortest paths to a node from all other nodes (s
ink = destination node)
useful because all those paths may converge in a few links going to
the sink, so once we get to the converging node, we don't need to care where the
packet is coming from.
Dijkstra's algorithm:
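A standard priority-queue sketch of Dijkstra's algorithm, computing shortest-path costs from a source (the graph and its weights are made up):

```python
# Dijkstra's shortest-path algorithm with a binary heap.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                     # stale heap entry, skip
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd      # found a cheaper path
                heapq.heappush(heap, (nd, neighbor))
    return dist

graph = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```

Running it from every node and keeping predecessors would give the sink tree described above.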
Hierarchical routing: to reduce size of routing tables and number of routing mes
sages as the number of hosts grow.
divide hosts into regions. routers talk to regions instead of hosts.
can be less efficient sometimes (not taking most direct route)
IP aggregation and subnetting:
routers can change the prefix without affecting hosts.
thus, we can group more specific prefixes under one less specific prefix, eg
group 3 /18 networks under 1 /16 network -> subnetting
joining multiple smaller prefixes into one large prefix is aggregation
helps keep routing table small because there's only 1 prefix to save.
Networks on the internet are connected directly or through Internet Exchange Poi
nts (IXPs)
Different networks may have different routing policies, which can cause inef
ficiency
eg two ISPs want to choose the shortest route within their network, but
when they want to talk, the shortest path in the sender ISP leads to a longer pa
th in the destination ISP. Also the other direction may not use the same path.
Example policies:
transit policies: between customers and ISPs: customers can use ISPs to
send and receive data anywhere on the network, not just from and to other hosts
in the ISP
peer policies: between ISPs, so that one ISP can send and receive traffi
c meant for their customer going through the other ISP.
Border Gateway Protocol (BGP): the interdomain routing protocol used in the
internet.
is a path vector protocol: like distance vector protocol but stores path
s instead of distances
different parties like ISPs are called Autonomous Systems (AS)
border routes of ASs announce BGP routes to each other, including IP pre
fix, path vector (list of ASs on the way to the prefix, for finding loops) and n
ext hop, in the opposite direction of traffic
kinds of policies:
transit: if A buys transit service from B, B sends to and receives f
rom A, but A doesn't do any intermediate routing for B. A is the end of the line
, similar to customers buying service from ISPs
peer: if A and B are peers, they send to and receive from each other
, without going through a bigger ISP, but they don't carry intermediate traffic.
So if A peers with B, B peers with C, A still can't send to C through B.
example: AS1 is a big ISP. AS2, AS3, AS4 are small ISPs that rent ac
cess from AS1. Prefixes A, B, C are on AS2, AS3, AS4 respectively. AS1 tells AS2
[B, (AS1, AS3)], meaning you can get to B by going to AS1 then AS3. Traffic fro
m AS1 to AS2 uses the transit policy, from AS2 to AS1 uses customer policy, and
AS2 to AS3 uses peer policy.
Transport layer
doesn't examine data inside it like lower layers (examined by switches and r
outers)
unit is segment; has TCP header and payload
can provide unreliable or reliable transport
unreliable: UDP (uses messages)
messages may be lost, reordered, duplicated
limited message size
pretty much like IP
doesn't take into account receiver's state or network state
reliable: TCP (uses bitstreams)
sends each byte only once, reliably and in order
arbitrary length content
take into account receiver's state or network state
uses the Socket API
sockets let apps attach to the transport layer using different ports
Max-Min Fairness:
"maximize the minimum"
increase all links until some reach maximum, then hold that link
constant and increase the other links until another one reaches maximum, and so
on
ex: Fig 9 (max-min)
Bandwidth allocation models
open loop: reserve bandwidth before use
closed loop: use feedback to adjust rates
enforced by host (so host must play nice) or network?
expressed in sliding window size or absolutes eg packets/s?
-> TCP: closed loop, host driven, window based
Host driven: network layer tells transport layer whether receive
r is full (binary signal). Sender uses a "control law" to adjust the sliding win
dow size.
Additive Increase Multiplicative Decrease (AIMD): a control
law where hosts additively (adding a constant) increase rate while network is no
t congested and multiplicatively (dividing by a constant) decrease rate when con
gestion happens
Fig 10
Very efficient because only needs 1 binary signal
Works better than any other control law (MIAD, MIMD, AIA
D)
ACK Clocking
helps sender pace packets according to slowest link's capacity
at first sender would send as fast as the fastest link
then the packets would bunch up in the bottleneck and slow down when the
y get to the receiver
so the ACKs would come back at the same slow rate
the sender would perceive the slower rate and start sending at that rate
instead
in TCP this is called the congestion window or cwnd
cwnd is the max amount of outstanding data the network allows the sender, vs the flow control window, which is the max the receiver can accept
the sending rate is roughly cwnd/RTT
Slow Start: way to allocate bandwidth more quickly than AIMD, used in TCP
window size starts at 1, doubles with every RTT until packet loss occurs
, then switch to AIMD to zero in on the right rate
-> the increase is slow at first (1, 2, 4) but accelerates quickly becau
se it's exponential
set a ssthresh variable that's infinity at first, then after the first p
acket loss, change it to cwnd/2
next time when cwnd >= ssthresh, switch from Slow Start to AIMD
after a timeout, the sender starts Slow Start again with cwnd = 1
Difference from AIMD:
SS sends 2 packets with every ACK received, effectively doubling cwn
d every RTT because during one RTT, sender will have received cwnd ACKs (Fig 11
- SS timeline)
AIMD sends 1 more packet with every cwnd ACKs received, so only addi
ng 1 packet every RTT
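The per-RTT growth described above, sketched as a simple trace (an idealized model: one loss-free window per RTT, made-up ssthresh):

```python
# cwnd evolution: double per RTT in slow start until ssthresh, then
# additive increase (the AI phase of AIMD), one packet per RTT.

def cwnd_trace(ssthresh, rtts):
    cwnd, trace = 1, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2        # slow start: double every RTT
        else:
            cwnd += 1        # additive increase: add one packet every RTT
    return trace

print(cwnd_trace(ssthresh=8, rtts=8))   # [1, 2, 4, 8, 9, 10, 11, 12]
```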
Fast Retransmit: alternative to the MD part of AIMD
Recall that ACKs contain the SN of the last cumulative packet received p
lus 1, so if packets 13, 15, 16, 17 were received, the ACKs for 15, 16, 17 will
only report 14, because 14 wasn't received
After 3 duplicate ACKs (so really the 4th ACK bearing the same SN), the
sender will assume the next SN was lost and resend that
This way in the above example, receiver will continue to send ACK 14 a f
ew more times, then it'll get 14 and ACK 18 because it also received 15, 16, 17
Now everything can continue as before
This way the sender can detect a single loss faster than waiting for a c
onservative timeout
Fast Recovery
While waiting for the ACK to change, sender can continue to send new pac
kets because the duplicate ACKs mean other packets were still received. Thus the
ACK clock is kept going
Instead of reverting to slow start on a packet loss, just adjust cwnd to
current cwnd/2 and switch to AI
Note all this can only repair single packet loss; we still need timeouts for
bigger losses.
Used in TCP Reno.
Explicit Congestion Notification (ECN): router actively reports congestion
Not deployed yet
Router queues should normally be 0. When router senses the queue buildin
g up, meaning congestion is happening, it flips a bit in the IP header
The receiver sees the bit flipped and notify the sender
Advantages
clear signal
early detection
no loss
no extra packets to send
Disadvantages
need to upgrade router and hosts
Session layer
services for getting different resources for the same application (hence sam
e session)
Presentation layer
indicate type and encoding of content
type eg image/jpeg, text/html
encoding eg gzip
headers are small, so they're sent as readable text rather than packed binary, making them easier to read and debug
Application Layer
doesn't always have a GUI, eg DNS
DNS
maps from names (high level resource identifiers) to IP addresses (low l
evel addresses)
before DNS, used HOSTS.TXT file regularly retrieved for all hosts from a
central machine at the Network Information Center (NIC)
not efficient, prone to naming conflicts, have to work with NIC to g
et your name in, unnecessary entries, etc
DNS is hierarchical, starting from "." (typically omitted)
generic
aero -> top level domain (TLD)
com
edu
washington
cs
robot
eng
gov
org
...
countries
au
edu
uwa
jp
uk
...
the namespace is divided into zones, eg "edu" or "washington + cs", that
are administered by the same entity
each zone has a nameserver that people should contact to access reso
urces in that zone
a zone is composed of DNS resource records
type A: IPv4 address of a host
AAAA ("quad A"): IPv6 address
NS: nameserver of domain
CNAME: canonical name for an alias (eg galah.washington.edu may
be the same as cs.washington.edu)
SOA (Start of Authority): key info about the zone (nameserver, v
alidity period of entries, etc)
the upper level nameservers should have pointers to lower level ones
when a client requests a domain name, the client's nameserver first asks the root servers, which tell it the lower-level nameserver to ask next, until it gets to a nameserver that has the requested domain in its zone
eg for flits.cs.vu.nl to access cs.washington.edu
flits.cs.vu.nl asks local (cs.vu.nl) nameserver
local nameserver asks root nameserver (a.root-servers.net)
root nameserver doesn't know but knows where to find .edu nameserver
local nameserver asks .edu nameserver
.edu nameserver doesn't know but knows where to find washington.edu nameserver
local nameserver asks washington.edu nameserver
washington.edu nameserver doesn't know but knows where to find cs.washington.edu nameserver
local nameserver asks cs.washington.edu nameserver
requested domain is within cs.washington.edu's zone, so it gives the local nameserver its IP address
local nameserver sends IP address back to client
notice there were 2 kinds of queries
recursive: the kind done by client to local nameserver: the nameserver does all the resolution work and returns the complete answer to the client
offloads the resolution burden from the client onto the server
iterative: done by local nameserver to all other nameservers: the nameserver returns either the answer or a delegation to a subdomain and doesn't look further
good for servers with high loads
queries can be cached for instant access
there's a TTL field, usually between hours and days, during which the cache is up to date
can cache partial addresses
eg if local nameserver doesn't know the IP of cs.washington.edu but does know washington.edu's nameserver, it can go directly to washington.edu
local nameserver address is part of info returned by DHCP
there are 13 root nameservers: a.root-servers.net through m.root-servers.net
all nameservers need to know the IPs of these 13
that's configured in a file called named.ca (ca = cache) which helps the nameserver bootstrap
there are actually around 250 instances of the root nameservers, but they share the 13 IP addresses using IP anycast, which lets multiple locations have the same IP (clients are routed to the geographically closest one)
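The iterative walk down the hierarchy can be sketched as a toy model. The zone data and the final IP address below are made up for illustration; a real resolver sends DNS queries over the network to each nameserver in turn:

```python
# Toy model of iterative DNS resolution: each zone either answers with an
# A record or delegates (NS) to a child zone one level down.
ZONES = {
    ".":                 {"edu": "NS"},
    "edu":               {"washington.edu": "NS"},
    "washington.edu":    {"cs.washington.edu": "NS"},
    "cs.washington.edu": {"cs.washington.edu": "A 128.95.1.4"},  # made-up IP
}

def matches(name, suffix):
    return name == suffix or name.endswith("." + suffix)

def resolve(name):
    """Start at the root; follow delegations until an authoritative answer."""
    zone, path = ".", []
    while True:
        path.append(zone)
        suffix, record = next((s, r) for s, r in ZONES[zone].items()
                              if matches(name, s))
        if record.startswith("A "):
            return record.split(" ", 1)[1], path  # authoritative answer
        zone = suffix  # NS delegation: ask the child zone's server next
```

resolve("cs.washington.edu") returns the made-up address plus the chain [".", "edu", "washington.edu", "cs.washington.edu"] — the same hops as in the example above.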
DNS Protocol
built on UDP, port 53
use ARQ for reliability
query and response share a 16 bit message ID field
nameservers tend to be redundant which also helps with load balancing
DNS needs security so clients aren't directed to the wrong machine
DNSSEC (DNS with security extension) is being deployed
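A sketch of what a DNS query looks like on the wire as a UDP payload: a 12-byte header carrying the 16-bit message ID, then the question as length-prefixed labels. This builds only the bytes (no EDNS, truncation handling, or retries); actually sending it, shown in comments, needs the network:

```python
import struct

def build_query(name, msg_id=0x1234):
    """Build a minimal DNS query for an A record."""
    # Header: ID, flags (RD=1 asks for recursion), QDCOUNT=1, then
    # ANCOUNT/NSCOUNT/ARCOUNT = 0 -- all 16-bit big-endian fields.
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode()
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

pkt = build_query("cs.washington.edu")
# To actually send (network required):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.sendto(pkt, (nameserver_ip, 53))
```

The response echoes the same message ID, which is how the client matches answers to outstanding queries.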
HTTP
designed to transmit various web resources in addition to HTML (JS, CSS, etc)
how it works:
start with the page URL
protocol (http://)
server (en.wikipedia.org)
page on server (/wiki/Vegemite)
resolve the server to IP address
set up TCP connection to server
send HTTP request for the page
get HTTP response
execute/fetch embedded resources/render (eg JS/images/CSS respectively)
clean up any idle TCP connections
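The steps above can be sketched with the standard library. Only the URL split runs here; the network-dependent steps are shown as comments:

```python
from urllib.parse import urlsplit

# Step 1: split the URL into protocol, server, and page.
parts = urlsplit("http://en.wikipedia.org/wiki/Vegemite")
protocol, server, page = parts.scheme, parts.netloc, parts.path

# Remaining steps (network required, so shown as comments only):
#   ip = socket.gethostbyname(server)          # resolve server to IP
#   conn = http.client.HTTPConnection(server)  # set up TCP connection
#   conn.request("GET", page)                  # send HTTP request
#   resp = conn.getresponse()                  # get HTTP response
#   conn.close()                               # clean up idle connection
```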
methods: GET, POST et al
Response codes:
1xx: information
2xx: success
3xx: redirection
4xx: client error
5xx: server error
Headers
browser capabilities (client -> server)
User-Agent
Accept
Accept-Charset
Accept-Encoding
Accept-Language
caching related (both)
If-Modified-Since
Date
Last-Modified
Expires
browser context (client -> server)
Cookie
Referer
Authorization
Host
content delivery
Content-Encoding
Content-Length
Content-Type
Set-Cookie
HTTP performance
measures
Page Load Time (PLT): the key metric, but hard to measure
early HTTP loads one resource after another, each in its own TCP connection -> inefficient (no parallelism, repeated TCP setup overhead when one connection would have sufficed)
to reduce PLT:
reduce content size, including taking client properties into account, eg big screen -> send big image; mobile phone -> send small image
compression
parallel connections: make x TCP connections at once
connections compete with each other
persistent connections: make 1 TCP connection and use it for multiple requests
doesn't cause network bursts and loss like parallel
saves time because the client would otherwise need to set up a new TCP connection with the server for each request
also avoids repeated slow-start
pipelining: like persistent, but as soon as the client finds out it needs to make x more requests (eg after the page downloads, it finds there are x images on the page), it sends those requests one after another instead of waiting for the response to each
caching: access local copy instead of requesting server again
to know if local copy is still valid:
Expires header if available
heuristics -> content available right away
is content cacheable (does it look static or dynamic)
was it recently validated (recently fetched)
was it not modified recently
ask the server using a conditional GET (still saves work because the server doesn't need to retransmit the whole content) -> content available after 1 RTT
use Last-Modified header from server
use ETag (hash of content) header from server
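A conditional GET sketch: the client sends back the validators the server gave it when the page was first cached. The date and ETag values here are made up; a 304 (Not Modified) response means the cached copy is still valid and no body is retransmitted:

```python
# Validators saved from the original response (made-up example values).
cached_validators = {
    "Last-Modified": "Tue, 16 Apr 2013 08:00:00 GMT",
    "ETag": '"abc123"',
}

# Headers for the revalidation request.
revalidation_headers = {
    "If-Modified-Since": cached_validators["Last-Modified"],
    "If-None-Match": cached_validators["ETag"],
}

# With a real server (network required):
#   conn = http.client.HTTPConnection(server)
#   conn.request("GET", page, headers=revalidation_headers)
#   if conn.getresponse().status == 304:  # Not Modified
#       ...serve the cached copy; no body was retransmitted
```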
web proxies: intermediary between pools of clients and external web servers
bigger shared cache -> one client can access the cache downloaded for another client earlier
security checking (content scanning)
firewalls (aka organizational access policies)
CDNs
local servers placed close to clients, serving the same copy to those clients instead of having them fetch individual copies over and over from a remote server
popularity of content obeys Zipf's Law: the kth most popular content is 1/k as popular as the most popular content, eg the 2nd most popular content is half as popular as the most popular one.
-> graph of popularity from most to least looks like a reverse log curve
-> caching the most popular content takes care of a lot of requests, but there's still a long tail that needs taking care of
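Zipf's Law in numbers: with popularity(k) = 1/k, the fraction of requests covered by caching the c most popular of N items works out to H(c)/H(N), where H is the harmonic sum. A quick sketch:

```python
# Fraction of requests a cache serves under Zipf popularity (1/k for rank k).
def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

def cache_hit_fraction(cached, total):
    return harmonic(cached) / harmonic(total)
```

With 1,000 items, caching the top 10 covers roughly 39% of requests and the top 100 roughly 69% — a lot of traffic from few items, but the long tail still has to be fetched.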
can just use browser and proxy caches, but that only benefits one client/one organization
want to place replicas all over the internet
-> use DNS servers
DNS servers are updated to return different server addresses depending on the client
the server looks at the client's IP, determines the location, and returns the closest replica
SPDY ("speedy"): experimental protocol being deployed to speed up HTTP
parallel HTTP requests on a single TCP connection (vs serial HTTP requests on a single TCP connection as in a persistent connection)
compressed headers
client priority
mod_pagespeed: speed things up on server side
minify JS (make variable names shorter)
make nested CSS inline
resize images for client
etc
P2P
disadvantages of client/server CDNs
expensive, managed infrastructure
no privacy
problems with P2P
peers have limit capabilities
solution: create minimum spanning tree with the root being the p
eer with the desired content
streaming media
if the buffer reaches the high watermark, stop requesting new data
if the buffer reaches the low watermark, start requesting new data
uses TCP instead of UDP because low delay isn't essential, and loss recovery simplifies presentation (eg fill in missing material)
how it works:
client browser sends media request (HTTP) to server
server sends HTTP response back with a metafile
client hands off metafile to media player
media player sends media request (Real-time Streaming Protocol RTSP) to server
server sends media response (TCP or UDP)
interactive, eg videoconferencing
Weighted fair queueing: alternative to FIFO queueing
when multiple apps share a router's FIFO queue, each with its own incoming flow
shorter-RTT flows are favored in the long term because their packets are emptied sooner from the queue
if one flow gets a lot of traffic, the other ones will have to wait a long time for their turn
possible solution: round-robin queueing
takes 1 packet from 1 flow at a time -> no starving
order of arrival isn't preserved
bandwidth isn't evenly shared because packets could be different sizes in different flows
better solution: fair queueing
RR but approximates bit-level fairness
compute virtual finish time of each packet
virtual clock ticks every time all flows have sent 1 bit
send packets in order of their virtual finish times
doesn't aim to be perfect - don't interrupt transmitting a big packet to transmit a small one because the small one has an earlier finish time
notation:
Arrive(j)F: time packet j arrives in flow F
Length(j)F: length of packet j in flow F
Finish(j)F = max(Arrive(j)F, Finish(j-1)F) + Length(j)F
-> Finish means the time when the router sends off the packet and is done with it. The formula means a packet's finish time is how long it takes to transmit it (length) plus a baseline of whatever time the router is ready to work on it, which is the later of when it arrives and when the router is done with the previous packet.
Example: 3 flows, F1-F3. Flows 1 and 3 have 1000B packet size, F2 has 300B packets.
-> Finish(0)F = 0 for all F
assuming all queues are backlogged (aka Arrive(j)F < Finish(j-1)F always, meaning there are always more packets for all flows waiting in the queue)
then Finish(1)F1 = 1000, Finish(2)F1 = 2000, Finish(3)F1 = 3000, etc
Finish(1)F2 = 300, Finish(2)F2 = 600, Finish(3)F2 = 900, then 1200, 1500, ...
etc
then we'll send in order of finish time
1(F2) -> 2(F2) -> 3(F2) -> 1(F1) -> 1(F3) -> 4(F2), etc
because their finish times are, respectively:
300, 600, 900, 1000, 1000, 1200, etc
Weighted Fair Queueing (WFQ): each flow has a weight, used to compute Finish:
Finish(j)F = max(Arrive(j)F, Finish(j-1)F) + Length(j)F/Weight(F)
to guarantee a flow some bandwidth, pick its weight such that after taking the bandwidth into account, you get the desired guarantee
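The finish-time formula can be computed directly. This sketch assumes backlogged queues (so the max reduces to the previous Finish); with weight 1 it reproduces the worked example above:

```python
# Virtual finish times for one backlogged flow: since Arrive(j) is always
# before Finish(j-1), Finish(j) = Finish(j-1) + Length(j)/Weight.
def finish_times(packet_lengths, weight=1.0):
    finish, out = 0.0, []
    for length in packet_lengths:
        finish += length / weight
        out.append(finish)
    return out

f1 = finish_times([1000, 1000, 1000])    # F1: 1000, 2000, 3000
f2 = finish_times([300, 300, 300, 300])  # F2: 300, 600, 900, 1200
# Sending in finish-time order gives 1(F2), 2(F2), 3(F2), 1(F1)/1(F3), 4(F2),
# matching the example. Doubling a flow's weight halves its finish times,
# giving it twice the bandwidth.
```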
to guarantee a delay (>= latency), need to shape traffic -> use token buckets
worst case delay is if the max burst of B bits arrives all at once -> need to put it all in the queue
then the last packet to join the queue will need B/R seconds to leave the queue, where R is the drain rate
to guarantee a delay across the network, observe that if traffic were perfectly smooth, the delay would be the latency. If there's one burst, it'll queue up in one router and be smoothed out as it empties into the next router and all subsequent routers -> a burst is only penalized once -> the max network delay is the same as the max router delay: latency + B/R
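The bound above as arithmetic (the numbers in the usage note are made up for illustration):

```python
# Token-bucket delay bound: a flow shaped to max burst B bits, drained at
# R bits/s, waits at most B/R seconds in one queue; since a burst is only
# penalized once across the path, max network delay = latency + B/R.
def max_queueing_delay(burst_bits, drain_rate_bps):
    return burst_bits / drain_rate_bps

def max_network_delay(latency_s, burst_bits, drain_rate_bps):
    return latency_s + max_queueing_delay(burst_bits, drain_rate_bps)
```

For example, a 1 Mbit burst drained at 10 Mbps over a path with 50 ms latency is bounded by 50 ms + 100 ms = 150 ms.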
Security
security by obscurity: try to keep the encoding scheme secret
kinds of security threats:
eavesdropper: reads messages that aren't for them
in the Goal and Threat model, the goal is confidentiality: only the recipient of a message can read it
if Alice sends a private message to Bob, and Eve can observe the messages going back and forth, then Eve is a passive adversary
encryption/decryption scheme: Alice's plaintext message is encrypted into ciphertext. The encryption scheme is parameterized by a secret key that Alice and Bob have, but the algorithm itself is public.
This way if the key is compromised, just use a different key.
weakness: key must be distributed
this is a symmetric key encryption scheme, eg Advanced Encryption Standard (AES)
efficient, can send at a high data rate
public key cryptography: Bob has a public key. Alice uses it to encrypt her message. Bob uses his private key to decrypt.
good for establishing a connection between people who don't know each other well enough to exchange a key private to their relationship
slow
winning combination: use public key crypto to send a private key, which is a small message, then use that private key to send large messages with symmetric encryption
the key is called a session key
tamperer: uses other people's machines to send messages
need to make sure the message came from the right person (authenticity) and is unchanged (integrity)
person in the middle (Trudy - intruder) is an active adversary
public key schemes aren't enough if she can do things with messages (even if she can't read them), eg flip some bits
she can rearrange blocks to make a different valid message, eg "stop do not buy now" becomes "buy now do not stop"
solution: include a Message Authentication Code (MAC) to validate the integrity/authenticity of the message, eg a hash MAC
MACs are generated using a symmetric key scheme, so Bob can validate that it was Alice who sent the message, because only Alice has the key
but Bob can't convince someone else that the message came from Alice, because he could have generated the message himself
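A hash MAC can be sketched with Python's standard library. The key and messages here are made up; only holders of the shared key can produce or verify a valid tag, so rearranged blocks no longer verify:

```python
import hashlib
import hmac

key = b"alice-and-bob-shared-secret"  # made-up shared key

def tag(message):
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message, received_tag):
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(tag(message), received_tag)

t = tag(b"buy now do not stop")
# verify(b"buy now do not stop", t)  -> True
# verify(b"stop do not buy now", t) -> False (rearranged blocks rejected)
```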
solution 2: digital signatures
Alice has a private key. She uses it to generate a short signature. Everyone else can use Alice's public key to verify that the signature is hers
since signatures use public keys, they're slow
to speed things up, use message digests
message digests are a fixed-length hash of an arbitrary-length message, such that it's computationally infeasible to find 2 messages with the same digest
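Message digests in practice: SHA-256 maps input of any length to a fixed 256-bit value, and even a one-byte change produces an unrelated digest. Signing the short digest instead of the whole message is what makes signatures fast:

```python
import hashlib

# Two long messages differing by one trailing byte (made-up content).
d1 = hashlib.sha256(b"a long message" * 1000).digest()
d2 = hashlib.sha256(b"a long message" * 1000 + b"!").digest()
# Both digests are 32 bytes regardless of input length, and d1 != d2.
```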