Computer Networks - End Protocols
Networks
ECE 5713
End-to-end Protocols
The Big Picture
[Figure: protocol stack with layers application, IP, data link, and physical; a brace labeled "most coverage until now" marks the layers covered so far]
End-to-end Protocols
Outline
(reading: Peterson and Davie, Ch. 5)
End-to-end service model
Protocol examples
– User Datagram Protocol (UDP)
– Transmission Control Protocol (TCP)
Connection Establishment/Termination
Sliding Window Revisited
Flow Control and Adaptive Timeout
– Remote Procedure Call (RPC)
Where we are now …
• Understand how to
– Build a network on one physical medium
– Connect networks together (with switches)
– Implement a reliable byte stream on a variable
network (like the Internet)
– Implement a UDP/TCP connection/channel
– Address network heterogeneity
– Address global scale
• Today’s topic
– End-to-end issues and common protocols
End-to-End Service Model
• Recall user perspective of network
– Define required functionality/services
– Implementation is irrelevant
• Focus of end-to-end protocols (transport layer)
– Communication between applications (users)
– Translating from host-to-host services (network layer)
• Services implemented in end-to-end protocols
– Those that cannot be done well in lower layers (i.e., on
a per-hop basis); duplicate effort should be avoided
– Those not needed by all applications
End-to-End Service Model
• Services provided by underlying network: IP -
“best effort” delivery
– Messages sent from a host, delivered to a host (no
distinction between entities sharing a host)
– Drops some messages
– Reorders messages
– Delivers duplicate copies of a message
– Limits messages to some finite size
– Delivers messages after an arbitrarily long delay
End-to-End Service Model
• Common end-to-end services demanded by
applications
– Multiple connections (application processes) per host
– Guaranteed message delivery
– Messages delivered in the order they are sent
– Messages delivered at most once
– Arbitrarily large message support
– Synchronization between sender and receiver
– Flow control by the receiver
End-to-End Protocol Challenge
• Given IP service model
• Provide service model demanded by
applications
User Datagram Protocol (UDP)
• Thin veneer over IP services
• Addresses multiplexing of multiple connections
• Unreliable and unordered datagram service
• No flow control
• Endpoints identified by ports (multiplexing)
– 16-bit port space
– Well-known ports for certain services (see /etc/services)
• Checksum to detect corrupted datagrams (covers header and data)
– Optional in IPv4, but mandatory in IPv6 (why?)
UDP Header Format
Bits 0-15: source port | Bits 16-31: destination port
Bits 0-15: UDP length | Bits 16-31: UDP checksum
• Length includes 8-byte header and data
• Checksum (purpose ?)
– Uses IP checksum algorithm
– Computed on pseudo-header, UDP header and data
Pseudo-header:
Bits 0-31: source IP address
Bits 0-31: destination IP address
Bits 0-7: 0 | Bits 8-15: 17 (UDP) | Bits 16-31: UDP length
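The checksum computation described above can be sketched in C. This is a minimal illustration, assuming IPv4 addresses passed as 4-byte arrays and a datagram buffer whose checksum field is zero when computing; the names `sum_words` and `udp_checksum` are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add a byte range to a running sum of big-endian 16-bit words.
 * A trailing odd byte is treated as the high byte of a zero-padded word. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Internet checksum over pseudo-header + UDP header + data.
 * udp points at the UDP header; udp_len counts header and data. */
uint16_t udp_checksum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                      const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[12];
    uint32_t sum;

    assert(udp_len >= 8);                 /* at least the 8-byte header */
    memcpy(pseudo, src_ip, 4);            /* source IP address */
    memcpy(pseudo + 4, dst_ip, 4);        /* destination IP address */
    pseudo[8] = 0;                        /* zero byte */
    pseudo[9] = 17;                       /* protocol number for UDP */
    pseudo[10] = (uint8_t)(udp_len >> 8); /* UDP length, big-endian */
    pseudo[11] = (uint8_t)(udp_len & 0xFF);

    sum = sum_words(pseudo, sizeof pseudo, 0);
    sum = sum_words(udp, udp_len, sum);
    while (sum >> 16)                     /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;                /* one's complement of the sum */
}
```

A receiver can run the same function over the datagram with the checksum field left in place; a result of 0 means no error was detected.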
Reliable Byte-Stream (TCP)
Outline
Connection Establishment/Termination
Sliding Window Revisited
Flow Control
Adaptive Timeout
Transmission Control Protocol (TCP)
• Service model implements requirements listed earlier
– Multiple connections per host
– Guaranteed and in-order delivery
– Messages delivered at most once
– Arbitrarily large messages
– Synchronization between sender and receiver
– Flow control
• Multiplexing mechanism equivalent to that of UDP
• Checksum mechanism also equivalent, but
mandatory
TCP Overview
• Flow control: restricts rate to something
manageable by receiver
• Congestion control: restricts rate to something
manageable by network
• Connection-oriented: setup and teardown required
• Full duplex
– Data flows in both directions simultaneously
– Point-to-point communication
• Byte stream abstraction: no boundaries in data
TCP Byte Stream
• Application writes bytes
• TCP sends segments
• Application reads bytes
[Figure: the sending application writes bytes into the TCP send buffer; TCP transmits segments; the receiving TCP places them in its receive buffer, from which the application reads bytes]
Data Link versus Transport
• Potentially connects many different hosts
– Need explicit connection establishment and termination
• Potentially different RTT
– Need adaptive timeout mechanism
• Potentially long delay in network
– Need to be prepared for arrival of very old packets
• Potentially different capacity at destination
– Need to accommodate different node capacity
• Potentially different network capacity
– Need to be prepared for network congestion
TCP Outline
• TCP vs. a sliding window on a direct link
• Model of use
• Segment header format and options
• States and state diagram (think-pair-share)
• Sliding window implementation details
• Flow control issues
• Bit allocation limitations
• Adaptive retransmission algorithms
TCP vs. Sliding Window on Direct Link
• RTT varies
– Among peers (hosts at other end of connections)
– Over time
– Requires adaptive approach to retransmission (and
window sizes)
• Packets can be
– Delayed for long periods (assume 60 seconds)
– Reordered in the network
– Must be prepared for arrival of very old packets
TCP vs. Sliding Window on Direct Link
• Peer capabilities vary
– Minimum link speed on route
– Buffering capacity at destination
– Requires adaptive approach to window sizes
• Network capacity varies
– Other traffic competes for most links
– Requires (global) congestion control strategy
• Why not implement more functionality in IP,
e.g., ordering guarantees or congestion control ?
End-to-End Argument
• A function should not be provided at a given layer
unless it can be completely and correctly
implemented at that layer
• In-order delivery: hop-by-hop ordering guarantee
is not robust to path changes or multiple paths
[Figure: example topology with nodes A, B, C, D, E, and F]
End-to-End Argument
• Congestion control
– Excess traffic should be stopped at the source
– But network can provide feedback
[Figure: topology of nodes A to F with link rates of 100 Mbps, 10 Mbps, 5 Mbps, and 1 Mbps. The blue flow should get 9 Mbps, but gets only 5 Mbps with hop-by-hop drops.]
TCP Model of Use
• Connection setup via 3-way handshake
• Data transport
– Sender writes some data
– TCP
• Breaks data into segments
• Sends each segment over IP
• Retransmits, reorders, and removes
duplicates as necessary
– Receiver reads some data
• Teardown through 4-step exchange
TCP Connection Setup
• TCP Connection Setup via 3-way Handshake
– J and K are (different) sequence numbers for messages
– Sequence numbers need not start at zero
Active participant (client)          Passive participant (server)
    SYN J            ------------------------------>
                     <------------------------------    SYN K, ACK J+1
    ACK K+1          ------------------------------>
TCP Data Transport Model
• Data broken into segments
– Limited by maximum segment size (MSS)
– Defaults to 536 bytes
– Negotiable during connection setup
– Typically set to MTU of directly connected network
minus size of IP and TCP headers (40 bytes)
• Three events trigger sending a segment
– >= MSS bytes of data ready to be sent
– Explicit PUSH operation by application
– Periodic timeout
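As a small worked example of the MSS rule above, the sketch below derives an MSS from an MTU and counts the segments needed for a given amount of data. It assumes 20-byte IP and TCP headers with no options (the 40 bytes mentioned above); the function names are illustrative only.

```c
#include <assert.h>

enum { IP_HEADER_BYTES = 20, TCP_HEADER_BYTES = 20 };

/* MSS for a directly connected network: MTU minus IP and TCP headers. */
int mss_for_mtu(int mtu)
{
    assert(mtu > IP_HEADER_BYTES + TCP_HEADER_BYTES);
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES;
}

/* Number of segments needed to carry `bytes` bytes of application data. */
long segments_needed(long bytes, int mss)
{
    assert(bytes >= 0 && mss > 0);
    return (bytes + mss - 1) / mss;   /* ceiling division */
}
```

With an Ethernet MTU of 1500 bytes this gives an MSS of 1460 bytes, and a 576-byte MTU gives the 536-byte default.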
TCP Connection Teardown
• TCP connection teardown in 4 steps
– Either client or server can initiate connection teardown
– FIN is associated with sequence number space (why?)
[Figure: four-step teardown exchange between the two participants]
TCP Segment Header & Pseudo-Header
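TCP header (32-bit words):
Word 1: source port (bits 0-15) | destination port (bits 16-31)
Word 2: sequence number
Word 3: ACK sequence number
Word 4: hdrlen (bits 0-3) | 0 (bits 4-9) | flags (bits 10-15) | advertised window (bits 16-31)
Word 5: TCP checksum (bits 0-15) | urgent pointer (bits 16-31)
Then: options (variable)
Pseudo-header:
Bits 0-31: source IP address
Bits 0-31: destination IP address
Bits 0-7: 0 | Bits 8-15: 6 (TCP) | Bits 16-31: TCP length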
TCP Segment Header
• 16-bit source and destination ports
• 32-bit send and ACK sequence numbers
• 4-bit header length in 4-byte words (minimum 5)
• Six 1-bit flags
– URG: segment contains urgent data
– ACK: ACK sequence number is valid
– PSH: do not delay delivery of data
– RST: reset connection (rejection or abnormal
termination); no sequence number associated
– SYN: synchronize segment for setup
– FIN: final segment for teardown
TCP Segment Header
• 16-bit advertised window
– Space remaining in receive window
• 16-bit checksum
– Uses IP checksum algorithm
– Computed on header, data, and pseudo-header
• 16-bit urgent data pointer
– If URG=1
– Index of last byte of urgent data in segment
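The header fields listed above can be pulled out of a raw segment with a few shifts. The following is a minimal sketch, assuming the segment is held in a byte buffer in network (big-endian) order; `struct tcp_fields` and `tcp_parse` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The six 1-bit flags, as they sit in the low bits of header byte 13. */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

struct tcp_fields {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    unsigned hdr_bytes;               /* header length converted to bytes */
    unsigned flags;
    uint16_t window, checksum, urgent;
};

static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t get32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or hdrlen < 5. */
int tcp_parse(const uint8_t *seg, size_t len, struct tcp_fields *out)
{
    unsigned hdr_words;

    assert(seg != NULL && out != NULL);
    if (len < 20)
        return -1;
    hdr_words = seg[12] >> 4;         /* 4-bit length in 4-byte words */
    if (hdr_words < 5 || (size_t)hdr_words * 4 > len)
        return -1;
    out->src_port  = get16(seg);
    out->dst_port  = get16(seg + 2);
    out->seq       = get32(seg + 4);
    out->ack       = get32(seg + 8);
    out->hdr_bytes = hdr_words * 4;
    out->flags     = seg[13] & 0x3F;
    out->window    = get16(seg + 14);
    out->checksum  = get16(seg + 16);
    out->urgent    = get16(seg + 18);
    return 0;
}
```

The `ack` field is only meaningful when `flags & TCP_ACK` is set, and `urgent` only when `flags & TCP_URG` is set.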
TCP Segment Format
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– ACK, SequenceNum, AdvertisedWindow
Sender    ------ Data (SequenceNum) ------------------>    Receiver
Sender    <----- Acknowledgment + AdvertisedWindow ----    Receiver
• Flags
– SYN, FIN, RESET, PUSH, URG, ACK
TCP Options – Existing & Proposed
• Negotiate maximum segment size (MSS)
– Each host suggests a value
– Minimum of two values is chosen
– Prevents IP fragmentation over first/last hops
• Packet timestamp
– Allows RTT calculation for retransmitted packets
– Extends sequence number space for identification of
stray packets (packets arriving very late)
• Negotiate advertised window granularity
– Allows larger windows
– Good for routes with large bandwidth-delay products
TCP State Description
• CLOSED disconnected
• LISTEN waiting for incoming connection
• SYN_RCVD connection request received
• SYN_SENT connection request sent
• ESTABLISHED ready for data transport
• CLOSE_WAIT connection closed by peer
• LAST_ACK closed by peer, closed locally, await ACK
• FIN_WAIT_1 connection closed locally
• FIN_WAIT_2 closed locally and ACK’d
• CLOSING closed by both sides “simultaneously”
• TIME_WAIT wait for network to discard related packets
State Transition Diagram
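[Figure: TCP state transition diagram, with edges labeled event/action]
• CLOSED → LISTEN: passive open
• CLOSED → SYN_SENT: active open/SYN
• LISTEN → CLOSED: close
• SYN_SENT → CLOSED: close
• SYN_RCVD → FIN_WAIT_1: close/FIN
• ESTABLISHED → FIN_WAIT_1: close/FIN
• ESTABLISHED → CLOSE_WAIT: FIN/ACK
• FIN_WAIT_1 → FIN_WAIT_2: ACK
• FIN_WAIT_1 → CLOSING: FIN/ACK
• FIN_WAIT_1 → TIME_WAIT: ACK + FIN/ACK
• FIN_WAIT_2 → TIME_WAIT: FIN/ACK
• CLOSING → TIME_WAIT: ACK
• CLOSE_WAIT → LAST_ACK: close/FIN
• LAST_ACK → CLOSED: ACK
• TIME_WAIT → CLOSED: timeout after two segment lifetimes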
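The teardown half of the state machine can be written as a small transition function. This is a sketch under simplifying assumptions: only the close-related transitions from ESTABLISHED onward are modeled, events are reduced to five abstract kinds, and the names `tcp_event`, `tcp_teardown_step`, and `tcp_run` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, CLOSE_WAIT,
    LAST_ACK, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT
} tcp_state;

typedef enum {
    EV_CLOSE,        /* local application closes; TCP sends FIN */
    EV_FIN,          /* FIN arrives from peer; TCP sends ACK */
    EV_ACK,          /* ACK of our FIN arrives */
    EV_ACK_AND_FIN,  /* ACK of our FIN and the peer's FIN arrive together */
    EV_TIMEOUT       /* two segment lifetimes have elapsed */
} tcp_event;

/* One step of the teardown transitions. */
tcp_state tcp_teardown_step(tcp_state s, tcp_event e)
{
    switch (s) {
    case ESTABLISHED:
        if (e == EV_CLOSE) return FIN_WAIT_1;
        if (e == EV_FIN)   return CLOSE_WAIT;
        break;
    case FIN_WAIT_1:
        if (e == EV_ACK)         return FIN_WAIT_2;
        if (e == EV_FIN)         return CLOSING;
        if (e == EV_ACK_AND_FIN) return TIME_WAIT;
        break;
    case FIN_WAIT_2:
        if (e == EV_FIN) return TIME_WAIT;
        break;
    case CLOSING:
        if (e == EV_ACK) return TIME_WAIT;
        break;
    case TIME_WAIT:
        if (e == EV_TIMEOUT) return CLOSED;
        break;
    case CLOSE_WAIT:
        if (e == EV_CLOSE) return LAST_ACK;
        break;
    case LAST_ACK:
        if (e == EV_ACK) return CLOSED;
        break;
    default:
        break;
    }
    return s;   /* events not modeled here leave the state unchanged */
}

/* Apply a sequence of n events starting from state s. */
tcp_state tcp_run(tcp_state s, const tcp_event *ev, int n)
{
    int i;

    assert(ev != NULL || n == 0);
    for (i = 0; i < n; i++)
        s = tcp_teardown_step(s, ev[i]);
    return s;
}
```

Feeding `tcp_run` different event sequences is one way to trace which side of a connection passes through TIME_WAIT.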
Think-Pair-Share
• Describe the path taken
– By a server under normal conditions, and
– By a client under normal conditions,
– Assuming that the client closes the connection first.
• Consider the TIME_WAIT state
– What purpose does this state serve ?
– Prove that at least one side of a connection enters this
state before returning to CLOSED
– Explain how both sides might enter this state
Sliding Window Implementation
• Sequence numbers are indices into byte stream
• ACK sequence number is actually next byte
expected (as opposed to last byte received)
• Receiver buffers contain
– Data ready for delivery to application until requested
– Out-of-order data out to maximum buffer capacity
• Sender buffers contain
– Unacknowledged data
– Unsent data out to maximum buffer capacity
Sliding Window
• Sender side
[Figure: send buffer between the application and TCP, with pointers LastByteAcked, LastByteSent, and LastByteWritten; the window and the maximum buffer size are marked]
– Green: sent and acknowledged
– Red: sent (or can be sent) but not acknowledged
– Blue: available but not within send window
Sliding Window
• Receiver side
[Figure: receive buffer between TCP and the application, with pointers NextByteRead, NextByteExpected, and LastByteReceived]
Sliding Window Math
[Figure: sending application above TCP (pointer LastByteWritten) and receiving application above TCP (pointer LastByteRead)]
TCP Flow Control Issues
• Problem: slow receiver application
– Advertised window goes to 0
– Sender cannot send more data
– Non-data packets used to update window ?
– Receiver may not spontaneously generate, or
update may be lost
• Solution: smart sender/dumb receiver
– Sender periodically sends a 1-byte segment,
ignoring advertised window of 0
– Eventually, window opens
– Sender learns of opening from next ACK of
1-byte segment
TCP Flow Control Issues
• Problem: app. delivers tiny pieces of data to TCP
– e.g., telnet in character mode
– Each piece sent as segment, returned as ACK
– Very inefficient
• Solutions
– Delay transmission to accumulate more data
– Nagle’s algorithm
• Send first piece
• Accumulate data until first piece ACK’d
• Send accumulated data and restart accumulation
• Not ideal for some traffic, e.g., mouse motion
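The three steps of Nagle's algorithm listed above can be sketched as a tiny state machine. This follows the simplified description on the slide only; deployed implementations also send immediately once a full MSS of data is available, which is not modeled here. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t buffered;      /* bytes written by the application, not yet sent */
    int    awaiting_ack;  /* nonzero while a sent piece is unacknowledged */
} nagle_state;

/* Send everything accumulated so far and start waiting for its ACK. */
static size_t nagle_flush(nagle_state *s)
{
    size_t n = s->buffered;

    s->buffered = 0;
    s->awaiting_ack = 1;
    return n;
}

/* Application writes n bytes. Returns the number of bytes to send now. */
size_t nagle_write(nagle_state *s, size_t n)
{
    assert(s != NULL);
    s->buffered += n;
    if (s->awaiting_ack || s->buffered == 0)
        return 0;                     /* keep accumulating */
    return nagle_flush(s);
}

/* ACK arrives for the outstanding piece. Returns bytes to send now. */
size_t nagle_ack(nagle_state *s)
{
    assert(s != NULL);
    s->awaiting_ack = 0;
    if (s->buffered == 0)
        return 0;
    return nagle_flush(s);
}
```

For a character-mode session, the first keystroke goes out alone and later keystrokes are batched until its ACK returns, which is also why latency-sensitive traffic such as mouse motion suffers.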
TCP Flow Control Issues
• Problem: slow application reads data in tiny pieces
– Receiver advertises tiny window
– Sender fills tiny window
– Known as silly window syndrome
• Solution: due to Clark
– Advertise window opening only when MSS or ½ of
buffer is available
– Sender delays sending until window is MSS or ½ of
receiver’s buffer (estimated)
– Overridden by using PUSH
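The receiver-side half of Clark's solution can be sketched as a single decision function. This is a simplification that applies the threshold to every advertisement rather than only to window openings, and the name `clark_window` is hypothetical.

```c
#include <assert.h>

/* Window to advertise given the free space in the receive buffer:
 * advertise nothing until at least one MSS or half the buffer is free. */
unsigned clark_window(unsigned free_bytes, unsigned buffer_bytes, unsigned mss)
{
    unsigned half = buffer_bytes / 2;
    unsigned threshold = mss < half ? mss : half;

    assert(free_bytes <= buffer_bytes);
    return free_bytes >= threshold ? free_bytes : 0;
}
```

With an 8192-byte buffer and a 1460-byte MSS, 100 free bytes are advertised as 0, while 1460 free bytes are advertised in full.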
TCP Flow Control Math
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (NextByteExpected - NextByteRead)
• Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– Block the sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application is trying to write
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0
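The window formulas above translate directly into code. The sketch below is illustrative only: it treats the byte pointers as plain 32-bit counters and ignores sequence-number wraparound, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Receiver: AdvertisedWindow = MaxRcvBuffer - (NextByteExpected - NextByteRead) */
uint32_t advertised_window(uint32_t max_rcv_buffer,
                           uint32_t next_byte_expected,
                           uint32_t next_byte_read)
{
    assert(next_byte_expected >= next_byte_read);
    assert(next_byte_expected - next_byte_read <= max_rcv_buffer);
    return max_rcv_buffer - (next_byte_expected - next_byte_read);
}

/* Sender: EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked),
 * clamped at 0 when more data is in flight than the window allows. */
uint32_t effective_window(uint32_t advertised,
                          uint32_t last_byte_sent,
                          uint32_t last_byte_acked)
{
    uint32_t in_flight;

    assert(last_byte_sent >= last_byte_acked);
    in_flight = last_byte_sent - last_byte_acked;
    return in_flight >= advertised ? 0 : advertised - in_flight;
}

/* Sender: nonzero if a write of y bytes would overflow the send buffer. */
int must_block_writer(uint32_t last_byte_written, uint32_t last_byte_acked,
                      uint32_t y, uint32_t max_send_buffer)
{
    uint64_t needed;

    assert(last_byte_written >= last_byte_acked);
    needed = (uint64_t)(last_byte_written - last_byte_acked) + y;
    return needed > max_send_buffer;
}
```

For example, a 4096-byte receive buffer holding 2000 unread in-order bytes advertises 2096 bytes; with 1000 bytes already in flight, the sender may transmit 1096 more.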
TCP Bit Allocation Limitations
• Sequence numbers vs. packet lifetime
– Assumed that IP packets live less than 60 seconds
– Can we send 2^32 (4G) bytes in 60 seconds?
– Only need a data rate of 573 Mbps!
– Less than an STS-12 line... (< Gigabit Ethernet)
• Advertised window vs. delay-bandwidth
– Only 16 bits (64kB) for advertised window
– For cross-country RTT of 100 milliseconds, adequate
for a mere 5.24 Mbps!
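Both figures above come from the same calculation: a number of bytes divided by a time. The helper below, with the hypothetical name `rate_bps`, is a minimal sketch that reproduces them.

```c
#include <assert.h>

/* Data rate in bits per second at which `bytes` bytes pass in `seconds`. */
double rate_bps(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes * 8.0 / seconds;
}
```

`rate_bps(4294967296.0, 60.0)` is about 572.7 million, the rate at which the 32-bit sequence space wraps within one packet lifetime, and `rate_bps(65536.0, 0.1)` is about 5.24 million, the most a 64 kB window can sustain over a 100 ms RTT.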
Protection Against Wrap Around
• 32-bit SequenceNum
Keeping the Pipe Full
• 16-bit AdvertisedWindow
Adaptive Retransmission Algorithm
• Original algorithm used only RTT estimate
• Theory: measure RTT for each segment + its ACK
– Estimate RTT
– Timeout is 2 * RTT to allow for variations
• Practice
– Use exponential moving average (α = 0.8 to 0.9)
– Estimate = α × estimate + (1 − α) × measurement
[Figure: measured RTT versus time]
Adaptive Retransmission Algorithm
• Problem: it did not handle variations well
[Figure: measured RTT versus time]
• Ambiguity for retransmitted packets: was ACK in
response to first, second, etc. transmission?
[Figure: a transmission followed by a retransmission; it is unclear which one the measured RTT belongs to]
Adaptive Retransmission
(Original Algorithm)
• Measure SampleRTT for each segment/ACK pair
• Compute weighted average of RTT
– EstRTT = a x EstRTT + b x SampleRTT
– where a + b = 1
• a between 0.8 and 0.9
• b between 0.1 and 0.2
• Set timeout based on EstRTT
– TimeOut = 2 x EstRTT
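The original algorithm above fits in a few lines of C. This is a sketch, assuming the estimate is seeded with the first sample (the slide does not say how EstRTT is initialized); the names `rtt_estimator`, `rtt_init`, and `rtt_update` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double est_rtt;  /* EstRTT, in the same units as the samples */
    double a;        /* weight on the old estimate; b = 1 - a */
} rtt_estimator;

/* Assumption: seed EstRTT with the first measured sample. */
void rtt_init(rtt_estimator *e, double first_sample, double a)
{
    assert(e != NULL && a >= 0.0 && a <= 1.0);
    e->est_rtt = first_sample;
    e->a = a;
}

/* Fold in one SampleRTT and return the new TimeOut = 2 x EstRTT. */
double rtt_update(rtt_estimator *e, double sample_rtt)
{
    double b;

    assert(e != NULL);
    b = 1.0 - e->a;
    e->est_rtt = e->a * e->est_rtt + b * sample_rtt;
    return 2.0 * e->est_rtt;
}
```

Starting from an estimate of 100 ms with a = 0.8, a single 200 ms sample moves EstRTT to 120 ms and the timeout to 240 ms, which shows how slowly the estimate follows a sudden change.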
Karn/Partridge Algorithm
[Figure: two sender/receiver timelines, each with an original transmission, a retransmission, and an ACK. In one, SampleRTT is measured from the original transmission; in the other, from the retransmission.]
Remote Procedure Call (RPC)
• Central premise: apply the familiar to the unfamiliar
– programmers understand procedure calls
– programmers do not understand client-server interactions
• Idea
– translate some calls into server request messages
– wait for reply, return reply value
– transparent to programmer!
• Transport protocol matching the needs of applications
involved in a request / reply message exchange
RPC Timeline
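[Figure: client/server timeline. The client sends a Request and is blocked; the server, blocked until the Request arrives, computes and returns a Reply, then blocks again.]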
Remote Procedure Call (RPC)
• More than just a protocol: popular mechanism for
structuring distributed systems
• More complicated than local procedure calls
– Complex network between calling and called process
– Different architecture and data representation formats
• Complete RPC mechanism has two components
– Dealing with undesirable properties of network
– Packaging arguments into messages and vice-versa
(stub compiler)
Remote Procedure Call Semantics
write_check(UNIV, 13500, "tuition payment");
• your machine looks up the bank server
• asks the server about the checking service
• receives port number for checking service
• packages up arguments with type write_check
• tags package (message) with your identification
• sends request to checking service on bank server
(potentially as several UDP packets)
• waits for a reply
Remote Procedure Call Semantics
write_check(UNIV, 13500, "tuition payment");
• bank’s checking service receives request from
your home machine (possibly reassembling UDP
packets)
• verifies your identity (possibly via RPC to an
authority)
• identifies request as a write_check request
• unpackages arguments, possibly converting data
to another format
• calls write_check handler function with arguments
Remote Procedure Call Semantics
write_check(UNIV, 13500, "tuition payment");
• At bank’s checking service, RPC layer receives
return value from handler function
• packages up return value with type
write_check_reply
• tags package (message) with your bank’s
identification and unique ID for previous request
• sends reply to your machine (potentially as
several UDP packets)
Remote Procedure Call Semantics
write_check(UNIV, 13500, "tuition payment");
• your machine receives reply from bank
(possibly reassembling UDP packets)
• verifies bank’s identity (possibly via RPC to
an authority)
• identifies reply as a write_check_reply
• finds corresponding request
• unpackages return value, possibly converting
data to another format
• returns value to caller
RPC Components
• Protocol Stack
– BLAST: fragments and reassembles large messages
– CHAN: synchronizes request and reply messages
– SELECT: dispatches request to the correct process
• Stubs
[Figure: caller (client) and callee (server), each layered over an RPC protocol]
Bulk Transfer (BLAST)
• Unlike AAL and IP, tries to recover from lost fragments
• Strategy
• Sender:
– after sending all fragments, set timer DONE
– if receive SRR, send missing fragments and reset DONE
– if timer DONE expires, free fragments
[Figure: sender/receiver timeline. The sender transmits Fragment 1 through Fragment 6; the receiver returns an SRR; the sender retransmits Fragment 3 and Fragment 5; the receiver returns another SRR.]
BLAST: Receiver Details
• When first fragment arrives, set timer LAST_FRAG
• When all fragments present, reassemble and pass up
• Four exceptional conditions:
– if last fragment arrives but message not complete
• send SRR and set timer RETRY
– if timer LAST_FRAG expires
• send SRR and set timer RETRY
– if timer RETRY expires for first or second time
• send SRR and set timer RETRY
– if timer RETRY expires a third time
• give up and free partial message
BLAST Header Format
• MID must protect against wrap around
• FragMask distinguishes among fragments
– if Type=DATA, identifies this fragment
– if Type=SRR, identifies missing fragments
[Figure: BLAST header layout, bits 0 to 31, with fields including ProtNum, MID, and FragMask, followed by Data]
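One way to picture FragMask is as a bitmap with one bit per fragment. The sketch below assumes a 32-bit mask (so at most 32 fragments per message), fragments numbered from 1, and a receiver that tracks arrivals in a `received` mask; the exact bit encoding carried in an SRR is not specified on the slide, so the helper names and the "1 means missing" result here are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_FRAGS = 32 };   /* assumed: one bit per fragment in a 32-bit mask */

/* Mask with only fragment i's bit set (fragments numbered from 1). */
uint32_t frag_bit(int i)
{
    assert(i >= 1 && i <= MAX_FRAGS);
    return (uint32_t)1 << (i - 1);
}

/* Mask of fragments 1..num_frags that are absent from `received`. */
uint32_t missing_frags(uint32_t received, int num_frags)
{
    uint32_t all;

    assert(num_frags >= 1 && num_frags <= MAX_FRAGS);
    all = (num_frags == MAX_FRAGS) ? UINT32_MAX
                                   : (((uint32_t)1 << num_frags) - 1);
    return all & ~received;
}
```

For a six-fragment message where fragments 1, 2, 4, and 6 have arrived, `missing_frags` returns the bits for fragments 3 and 5, which is what the sender would need to retransmit.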
Request / Reply (CHAN)
• Guarantees message delivery
• Synchronizes client with server
• Supports at-most-once semantics
[Figure: two client/server timelines]
Simple case: Request, ACK, Reply, ACK
Implicit Acks: Request 1, Reply 1, Request 2, Reply 2, …
CHAN Details
• Lost message (request, reply, or ACK)
– set RETRANSMIT timer
– use message id (MID) field to distinguish
• Slow (long running) server
– client periodically sends “are you alive” probe, or
– server periodically sends “I’m alive” notice
• Want to support multiple outstanding calls
– use channel id (CID) field to distinguish
• Machines crash and reboot
– use boot id (BID) field to distinguish
CHAN Header Format
typedef struct {
    u_short Type;           /* REQ, REP, ACK, PROBE */
    u_short CID;            /* unique channel id */
    int     MID;            /* unique message id */
    int     BID;            /* unique boot id */
    int     Length;         /* length of message */
    int     ProtNum;        /* high-level protocol */
} ChanHdr;
typedef struct {
    u_char    type;         /* CLIENT or SERVER */
    u_char    status;       /* BUSY or IDLE */
    int       retries;      /* number of retries */
    int       timeout;      /* timeout value */
    XkReturn  ret_val;      /* return value */
    Msg       *request;     /* request message */
    Msg       *reply;       /* reply message */
    Semaphore reply_sem;    /* client semaphore */
    int       mid;          /* message id */
    int       bid;          /* boot id */
} ChanState;
Synchronous vs Asynchronous
Protocols
• Asynchronous interface
xPush(Sessn s, Msg *msg)
xPop(Sessn s, Msg *msg, void *hdr)
xDemux(Protl hlp, Sessn s, Msg *msg)
• Synchronous interface
xCall(Sessn s, Msg *req, Msg *rep)
xCallPop(Sessn s, Msg *req, Msg *rep, void *hdr)
xCallDemux(Protl hlp, Sessn s, Msg *req, Msg *rep)
• CHAN is a hybrid protocol
– synchronous from above: xCall
– asynchronous from below: xPop/xDemux
chanCall(Sessn self, Msg *msg, Msg *rmsg){
ChanState *state = (ChanState *)self->state;
ChanHdr *hdr;
char *buf;
/* attach header to msg and send it */
buf = msgPush(msg, HDR_LEN);
chan_hdr_store(hdr, buf, HDR_LEN);
xPush(xGetDown(self, 0), msg);
retransmit(Event ev, int *arg){
Sessn s = (Sessn)arg;
ChanState *state = (ChanState *)s->state;
Msg tmp;
chanPop(Sessn self, Sessn lls, Msg *msg, void *inHdr)
{
    /* see if this is a CLIENT or SERVER session */
    if (self->state->type == SERVER)
        return(chanServerPop(self, lls, msg, inHdr));
    else
        return(chanClientPop(self, lls, msg, inHdr));
}
chanClientPop(Sessn self, Sessn lls, Msg *msg, void *inHdr)
{
ChanState *state = (ChanState *)self->state;
ChanHdr *hdr = (ChanHdr *)inHdr;
Dispatcher (SELECT)
• Dispatch to appropriate procedure
[Figure: client-side caller using xCall above CHAN; server-side callee reached through xCallDemux above CHAN]
• Procedures
– flat: unique id for each possible procedure
– hierarchical: program + procedure number
Example Code
Client side
static XkReturn
selectCall(Sessn self, Msg *req, Msg *rep)
{
SelectState *state=(SelectState *)self->state;
char *buf;
Simple RPC Stack
SELECT
CHAN
BLAST
IP
ETH
VCHAN: A Virtual Protocol
static XkReturn
vchanCall(Sessn s, Msg *req, Msg *rep)
{
Sessn chan;
XkReturn result;
VchanState *state=(VchanState *)s->state;
SunRPC
• IP implements BLAST-equivalent
[Figure: SunRPC protocol stack]
Multimedia Applications
Streaming Stored Audio/Video:
Using a Web Server
Streaming Stored Audio/Video:
Using a Web Server with a Metafile
Streaming Stored Audio/Video:
Using a Media Server
Streaming Stored Audio/Video:
Using a Media Server and RTSP
Real-time Transport Protocol (RTP)
• Application-Level Framing
• Data Packets
– sequence number
– timestamp (app defines “tick”)
• Control Packets (send periodically)
– loss rate (fraction of packets lost since last report)
– measured jitter
Time Relationship
Jitter
• Jitter is introduced in real-time data by variation in the delay
between packets
Timestamps
Playback Buffer
• To prevent jitter, we can timestamp the packets
and separate arrival time from the playback time
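Separating arrival time from playback time can be sketched as follows. The model is an assumption made for illustration: the first packet is played a fixed delay after it arrives, and every later packet is played at the same spacing its timestamps had at the sender. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long timestamp_ms;  /* when the sender produced the packet */
    long arrival_ms;    /* when the receiver got it (receiver clock) */
} rt_packet;

/* Playback time of packet p: the first packet plays playback_delay_ms
 * after it arrives, and later packets keep their timestamp spacing. */
long playback_time_ms(const rt_packet *first, const rt_packet *p,
                      long playback_delay_ms)
{
    assert(first != NULL && p != NULL && playback_delay_ms >= 0);
    return first->arrival_ms + playback_delay_ms
         + (p->timestamp_ms - first->timestamp_ms);
}

/* Nonzero if p arrived no later than its scheduled playback time. */
int arrives_in_time(const rt_packet *first, const rt_packet *p,
                    long playback_delay_ms)
{
    return p->arrival_ms <= playback_time_ms(first, p, playback_delay_ms);
}
```

A larger playback delay absorbs more jitter at the cost of a longer startup wait; a packet that misses its playback time has to be skipped or played late.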
Real-time Traffic Issues
• A playback buffer is required for real-time
traffic
• A sequence number on each packet is
required for real-time traffic
• Real-time traffic needs the support of
multicasting
• Mixing means combining several streams of
traffic into one stream
Transport Protocol for RTP
• TCP, with all its sophistication, is not suitable for
interactive multimedia traffic because we cannot
allow retransmission of packets
• UDP is more suitable than TCP for interactive
traffic. However, we need the services of RTP,
another transport layer protocol, to make up for
the deficiencies of UDP
Transport Protocol for RTP
• RTP uses a temporary even-numbered UDP port
• RTCP uses an odd-numbered UDP port number
that follows the port number selected for RTP
RTCP Message Types