Computer Networks Chapter 8: Transport layer

Holger Karl
Computer Networks Group, Universität Paderborn
WS 09/10, v 1.2

Goal of this chapter

We will discuss the mechanisms, in addition to congestion control, that are required to build a reliable, connection-oriented, end-to-end transport protocol
With TCP as the evident case study, intermixed with the presentation
We will draw heavily on material from the link layer, as many similar problems are solved there in similar fashion, e.g., sliding window for error control
A brief discussion of other transport protocols complements the chapter
Namely, UDP and RPC

Overview
Services & addressing Connection setup Connection release Flow control Timer and timeouts TCP summary UDP & RPC Performance issues

Transport protocols: Services to the upper layer

Connection-less and connection-oriented transport service
Connection management necessary as auxiliary service: setup and teardown

Reliable or unreliable transport
In-order delivery of all packets

Be a good citizen in the network: perform congestion control

Allow several transport endpoints in a single host
Demultiplexing

Support different interaction models
Byte stream, messages, remote procedure invocation


Addressing
Provide multiple service access points to multiplex several applications
SAPs can identify connections or data flows


E.g., port numbers
Dynamically allocated
Predefined for well-known services, e.g., port 80 for a Web server


Connection establishment

How to establish a joint context, a connection, between sender and receiver?
Note: only relevant in the end systems; the network layer is assumed to be connection-less

Naïve solution:
Sender sends a CONNECTION REQUEST
Receiver answers with a CONNECTION ACCEPTED
Sender proceeds once that message is received

Failure of naïve solution

The naïve solution fails in realistic networks
Where packets can be lost, stored/reordered, and duplicated
Example failure scenario: all packets are duplicated and delayed, due to congestion, errors, re-routing, ...
In a banking application, e.g., a "transfer money" request: the result is two independent transactions performed where only one was intended
The problem: delayed duplicates; the receiver cannot tell whether a CONNECTION REQUEST or the following TRANSFER is fresh or an old copy it should drop
(Message sequence chart: CONNECTION REQUEST, CONNECTION ACCEPTED, TRANSFER, OK, followed by delayed duplicates of the REQUEST and the TRANSFER that trigger a second transfer)

Adding an additional handshake

First idea: the sender has to re-confirm to the receiver that it actually wants to set up this connection
Add a third message to the connection setup phase: three-way handshake
This third message can already carry data

Is a three-way handshake sufficient?

No, by itself it does not protect against delayed duplicates!
Problem: if both the connection request and the connection confirmation are duplicated and delayed, the receiver again has no way to ascertain whether this is a fresh or an old copy
Solution: have the sender answer a question that the receiver asks!
Actually: put sequence numbers into the connection request, the connection acknowledgement, and the connection confirmation
They have to be copied by the receiving party to the other side
A connection is only established if the correct number is provided

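A minimal sketch (not from the original slides, not real TCP) of the idea above: each side proposes a number and the other side must echo it back, so a delayed duplicate of an old CONNECTION REQUEST cannot complete a fresh connection. All names here are illustrative.

```python
# Sketch of a three-way handshake with echoed sequence numbers (illustrative only).
import random

def new_isn():
    return random.randrange(2**32)          # initial sequence number

# Client -> Server: CONNECTION REQUEST carrying the client's number x
x = new_isn()
conn_request = ("CR", x)

# Server -> Client: CONNECTION ACCEPTED carrying its own number y and echoing x
_, x_seen = conn_request
y = new_isn()
conn_accepted = ("ACCEPTED", y, x_seen)

# Client -> Server: CONNECTION CONFIRM echoing y; the client checks the echo of x
_, y_seen, x_echo = conn_accepted
assert x_echo == x, "stale duplicate: echoed number does not match my request"
conn_confirm = ("CONFIRM", y_seen)

# Server checks the echo of y before declaring the connection established
_, y_echo = conn_confirm
assert y_echo == y, "stale duplicate: echoed number does not match my accept"
print("connection established with numbers", x, y)
```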

Three-way handshake + sequence numbers

Two examples of critical cases (which are handled correctly):
A connection request appears as an old duplicate
Connection request & confirmation both appear as old duplicates

Connection setup: further issues

Further problems arise due to crashing nodes; some memory is needed
The sequence numbers negotiated here are also used for the following data packets
Terminology for TCP:
Connection setup: SYN (synchronize) packet
Connection accepted: SYN/ACK packet (because both the previous sequence number is acknowledged and a new sequence number from the receiver is proposed)
Connection confirmation: ACK packet (combined with DATA, as a rule)
Problem: SYN attack, flooding with nonsense SYN packets


Connection identification in TCP

A TCP connection is set up
Between a single sender and a single receiver
More precisely, between application processes running on these systems
TCP can multiplex several such connections over the network layer, using the port numbers as transport SAP identifiers

A TCP connection is thus identified by a four-tuple: (Source Port, Source IP Address, Destination Port, Destination IP Address)
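A hypothetical sketch (names such as FourTuple and connections are invented for illustration) of how incoming segments could be demultiplexed to per-connection state using exactly this four-tuple as the lookup key.

```python
# Transport-layer demultiplexing keyed by the connection four-tuple (sketch).
from dataclasses import dataclass

@dataclass(frozen=True)
class FourTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

connections = {}   # four-tuple -> per-connection state (buffers, sequence numbers, ...)

def demultiplex(src_ip, src_port, dst_ip, dst_port, payload):
    key = FourTuple(src_ip, src_port, dst_ip, dst_port)
    state = connections.get(key)
    if state is None:
        return None        # no such connection; real TCP would answer with RST
    state["rx"].append(payload)
    return state

connections[FourTuple("10.0.0.1", 51512, "10.0.0.2", 80)] = {"rx": []}
demultiplex("10.0.0.1", 51512, "10.0.0.2", 80, b"GET / HTTP/1.1\r\n")
```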


Connection release

Once a connection context between two peers is established, releasing the connection should be easy
Goal: only release the connection when both peers have agreed that they have received all data and have nothing more to say
I.e., both sides must have invoked a Close-like service primitive
In fact, this is impossible
Problem: how to be sure that the other peer knows that you know that it knows that you know that all data have been transmitted and that the connection can now safely be terminated?
Analogy: two-army problem

Two-army problem

Coordinated attack:
Two armies form up for an attack against each other
One army is split into two parts that have to attack together; alone they will lose
Commanders of the parts communicate via messengers who might be captured
How to coordinate? Which rules shall the commanders use to agree on an attack date?
Provably unsolvable if the network can lose messages


Connection release in practice

The two-army problem is equivalent to connection release
But: when releasing a connection, bigger risks can be taken
Usual approach: three-way handshake again
Send a disconnect request (DR), set a timer, wait for a DR from the peer, acknowledge it

Problem cases for connection release with 3WHS

Lost ACK: solved by an (optimistic) timer in Host 2
Lost second DR: solved by retransmission of the first DR
The timer solves (optimistically) the case when both the second DR and the ACK are lost


State diagram of a TCP connection

TCP uses a three-way handshake for connection setup and release
FIN, FIN/ACK for release
Complicated by the ability to have asymmetric, half-open connections and data in transit while the close is in progress

(Partial) state diagram
Expresses both server and client aspects; each side runs a separate copy
Legend: event/action; segments in all caps, local events in normal capitalization
(State diagram: states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK; transitions labelled Passive open, Active open/SYN, Send/SYN, SYN/SYN+ACK, SYN+ACK/ACK, ACK, Close/FIN, FIN/ACK, and a timeout after two segment lifetimes from TIME_WAIT to CLOSED)

TCP state sequence: unfolding the state machine

What does the TCP state sequence look like?
Example scenario: Web server/client
Web server opens a socket and listens on a connection
Web browser connects to that socket
Browser sends a request for a web page and then closes its socket for sending (it remains open for receiving data)
Server receives the request, sends the Web page, then closes its socket for sending as well

Simplifying assumptions: no errors, no unusual behavior (such as closing a socket before establishing the connection)
Note: have a look at the netstat tool (Linux, Cygwin)
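An illustrative sketch (not from the slides) of part of this state machine as a transition table; only the transitions named above are included, so it is by no means a full TCP implementation.

```python
# (state, event) -> (next state, segment to send); a partial TCP connection FSM.
TRANSITIONS = {
    ("CLOSED",      "passive_open"): ("LISTEN",      None),
    ("CLOSED",      "active_open"):  ("SYN_SENT",    "SYN"),
    ("LISTEN",      "recv_SYN"):     ("SYN_RCVD",    "SYN+ACK"),
    ("SYN_SENT",    "recv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD",    "recv_ACK"):     ("ESTABLISHED", None),
    ("ESTABLISHED", "close"):        ("FIN_WAIT_1",  "FIN"),
    ("ESTABLISHED", "recv_FIN"):     ("CLOSE_WAIT",  "ACK"),
    ("FIN_WAIT_1",  "recv_ACK"):     ("FIN_WAIT_2",  None),
    ("FIN_WAIT_2",  "recv_FIN"):     ("TIME_WAIT",   "ACK"),
    ("CLOSE_WAIT",  "close"):        ("LAST_ACK",    "FIN"),
    ("LAST_ACK",    "recv_ACK"):     ("CLOSED",      None),
    ("TIME_WAIT",   "timeout_2MSL"): ("CLOSED",      None),
}

def step(state, event):
    next_state, send = TRANSITIONS[(state, event)]
    return next_state, send

# Client side of the handshake: CLOSED -> SYN_SENT -> ESTABLISHED
state, out = step("CLOSED", "active_open")     # sends SYN
state, out = step(state, "recv_SYN+ACK")       # sends ACK, now ESTABLISHED
print(state)
```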

TCP MSC: connection establishment

(Message sequence chart: the Web server application performs a passive open, its TCP moves from CLOSED to LISTEN; the Web browser application performs an active open, its TCP sends SYN and enters SYN_SENT; the server TCP answers with SYN+ACK and enters SYN_RCVD; the browser TCP replies with ACK and enters ESTABLISHED; upon receiving that ACK the server TCP enters ESTABLISHED as well)

TCP MSC: connection release (1)

(Message sequence chart: both TCPs start in ESTABLISHED; the Web browser application closes, its TCP sends FIN and enters FIN_WAIT_1; the server TCP acknowledges with ACK and enters CLOSE_WAIT; upon receiving the ACK the browser TCP enters FIN_WAIT_2; continued on the next slide)

TCP MSC: connection release (2)

(Message sequence chart, continued: the Web server application closes, its TCP sends FIN from CLOSE_WAIT and enters LAST_ACK; the browser TCP, in FIN_WAIT_2, acknowledges with ACK and enters TIME_WAIT; after the timeout it moves to CLOSED; upon receiving the ACK the server TCP enters CLOSED as well)

Flow control

Recall: flow control = prevent a fast sender from overrunning a slow receiver
Similar issue in link and transport layer
In the transport layer it is more complicated:
Many connections, need to adapt the amount of buffer per connection dynamically (instead of just allocating a fixed amount of buffer space per outgoing link)
Transport layer PDUs can differ widely in size, unlike link layer frames
The network's packet buffering capability clouds the picture

Flow control: buffer allocation

In order to support outstanding packets, the sender either
Has to rely on the receiver to process packets as they come in (packets must not be reordered), which is unrealistic, or
Has to assume that the receiver has sufficient buffer space available
The more buffer, the more outstanding packets
Necessary to obtain highly efficient transmission; recall the bandwidth-delay product!
How does the sender get buffer assurance?
The sender can request buffer space
The receiver can tell the sender: "I have X buffers available at the moment"
For sliding window protocols: influence the size of the sender's send window


Example of flow control with ACK/permit separation

(Figure: message sequence with arrows showing the direction of transmission and a marker for a lost packet)
Potential deadlock in step 16 when a control PDU is lost and not retransmitted

Flow control: permits and acknowledgements

Distinguish
Receiver-Permits ("Receiver has buffer space, go ahead and send more data")
Acknowledgements ("Receiver has received certain packets")
Should be separated in real-world protocols!
Unfortunately, often intermixed; TCP, e.g., uses ACKs as Network-Permits; see the congestion control chapter
Can be combined with dynamically changing buffer space at the receiver
Due to, e.g., the different speed with which the application actually retrieves received data from the transport layer
Example: TCP

Send and receive buffers in TCP

TCP maintains buffers at
The sender, to assist in error control
The receiver, to store packets not yet retrieved by the application or received out of order
Old TCP implementations used Go-Back-N and discarded out-of-order packets
(Figure: the sending application writes into the sender-side TCP buffer, delimited by the pointers LastByteAcked, LastByteSent, LastByteWritten; the receiving application reads from the receiver-side TCP buffer, delimited by LastByteRead, NextByteExpected, LastByteRcvd)

TCP flow control: Advertised window

In TCP, the receiver can advertise the size of its receive buffer
Buffer space occupied: (NextByteExpected - 1) - LastByteRead
Maximum buffer space available: MaxRcvdBuffer
Advertised buffer space (advertised window): MaxRcvdBuffer - ((NextByteExpected - 1) - LastByteRead)

Recall: the advertised window limits the amount of data that a sender will inject into the network
The TCP sender ensures that LastByteSent - LastByteAcked <= AdvertisedWindow
Equivalently: EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
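A small sketch using the variable names above: how the receiver could compute its advertised window and how the sender derives the effective window of data it may still send. The buffer size and byte counters are illustrative.

```python
# Advertised and effective window computation (sketch).
MAX_RCVD_BUFFER = 64 * 1024          # receiver buffer size in bytes

def advertised_window(next_byte_expected, last_byte_read):
    occupied = (next_byte_expected - 1) - last_byte_read
    return MAX_RCVD_BUFFER - occupied

def effective_window(advertised, last_byte_sent, last_byte_acked):
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised - in_flight)   # sender must not send more than this

adv = advertised_window(next_byte_expected=10_001, last_byte_read=8_000)
print(adv)                                                   # 63536 bytes advertised
print(effective_window(adv, last_byte_sent=12_000, last_byte_acked=10_000))
```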

Nagle's algorithm: self-clocking and windows

Recall TCP self-clocking: the arrival of an ACK is an indication that new data can be injected into the network
What happens when an ACK for only a small amount of data (e.g., 1 byte) arrives?
Send immediately? The network would be burdened by small packets (silly window syndrome), which is inefficient

Nagle's algorithm describes how much data TCP is allowed to send
When the application produces data to send:
if both the available data and the advertised window are at least one MSS, send a full segment
else, if there is unacked data in flight, buffer the new data until a full MSS has accumulated
else, send all the new data now
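A sketch of Nagle's decision rule as stated above; the MSS value, the buffered byte count, and the amount of unacknowledged data in flight are illustrative parameters.

```python
# Nagle's rule: how many bytes to transmit now (0 = keep buffering).
MSS = 1460   # maximum segment size in bytes

def nagle_decision(buffered_bytes, advertised_window, unacked_in_flight):
    if buffered_bytes >= MSS and advertised_window >= MSS:
        return MSS                      # enough data and window: send a full segment
    if unacked_in_flight > 0:
        return 0                        # wait for the ACK, keep accumulating
    return buffered_bytes               # idle connection: send what we have

print(nagle_decision(200, 10_000, unacked_in_flight=0))     # 200: send now
print(nagle_decision(200, 10_000, unacked_in_flight=1460))  # 0: buffer
print(nagle_decision(3000, 10_000, unacked_in_flight=1460)) # 1460: full segment
```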

How to select timeout values?

If the round trip time between sender and receiver is a known constant d: Timeout = d + ε
with ε small, to allow for processing

Interesting case: what if the round trip delay D is a random variable?
Select the timeout such that
It is small enough not to wait too long in case of a packet error
It is big enough not to prematurely send out a retransmission
A full analysis is complicated
It needs to take into account both the packet error rate and the distribution of D
Let us look at some simple rules
Concentrate on the probability of prematurely sending out the first retransmission
Assume this probability shall be bounded by a given probability p

A simple timeout selection rule

Possible rule: Timeout = c·μ for some constant c > 1
Suppose the expected value E[D] = μ is known
How to determine the constant c?

Case 1: the distribution function F_D of D is known
P(premature retrans.) = P(D > Timeout) = P(D > c·μ) = 1 - F_D(c·μ)
Required: p ≥ 1 - F_D(c·μ), i.e., c ≥ (1/μ)·F_D^(-1)(1 - p)

Case 2: the distribution of D is unknown, only mean μ and standard deviation σ
P(D > Timeout) = P(D - μ > (c - 1)·μ) ≤ P(|D - μ| > (c - 1)·μ) ≤ σ² / ((c - 1)·μ)²  (Chebyshev inequality!)
Suppose this probability should be bounded by p
This implies c ≥ 1 + σ / (μ·√p), or: Timeout = μ + σ / √p

So we also need to know σ_D for a decent timeout selection!

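A sketch of the Chebyshev-based rule just derived: given estimates of the RTT mean and standard deviation, choose the timeout so that the probability of a premature retransmission is (conservatively) bounded by p. The numeric values are illustrative.

```python
# Timeout = mean + std / sqrt(p), from the Chebyshev bound above.
import math

def chebyshev_timeout(mean_rtt, std_rtt, p):
    # P(D > mean + std/sqrt(p)) <= P(|D - mean| > std/sqrt(p)) <= p
    return mean_rtt + std_rtt / math.sqrt(p)

print(chebyshev_timeout(mean_rtt=100e-3, std_rtt=20e-3, p=0.01))  # 0.3 s
```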

Timeout values depend on the delay distribution

Example: round trip time exponentially or lognormally distributed
(Plot: required timeout value versus the bound on the probability of prematurely sending out the first retransmission, from 10^-1 down to 10^-5, for an exponential distribution (mu = 1) and a lognormal distribution (mu = 1, sigma = 2))

Estimating delay parameters

Since distribution knowledge is usually not available, only the Chebyshev-based approximations are feasible
Necessary: estimates of the round trip time average and standard deviation
Suppose n RTT values d_1, ..., d_n have been observed
Estimated average: d̄ = (1/n) Σ d_i
Estimated standard deviation: s_d = ( 1/(n-1) Σ (d̄ - d_i)² )^(1/2)
Estimating distribution parameters is easy as long as the RTT distribution does not change
Not true in practice! RTT can vary considerably over time (load patterns, failures, ...)
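A small sketch of the sample estimates defined above, using Python's statistics module; the sample RTT values are illustrative.

```python
# Sample mean and sample standard deviation of observed RTT values d_1, ..., d_n.
import statistics

samples = [0.102, 0.098, 0.130, 0.095, 0.110]     # observed RTTs in seconds
d_mean = statistics.mean(samples)                  # (1/n) * sum(d_i)
s_d = statistics.stdev(samples)                    # sqrt( 1/(n-1) * sum((d_mean - d_i)^2) )
print(d_mean, s_d)
```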

Estimating parameters when the distribution changes

What to do when the RTT distribution changes over time?
I.e., the samples d_i were not generated from independent, identically distributed random variables
There is no longer "the" expected delay, "the" standard deviation
At best: attempt to find a good approximation of the current distribution's parameters
Danger: tracking changes too slowly, or overreacting to temporary fluctuations

Popular approach: autoregressive model
EstimatedRTT(i) = α · EstimatedRTT(i-1) + (1 - α) · d_i
α ∈ (0,1) controls the agility of the model, i.e., how quickly it tracks changes in the underlying stochastic process
α = 0.8 ... 0.9, typically

Autoregressive estimation of mean values

Example scenario:
The mean of the RTTs follows a sine function
Each RTT sample is generated by an exponential distribution with the appropriate mean
α = 0.8
(Plot: RTT over time, showing the true mean RTT, the individual samples, and the estimated mean tracking the sine-shaped mean)
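A sketch of the autoregressive (exponentially weighted moving average) estimator above; the function name and sample values are illustrative, and alpha defaults to the 0.8 used in the example scenario.

```python
# EstimatedRTT(i) = alpha * EstimatedRTT(i-1) + (1 - alpha) * d_i
def ewma_rtt(samples, alpha=0.8):
    estimate = samples[0]
    history = []
    for d in samples:
        estimate = alpha * estimate + (1 - alpha) * d
        history.append(estimate)
    return history

print(ewma_rtt([0.10, 0.12, 0.30, 0.11, 0.10]))   # the spike at 0.30 is damped
```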

Problem: variance not considered

The previous estimation does not consider the variance of the RTTs
Autoregressive estimates of the standard deviation have high overhead
Instead: use the deviation as a simple approximation

Jacobson/Karels algorithm for the new estimates:
Difference_n = SampleRTT_n - EstimatedRTT_(n-1)
EstimatedRTT_n = EstimatedRTT_(n-1) + δ · Difference_n
Deviation_n = Deviation_(n-1) + δ · (|Difference_n| - Deviation_(n-1))
Timeout_n = μ · EstimatedRTT_n + φ · Deviation_n
δ ∈ (0,1) is a constant; typically μ = 1, φ = 4 (in TCP)
Deviation is an upper bound of the standard deviation, but easier to compute

Jacobson/Karels algorithm: example

Same scenario as before
δ = 0.8 (for both mean and deviation estimation), φ = 4, 600 RTT samples
(Plot: RTT over time with the true mean, the samples, the estimated mean, the estimated deviation, and the resulting timeout)
Roughly 5% of the samples are larger than the determined timeout!
Due to the exponential distribution of the RTTs; not a practical value!
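A sketch of the Jacobson/Karels computation given above, with the TCP-typical weights μ = 1 and φ = 4; the smoothing constant delta and the sample values are illustrative.

```python
# Jacobson/Karels RTT estimation and timeout computation (sketch).
def jacobson_karels(samples, delta=0.125, mu=1.0, phi=4.0):
    est, dev = samples[0], 0.0
    timeouts = []
    for sample in samples:
        diff = sample - est                    # Difference_n
        est = est + delta * diff               # EstimatedRTT_n
        dev = dev + delta * (abs(diff) - dev)  # Deviation_n
        timeouts.append(mu * est + phi * dev)  # Timeout_n
    return timeouts

print(jacobson_karels([0.10, 0.12, 0.30, 0.11, 0.10]))
```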

Problem: ACKs do not refer to a transmission!

The simple algorithm described above cannot obtain correct RTT samples if packets have been retransmitted
ACKs refer to data/sequence numbers, not to individual packets!
Two examples (message sequence charts): when a segment is retransmitted, the ACK may arrive only after the retransmission, and the sender cannot tell whether it acknowledges the original transmission or the retransmission; a SampleRTT measured from either send time can therefore be wrong
Solution: do not take RTT samples for retransmitted packets (Karn/Partridge algorithm)

Timer and packet loss

What happens after a packet loss?
Recall the Ethernet behavior: become more and more careful, back off
Same idea here: after a packet loss detected by timeout, use successively larger timeout values
Exponential backoff: double the timeout value for each additional retransmission
The estimation of the RTT and the basic timeout value is performed as described on the previous slides
The multiplicative factor for the exponential backoff is reset upon ACK arrival
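A sketch combining the two rules above: Karn/Partridge (no RTT sample for retransmitted segments) and exponential backoff of the timeout with a reset on ACK arrival. Class and parameter names are invented for illustration.

```python
# Karn/Partridge sampling rule plus exponential backoff of the timeout (sketch).
class RetransmitTimer:
    def __init__(self, base_timeout=1.0):
        self.base_timeout = base_timeout   # e.g., from the Jacobson/Karels estimate
        self.backoff = 1                   # multiplicative factor

    def current_timeout(self):
        return self.base_timeout * self.backoff

    def on_timeout(self):                  # loss detected: double the timeout
        self.backoff *= 2

    def on_ack(self, was_retransmitted, sample_rtt, update_estimate):
        self.backoff = 1                   # reset the factor upon ACK arrival
        if not was_retransmitted:          # Karn/Partridge: ignore ambiguous samples
            update_estimate(sample_rtt)

timer = RetransmitTimer()
timer.on_timeout(); timer.on_timeout()
print(timer.current_timeout())             # 4.0 s after two losses
timer.on_ack(was_retransmitted=True, sample_rtt=None, update_estimate=print)
print(timer.current_timeout())             # back to 1.0 s
```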

Timeouts & timer granularity


Timeout calculation can suffer from coarse-grained timer resolution in the operating system
Example: some Unixes only have 500 ms resolution

Overview
Services & addressing Connection setup Connection release Flow control Timer and timeouts TCP summary UDP & RPC Performance issues

Also: firing a timer heavily depends on relatively precise timer interrupts to be available
Sometimes, timeouts can happen only seconds after a packet has been transmitted far too sluggish


TCP Summary

TCP consists of
Reliable byte stream: error control via Go-Back-N or Selective Repeat (depending on version)
Congestion control: window-based, AIMD, slow start, congestion threshold
Flow control: advertised receiver window
Connection management: three-way handshake for setup and teardown

Rule of thumb: max. TCP bandwidth ≈ 0.93 · MSS / (RTT · √p)
with MSS the maximum segment size, RTT the round trip time, and p the loss probability

TCP is perhaps the single most complicated and subtle protocol in the Internet
Many little details and extensions are not discussed here
The interaction of TCP with other layers is more complicated than it looks (e.g., wireless) because of hidden, implicit assumptions

TCP fairness & TCP friendliness

TCP attempts to
Adjust dynamically to the available bandwidth
Fairly share this bandwidth among all connections
I.e.: if n connections share a given bottleneck link, each connection obtains 1/n of its bandwidth

Interaction with other protocols
The bottleneck bandwidth depends on the load of other protocols as well, e.g., UDP, which is NOT congestion-controlled
I.e., UDP traffic can push aside TCP traffic

Consequence: transport protocols should be TCP friendly
They should not consume more bandwidth than a TCP connection would consume in a comparable situation
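A quick computation with the rule of thumb as reconstructed above (max. TCP bandwidth ≈ 0.93 · MSS / (RTT · √p)); the MSS, RTT, and loss values are illustrative.

```python
# Upper bound on TCP throughput for a given segment size, RTT, and loss rate.
import math

def tcp_bandwidth_limit(mss_bytes, rtt_s, loss_prob):
    return 0.93 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))  # bit/s

print(tcp_bandwidth_limit(mss_bytes=1460, rtt_s=0.1, loss_prob=1e-4) / 1e6,
      "Mbit/s")   # roughly 10.9 Mbit/s for 100 ms RTT and 0.01% loss
```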

TCP header

0        4         10        16                       31
SrcPort                      | DstPort
SequenceNum
Acknowledgment
HdrLen | 0 | Flags           | AdvertisedWindow
Checksum                     | UrgPtr
Options (variable)
Data

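A minimal sketch (not from the slides) that unpacks the fixed 20-byte part of the TCP header laid out above using Python's struct module; the function name is illustrative, and options and data would follow the fixed header.

```python
# Parse the fixed TCP header: ports, sequence/ack numbers, HdrLen, flags,
# advertised window, checksum, urgent pointer.
import struct

def parse_tcp_header(segment: bytes):
    (src_port, dst_port, seq, ack,
     offset_flags, adv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    hdr_len = (offset_flags >> 12) * 4        # HdrLen is counted in 32-bit words
    flags = offset_flags & 0x3F               # FIN, SYN, RST, PSH, ACK, URG bits
    return dict(src_port=src_port, dst_port=dst_port, seq=seq, ack=ack,
                hdr_len=hdr_len, flags=flags, adv_window=adv_window,
                checksum=checksum, urg_ptr=urg_ptr)

# 20 zero bytes parse as an (all-zero) header; real segments come from the network
print(parse_tcp_header(bytes(20)))
```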

A simple demultiplexer transport protocol: UDP

An example of an unreliable datagram protocol: User Datagram Protocol (UDP)
Only real function: (de)multiplex several data flows onto the IP layer and back to the applications
In addition: ensures packet correctness
Checksum computed over UDP header + data + pseudo-header (pivotal IP header fields)

UDP header (32 bits wide):
SrcPort | DstPort
Length  | Checksum
Data

Remote Procedure Call (RPC)

UDP and TCP have essentially simple semantics: transport data from A to B
A bit like "goto" in sequential languages: no structure in the data flow
More complex interactions can be directly captured in communication primitives
Example: invocation of a procedure as a primitive, as opposed to two separate "goto"s: a remote procedure call
Remote object invocation is really the same thing from a communication perspective
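Going back to the UDP checksum mentioned above, here is a sketch of its computation: a 16-bit one's complement sum over the pseudo-header (source and destination IP address, protocol number 17, UDP length), the UDP header, and the data. Addresses, ports, and payload are illustrative.

```python
# UDP checksum over pseudo-header + header + data (sketch).
import socket, struct

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:                      # pad to a multiple of 16 bits
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    length = 8 + len(payload)              # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    return (~ones_complement_sum(pseudo + header + payload)) & 0xFFFF

print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 5353, 53, b"hello")))
```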

Remote procedure call: structure

(Figure: the caller (client) hands arguments to a client stub, which builds a request message; the RPC protocol carries the request over the network to the server stub, which unpacks the arguments and invokes the callee (server); the return value travels back in a reply message the same way; the client is blocked while the server is computing)

Important goal: transparency to caller/callee
Achieved (partially) by stubs
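A toy illustration (not a real RPC framework) of the stub idea: the client stub packs the procedure name and arguments into a request message, "sends" it, and unpacks the reply, so the caller sees an ordinary function call. The names and the in-process "network" are assumptions made for the example.

```python
# Client stub and server stub for a remote "add" procedure (sketch).
import json

def server_dispatch(request: bytes) -> bytes:          # server stub + callee
    call = json.loads(request)
    procedures = {"add": lambda a, b: a + b}
    result = procedures[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()

def client_stub(proc, *args):                          # client stub
    request = json.dumps({"proc": proc, "args": args}).encode()
    reply = server_dispatch(request)   # in reality: sent over the network while the client blocks
    return json.loads(reply)["result"]

print(client_stub("add", 2, 3))        # looks like a local call, returns 5
```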

A few words on performance

Performance is still important
Measuring performance:
Measure relevant metrics
Make sure that the sample size is big enough
Make sure that samples are representative
Be careful when using a coarse-grained clock
Be sure that nothing unexpected is going on during your tests
Caching can wreak havoc with measurements
Understand what you are measuring and what is going on
Be careful about extrapolating results
Use a proper experimental design to finish in one human lifetime

System design for better performance

CPU speed is more important than network speed
Or interfaces such as NIC-CPU
Reduce packet count to reduce software overhead
Minimize context switches
Minimize copying
You can buy more bandwidth but not lower delay
Avoiding congestion is better than recovering from it
Avoid timeouts
Optimize for the common case
Depending on the network: design for speed, not utilization
Think about the right performance metric; it can be, e.g., energy efficiency rather than throughput or delay


Conclusion

Transport protocols can be anything from trivial to highly complex, depending on the purpose they serve
They determine to a large degree the dynamics of a network, and in particular its stability
It is trivial to build "faster than TCP" protocols, but they are unstable

The interdependencies of the various mechanisms in a transport protocol can be very subtle, with big consequences
E.g., fairness, coexistence of different TCP versions

