Computer Networks Chapter 8: Transport layer

Holger Karl
Computer Networks Group, Universität Paderborn
WS 09/10, v 1.2

Goal of this chapter

We will discuss the mechanisms, in addition to congestion control, that are required to build a reliable, connection-oriented, end-to-end transport protocol
With TCP as the evident case study, intermixed with the presentation
We will draw heavily on material from the link layer, as many similar problems are solved there in similar fashion, e.g., sliding window for error control
A brief discussion of other transport protocols complements the chapter
Namely, UDP and RPC

Overview
Services & addressing Connection setup Connection release Flow control Timer and timeouts TCP summary UDP & RPC Performance issues

Transport protocols: Services to the upper layer

Connection-less and connection-oriented transport service
Connection management necessary as auxiliary service: setup and teardown

Reliable or unreliable transport
In-order delivery of all packets

Be a good citizen in the network: perform congestion control

Allow several transport endpoints in a single host
Demultiplexing

Support different interaction models
Byte stream, messages, remote procedure invocation


Addressing
Provide multiple service access points to multiplex several applications
SAPs can identify connections or data flows


E.g., port numbers
Dynamically allocated
Predefined for well-known services, e.g., port 80 for a Web server


Connection establishment

How to establish a joint context, a connection, between sender and receiver?
Note: only relevant in the end systems; the network layer is assumed to be connection-less

Naïve solution:
Sender sends a CONNECTION REQUEST
Receiver answers with a CONNECTION ACCEPTED
Sender proceeds once that message is received

Failure of naïve solution

The naïve solution fails in realistic networks
Where packets can be lost, stored/reordered, and duplicated
Example failure scenario: all packets are duplicated and delayed, due to congestion, errors, re-routing, ...
In a banking application, e.g., a "transfer money" request: the result is two independent transactions performed where only one was intended
The problem: delayed duplicates; the receiver cannot tell whether a CONNECTION REQUEST or the following TRANSFER is fresh or an old copy it should drop
(Message sequence chart: CONNECTION REQUEST, CONNECTION ACCEPTED, TRANSFER, OK, followed by delayed duplicates of the REQUEST and the TRANSFER that trigger a second transfer)

Adding an additional handshake

First idea: the sender has to re-confirm to the receiver that it actually wants to set up this connection
Add a third message to the connection setup phase: three-way handshake
This third message can already carry data

Is a three-way handshake sufficient?

No, by itself it does not protect against delayed duplicates!
Problem: if both the connection request and the connection confirmation are duplicated and delayed, the receiver again has no way to ascertain whether this is a fresh or an old copy
Solution: have the sender answer a question that the receiver asks!
Actually: put sequence numbers into the connection request, the connection acknowledgement, and the connection confirmation
They have to be copied by the receiving party to the other side
A connection is only established if the correct number is provided

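A minimal sketch (not from the original slides, not real TCP) of the idea above: each side proposes a number and the other side must echo it back, so a delayed duplicate of an old CONNECTION REQUEST cannot complete a fresh connection. All names here are illustrative.

```python
# Sketch of a three-way handshake with echoed sequence numbers (illustrative only).
import random

def new_isn():
    return random.randrange(2**32)          # initial sequence number

# Client -> Server: CONNECTION REQUEST carrying the client's number x
x = new_isn()
conn_request = ("CR", x)

# Server -> Client: CONNECTION ACCEPTED carrying its own number y and echoing x
_, x_seen = conn_request
y = new_isn()
conn_accepted = ("ACCEPTED", y, x_seen)

# Client -> Server: CONNECTION CONFIRM echoing y; the client checks the echo of x
_, y_seen, x_echo = conn_accepted
assert x_echo == x, "stale duplicate: echoed number does not match my request"
conn_confirm = ("CONFIRM", y_seen)

# Server checks the echo of y before declaring the connection established
_, y_echo = conn_confirm
assert y_echo == y, "stale duplicate: echoed number does not match my accept"
print("connection established with numbers", x, y)
```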

Three-way handshake + sequence numbers

Two examples of critical cases (which are handled correctly):
A connection request appears as an old duplicate
Connection request & confirmation both appear as old duplicates

Connection setup: further issues

Further problems arise due to crashing nodes; some memory is needed
The sequence numbers negotiated here are also used for the following data packets
Terminology for TCP:
Connection setup: SYN (synchronize) packet
Connection accepted: SYN/ACK packet (because both the previous sequence number is acknowledged and a new sequence number from the receiver is proposed)
Connection confirmation: ACK packet (combined with DATA, as a rule)
Problem: SYN attack, flooding with nonsense SYN packets


Connection identification in TCP

A TCP connection is set up
Between a single sender and a single receiver
More precisely, between application processes running on these systems
TCP can multiplex several such connections over the network layer, using the port numbers as transport SAP identifiers

A TCP connection is thus identified by a four-tuple: (Source Port, Source IP Address, Destination Port, Destination IP Address)
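A hypothetical sketch (names such as FourTuple and connections are invented for illustration) of how incoming segments could be demultiplexed to per-connection state using exactly this four-tuple as the lookup key.

```python
# Transport-layer demultiplexing keyed by the connection four-tuple (sketch).
from dataclasses import dataclass

@dataclass(frozen=True)
class FourTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

connections = {}   # four-tuple -> per-connection state (buffers, sequence numbers, ...)

def demultiplex(src_ip, src_port, dst_ip, dst_port, payload):
    key = FourTuple(src_ip, src_port, dst_ip, dst_port)
    state = connections.get(key)
    if state is None:
        return None        # no such connection; real TCP would answer with RST
    state["rx"].append(payload)
    return state

connections[FourTuple("10.0.0.1", 51512, "10.0.0.2", 80)] = {"rx": []}
demultiplex("10.0.0.1", 51512, "10.0.0.2", 80, b"GET / HTTP/1.1\r\n")
```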


Connection release

Once a connection context between two peers is established, releasing the connection should be easy
Goal: only release the connection when both peers have agreed that they have received all data and have nothing more to say
I.e., both sides must have invoked a Close-like service primitive
In fact, this is impossible
Problem: how to be sure that the other peer knows that you know that it knows that you know that all data have been transmitted and that the connection can now safely be terminated?
Analogy: two-army problem

Two-army problem

Coordinated attack:
Two armies form up for an attack against each other
One army is split into two parts that have to attack together; alone they will lose
Commanders of the parts communicate via messengers who might be captured
How to coordinate? Which rules shall the commanders use to agree on an attack date?
Provably unsolvable if the network can lose messages


Connection release in practice

The two-army problem is equivalent to connection release
But: when releasing a connection, bigger risks can be taken
Usual approach: three-way handshake again
Send a disconnect request (DR), set a timer, wait for a DR from the peer, acknowledge it

Problem cases for connection release with 3WHS

Lost ACK: solved by an (optimistic) timer in Host 2
Lost second DR: solved by retransmission of the first DR
The timer solves (optimistically) the case when both the second DR and the ACK are lost


State diagram of a TCP connection

TCP uses a three-way handshake for connection setup and release
FIN, FIN/ACK for release
Complicated by the ability to have asymmetric, half-open connections and data in transit while the close is in progress

(Partial) state diagram
Expresses both server and client aspects; each side runs a separate copy
Legend: event/action; segments in all caps, local events in normal capitalization
(State diagram: states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK; transitions labelled Passive open, Active open/SYN, Send/SYN, SYN/SYN+ACK, SYN+ACK/ACK, ACK, Close/FIN, FIN/ACK, and a timeout after two segment lifetimes from TIME_WAIT to CLOSED)

TCP state sequence: unfolding the state machine

What does the TCP state sequence look like?
Example scenario: Web server/client
Web server opens a socket and listens on a connection
Web browser connects to that socket
Browser sends a request for a web page and then closes its socket for sending (it remains open for receiving data)
Server receives the request, sends the Web page, then closes its socket for sending as well

Simplifying assumptions: no errors, no unusual behavior (such as closing a socket before establishing the connection)
Note: have a look at the netstat tool (Linux, Cygwin)
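An illustrative sketch (not from the slides) of part of this state machine as a transition table; only the transitions named above are included, so it is by no means a full TCP implementation.

```python
# (state, event) -> (next state, segment to send); a partial TCP connection FSM.
TRANSITIONS = {
    ("CLOSED",      "passive_open"): ("LISTEN",      None),
    ("CLOSED",      "active_open"):  ("SYN_SENT",    "SYN"),
    ("LISTEN",      "recv_SYN"):     ("SYN_RCVD",    "SYN+ACK"),
    ("SYN_SENT",    "recv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD",    "recv_ACK"):     ("ESTABLISHED", None),
    ("ESTABLISHED", "close"):        ("FIN_WAIT_1",  "FIN"),
    ("ESTABLISHED", "recv_FIN"):     ("CLOSE_WAIT",  "ACK"),
    ("FIN_WAIT_1",  "recv_ACK"):     ("FIN_WAIT_2",  None),
    ("FIN_WAIT_2",  "recv_FIN"):     ("TIME_WAIT",   "ACK"),
    ("CLOSE_WAIT",  "close"):        ("LAST_ACK",    "FIN"),
    ("LAST_ACK",    "recv_ACK"):     ("CLOSED",      None),
    ("TIME_WAIT",   "timeout_2MSL"): ("CLOSED",      None),
}

def step(state, event):
    next_state, send = TRANSITIONS[(state, event)]
    return next_state, send

# Client side of the handshake: CLOSED -> SYN_SENT -> ESTABLISHED
state, out = step("CLOSED", "active_open")     # sends SYN
state, out = step(state, "recv_SYN+ACK")       # sends ACK, now ESTABLISHED
print(state)
```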

TCP MSC: connection establishment

(Message sequence chart: the Web server application performs a passive open, its TCP moves from CLOSED to LISTEN; the Web browser application performs an active open, its TCP sends SYN and enters SYN_SENT; the server TCP answers with SYN+ACK and enters SYN_RCVD; the browser TCP replies with ACK and enters ESTABLISHED; upon receiving that ACK the server TCP enters ESTABLISHED as well)

TCP MSC: connection release (1)

(Message sequence chart: both TCPs start in ESTABLISHED; the Web browser application closes, its TCP sends FIN and enters FIN_WAIT_1; the server TCP acknowledges with ACK and enters CLOSE_WAIT; upon receiving the ACK the browser TCP enters FIN_WAIT_2; continued on the next slide)

TCP MSC: connection release (2)

(Message sequence chart, continued: the Web server application closes, its TCP sends FIN from CLOSE_WAIT and enters LAST_ACK; the browser TCP, in FIN_WAIT_2, acknowledges with ACK and enters TIME_WAIT; after the timeout it moves to CLOSED; upon receiving the ACK the server TCP enters CLOSED as well)

Flow control

Recall: flow control = prevent a fast sender from overrunning a slow receiver
Similar issue in link and transport layer
In the transport layer it is more complicated:
Many connections, need to adapt the amount of buffer per connection dynamically (instead of just allocating a fixed amount of buffer space per outgoing link)
Transport layer PDUs can differ widely in size, unlike link layer frames
The network's packet buffering capability clouds the picture

Flow control: buffer allocation

In order to support outstanding packets, the sender either
Has to rely on the receiver to process packets as they come in (packets must not be reordered), which is unrealistic, or
Has to assume that the receiver has sufficient buffer space available
The more buffer, the more outstanding packets
Necessary to obtain highly efficient transmission; recall the bandwidth-delay product!
How does the sender get buffer assurance?
The sender can request buffer space
The receiver can tell the sender: "I have X buffers available at the moment"
For sliding window protocols: influence the size of the sender's send window


Example of flow control with ACK/permit separation

(Figure: message sequence with arrows showing the direction of transmission and a marker for a lost packet)
Potential deadlock in step 16 when a control PDU is lost and not retransmitted

Flow control: permits and acknowledgements

Distinguish
Receiver-Permits ("Receiver has buffer space, go ahead and send more data")
Acknowledgements ("Receiver has received certain packets")
Should be separated in real-world protocols!
Unfortunately, often intermixed; TCP, e.g., uses ACKs as Network-Permits; see the congestion control chapter
Can be combined with dynamically changing buffer space at the receiver
Due to, e.g., the different speed with which the application actually retrieves received data from the transport layer
Example: TCP

Send and receive buffers in TCP

TCP maintains buffers at
The sender, to assist in error control
The receiver, to store packets not yet retrieved by the application or received out of order
Old TCP implementations used Go-Back-N and discarded out-of-order packets
(Figure: the sending application writes into the sender-side TCP buffer, delimited by the pointers LastByteAcked, LastByteSent, LastByteWritten; the receiving application reads from the receiver-side TCP buffer, delimited by LastByteRead, NextByteExpected, LastByteRcvd)

TCP flow control: Advertised window

In TCP, the receiver can advertise the size of its receive buffer
Buffer space occupied: (NextByteExpected - 1) - LastByteRead
Maximum buffer space available: MaxRcvdBuffer
Advertised buffer space (advertised window): MaxRcvdBuffer - ((NextByteExpected - 1) - LastByteRead)

Recall: the advertised window limits the amount of data that a sender will inject into the network
The TCP sender ensures that LastByteSent - LastByteAcked <= AdvertisedWindow
Equivalently: EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
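A small sketch using the variable names above: how the receiver could compute its advertised window and how the sender derives the effective window of data it may still send. The buffer size and byte counters are illustrative.

```python
# Advertised and effective window computation (sketch).
MAX_RCVD_BUFFER = 64 * 1024          # receiver buffer size in bytes

def advertised_window(next_byte_expected, last_byte_read):
    occupied = (next_byte_expected - 1) - last_byte_read
    return MAX_RCVD_BUFFER - occupied

def effective_window(advertised, last_byte_sent, last_byte_acked):
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised - in_flight)   # sender must not send more than this

adv = advertised_window(next_byte_expected=10_001, last_byte_read=8_000)
print(adv)                                                   # 63536 bytes advertised
print(effective_window(adv, last_byte_sent=12_000, last_byte_acked=10_000))
```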

Nagle's algorithm: self-clocking and windows

Recall TCP self-clocking: the arrival of an ACK is an indication that new data can be injected into the network
What happens when an ACK for only a small amount of data (e.g., 1 byte) arrives?
Send immediately? The network would be burdened by small packets (silly window syndrome), which is inefficient

Nagle's algorithm describes how much data TCP is allowed to send
When the application produces data to send:
if both the available data and the advertised window are at least one MSS, send a full segment
else, if there is unacked data in flight, buffer the new data until a full MSS has accumulated
else, send all the new data now
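A sketch of Nagle's decision rule as stated above; the MSS value, the buffered byte count, and the amount of unacknowledged data in flight are illustrative parameters.

```python
# Nagle's rule: how many bytes to transmit now (0 = keep buffering).
MSS = 1460   # maximum segment size in bytes

def nagle_decision(buffered_bytes, advertised_window, unacked_in_flight):
    if buffered_bytes >= MSS and advertised_window >= MSS:
        return MSS                      # enough data and window: send a full segment
    if unacked_in_flight > 0:
        return 0                        # wait for the ACK, keep accumulating
    return buffered_bytes               # idle connection: send what we have

print(nagle_decision(200, 10_000, unacked_in_flight=0))     # 200: send now
print(nagle_decision(200, 10_000, unacked_in_flight=1460))  # 0: buffer
print(nagle_decision(3000, 10_000, unacked_in_flight=1460)) # 1460: full segment
```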

How to select timeout values?

If the round trip time between sender and receiver is a known constant d: Timeout = d + ε
with ε small, to allow for processing

Interesting case: what if the round trip delay D is a random variable?
Select the timeout such that
It is small enough not to wait too long in case of a packet error
It is big enough not to prematurely send out a retransmission
A full analysis is complicated
It needs to take into account both the packet error rate and the distribution of D
Let us look at some simple rules
Concentrate on the probability of prematurely sending out the first retransmission
Assume this probability shall be bounded by a given probability p

A simple timeout selection rule

Possible rule: Timeout = c·μ for some constant c > 1
Suppose the expected value E[D] = μ is known
How to determine the constant c?

Case 1: the distribution function F_D of D is known
P(premature retrans.) = P(D > Timeout) = P(D > c·μ) = 1 - F_D(c·μ)
Required: p ≥ 1 - F_D(c·μ), i.e., c ≥ (1/μ)·F_D^(-1)(1 - p)

Case 2: the distribution of D is unknown, only mean μ and standard deviation σ
P(D > Timeout) = P(D - μ > (c - 1)·μ) ≤ P(|D - μ| > (c - 1)·μ) ≤ σ² / ((c - 1)·μ)²  (Chebyshev inequality!)
Suppose this probability should be bounded by p
This implies c ≥ 1 + σ / (μ·√p), or: Timeout = μ + σ / √p

So we also need to know σ_D for a decent timeout selection!

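A sketch of the Chebyshev-based rule just derived: given estimates of the RTT mean and standard deviation, choose the timeout so that the probability of a premature retransmission is (conservatively) bounded by p. The numeric values are illustrative.

```python
# Timeout = mean + std / sqrt(p), from the Chebyshev bound above.
import math

def chebyshev_timeout(mean_rtt, std_rtt, p):
    # P(D > mean + std/sqrt(p)) <= P(|D - mean| > std/sqrt(p)) <= p
    return mean_rtt + std_rtt / math.sqrt(p)

print(chebyshev_timeout(mean_rtt=100e-3, std_rtt=20e-3, p=0.01))  # 0.3 s
```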

Timeout values depend on the delay distribution

Example: round trip time exponentially or lognormally distributed
(Plot: required timeout value versus the bound on the probability of prematurely sending out the first retransmission, from 10^-1 down to 10^-5, for an exponential distribution (mu = 1) and a lognormal distribution (mu = 1, sigma = 2))

Estimating delay parameters

Since distribution knowledge is usually not available, only the Chebyshev-based approximations are feasible
Necessary: estimates of the round trip time average and standard deviation
Suppose n RTT values d_1, ..., d_n have been observed
Estimated average: d̄ = (1/n) Σ d_i
Estimated standard deviation: s_d = ( 1/(n-1) Σ (d̄ - d_i)² )^(1/2)
Estimating distribution parameters is easy as long as the RTT distribution does not change
Not true in practice! RTT can vary considerably over time (load patterns, failures, ...)
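A small sketch of the sample estimates defined above, using Python's statistics module; the sample RTT values are illustrative.

```python
# Sample mean and sample standard deviation of observed RTT values d_1, ..., d_n.
import statistics

samples = [0.102, 0.098, 0.130, 0.095, 0.110]     # observed RTTs in seconds
d_mean = statistics.mean(samples)                  # (1/n) * sum(d_i)
s_d = statistics.stdev(samples)                    # sqrt( 1/(n-1) * sum((d_mean - d_i)^2) )
print(d_mean, s_d)
```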

Estimating parameters when the distribution changes

What to do when the RTT distribution changes over time?
I.e., the samples d_i were not generated from independent, identically distributed random variables
There is no longer "the" expected delay, "the" standard deviation
At best: attempt to find a good approximation of the current distribution's parameters
Danger: tracking changes too slowly, or overreacting to temporary fluctuations

Popular approach: autoregressive model
EstimatedRTT(i) = α · EstimatedRTT(i-1) + (1 - α) · d_i
α ∈ (0,1) controls the agility of the model, i.e., how quickly it tracks changes in the underlying stochastic process
α = 0.8 ... 0.9, typically

Autoregressive estimation of mean values

Example scenario:
The mean of the RTTs follows a sine function
Each RTT sample is generated by an exponential distribution with the appropriate mean
α = 0.8
(Plot: RTT over time, showing the true mean RTT, the individual samples, and the estimated mean tracking the sine-shaped mean)
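A sketch of the autoregressive (exponentially weighted moving average) estimator above; the function name and sample values are illustrative, and alpha defaults to the 0.8 used in the example scenario.

```python
# EstimatedRTT(i) = alpha * EstimatedRTT(i-1) + (1 - alpha) * d_i
def ewma_rtt(samples, alpha=0.8):
    estimate = samples[0]
    history = []
    for d in samples:
        estimate = alpha * estimate + (1 - alpha) * d
        history.append(estimate)
    return history

print(ewma_rtt([0.10, 0.12, 0.30, 0.11, 0.10]))   # the spike at 0.30 is damped
```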

Problem: variance not considered

The previous estimation does not consider the variance of the RTTs
Autoregressive estimates of the standard deviation have high overhead
Instead: use the deviation as a simple approximation

Jacobson/Karels algorithm for the new estimates:
Difference_n = SampleRTT_n - EstimatedRTT_(n-1)
EstimatedRTT_n = EstimatedRTT_(n-1) + δ · Difference_n
Deviation_n = Deviation_(n-1) + δ · (|Difference_n| - Deviation_(n-1))
Timeout_n = μ · EstimatedRTT_n + φ · Deviation_n
δ ∈ (0,1) is a constant; typically μ = 1, φ = 4 (in TCP)
Deviation is an upper bound of the standard deviation, but easier to compute

Jacobson/Karels algorithm: example

Same scenario as before
δ = 0.8 (for both mean and deviation estimation), φ = 4, 600 RTT samples
(Plot: RTT over time with the true mean, the samples, the estimated mean, the estimated deviation, and the resulting timeout)
Roughly 5% of the samples are larger than the determined timeout!
Due to the exponential distribution of the RTTs; not a practical value!
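A sketch of the Jacobson/Karels computation given above, with the TCP-typical weights μ = 1 and φ = 4; the smoothing constant delta and the sample values are illustrative.

```python
# Jacobson/Karels RTT estimation and timeout computation (sketch).
def jacobson_karels(samples, delta=0.125, mu=1.0, phi=4.0):
    est, dev = samples[0], 0.0
    timeouts = []
    for sample in samples:
        diff = sample - est                    # Difference_n
        est = est + delta * diff               # EstimatedRTT_n
        dev = dev + delta * (abs(diff) - dev)  # Deviation_n
        timeouts.append(mu * est + phi * dev)  # Timeout_n
    return timeouts

print(jacobson_karels([0.10, 0.12, 0.30, 0.11, 0.10]))
```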

Problem: ACKs do not refer to a transmission!

The simple algorithm described above cannot obtain correct RTT samples if packets have been retransmitted
ACKs refer to data/sequence numbers, not to individual packets!
Two examples (message sequence charts): when a segment is retransmitted, the ACK may arrive only after the retransmission, and the sender cannot tell whether it acknowledges the original transmission or the retransmission; a SampleRTT measured from either send time can therefore be wrong
Solution: do not take RTT samples for retransmitted packets (Karn/Partridge algorithm)

Timer and packet loss

What happens after a packet loss?
Recall the Ethernet behavior: become more and more careful, back off
Same idea here: after a packet loss detected by timeout, use successively larger timeout values
Exponential backoff: double the timeout value for each additional retransmission
The estimation of the RTT and the basic timeout value is performed as described on the previous slides
The multiplicative factor for the exponential backoff is reset upon ACK arrival
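A sketch combining the two rules above: Karn/Partridge (no RTT sample for retransmitted segments) and exponential backoff of the timeout with a reset on ACK arrival. Class and parameter names are invented for illustration.

```python
# Karn/Partridge sampling rule plus exponential backoff of the timeout (sketch).
class RetransmitTimer:
    def __init__(self, base_timeout=1.0):
        self.base_timeout = base_timeout   # e.g., from the Jacobson/Karels estimate
        self.backoff = 1                   # multiplicative factor

    def current_timeout(self):
        return self.base_timeout * self.backoff

    def on_timeout(self):                  # loss detected: double the timeout
        self.backoff *= 2

    def on_ack(self, was_retransmitted, sample_rtt, update_estimate):
        self.backoff = 1                   # reset the factor upon ACK arrival
        if not was_retransmitted:          # Karn/Partridge: ignore ambiguous samples
            update_estimate(sample_rtt)

timer = RetransmitTimer()
timer.on_timeout(); timer.on_timeout()
print(timer.current_timeout())             # 4.0 s after two losses
timer.on_ack(was_retransmitted=True, sample_rtt=None, update_estimate=print)
print(timer.current_timeout())             # back to 1.0 s
```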

Timeouts & timer granularity


Timeout calculation can suffer from coarse-grained timer resolution in the operating system
Example: some Unixes only have 500 ms resolution

Overview
Services & addressing Connection setup Connection release Flow control Timer and timeouts TCP summary UDP & RPC Performance issues

Also: firing a timer heavily depends on relatively precise timer interrupts to be available
Sometimes, timeouts can happen only seconds after a packet has been transmitted far too sluggish


TCP Summary

TCP consists of
Reliable byte stream: error control via Go-Back-N or Selective Repeat (depending on version)
Congestion control: window-based, AIMD, slow start, congestion threshold
Flow control: advertised receiver window
Connection management: three-way handshake for setup and teardown

Rule of thumb: max. TCP bandwidth ≈ 0.93 · MSS / (RTT · √p)
with MSS the maximum segment size, RTT the round trip time, and p the loss probability

TCP is perhaps the single most complicated and subtle protocol in the Internet
Many little details and extensions are not discussed here
The interaction of TCP with other layers is more complicated than it looks (e.g., wireless) because of hidden, implicit assumptions

TCP fairness & TCP friendliness

TCP attempts to
Adjust dynamically to the available bandwidth
Fairly share this bandwidth among all connections
I.e.: if n connections share a given bottleneck link, each connection obtains 1/n of its bandwidth

Interaction with other protocols
The bottleneck bandwidth depends on the load of other protocols as well, e.g., UDP, which is NOT congestion-controlled
I.e., UDP traffic can push aside TCP traffic

Consequence: transport protocols should be TCP friendly
They should not consume more bandwidth than a TCP connection would consume in a comparable situation
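A quick computation with the rule of thumb as reconstructed above (max. TCP bandwidth ≈ 0.93 · MSS / (RTT · √p)); the MSS, RTT, and loss values are illustrative.

```python
# Upper bound on TCP throughput for a given segment size, RTT, and loss rate.
import math

def tcp_bandwidth_limit(mss_bytes, rtt_s, loss_prob):
    return 0.93 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))  # bit/s

print(tcp_bandwidth_limit(mss_bytes=1460, rtt_s=0.1, loss_prob=1e-4) / 1e6,
      "Mbit/s")   # roughly 10.9 Mbit/s for 100 ms RTT and 0.01% loss
```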

TCP header

0        4         10        16                       31
SrcPort                      | DstPort
SequenceNum
Acknowledgment
HdrLen | 0 | Flags           | AdvertisedWindow
Checksum                     | UrgPtr
Options (variable)
Data

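A minimal sketch (not from the slides) that unpacks the fixed 20-byte part of the TCP header laid out above using Python's struct module; the function name is illustrative, and options and data would follow the fixed header.

```python
# Parse the fixed TCP header: ports, sequence/ack numbers, HdrLen, flags,
# advertised window, checksum, urgent pointer.
import struct

def parse_tcp_header(segment: bytes):
    (src_port, dst_port, seq, ack,
     offset_flags, adv_window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    hdr_len = (offset_flags >> 12) * 4        # HdrLen is counted in 32-bit words
    flags = offset_flags & 0x3F               # FIN, SYN, RST, PSH, ACK, URG bits
    return dict(src_port=src_port, dst_port=dst_port, seq=seq, ack=ack,
                hdr_len=hdr_len, flags=flags, adv_window=adv_window,
                checksum=checksum, urg_ptr=urg_ptr)

# 20 zero bytes parse as an (all-zero) header; real segments come from the network
print(parse_tcp_header(bytes(20)))
```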

A simple demultiplexer transport protocol: UDP

An example of an unreliable datagram protocol: User Datagram Protocol (UDP)
Only real function: (de)multiplex several data flows onto the IP layer and back to the applications
In addition: ensures packet correctness
Checksum computed over UDP header + data + pseudo-header (pivotal IP header fields)

UDP header (32 bits wide):
SrcPort | DstPort
Length  | Checksum
Data

Remote Procedure Call (RPC)

UDP and TCP have essentially simple semantics: transport data from A to B
A bit like "goto" in sequential languages: no structure in the data flow
More complex interactions can be directly captured in communication primitives
Example: invocation of a procedure as a primitive, as opposed to two separate "goto"s: a remote procedure call
Remote object invocation is really the same thing from a communication perspective
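Going back to the UDP checksum mentioned above, here is a sketch of its computation: a 16-bit one's complement sum over the pseudo-header (source and destination IP address, protocol number 17, UDP length), the UDP header, and the data. Addresses, ports, and payload are illustrative.

```python
# UDP checksum over pseudo-header + header + data (sketch).
import socket, struct

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:                      # pad to a multiple of 16 bits
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    length = 8 + len(payload)              # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    return (~ones_complement_sum(pseudo + header + payload)) & 0xFFFF

print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 5353, 53, b"hello")))
```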

Remote procedure call: structure

(Figure: the caller (client) hands arguments to a client stub, which builds a request message; the RPC protocol carries the request over the network to the server stub, which unpacks the arguments and invokes the callee (server); the return value travels back in a reply message the same way; the client is blocked while the server is computing)

Important goal: transparency to caller/callee
Achieved (partially) by stubs
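A toy illustration (not a real RPC framework) of the stub idea: the client stub packs the procedure name and arguments into a request message, "sends" it, and unpacks the reply, so the caller sees an ordinary function call. The names and the in-process "network" are assumptions made for the example.

```python
# Client stub and server stub for a remote "add" procedure (sketch).
import json

def server_dispatch(request: bytes) -> bytes:          # server stub + callee
    call = json.loads(request)
    procedures = {"add": lambda a, b: a + b}
    result = procedures[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()

def client_stub(proc, *args):                          # client stub
    request = json.dumps({"proc": proc, "args": args}).encode()
    reply = server_dispatch(request)   # in reality: sent over the network while the client blocks
    return json.loads(reply)["result"]

print(client_stub("add", 2, 3))        # looks like a local call, returns 5
```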

A few words on performance

Performance is still important
Measuring performance:
Measure relevant metrics
Make sure that the sample size is big enough
Make sure that samples are representative
Be careful when using a coarse-grained clock
Be sure that nothing unexpected is going on during your tests
Caching can wreak havoc with measurements
Understand what you are measuring and what is going on
Be careful about extrapolating results
Use a proper experimental design to finish in one human lifetime

System design for better performance

CPU speed is more important than network speed
Or interfaces such as NIC-CPU
Reduce packet count to reduce software overhead
Minimize context switches
Minimize copying
You can buy more bandwidth but not lower delay
Avoiding congestion is better than recovering from it
Avoid timeouts
Optimize for the common case
Depending on the network: design for speed, not utilization
Think about the right performance metric; it can be, e.g., energy efficiency rather than throughput or delay


Conclusion

Transport protocols can be anything from trivial to highly complex, depending on the purpose they serve
They determine to a large degree the dynamics of a network, and in particular its stability
It is trivial to build "faster than TCP" protocols, but they are unstable

The interdependencies of the various mechanisms in a transport protocol can be very subtle, with big consequences
E.g., fairness, coexistence of different TCP versions

