Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 90

Lecture #14-21

3-1
Outline
 Transport-layer  Connection-oriented
services transport: TCP
 Multiplexing and  segment structure
demultiplexing  reliable data transfer
 Connectionless
 flow control
 connection management
transport: UDP
 Principles of congestion
 Principles of reliable
data transfer control
 TCP congestion control

3-2
Transport services and protocols
 provide logical communication application
transport
between app processes network
data link
running on different hosts physical
network
data link
network physical

lo
 transport protocols run in data link

gi
ca
physical
end systems

le
network

nd
data link
 send side: breaks app

-e
physical network

nd
data link
messages into segments, physical

tr
an
passes to network layer network

s
data link

po
physical

rt
 rcv side: reassembles
segments into messages, application
transport
passes to app layer network
data link
 more than one transport physical

protocol available to apps


 Internet: TCP and UDP

3-3
Transport vs. network layer
 network layer: logical Household analogy:
communication 12 kids sending letters
between hosts to 12 kids
 transport layer: logical  processes = kids
communication  app messages = letters
between processes in envelopes
 relies on, enhances,  hosts = houses
network layer services
 transport protocol =
Ann and Bill
 network-layer protocol
= postal service

3-4
Internet transport-layer protocols
 reliable, in-order application
transport

delivery (TCP) network


data link network
physical
congestion control
data link
 network physical

lo
data link

gi
 flow control

ca
physical

le
network
connection setup

nd
 data link

-e
physical network

nd
data link
 unreliable, unordered physical

tr
an
delivery: UDP
network

s
data link

po
physical

rt
 no-frills extension of
“best-effort” IP application
transport
network
 services not available: data link
physical

 delay guarantees
 bandwidth guarantees

3-5
Multiplexing/demultiplexing
Demultiplexing at rcv host: Multiplexing at send host:
gathering data from multiple
delivering received segments
sockets, enveloping data with
to correct socket
header (later used for
demultiplexing)
= socket = process

application P3 P1
P1 application P2 P4 application

transport transport transport

network network network

link link link

physical physical physical

host 2 host 3
host 1
3-6
How demultiplexing works
 host receives IP datagrams
 each datagram has source IP address,
destination IP address 32 bits
 each datagram carries 1 transport-
layer segment source port # dest port #
 each segment has source, destination
port number
(recall: well-known port numbers for other header fields
specific applications)
 host uses IP addresses & port numbers
to direct segment to appropriate socket
application
data
(message)

TCP/UDP segment format

3-7
Connectionless demultiplexing
 When host receives UDP
 Create sockets with port
segment:
numbers:
DatagramSocket mySocket1 = new
 checks destination port
DatagramSocket(99111); number in segment
DatagramSocket mySocket2 = new  directs UDP segment to
DatagramSocket(99222); socket with that port
 UDP socket identified by number
 IP datagrams with
two-tuple:
(dest IP address, dest port number) different source IP
addresses and/or source
port numbers directed
to same socket

3-8
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);

P2 P1
P1
P3

SP: 6428 SP: 6428


DP: 9157 DP: 5775

SP: 9157 SP: 5775


client DP: 6428 DP: 6428 Client
server
IP: A IP: C IP:B

SP provides “return address”

3-9
Connection-oriented demux
 TCP socket identified  Server host may support
by 4-tuple: many simultaneous TCP
 source IP address sockets:
 source port number  each socket identified by
 dest IP address its own 4-tuple
 dest port number  Web servers have
 recv host uses all four different sockets for
values to direct each connecting client
segment to appropriate  non-persistent HTTP will
socket have different socket for
each request

3-10
Connection-oriented demux
(cont)

P1 P4 P5 P6 P2 P1P3

SP: 5775
DP: 80
S-IP: B
D-IP:C

SP: 9157 SP: 9157


client DP: 80 DP: 80 Client
server
IP: A S-IP: A
IP: C S-IP: B IP:B
D-IP:C D-IP:C

3-11
Connection-oriented demux:
Threaded Web Server

P1 P4 P2 P1P3

SP: 5775
DP: 80
S-IP: B
D-IP:C

SP: 9157 SP: 9157


client DP: 80 DP: 80 Client
server
IP: A S-IP: A
IP: C S-IP: B IP:B
D-IP:C D-IP:C

3-12
Figure 11-1

Position of UDP in the TCP/IP protocol suite

3-13
Figure 11-2

UDP versus IP

3-14
Figure 11-3
Port numbers

3-15
Figure 11-4
IP addresses versus port numbers

3-16
Figure 11-5

IANA ranges

3-17
Figure 11-6

Socket addresses

3-18
UDP: User Datagram Protocol [RFC 768]

 “no frills,” “bare bones”


Internet transport Why is there a UDP?
protocol  no connection
 “best effort” service, UDP
establishment (which can
segments may be: add delay)
 lost  simple: no connection state
 delivered out of order at sender, receiver
to app  small segment header
 connectionless:  no congestion control: UDP
 no handshaking between can blast away as fast as
UDP sender, receiver desired
 each UDP segment
handled independently
of others

3-19
UDP: more
 often used for streaming
multimedia apps 32 bits
 loss tolerant
Length, in source port # dest port #
 rate sensitive bytes of UDP length checksum
segment,
 other UDP uses
including
 DNS header
 SNMP
 reliable transfer over UDP: Application
add reliability at data
application layer (message)
 application-specific
error recovery!
UDP segment format

3-20
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted
segment

Sender: Receiver:
 treat segment contents  compute checksum of received
as sequence of 16-bit segment
integers  check if computed checksum
 checksum: addition (1’s equals checksum field value:
complement sum) of  NO - error detected
segment contents  YES - no error detected.
 sender puts checksum But maybe errors
value into UDP checksum nonetheless? More later ….
field

3-21
Internet Checksum Example
 Note
 When adding numbers, a carryout from the
most significant bit needs to be added to the
result
 Example: add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
3-22
Principles of Reliable data transfer
 important in app., transport, link layers
 top-10 list of important networking topics!

 characteristics of unreliable channel will determine complexity of reliable data transfer protocol
(rdt)

3-23
Reliable data transfer: getting started
rdt_send(): called from above, deliver_data(): called by
(e.g., by app.). Passed data to rdt to deliver data to upper
deliver to receiver upper layer

send receive
side side

udt_send(): called by rdt, rdt_rcv(): called when packet


to transfer packet over arrives on rcv-side of channel
unreliable channel to receiver

3-24
Reliable data transfer: getting started
We’ll:
 incrementally develop sender, receiver sides of
reliable data transfer protocol (rdt)
 consider only unidirectional data transfer
 but control info will flow on both directions!
 use finite state machines (FSM) to specify
sender, receiver
event causing state transition
actions taken on state transition
state: when in this
“state” next state state state
1 event
uniquely determined 2
by next event actions

3-25
Rdt1.0: reliable transfer over a reliable channel

 underlying channel perfectly reliable


 no bit errors
 no loss of packets

 separate FSMs for sender, receiver:


 sender sends data into underlying channel
 receiver read data from underlying channel

Wait for rdt_send(data) Wait for rdt_rcv(packet)


call from call from extract (packet,data)
above packet = make_pkt(data) below deliver_data(data)
udt_send(packet)

sender receiver

3-26
Rdt2.0: channel with bit errors
 underlying channel may flip bits in packet
 checksum to detect bit errors

 the question: how to recover from errors:


 acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
 negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
 sender retransmits pkt on receipt of NAK
 new mechanisms in rdt2.0 (beyond rdt1.0):
 error detection
 receiver feedback: control msgs (ACK,NAK) rcvr->sender

3-27
rdt2.0: FSM specification
rdt_send(data)
snkpkt = make_pkt(data, checksum) receiver
udt_send(sndpkt)
rdt_rcv(rcvpkt) &&
isNAK(rcvpkt)
Wait for Wait for rdt_rcv(rcvpkt) &&
call from ACK or udt_send(sndpkt) corrupt(rcvpkt)
above NAK
udt_send(NAK)

rdt_rcv(rcvpkt) && isACK(rcvpkt)


Wait for

call from
sender below

rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)

3-28
rdt2.0 has a fatal flaw!
What happens if Handling duplicates:
ACK/NAK corrupted?  sender adds sequence
 sender doesn’t know what number to each pkt
happened at receiver!  sender retransmits current
 can’t just retransmit: pkt if ACK/NAK garbled
possible duplicate  receiver discards (doesn’t
deliver up) duplicate pkt

stop and wait


Sender sends one packet,
then waits for receiver
response

3-29
rdt2.1: discussion
Sender: Receiver:
 seq # added to pkt  must check if received
 two seq. #’s (0,1) will packet is duplicate
suffice. Why?  state indicates whether
0 or 1 is expected pkt
 must check if received
seq #
ACK/NAK corrupted  note: receiver can not
 twice as many states
know if its last
 state must “remember” ACK/NAK received OK
whether “current” pkt
at sender
has 0 or 1 seq. #

3-30
rdt2.2: a NAK-free protocol
 same functionality as rdt2.1, using ACKs only
 instead of NAK, receiver sends ACK for last pkt
received OK
 receiver must explicitly include seq # of pkt being ACKed
 duplicate ACK at sender results in same action as
NAK: retransmit current pkt

3-31
rdt3.0: channels with errors and loss

New assumption: Approach: sender waits


underlying channel can “reasonable” amount of
also lose packets (data time for ACK
or ACKs)  retransmits if no ACK
 checksum, seq. #, ACKs, received in this time
retransmissions will be  if pkt (or ACK) just delayed
of help, but not enough (not lost):
 retransmission will be
duplicate, but use of seq.
#’s already handles this
 receiver must specify seq
# of pkt being ACKed
 requires countdown timer

3-32
rdt3.0 in action

3-33
rdt3.0 in action

3-34
Performance of rdt3.0
 rdt3.0 works, but performance stinks
 example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

Ttransmit = L (packet length in bits) 8kb/pkt


= = 8 microsec
R (transmission rate, bps) 10**9 b/sec

U L/R .008
= = = 0.00027
sender 30.008
RTT + L / R microsec
onds
 U sender : utilization – fraction of time sender busy sending
 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
 network protocol limits use of physical resources!

3-35
rdt3.0: stop-and-wait operation
sender receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send
ACK

ACK arrives, send next


packet, t = RTT + L / R

U L/R .008
= = = 0.00027
sender 30.008
RTT + L / R microsec
onds

3-36
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-
be-acknowledged pkts
 range of sequence numbers must be increased
 buffering at sender and/or receiver

 Two generic forms of pipelined protocols: go-Back-N,


selective repeat
3-37
Pipelining: increased utilization
sender receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next
packet, t = RTT + L / R

Increase utilization
by a factor of 3!

U 3*L/R .024
= = = 0.0008
sender 30.008
RTT + L / R microsecon
ds
3-38
Go-Back-N
Sender:
 k-bit seq # in pkt header
 “window” of up to N, consecutive unack’ed pkts allowed

 ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”


 may receive duplicate ACKs (see receiver)
 timer for each in-flight pkt
 timeout(n): retransmit pkt n and all higher seq # pkts in window

Transport Layer 3-39


GBN

ACK-only: always send ACK for correctly-received pkt


with highest in-order seq #
 may generate duplicate ACKs
 need only remember expectedseqnum
 out-of-order pkt:
 discard (don’t buffer) -> no receiver buffering!
 Re-ACK pkt with highest in-order seq #

Transport Layer 3-40


GBN in
action

Transport Layer 3-41


Selective Repeat
 receiver individually acknowledges all correctly
received pkts
 buffers pkts, as needed, for eventual in-order delivery
to upper layer
 sender only resends pkts for which ACK not
received
 sender timer for each unACKed pkt
 sender window
 N consecutive seq #’s
 again limits seq #s of sent, unACKed pkts

Transport Layer 3-42


Selective repeat: sender, receiver windows

Transport Layer 3-43


3-44
Example Sliding Window

45
Selective repeat
sender receiver
data from above : pkt n in [rcvbase, rcvbase+N-1]
 if next available seq # in  send ACK(n)
window, send pkt  out-of-order: buffer

timeout(n):  in-order: deliver (also


 resend pkt n, restart timer deliver buffered, in-order
pkts), advance window to
ACK(n) in [sendbase,sendbase+N]: next not-yet-received pkt
 mark pkt n as received
pkt n in [rcvbase-N,rcvbase-1]
 if n smallest unACKed pkt,
 ACK(n)
advance window base to
next unACKed seq # otherwise:
 ignore

Transport Layer 3-46


Selective repeat in action

Transport Layer 3-47


Selective repeat:
dilemma
Example:
 seq #’s: 0, 1, 2, 3
 window size=3

 receiver sees no
difference in two
scenarios!
 incorrectly passes
duplicate data as new in
(a)

Q: what relationship
between seq # size and
window size?
Transport Layer 3-48
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

 point-to-point:  full duplex data:


 one sender, one receiver  bi-directional data flow

 reliable, in-order byte in same connection


 MSS: maximum segment
stream:
size
 no “message boundaries”
 connection-oriented:
 pipelined:
 handshaking (exchange
 TCP congestion and flow
of control msgs) init’s
control set window size sender, receiver state
 send & receive buffers before data exchange
 flow controlled:
application application
sock et
w rites data reads data
sock et
 sender will not
door
overwhelm receiver
door
TCP TCP
send buffer receive buffer
segm ent

3-49
Figure 12-19
TCP segment format

3-50
Figure 12-20

Control field

3-51
TCP segment structure
32 bits
URG: urgent data counting
(generally not used) source port # dest port #
by bytes
sequence number of data
ACK: ACK #
valid acknowledgement number (not segments!)
head not
PSH: push data now len used
UA P R S F Receive window
(generally not used) # bytes
checksum Urg data pnter
rcvr willing
RST, SYN, FIN: to accept
Options (variable length)
connection estab
(setup, teardown
commands)
application
Internet data
checksum (variable length)
(as in UDP)

3-52
TCP seq. #’s and ACKs
Seq. #’s:
Host A Host B
 byte stream
“number” of first User Seq=4
2, A C
byte in segment’s types K=79,
da t a =
‘C’ ‘C’
data host ACKs
ACKs: receipt of
ta = ‘C’ ‘C’, echoes
 seq # of next byte 3, da
9 , A CK=4 back ‘C’
expected from other S eq =
7

side
 cumulative ACK host ACKs
receipt Seq=4
Q: how receiver handles of echoed 3, ACK
=80
out-of-order segments ‘C’
 A: TCP spec doesn’t
say, - up to
time
implementor
simple telnet scenario

3-53
TCP Round Trip Time and Timeout
Q: how to set TCP Q: how to estimate RTT?
timeout value?  SampleRTT: measured time from
 longer than RTT segment transmission until ACK
 but RTT varies
receipt
 ignore retransmissions
 too short: premature
timeout  SampleRTT will vary, want
 unnecessary estimated RTT “smoother”
 average several recent
retransmissions
 too long: slow reaction measurements, not just
to segment loss current SampleRTT

3-54
TCP Round Trip Time and Timeout
EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT

 Exponential weighted moving average


 influence of past sample decreases exponentially fast
 typical value:  = 0.125

3-55
Example RTT estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

350

300

250
RTT (milliseconds)

200

150

100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)

SampleRTT Estimated RTT

3-56
TCP Round Trip Time and Timeout
Setting the timeout
 EstimtedRTT plus “safety margin”
 large variation in EstimatedRTT -> larger safety margin
 first estimate of how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1-)*DevRTT +
*|SampleRTT-EstimatedRTT|

(typically,  = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT

3-57
RTO calculation for Time-out (Karn’s Algorithm)

 Do not consider the round-trip time of a retransmitted


segment in the calculation of RTTs.
 Do not update the value of RTTs until you send a segment
and receive an acknowledgment without the need for
retransmission.
 TCP does not consider the RTT of a retransmitted segment
in its calculation of a new RTO.
 Exponential Backoff Algorithm: The value of RTO is
doubled for each retransmission.
 So if the segment is retransmitted once, the value is two
times the RTO. If it transmitted twice, the value is four times
the RTO, and so on. 3-58
TCP reliable data transfer
 TCP creates rdt  Retransmissions are
service on top of IP’s triggered by:
unreliable service  timeout events
 Pipelined segments  duplicate acks
 Cumulative acks  Initially consider
 TCP uses single simplified TCP sender:
 ignore duplicate acks
retransmission timer
 ignore flow control,
congestion control

3-59
TCP sender events:
data rcvd from app: timeout:
 Create segment with  retransmit segment
seq # that caused timeout
 seq # is byte-stream  restart timer
number of first data Ack rcvd:
byte in segment  If acknowledges
 start timer if not
previously unacked
already running (think segments
of timer as for oldest  update what is known to
unacked segment) be acked
 expiration interval:  start timer if there are
TimeOutInterval outstanding segments

3-60
TCP Flow Control
flow control
sender won’t overflow
 receive side of TCP
receiver’s buffer by
connection has a transmitting too
receive buffer: much,
too fast

 speed-matching
service: matching the
send rate to the
receiving app’s drain
rate
 app process may be
slow at reading from
buffer
3-61
TCP Flow control: how it works
 Rcvr advertises spare
room by including value
of RcvWindow in
segments
 Sender limits unACKed
(Suppose TCP receiver data to RcvWindow
discards out-of-order  guarantees receive
segments) buffer doesn’t overflow
 spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]

3-62
Chapter 3 outline
 3.1 Transport-layer  3.5 Connection-oriented
services transport: TCP
 3.2 Multiplexing and  segment structure
demultiplexing  reliable data transfer
 3.3 Connectionless
 flow control
 connection management
transport: UDP
 3.6 Principles of
 3.4 Principles of
reliable data transfer congestion control
 3.7 TCP congestion
control

3-63
TCP Connection Management
Recall: TCP sender, receiver Three way handshake:
establish “connection”
before exchanging data Step 1: client host sends TCP
segments SYN segment to server
 initialize TCP variables:  specifies initial seq #
 seq. #s  no data
 buffers, flow control
Step 2: server host receives
info (e.g. RcvWindow) SYN, replies with SYNACK
 client: connection initiator segment
Socket clientSocket = new
Socket("hostname","port
 server allocates buffers
 specifies server initial seq.
number");
 server: contacted by client #
Socket connectionSocket = Step 3: client receives SYNACK,
welcomeSocket.accept(); replies with ACK segment,
which may contain data

3-64
Figure 12-28

Three-way handshaking

3-65
TCP Connection Management (cont.)

Closing a connection: client server

client closes socket: close


FIN
clientSocket.close();

Step 1: client end system


ACK
sends TCP FIN control close
segment to server FIN

Step 2: server receives FIN,

timed wait
ACK
replies with ACK. Closes
connection, sends FIN.

closed

3-66
TCP Connection Management (cont.)

Step 3: client receives FIN, client server


replies with ACK.
closing
FIN
 Enters “timed wait” - will
respond with ACK to
received FINs
ACK
closing
Step 4: server, receives ACK. FIN
Connection closed.

Note: with small modification,


timed wait
ACK
can handle simultaneous FINs.
closed

closed

3-67
Figure 12-29

Four-way handshaking

3-68
TCP Connection Management (cont)

TCP server
lifecycle

TCP client
lifecycle

3-69
Principles of Congestion Control

Congestion:
 informally: “too many sources sending too much
data too fast for network to handle”
 different from flow control!
 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queueing in router buffers)
 a top-10 problem!

Transport Layer 3-70


Causes/costs of congestion: scenario 1
Host A out
 two senders, two in : original data

receivers
unlimited shared
 one router,
Host B
output link buffers

infinite buffers
 no retransmission

 large delays
when congested
 maximum
achievable
throughput
Transport Layer 3-71
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of lost packet

Host A in : original data out

'in : original data, plus


retransmitted data

Host B finite shared output


link buffers

Transport Layer 3-72


Causes/costs of congestion: scenario 2
 always: = 
 (goodput)
in out
 “perfect” retransmission only when loss:  > out
in
 retransmission of delayed (not lost) packet makes 
in larger
(than perfect case) for same out
R/2 R/2 R/2

R/3
out

out

out
R/4

R/2 R/2 R/2


in in in

a. b. c.
“costs” of congestion:
 more work (retrans) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-73
Causes/costs of congestion: scenario 3
 four senders
Q: what happens as 
 multihop paths in
and  increase ?
 timeout/retransmit in
Host A out
in : original data
'in : original data, plus
retransmitted data
finite shared output
link buffers

Host B

Transport Layer 3-74


Causes/costs of congestion: scenario 3
H 
o
s o
t
u
A
t

H
o
s
t
B

Another “cost” of congestion:


 when packet dropped, any “upstream transmission
capacity used for that packet was wasted!

Transport Layer 3-75


Approaches towards congestion control
Two broad approaches towards congestion control:

End-end congestion Network-assisted


control: congestion control:
 no explicit feedback from  routers provide feedback
network to end systems
 congestion inferred from  single bit indicating
end-system observed loss, congestion (SNA,
delay DECbit, TCP/IP ECN,
 approach taken by TCP ATM)
 explicit rate sender
should send at

Transport Layer 3-76


Case study: ATM ABR congestion control

ABR: available bit rate: RM (resource management)


 “elastic service” cells:
 if sender’s path  sent by sender, interspersed
“underloaded”: with data cells
 sender should use  bits in RM cell set by switches
available bandwidth (“network-assisted”)
 if sender’s path  NI bit: no increase in rate

congested: (mild congestion)


 sender throttled to  CI bit: congestion indication

minimum guaranteed  RM cells returned to sender by


rate receiver, with bits intact

Transport Layer 3-77


Case study: ATM ABR congestion control

 two-byte ER (explicit rate) field in RM cell


 congested switch may lower ER value in cell
 sender’ send rate thus minimum supportable rate on path

 EFCI bit in data cells: set to 1 in congested switch


 if data cell preceding RM cell has EFCI set to 1, receiver
sets CI bit of RM cell to 1 and send back to sender.

Transport Layer 3-78


TCP Congestion Control
 end-end control (no network How does sender
assistance) perceive congestion?
 sender limits transmission:
 loss event = timeout or
LastByteSent-LastByteAcked
3 duplicate acks
 CongWin
 Roughly,
 TCP sender reduces
rate (CongWin) after
CongWin loss event
rate = Bytes/sec
RTT
 CongWin is dynamic, function of three mechanisms:
perceived network congestion  AIMD
 slow start
 conservative after
timeout events
Transport Layer 3-79
TCP AIMD
multiplicative decrease: additive increase:
cut CongWin in half increase CongWin by
after loss event 1 MSS every RTT in
the absence of loss
congestion
window events: probing
24 Kbytes

16 Kbytes

8 Kbytes

time

Long-lived TCP connection


Transport Layer 3-80
TCP Slow Start
 When connection begins,
 When connection begins,
CongWin = 1 MSS increase rate
exponentially fast until
 Example: MSS = 500
bytes & RTT = 200 msec first loss event
 initial rate = 20 kbps
 available bandwidth may
be >> MSS/RTT
 desirable to quickly ramp
up to respectable rate

Transport Layer 3-81


TCP Slow Start (more)
 When connection Host A Host B
begins, increase rate
exponentially until
one s e gm
ent

RTT
first loss event:
two segm
 double CongWin every en ts
RTT
 done by incrementing
CongWin for every ACK four segm
ents
received
 Summary: initial rate
is slow but ramps up
exponentially fast time

Transport Layer 3-82


Refinement
Philosophy:
 After 3 dup ACKs:
 CongWin is cut in half • 3 dup ACKs indicates
 window then grows linearly network capable of
 But after timeout event:
delivering some segments
 CongWin instead set to 1 MSS;
 window then grows exponentially
• timeout before 3 dup
 to a threshold, then grows linearly
ACKs is “more alarming”

Transport Layer 3-83


Refinement (more)
Q: When should the
exponential increase
switch to linear?
A: When CongWin gets
to 1/2 of its value
before timeout.

Implementation:
 Variable Threshold
 At loss event, Threshold is
set to 1/2 of CongWin just
before loss event

Transport Layer 3-84


Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to
Threshold.
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.

Transport Layer 3-85


TCP throughput
 What’s the average throughout ot TCP as a
function of window size and RTT?
 Ignore slow start
 Let W be the window size when loss occurs.
 When window is W, throughput is W/RTT
 Just after loss, window drops to W/2,
throughput to W/2RTT.
 Average throughout: .75 W/RTT

Transport Layer 3-86


TCP Futures
 Example: 1500 byte segments, 100ms RTT, want 10
Gbps throughput
 Requires window size W = 83,333 in-flight
segments
 Throughput in terms of loss rate:

1.22  MSS
RTT L
 ➜ L = 2·10-10 Wow
 New versions of TCP for high-speed needed!

Transport Layer 3-87


TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K

TCP connection 1

bottleneck
TCP
router
connection 2
capacity R

Transport Layer 3-88


Fairness (more)
Fairness and UDP Fairness and parallel TCP
 Multimedia apps often connections
 nothing prevents app from
do not use TCP
 do not want rate opening parallel cnctions
throttled by congestion between 2 hosts.
control  Web browsers do this
 Instead use UDP:  Example: link of rate R
 pump audio/video at
supporting 9 cnctions;
constant rate, tolerate
packet loss
 new app asks for 1 TCP, gets
rate R/10
 Research area: TCP  new app asks for 11 TCPs,
friendly gets R/2 !

Transport Layer 3-89


Thank You

Transport Layer 3-90

You might also like