Transport Layer: 3516 - Computer Networks
• Logical (end-to-end) communication between app processes running on different hosts
• Transport protocols run in end systems
– send side: breaks app messages into segments, passes to network layer
– receive side: reassembles segments into messages, passes to app layer
[Figure: logical end-end transport between two hosts, each with application, transport, network, data link, physical layers; intermediate routers have network, data link, physical layers only]
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery: UDP
– no-frills extension of “best-effort” IP
• services not available:
– delay guarantees
– bandwidth guarantees
Chapter 3 outline
• 3.1 Transport-layer services
• 3.2 Multiplexing and demultiplexing
• 3.3 Connectionless transport: UDP
• 3.4 Principles of reliable data transfer
• 3.5 Connection-oriented transport: TCP
– segment structure
– reliable data transfer
– flow control
– connection management
• 3.6 Principles of congestion control
• 3.7 TCP congestion control
UDP: User Datagram Protocol [RFC 768]
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later ….
Internet Checksum Example
• Example: add two 16-bit integers
                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
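The example above can be reproduced with a few lines of Python. This is a minimal sketch, not a full UDP implementation: the function names are illustrative, and it only handles a list of 16-bit integers (no pseudo-header, no odd-length padding).

```python
def internet_checksum(words):
    """Return the 16-bit one's-complement checksum of a list of 16-bit integers."""
    total = 0
    for word in words:
        total += word
        # Wrap any carry out of bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    # The checksum is the one's complement of the wrapped sum.
    return ~total & 0xFFFF


def receiver_accepts(words, checksum_field):
    """Receiver side: recompute the checksum and compare with the field."""
    return internet_checksum(words) == checksum_field


words = [0b1110011001100110, 0b1101010101010101]
checksum = internet_checksum(words)
print(format(checksum, "016b"))
```

A single flipped bit in either word changes the recomputed checksum, so `receiver_accepts` returns False for it; some multi-bit error patterns can still cancel out, which is the "errors nonetheless" caveat on the slide.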
[Figure: send side / receive side FSMs; receiver actions on a good packet: extract(rcvpkt,data), deliver_data(data), udt_send(ACK)]
Rdt2.0 Has a Fatal Flaw!
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice
• must check if received ACK/NAK corrupted
• twice as many states
– state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
• must check if received packet is duplicate
– state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
Rdt2.2: a NAK-free Protocol
• Reduce the response types to ACK only
• Same functionality as rdt2.1, using ACKs only
• Instead of NAK, receiver sends ACK for last pkt
received OK
– receiver must explicitly include seq # of pkt being
ACKed
• Duplicate ACK at sender results in same action as
NAK: retransmit current pkt
Rdt2.2: Sender & Receiver Fragments
Sender FSM fragment:
• State “Wait for call 0 from above”:
– on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK 0”
• State “Wait for ACK 0”:
– on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): udt_send(sndpkt); stay in state
– on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): no action (Λ); leave state
Receiver FSM fragment:
• State “Wait for 0 from below”:
– on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) ): udt_send(sndpkt); stay in state
Rdt3.0: Channels with Errors and Loss
New assumption: underlying channel can also lose packets (data or ACKs)
– checksum, seq. #, ACKs, retransmissions will be of help, but not enough
How to determine if a packet is lost?
Approach: sender waits “reasonable” amount of time for ACK
• Retransmits if no ACK received in this time
• If pkt (or ACK) just delayed (not lost):
– Retransmission will be duplicate, but use of seq. #’s already handles this
– Receiver must specify seq # of pkt being ACKed
• Requires countdown timer
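The timeout-and-retransmit idea can be sketched as a small deterministic simulation. This is an illustrative model only: the function name and the `lost_data` / `lost_acks` parameters (sets of transmission indices to drop) are assumptions made for the example, and a "timeout" is modeled simply as going around the loop again.

```python
def stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Simulate an alternating-bit sender and receiver over a lossy channel.

    lost_data / lost_acks hold the indices (0, 1, 2, ...) of transmissions
    whose data packet or ACK is dropped. Returns (delivered, transmissions).
    """
    delivered = []
    expected = 0   # receiver: sequence number it is waiting for
    seq = 0        # sender: sequence number of the current packet
    tx = 0         # total number of data packet transmissions so far
    for msg in messages:
        acked = False
        while not acked:
            index = tx
            tx += 1
            if index in lost_data:
                continue  # data lost: timer expires, sender retransmits
            if seq == expected:
                delivered.append(msg)  # new packet: deliver to upper layer
                expected ^= 1
            # Otherwise it is a duplicate: receiver re-ACKs, delivers nothing.
            if index in lost_acks:
                continue  # ACK lost: timer expires, sender retransmits
            acked = True
        seq ^= 1
    return delivered, tx


print(stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3}))
```

With one lost data packet and one lost ACK, every message is still delivered exactly once, at the cost of two extra transmissions.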
Rdt3.0 Sender
• State “Wait for call 0 from above”:
– on rdt_rcv(rcvpkt): no action (Λ)
– on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to “Wait for ACK0”
• State “Wait for ACK0”:
– on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): no action (Λ)
– on timeout: udt_send(sndpkt); start_timer
– on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer; go to “Wait for call 1 from above”
• State “Wait for call 1 from above”:
– on rdt_rcv(rcvpkt): no action (Λ)
– on rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to “Wait for ACK1”
• State “Wait for ACK1”:
– on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ): no action (Λ)
– on timeout: udt_send(sndpkt); start_timer
– on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer; go to “Wait for call 0 from above”
Rdt3.0 in Action
Performance of Rdt3.0
• Rdt3.0 works, but performance stinks…
• ex: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
• U_sender: utilization – fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$ (both times in milliseconds)
• 1KB pkt every 30 msec -> 33kB/sec throughput over 1 Gbps link
• Network protocol limits use of physical resources!
Pipelined Protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-
be-acknowledged pkts
– Range of sequence numbers must be increased
– Need buffering at sender and/or receiver
Pipelining: Increased Utilization
[Figure: sender/receiver timeline; first packet bit transmitted at t = 0, last bit transmitted at t = L/R]
$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$ (both times in milliseconds)
• Two generic forms of pipelined protocols: Go-Back-N, Selective Repeat
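The utilization figures for stop-and-wait and for three pipelined packets can be checked numerically. A minimal sketch, assuming the slide's numbers (8000-bit packets, 1 Gbps link, 30 ms RTT); the function name and the `window` parameter are illustrative.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy with `window` packets in flight per RTT."""
    d_trans = packet_bits / rate_bps          # transmission delay L/R in seconds
    return min(1.0, window * d_trans / (rtt_s + d_trans))


L_BITS, R_BPS, RTT_S = 8000, 1e9, 0.030

u_stop_and_wait = sender_utilization(L_BITS, R_BPS, RTT_S)
u_pipelined = sender_utilization(L_BITS, R_BPS, RTT_S, window=3)

# Stop-and-wait throughput: one 1000-byte packet per (RTT + L/R).
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + L_BITS / R_BPS)

print(f"{u_stop_and_wait:.5f} {u_pipelined:.4f} {throughput_bytes_per_s:.0f}")
```

Tripling the window triples utilization, but it is still far below 1; filling this link would take a window on the order of thousands of packets.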
Pipelining Protocols
Go-back-N: overview
• sender: up to N unACKed pkts in pipeline
• receiver: only sends cumulative ACKs
– doesn’t ACK pkt if there’s a gap
• sender: has timer for oldest unACKed pkt
– if timer expires: retransmit all unACKed packets
Selective Repeat: overview
• sender: up to N unACKed packets in pipeline
• receiver: ACKs individual pkts
• sender: maintains timer for each unACKed pkt
– if timer expires: retransmit only unACKed packet
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N, consecutive unACKed pkts allowed
http://media.pearsoncmg.com/aw/aw_kurose_network_4/applets/go-back-n/index.html
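The sender-side window bookkeeping can be sketched as follows. This is a simplified model, not the textbook FSM: the class and method names are illustrative, sequence numbers are unbounded integers (so k-bit wraparound is ignored), and the timer itself is left out.

```python
class GoBackNSender:
    """Window bookkeeping for a Go-Back-N sender (no real network or timer)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0       # oldest unACKed sequence number
        self.next_seq = 0   # next sequence number to assign

    def send(self):
        """Return the sequence number sent, or None if the window is full."""
        if self.next_seq >= self.base + self.window_size:
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK: all packets up to and including n are acknowledged."""
        if self.base <= n < self.next_seq:
            self.base = n + 1

    def on_timeout(self):
        """Return every sent-but-unACKed sequence number for retransmission."""
        return list(range(self.base, self.next_seq))


sender = GoBackNSender(window_size=4)
sent = [sender.send() for _ in range(5)]   # the fifth send is refused
sender.on_ack(1)                           # ACKs packets 0 and 1
resent = sender.on_timeout()
print(sent, resent)
```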
Selective Repeat
• Receiver individually acknowledges all correctly
received pkts
– Buffers pkts, as needed, for eventual in-order delivery
to upper layer
• Sender only resends pkts for which ACK not
received
– Sender timer for each unACKed pkt
• Sender window
– N consecutive seq #’s
– Again limits seq #s of sent, unACKed pkts
Selective Repeat: sender, receiver windows
Selective Repeat
sender
data from above:
• if next available seq # in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
• ACK(n)
otherwise:
• ignore
Selective Repeat in Action
• receiver sees no
difference in two
scenarios!
• incorrectly passes
duplicate data as new
in (a)
Q: what relationship
between seq # size
and window size?
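One way to explore the question is to check when a retransmitted old packet can be mistaken for a new one. The sketch below models the worst case (receiver got a full window, all ACKs were lost) and suggests the usual answer: the window must be at most half the sequence-number space. The function names and the example sizes (4 sequence numbers, windows of 3 and 2) are assumptions chosen for illustration.

```python
def is_ambiguous(seq_space, window):
    """True if old retransmissions can fall inside the receiver's new window.

    Worst case: the receiver accepted packets 0..window-1 and advanced its
    window, but every ACK was lost, so the sender resends 0..window-1.
    """
    old = {i % seq_space for i in range(window)}
    new = {(window + i) % seq_space for i in range(window)}
    return bool(old & new)


def max_safe_window(seq_space):
    """Largest window size for which the worst case is not ambiguous."""
    return max(w for w in range(1, seq_space + 1) if not is_ambiguous(seq_space, w))


print(is_ambiguous(4, 3), is_ambiguous(4, 2), max_safe_window(8))
```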
SR Applet!
http://media.pearsoncmg.com/aw/aw_kurose_network_4/applets/SR/index.html
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
• Out-of-order segments:
– A: TCP spec doesn’t say; up to implementer
[Figure: simple telnet scenario, time running downward]
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• Longer than RTT
– but RTT varies
• Too short? premature timeout
– unnecessary retransmissions
• Too long? slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
[Figure: RTT (milliseconds) versus time (seconds)]
$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
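The two moving averages can be sketched in a few lines. Several details here are assumptions not stated on the slide: alpha = 0.125 and the timeout rule EstimatedRTT + 4·DevRTT follow RFC 6298, DevRTT is updated before EstimatedRTT as that RFC specifies, and the function names are illustrative.

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Apply one SampleRTT to the running estimates (all times in ms)."""
    # Deviation uses the previous estimate, then the estimate is updated.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    return estimated_rtt, dev_rtt


def timeout_interval(estimated_rtt, dev_rtt):
    """Estimate plus a safety margin of four deviations (per RFC 6298)."""
    return estimated_rtt + 4 * dev_rtt


est, dev = update_rtt(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
print(est, dev, timeout_interval(est, dev))
```

A single slow sample moves the estimate only slightly (100 to 105 ms) but widens the safety margin noticeably, which is what keeps the timer from firing prematurely when RTT is jittery.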
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– Timeout events
– Duplicate ACKs
• Initially consider simplified TCP sender:
– Ignore duplicate ACKs
– Ignore flow control, congestion control
TCP Sender Events:
Data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• Start timer if not already running (think of timer as for oldest unACKed segment)
• Expiration interval: TimeOutInterval
Timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• If acknowledges previously unACKed segments
– update what is known to be ACKed
– start timer if there are outstanding segments
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
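The three sender events can be sketched as methods on a small class. This is a hedged illustration of the simplified sender only: the class and attribute names are made up for the example, the timer is reduced to a boolean flag, and partially acknowledged segments are not handled.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender (no network, no real timer)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # of first byte -> payload
        self.timer_running = False

    def on_data(self, payload):
        """Data received from the application: create and 'send' a segment."""
        seq = self.next_seq_num
        self.unacked[seq] = payload
        self.next_seq_num += len(payload)
        self.timer_running = True    # start timer if not already running
        return seq

    def on_timeout(self):
        """Retransmit the oldest unACKed segment and restart the timer."""
        if not self.unacked:
            return None
        self.timer_running = True
        return min(self.unacked)

    def on_ack(self, ack_num):
        """Cumulative ACK: bytes below ack_num are acknowledged."""
        if ack_num > self.send_base:
            self.send_base = ack_num
            for seq in [s for s in self.unacked if s < ack_num]:
                del self.unacked[seq]
            # Keep the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq_num=92)
first = sender.on_data(b"x" * 8)     # Seq=92, 8 bytes
second = sender.on_data(b"y" * 20)   # Seq=100, 20 bytes
resend = sender.on_timeout()         # oldest unACKed segment
sender.on_ack(120)                   # cumulative ACK covers both segments
print(first, second, resend, sender.send_base)
```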
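[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B’s ACK=100 is lost (X); A’s Seq=92 timer expires and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.]
[Figure: premature timeout scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; A’s Seq=92 timer expires before ACK=100 and ACK=120 arrive, so A resends Seq=92, 8 bytes data; SendBase moves to 100, then 120; B answers the duplicate with ACK=120.]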
TCP Retransmission Scenarios (more)
[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B’s ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120, so nothing is retransmitted.]
TCP ACK generation [RFC 1122, RFC 2581]
Fast Retransmit
[Figure: sender transmits seq # x1 through x5; the segment with seq # x2 is lost (X); receiver sends ACK x1 four times; on the triple duplicate ACKs the sender resends seq # x2 before its timeout expires.]
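The duplicate-ACK rule can be sketched as a counter over the stream of ACK numbers. This is an illustrative model: the function name and `threshold` parameter are assumptions, and an ACK number is taken to name the next byte the receiver expects, so it is also the sequence number of the segment to resend.

```python
def fast_retransmits(ack_numbers, threshold=3):
    """Return the sequence numbers to resend after `threshold` duplicate ACKs."""
    to_resend = []
    last_ack = None
    dup_count = 0
    for ack in ack_numbers:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Third duplicate: resend without waiting for the timer.
                to_resend.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return to_resend


# One original ACK followed by three duplicates triggers a resend.
print(fast_retransmits([100, 100, 100, 100]))
```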
TCP Flow Control
• Receive side of TCP connection has a receive buffer:
[Figure: IP datagrams arrive into RcvBuffer, which holds TCP data (in buffer) plus (currently) unused buffer space, labeled rwnd; the application process reads from the buffer]
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the rate at which the receiving application drains the buffer
(suppose TCP receiver discards out-of-order segments)
• unused buffer space:
= rwnd
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver: advertises unused buffer space by including rwnd value in segment header
• sender: limits # of unACKed bytes to rwnd
– guarantees receiver’s buffer doesn’t overflow
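The window arithmetic can be sketched directly from the formula. The receiver function mirrors the slide's variables; the sender-side check uses `last_byte_sent` and `last_byte_acked`, which are illustrative names not taken from the slide, and the byte counts in the example are made up.

```python
def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """rwnd = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered bytes must fit in the receive buffer")
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, nbytes, rwnd):
    """Sender keeps its unACKed bytes (including the new ones) within rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd


rwnd = receive_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sender_may_send(5000, 4000, 1000, rwnd), sender_may_send(5000, 4000, 1200, rwnd))
```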
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname", port#);
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
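The sequence and acknowledgment numbers exchanged in the three steps can be sketched as plain data. The slide only says each side specifies an initial seq #; that each ACK number is the peer's initial seq # plus one comes from the TCP specification (RFC 793). The function name, dictionary layout, and the example initial sequence numbers are illustrative.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK, and ACK segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    synack = {
        "flags": "SYN+ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,   # SYN consumes one seq #
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,
    }
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```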
TCP Connection Management (cont.)
• Host that receives FIN replies with ACK. Closes connection, sends FIN.
[Figure: closing timeline; the other side passes through timed wait before closed]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle and TCP server lifecycle state diagrams]
Causes/costs of Congestion: Scenario 1
• One router, infinite buffers
• No retransmission
• Large delays when congested
• Maximum achievable throughput
Causes/costs of Congestion: Scenario 2
• One router, finite buffers
• Sender retransmission of lost packet
[Figure: three plots (a, b, c) of λout; axis marks include R/3 and R/4]
“Costs” of congestion:
• More work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of Congestion: Scenario 3
• Four senders
• Multihop paths
• Timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B send through routers with finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout: delivered data]
[Figure: TCP’s “sawtooth” behavior over time]
TCP Slow Start
• initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• increase rate exponentially until first loss event or when threshold reached
– double cwnd every RTT
– done by incrementing cwnd by 1 for every ACK received
[Figure: sender/receiver timeline, one RTT per round: one segment, then two segments, then four segments]
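The doubling behavior follows directly from the per-ACK increment, which a short loop can show. A minimal sketch with cwnd measured in MSS units; the function name and the example threshold values are assumptions, and loss is not modeled.

```python
def slow_start(rounds, ssthresh_mss, cwnd_mss=1):
    """Return cwnd (in MSS) at the start of each RTT while in slow start."""
    history = [cwnd_mss]
    for _ in range(rounds):
        if cwnd_mss >= ssthresh_mss:
            break  # threshold reached: slow start ends
        acks_this_rtt = cwnd_mss      # one ACK per segment in flight
        for _ in range(acks_this_rtt):
            cwnd_mss += 1             # cwnd grows by 1 MSS per ACK
        history.append(cwnd_mss)
    return history


print(slow_start(rounds=4, ssthresh_mss=64))
print(slow_start(rounds=10, ssthresh_mss=8))
```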
Transitioning into/out of slowstart
ssthresh: cwnd threshold maintained by TCP
• on loss event: set ssthresh to cwnd/2
– remember (half of) TCP rate when congestion last occurred
• when cwnd >= ssthresh: transition from slowstart to congestion
avoidance phase
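[FSM fragment, reconstructed from the diagram]
• Initial: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
• State “slow start”:
– on duplicate ACK: dupACKcount++
– on new ACK: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
– on cwnd > ssthresh: no action (Λ); go to “congestion avoidance”
– on timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• State “congestion avoidance”:
– on timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to “slow start”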
TCP: Congestion Avoidance
• When cwnd > ssthresh grow cwnd linearly
– increase cwnd by 1 MSS per RTT
– approach possible congestion slower than in slowstart
– implementation: cwnd = cwnd + MSS/cwnd for each ACK received
AIMD
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
AIMD: Additive Increase Multiplicative Decrease
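The per-ACK update and the halving on loss can be sketched with cwnd expressed in MSS units, so the increment becomes 1/cwnd per ACK. The function names are illustrative, and the floor of 1 MSS on the decrease is an assumption added to keep the window positive.

```python
def congestion_avoidance_rtt(cwnd_mss):
    """Apply one RTT of ACKs, each adding 1/cwnd MSS to the window."""
    for _ in range(int(cwnd_mss)):    # roughly one ACK per segment in flight
        cwnd_mss += 1.0 / cwnd_mss
    return cwnd_mss


def on_duplicate_ack_loss(cwnd_mss):
    """Multiplicative decrease for a loss detected by duplicate ACKs."""
    return max(cwnd_mss / 2, 1.0)


after_one_rtt = congestion_avoidance_rtt(10.0)
after_loss = on_duplicate_ack_loss(10.0)
print(round(after_one_rtt, 2), after_loss)
```

Because cwnd grows slightly with each ACK, the increments shrink during the round and the total comes out just under 1 MSS per RTT.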
TCP Congestion Control FSM: overview
[Figure: congestion window (segments) versus transmission round for TCP Tahoe and TCP Reno, with ssthresh marked]
Summary: TCP Congestion Control
• ➜ $L = 2 \cdot 10^{-10}$ Wow
• New versions of TCP for high-speed
TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: axis “Connection 1 throughput”, from 0 to R]
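The convergence argument can be illustrated with a toy simulation of two synchronized flows. This is a deliberately idealized sketch: both flows see every loss at the same moment, rates are in arbitrary units, and the function name, starting rates, and capacity are assumptions chosen for the example.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two flows add 1 per round; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap fixed
    return x1, x2


start_gap = abs(1 - 50)
x1, x2 = aimd_two_flows(1.0, 50.0, capacity=100, rounds=200)
print(start_gap, abs(x1 - x2))
```

The gap between the flows never grows during additive increase and is cut in half at every loss event, so the two rates drift toward an equal share.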
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
Fairness and Parallel TCP Connections
• Nothing prevents app from opening parallel connections between 2 hosts.
• Web browsers do this
• Example: link of rate R supporting 9 connections;
– new app asks for 1 TCP, gets rate R/10
– new app asks for 11 TCPs, gets R/2 !
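The arithmetic behind the example can be checked with exact fractions. A minimal sketch, assuming every TCP connection on the link gets an equal share; the function name is illustrative.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R going to an app that opens new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1), app_share(9, 11))
```

With 11 parallel connections the app gets 11/20 of R, slightly more than the R/2 quoted on the slide.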
Chapter 3: Summary
• Principles behind transport layer services:
– multiplexing, demultiplexing
– reliable data transfer
– flow control
– congestion control
• Instantiation and implementation in the Internet
– UDP
– TCP
Next:
• leaving the network “edge” (application, transport layers)
• into the network “core”