TCP Refresher - Part 1

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 54

TCP (No IP) Refresher

Part-1
Disclaimer
• This Presentation contains a mixture of self made slides and material
already out on the Internet.
• The goal of this presentation/talk is to provide TCP refresher.
• At the end you wont know everything about TCP.
• The goal here is to introduce you to the Mountains by climbing on top
of a hill. How to climb the Mountains is an exercise left for the
audience/reader .
• My hope is that you will get at least something new out of this but if
you already knew everything then at least you got the free Lunch .
Problem Statement
• So assume that we have to develop a protocol which is responsible for
data transfer over a medium that is lossy. This protocol needs to
provide mechanism to transfer data reliably.

• If an error occurs during the data transfer there are primarily two ways
it can be corrected:
• Error Correction Code (FEC): Shannon theory forms the base and mostly used
in low level protocols like Optical, Memory chips etc..
• Automatic Repeat Request (ARQ): Basically brute force method where you try
to send the data again. TCP uses this form to provide reliability.
Reliability and Error Recovery
• General ARQ Algorithms
• Stop & Wait
• Sliding Window Protocols
• Go-Back-N
• Selective Repeat

4
Error Recovery: Stop and Wait
• ARQ
– Receiver sends acknowledgement (ACK)
when it receives packet
– Sender waits for ACK and timeouts if it Sender Receiver
does not arrive within some time period
• Simplest ARQ protocol

Timeout
• Send a packet, stop and wait
until ACK arrives
Time
Recovering from Error

Timeout
Timeout

Timeout
Time
Timeout

Timeout

Timeout
ACK lost Packet lost Early timeout
DUPLICATE DUPLICATE
PACKETS!!! PACKETS!!!
Problems with Stop and Wait
• How do I recognize a duplicate packet ? (Generic Problem)
• Performance issues:
• Can only send one packet per round trip
• Question: How long the sender should wait for an ACK ??
• This turns out to be a hard problem to solve. Will look later.
How to Recognize Duplicates?
• Use sequence numbers
• both packets and acks
• Sequence # in packet is finite  How big
should it be?
• For stop and wait?
• One bit – won’t send seq #1 until
received ACK for seq #0
Got Reliability but what about efficiency?
• So we found a way to achieve reliability in our protocol but sending a
packet and then waiting for an ACK isn't very efficient.

• The throughput of our Stop and Wait protocol = “M/R”


• M is the size of the packet
• R is the RTT
As you can see the throughput is inversely proportional to the RTT.

• If a packet is lost, we have to send that data again which brings our
“GoodPut” further down..
Got Reliability but what about efficiency?
Sender Receiver
first packet bit
transmitted, t = 0

first packet bit


RTT arrives

ACK arrives, send


next packet, t =
RTT + L / R

• Can’t keep the pipe full: Utilization is low when bandwidth-delay product (R x RTT)is large!.
R is the link capacity and RTT is the round-trip
• What is BDP ?: It’s a measure of capacity of the network in bits. Or in other words, amount
of data on the network at any given time, i.e. data that has been transmitted but not yet
acknowledged.
So how can we make it efficient ?
• By letting sender sending more packets in the network at a time without waiting for
first to be acked.
• Number of pkts in flight = window
So how can we make it efficient ?
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged data
segments
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Now things are getting complicated, isn’t it ?
• Sender Side Complexity:
• Sender needs to decide when to inject packets into the network ?
• How many packets do I need to inject into the network ?
• Need to keep the timers when waiting for ACKs and keep a copy of the un-acked
packets in case retransmission is needed.
• Receiver Side Complexity:
• It needs to have a logic for ACK to distinguish which packets have been received
and which have not.
• Needs a buffer which allows to hold “out-of-sequence” packets (due to re-
ordering), unless it wants to throw those packets which will be very inefficient
(remember our goal ?, that efficiency thing we were talking about)
• Sender and Receiver speed mismatch (Flow control)
• Network can be overwhelmed by the sender (Congestion control)
Introducing Windows of Packet(Sliding Windows)
• Define a Window of packets that have been injected but not yet
ACK’d. (window size is the number of packets unack’d)
• We will slide the window forward as the packets get acknowledged
(Sliding Window).
Introducing Windows of Packet(Sliding Windows)
• Sliding window will be kept as a data structure on both Sender and
Receiver.
• It allows the sender to keep track on what packets can be released,
packets waiting for ACKs, and packets which cannot yet be sent.
• It allows the receiver to keep track of packets already received and
ACK’d.
• This structure looks good and promising but where is the guideline
for:
• How large the window should be ?
• What if the receiver can not handle the sender data rate ?
We also need to worry about Flow Control
• Flow Control is the problem which arises when there is a mismatch
between Sender and Receiver Data rate.
• Window based flow control is used to solve the flow control problem.
• In this approach Window size isn’t fixed and varies over time.
• In order to make this work a Receiver needs to signal back the Sender
on how large a Window to use. This is known as Window Update.
• Technically an ACK sent by a receiver is different then Window Update
but in practice it piggy backs on the ACK packets.
• As we can see that this approach clearly allows the sender to control
the number of packets which can be injected by controlling the
window size.
Flow Control
• Assume that the sender is allowed to inject “W” packets into the
network before it hears an ACK for any of them.
• Assume both Sender and Receiver are sufficiently fast and the
network has no Loss with infinite capacity.
• Then the Throughput = (SW/R) where W is the window size, S is the
packet size in bits, and R is the RTT.
• By controlling the size of “W” we can control the sender throughput.
• Now time to throw another monkey wrench:
• What about the network carrying traffic ? It is possible for the Sender to inject
packets which can exceed routers ability causing packet loss.
• This problem is called Congestion Control
Congestion Control
• This problem involves the sender slowing down as to not overwhelm
the network between Sender and Receiver

• How can we solve this problem ?


• Two Approaches:
• We can use explicit signaling similar to Flow Control problem (Where Receiver
advertised the window size its expecting to the source)
• We can use implicit signaling where the Sender guesses that it needs to slow
down based on some indication. (popular research area)
Quick Recap on what we have so far..
• Sequence Nos. for Data and ACKs: Solve the problem of Identifying
duplicates
• Pipelining: brings efficiency by sending more than one packet at a time.
• Sender and Receiver keeps Window Size as Data structure: Helps in
keeping track of how many packets in flight, ACK’d, Next packets which
need to be sent.
• Receiver tells his Window to the Sender: Helps in achieving Flow control.
• Some form of implicit and explicit signaling to handle congestion control
(we haven’t gone into the specifics yet)
Introducing TCP
TCP Header
TCP Connection Management
Sequence Numbers
• So How large does size of sequence number space need to be?
• Must be able to detect wrap-around
• Depends on sender/receiver window size
• E.g.
• size of seq. no. space = 8, send win=recv win=7
• If pkts 0 to 6 are sent successfully and all acks lost
• Receiver expects 7,0 to 5, sender retransmits old 0 to 6!!!
• size of sequence no. space must be  send window + recv window
Sequence Numbers in TCP
• TCP regards data as a “byte-stream”
• each byte in byte stream is numbered.
• 32 bit value, wraps around (around 4GB)
• initial values selected at start up time
• TCP breaks up byte stream in packets.
• Packet size is limited to the Maximum Segment Size (MSS)
• Each packet has a sequence number
• seq. no of 1st byte indicates where it fits in the byte stream
TCP Options: MSS
• MSS is the largest segment that a TCP receiver is willing to receive
from its peer and, as a result, the largest size its peer should ever use
when sending

• It only counts TCP data bytes

• MSS option is carried within the SYN segment. Default is 536 bytes.
Typical value we see is 1460 bytes
TCP Options: Window Scale
• Without Window Scaling, max amount of data which can be sent is
216 = 65535 bytes (65KB).
• WS option effectively increases the capacity of the TCP window
advertisement field from 16 to 30 bits.
• Window Scale option applies a scaling factor the 16 bit Window
advertisement value.
• A shift count of 0 indicates no scaling and maximum scale value is 14
which provides (65535 x 214 ), effectively 1GB.
• This option can only appear in SYN segments, so the scale factor is
fixed in each direction when the connection is established.
• Sample PCAP
TCP Options: Timestamp Options
• Timestamp option allows the sender place two 4-byte timestamp
values in every segment.
• Receiver reflects these values in the ACK, allowing the sender to
calculate an estimate of the connection’s RTT for each ACK received.
• Sender puts a 32 bit value field call TSV or Tsval and the receiver
echoes this back unchanged in the second Timestamp Echo Retry
(TSER)
• This option serves two purpose:
• Better estimation of RTT which goes as a factor for RTO. (We will later dig
deeper)
• It also helps in Protection against Wrapped Sequence Numbers (PAWS)
• Sample PCAP
TCP Timeout (RTO) and Retransmission
• One problem we mentioned earlier was that how long a Sender
should wait before concluding that the packet is lost (Sent packet or
ACK)
• A Time out needs to be at least somewhere around = Time to send
the packet to Receiver+ Time for the ACK to travel back
• Problem is that both factors aren’t constant and can vary in the
network and we cant rely on the users to tell for every circumstances.
• The Protocol tries to (guess-)estimate by itself and is known as Round
Trip Estimation.
• It’s a statistical process and is close to sample mean of a collection of
samples of RTTs. (Note: The avg. changes over time as the network
condition changes)
TCP Round Trip Time and Timeout
Q: Challenges with setting TCP timeout Q: how to estimate RTT?
value? • SampleRTT: measured time from
• It has to be longer than RTT segment transmission until ACK receipt
• but RTT varies so it must adapt • ignore retransmissions, why?
• Importance of accurate RTT estimators: • SampleRTT will vary, want estimated
RTT “smoother”
• If its set too short:
• premature timeout which will cause unnecessary • average several recent
retransmissions measurements, not just current
• If its set too long: SampleRTT
• slow reaction to segment loss
Adaptive Retransmission(Original Algorithm)

• Measure SampleRTT for each segment/ACK pair


• Compute weighted running average of RTT
• EstRTT = a x EstimatedRTT + (1-a) x SampleRTT
 a between 0.8 and 0.9 ( to smooth Estimated RTT)
- Small a indicates temp. fluctuation, a large value more stable, may not be quick
to adapt to real changes
- with recommended value of a, 80% to 90% of each new estimate is from
previous estimate and 10 to 20% is from new measurement.
• Set timeout based on EstRTT
• TimeOut = 2 x EstRTT
Example RTT Estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

350

300

250
RTT (milliseconds)

200

150

100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)

SampleRTT Estimated RTT


Problem: Retransmission Ambiguity
Sender Receiver Sender Receiver

SampleR TT

SampleR TT
• ACK is for Original transmission but was for retransmission => Sample RTT is too large
• ACK is for retransmission but was for original => Sample RTT too small
Karn/Partridge Algorithm(Simple Proposal)

• Do not sample RTT when retransmitting


• only measures sample RTT for segments sent once

• Karn and Patridge proposal is exponential backoff


• Congestion is most likely cause of lost segments
• TCP sources should not react too aggressively to a timeout
• More timeouts mean more cautious the source should become (congestion problem)
Jacobson/Karels Algorithm
• Original computation for RTT did not take the variance of sample RTTs into account
• The original formula couldn't keep up with wide fluctuations in the RTT.
• If variation among samples is small, Estimated RTT can be better used without increasing the estimate
twice
• A large variance in the samples mean Time out values should not be too tightly coupled to the Estimated
RTT
• Jacobson’s new Calculations for average RTT kept track of both variability in
samples + avg.
• Using Standard Deviation was considered expensive operation (due to Sq. Root) so
Mean deviation was chosen though many say it’s baloney…
• Further Reading: https://www.leeds.ac.uk/educol/documents/00003759.htm
Jacobson/Karels Algorithm
• Diff = SampleRTT - EstRTT
• EstRTT = EstRTT + (  x Diff)
• Dev = Dev +  ( |Diff| - Dev)
• where  is a fraction between 0 and 1
• TimeOut = EstRTT + 4 x Dev  Empirical Values
Timestamp Extension for RTT estimate
• This improves the timeout mechanism by providing accurate
measurement of RTT
• When sending a packet, insert current time into option
• 4 bytes for time, 4 bytes for echo a received timestamp
• Receiver echoes timestamp in ACK
• Actually will echo whatever is in timestamp
• Removes retransmission ambiguity
• Can get RTT sample on any packet
Linux RTO
• Linux RTO mechanism includes both Standard RTT and Timestamp’s
to increase the RTO accuracy.
• One basic problem with standard RTT is that if there is a huge RTT
drop , RTO will increase which is counterintuitive. Linux takes care of
that.
So far what we have learned
• Little bit about TCP and the problems it tries to solve
• TCP header, Few TCP Options
• TCP timeout calculation  how is RTT estimated.
Fast Retransmit
• So before Fast Retransmit, TCP detected packet loss solely by using “timeout” which means it’s
taking a long time to reach on lost packets(what ever the RTO is)

• Sender sends a packet, TCP sets up the timer (Standard Algo or Timestamps) and if the ACK isn't
received within that time then it triggers a retransmit of the packet.

• Retransmission Timeout are considered really bad within TCP world (Will look later on why, for now
just take the word)

• That’s why “Fast Retransmit” was introduced within TCP.

• The basic idea is that a receiver get’s 3 DUP ACK’s then send the packet immediately and not wait
for the RTO which allows to react faster on lost packets.

• We will still have Retransmission Timeout (RTO) but it will act as a back up mechanism.
Fast Retransmit (Ex:1)
Ignore cwnd at this moment. We will look
into that during congestion avoidance
Assumption:
• Only one packet in a window is dropped. In this
diagram, it is Pkt1.
• TCP at the sending side implements fast
retransmit algorithm.
• The retransmitted packet is completely received
at the receiver.
All packets in transit are not enough to
trigger fast retransmit.
Fast Retransmit (Ex:2)
Assumption:
• Only one packet in a window is
dropped. In this diagram, it is Pkt5.
• TCP at the sending side implements
fast retransmit algorithm.
• The retransmitted packet is
completely received at the receiver
In the example, packets in transit are
enough to trigger fast retransmit.
Therefore, after receiving the third
duplicate ACK5, TCP at the sending
side retransmits Pkt5
Fast Retransmit(Ex:3)
Assumption:
• Two packets in a window are
dropped. In this diagram, they
are Pkt7 and Pkt9.
• TCP at the sending side
implements fast retransmit
algorithm.
• The retransmitted packet is
completely received at the
receiver.
Retransmission of Pkt9 happens
because Pkt9's RTO is expired - not fast
retransmit. Why? As shown that there
are no enough packets in transit for
triggering the 2nd fast retransmit.
Fast Retransmit - Recap
• So we saw how Fast Retransmit helps in recovering quickly in the case of a single
packet loss but not in the case of multiple packet loss.
Lets talks about Flow Control
TCP Traffic Control
• Traffic control
• There are two reasons for sender to reduce the rate of sending packets.
• When receiver’s buffer space is not enough,  flow control
• When the network is congested,  congestion control

network
congestion

Small-capacity
receiver
Flow Control
• The idea here is to not let the Sender overwhelm with more data then
what a receiver can handle.
• The way this can be achieved is by Receiver telling the Sender whats its
buffer size (Capability to receive is) which gives the sender an idea how
much it can send.
• This is achieved by maintaining a Window Structure on Sender and
Receiver side.
Sliding Window Flow Control
Segments sent, but
Segments that can be sent
not acknowledged

0 1 2 3 0 1 2 3

Window is shrinking Window expands


as the segments are sent as the acks are received

(a) sender’s window

Segments that were received Segments that will be received

0 1 2 3 0 1 2 3
Receiver tells the Sender its
window size which is
The last segment Window is shrinking Window expands
tracked at the Sender side
That was acked as the segments are received as acks are sent
as a variable “awnd”
(b) receiver’s window (Advertised Window)
Window Flow Control
RTT

Source 1 2 W 1 2 W
time

data ACKs

Destination 1 2 W 1 2 W
time

• ~ W packets per RTT


• Lost packet detected by missing ACK
Source Rate (Throughput)
• Limit the number of packets in the network to window W (which is
primarily dominated by “awnd”)

W  MSS
• Source rate = bps
RTT
• If W too small then rate « capacity
If W too big then rate > capacity => congestion
• This doesn't take packet loss into account, Matthew Mathis formula
fixes it.
TCP Persist Timer
• Background
• When the TCP receiver advertises window = 0, the TCP sender stops sending
temporarily. Afterwards, the receiver lets the sender know it can receive
segments again by sending new window advertisement. But if this new
window advertisement is lost (since this doesn't contain any data, its not
delivered reliably), the sender will wait for the new advertisement forever.
(Deadlock!!)
• Solution
• After the sender knows window=0, the sender transmits window probe
segment periodically to check out if the receiver is ready to accept. The
window probe is sent according to the persist timer.
• Window probe is a segment of 1 byte length.
• TCP allows sender to transmit one byte even if the receiver’s window is closed.
• TCP persist timer is increasing exponentially.
TCP Persist Timer
win=0

window probe
win=0
ACK(win=0)
win=256 window probe
lost
ACK(win=0)

Deadlock window probe

ACK(win=0)

Persist Timer
(normal TCP Exponential backoff)
Delayed Acknowledgement
Delayed acknowledgments for reducing number of segments: The receiver does not send Acks immediately after the
receipt of an (error-free) packet but waits up to ~200ms/500ms if a packet is in the send buffer (depends on host OS). If so
it, piggybacks the ACK onto the transmit packet; if no transmit packet is available the receiver sends an Ack latest after
~200ms/500ms
Questions ?

You might also like