
TCP TUNING

TCP is a reliable transport-layer protocol that offers a full-duplex, connection-oriented byte-stream
service. Its reliability makes it appropriate for wide-area IP networks, where there is a higher
chance of packet loss or reordering. What really complicates TCP are the flow control and
congestion control mechanisms, which often interfere with each other, so proper tuning is
critical for high-performance networks. Here we describe in detail how to tune TCP depending
on the actual deployment.

TCP Tuning on Sender Side


TCP tuning on the sender side controls how much data is injected into the network toward the
remote client end. Several concurrent schemes complicate tuning, so to better understand them,
we will separate the various components and then describe how these mechanisms work
together. We will describe two phases: Startup and Steady State. Startup Phase tuning is
concerned with how fast we can ramp up sending packets into the network. Steady State Phase
tuning is concerned with other facets of TCP communication, such as tuning timers, maximum
window sizes, and so on.

Startup Phase
In the startup phase, the TCP sender begins transmitting on a new connection. One of the issues
with a new connection is that there is no information about the capabilities of the network pipe,
and sending an initial maximum burst blindly has proven disastrous. It is better to slowly
increase the rate at which traffic is injected, based on how well the traffic is absorbed, until the
capabilities of the pipe are understood and the rate can be adjusted accordingly. Manual TCP
tuning is required to change macro behavior, such as for very slow pipes (as in wireless) or
very fast pipes (such as 10 Gbit/sec).
During this phase, the congestion window is much smaller than the receive window. This
means the sender controls the traffic injected into the receiver by computing the congestion
window and capping the injected traffic amount by the size of the congestion window. Any
minor bursts can be absorbed by queues. There are three important TCP tunable parameters:
• tcp_slow_start_initial: sets up the initial congestion window just after the socket
connection is established.
• tcp_slow_start_after_idle: initializes the congestion window after a period of inactivity.
Since there is some knowledge now about the capabilities of the network, we can take a
shortcut to grow the congestion window and not start from zero, which takes an
unnecessarily conservative approach.
• tcp_cwnd_max: places a cap on the running maximum congestion window. If the receive
window grows, then tcp_cwnd_max grows to the receive window size.
In different types of networks, these values can be tuned slightly to impact the rate at which
you can ramp up. If you have a small network pipe, you want to reduce the packet flow,
whereas if you have a large pipe, you can fill it up faster and inject packets more aggressively.
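The ramp-up described above can be sketched as a simple slow-start simulation, assuming the classic behavior of doubling the congestion window each round trip until it reaches a cap. The function and parameter names here are illustrative, echoing the tunables above, not actual kernel interfaces.

```python
# Sketch of the slow-start ramp-up: the congestion window starts at a
# small initial value and doubles each round trip (one extra segment
# per ACK received) until it reaches a configured maximum.
def slow_start_ramp(initial_cwnd, cwnd_max, rtts):
    """Return the congestion window (in segments) after each RTT."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        cwnd = min(cwnd * 2, cwnd_max)  # exponential growth, capped
    return history

# A small pipe keeps the cap low; a large pipe can start higher and
# fill up faster, as the tuning guidance above suggests.
print(slow_start_ramp(initial_cwnd=1, cwnd_max=64, rtts=8))
```

Raising the initial window or the cap trades a faster ramp against the risk of overrunning a small pipe.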
Steady State Phase
After the connection has stabilized and completed the initial startup phase, the socket
connection reaches a fairly steady phase, and tuning is limited to reducing delays due to
network and client congestion. An average condition must be used, because there are always
some fluctuations in the network and client data that can be absorbed. When tuning TCP in this
phase, we look at the following network properties:
• Propagation Delay – This is primarily influenced by distance. This is the time it takes one
packet to traverse the network. In WANs, tuning is required to keep the pipe as full as
possible, increasing the allowable outstanding packets.
• Link Speed – This is the bandwidth of the network pipe. Tuning guidelines for link speeds
from 56kbit/sec dial-up connections differ from 10Gbit/sec optical local area networks
(LANs).

TCP Adjustment
TCP tuning techniques adjust the network congestion avoidance parameters of Transmission
Control Protocol (TCP) connections over high-bandwidth, high-latency networks.
Bandwidth-delay product (BDP) is a term primarily used in conjunction with TCP to refer to
the number of bytes necessary to fill a TCP "path", i.e. it is equal to the maximum number of
simultaneous bits in transit between the transmitter and the receiver.
High performance networks have very large BDPs. To give a practical example, two nodes
communicating over a geostationary satellite link with a round-trip delay time (or round-trip
time, RTT) of 0.5 seconds and a bandwidth of 10 Gbit/s can have up to 0.5 × 10^10 bits, i.e., 5
Gbit = 625 MB of unacknowledged data in flight. Despite having much lower latencies than
satellite links, even terrestrial fiber links can have very high BDPs because their link capacity
is so large. Operating systems and protocols designed as recently as a few years ago when
networks were slower were tuned for BDPs of orders of magnitude smaller, with implications
for limited achievable performance.
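The satellite example above can be computed directly from the definition of the bandwidth-delay product; a minimal sketch:

```python
# Bandwidth-delay product: the number of bytes needed to fill a TCP
# path, i.e. the maximum data in flight between sender and receiver.
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """BDP in bytes: bandwidth (bits/s) x round-trip time (s) / 8."""
    return bandwidth_bps * rtt_seconds / 8

# The geostationary satellite link above: 10 Gbit/s, 0.5 s RTT.
bdp = bdp_bytes(10e9, 0.5)
print(bdp / 1e6, "MB")  # 625.0 MB of unacknowledged data in flight
```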
Original TCP configurations supported TCP receive window size buffers of up to 65,535
(64 KiB - 1) bytes, which was adequate for slow links or links with small RTTs. Larger
buffers are required by the high performance options described below.
Buffering is used throughout high performance network systems to handle delays in the
system. In general, buffer size will need to be scaled proportionally to the amount of data "in
flight" at any time. For very high performance applications that are not sensitive to network
delays, it is possible to interpose large end to end buffering delays by putting in intermediate
data storage points in an end to end system, and then to use automated and scheduled non-
real-time data transfers to get the data to their final endpoints.

Window Size Adjustment


RWIN (TCP Receive Window) is the amount of data that a computer can accept without
acknowledging the sender. If the sender has not received acknowledgement for the
first packet it sent, it will stop and wait and if this wait exceeds a certain limit, it may
even retransmit. This is how TCP achieves reliable data transmission.
Even if there is no packet loss in the network, windowing can limit throughput. Because TCP
transmits data up to the window size before waiting for the acknowledgements, the full
bandwidth of the network may not always get used. The limitation caused by window size
can be calculated as follows:

    Throughput ≤ RWIN / RTT

where RTT is the round-trip time for the path.
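The window-size limit can be evaluated numerically; a minimal sketch, using the original 64 KiB − 1 window as an example:

```python
# Window-limited throughput: TCP sends at most one receive window per
# round trip, so throughput <= RWIN / RTT regardless of link capacity.
def max_throughput_bps(rwin_bytes, rtt_seconds):
    """Upper bound on throughput (bits/s) imposed by the window."""
    return rwin_bytes * 8 / rtt_seconds

# The original 65,535-byte window over a 100 ms path caps throughput
# at about 5.24 Mbit/s, no matter how fast the link is.
print(max_throughput_bps(65535, 0.1))
```

This is why the larger windows mentioned above are required on high-BDP paths.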


The window advertised by the receive side of TCP corresponds to the amount of free receive
memory it has allocated for this connection. Otherwise it would risk dropping received packets
due to lack of space.
The sending side should also allocate the same amount of memory as the receive side for good
performance. That is because, even after data has been sent on the network, the sending side
must hold it in memory until it has been acknowledged as successfully received, just in case it
would have to be retransmitted. If the receiver is far away, acknowledgments will take a long
time to arrive. If the send memory is small, it can saturate and block emission. A simple
computation gives the same optimal send memory size as for the receive memory size given
above.
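A minimal sketch of sizing both send and receive socket buffers toward the BDP, as recommended above, using the standard socket API. Whether the operating system honors the full request depends on system-wide limits (Linux, for instance, reports back a doubled value), so the code reads the granted sizes back rather than assuming the request succeeded.

```python
import socket

# Example target: a 10 Mbit/s link with 100 ms RTT needs ~125 KB of
# buffering on each side to keep the pipe full.
bdp_bytes = int(10e6 * 0.1 / 8)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request matching send and receive buffers, per the guidance above.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
# Read back what the OS actually granted.
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("granted send/recv buffers:", sndbuf, rcvbuf)
s.close()
```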
When packet loss occurs in the network, an additional limit is imposed on the connection. In
the case of light to moderate packet loss when the TCP rate is limited by the congestion
avoidance algorithm, the limit can be calculated according to the formula (Mathis, et al.):

    Throughput ≤ (MSS / RTT) × (1 / √Ploss)

where MSS is the maximum segment size and Ploss is the probability of packet loss. If packet
loss is so rare that the TCP window becomes regularly fully extended, this formula doesn't
apply.
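The Mathis et al. loss limit can be evaluated directly; a minimal sketch with an illustrative 1460-byte MSS:

```python
import math

# Mathis et al. throughput ceiling under light-to-moderate random
# loss: throughput <= (MSS / RTT) * (1 / sqrt(Ploss)).
def mathis_limit_bps(mss_bytes, rtt_seconds, ploss):
    """Approximate TCP throughput ceiling (bits/s) under loss."""
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(ploss)

# 1460-byte MSS, 100 ms RTT, 0.01% packet loss: about 11.7 Mbit/s.
print(mathis_limit_bps(1460, 0.1, 1e-4) / 1e6, "Mbit/s")
```

Note how even a small loss probability caps throughput far below typical link speeds on long paths.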
Network Characteristics
As the Internet has progressed, user experience has always been the most important factor. The
new breadth of access technologies leads to a wide spread of network characteristics.
Nowadays, much network access has shifted from wired networks to 3G and 4G cellular
networks.

Modern network traffic is harder to control because packet loss does not necessarily mean
congestion in the networks, and congestion does not necessarily mean packet loss. As shown
in the table above, 3G and 4G networks exhibit different types of behavior based on their
characteristics, but a server may view the different aspects as congestion. This means that an
algorithm cannot only focus on packet loss or latency for determining congestion. Other
modern access technologies, such as fiber to the home (FttH) and WiFi, expand upon the
characteristics represented above by 3G and 4G, making congestion control even more
difficult. With different access technologies having such different characteristics, a variety of
congestion control algorithms has been developed in an attempt to accommodate the various
networks.

Packet-Loss Algorithms
Initial algorithms, such as TCP Reno, use packet loss to determine when to reduce the
congestion window, which influences the send rate. TCP Reno increases the send rate and
congestion window by 1 MSS (maximum segment size) until it perceives packet loss. Once
this occurs, TCP Reno slows down and cuts the window in half. However, as established in the
previous section, modern networks may have packet loss with no congestion, so this algorithm
is not as applicable.
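Reno's behavior as described above, one MSS of additive increase per RTT and a halving on loss, can be sketched as a short simulation. Loss events here are supplied as a set of RTT indices for illustration.

```python
# Sketch of TCP Reno's additive-increase/multiplicative-decrease:
# grow the window by 1 MSS per RTT, halve it when loss is perceived.
def reno_window(initial_cwnd, rtts, loss_at):
    """Congestion window (in MSS) after each RTT."""
    cwnd = initial_cwnd
    history = []
    for t in range(rtts):
        if t in loss_at:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease on loss
        else:
            cwnd += 1                  # additive increase otherwise
        history.append(cwnd)
    return history

# One loss at RTT 5 cuts the window in half; growth then resumes.
print(reno_window(10, 8, {5}))
```

On a wireless link where the loss at RTT 5 was interference rather than congestion, this halving is exactly the needless slowdown the text describes.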

Bandwidth-Estimation Algorithms
The next generation of algorithms is based on bandwidth estimation. These algorithms change
the transmission rate depending on the estimated bandwidth at the time of packet loss. TCP
Westwood and its successor, TCP Westwood+, are both bandwidth-estimating algorithms, and
have higher throughput and better fairness over wireless links when compared to TCP Reno.
However, these algorithms do not perform well with smaller buffers or quality of service (QoS)
policies.

Latency-Based Algorithms
The latest congestion control algorithms are latency-based, which means that they determine
how to change the send rate by analyzing changes in round-trip time (RTT). These algorithms
attempt to prevent congestion before it begins, thus minimizing queuing delay at the cost of
goodput (the amount of useful information transferred per second). An example of latency-
based algorithms is TCP Vegas. TCP Vegas is heavily dependent upon an accurate calculation
of a base RTT value, which is how it determines the transmission delay of the network when
buffers are empty. Using the base RTT, TCP Vegas then estimates the amount of buffering in
the network by comparing the base RTT to the current RTT. If the base RTT estimation is too
low, the network will not be optimally used; if it is too high, TCP Vegas may overload the
network. Also, as mentioned earlier, large latency values do not necessarily mean congestion
in some networks, such as 4G.
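The Vegas-style buffering estimate described above can be sketched by comparing the expected rate at the base RTT with the rate actually observed; the function below is an illustrative simplification, not the full algorithm.

```python
# Vegas-style backlog estimate: if the current RTT exceeds the base
# (queue-free) RTT, the difference between the expected and actual
# sending rates, scaled by the base RTT, estimates queued packets.
def vegas_backlog(cwnd, base_rtt, current_rtt):
    """Estimated number of packets sitting in network buffers."""
    expected = cwnd / base_rtt      # rate if buffers were empty
    actual = cwnd / current_rtt     # rate actually observed
    return (expected - actual) * base_rtt

# Window of 20 packets, base RTT 100 ms, current RTT 125 ms:
print(vegas_backlog(20, 0.100, 0.125))  # ~4 packets estimated queued
```

If base_rtt is measured too high, the estimate understates buffering and the sender overloads the network, matching the sensitivity noted above.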
By knowing the traffic characteristics and keeping the current inadequate algorithms in mind,
service providers can implement an ideal TCP stack.

The Ideal TCP Stack


The ideal TCP stack should achieve one goal: optimizing a subscriber’s QoE. To accomplish
this, it must do three things: establish high goodput, minimize buffer bloat, and provide fairness
between the flows.
• High Goodput
High goodput is important for determining if the stack is optimized because it is a
measure of how much of the data going through the network is relevant to the client.
Goodput is different from throughput, which includes overhead such as unnecessary
retransmission and protocol headers. Goodput also addresses the difference between
content that was stalled or failed to complete versus content that the consumer was able
to utilize. To help with maximizing goodput, TCP needs to address packet loss from
interference as well as handle both small and large router buffers. Delay-based
algorithms fail when competing with other flows for bandwidth; bandwidth-based
algorithms fail when the buffers are too small or when quality of service policies are
present in the network; loss-based algorithms fail by incorrectly slowing down for
interference-based loss.
High goodput is achieved by maximizing the amount of data sent within a single packet
and optimizing how quickly data is sent. The proprietary hybrid loss and latency-based
algorithm, named TCP Woodside, is designed to maximize goodput while minimizing
buffer bloat. It controls buffer size by constantly monitoring network buffering, and
will slow down preemptively when needed—leading to a reduction in packet loss and
minimal buffer bloat. However, when the queuing delay is minimal, TCP Woodside
will rapidly accelerate to maximize the use of the available bandwidth, even when
interference-based packet loss is present.

Figure 1: Comparison of real network tests across three carriers for the TCP High Speed,
TCP Illinois, and TCP Woodside algorithms. TCP Woodside performs particularly well.
• Buffer Bloat
Buffer bloat occurs when too many packets are buffered, increasing queuing delay and
jitter in the network. Buffer bloat leads to performance issues by impacting interactive
and real-time applications. It also interferes with the RTT calculation and negatively
impacts retransmission behaviors. Thus, minimizing buffer bloat is ideal for an
optimized TCP stack. Loss-based algorithms fail to minimize buffer bloat because they
react after packets have been lost, which only happens once a buffer has been filled.
These algorithms fail to lower the send rate and allow the buffer to drain. Instead, the
algorithms choose rates that maintain the filled buffer.
Buffer bloat can be avoided by pacing the flow of data transmitted across the network. By
knowing the speed at which different flows are being sent, the stack can control how quickly
to send the packets through to the end device. This allows the buffers to adjust up without being
overfilled. As a result, inconsistent traffic behaviors and packet loss due to network congestion
are prevented.
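The pacing idea above can be sketched by computing the send time for each packet from a target rate, rather than sending back-to-back; the schedule below is computed, not actually slept on, and the names are illustrative.

```python
# Rate pacing sketch: space packets so the flow matches a target rate,
# leaving gaps between packets that other flows can use.
def pacing_schedule(num_packets, packet_bytes, rate_bps):
    """Send times (seconds) for each packet at the target rate."""
    gap = packet_bytes * 8 / rate_bps   # one packet's worth of time
    return [i * gap for i in range(num_packets)]

# Five 1500-byte packets paced at 1 Mbit/s leave 12 ms between sends.
print(pacing_schedule(5, 1500, 1e6))
```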
• Flow Fairness
Fairness between flows ensures that no one user’s traffic dominates the network to the
detriment of other users. Delay-based algorithms fail to fulfill this criterion because loss-
based flows will fill all of the buffers. This leads to the delay-based flows backing off
and ultimately slowing down to a trickle.
Rate pacing not only helps with buffer bloat, but it also improves the fairness across flows.
Without rate pacing, packets are sent immediately and consecutively. Having two flows at the
same time means one flow will see different network conditions than the other flow, usually
with respect to congestion. These conditions will affect the behavior of each flow.
Sometimes one flow has more bandwidth and sends more information. However, the next
second, another flow may gain that bandwidth and stop the flow of others.
Controlling the speed at which packets are sent on a connection allows gaps to occur between
packets on any individual flow. Instead of both flows attempting to send consecutive packets
that become intermixed, one flow will send a packet, and the second flow can then send another
packet within the time gap of the first flow. This behavior changes how the two flows see the
network as well. Rather than one flow seeing an open network and the other seeing a congested
network, both flows will likely recognize similar congestion conditions and be able to share the
bandwidth more efficiently.
