
Multiple TCP Connections Improve HTTP Throughput − Myth or Fact?
Preethi Natarajan1, Fred Baker2, and Paul D. Amer1

1 CIS Dept., University of Delaware, {nataraja, amer}@cis.udel.edu
2 Cisco Fellow, Cisco Systems Inc., fred@cisco.com

Abstract—Our earlier research showed that HTTP over TCP suffers from aggravated head-of-line (HOL) blocking in web browsing conditions found in the developing world [13]. We proposed a multistreamed transport such as SCTP to alleviate HOL blocking [21]. Using emulations, we showed that a single multistreamed SCTP association reduced HOL blocking and improved response times when compared to a single TCP connection. In this work we evaluate HTTP performance over multiple TCP connections vs. a single multistreamed SCTP association. While we expect multiple TCP connections to improve HTTP throughput, emulation results show otherwise. We find that the competing and bursty nature of multiple TCP senders degrades HTTP performance in low bandwidth last hops. A single multistreamed SCTP association not only eliminates HOL blocking, but also boosts throughput compared to multiple TCP connections.

Index Terms—Transport protocols, Developing regions, HTTP performance, Multiple TCP connections

I. INTRODUCTION

A large, growing portion of WWW users in the developing world experience high Internet delays and considerable loss rates [6, 15, 4]. Persistent and pipelined HTTP 1.1 transfers over a single TCP bytestream suffer from significant head-of-line (HOL) blocking under such browsing conditions, such that a lost HTTP response delays the delivery and display of subsequent HTTP responses at the web browser [21]. Aggravated HOL blocking worsens web response times [13].

The current workaround to improve an end user's perceived WWW performance is to download an HTTP transfer over multiple TCP connections. An application employing multiple TCP senders exhibits an aggressive sending rate, and consumes a higher share of the bottleneck bandwidth than an application using fewer or single TCP connection(s) [12, 5]. Therefore, multiple TCP connections not only reduce HOL blocking, but are also expected to improve HTTP throughput. As a result, multiple TCP connections have remained an attractive option to improve response times.

A multistreamed transport is another approach to alleviate HOL blocking [21, 22, 9]. A transport stream is a logical data flow with its own sequencing space. Within each stream, the transport receiver delivers data in-sequence to the application, regardless of data ordering on other streams. Transferring independent HTTP responses over different streams of a multistreamed transport such as SCTP [RFC4960] eliminates HOL blocking. We designed and implemented HTTP over SCTP in the Apache server and Firefox browser [21]. Emulation results confirmed that persistent and pipelined HTTP 1.1 transfers over a single multistreamed SCTP association (SCTP's term for a transport connection) suffered less HOL blocking and experienced better response times than transfers over a single TCP connection. SCTP's improvements were visually perceivable under browsing conditions found in the developing world, where HTTP/TCP suffers from aggravated HOL blocking [13, 23].

This work compares HTTP performance over multiple TCP connections vs. a single multistreamed SCTP association. Similar to [13], investigations here focus on browsing conditions in the developing world. However, unlike [13], which considered a high bandwidth last hop (~1Mbps), this work considers lower bandwidth last hops, representing popular last-mile technologies in developing regions such as dialup, ADSL, and shared VSAT links (64Kbps-256Kbps) [7].

The emulation results expose interesting dynamics between multiple TCP senders and HTTP performance. Contrary to expectations, the competing and bursty nature of multiple TCP senders adversely impacts HTTP throughput in low bandwidth bottlenecks. Note that each TCP connection increases the processing and resource overhead at the web server/proxy. Therefore, multiple TCPs not only degrade the end user's perceived WWW performance, but also waste valuable resources on the server side. Secondly, a single multistreamed SCTP association outperforms multiple TCPs, and is a less resource intensive solution to alleviate HOL blocking than multiple TCPs [21].

Research funded by Cisco Systems Inc. Prepared through collaborative participation in the Communication and Networks Consortium sponsored by the US Army Research Lab under the Collaborative Tech Alliance Program, Coop Agreement DAAD19-01-2-0011. The US Gov't is authorized to reproduce and distribute reprints for Gov't purposes notwithstanding any copyright notation thereon.

The rest of the paper is organized as follows. Section II

discusses required preliminaries regarding the nature of web workloads and the emulated HTTP 1.1 transaction model. Sections III and IV describe the experiment setup and results, respectively. Section V concludes and proposes future work.

II. PRELIMINARIES

A. Multiple TCPs and Application Throughput

In congestion-controlled transports such as TCP and SCTP, the amount of outstanding (unacknowledged) data is limited by the data sender's congestion window (cwnd). Immediately after connection establishment, the sender can transmit up to initial cwnd bytes of application data [RFC3390, RFC4960]. Until congestion detection, both TCP and SCTP employ the slow start algorithm, which doubles the cwnd every RTT. Consequently, the higher the initial cwnd, the faster the cwnd growth, and more data gets transmitted every RTT. When an application employs N TCP connections, during the slow start phase the connections' aggregate initial cwnd and their cwnd growth increase N-fold. Therefore, until congestion detection, an application employing N TCP connections can, in theory, experience up to N times more throughput than an application using a single TCP connection.

When a TCP or SCTP sender detects packet loss, the sender halves the cwnd and enters the congestion avoidance phase [11, RFC4960]. If an application employing N TCP connections experiences congestion on the transmission path, not all of the connections may suffer loss. If M of the N open TCP connections suffer loss, the multiplicative decrease factor for the connection aggregate is (1 - M/2N) [5]. If this decrease factor is greater than one-half (which is the case unless all N connections experience loss, i.e., M < N), the connections' aggregate cwnd and throughput after congestion detection are more than N times those of a single TCP connection.

On the whole, the aggressive sending rate of parallel transport connections provides an application with a higher share of the bottleneck bandwidth than an application using fewer or single connection(s). Multiple TCP connections' aggressive sending behavior has been shown to increase throughput for various applications. Reference [18] employs multiple TCP connections to maintain the data streaming rate in multimedia applications. Reference [17] proposes the PSockets library, which employs parallel TCP connections to increase throughput for data intensive computing applications. Likewise, we expect multiple TCP connections to improve HTTP throughput.

B. HTTP 1.1 Transaction Model

Similar to [13], this work uses real implementations to evaluate persistent and pipelined HTTP 1.1 transfers over multiple TCP connections vs. a single multistreamed SCTP association. The original plan was to use the Apache web server and the Firefox browser for the evaluations. But, following initial investigations, we decided to employ a custom-built HTTP 1.1 client instead of Firefox, for the following reason.

1) HTTP 1.1 Transactions in Firefox

RFC2616 recommends a maximum of 2 transport connections to the same server/proxy. In Firefox, this number can be easily increased via user configuration. Firefox parses a URL, sets up the first transport connection to the appropriate web server, and retrieves index.html. After parsing index.html, Firefox opens the remaining connection(s) to the server, and pipelines further requests across all connection(s) in a round-robin fashion.

Initial investigations revealed that Firefox delays pipelining requests on a new transport connection. Specifically, the first HTTP transaction on a transport connection is always non-pipelined. After the successful receipt of the first response, subsequent requests on the same transport connection are then pipelined. We believe this behavior could be Firefox's means of verifying whether a server supports persistent connections [RFC2616, Section 8]. However, this precautionary behavior increases the per-connection transfer time by at least 1 RTT, and packet losses during the first HTTP transaction further increase the transfer time. Clearly, this behavior is detrimental to HTTP throughput over multiple TCP connections. Also, this behavior interferes with the dynamics we are interested in investigating: the interaction between multiple TCP connections and HTTP performance. Therefore, we developed the following simple HTTP 1.1 client, which better models the general behavior of HTTP 1.1 over multiple transport connections.

2) In-house HTTP 1.1 Client

The in-house client reproduces most of Firefox's transaction model, except that this client immediately starts pipelining on a new transport connection. The client employs either TCP or SCTP for the HTTP transfer. While one or more TCP connections are utilized for the HTTP 1.1 transfer, the complete page is downloaded using a single multistreamed SCTP association such that each pipelined transaction is retrieved on a different SCTP stream. Detailed information on the design of HTTP over SCTP streams can be found in [21]. Additionally, the client mimics all of Firefox's interactions with the transport layer, such as non-blocking reads/writes and disabling the Nagle algorithm [RFC896]. The following algorithm describes the client in detail:
1. Setup a TCP or SCTP socket.
2. If SCTP, set appropriate data structures to request a specific number of input and output streams during association establishment.
3. Connect to the server.
4. Timestamp "Page Download Start Time".
5. Request index.html.
6. Receive and process index.html.
7. Make the socket non-blocking, and disable Nagle.
8. While there are more transport connections to be opened:

8.1. Setup socket (non-blocking, disable Nagle).
8.2. Connect to the server.
9. While the complete page has not been downloaded:
9.1. Poll for read, write, or error events on socket(s).
9.2. Transmit pending requests on TCP connections or SCTP streams in a round-robin fashion.
9.3. Read response(s) from readable socket(s).
10. Timestamp "Page Download End Time".

C. Nature of Web Workloads

Several web characterization studies have identified certain key properties of the WWW. These properties have led to a sound understanding of the WWW's nature, and to the design of better algorithms for improved WWW performance.

Using server logs from six different web sites, Arlitt et al. identified several key web server workload attributes that were common across all six servers [3]. Their work also predicted that these attributes would most likely persist over time. Of these attributes, the following are most relevant to our study: (i) both file size and transferred file size distributions are heavy-tailed (Pareto), and (ii) the median transferred file size is small (≤5KB). A similar study conducted several years later confirmed that the above two attributes remained unchanged over time [19]. Also, [19] found that the mean transferred file size had slightly increased over the years, due to an increase in the size of a few large files. Other studies such as [10, 20] agree with [3]'s findings regarding the transferred file size distribution and median transferred file size.

These measurement studies have led to a general consensus that unlike bulk file or multimedia transfers, HTTP transfers are short-lived flows, such that a typical web object can be transferred in a few RTTs. In fact, RFC3390 proposes an optional increase of TCP's initial cwnd from 1 or 2 segments to 4MSS so that short HTTP flows can finish sooner.

III. EXPERIMENT SETUP

The emulations were performed on the FreeBSD platform, which has the kernel-space reference SCTP implementation. The experimental setup, shown in Figure 1, uses three nodes running FreeBSD 6.1: (i) a node running the in-house TCP or SCTP HTTP 1.1 client, (ii) a server running Apache, and (iii) a node running Dummynet [16] connecting the server and client. Dummynet's traffic shaper configures a full-duplex link between client and server, with a queue size of 50 packets in each direction. Both forward and reverse paths experience Bernoulli losses, and the loss rates vary from 0%-10%, typical of the end-to-end loss rates observed in developing regions [6, 15]. We are currently investigating the (complex) emulation setup of a more realistic loss model per recommendations in [2].

Our previous evaluations in [13] considered a 1Mbps last hop, deemed to be a costly and high-end option for an average user in the developing world. This work considers the following more limited last-mile bandwidths found in developing regions [7]: 64Kbps, 128Kbps, and 256Kbps. Also, varying end-to-end propagation delays are considered [15]:
1. 200ms RTT: user in East Asia, accessing a web server in North America over a land line.
2. 350ms RTT: user in South Asia, accessing a web server in North America over a land line.
3. 650ms RTT: user accessing a web server over a shared VSAT link.

Web pages of online services such as Google's image search (image.google.com) or photo albums (flickr.com) consist of ~8-20 embedded objects. Based on these trends, the sample web page used in the emulations comprises an index.html with 10 embedded objects. Also, we consider the simple case where all embedded objects are of the same size, 5KB (Section IIC). Section IVC discusses the impact of varying object sizes.

FreeBSD TCP implements RFC3390, and the default initial cwnd is 4MSS [8]. The recommended initial cwnd in SCTP is 4MSS as well. FreeBSD TCP implements packet counting, while SCTP implements Appropriate Byte Counting (ABC) with L=1 [RFC4960, RFC3465]. Additionally, FreeBSD TCP implements the Limited Transmit algorithm [RFC3042], which enhances loss recoveries for flows with small cwnds. Both transports implement SACKs and delayed acks.

The FreeBSD TCP implementation tracks numerous sender and receiver related statistics, such as the number of timeout recoveries, fast retransmits, etc. After each TCP run, some of these statistics were gathered either directly from the TCP stack or using the netstat utility.

Figure 1: Experiment Setup

IV. RESULTS

The HTTP page download time is measured as "Page Download End Time" − "Page Download Start Time" (Section IIB2). Figure 2 shows the HTTP page download times over a single multistreamed SCTP association (a.k.a. SCTP) vs. N TCP connections (N=1, 2, 4, 6, 8, 10; a.k.a. N-TCP). Note that each embedded object is transmitted on a different TCP connection in 10-TCP, and employing more TCP connections is unnecessary. The values in Figure 2 are

averaged over 40 runs (up to 60 runs for the 10% loss case), and plotted with 95% confidence intervals.

A. During No Congestion

Evaluations with 0% loss (Figure 2) help understand the behavior of multiple TCPs in the absence of congestion. As mentioned earlier, the initial cwnds of both TCP and SCTP are similar: 4MSS. Since there are no losses, both transports employ slow start during the entire page download. This equivalent behavior results in similar throughputs between SCTP and 1-TCP at 64Kbps and 128Kbps bandwidths. In [13], we discovered that a FreeBSD 6.1 TCP receiver transmits extra acks in the form of window updates, which causes the packet-counting TCP sender to grow its cwnd more aggressively than SCTP. As the available bandwidth increases (256Kbps, 1Mbps), this difference in cwnd growth facilitates 1-TCP to slightly outperform SCTP.

Recall from Section IIA that N-TCP's aggressive sending rate can increase an application's throughput by up to N times during slow start. Therefore, as the number of TCP senders increases, we expected multiple TCPs to outperform both 1-TCP and SCTP. Surprisingly, the results indicate that multiple TCPs perform similar to 1-TCP at 1Mbps and 256Kbps bandwidths. As bandwidth decreases, multiple TCPs perform similar to or worse (!) than both 1-TCP and SCTP. Further investigations revealed the following reasons.

1) Throughput Limited by Bottleneck Bandwidth

Low bandwidth pipes can transmit only a few packets per second. For example, a 64Kbps bottleneck cannot transmit more than ~5.3 1500-byte PDUs per second, or roughly 1 PDU per 200ms RTT. A single TCP sender's initial cwnd allows the server to transmit 4MSS bytes of pipelined responses back-to-back, causing a low bandwidth pipe (64Kbps, 128Kbps, and 256Kbps) to be fully utilized during the entire RTT. More data transmitted during this RTT cannot be forwarded, and gets queued at the bottleneck router. Therefore, data transmitted by N≥2 TCP senders does not contribute to reducing page download times, and N-TCPs perform similar to 1-TCP at 64Kbps (N=10), 128Kbps (N=8, 10), and 256Kbps (N>2) bandwidths. The 1Mbps bottleneck is completely utilized by the initial cwnd of N=4 TCP senders (~16 1500-byte PDUs per RTT). Therefore, 2≤N≤4 TCP senders slightly improve page download times when compared to 1-TCP, and N>4 TCP senders do not further reduce page download times.

As the propagation delay and RTT increase, the bottleneck router forwards more packets per RTT. For example, the 1Mbps pipe can transmit ~53 PDUs per RTT in the 650ms scenario vs. ~16 PDUs per RTT in the 200ms scenario. Consequently, more TCP senders help fully utilize the 1Mbps pipe at 650ms RTT, and N-TCPs decrease page download times (Figure 10d in Appendix). However, similar to the 200ms RTT scenario, lower bandwidths limit HTTP throughput, and N-TCPs perform similar to 1-TCP at the 350ms and 650ms RTTs (Figures 8, 10 in Appendix).

To summarize, HTTP throughput improvement over low bandwidth last hops is limited by the available bandwidth. As bandwidth decreases, fewer TCP senders are required to fully utilize the available bandwidth.

2) Queuing Delay at the Bottleneck

Figure 3 shows the mean number of timeout expirations on data at the server for all 4 bandwidth scenarios. Note that the values plotted are the mean timeouts per HTTP transfer. When N>1 TCP senders are employed for the HTTP transfer, the plotted values denote the sum of timeouts across all N senders. We first focus on the values at 0% loss. Surprisingly, except at 1Mbps, some TCP sender(s) in the other bandwidth scenarios do undergo timeout recoveries. Since no packets were lost, these timeouts are spurious, due to the following.

During connection establishment, a FreeBSD TCP sender estimates the RTT and calculates the retransmission timeout value (RTO) [8, RFC2988]. For a 200ms RTT, the calculated RTO equals the recommended minimum of 1 second [RFC2988]. Connection establishment is soon followed by data transfer from the server. Lower bandwidth translates to higher transmission and queuing delays. In a 64Kbps pipe, the transmission of one 1500-byte PDU takes ~186ms, and a queue of ~5 such PDUs gradually increases the queuing delay and the RTT to more than 1 second. When outstanding data remains unacknowledged for more than the 1-second RTO, the TCP sender(s) (wrongly) assume data loss, and spuriously timeout and retransmit unacknowledged data.

As the number of TCP senders increases, more packets arrive at the bottleneck, and the increased queuing delay triggers spurious timeouts at higher numbers of TCP senders. Note that the 1Mbps transfers do not suffer from spurious timeouts. As the bottleneck bandwidth decreases, queuing delay increases. Therefore, HTTP transfers over smaller bandwidths experience more spurious timeouts.

A spurious timeout is followed by unnecessary retransmissions and cwnd reduction. If the TCP sender has more data pending transmission, spurious timeouts delay new data transmission and increase page download times (N=2, 4, 6, 8 TCP at 64Kbps, and N=4, 6 TCP at 128Kbps). As the number of TCP connections increases, fewer HTTP responses are transmitted per connection. For example, each HTTP response is transmitted on a different connection in 10-TCP. Though the number of spurious timeouts (and unnecessary retransmissions) is highest in 10-TCP, the TCP receiver delivers the first copy of data to the HTTP client, and discards the spuriously retransmitted copies. Therefore, 10-TCP's page download times are unaffected by the spurious timeouts. Nonetheless, spurious timeouts cause wasteful retransmissions that compete with other flows for the already scarce available bandwidth.
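The queuing-delay arithmetic of Section IVA2 can be checked directly. A minimal sketch, using the numbers from that section (1500-byte PDUs, 64Kbps last hop, 200ms base RTT); the helper names are ours:

```python
# Estimate how bottleneck queuing inflates RTT past TCP's minimum RTO.
# Numbers follow Section IVA2: 1500-byte PDUs, 64Kbps last hop, 200ms base RTT.

def transmission_delay(pdu_bytes: int, bandwidth_bps: int) -> float:
    """Seconds needed to serialize one PDU onto the link."""
    return pdu_bytes * 8 / bandwidth_bps

def inflated_rtt(base_rtt: float, queued_pdus: int,
                 pdu_bytes: int, bandwidth_bps: int) -> float:
    """Base RTT plus the queuing delay seen behind `queued_pdus` packets."""
    return base_rtt + queued_pdus * transmission_delay(pdu_bytes, bandwidth_bps)

MIN_RTO = 1.0  # RFC 2988 recommended minimum, in seconds

tx = transmission_delay(1500, 64_000)       # 0.1875s per PDU (~186ms in the text)
rtt = inflated_rtt(0.200, 5, 1500, 64_000)  # ~1.14s behind 5 queued PDUs
print(f"per-PDU transmission delay: {tx * 1000:.0f}ms")
print(f"RTT behind 5 queued PDUs: {rtt:.2f}s -> exceeds min RTO: {rtt > MIN_RTO}")
```

A queue of only five full-sized packets is thus enough to push the measured RTT past the 1-second minimum RTO, which is exactly the spurious-timeout condition described above.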

Figure 2: HTTP Throughput (5K Objects; 4MSS Initial Cwnd). Panels: (a) 64Kbps.200ms, (b) 128Kbps.200ms, (c) 256Kbps.200ms, (d) 1Mbps.200ms.

Figure 3: RTO Expirations at Server (5K Objects; 4MSS Initial Cwnd). Panels: (a) 64Kbps.200ms, (b) 128Kbps.200ms, (c) 256Kbps.200ms, (d) 1Mbps.200ms.
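The RFC 2988 RTO computation referenced throughout this section can be sketched as follows (the class is our illustration; the clock-granularity term of the RFC is omitted for brevity):

```python
# RFC 2988 retransmission-timeout estimation (alpha=1/8, beta=1/4, K=4).
class RtoEstimator:
    MIN_RTO, K, ALPHA, BETA = 1.0, 4, 1 / 8, 1 / 4

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0  # initial RTO before any RTT sample (RFC 2988)

    def sample(self, r: float) -> float:
        if self.srtt is None:          # first RTT measurement
            self.srtt, self.rttvar = r, r / 2
        else:                          # subsequent measurements
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.MIN_RTO, self.srtt + self.K * self.rttvar)
        return self.rto

est = RtoEstimator()
print(est.sample(0.200))  # 200ms RTT -> raw 0.6s, clamped to the 1s minimum
```

For a 200ms RTT the raw estimate (SRTT + 4·RTTVAR = 0.6s) is clamped to the 1-second minimum, matching the text's observation that the calculated RTO equals the recommended minimum.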

As the propagation delay increases, the RTO calculated during connection establishment increases (> 1 second). Since transmission and queuing delays remain unaffected, they impact the RTT less at higher propagation delays. Consequently, spurious timeouts slightly decrease at 350ms and 650ms RTTs, but still remain significant at lower bandwidths (Figures 9, 11), and increase page download times (Figures 8, 10 in Appendix).

To summarize, the aggressive sending rate of multiple TCP senders during slow start does NOT necessarily translate to improved HTTP throughput in low bandwidth last hops. Bursty data transmission from multiple TCP senders increases queuing delay, causing spurious timeouts. The unnecessary retransmissions following spurious timeouts (i) compete for the already scarce available bandwidth, and (ii) adversely impact HTTP throughput when compared to 1-TCP or SCTP. The throughput degradation is more noticeable as the bottleneck bandwidth decreases.

B. During Congestion

Though SCTP and TCP congestion control are similar, minor differences such as SCTP's byte counting and more accurate gap-ack information enable better loss recovery and increased throughput in SCTP [1, 13]. As the loss rate increases, SCTP's better congestion control offsets FreeBSD TCP's extra ack advantage during no losses (Section IVA), and SCTP outperforms 1-TCP.

Recall from Section IIA that N-TCPs' (N>1) aggressive sending rate during congestion avoidance can, in theory, increase throughput by more than N times. Therefore, we expected multiple TCPs to outperform both 1-TCP and SCTP. On the contrary, multiple TCP connections worsen HTTP page download times, and the degradation becomes more pronounced as the loss rate increases. This observation is true for all 4 bandwidth scenarios studied. Further investigations revealed the following reasons.

1) Increased Number of Timeout Recoveries at the Server

For every loss rate, the mean number of timeout expirations at the server increases as the number of TCP senders increases (Figure 3). Section IVA2 discussed how increased queuing delays cause spurious timeouts even at 0% loss. Such spurious timeouts, observed during lossy conditions as well, delay new data transmission, thus worsening HTTP page download times.

Recall that the 1Mbps transfers did not suffer spurious timeouts (0% loss in Figure 3d). However, multiple TCPs still amplify timeout expirations in 1Mbps transfers. Further investigation revealed that multiple TCPs reduce ack information, which is crucial for fast retransmit-based loss recoveries.

Figure 4 shows the average number of bytes retransmitted during TCP SACK recovery episodes (fast recovery) in the 64Kbps and 1Mbps transfers, respectively. (Results for other bandwidths were similar and hence not shown.) Each value represents retransmissions from the server to the client, and does not include retransmissions after timeout expirations. Similar to the values in Figure 3, each value in Figure 4 represents the average bytes retransmitted per HTTP transfer, i.e., the total bytes retransmitted by all TCP senders.

During 0% loss, data is always received in order at the client. The acks from client to server contain no SACK blocks, and the server does not undergo SACK recoveries (Figure 4). During loss, data received out-of-order at the client triggers dupacks containing SACK blocks. On receiving 3 dupacks, a TCP sender enters SACK recovery and fast retransmits missing data [8]. Higher loss rates trigger more SACK recovery episodes, and increase retransmissions during SACK recoveries (Figure 4). However, for a given loss rate, the retransmissions decrease as the number of TCP connections increases. That is, for the same fraction of lost HTTP data (same loss rate), loss recoveries based on fast retransmits decrease as the number of TCP senders increases.

Note that loss recovery based on fast retransmit relies on dupack information from the client. As the number of TCP connections increases, the data transmitted per connection decreases, thus reducing the number of potential dupacks arriving at each TCP sender. Ack losses on the reverse path further decrease the number of dupacks received. While the TCP senders implement Limited Transmit [RFC3042] to increase dupack information, the applicability of Limited Transmit diminishes as the amount of data transmitted per TCP connection decreases.

In summary, increasing the number of TCP connections decreases per-connection dupack information. This reduces the chances of fast retransmit-based loss recovery, forcing each sender to perform more timeout-based loss recovery.

2) Increased Connection Establishment Latency

The in-house HTTP client, which closely resembles Firefox's transaction model, first opens a single TCP connection to the server, and retrieves and parses index.html. Then, the client establishes more TCP connection(s) for requesting embedded objects in a pipelined fashion. Note that HTTP requests can be transmitted over these connections only after successful connection establishment, i.e., only when the TCP client has successfully sent a SYN and received a SYN-ACK. Any delay in connection establishment due to SYN or SYN-ACK loss delays HTTP request (and response) transmission.

Figure 5 shows the average number of SYN or SYN-ACK retransmissions for the 64Kbps and 1Mbps transfers, respectively. (Results for the other bandwidths were similar and hence not shown.) When multiple TCP connections are employed for an HTTP transfer, the number of SYN and SYN-ACK packets increases, and the probability of SYN or SYN-ACK losses increases. Therefore, the number of SYN or SYN-ACK retransmissions tends to increase as the number of TCP connections increases.
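The point that more handshakes mean more chances of a SYN or SYN-ACK loss can be illustrated under the Bernoulli loss model of Section III (the function and the 5% example rate are our illustration):

```python
# Probability that at least one of N connection handshakes loses a SYN or a
# SYN-ACK, assuming independent (Bernoulli) per-packet loss as in Section III.
def p_handshake_loss(n_conns: int, loss_rate: float) -> float:
    packets = 2 * n_conns  # one SYN plus one SYN-ACK per connection
    return 1 - (1 - loss_rate) ** packets

for n in (1, 2, 4, 10):  # N-TCP configurations used in the emulations
    print(f"N={n:2d}: P(some handshake packet lost) = {p_handshake_loss(n, 0.05):.3f}")
```

At a 5% loss rate, a single handshake loses a packet with probability ~0.10, while with 10 connections the chance that at least one handshake is delayed rises above 0.64, and each such loss costs at least one initial RTO (discussed next).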

Figure 4: Fast Retransmits during SACK Recovery (5K Objects; 4MSS Initial Cwnd). Panels: (a) 64Kbps.200ms, (b) 1Mbps.200ms.

Figure 5: SYN or SYN-ACK Retransmissions (5K Objects; 4MSS Initial Cwnd). Panels: (a) 64Kbps.200ms, (b) 1Mbps.200ms.
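The dupack shortage argued in Section IVB1 can be made concrete with a small sketch. This is our simplification: one object per connection, a 1460-byte MSS, and no Limited Transmit or delayed acks, none of which is claimed by the paper's model:

```python
import math

# Can a loss still be recovered by fast retransmit (3 dupacks) when each
# object travels on its own connection?  Simplified: 1460-byte MSS, no
# Limited Transmit, no delayed acks.
DUPACK_THRESHOLD = 3

def can_fast_retransmit(object_bytes: int, lost_segment: int, mss: int = 1460) -> bool:
    segments = math.ceil(object_bytes / mss)
    dupacks = segments - lost_segment  # each segment after the loss yields one dupack
    return dupacks >= DUPACK_THRESHOLD

# 5KB object = 4 segments: only a loss of the FIRST segment leaves 3 later ones.
print([can_fast_retransmit(5 * 1024, i) for i in range(1, 5)])
# -> [True, False, False, False]

# 10KB object = 8 segments: losses of segments 1-5 remain recoverable.
print([can_fast_retransmit(10 * 1024, i) for i in range(1, 9)])
# -> first five True, last three False
```

Under these assumptions, most losses in a 5KB-per-connection transfer cannot generate enough dupacks for fast retransmit and must wait for a timeout, which also foreshadows the object-size effect examined in Section IVC.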
A SYN or SYN-ACK loss can be recovered only after the recommended initial RTO value of 3 seconds [RFC2988], and increases the HTTP page download time by at least 3 seconds. Consequently, losses during connection establishment degrade HTTP throughput more when the time taken to download the HTTP responses (after connection establishment) is small compared to the initial RTO value.

C. Impact of Varying Object Sizes

To investigate how object size impacts HTTP throughput, we repeated the emulations with larger (10K) embedded objects. The results are shown in Figure 6. Comparing Figures 2 and 6, we see that the trends between 1-TCP and multiple TCPs remain similar between the 5K and 10K transfers for all bandwidth scenarios except 1Mbps.

At 1Mbps, N-TCPs perform better than 1-TCP, and the improvement is more pronounced at higher loss rates. Figure 7 shows the server's mean timeout recoveries for the 10K transfers in the 1Mbps scenario. Comparing values in Figure 7 with Figure 3d, we see that the 10K transfers suffered fewer timeout recoveries per transfer time unit than the 5K transfers. In the 10K transfers, each TCP sender transfers more data and receives more dupacks per TCP connection than in the 5K transfers (Section IVB2). The increased flow of acks in the 10K transfers triggered more retransmissions in SACK recovery episodes, and reduced timeout-based recoveries compared to the 5K transfers (Figure 7). Consequently, N-TCPs improved HTTP throughput in the 10K transfers. However, as the last-hop bandwidth decreases, the negative consequences of multiple TCP senders, such as increased queuing delay and connection establishment latency, increase the page download times, and N-TCPs perform similar to or worse than 1-TCP. More importantly, note that SCTP's enhanced loss recovery helps it outperform N-TCPs even in the 10K transfers.

To summarize, object size affects HTTP throughput over multiple TCP connections. Smaller objects reduce dupack information per TCP connection and degrade HTTP throughput more than bigger objects. However, at lower bandwidths the impact of object size decreases, and the negative consequences of multiple TCP senders dominate and bring down HTTP throughput.
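The pipe-capacity reasoning of Section IVA1, and the initial-cwnd burst it interacts with, can be checked with a rough bandwidth-delay-product calculation (helper names are ours; header overheads are ignored, so the figures come out close to, not exactly at, the ~16 and ~53 PDUs/RTT quoted there):

```python
# How many 1500-byte PDUs fit in the pipe per RTT, and how many of the
# packets burst by N senders' initial cwnds end up queued at the bottleneck.
def pdus_per_rtt(bandwidth_bps: float, rtt_s: float, pdu_bytes: int = 1500) -> float:
    return bandwidth_bps * rtt_s / (pdu_bytes * 8)

def queued_pdus(n_senders: int, init_cwnd_segments: int,
                bandwidth_bps: float, rtt_s: float) -> float:
    burst = n_senders * init_cwnd_segments  # back-to-back initial-cwnd burst
    return max(0.0, burst - pdus_per_rtt(bandwidth_bps, rtt_s))

print(f"1Mbps @ 200ms: {pdus_per_rtt(1_000_000, 0.200):.1f} PDUs/RTT")
print(f"1Mbps @ 650ms: {pdus_per_rtt(1_000_000, 0.650):.1f} PDUs/RTT")
print(f"10 senders, 4MSS cwnd, 64Kbps @ 200ms: {queued_pdus(10, 4, 64_000, 0.200):.1f} queued")
print(f"10 senders, 2MSS cwnd, 64Kbps @ 200ms: {queued_pdus(10, 2, 64_000, 0.200):.1f} queued")
```

The comparison of the last two lines, a deep queue with 4MSS initial cwnds versus roughly half that with 2MSS, is the intuition behind the smaller-initial-cwnd experiment reported next in Section IVD.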

(a): 64Kbps.200ms Figure 7: RTO Expirations at Server (10K Objects; 4MSS Initial Cwnd)

D. Impact of a Smaller Initial Congestion Window


Section IVA discussed how data transmitted by multiple
TCP senders increased the bottleneck queue size and the
queuing delay. Each TCP sender’s initial cwnd was 4MSS
bytes in these transfers. When each TCP sender employs a
smaller initial cwnd, fewer packets are queued at the
bottleneck. Therefore, multiple TCP senders, each with a
smaller initial cwnd, can possibly reduce queuing delays and
spurious timeouts, and improve HTTP throughput compared
to 1-TCP. To investigate the same, we repeated the 5K
transfers with RFC3390 turned off, i.e., initial cwnd=2MSS.
Apart from values for 0% loss, a smaller initial cwnd did
(b): 128Kbps.200ms not significantly change the trend between 1-TCP and
multiple TCPs (Figure 12 in Appendix). Similar to the 4MSS
transfers, N-TCPs perform similar or worse, but never better
than 1-TCP. Further investigations revealed two opposing
forces in the interactions between N-TCPs with a smaller
initial cwnd and HTTP throughput. N-TCPs with 2MSS
initial cwnd suffered fewer spurious timeouts when compared
to N-TCPs with 4MSS initial cwnd (Figure 13). This factor
slightly improved HTTP throughput over N-TCPs (N=2, 4, 6)
at 0% loss in 64Kbps and 128Kbps bandwidth scenarios. On
the downside, a lower initial cwnd increased the number of
RTTs required to complete the transfer ─ the page download
time.
Longer transfer time also implies more new data pending
(c): 256Kbps.200ms
transmission after spurious timeouts (Section IIIA2), which
increases page download time for N=8,10 TCPs in 64Kbps
and 128Kbps at 0% loss. For other loss rates, the two
opposing factors result in similar throughputs between 2MSS
and 4MSS scenarios, even in the 350ms and 650ms
propagation delays. Therefore, we conclude that in Internet
paths with a low bandwidth last hop, a smaller initial cwnd
does not help multiple TCP senders to perform better.
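The RTT cost of a smaller initial cwnd can be illustrated with a minimal slow-start model (our own sketch, assuming an idealized loss-free transfer in which cwnd doubles every RTT and ignoring delayed acks):

```python
def rtts_to_send(obj_bytes: int, init_cwnd_bytes: int) -> int:
    """Round trips to deliver obj_bytes under idealized slow start:
    the sender transmits a full cwnd each RTT and doubles cwnd,
    with no losses and no delayed acks."""
    sent, cwnd, rtts = 0, init_cwnd_bytes, 0
    while sent < obj_bytes:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

MSS = 1460
print(rtts_to_send(5_000, 2 * MSS))  # → 2 RTTs with a 2MSS initial window
print(rtts_to_send(5_000, 4 * MSS))  # → 1 RTT with a 4MSS initial window
```

Under these assumptions, a 5K object costs one extra round trip with a 2MSS initial window, consistent with the longer page download times noted above.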

V. CONCLUSIONS & FUTURE WORK


In this work we compared HTTP throughput over the two
approaches that alleviate HOL blocking − multiple persistent
TCP connections and a single multistreamed SCTP
Figure 6: HTTP Throughput (10K Objects; 4MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 200ms delay.
association. The emulation results unexpectedly show that

multiple TCP connections do not always improve HTTP
performance, especially over the low bandwidth last hops found
in the developing world. Negative factors such as spurious
timeouts due to increased queuing delay, longer loss recovery
durations due to reduced ack information, and increased
connection establishment latency actually degrade HTTP
performance over multiple TCP connections. SCTP's
enhanced loss recovery enables similar or better throughput
when compared to multiple TCPs. Also, SCTP
multistreaming eliminates HOL blocking.
Finally, note that when multiple TCP connections are
employed for an HTTP transfer, each TCP connection
increases the processing and resource overhead at the web
server/proxy. This resource overhead is higher than the
overhead associated with an SCTP stream. Also, the
difference in resource requirements increases with the
number of clients, and can be significant at a server handling
thousands of clients. Using optimized TCP and SCTP
implementations, we plan to measure this difference in
resource requirements, and to analyze how much a
multistreamed SCTP association reduces the resources needed
to provide concurrent HTTP transfers compared to multiple TCPs.

REFERENCES
[1] R. Alamgir, M. Atiquzzaman, W. Ivancic, "Effect of Congestion Control on the Performance of TCP and SCTP over Satellite Networks," NASA Earth Science Technology Conference, Pasadena, CA, June 2002.
[2] L. Andrew, C. Marcondes, S. Floyd, L. Dunn, R. Guillier, W. Gang, L. Eggert, S. Ha, I. Rhee, "Towards a Common TCP Evaluation Suite," PFLDnet 2008, March 2008.
[3] M. Arlitt, C. Williamson, "Internet web servers: Workload characterization and performance implications," IEEE/ACM Transactions on Networking, 5(5), pp. 631-645, October 1997.
[4] J. Baggaley, B. Batpurev, J. Klaas, "Technical Evaluation Report 61: The World-Wide Inaccessible Web, Part 2: Internet routes," International Review of Research in Open and Distance Learning, 8(2), 2007.
[5] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, M. Stemm, R. Katz, "TCP behavior of a busy Internet server: Analysis and Improvements," INFOCOM, San Francisco, March 1998.
[6] L. Cottrell, A. Rehmatullah, J. Williams, A. Khan, "Internet Monitoring and Results for the Digital Divide," International ICFA Workshop on Grid Activities within Large Scale International Collaborations, Sinaia, Romania, October 2006.
[7] B. Du, M. Demmer, E. Brewer, "Analysis of WWW Traffic in Cambodia and Ghana," 15th International Conference on World Wide Web, Edinburgh, Scotland, May 2006.
[8] FreeBSD TCP and SCTP Implementation, October 2007. URL: www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/.
[9] J. Gettys, email to the end2end-interest mailing list, October 2002. URL: www.postel.org/pipermail/end2end-interest/2002-October/002436.html.
[10] G. Houtzager, C. Williamson, "A Packet-Level Simulation Study of Optimal Web Proxy Cache Placement," 11th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS), Orlando, Florida, October 2003.
[11] V. Jacobson, "Congestion avoidance and control," ACM SIGCOMM, Stanford, August 1988.
[12] J. Mahdavi, S. Floyd, "TCP-Friendly Unicast Rate-Based Flow Control," technical note sent to the end2end-interest mailing list, January 1997.
[13] P. Natarajan, P. Amer, R. Stewart, "Multistreamed Web Transport for Developing Regions," ACM SIGCOMM Workshop on Networked Systems for Developing Regions (NSDR), Seattle, August 2008.
[14] P. Natarajan, F. Baker, P. Amer, "Multiple TCP Connections Improve HTTP Throughput - Myth or Fact?," Tech. Report, CIS Dept., Univ. of Delaware, August 2008.
[15] PingER Detail Reports, October 2007. URL: http://www-wanmon.slac.stanford.edu/cgi-wrap/pingtable.pl.
[16] L. Rizzo, "Dummynet: A simple approach to the evaluation of network protocols," ACM Computer Communications Review, 27(1):31-41, January 1997.
[17] H. Sivakumar, S. Bailey, R. Grossman, "PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks," High-Performance Network and Computing Conference, Dallas, Texas, November 2000.
[18] S. Tullimas, T. Nguyen, R. Edgecomb, S. Cheung, "Multimedia streaming using multiple TCP connections," ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP), 4(2), pp. 1-20, 2008.
[19] A. Williams, M. Arlitt, C. Williamson, K. Barker, "Web Workload Characterization: Ten Years Later," book chapter in "Web Content Delivery," Springer, 2005. ISBN: 0-387-24356-9.
[20] C. Williamson, N. Markatchev, "Network-Level Impacts on User-Level Web Performance," International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS), Montreal, Canada, July 2003.
[21] P. Natarajan, J. Iyengar, P. Amer, R. Stewart, "SCTP: An Innovative Transport Protocol for the WWW," 15th International Conference on WWW (WWW2006), Edinburgh, May 2006.
[22] B. Ford, "Structured Streams: A New Transport Abstraction," ACM SIGCOMM 2007, Kyoto, Japan, August 2007.
[23] Movies Comparing HTTP Downloads over SCTP Streams vs. TCP. URL: http://www.cis.udel.edu/~amer/PEL/leighton.movies/index.html.

APPENDIX

Figure 8: HTTP Throughput (5K Objects; 4MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 350ms delay.
Figure 9: RTO Expirations at Server (5K Objects; 4MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 350ms delay.

Figure 10: HTTP Throughput (5K Objects; 4MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 650ms delay.
Figure 11: RTO Expirations at Server (5K Objects; 4MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 650ms delay.

Figure 12: HTTP Throughput (5K Objects; 2MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 200ms delay.
Figure 13: RTO Expirations at Server (5K Objects; 2MSS Initial Cwnd). Panels (a)-(d): 64Kbps, 128Kbps, 256Kbps, 1Mbps last hop, 200ms delay.
