Understanding Data Center Traffic Characteristics
Ashok Anand (University of Wisconsin–Madison) and coauthors
the average link loads are high in the core of data center switching architectures and fall progressively as one moves toward the edge. In contrast, average link losses are higher at the edge and lowest at the core (Section 4).

We then analyze finer-grained packet traces collected from a small number of edge switches in a data center to conduct a microscopic study of traffic behavior in data centers. We find that traffic at the data center edge can be characterized by ON-OFF patterns, where the ON and OFF periods, and the packet interarrival times within an ON period, follow lognormal distributions (Section 4).

Finally, we develop a strawman framework for reverse engineering fine-grained characteristics of data center traffic from coarse-grained information (i.e., SNMP counters), coupled with high-level inferences drawn from finer-grained information collected on a small fraction of network links (i.e., the fact that traffic is ON-OFF in nature). This framework is particularly useful for simulating the prevalent traffic characteristics in a data center even when it is not possible to collect fine-grained data everywhere. Our framework takes as input the coarse-grained loss rate and traffic volume distribution at a network link, and outputs parameters for the traffic arrival process on the link that best explain the coarse-grained information. The framework combines ideas from search space exploration with statistical testing to quickly examine the space of possible ON-OFF traffic source patterns and obtain those that best explain the coarse-grained information.

Our framework has two applications. (1) It can be used to optimize traffic engineering and load balancing mechanisms. To illustrate this, we apply the framework to SNMP data from a data center network. We leverage the insights derived from the framework to infer that packet losses on links close to the edge of the data center occur in bursts. We further note that the bursts are too short-lived for traditional traffic engineering approaches to react to them. Thus, a fine-grained traffic engineering approach may have to be adopted by the data center in question. (2) It can be used to design workload generators for data centers. We are currently developing such a data center network traffic generator based on our empirical insights.

2. RELATED WORK
Traffic analysis in ISP backbones faces challenges similar to those in data centers, in terms of the significant investment in infrastructure required to capture fine-grained information. As with data centers today, ISP backbones collect coarse-grained information using SNMP. In the ISP case, this granularity proves sufficient for most management tasks, such as traffic engineering [3]. However, as prior work [2] has shown, such coarse-grained information is insufficient for traffic engineering needs in data centers. We develop an algorithm for deriving fine-grained information from coarse-grained SNMP data, which can then be used in traffic engineering.

Numerous studies [5, 7] have been performed on modeling wide-area and Ethernet traffic. Such models have informed various approaches to traffic engineering, anomaly detection, provisioning, and synthetic workload generation. However, no similar models exist for the nature of traffic in data centers. In that respect, we take the first steps toward an extensive study and modeling of traffic behavior in data centers.

Finally, recent research on data centers has utilized toy traffic models or WAN-based models for evaluation (e.g., constant traffic in [2]). Our observations can be used to create more realistic workloads to evaluate current and future proposals for data center design and management.

3. DATA SETS
In this section, we describe the datasets used in our study. We collected two sets of measurement data. The first data set comprises SNMP data extracted from 19 corporate and enterprise data centers hosting either intranet and extranet server farms or Internet server farms. These data centers support a wide range of applications, such as search, video streaming, instant messaging, map-reduce, and web applications. SNMP data provides aggregate traffic statistics for every network device (switch or router) in five-minute increments; it is typically collected because the storage overhead of continually collecting detailed packet traces from all devices would be infeasible. The second data set comprises packet traces from five switches in one of the data centers.

Table 1 summarizes device information for the first data set, illustrating the size of each data center and the fraction of devices belonging to each layer. All of these data centers follow a tiered architecture, in which network devices are organized into two or three layers. The highest and lowest layers are called the core and the edge layers. Between the two, there may be an aggregation layer when the number of devices is large. The first eight data centers have two layers due to their limited size. The remaining eleven data centers have all three layers, with the number of devices ranging from tens to several hundred. Note that, due to security concerns, we do not include absolute values.

Tens of gigabytes of SNMP data were collected from all of the devices in the 19 data centers over a 10-day period. The SNMP data is collected and stored by a measurement server that polls each device using RRDTool [6]. RRDTool is configured to poll a device every 5 minutes and record 7 data points for each active interface. These data points include: inoctets (the number of bytes received), outoctets (the number of bytes sent), indiscards (the number of input packets discarded), inerrors (the number of input error packets), outerrors (the number of output error packets), and outdiscards (the number of output packets discarded). The network devices currently run version 2 of the MIB protocol [8], wherein packets can be discarded if an interface runs out of resources, such as memory for queuing. Packet discards usually indicate congestion.

The second dataset contains packet traces from five switches in one of the data centers included in the first dataset. The traces were collected by attaching a dedicated packet sniffer to a SPAN port on each of the switches. The sniffer ran WinDump, which is able to record packets at a granularity of 10 milliseconds. The packet traces were collected over a 15-day period and provide fine-grained traffic information such as packet inter-arrival times.

4. EMPIRICAL STUDY
In this section, we identify the most fundamental properties of data center traffic. We first examine the SNMP data to study the link utilization and packet loss of core, edge, and aggregation devices. We then characterize the tempo-
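The six per-interface counters described in Section 3 can be reduced to the utilization and discard statistics examined here with a short script. The sketch below is illustrative only: the field names, the dictionary input format, and the 1 Gbps link speed are our assumptions, not the paper's tooling.

```python
# Illustrative sketch: turn one 5-minute SNMP counter delta into link
# utilization and discard totals. Field names and the link speed are
# assumptions; RRDTool's real output format differs.

LINK_SPEED_BPS = 1_000_000_000  # assumed 1 Gbps link
POLL_SECS = 300                 # counters are polled every 5 minutes

def interval_stats(delta):
    """delta: increase of each counter over one polling interval."""
    rx_util = delta["inoctets"] * 8 / (LINK_SPEED_BPS * POLL_SECS)
    tx_util = delta["outoctets"] * 8 / (LINK_SPEED_BPS * POLL_SECS)
    discards = delta["indiscards"] + delta["outdiscards"]
    errors = delta["inerrors"] + delta["outerrors"]
    return {"rx_util": rx_util, "tx_util": tx_util,
            "discards": discards, "errors": errors}

stats = interval_stats({"inoctets": 1.2e9, "outoctets": 9.0e8,
                        "indiscards": 40, "outdiscards": 0,
                        "inerrors": 2, "outerrors": 0})
```

A non-zero discard count in an interval is the coarse signal of congestion that the rest of the study builds on.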
Data Center   Frac Core Devices   Frac Aggr Devices   Frac Edge Devices
DC1           0.000               0.000               1.000
DC2           0.667               0.000               0.333
DC3           0.500               0.000               0.500
DC4           0.500               0.000               0.500
DC5           0.500               0.000               0.500
DC6           0.222               0.000               0.778
DC7           0.200               0.000               0.800
DC8           0.200               0.000               0.800
DC9           0.000               0.077               0.923
DC10          0.000               0.043               0.957
DC11          0.038               0.026               0.936
DC12          0.024               0.072               0.904
DC13          0.010               0.168               0.822
DC14          0.031               0.018               0.951
DC15          0.013               0.013               0.973
DC16          0.005               0.089               0.906
DC17          0.016               0.073               0.910
DC18          0.007               0.075               0.918
DC19          0.005               0.026               0.969

Table 1: Information about the devices in the 19 data centers studied. For each data center, we present the fraction of devices in each layer.

Figure 1: A CDF of the percent of time a link is unused during the 10-day interval.

[Figure: CDF curves for Edge, Agg, and Core devices]
[Figure: CDF of packet sizes (in bytes) and of the number of packets received]

We observe a bimodal distribution with peaks around 40B and 1500B, and an average packet size of 850B; we use this average packet size as the scaling factor. The bimodal distribution is likely due to the use of layer-2 Ethernet within the data centers. With the widespread deployment of Ethernet switches in all data centers, we expect similar distributions in other data centers. The real loss rates are likely to differ; however, the comparison of loss rate distributions across the three layers is likely to hold for both real and scaled loss rates.
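Because discards are counted in packets while the octet counters are in bytes, the 850B average packet size serves as the bridge between the two. A minimal sketch of this scaling step (the function and variable names are ours):

```python
# Sketch of the scaling step: convert discarded packets to bytes using
# the observed 850B average packet size, then form a byte-level loss
# rate. The 850B constant comes from the packet-size distribution above.

AVG_PKT_BYTES = 850

def scaled_loss_rate(indiscards, inoctets):
    """Approximate byte loss rate for one interval: bytes dropped over
    bytes offered (received + dropped)."""
    lost_bytes = indiscards * AVG_PKT_BYTES
    return lost_bytes / (lost_bytes + inoctets)

rate = scaled_loss_rate(indiscards=100, inoctets=8_500_000)  # ~0.99% loss
```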
Figure 6: CDF of the distribution of the inter-arrival times of packets at one of the switches in DC 10. The figure contains the best-fit curves for lognormal, Weibull, Pareto, and exponential, as well as the least mean error for each curve (wbl: 0.013792, logn: 0.011119, exp: 0.059716, pareto: 0.027664). We notice that the lognormal fit produces the least error.

Figure 8: CDF of the distribution of the ON-period lengths at one of the switches in DC 10. The figure contains the best-fit curves for lognormal, Weibull, Pareto, and exponential, as well as the least mean error for each curve (wbl: 0.016516, logn: 0.016093, exp: 0.01695, pareto: 0.03225). We notice that the lognormal fit produces the least error.
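The best-fit comparisons reported in these captions can be reproduced with a standard maximum-likelihood fit. The sketch below uses only the Python standard library and synthetic data in place of the measured inter-arrival times; for a lognormal with zero location, the MLE of (µ, σ) is simply the mean and standard deviation of the log-samples.

```python
# Sketch: maximum-likelihood lognormal fit, stdlib only. Synthetic data
# stands in for the measured packet inter-arrival times.
import math
import random
import statistics

def fit_lognormal(samples):
    """Return (mu, sigma) of the best-fit lognormal (location fixed at 0)."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)

random.seed(7)
data = [random.lognormvariate(2.0, 0.5) for _ in range(20000)]
mu, sigma = fit_lognormal(data)  # recovers roughly (2.0, 0.5)
```

A goodness-of-fit comparison against Weibull, Pareto, and exponential candidates, as in the captions, would fit each family in the same way and compare the mean error between the empirical and fitted CDFs.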
within ON periods at one of the switches. We bin the inter-arrival times according to the clock granularity of 10 us. Clearly, the distribution has a positive skew and a long tail. Using Matlab, we attempt to fit the observed distribution with a number of well-known curves, such as lognormal, exponential, and Pareto, and find that the lognormal curve produces the best fit with the least mean error. Figure 8 shows the distribution of the durations of ON periods. Similar to the inter-arrival time distribution, the ON-period distribution also exhibits a positive skew and fits well with a lognormal curve. The same observation applies to the OFF-period distribution, as shown in Figure 7.

To summarize, we find that traffic at the five edge switches exhibits an ON/OFF pattern. The durations of the ON/OFF periods and the packet inter-arrival times within ON periods all follow lognormal distributions. In the next section, we present our techniques for finding the appropriate lognormal random processes that can generate traffic under given volume and loss rate conditions.

Figure 7: CDF of the distribution of the OFF-period lengths at one of the switches in DC 10. The figure contains the best-fit curves for lognormal, Weibull, Pareto, and exponential, as well as the least mean error for each curve (wbl: 0.090269, logn: 0.081732, exp: 0.11594, pareto: 0.66908). We notice that the lognormal fit produces the least error.

5. GENERATING FINE-GRAINED OBSERVATIONS FROM COARSE-GRAINED DATA
It is difficult for operators to instrument packet sniffing and NetFlow on all the devices in a data center. However, access to fine-grained data provides insight into traffic characteristics, such as the time scales of congestion, which can then be used to inform traffic engineering and switching fabric design. In this section, we present a framework for generating fine-grained traffic data from readily available coarse-grained data such as SNMP polls. The framework utilizes the observation made earlier in Section 4 that the arrival processes at the edge switches can be explained by 3 lognormal distributions. The framework aims to discover a set of parameters for these distributions such that the traffic data generated by the arrival process explains the coarse-grained characteristics captured by SNMP.

5.1 Parameter Discovery Algorithm
Finding the appropriate parameters for an arrival process that matches the SNMP data is a non-trivial task because it requires searching through a multi-dimensional space, where each dimension represents the possible values of the parameters of one of the three distributions. Since each distribution is described by 2 parameters (the mean and the standard deviation), this results in a 6-dimensional search space.

Exploring all points, or system parameters, in the search space would produce accurate results but is clearly infeasible. In what follows, we describe a new search space exploration algorithm that is both fast and reasonably accurate. Our algorithm is iterative in nature and is similar to simu-
lated annealing: in each iteration, the algorithm explores the search space by examining the neighbors of a candidate point and moving in the direction of the neighbor with the highest "score". Here, "score" is a measure of how well the parameters corresponding to the point in question describe the coarse-grained distributions.

There are four challenges in developing such an algorithm: 1) developing an accurate scoring function for each point; 2) determining a set of terminating conditions; 3) defining a heuristic to avoid getting stuck in local maxima and selecting an appropriate starting point; and 4) defining the neighbors of a point and selecting the next move.

Our framework takes as input the distributions of SNMP-derived volumes (volume_SNMP) and loss rates (lossrate_SNMP) for a given link at the edge of the data center. It returns as output the parameters for the 3 distributions (on_times, off_times, arrival_times) that provide fine-grained descriptions of the traffic on the edge link.

To create distributions of volume and loss, we aggregate several hours' worth of data and assume that the target distributions remain relatively constant during this period.

5.1.1 Scoring Function
An effective scoring function for deciding the utility of a point, and the appropriate direction in which the search should proceed, is not obvious. To score the parameters at a point, we use two techniques: first, a heuristic algorithm that approximates the distributions of loss and volume generated by the parameters corresponding to the point in question; we refer to these as volume_generated and lossrate_generated. Second, we employ a statistical test that scores the parameters based on the similarity of the generated distributions to the input distributions volume_SNMP and lossrate_SNMP.

Obtaining volume_generated and lossrate_generated. We use a simple heuristic to obtain the loss rate and volume distributions generated by the traffic parameters (µ_on, σ_on, µ_off, σ_off, µ_arrival, σ_arrival) corresponding to a given point in the search space. The heuristic relies on the following subroutine to derive a single sample for the volume_generated and lossrate_generated distributions:

// 0. calculate the mean on- and off-period lengths
0. mean_on = exp(µ_on) + σ_on
0. mean_off = exp(µ_off) + σ_off

// 1. determine the total on-time in a 5-minute interval
1. total_on = 300 * (mean_on / (mean_off + mean_on))

// 2. calculate the average number of on-periods
2. NumOnPeriods = total_on / mean_on

// 3. calculate the maximum number of bytes that can be sent
//    during an on-period
3. link_capacity = link_speed * mean_on / 8

// 4. determine how many bytes can be absorbed by buffering
//    during the off-period
4. buf_capacity = min(bits_of_buffering, link_speed * mean_off) / 8

// 5. iterate over the on-periods to calculate the net volume and
//    loss rate observed over the 5-minute interval
5. for i = 0 to NumOnPeriods
   a. a_i <- A  // A is the interarrival time distribution
   b. vol_on = (mean_on / a_i) * pktSize
   c. vol_total += min(vol_on, link_capacity + buf_capacity)
   d. loss_total += max(vol_on - link_capacity - buf_capacity, 0)

The above subroutine determines the loss and volume during a 5-minute interval as the sum of the loss and volume in each individual ON period. Line 0 calculates the average length of an ON period and an OFF period. The volume in an ON period is the sum of the bytes in the packets received, where packets are spaced according to the inter-arrival time distribution (calculated in Line 5.c). The loss in that ON period is the number of bytes received, minus the bytes successfully transmitted during the ON period and the number of bytes buffered. We assume an average packet size of 1KB. In Line 5.b, the inter-arrival time distribution is used to generate a_i: each time, a new value a_i is drawn from the distribution. The distribution A in Line 5.b is a lognormal distribution with parameters (µ_arrival, σ_arrival).

We run the above subroutine several times to obtain multiple samples of vol_total and loss_total. From these samples, we derive the distributions volume_generated and lossrate_generated.

Statistical Test. We use the Wilcoxon similarity test [9] to compare the distribution of computed volumes, volume_generated, against the distribution of empirical volumes, volume_SNMP (and likewise for loss rates). The Wilcoxon test is a non-parametric test that checks whether two distributions are equally distributed around a mean; it returns the probability that this check holds. We use the Wilcoxon test because, unlike the popular t-test or chi-squared test, it makes no assumptions about the distribution of the underlying input data sets. Using a distribution-free statistical test allows the underlying distribution of any of the 3 parameters to change.

We compute the score of a point as the minimum of the two Wilcoxon scores (the confidence measures for similarity) for the volume distribution and the loss distribution. We use the minimum instead of other functions, such as the average, because it forces the search algorithm to favor points with high scores for both distributions.

5.1.2 Starting Point Selection and Local Optimum Avoidance
To avoid getting stuck in a local optimum, we run our search a predefined number of times, N_sub, varying the starting point of each search. Our framework then compares the scores returned by each of the searches and chooses the parameters from the search with the best score. Two key challenges are determining the number of searches to perform and determining the starting point of each search.

To solve both challenges, we partition the search space into N_sub regions and initiate an independent search at a randomly chosen point in each sub-region. Our algorithm performs a parallel search through each of the N_sub regions and returns the parameters from the region with the highest score. The choice of N_sub depends on the trade-off between the time consumed by the search space exploration algorithm (which grows with N_sub) and the accuracy of the outcome (which improves with larger N_sub). In our evaluation, we find that N_sub = 64 offers a good trade-off between speed and accuracy.

5.1.3 Neighbor Selection and Choosing the Next Move
Each point is described by a coordinate in the 6-dimensional space. The natural neighbors of such a point are the points closest to it in the coordinate space. For each point, we evaluate 12 neighbors using the heuristic defined in Section 5.1.1.
These 12 neighbors represent adjacent points along each of the 6 dimensions. Once all 12 neighbors are evaluated, the algorithm chooses the neighbor with the best score, or randomly selects a neighbor if all neighbors have identical scores.
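The mechanics of Section 5.1 can be sketched compactly: the sample-generating subroutine, mirrored step for step, plus the 12-neighbor enumeration. The link speed and buffer size below are placeholder constants and the Wilcoxon scoring is omitted; this is an illustration of the search's moving parts, not the authors' implementation.

```python
# Sketch of the parameter-discovery mechanics: the (volume, loss)
# sampling subroutine of Section 5.1.1 and the neighbor enumeration of
# Section 5.1.3. LINK_SPEED and BUF_BITS are placeholder values.
import math
import random

PKT_SIZE = 1000              # 1 KB average packet, as assumed in 5.1.1
LINK_SPEED = 1_000_000_000   # placeholder: 1 Gbps edge link
BUF_BITS = 8_000_000         # placeholder: 1 MB of buffering

def one_sample(mu_on, sig_on, mu_off, sig_off, mu_arr, sig_arr):
    """One (volume, loss) sample for a 5-minute interval (steps 0-5)."""
    mean_on = math.exp(mu_on) + sig_on                  # step 0
    mean_off = math.exp(mu_off) + sig_off
    total_on = 300 * mean_on / (mean_on + mean_off)     # step 1
    n_on = int(total_on / mean_on)                      # step 2
    link_cap = LINK_SPEED * mean_on / 8                 # step 3
    buf_cap = min(BUF_BITS, LINK_SPEED * mean_off) / 8  # step 4
    vol = loss = 0.0
    for _ in range(n_on):                               # step 5
        a_i = random.lognormvariate(mu_arr, sig_arr)    # 5.a: draw from A
        vol_on = (mean_on / a_i) * PKT_SIZE             # 5.b
        vol += min(vol_on, link_cap + buf_cap)          # 5.c
        loss += max(vol_on - link_cap - buf_cap, 0.0)   # 5.d
    return vol, loss

def neighbors(point, step=0.1):
    """The 12 neighbors of a point: one step up and one step down
    along each of the 6 dimensions."""
    moves = []
    for i in range(6):
        for d in (step, -step):
            q = list(point)
            q[i] += d
            moves.append(tuple(q))
    return moves

random.seed(1)
vol, loss = one_sample(2.0, 0.5, 2.0, 0.5, 1.0, 0.5)
```

Each candidate point would then be scored by comparing the empirical distribution of such samples against the SNMP-derived distributions, and the search would move to the best-scoring of the 12 neighbors.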
We first study how often losses happen in such bursts, or equivalently, how many losses are isolated and not part of a microburst. The results are shown in Table 3. We find that only a small fraction of losses do not belong to any microburst. This indicates that, more often than not, when losses happen at the edge or aggregation links, they happen in bursts.

X    Min loss rate during microburst   Fraction of drops outside microbursts
5    30%                               10%
10   16%                               3%
15   12%                               1.6%

Table 3: Existence of burst losses.

We now study the length of microbursts, defined as the difference in the timestamps of the packet losses at either boundary. In Figure 11 we show a distribution of the lengths of the microburst losses at the aggregation switch for X = 5, 10, 15. We see that an overwhelming fraction of microbursts last less than 10s. Dynamic load balancing and rerouting could help avoid losses due to microbursts. However, these initial results indicate that, to be effective, they must make rerouting/load balancing decisions at the granularity of once every few seconds.

Figure 11: A CDF of the length of the microbursts (in microseconds).

6.2 Traffic generators for data centers
Several recent studies have examined how to design scalable data center topologies. To evaluate the proposed topologies, these studies have relied on experiments that use "toy" traffic workloads, such as mixtures of long- and short-lived flows. It is unclear whether these workloads are representative of network traffic patterns in data centers. Other studies have employed application-specific workloads, such as map-reduce or scientific computing, and used the resulting traffic traces for evaluating data center network designs and management approaches. However, network traffic in data centers is composed of traffic from a variety of other applications with diverse send/receive patterns.

Our empirical observations and the framework we presented in the previous section could help in the design of realistic network traffic generators for data centers. As part of current work, we are developing such a traffic generator. Our generator has several tuneable parameters, such as the size of the data center, the number of servers, and the interconnection topology. All servers generate ON-OFF traffic, where the parameters of the traffic are drawn at random from the traffic models that we have derived from the 19 data centers examined in this study. Rather than requiring users to simulate popular data center applications like map-reduce and scientific computing, our simulator directly generates representative network traffic that results from multiple data center applications running together.

7. CONCLUSION AND FUTURE WORK
In this paper, we studied traffic patterns in data center networks at both macroscopic and microscopic scales. Using network-wide SNMP data collected at 19 data centers, we found that links in the core of data centers are more heavily utilized on average, but those closer to the edge observe higher losses on average. Using a limited set of packet traces collected at a handful of data center switches, we found preliminary evidence of ON-OFF traffic patterns.

A key contribution of our work is a new framework for deriving likely candidates for the parameters that define the sending behavior of data center traffic sources. Interestingly, our approach relies on network-wide coarse-grained data that is easily available and combines it with high-level information derived from fine-grained data collected at a small number of instrumented locations. This general framework can be used to examine how to design traffic engineering mechanisms that ideally suit the prevailing traffic patterns in a data center.

As part of future work, we plan to develop new fine-grained traffic engineering mechanisms that are more re-

8. ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their valuable feedback. This work was supported in part by an NSF CAREER Award (CNS-0746531) and an NSF NeTS FIND Award (CNS-0626889).

9. REFERENCES
[1] The network simulator ns-2. http://www.isi.edu/nsnam/ns/.
[2] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, pages 63–74, 2008.
[3] N. Feamster, J. Borkenhagen, and J. Rexford. Guidelines for interdomain traffic engineering. SIGCOMM Comput. Commun. Rev., 33(5), 2003.
[4] A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Towards a next generation data center architecture: scalability and commoditization. In PRESTO '08: Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow, pages 57–62, New York, NY, USA, 2008. ACM.
[5] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson. On the self-similar nature of ethernet traffic (extended version). IEEE/ACM Trans. Netw., 2(1), 1994.
[6] T. Oetiker. RRDtool: Round robin database tool. http://oss.oetiker.ch/rrdtool/.
[7] V. Paxson and S. Floyd. Wide area traffic: the failure of poisson modeling. IEEE/ACM Trans. Netw., 3(3):226–244, 1995.
[8] Cisco SNMP version 2 MIBs. http://www.cisco.com/public/mibs/.
[9] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1:80–83, 1945.
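To make the microburst measurement of Section 6 concrete, the sketch below groups loss timestamps into bursts and reports each burst's length as the difference between its boundary timestamps. The grouping rule (losses closer together than a gap threshold belong to one burst) and the reading of X as a minimum burst size are our assumptions; the paper does not give its exact procedure.

```python
# Sketch: group loss timestamps into microbursts. The `gap` threshold
# and the min-size reading of X are assumptions; timestamps are
# synthetic, not measured data.

def microburst_lengths(loss_times, gap, min_size):
    """Burst length = difference of the boundary loss timestamps."""
    bursts, current = [], [loss_times[0]]
    for t in loss_times[1:]:
        if t - current[-1] <= gap:
            current.append(t)       # same burst
        else:
            bursts.append(current)  # gap exceeded: close the burst
            current = [t]
    bursts.append(current)
    return [b[-1] - b[0] for b in bursts if len(b) >= min_size]

times = [0.0, 0.1, 0.2, 5.0, 5.1, 5.15, 5.3, 20.0]
lengths = microburst_lengths(times, gap=1.0, min_size=3)  # two bursts
```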