
SAI Intelligent Systems Conference 2016

September 21-22, 2016 | London, UK

A Forecasting Model for Data Center Bandwidth Utilization

Samar Raza Talpur
School of Computer Science
University College Dublin
Belfield, Dublin, Ireland
Email: samar.talpur@ucdconnect.ie

Prof. Tahar Kechadi
School of Computer Science
University College Dublin
Belfield, Dublin, Ireland
Email: tahar.kechadi@ucd.ie

Abstract—Bandwidth optimization and efficient bandwidth utilization are challenging in operating data centers. Our model can assist in the proper use of resources and accommodate large volumes of bursty data. In this paper we propose a forecasting model for data center bandwidth utilization that predicts and estimates bandwidth utilization in real-world situations. Based on self-learning procedures, the proposed forecasting model optimizes the traffic and predicts bandwidth more efficiently. Our approach is based on time series and Vector Autoregression (VAR) models; it optimizes bandwidth traffic by detecting and diagnosing future behavior from historical data.

Keywords—Bandwidth Utilization, Network Forecasting, Time Series, ARIMA Models, Exponential Smoothing, VAR Model.

978-1-5090-1121-6/16/$31.00 ©2016 IEEE

I. INTRODUCTION

While data centers are growing in use and popularity, business communities are increasingly concerned about the speed, reliability and performance of networks. A typical data center uses shared network resources. Network performance is affected when bandwidth consumption increases, and when demand rises above a certain limit of the total capacity, the whole network becomes unstable. Network bandwidth is especially valuable for business use, particularly for the Business-to-Business (B2B) community, which relies heavily on healthy Internet connectivity; it is a prerequisite for all users who are connected with each other and need a good speed of data exchange. They need fast Internet and higher bandwidth, all the more so when using data center resources that involve significant data volumes.

Global network traffic has grown impressively over the last two decades. In 1992 the total Internet traffic was 100 GB per day; a decade later, in 2002, it had jumped to 100 GB per second; in 2014 it reached 16,144 GB per second, and the prediction for 2019 is more than 51,974 GB per second [21], [5], [29].

Motivated by system modeling, the Network Bandwidth Predictor (NBP) [7] used a neural-network-based approach to predict bandwidth usage and network performance. NBP, combined with NWS (Network Weather Service), was developed for observing subsystems and measuring network behavior. GLMM [15] was presented for high-speed network data analysis; it is a new predictive model which analyzes the traffic patterns and conditions of several networks. The simulation-based study claims high accuracy in MSPE (Mean Squared Prediction Error). Moussas et al. [28] also used an ARIMA model for traffic monitoring to predict future traffic utilization, using four different components: (a) campus networks, (b) dial-up connections, (c) Local Area Networks and (d) backbone. The "Network Bandwidth Utilization Forecast Model on High Bandwidth Networks" [38] uses a time series model and claims efficient utilization of network resources. The results showed that it can reduce computational time by 83.2% compared to other traditional approaches. This model can help schedule data movements on high-bandwidth networks. However, if the planned network usage is known in advance, the performance can be improved significantly.

The challenge here is how to build a forecasting model for real-world traffic in data centers. Such a model will help not only to improve the performance of both data centers and networks, but also to manage, plan and upgrade both network and data center resources. None of the above-mentioned models is specifically designed for data centers, nor do they all use real data center network traffic. Our model is entirely different and designed exclusively for data centers; we used genuine traffic for our experiments and applied the same traffic for monitoring and prediction. To improve the accuracy of traffic forecasts in data centers, we developed the Data Center Bandwidth Utilization forecasting model. Statistical models that observe a high volume of real traffic can accurately predict network performance, which is helpful for a network administrator.

With the growth of data centers, a large number of data center service providers have thousands of servers and other equipment installed, doubling this equipment every 18 months in line with the prediction of Moore's law [26]. In terms of running cost, millions of dollars are spent on diversified hardware, complex workloads and thousands of different applications. Despite all this, conventional data centers provide neither proper access to public traces nor real-time monitoring systems for researchers. Linear regression with Granger causality and forecasting with a VAR model (using the EViews tool) are also good options for predicting bandwidth utilization [2], [11].

II. METHODOLOGY

A. Design of Data Center

Among the many data center architectures, we opt for a typical three-tier architecture [24], which contains layer 2 to 4 devices such as core routers, aggregation switches and access switches. The network hierarchies are connected with each other from higher layers to lower layers. The distribution and aggregation layer switches at the higher layers are directly connected with the outside traffic (the Internet), while edge layer switches are connected with servers and end-user computers.
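The layered connectivity described above can be illustrated with a small Python sketch (the paper itself worked in R and Java). The node names and the link table below are hypothetical and are not taken from the paper's topology; the sketch only shows that traffic between a server and the Internet has to traverse the access, aggregation and core layers in turn.

```python
from collections import deque

# Hypothetical three-tier topology; node names are illustrative only.
LINKS = {
    "internet": ["core-router"],
    "core-router": ["internet", "aggregation-switch"],
    "aggregation-switch": ["core-router", "access-switch-1", "access-switch-2"],
    "access-switch-1": ["aggregation-switch", "server-1"],
    "access-switch-2": ["aggregation-switch", "server-2"],
    "server-1": ["access-switch-1"],
    "server-2": ["access-switch-2"],
}


def shortest_path(links, source, target):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Outbound traffic from a server crosses every tier before reaching the Internet.
print(shortest_path(LINKS, "server-1", "internet"))
```

In this toy layout, server-to-server traffic turns around at the aggregation switch, while Internet-bound traffic also crosses the core router.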

We also set up a dedicated database server, which operated constantly (24x7) as a traffic-capturing server for collecting network traffic and network behavior. As shown in Fig. 1, we collected different types of data from both networks, i.e., public and private. The secured networks, treated as private networks, are connected through a firewall with a number of restrictive policies for In-Out traffic. The unrestricted networks, on the other hand, bypass the firewall and are open to all users; these networks are public.
Fig. 1: Standard Topology of Data Center (diagram labels: access, aggregation and core layers; switches, core switches, firewalls, core routers for the public and private networks, the Internet, and the traffic database)

Fig. 3: Captured Traffic of 24 Hours

Fig. 4: Captured Traffic of 30 Days

B. Traffic Generation

More than 150 computers run continuously (24x7) to generate diversified traffic within a particular interval of time. The captured traffic (In-Out) from different LANs and WANs (public and private) is monitored and stored in the database server. The database server runs constantly and stores the traffic without any interruption or downtime.

All generated network traffic of the LANs and WANs is divided into different categories of time intervals, i.e., minutes, hours and days. For example, Figure 2 shows 60 minutes of generated traffic at 30-second intervals, Figure 3 shows 24 hours of generated traffic captured as averages over 5-minute intervals, and Figure 4 displays 30 days of traffic captured at 60-minute intervals. The purpose of this exercise is to demonstrate different granularities of time slots with standard refresh intervals (minutes, hours and days), so that samples of small and large units can be used to update the real-time traffic with more accuracy and better performance. In future we plan to use these data sets to deal with the latency and jitter of traffic.
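The step from 30-second samples to 5-minute averages can be sketched in Python as below (the paper's own analysis was done in R). The function name `aggregate` and the sample readings are illustrative assumptions, not the paper's data or tooling.

```python
from statistics import mean


def aggregate(samples, factor):
    """Average consecutive groups of `factor` samples into one coarser sample.

    A trailing partial group is averaged over the samples it contains.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]


# Hypothetical 30-second In-traffic readings in kb/s (illustrative values only).
thirty_sec = [100, 120, 110, 130, 90, 95, 105, 115, 125, 135,
              140, 150, 160, 170, 180, 190, 200, 210, 220, 230]

# Ten 30-second samples make up one 5-minute average.
five_min = aggregate(thirty_sec, 10)
print(five_min)
```

The same helper with a factor of 12 would turn 5-minute averages into 60-minute values, matching the hour-level granularity mentioned above.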

Fig. 2: Captured Traffic of 60 Minutes

C. Data Collection

The fundamental reason for capturing real-time traffic is to monitor the whole network and to meet the upcoming demand for bandwidth and for the installation of network devices. We captured the traffic on the routers' ports with the following considerations. (Note that the real-time traffic was taken from June 2014 to June 2015.)

Although routers can have multiple types of ports and connectors, we connected to the Fast Ethernet port of the routers and collected the traffic in raw format. We then cleaned the data (converting raw data to CSV, a popular format for slicing and visualization) and went through the full process of data mining and modeling. At the initial stage, the traffic is presented as In-Out values in kilobits per second and kilobytes per second.
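Because the traffic is reported in both kilobits and kilobytes per second, a small conversion helper keeps the two units consistent. The Python sketch below assumes 8 bits per byte and decimal prefixes (1 Mb/s = 1000 kb/s); the function names are hypothetical, and the 100 Mb/s default is the nominal Fast Ethernet rate rather than a measured capacity from the paper.

```python
BITS_PER_BYTE = 8


def kbps_to_kbytes_per_s(kbps):
    """Convert kilobits per second to kilobytes per second."""
    return kbps / BITS_PER_BYTE


def kbytes_per_s_to_kbps(kbytes_per_s):
    """Convert kilobytes per second to kilobits per second."""
    return kbytes_per_s * BITS_PER_BYTE


def utilization(kbps, link_capacity_mbps=100.0):
    """Fraction of the link capacity in use (default: nominal Fast Ethernet)."""
    return kbps / (link_capacity_mbps * 1000.0)


print(kbps_to_kbytes_per_s(800))   # kilobytes per second
print(utilization(25000))          # fraction of a 100 Mb/s link
```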

D. Traffic Observation

During the experiments, we monitored the In-Out traffic on the router's port 1 (FastEthernet0/0), with bandwidth measured in both KBytes/s and kb/s. All traffic data is captured and stored in the database server, which is directly connected to a router on a Fast Ethernet port. We captured all incoming and outgoing traffic using the Wireshark software tool. The traffic from the networks to the Internet was recorded at various time intervals: minutes, hours and days. The reason for capturing diversified traffic at various granularities is to obtain more precise results.

III. EXPERIMENTAL RESULTS

ARIMA(1,0,1): This is the combination of an autoregressive (AR) model and a moving average (MA) model with integration of order zero. Let T(In) denote the In traffic, measured in kb/s; normalisation of the data can be used. The model can be written as follows:

$$T(In)_t = \mu + \alpha\,T(In)_{t-1} + \gamma\,\epsilon_{t-1} + \epsilon_t \tag{1}$$

where the non-zero $\mu$ is the mean or drift component and $\epsilon_t$ is the disturbance term, which is normally distributed. $\alpha$ and $\gamma$ are the coefficients of the lagged T(In) term and the lagged error term, respectively.

In a similar fashion, the model can be constructed for the Out traffic as follows:

$$T(Out)_t = \mu + \alpha\,T(Out)_{t-1} + \gamma\,\epsilon_{t-1} + \epsilon_t \tag{2}$$

R has facilities for analyzing time series data; calling the "auto.arima" function (from the forecast package [18]) with a few arguments readily produces basic forecasting results. In linear filtering of a time series, the major components are X, T, S and e_t, where X = time series, T = trend, S = seasonal component and e_t = remainder component. The "ts" function converts a numeric vector into an R time series object; we first convert one variable at a time into a time series object using "ts". A typical call to "ts" takes four standard arguments for the observations, i.e., vector, start, end and frequency. An example is given here:

    port.1.365.days.TIN.ts <- ts(port.1.365.daysTrafficIN,
                                 start = c(2014, as.POSIXlt("2014-05-28")$yday),
                                 frequency = 365)
    plot.ts(port.1.365.days.TIN.ts)

However, the available filtering analysis is used for weeks, months and quarters, i.e., (a=2, a=12 and a=40), respectively [40].
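The ARIMA(1,0,1) recursion of Eqs. (1) and (2) can be illustrated with a short Python sketch (the paper itself fitted its models in R). The sketch does not estimate parameters: `mu`, `alpha` and `gamma` are passed in by the caller, the function name is hypothetical, and the residual before the first observation is assumed to be zero.

```python
def arima101_forecasts(series, mu, alpha, gamma):
    """One-step-ahead forecasts from T_t = mu + alpha*T_{t-1} + gamma*eps_{t-1} + eps_t.

    Element i of the result is the forecast for series[i + 1]; the final
    element is the out-of-sample forecast for the next, unobserved value.
    """
    forecasts = []
    prev_eps = 0.0  # assumption: no residual is known before the first observation
    for t in range(1, len(series) + 1):
        pred = mu + alpha * series[t - 1] + gamma * prev_eps
        forecasts.append(pred)
        if t < len(series):
            # Residual = observed value minus its one-step-ahead forecast.
            prev_eps = series[t] - pred
    return forecasts


# Illustrative numbers only, not traffic from the paper.
print(arima101_forecasts([10.0, 12.0, 11.0], mu=2.0, alpha=0.5, gamma=0.5))
```

In practice the coefficients would come from a fitting routine such as R's `auto.arima`; the sketch only shows how a fitted model turns past values and past errors into the next forecast.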


Comparing various results was an additional motivation to work with different models; based on multiple sources and previous data, we obtained the deliverable outcomes. These outcomes support the analytical results used for decision making. We subsequently tried a couple of models and compared them with one another to optimize the process and predict the network bandwidth more precisely. To predict future movement, the time series (ARIMA) and VAR models were the best options to choose. To summarize the different types of data sets and then convert the data into a machine-readable format, the most common approach, EDA (Exploratory Data Analysis) [36], was used. Since analysis and visual plotting with EDA was the better choice, we plotted both traffic directions (In and Out) on one graph to observe their correlation with each other.

A. Auto-Regressive Integrated Moving Average (ARIMA) Model

Using time-series analysis, we adopted an Autoregressive Integrated Moving Average (ARIMA) model to forecast the network traffic. ARIMA(p, d, q) is used for non-seasonal data, while ARIMA(P, D, Q)m is used for seasonal data. The integers p, d and q are non-negative, where "p" is the order of the autoregressive part, "d" is the degree of differencing and "q" is the order of the moving average part. The uppercase (P, D, Q)m represent the seasonal ARIMA terms, and the integer "m" is the number of periods in each season. The mathematical equations of seasonal and non-seasonal ARIMA for forecasting follow [18], [14].

Fig. 5: Correlation between In and Out traffic

Fig. 5 gives a visual representation of the original data; red lines represent In data and green lines represent Out data. The same figure shows the quarterly visual representation of the two variables (In-Out). The logarithm is presented in four equal quarters (July 2014-April 2015); we assume that the maximum variation, in the third quarter, jumps much higher and hits the maximum amount of bandwidth. This demonstrates the rush hour, when the maximum number of Internet users are online and network requests are much higher than the total capacity. During that time congestion may occur and users may face problems while browsing.

During the investigation we had many software tools to choose from; after investigation, R was the better option. R has a natural but steep learning curve; the major reasons for choosing it are: (a) it is freely distributed and widely used for analysis, with immense online support through millions of blogs and lectures;

(b) although R is not built for professional developers, it is closer to scientists and is maintained by scientists; and (c) there is no need to write long code, and RStudio provides a better interface; this environment makes it easy to observe the output live.

The visual representations of the results are shown below.

Fig. 6: Forecast ARIMA, ETS(M,Ad,N) and ETS(M,N,N)

B. Exponential Smoothing Functions

ETS(M, Ad, N): Exponential Smoothing Damped Trend Method with Multiplicative Errors

If

$$\epsilon_t = \frac{T(In)_t - T(In)_{t|t-1}}{T(In)_{t|t-1}}$$

where $T(In)_{t|t-1}$ denotes the one-step-ahead forecast of $T(In)_t$ made at time $t-1$, then

$$T(In)_t = (l_{t-1} + \theta b_{t-1})(1 + \epsilon_t)$$

$$l_t = (l_{t-1} + \theta b_{t-1})(1 + \alpha\epsilon_t)$$

$$b_t = \theta b_{t-1} + \beta(l_{t-1} + \theta b_{t-1})\epsilon_t$$

where $\alpha$, $\beta$ and $\theta$ are the smoothing parameters, and $l_0$ and $b_0$ are the initial guesses for the level and trend components. They are chosen to maximize the likelihood. Traditionally, the possible values of the smoothing parameters have been restricted to between zero and one so that the equations can be interpreted as weighted averages.

In a similar fashion, the model can be constructed for T(Out) as follows. If

$$\epsilon_t = \frac{T(Out)_t - T(Out)_{t|t-1}}{T(Out)_{t|t-1}}$$

then

$$T(Out)_t = (l_{t-1} + \theta b_{t-1})(1 + \epsilon_t)$$

$$l_t = (l_{t-1} + \theta b_{t-1})(1 + \alpha\epsilon_t)$$

$$b_t = \theta b_{t-1} + \beta(l_{t-1} + \theta b_{t-1})\epsilon_t$$

ETS(M, N, N): Simple Exponential Smoothing Method with Multiplicative Errors

If

$$\epsilon_t = \frac{T(In)_t - T(In)_{t|t-1}}{T(In)_{t|t-1}}$$

then

$$T(In)_t = l_{t-1}(1 + \epsilon_t)$$

$$l_t = \alpha\,T(In)_t + (1 - \alpha)\,l_{t-1}$$

or, equivalently,

$$l_t = l_{t-1}(1 + \alpha\epsilon_t)$$

In a similar fashion, the model can be constructed for T(Out) as follows. If

$$\epsilon_t = \frac{T(Out)_t - T(Out)_{t|t-1}}{T(Out)_{t|t-1}}$$

then

$$T(Out)_t = l_{t-1}(1 + \epsilon_t)$$

$$l_t = \alpha\,T(Out)_t + (1 - \alpha)\,l_{t-1}$$

or, equivalently,

$$l_t = l_{t-1}(1 + \alpha\epsilon_t)$$

When working with time series in R, the HoltWinters function offers the parameters alpha ($\alpha$), beta ($\beta$) and gamma ($\gamma$), as Holt-Winters generalizes the procedure to cope with trend and seasonal variation. Alpha describes the "level", beta deals with the "trend" and gamma deals with the "seasonal variation". The data set for the exponential smoothing functions is described in the code above [19], [17], [40].

For the exponential smoothing function we use the following equation, where $F_t$ is the forecast and $A_t$ is the actual value at time $t$:

$$F_{t+1} = F_t + \alpha(A_t - F_t) \quad\text{or}\quad F_{t+1} = \alpha A_t + (1 - \alpha)F_t \tag{7}$$
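Eq. (7) can be traced step by step with a minimal Python sketch (the paper itself used R's built-in routines). The function name, the choice of the first actual value as the initial forecast, and the sample numbers are assumptions made for illustration.

```python
def exponential_smoothing(actuals, alpha, initial_forecast=None):
    """Apply F_{t+1} = F_t + alpha * (A_t - F_t) along a series of actual values.

    Returns [F_0, F_1, ..., F_n]; the last element is the forecast for the
    next, not yet observed, period. By default F_0 is the first actual value
    (an assumption; other initialisations are possible).
    """
    if not actuals:
        raise ValueError("actuals must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    forecast = float(actuals[0]) if initial_forecast is None else float(initial_forecast)
    forecasts = [forecast]
    for actual in actuals:
        # Move the forecast towards the latest actual by a fraction alpha.
        forecast = forecast + alpha * (actual - forecast)
        forecasts.append(forecast)
    return forecasts


# Illustrative traffic values in kb/s, not data from the paper.
print(exponential_smoothing([100.0, 110.0, 120.0], alpha=0.5))
```

A larger `alpha` makes the forecast follow recent traffic more closely, while a smaller one smooths out bursts; both forms of Eq. (7) give the same update.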

The forecast values are produced by two kinds of models, i.e., exponential smoothing and ARIMA [9], [20]:
1. Forecast from ETS(M,Ad,N), where ETS = Exponential Smoothing (Error, Trend, Seasonal), M = multiplicative, Ad = additive damped and N = none.
2. Forecast from ETS(M,N,N), where ETS = Exponential Smoothing (Error, Trend, Seasonal), M = multiplicative and N = none.

The table shows the fifteen exponential smoothing methods, formed from the trend components and seasonal components, as available in R [18].

Trend Component               Seasonal Component
                              N (None)    A (Additive)    M (Multiplicative)
N (None)                      N, N        N, A            N, M
A (Additive)                  A, N        A, A            A, M
Ad (Additive damped)          Ad, N       Ad, A           Ad, M
M (Multiplicative)            M, N        M, A            M, M
Md (Multiplicative damped)    Md, N       Md, A           Md, M

Fig. 7: Exponential Smoothing Methods

Fig. 8: Results of Vector Autoregression (VAR)(p) Model
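The count of fifteen methods in Fig. 7 follows from combining five trend options with three seasonal options. The short Python sketch below enumerates them; the list names are illustrative, and the abbreviations are those used in the table.

```python
from itertools import product

# Trend and seasonal component codes as used in Fig. 7.
TREND = ["N", "A", "Ad", "M", "Md"]
SEASONAL = ["N", "A", "M"]

# Every (trend, seasonal) pair is one exponential smoothing method.
methods = list(product(TREND, SEASONAL))

print(len(methods))
for trend, seasonal in methods:
    print(f"({trend}, {seasonal})")
```

The two models forecast in this paper, ETS(M,Ad,N) and ETS(M,N,N), correspond to the (Ad, N) and (N, N) cells combined with a multiplicative error term.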

C. Vector Autoregression (VAR)(p) Model

To predict future data in this study, we used the Vector Autoregression VAR(p) model to predict the In and Out bandwidth traffic (kbps), and used the AIC (Akaike information criterion) and the LR test (likelihood-ratio test) as lag selection criteria. The VAR equations can be represented as follows:

$$In_t = c + a_1 In_{t-1} + a_2 In_{t-2} + a_3 Out_{t-1} + a_4 Out_{t-2} + e_{1t}$$

$$Out_t = c + b_1 Out_{t-1} + b_2 Out_{t-2} + b_3 In_{t-1} + b_4 In_{t-2} + e_{2t}$$

We estimated the two equations, and the results are given in Figures 7 and 8. For the first equation (In), all coefficients are statistically significant except the coefficient $a_3$. In the second equation (Out), two coefficients, $b_2$ and $b_4$, are statistically insignificant, which means that lag 2 of the In and Out variables is not significant. We also applied the Granger causality test to check whether In and Out bandwidth traffic is affected by the past values of In and Out bandwidth. The result of the Granger causality test is in Figure 8. Both hypotheses are rejected, so we can conclude that the In and Out bandwidth can be predicted from the past values of In and Out bandwidth traffic.

Fig. 9: Residuals Graph

We used VAR(1) to predict the Out bandwidth traffic (kb/s). The VAR(1) model is

$$X_t = c + aX_{t-1} + e \tag{8}$$

and the estimated VAR(1) is given by

$$X_t = 606.2201 + 0.5148X_{t-1} \tag{9}$$

The estimated results of the above equation are given in Figure 7. The results show that c = 606.2201 with p = 0.0000, which is highly significant, and a = 0.514803 with p-value = 0.0000, which is also highly significant. The F-value is 256.1357 with p-value = 0.0000, which shows that our VAR(1) is highly significant. The Durbin-Watson statistic shows that there is no problem of auto-correlation (as its value is close to 2). The residuals are very large; therefore $R^2$ = 0.265114, which is very small. The residuals are shown in the graph (i.e., Fig. 9: Residuals Graph). It shows that the variation in the period beyond 400 is larger than the variation below 400, which may cause the small $R^2$ and the large standard error.

D. Implementation Tools and Specifications

The following programming tools and applications were used for this research:

• Programming Language: R and Java
• Programming Platform: Windows and Unix
• Data Mining: Weka and RapidMiner (Basic version, free for students)
• Packet analyzer: Wireshark
• Statistical forecasting: R and EViews (student version)

IV. BACKGROUND AND RELATED WORK

Our proposed forecasting model is suitable for proper resource utilization and also accommodates large volumes of bursty data in data center networks. A number of other forecasting models have also been proposed to measure network and bandwidth traffic; most of them collect traffic on a short-term basis (minutes to hours), while some other models work on a full-term basis (hours to days) [6], [25], [31], [34]. Our forecasting model, in contrast, captures traffic on both short-term and full-term bases, i.e., minutes, hours and days. The reason for using both short and full terms is to acquire detailed traffic and obtain better results with more precise outputs.

Numerous studies have presented, examined and compared forecasting tools for predicting bandwidth and its proper utilization, for example:

Linear regression using Granger causality and forecasting using a VAR model (with the EViews tool) is also a good option for predicting future data [2], [11], [13].

UANM (Unified Architecture for Network Measurement) offers an end-to-end measurement tool for bandwidth estimation. Using the UANM tool, the authors show how to achieve synchronized measurements and avoid interference, and how to increase the accuracy and reliability of any end-to-end bandwidth measurement [1].

"Traffic-prediction-assisted dynamic bandwidth assignment for hybrid optical wireless networks" proposes a performance-based architectural tool for scheduling, evaluated through extensive simulation. The proposed mechanism for dynamic bandwidth assignment (DBA) and the optical network unit (ONU) predicts the incoming traffic and manages the network scheduler for better network performance [27].

The Dynamic Bandwidth Allocation with High Utilization (DBAHU) algorithm proposes to utilize the unused bandwidth of a service class. The exemplified procedure uses simple techniques from the SFDBA dynamic bandwidth allocation algorithm. Both SFDBA and DBAHU use a common available byte counter and a shared counter for multiple queues of a service class. Moreover, DBAHU addresses the problem of unutilized bandwidth within a service class, with the unoccupied bandwidth used via the available byte counter [12].

In "Network Traffic Characteristics of Data Centers in the Wild", the authors used SNMP data to study numerous network patterns [12]. "A flexible reservation algorithm for advance network provisioning" proposed a framework for network reservation and claimed to deliver guaranteed bandwidth [3].

"An empirical study of the multiscale predictability of network traffic" presented an experimental study of forecast errors on diverse time scales. The study shows that forecasting inaccuracy does not decrease monotonically with smoothing at large scales [32].

The available bandwidth can be estimated by sending probe packets over the network and measured with a range of readily available applications and tools, e.g., IGI, Pathload, pathChirp and Spruce [16], [22], [33], [35].

In the paper "The Nature of Data Center Traffic: Measurements & Analysis" [23], the experiment was simulated to create traffic patterns intentionally, in order to mine huge data sets and analyze performance.

When we look at the network architecture of data centers, geographically distributed data centers have three-layered connectivity: (a) Layer 2 over fiber, (b) Layer 3 with WAN over dark fiber and (c) storage extension. However, distributed data center connectivity expects compatibility and fast support across all network vendors and technologies. At present, DCI (Data Center Interconnect) uses Layer 2-3 Virtual Private Networks with Multiprotocol Label Switching (VPN-MPLS), Secure Sockets Layer Virtual Private Networks (SSL-VPN) and some other bundles of secure protocols. Various other protocols, e.g., IPsec VPNs and Vx (virtual private networks and bundle of virtual s), are used for secure connections.

Dynamic Load Balancing Multipathing (DLBMP) in data center Ethernet is an alternative to STP (Spanning Tree Protocol). DLBMP proposes a solution for proper bandwidth utilization at the data link layer (L2) using Dijkstra's algorithm, since STP has the problem of unexpectedly blocking links and ports. In DLBMP, redundant physical links are deployed to overcome the failure of physical links; it performs better and can handle 300% more bandwidth capacity compared with STP. The communication between nodes and the traffic are dynamically adjustable, and load balancing is feasible and easy to achieve, giving efficient links with proper bandwidth utilization [39].

PortLand and VL2 [10], [30] use a switch-centric routing architecture, which controls communication by using network switches for routing; the same anatomy is used in three-tier (i. access, ii. aggregation, iii. core) and fat-tree topologies. These types of architecture are widely used in the physical topology of conventional data centers. However, three-tier topology schemes are very large, complicated and costly in terms of price and power [4]. Helios (a hybrid electrical/optical structure) [8] combines pod switches with core switches; the architecture proposes reductions in switching elements, cabling, cost and power consumption. The c-Through [37] architecture combines optical and electrical technology: the optical segment performs one-hop communication, while the electrical segment works like routing in a tree. Although the optical solution performs better in power saving, it is rarely used in data centers due to the high cost of switches and complex configuration.

V. CONCLUSION AND FUTURE WORK

This paper presented an innovative idea and approach to support the networking research community, especially data center professionals. However, it is challenging to build forecasting models for the real-world traffic of Data

Centers, and it is even more important to predict future traffic when designing, managing and upgrading the complex network of data centers. In this paper we focused on two main objectives: (1) proposing simple yet scalable techniques for analysis and forecasting, and (2) implementing and evaluating these techniques on real-world data centers. Our next research project will work on the complex protocols used in data centers, especially countermeasures for the security of data centers.

ACKNOWLEDGEMENT

I take this opportunity to express my gratitude to all the anonymous reviewers for their feedback, which enabled me to participate in this conference. This research was supported by the Sukkur Institute of Business Administration; this prestigious institute allowed me to mention its name in acknowledgement. I would like to express my sincere gratitude to my supervisor, Prof. M-Tahar Kechadi, who is the second author of this paper; this study would be nothing without his continuous support and motivation. My sincere thanks go to my ex-colleague Mr. Fahad Rahim Qasmi for providing the partial data and access to the data center.

REFERENCES

[1] Giuseppe Aceto, Alessio Botta, Antonio Pescapé, and Maurizio D'Arienzo, Unified architecture for network measurement: The case of available bandwidth, Journal of Network and Computer Applications 35 (2012), no. 5, 1402–1414.
[2] I Gusti Ngurah Agung, Time series data analysis using EViews, John Wiley & Sons, 2011.
[3] Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, and Alex Sim, A flexible reservation algorithm for advance network provisioning, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, 2010, pp. 1–11.
[4] Kashif Bilal, Samee U Khan, Limin Zhang, Hongxiang Li, Khizar Hayat, Sajjad A Madani, Nasro Min-Allah, Lizhe Wang, Dan Chen, Majid Iqbal, et al., Quantitative comparisons of the state-of-the-art data center architectures, Concurrency and Computation: Practice and Experience 25 (2013), no. 12, 1771–1783.
[5] VNI Cisco, The zettabyte era: Trends and analysis, Cisco visual networking white paper (2014).
[6] Paulo Cortez, Miguel Rio, Miguel Rocha, and Pedro Sousa, Multi-scale internet traffic forecasting using neural networks and time series methods, Expert Systems 29 (2012), no. 2, 143–155.
[7] Alaknantha Eswaradass, Xian-He Sun, and Ming Wu, Network bandwidth predictor (NBP): A system for online network performance forecasting, Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on, vol. 1, IEEE, 2006, pp. 4–pp.
[8] Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat, Helios: a hybrid electrical/optical switch architecture for modular data centers, ACM SIGCOMM Computer Communication Review 41 (2011), no. 4, 339–350.
[9] Everette S Gardner Jr and ED McKenzie, Forecasting trends in time series, Management Science 31 (1985), no. 10, 1237–1246.
[10] Albert Greenberg, James R Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A Maltz, Parveen Patel, and Sudipta Sengupta, VL2: a scalable and flexible data center network, Communications of the ACM 54 (2011), no. 3, 95–104.
[11] William E Griffiths, R Carter Hill, and Guay C Lim, Using EViews for principles of econometrics, (2008).
[12] Man-Soo Han, Dynamic bandwidth allocation with high utilization for XG-PON, 16th International Conference on Advanced Communication Technology, IEEE, 2014, pp. 994–997.
[13] Craig Hiemstra and Jonathan D Jones, Testing for linear and nonlinear Granger causality in the stock price-volume relation, The Journal of Finance 49 (1994), no. 5, 1639–1664.
[14] Wikipedia, Autoregressive integrated moving average, https://en.wikipedia.org/.
[15] Kejia Hu, Jaesik Choi, Alex Sim, and Jiming Jiang, Best predictive generalized linear mixed model with predictive lasso for high-speed network data analysis, International Journal of Statistics and Probability 4 (2015), no. 2, p. 132.
[16] Ningning Hu and Peter Steenkiste, Evaluation and characterization of available bandwidth probing techniques, IEEE Journal on Selected Areas in Communications 21 (2003), no. 6, 879–894.
[17] Rob J Hyndman, Muhammad Akram, and Blyth C Archibald, The admissible parameter space for exponential smoothing models, Annals of the Institute of Statistical Mathematics 60 (2008), no. 2, 407–426.
[18] Rob J Hyndman and Yeasmin Khandakar, Automatic time series for forecasting: the forecast package for R, Tech. report, 2007.
[19] Rob J Hyndman, Anne B Koehler, Ralph D Snyder, and Simone Grose, A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting 18 (2002), no. 3, 439–454.
[20] Rob J Hyndman and Andrey V Kostenko, Minimum sample size requirements for seasonal forecasting models, Foresight 6 (2007), no. Spring, 12–15.
[21] Cisco Visual Networking Index, Forecast and methodology, 2014–2019, white paper, Cisco, 2015.
[22] Manish Jain and Constantinos Dovrolis, End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput, vol. 32, ACM, 2002.
[23] Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, and Ronnie Chaiken, The nature of data center traffic: measurements & analysis, Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, ACM, 2009, pp. 202–208.
[24] Dzmitry Kliazovich, Pascal Bouvry, and Samee Ullah Khan, GreenCloud: a packet-level simulator of energy-aware cloud computing data centers, The Journal of Supercomputing 62 (2012), no. 3, 1263–1283.
[25] Balaji Krithikaivasan, Yong Zeng, Kaushik Deka, and Deep Medhi, ARCH-based traffic forecasting and dynamic bandwidth provisioning for periodically measured nonstationary traffic, IEEE/ACM Transactions on Networking (TON) 15 (2007), no. 3, 683–696.
[26] Marcel Margulies and Egholm, Genome sequencing in microfabricated high-density picolitre reactors, Nature 437 (2005), no. 7057, 376–380.
[27] Maysam Mirahmadi and Abdallah Shami, Traffic-prediction-assisted dynamic bandwidth assignment for hybrid optical wireless networks, Computer Networks 56 (2012), no. 1, 244–259.
[28] Vassilios C Moussas, Marios Daglis, and Eva Kolega, Network traffic modeling and prediction using multiplicative seasonal ARIMA models, Proceedings of the 1st International Conference on Experiments/Process/System Modeling/Simulation/Optimization, Athens, 2005, pp. 6–9.
[29] Cisco Visual Networking, The zettabyte era–trends and analysis, Cisco white paper (2013).
[30] Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat, PortLand: a scalable fault-tolerant layer 2 data center network fabric, ACM SIGCOMM Computer Communication Review, vol. 39, ACM, 2009, pp. 39–50.
[31] Konstantina Papagiannaki, Nina Taft, Zhi-Li Zhang, and Christophe Diot, Long-term forecasting of internet backbone traffic, IEEE Transactions on Neural Networks 16 (2005), no. 5, 1110–1124.
[32] Yi Qiao, Jason Skicewicz, and Peter Dinda, An empirical study of the multiscale predictability of network traffic, High Performance Distributed Computing, 2004. Proceedings. 13th IEEE International Symposium on, IEEE, 2004, pp. 66–76.
[33] V Ribeiro, R Riedi, R Baraniuk, et al., pathChirp: Efficient available bandwidth estimation for network paths, PAM, 2003.
[34] Aimin Sang and San-qi Li, A predictability analysis of network traffic, Computer Networks 39 (2002), no. 4, 329–345.
[35] Jacob Strauss, Dina Katabi, and Frans Kaashoek, A measurement study of available bandwidth estimation tools, Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, ACM, 2003, pp. 39–44.
[36] John W Tukey, Exploratory data analysis, (1977).
[37] Guohui Wang, David G Andersen, Michael Kaminsky, Konstantina Papagiannaki, TS Ng, Michael Kozuch, and Michael Ryan, c-Through: Part-time optics in data centers, ACM SIGCOMM Computer Communication Review, vol. 40, ACM, 2010, pp. 327–338.
[38] Wucherl Yoo and Alex Sim, Network bandwidth utilization forecast model on high bandwidth networks, Computing, Networking and Communications (ICNC), 2015 International Conference on, IEEE, 2015, pp. 494–498.
[39] Yang Yu, Khin Mi Mi Aung, Edmund Kheng Kiat Tong, and Chuan Heng Foh, Dynamic load balancing multipathing in data center ethernet, Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2010 IEEE International Symposium on, IEEE, 2010, pp. 403–406.
[40] Walter Zucchini and Oleg Nenadic, Time series analysis with R, part I, Document de cours (2011).
