Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

1

Traffic analysis in the TRAMMS project


Andreas Aurelius1 , Christina Lagerstedt1 , Maria Kihl2 , Marcell Perényi3 , Iñigo Sedano4 , Felipe Mata5
1 Acreo AB, Kista, Sweden
2 Dept. of Electrical and Information Technology, Lund University, Sweden
3 Budapest University of Technology and Economics, Budapest, Hungary
4 Robotiker Tecnalia, Bilbao, Spain
5 Universidad Autonoma de Madrid, Madrid, Spain

Abstract—Internet usage is evolving, from the traditional II. A BOUT TRAMMS


WWW usage (i.e. downloading web pages), to triple-play usage
where households may have all their communication services
The TRAMMS project is part of the Celtic framework, a
(telephony, data, TV) through their broadband access connection. EUREKA cluster focusing on telecommunications. The project
The challenge is to design IP access networks so that they can (Traffic Measurements and Models in Multi-Service Networks)
deliver services with strict QoS demands such as IPTV at the [1] is a three year project with partners from Sweden, Spain
same time as having capacity for (from the operator’s perspective) and Hungary:
unwanted traffic, for example file sharing, demanded by the
users. • Acreo AB, Sweden
One important part in meeting this research challenge is to • BUTE -Budapest University of Technology & Eco-
identify and monitor Internet usage. Traffic modeling is tightly nomics, Hungary
coupled both to traffic measurements and to engineering and • Ericsson AB, Sweden
techno economics. The focus of the measurements in this paper
• Euskaltel, Spain
lies on analyzing parameters of interest for network management
decisions, i.e. traffic volume, application usage, locality of content • GCM Communications, Spain
and popularity of content. For locality analysis, the geographical • Lund University, Sweden
end points of data flows are studied. For content popularity • Procera networks, Sweden
analysis, Youtube videos are analyzed looking at the number of • Fundación Robotiker, Spain
parallel sessions and content popularity distributions. Indepen-
• Telefónica I+D, Spain
dent of the type of model, traffic measurements are a common
denominator that provide input for the model parameters. In • Telnet-RI, Spain
this paper, detailed traffic measurements performed as a part of • Universidad Autónoma de Madrid, Spain
the Celtic TRAMMS project are presented. The main objective of TRAMMS is to model traffic in
multi-service IP networks, and to use the models as input for
I. I NTRODUCTION capacity planning of tomorrow’s networks. The models will be
The Internet as we know it today, is a network of net- built upon data acquired with advanced traffic measurements
works that has continuously evolved over the years as an on the application level with deep packet/deep flow inspection
effect of rapid innovation and development. This innovation in different parts of Europe, combined with bottleneck analysis
and development is driven by various actors with differing and interdomain routing analysis.
agendas. These agendas can be commercial, political, social, Parameters such as applications used, trends in application
developmental, etc, and with the support of a rapid technology usage, penetration of applications, peak hours, peak rates,
development, new services and new ways of communicating traffic volume, uplink/downlink ratios, network traffic locality,
emerge. Due to the complexity in the driving forces and service specific user behavior are analyzed at different time
the complexity in the underlying technologies that make up scales, and typical user types are defined. The influence on the
the backbone of the Internet, it is increasingly important to user behavior from different first mile technologies is studied
monitor and analyze the traffic in the networks and to stay up as well as the difference in user behavior between different
to date with trends and paradigm shifts. regions in Europe.
The main goal of the TRAMMS project is to provide data
and analysis for the above mentioned purposes. In short the III. T RAFFIC MEASUREMENTS
project measures and analyzes IP traffic in access networks. A large focus of the project and a particular strength is
The analysis strives to address issues like: the direct access to measurements in live networks. In the
• user behavior commercial networks the measurement equipment is installed
• user characterization near the end-users in order to ensure a high level of detail
• trend analysis for the analysis. Traffic measurements in the following fixed
• bottleneck analysis metro/access and wireless access networks are performed
within the project:
Contact information: Andreas Aurelius, Netlab, Acreo AB, Electrum
236, 164 40 Kista, Sweden, telephone: +46 8 632 78 02, email: an- • Swedish municipal network 1: an open fibre based net-
dreas.aurelius@acreo.se work with approximately 2600 FTTH and 200 DSL
2

customers. The FTTH customers represent many social the other networks. The main conclusion is that the shape of
and ethnic groups, while the DSL customers constitute a the daily traffic patterns depends on the subscriber type of
more homogeneous group of Swedish middle class living the network such as residential, enterprise or academic. There
in single family houses. is also a common daily traffic pattern for the networks that
• Swedish municipal network 2: a FTTH network with 350 have mainly residential users which is the same for both the
IPTV users. Measurements from this network was used Swedish and the Spanish residential networks.
for studying IPTV user behavior.
• Spanish commercial network: a network containing both 1
RedIRIS

fixed and wireless access networks. The wireline part 0.9 CMTS
FTTH
0.8 DSL
consists of a fibre network to the cabinet (FTTC) and the 0.7
Average

last mile consists of Cable Modem Termination System 0.6

(CMTS) and ADSL. The wireless access is a combined 0.5

GPRS and UMTS system. 0.4

Spanish university network RedIRIS: a network that


0.3

0.2

interconnects and allows Internet access to more than 300 0.1

institutions with 2.7 million users. The network is SDH- 0


0 5 10 15 20 25

based with link speeds from 2.5 Gbps up to 10 Gbps.


Hours of the day

Figure 1. Normalized daily downlink traffic pattern for the different networks
A. Measurement tools and normalized average daily downlink traffic pattern.
The measurements in the residential networks have been
performed using PacketLogic (PL) [2], a commercial real-time
hardware/software solution used mainly for traffic surveil- B. Comparison of application usage in different networks and
lance, traffic shaping or as a firewall. Traffic is identified based technologies
on packet content and flow behavior using deep packet in- A comparison of application usage in the different net-
spection and deep flow inspection. PL uses the self-developed works was performed in order to see differences between
Datastream Recognition Definition Language (DRDL) [3] to user groups and countries. This comparison uses data from
identify different application protocols. The PL can identify Swedish municipal network 1 and the Spanish commercial
more than 800 application protocols. networks. In order to avoid differences in the amount of
The measurements in the RedIRIS network have been unrecognized traffic between the networks, only the traffic
performed with Cisco NetFlow [4]. NetFlow is a proprietary from the following application groups has been considered:
network protocol developed by Cisco Systems to run on web browsing, P2P file sharing and multimedia streaming.
their routers and implemented by other vendors as well. This These application groups are seen to have the largest traffic
protocol is used to monitor the traffic that traverses a router and volumes.
to keep statistics of the performance by sampling some of the Concerning uplink traffic, P2P file sharing is responsible for
packets. Cisco defines a flow as a unidirectional sequence of more than 97% of the considered traffic in the fixed networks
packets sharing all the following 7 values, commonly referred regardless of the technology (FTTH, DSL and CMTS). In the
as 7-tuple: Source and Destination IP addresses, IP protocol, case of the downlink traffic, in the fixed networks (FTTH,
Source and Destination ports (when the IP protocol is TCP or DSL and CMTS) the P2P file sharing generates an important
UDP), Ingress interface and IP Type of Service. amount of traffic depending on the technology (from 66% to
88%). Approximately 60% of the rest of the traffic corresponds
IV. T RAFFIC ANALYSIS to web browsing and 40% to multimedia streaming. Regarding
the mobile network (GGSN) the amount of web browsing
The results in this paper are taken from the TRAMMS traffic is five times higher than the multimedia streaming.
public deliverable D3.2 [5], available at [1] on request. Compared to the fixed networks, the P2P file sharing traffic in
the mobile network is lower in uplink (77% of the considered
A. Daily profiles traffic) and much lower in downlink (27% of the considered
An overview of the normalized average daily profiles in the traffic). In downlink the mobile network traffic belongs mainly
networks is shown in Figure 1. to web browsing (61% of the traffic).
The figure shows that the Spanish fixed network and the Downlink (%)
Swedish network, despite using different access technologies Sw network 1 Sp network
(CMTS, DSL, FTTH), have similar daily traffic patterns in the App group FTTH DSL CMTS GGSN
Web browsing 7.06 20.6 12.1 60.5
downlink direction. The same similarity is seen for the uplink P2P file sharing 88.3 66.0 80.8 26.6
traffic as well. However, the RedIRIS academic network shows Multimedia streaming 4.7 13.4 7.0 13.0
a very different daily downlink traffic pattern. The amount of Table I
downlink traffic is less constant in the RedIRIS network than VOLUME SHARE OF APPLICATION GROUPS IN THE DIFFERENT NETWORKS
in the other networks (10% of the maximum traffic at 5 a.m.)
and from 1 p.m. to 9 p.m. it decreases while it increases in
3

Uplink (%)
Sw network 1 Sp network
App group FTTH DSL CMTS GGSN
Web browsing 0.4 2.6 1.9 20.7
P2P file sharing 99.4 96.7 97.0 76.5
Multimedia streaming 0.2 0.6 1.2 2.8
Table II
VOLUME SHARE OF APPLICATION GROUPS IN THE DIFFERENT NETWORKS

C. Content locality (a) Packets

Analyzing the geographical locality of end-to-end flows is


increasingly important in order to understand the trends in
Internet application usage. This knowledge is also important
for e.g. network performance management and optimization.
For operators, traffic exchange has a direct impact on their
business, making this kind of analysis a crucial part of the
business intelligence for network management.
In this paper, network traffic locality measurements in the
RedIRIS network are presented. Traffic sent to and from
six universities within the RedIRIS network was collected
(b) Flows
using NetFlow [4] and analyzed to find the traffic source or
destination IP address. The IP addresses were mapped with
their related countries using MaxMind [6], a public database
giving the geographic localization of IP addresses with an
accuracy of 99.5%. The analysis was done both for incoming
traffic, i.e. traffic from the Internet to RedIRIS entities, and
outgoing traffic, i.e. the traffic from the RedIRIS universities
to the Internet, as well as for the total traffic in both directions.
Results concerning the total traffic is shown in Figure 2. Spain
and the United States have been excluded as they dominate
the graph. The results for incoming and outgoing traffic are
quite similar to the result for the total traffic. More details can (c) Bytes
be found in the TRAMMS public deliverable D3.2 [5].
In this study, the majority of the packets, around 40%, are Figure 2. Percentages of traffic sent and received by each country. Spain
and the US are not included in the figure.
sent and received within Spain. This is not unexpected since all
the studied traffic comes from Spanish users and they naturally
connect mainly to Spanish sites or to sites with content in
Spanish. The United States comes in second place with 20%
of the sent and received packets. As the United States is a We have also studied two more metrics, namely flows and
leading nation in research and IT, this is of course expected. bytes counts. The results are very similar to those presented
Moreover, the majority of the most visited web pages are for the packet count, see Figure 2(b) and 2(c). There is,
hosted in the United States. In third place we find some of the however, one difference that can be seen for the incoming
most important countries of the European Community such traffic when counting the packets and the bytes. Although
as United Kingdom, Germany, France, to mention a few, see more packets were received from Spain than from the United
Figure 2(a). They account for between 2.5 and 6% of the States, the number of bytes received from the United States
total amount of packets per country. In fourth position we were greater than that received from Spain which is shown
encounter Latin American countries such as Argentina, Chile, in Figure 3. This is a common phenomenon in the Internet,
etc., accounting for between 0.5 and 1.5% of the total share. known as "the elephants and mice phenomenon" [7], where a
In these countries the main spoken language is Spanish so it small percentage of the flows (about 10%) account for a high
is common to get redirected to a Web page in Latin America percentage of the traffic (80%). The most likely explanation
when browsing. Also, there are a lot of people from those is that the type of traffic to and from the US differs from
countries involved in research in Spain thanks to travel grants that to and from Spain. Since the measurements do not reveal
or other arrangements. Finally, we find that there is traffic the applications used, we can not analyze the application
going to or from nearly all countries although their percentages mix in these two cases, but it is likely that in the former
of the total are very small (less than 10−3 % of the traffic). case (US) the traffic is dominated by streaming traffic and
However, their combined traffic accounts for nearly 5% of the downloads, while in the later case there is a larger share of
total amount. instant communication which gives rise to smaller packets.
4

IDs and time stamps when the videos were viewed. The latter
is determined by the packet arrival time. The packet dump is
anonymized and processed to obtain a text file containing only
the content IDs and times. This text file is later loaded into a
database system for further analysis.
Figure 4 shows the number of viewed videos per hour
throughout the 16 day long measurement, which is an estima-
tion of the user activity (and the traffic intensity as well). The
user activity seems more intense on weekdays and lower on
the weekend (confirmed by Figure 5 as well). In Figure 5 the
(a) Packets
final day of the measurement is cut so that it contains samples
of exactly two weeks; this way no distortion is introduced in
the sampling. The same technique is applied to calculate the
daily distribution (see Figure 6 below) of the access times.

Number of Youtube videos viewed (per hour)


70

60

50

40

30

20

10

0
0h 12h 0h 12h 0h 12h 0h 12h 0h 12h 0h 12h 0h 12h 0h 12h
Time (from 2008. november 17., Mon, 16:31:26 to 2008. november 25., Tue, 18:11:02)

(b) Bytes
Figure 4. Number of viewed YouTube videos per hour (Swedish network
measurement (DSL) from 2008-11-17 to 2008-12-02)
Figure 3. Traffic in the incoming direction. Spain and the US are excluded
from the figure.

D. YouTube content popularity analysis


One important network design parameter is the usage of
caching of popular content. This may help eliminate bottle-
necks and reduce traffic in links with limited capacity. There is
also an important business aspect here in that traffic exchange
may be costly for operators, and it may be in the operator’s
interest to keep the volume of exchanged traffic down.
The aim of the YouTube content popularity analysis is
to investigate popularity distributions for a widespread and
popular video sharing network. The purpose of the analysis Figure 5. Weekly distribution of viewed YouTube videos per day (Swedish
network measurement (DSL) from 2008-11-17 16:31 to 2008-12-01 16:34)
is to be able to draw conclusions on caching gain, built on
real user statistics. The results so far are preliminary, and the Figure 6 shows the daily distribution of the videos; ap-
final conclusions on caching gain can not be drawn from these parently, user activity is higher in the afternoon and evening
initial measurements, but they are very promising and they hours. The busy hours start at around 4 PM, which is the
provide a detailed picture of user activity, user behavior, traffic typical time when people return home from work. The peak
intensity and content popularity. Naturally, the focus is not on hours can be observed around 6-7 PM which is consistent with
the content of the video itself but on determining properties the daily traffic pattern as shown above in Figure 1.
such as intensity, popularity distribution, content lifetime, etc, Finally, Figure 7 shows the popularity distribution of the
of a collection of videos. Preliminary results from this study YouTube videos. The videos were ranked according to the
are shown below. number of times they had been watched. Figure 7(a) shows
To achieve the measurements, a firewall rule was set up the popularity distribution on a linear-linear scale, while the
using the PacketLogic measurement equipment filtering out Figure 7(b) shows the same on a linear-logarithmic scale.
all HTTP GET requests containing the following query string: They suggest an exponential-like decrease in popularity. Con-
http://www.youtube.com/watch?v=. The equation sign is fol- sequently, the popularity of the content is definitely not even; a
lowed by the 11 character long YouTube content ID, which limited number of videos are extremely popular, while others
is the subject of the investigation. The content ID is used to are watched rarely.
get statistics of the video usage. For integrity reasons, video
content and individual usage is not analyzed. The PacketLogic V. C ONCLUSIONS AND O UTLOOK
logged and dumped all IP packets containing the given pattern. In conclusion, results from the TRAMMS project have been
By processing the trace, it is possible to extract the content presented. It is seen that there is a common daily traffic pattern
5

90
from Spain than from the United States although the number
Number of YouTube videos viewed (per 15 mins)
80

70
of bytes received from the United States were greater than
60 that received from Spain. Concerning the youtube content
50 analysis, it is found that the peak hour is consistent with the
40 general daily traffic pattern. Also, the video content popularity
30 is uneven with a few video clips that are very popular, while
20 the rest are seldom watched.
10
The TRAMMS project is ongoing and measurement data are
0
0 AM 6 AM 12 PM 18 PM 24 PM collected continuously. The analysis of these measurements
Time of the day
will, of course, go on and be deepened to gain more specific
knowledge concerning content and user behavior. The content
Figure 6. Daily distribution of viewed YouTube videos per hour (Swedish locality studies and youtube analysis previously described are
network measurement (DSL) from 2008-11-17 to 2008-12-02)
preliminary and will be further explored within the project. In
the future the aim is to include more of end user behavior and
6 user quality of experience.

5
VI. ACKNOWLEDGEMENTS
The work has partly been financed by the CELTIC project
Number of views

4
TRAMMS, with the Swedish Governmental Agency for Inno-
3 vation Systems (VINNOVA) supporting the Swedish contribu-
tion. Maria Kihl is financed in the VINNMER programme at
2
VINNOVA. UAM would like to acknowledge the support of
1
the DIOR project that gave us the RedIRIS measurements.

0 R EFERENCES
0 500 1000 1500 2000 2500
YouTube videos (ranked) [1] “Traffic measurements and models in multi-service networks.”
(a) linear-linear scale http://projects.celtic-initiative.org/tramms/.
[2] “Procera networks.” http://www.proceranetworks.com.
[3] “Packetlogic drdl signatures and properties 10340 (beta),” tech. rep., 2007.
6
[4] “Cisco netflow.” http://www.cisco.com/go/netflow.
[5] “Deliverable d3.2 - traffic models,” tech. rep., 2009.
5 [6] http://www.maxmind.com/app/geolitecountry.
[7] K.Papagiannaki, N. Taft, S. Bhattacharyya, P. Thiran, K. Salamatian, and
Number of views

4 C. Diot, “A pragmatic definition of elephants in internet backbone traffic,”


2002.
3

0
0 1 2 3 4
10 10 10 10 10
YouTube videos (ranked)

(b) logarithmic-linear scale

Figure 7. Ranking of YouTube videos according to popularity, Swedish


network measurement (DSL) from 2008-11-17 to 2008-12-02.

for both Swedish and Spanish residential users. The traffic


pattern is, however, found to vary between residential and
non-residential users. P2P file sharing applications dominate
the traffic volumes in the fixed networks followed by web
browsing and multimedia streaming. In the mobile network the
volume of the file sharing traffic is smaller than in the fixed
networks especially in the downlink direction mainly in favor
of web browsing. The content locality analysis performed for
the Spanish RedIRIS academic network shows that most of
the traffic is sent to and from Spain followed by the USA and
Europe, the exception being that more packets were received

You might also like