Security of SIP-based Voice Over IP in Enterprise Networks

INFOTECH
Universität Stuttgart
Pfaffenwaldring 47
70569 Stuttgart Germany
2005 - 2006
By
Christina Chalastanis
Master thesis report submitted to the University of Stuttgart in
partial fulfilment of the requirements for the master's degree in
Information Technology
November 2006
Under the supervision of:
Prof. Dr. Paul J. Kühn

Dipl.-Ing. Andreas Gutscher
Dipl.-Ing. Martin Neubauer
Institut für Kommunikationsnetze und
Rechnersysteme, Universität Stuttgart
Dr. Stephan Rupp

Dipl.-Ing. Franz-Josef Banet
Alcatel SEL AG, Stuttgart
Abstract
Voice over IP (VoIP) has enjoyed undeniable success and ever-growing worldwide deployment
over the last few years, and it is expected to progressively replace traditional
circuit-switched communications in the near future. However, this complete replacement depends
on a factor that is often treated as secondary although it is fundamental: security.
Enterprises of all sizes have been among the first to adopt and deploy VoIP systems, in
particular to save costs such as those of international calls between different sites and those
of maintaining separate networks. Despite the many advantages of VoIP for enterprises,
companies seem hesitant to invest in VoIP technology because they lack confidence in the
feasibility of secure VoIP deployments.
This master thesis aims to show how enterprises can migrate securely, through different
stages, to a full-IP VoIP system. These stages have been designed and modelled, although most
enterprises are expected to take small, cautious steps and first adopt intermediate hybrid VoIP
solutions that combine legacy networks and IP networks.
Enterprises must become aware that, before any deployment, a tailored risk assessment is a
sine qua non for the security of their future VoIP networks. This phase may take a long time,
but it should be neither skipped nor rushed, and it should include several sub-steps such as
threat analysis, risk analysis and mitigation. These sub-steps are studied thoroughly in this
report on the basis of deployment models for small and large enterprises. The outcome of the
threat and risk analyses shows the different types of threats and attacks against VoIP networks,
the goals they fulfil, the motivations for performing them and the risks they pose to enterprises.
Technical solutions for securing VoIP deployments are numerous; it is therefore fundamental
that enterprises select them appropriately and implement them according to security
recommendations. Over the last two years, several sets of recommendations have been published
by major security institutions, and VoIP designers in enterprises would benefit from analysing
and comparing them. This master thesis presents a first comparison of four significant sets of
recommendations, outlining which recommendations are incontrovertible (common points) and
must be implemented, and which are disputable (points of divergence) and must be considered
closely.
Since VoIP security may seem difficult and expensive to implement, in particular for small
enterprises with limited resources, hosted VoIP solutions, i.e. solutions in which a VoIP Service
Provider offers to deploy and manage VoIP and to host call processing entities for customer
enterprises, will certainly gain ground very soon in the VoIP market for small enterprises.
Enterprises must therefore make sure that the solutions offered are secure enough and securely
support mobile workers. Specific security solutions must be found to allow mobile workers to
access the VoIP system of their enterprise and to be reached as if they were in their offices; a
couple of suggestions are presented in this document.
Acknowledgements
This report is the result of my master thesis work carried out at Alcatel SEL AG and at
the Institute of Communication Networks and Computer Engineering (IKR) at the University of
Stuttgart.
First and foremost, I would like to thank my supervisors at Alcatel SEL AG, Dr Stephan
Rupp and Franz-Josef Banet, for their help, support and encouragement. The discussions during
our meetings were always very enriching, and I am much obliged for their advice and assistance
whenever I needed them. I would also like to thank them for giving Pawel Lawecki and me the
opportunity to attend the 3rd Annual Workshop in Berlin in June 2006 and to live this
unique experience.
I would like to express my gratitude to Wolfgang Lautenschlager and Matthias Duspiva
for having patiently taken the time to answer all my questions and for having so readily and
kindly helped with their advice, knowledge and experience.
I must also thank Harald Orlamunder for providing me with useful documents and
information.
Thanks to Mr Rheinbay from Alcatel for his contribution in the initial phase of my work.
My thanks also go to Andreas Gutscher and Martin Neubauer for supervising me at the
University of Stuttgart and for their judicious advice.
Special thanks go to Professor Paul J. Kühn for giving me the opportunity to do
my master thesis at the IKR Institute and for examining my work.
Thank you very much, Paweł. Thanks to Pawel Lawecki, my friend and fellow master thesis
student, who worked on a “twin” subject, VoIP security in public networks. His support and the
discussions we had have always been precious to me.
Saying thank you would not be enough for the one I love, Angel.
Thanks to Maura and Barbara for always having been close to me, like two “mums” and two
wonderful friends. Thanks to Kathrin for also being a very good friend, always with kind
words of encouragement.
Thanks to Lina and Vanessa for their moral support and their unfailing friendship.
Finally, with what words can I thank my family and my siblings for their love and their
support in every moment of my life?
This report is dedicated to them.
Table of contents
1. INTRODUCTION AND MOTIVATION
2. INTRODUCTION TO ENTERPRISE NETWORKS AND VOICE OVER IP
2.1. Enterprise networks
2.1.1. Definition
2.1.2. Interconnecting various sites
2.1.3. Securing enterprise networks
2.1.4. Models of network infrastructure
2.2. Introduction of Voice over IP (VoIP) in enterprises
2.2.1. Definition of Enterprise Voice over IP
2.2.2. Types of VoIP services
2.2.3. Importance of VoIP for enterprise networks
2.2.4. Challenges of a VoIP migration in enterprises
3. DESIGNING AND MODELLING VOIP DEPLOYMENTS IN ENTERPRISE NETWORKS
3.1. VoIP deployment scenarios
3.1.1. Deployment in a small single-site enterprise
3.1.2. Deployment in a large multi-site enterprise network
3.1.3. Hosted IP-PBX solution
3.2. VoIP infrastructure using the SIP signalling protocol
3.2.1. Components of the VoIP infrastructure
3.2.2. SIP architecture and its components
3.2.3. VoIP architecture with IP-PBX using SIP
3.2.4. VoIP architecture using the Asterisk server
3.3. User Access infrastructure: IP phones
3.4. Management system infrastructure
3.5. Introduction to security threats to VoIP
4. TECHNOLOGICAL BACKGROUND OF VOIP USING THE SIP PROTOCOL
4.1. Choice of the SIP protocol
4.2. Media transport of voice
4.2.1. Real-Time Transport Protocol (RTP)
4.2.2. Real-Time Control Protocol (RTCP)
4.2.3. Real-Time Streaming Protocol (RTSP)
4.2.4. Codecs
4.3. SIP Signalling
4.3.1. Introducing the Session Initiation Protocol (SIP)
4.3.2. SIP architectural components
4.3.3. SIP messages
4.3.4. Typical SIP Dialogues
4.3.5. SIP address resolution and routing
4.3.6. Presence service
4.3.7. SIP-based mobility
4.3.8. Security issues of SIP
4.4. Supported protocols and languages
4.4.1. Session Description Protocol (SDP)
4.4.2. Session Announcement Protocol (SAP)
4.4.3. Call Processing Language (CPL)
4.5. VoIP, quality of service and security
5. VOIP THREAT AND RISK ANALYSIS IN ENTERPRISE NETWORKS
5.1. Introduction
5.2. Assessing security requirements of VoIP enterprise networks
5.2.1. Definition of security requirements
5.2.2. Assessment of security requirements
5.3. Studying the threat and risk analysis processes for enterprise VoIP systems
5.3.1. Processes in risk assessment
5.3.2. VoIP system characterization
5.3.3. Types of threat classifications/taxonomies
5.3.4. VOIPSA Threat taxonomy
5.3.5. Choice of a threat analysis model
5.3.6. Risk assessment results
5.4. Performing a risk analysis in two models of VoIP networks
5.4.1. Characterization of the enterprise VoIP systems to analyse
5.4.2. Threat analysis of the VoIP systems in two models
5.4.3. Risk analysis and results
5.5. Analysing a selection of major attacks against enterprise VoIP networks
5.5.1. Some VoIP-specific Denial of service attacks
5.5.2. Call hijacking with 3xx code responses
5.6. Discussion of results
6. GENERAL TECHNICAL SOLUTIONS FOR VOIP SECURITY
6.1. Encryption in VoIP
6.1.1. Introduction
6.1.2. Encryption of media stream
6.1.3. Encryption of signalling stream
6.1.4. Summary
6.2. SIP authentication mechanisms
6.3. Solutions to SIP NAT and firewall traversal issues
6.3.1. NAT and Firewall traversal issues
6.3.2. Solutions to the Firewall/NAT traversal issues
6.4. SPIT prevention
6.5. VoIP VPNs: secure interconnection of distant VoIP systems
6.6. VoWLAN security
6.7. Limits on the technical and commercial efforts
7. SECURING VOIP SYSTEMS IN ENTERPRISE NETWORKS
7.1. Studying the risk mitigation process
7.2. Comparing major VoIP security recommendation reports
7.2.1. Interest of a comparison
7.2.2. Several approaches of recommendations
7.2.3. Common points of recommendations
7.2.4. Divergence in recommendations
7.2.5. Suggestions and comments
7.3. Applying security concepts in two models of enterprises
7.3.1. Recommended secure VoIP architecture in a small enterprise
7.3.2. Recommended secure VoIP architecture for a large enterprise
7.4. Discussion of results
8. SECURITY CONCEPTS FOR SIP MOBILITY IN HOSTED VOIP DEPLOYMENTS
8.1. Setting the problem of hosted VoIP and mobility support
8.2. Designing and modelling two scenarios of secure hosted IP-PBX solutions with mobility support
8.2.1. Introduction
8.2.2. Scenario 1: Mobile workers connected via VPNs to the VoIP Service Provider network
8.2.3. Scenario 2: Mobile workers connected via VPNs to the enterprise network
8.3. Comparing scenarios from the security point of view
8.4. Discussion of results
9. CONCLUSION
ANNEX 1 – VOIP SECURITY THREATS AND VULNERABILITIES
ANNEX 2 – COMPARISON OF VOIP SECURITY RECOMMENDATIONS
REFERENCES
ABBREVIATIONS
List of figures
Figure 2.1 – Enterprise networks and interconnection of sites
Figure 2.2 – Small single-site private enterprise network
Figure 2.3 – Large multi-site private enterprise network
Figure 2.4 – Four types of VoIP services
Figure 3.1 – IP-enabled VoIP deployment in a small single-site enterprise network
Figure 3.2 – Full-IP VoIP deployment in a small single-site enterprise network
Figure 3.3 – Analogue telephone adaptors
Figure 3.4 – VoIP deployment with legacy PBX system in a large multi-site enterprise network
Figure 3.5 – Hybrid VoIP deployment in a large multi-site enterprise
Figure 3.6 – Call flow between an IP phone and a TDM phone in a hybrid VoIP architecture
Figure 3.7 – Full-IP VoIP deployment in a large multi-site enterprise network
Figure 3.8 – Deployment of a hosted IP-PBX solution in a small single-site enterprise network
Figure 3.9 – IP-PBX switching calls
Figure 3.10 – SIP architecture – SIP session through a Proxy Server
Figure 3.11 – SIP architecture & SIP session establishment through a SIP redirect server
Figure 3.12 – SIP architecture & SIP session establishment through SIP Proxy servers and Redirect servers
Figure 3.13 – VoIP architecture with IP-PBX using SIP
Figure 3.14 – SIP call flow with a stateless Proxy Server
Figure 3.15 – SIP call flow with a Call Stateful Proxy Server
Figure 3.16 – IP-PBX with B2BUA architecture
Figure 3.17 – VoIP deployment using Asterisk servers in a small single-site enterprise
Figure 3.18 – VoIP deployment using Asterisk servers in a large multi-site enterprise
Figure 3.19 – Initialization process of Cisco IP phones (source: [23])
Figure 3.20 – Process flow of initialization process of IP phones
Figure 3.21 – ARP poisoning attack in a small enterprise network
Figure 3.22 – Cache tables of ARP Poisoning victims
Figure 4.1 – VoIP protocols stack
Figure 4.2 – Protocols assuring the transport of voice over IP
Figure 4.3 – Encapsulation of real-time data into RTP packets
Figure 4.4 – Mixing of several contributing sources in RTP
Figure 4.5 – RTCP packets
Figure 4.6 – Example of recording voice mail
Figure 4.7 – SIP in the VoIP architecture
Figure 4.8 – SIP Response messages
Figure 4.9 – INVITE and 200 OK messages
Figure 4.10 – Basic SIP session
Figure 4.11 – Registration of a user agent with a SIP Registrar
Figure 4.12 – Call forking: parallel (left) and sequential (right)
Figure 4.13 – SIP mobility using a SIP redirect server
Figure 4.14 – SIP mobility using a SIP forking proxy
Figure 4.15 – SIP mobility using a Presence server
Figure 4.16 – Example of CPL
Figure 5.1 – Steps in risk assessment (according to NIST guide)
Figure 5.2 – VOIPSA Threat taxonomy
Figure 5.3 – Areas to secure in the VoIP system of a small single-site enterprise
Figure 5.4 – Areas to secure in the VoIP system of a large multi-site enterprise
Figure 5.5 – Layered overview of DoS attacks against VoIP systems (source: [80])
Figure 5.6 – Registration hijacking attack in an enterprise environment
Figure 5.7 – SIP CANCEL DoS (left) and SIP BYE DoS (right)
Figure 5.8 – Call hijacking by using a 301 Moved permanently message
Figure 5.9 – Theft of credentials by performing call hijacking
Figure 6.1 – Secure RTP packet (left) and secure RTCP packet (right)
Figure 6.2 – Encryption using AES (left) and authentication using HMAC-SHA1 (right) (from [88])
Figure 6.3 – Establishment of an SRTP session using ZRTP
Figure 6.4 – INVITE message with master key information (source: [90])
Figure 6.5 – Registration with Digest Authentication
Figure 6.6 – EAP authentication and AAA
Figure 6.7 – Private addresses embedded in INVITE message
Figure 6.8 – VoIP spamming against an enterprise’s personnel
Figure 6.9 – CPE model of VoIP VPN
Figure 6.10 – Hosted Model of VoIP VPN with MPLS
Figure 6.11 – 802.1x authentication for VoWLAN
Figure 7.1 – Suggestion 1: Creation of a data and voice VLAN for a secure VoIP deployment in a small enterprise
Figure 7.2 – Suggestion 2: Installation of VoIP servers in the DMZ in a separate VLAN as this of data servers for a secure VoIP deployment in a small enterprise
Figure 7.3 – Suggestion 3: Creation of multiple DMZs for a secure VoIP deployment in a small enterprise
Figure 7.4 – Synchronization with personal address book (source: [122])
Figure 7.5 – Suggestion for a secure VoIP deployment in a large multi-site enterprise
Figure 8.1 – Small enterprise with hosted IP-PBX and supporting external mobile workers
Figure 8.2 – Suggestion for a hosted VoIP architecture adopted by a small enterprise with mobile workers
Figure 8.3 – Scenario 1: Mobile workers connected to the VoIP Service Provider via a VPN
Figure 8.4 – Establishment of calls in scenario 1
Figure 8.5 – Scenario 2: Mobile workers connected to the enterprise network via VPNs
Figure 8.6 – Establishment of calls between external users and a mobile worker in scenario 2
Figure 8.7 – Establishment of a call between two mobile workers in scenario 2
List of tables
Table 2.1 – General network security requirements
Table 2.2 – Sources of threats, motivations and threat actions (source: [1])
Table 3.1 – Comparison of IP Centrex and hosted IP-PBX features (according to [15])
Table 3.2 – Basic and advanced features of IP-PBXs
Table 4.1 – Codecs
Table 4.2 – SIP Request methods defined by RFC 3261
Table 5.1 – Security requirements of VoIP systems in enterprises
Table 5.2 – Different approaches in the classification of VoIP threats and vulnerabilities
Table 5.3 – Risk-level matrix (source: NIST)
Table 5.4 – Attack tree for “Interception and manipulation” in an enterprise VoIP network (extract from the threat analysis I have performed in Annex 1)
Table 5.5 – Threat likelihood matrix
Table 7.1 – Recommendations common to the majority of selected recommendation sets
Table 7.2 – Points of divergence in the selected recommendation sets
Table 7.3 – Examples of traffic flows between data VLAN and voice VLAN
Table 7.4 – Regulation of traffic flow between VLANs and DMZs
Table 7.5 – Suggestion of a few important recommendations for the VoIP system of a small enterprise
Table 7.6 – Suggestion of a few important recommendations for the VoIP system of a large enterprise
Table 8.1 – Questions destined to VoIP Service Providers about the security of their hosted IP-PBX services (according to [82])
Table 8.2 – Suggestion of questions destined to VoIP Service Providers about the security of their hosted IP-PBX services combined with support of mobile workers
Annex Table 1 – Attack trees analyzing the threats to an enterprise VoIP system at the network level
Annex Table 2 – Attack trees analyzing the threats to an enterprise VoIP system at the application level
Annex Table 3 – Attack trees presenting extra threats to VoIP systems due to wireless VoIP
Annex Table 4 – Attack trees presenting threats to VoIP systems due to mobility support
Annex Table 5 – Comparison between the recommendations of the DISA, BSI, NIST and NSA reports
1. Introduction and motivation
“Perfect is the enemy of good”
“Security is not sexy”
Lecturers in the 3rd Annual Workshop on VoIP Security, Berlin 2006
Voice over IP (VoIP) communications have enjoyed undeniable success and ever-growing
worldwide deployment over the last few years, and they are expected to progressively replace
traditional circuit-switched communications in the near future. However, the complete
replacement of PSTN/ISDN technology by VoIP technology for voice communications depends on a
factor that is often treated as secondary although it is fundamental: security.
Enterprises, independently of their sizes, have been among the first to try to adopt and
deploy VoIP systems, in particular to save costs such as costs of international calls between
different sites as well as costs of separate maintenance of their legacy telephone networks and of
their IP networks. Despite the several advantages of VoIP for enterprises, companies seem
hesitating to invest in VoIP technology because of their lack of confidence in the feasibility of
secure VoIP deployments.
However, security is not an unreachable utopian goal. Even though absolute security cannot
be achieved, in VoIP systems or anywhere else, enterprises must gain confidence and
understand that an appropriate and adequate level of security can be found and applied to protect
their networks and, indirectly, their business missions. To achieve that, a tailored analysis of
threats and risks is a prerequisite for VoIP deployment in enterprises. Once a thorough
analysis, which may take a long time, has been completed, mitigation measures must be designed and
implemented. Enterprises can find support for this mitigation process in a few valuable
recommendations published by international security authorities, which present what these
authorities consider the best practices for secure VoIP deployment.
VoIP security has gained special attention in the past few years and is currently one of the
most discussed subjects in information technology. Indeed, in a future world where all enterprises
have adopted VoIP communications not only for inter-site calls but for all calls, internal or
external, VoIP systems will become a favourite target of attackers; it is therefore crucial
that the main security issues are solved before VoIP is adopted by all enterprises. This explains
why today, still at an early stage of the global expansion of VoIP, VoIP security is becoming a major
concern of industry, authorities, universities, operators and service providers. Significant
evidence of this concern includes:
• The foundation in early 2005 of the Voice over IP Security Alliance (VOIPSA) to
bring together all the different actors involved in VoIP and “promote the current state
of VoIP security research, VoIP security education and awareness, and free VoIP testing
methodologies and tools” (www.voipsa.org)
• The recent plethora of recommendations and best practices published by
authorities like the National Institute of Standards and Technology (NIST) or the
National Security Agency (NSA) or by VoIP security experts. Most of these documents
were published in 2005 and 2006.
• The recent creation of security blogs on VoIP security like the Voice of VOIPSA
blog, Mark Collier’s blog or the VoIP Security Podcast
• The inclusion for the first time of VoIP servers and VoIP phones in the SANS
Top 20 Internet Security Attack Target List [22] in mid-November, 2006
• The massive publication in 2005 and 2006 of white papers by researchers,
vendors, manufacturers, providers etc. about the general subject of VoIP security
or about solutions to specific threats like the currently hyped Spam over IP
Telephony (SPIT) attack.
Besides, a profusion of software tools performing VoIP attacks has recently appeared
(Cain & Abel, SiVus, tools from the www.hackingvoip.com website…) and it is thought that
many others will appear, and may be freely released, in the near future; this profusion is an
accelerating factor for research into VoIP security solutions.
At the same time, aware of the particularities of voice over IP networks and of the need
for a specific security approach for VoIP, different from that for data networks, enterprises, in
particular large ones, are looking for VoIP security experts to manage the security of their
networks, thereby giving rise to new roles like VoIP administrators.
All this shows that VoIP security is still a new, rapidly developing subject at the centre of
current research in telecommunications.
The present master thesis report addresses the security of SIP-based Voice over IP in
enterprise networks, which means that the study of security has been restricted, for reasons
specified later in Chapter 4, to enterprise VoIP networks using the Session Initiation Protocol
(SIP) as the signalling protocol. It is important to determine which VoIP signalling protocol is used
because SIP-specific vulnerabilities and threats have to be analysed and taken into account.
The objectives of the master thesis are to:
• Define enterprise Voice over IP and its characteristics
• Define, design and model the different migration stages towards a full-IP VoIP
network in scenarios of enterprise networks
• Present the VoIP and SIP architectures and how to combine them
• Present VoIP protocols
• Identify, define and assess the security requirements in enterprise VoIP systems
• Present the steps of risk management for VoIP security
• Identify and compare the different classifications of threats to VoIP systems
• Perform a threat and risk analysis in models of enterprise VoIP networks
• Exhaustively list all VoIP security threats
• Present a few major attacks against enterprise VoIP systems
• Present the main technical solutions for VoIP security
• Explain the current prevention methods against SPIT attacks
• Present and compare a selection of sets of security recommendations
• Apply security concepts in models of enterprise VoIP networks
• Study a particular VoIP security issue which has been so far little studied
To reach these objectives, the present work has been divided into seven chapters:
In Chapter 2, enterprise networks have been presented and modelled and enterprise Voice
over IP has been defined.
Chapter 3 includes the design and the modelling of VoIP deployments in the two models of
enterprises presented in the previous chapter. The VoIP user access infrastructure has also been
described and an introductory example of an attack against a VoIP network has been given.
Chapter 4 corresponds to the technological background of VoIP including the justification of
the choice of SIP protocol and the description of VoIP protocols.
In Chapter 5, security requirements specific to VoIP have been identified, defined and
illustrated. Then, a study of the different classifications of VoIP threats has been carried out and
the risk assessment process has been presented. Afterwards, a threat and risk analysis has been
performed in the same two models of enterprises after having chosen particular VoIP
deployments to study, a particular threat analysis model and a specific threat taxonomy used as
frame for the analyses. A few major application-layer threats to VoIP have also been detailed.
In Chapter 6, technical solutions for VoIP security such as encryption, authentication
mechanisms, SPIT prevention systems and VoIP Virtual Private Networks (VPNs) have been
presented.
Chapter 7 presents briefly the mitigation process and then compares four sets of
recommendations published by famous international organisations. The results of this
comparison for enterprises have been clearly outlined. Finally, security concepts have been
applied to the two models of enterprises studied in the previous chapters.
The last chapter, Chapter 8, addresses a completely new facet of VoIP security: since hosted
VoIP solutions are expected to become the most popular VoIP solutions in the future, adopted
mostly by small and medium-sized enterprises, security concepts for supporting mobile
workers in combination with hosted VoIP deployments have been developed and explained.
2. Introduction to enterprise networks and Voice over IP
2.1. Enterprise networks
2.1.1. Definition
Nowadays, medium and large enterprises are seldom located at a single
geographical site; they often consist of branches located either in the same geographical
area or in different countries all over the world, thereby following the trend of an ever
globalizing economy. These distant corporate sites must have telecommunication networks that
cooperate, that is to say, that at least share and exchange corporate data or share
resources. For that, companies have to deploy private enterprise networks.
An enterprise (or corporate) network is a private network interconnecting the Local Area
Networks (LANs) of the various distributed branches. It is the most common example of an
intranet. Intranets are private computer networks belonging to and managed by entities like
companies, organisations or universities. They are usually connected to the Internet but they are
insulated and protected from it. Intranets are networks supporting the TCP/IP protocol suite, on
top of which many application protocols can run, thereby offering a multitude of services to the
employees of the enterprise. For each of these client-server protocols, application servers are
needed and constitute fundamental components of the enterprise network infrastructure.
Intranets make it possible to share and exchange all kinds of resources: employees can work in
groups, share ideas in online forums, communicate through audio or video teleconferencing, or
use sophisticated corporate directories, Concurrent Versions System (CVS) repositories, and
common devices like printers. The term intranet sometimes refers to the internal website of
the company, but its main definition in telecommunications is that of a private network.
Figure 2.1 – Enterprise networks and interconnection of sites

Very often, enterprises want to establish tight business relationships with their partners,
suppliers or customers. In that case, they can provide them with an extension
of their intranet services. This extension of the intranet is called an extranet. An extranet allows
an enterprise to securely share part of its business information and services, to exchange large
volumes of data, to collaborate on joint development efforts etc.
Finally, as illustrated in Figure 2.1, private enterprise networks interconnect:
• the different sites of the enterprise network
• the enterprise network and its customers’ and business partners’ networks
• the enterprise network and the premises of its remote or mobile workers
These interconnections can be achieved by adopting different technical solutions. The most
frequent are leased lines (LL), dial-up remote access connections and Virtual Private Networks
(VPNs).
2.1.2. Interconnecting various sites

The choice of the interconnection type depends on the degree of security that is required
but also on the bandwidth needed and on the amount of money to invest.
2.1.2.1. Dedicated leased lines

An enterprise can choose to interconnect its geographically distant sites with
leased lines, thereby creating a Wide Area Network (WAN), i.e. a computer network covering a
large geographical area and connecting several LANs together. WANs are most often built using
leased lines.
Leased lines are permanent symmetric telephone connections between two points set up
by a common carrier. They are dedicated connections that are always active: employees do
not have to dial any telephone number to connect to the distant sites. They maintain a single
open circuit at all times and they can carry telephone, data or Internet traffic. Leased lines
provide relatively fast and secure communications between the sites but are very expensive. High
levels of transmission quality can be achieved and guaranteed. Leased lines terminate on a
router which converts the traffic to the Ethernet protocol.
Interconnecting two distant sites with leased lines allows employees to access file servers
at the distant sites, to use common applications and generally to communicate quickly and
effectively between several locations. However, given its cost, the traditional WAN
architecture with leased lines does not provide the flexibility that enterprises require. Leased-line
connections often require significant planning time and, besides, they do not support
communication with mobile employees or remote sites like homes.
2.1.2.2. Virtual Private Networks (VPNs)

A VPN is a private network that uses a public network such as the Internet to connect to
remote sites or users. The deployment of VPNs in enterprises is mainly due to the growing
trends towards teleworking, globally distributed business partnerships and
distributed business operations. Indeed, by implementing VPNs, an enterprise can
provide access to the private enterprise network to its mobile or remote employees working in
distant locations like their homes or remote offices, or working while travelling. That way, users
are given the impression of working from inside the company, and the nature of the intermediate
networks that the transmitted data passes through is transparent to them, i.e. it appears to
them that the data is being sent over a dedicated private line. VPN connections across the
Internet logically operate as WAN links.
The use of VPNs avoids the administrative and financial problems associated with
traditional leased-line wide area networks (WANs) and allows remote and mobile users to be more
productive. It does so without compromising the integrity of the computer systems and data on the
private company network. When connected to a VPN, the client computer is “virtually” a full
member of the corporate network, able to see and potentially access the entire private enterprise
network. That is why VPNs use security mechanisms, the most important of which are data
encryption and tunnelling. The tunnelling process transports the encrypted data across
the Internet by encapsulating it. The quality of service (QoS) of a VPN can be guaranteed through
Service Level Agreements (SLAs), which are agreements between the private enterprise and its
Internet Service Provider (ISP).
2.1.2.3. Dial-up remote access

Remote workers as well as distant sites can connect to the private enterprise network by
establishing dial-up connections over regular phone lines (PSTN, ISDN). Dial-up remote access
is another technology for remote access to an enterprise network.
2.1.3. Securing enterprise networks

The fact that an enterprise network is connected to the Internet makes the former
vulnerable to all kinds of attacks and threats coming from the latter. That is why security is the
foremost concern of an enterprise which wants to have access to the public Internet and to
communicate with teleworkers, business partners and customers over the Internet without
putting its own private network at risk. To protect itself, an enterprise must control
access to its network, keep its internal technological structure hidden from the Internet, adopt enterprise
security policies and reinforce its infrastructure to meet the standard security requirements for
enterprises and to keep the critical communications of the enterprise available at all times.
The security of the enterprise network infrastructure is a sine qua non of the development
of services on it, and by extension of the development of Voice over IP (VoIP) telephony
systems. In order to maintain high-quality VoIP services, the underlying IP network must provide
a secure, consistent and reliable end-to-end connectivity.
2.1.3.1. Network security requirements

A private enterprise network is regarded as secure if it meets all the requirements presented in
Table 2.1.
Requirement | Definition
Authenticity | Reliable determination of the authentic identity of a corresponding host.
Authorisation | Process of granting or denying an authenticated user access to resources, types of activities, services according to its access rights.
Integrity | Protection against the alteration of data between the data generation and data reception.
Liability | Ability of a sender to prove that the recipient to whom it has sent data has really received them or ability of a receiver to prove the identity of the sender (non-repudiation).
Availability | Protection of the availability of services and resources against malicious attacks that interrupt or degrade them.
Confidentiality | Protection against unwanted disclosure of information.
Privacy | Protection of information about a user, including protection of identity (anonymity), of personal information and information about personal activities.
Table 2.1 – General network security requirements
2.1.3.2. Threats and vulnerabilities

According to [1], it is possible to define a threat as “the potential for a particular threat-source to
successfully exercise (accidentally trigger or intentionally exploit) a particular vulnerability”. A vulnerability is
defined as a “weakness that can be accidentally triggered or intentionally exploited”. The term
threat-source, or source of threat, means “either the intent and method targeted at the intentional
exploitation of a vulnerability or a situation and method that may accidentally trigger a vulnerability”. A source
of threat does not represent any risk when there is no vulnerability. In other words,
vulnerabilities are the inherent weaknesses or flaws of private enterprise networks whereas
threats are potential actions that take advantage of network vulnerabilities. It is important to
define and distinguish these terms since they are too often mistaken for one another in the
literature.
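The distinction above can be illustrated with a minimal sketch. It is not taken from [1]; the class name `ThreatSource`, the function `find_threats` and all example names are hypothetical and serve only to show that a threat exists only where a threat-source meets a vulnerability that is actually present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatSource:
    """A source of threat and the vulnerabilities it could exercise (hypothetical model)."""
    name: str
    can_exercise: frozenset  # names of vulnerabilities this source could trigger or exploit


def find_threats(sources, vulnerabilities):
    """Return (threat-source, vulnerability) pairs for vulnerabilities present in the network."""
    present = set(vulnerabilities)
    return [
        (source.name, vulnerability)
        for source in sources
        for vulnerability in sorted(source.can_exercise & present)
    ]


# Illustrative, invented data.
sources = [
    ThreatSource("outside attacker", frozenset({"default admin password", "unpatched server"})),
    ThreatSource("careless employee", frozenset({"no input validation"})),
]

print(find_threats(sources, ["unpatched server"]))  # one threat: attacker meets a real weakness
print(find_threats(sources, []))                    # no vulnerability, hence no threat
```

With an empty list of vulnerabilities the function returns no threats, whatever the threat-sources are, which mirrors the statement that a source of threat represents no risk when there is no vulnerability.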
Of the three main types of sources of threats (natural, human and environmental)
defined by [1], the human source of threats is the most interesting one, due to its great variety
and the danger it represents. Table 2.2 shows the classification, according to [1], of human threats, their
motivations and their possible actions.
Threat-source | Motivation | Threat actions
Hacker, cracker | Challenge; Ego; Rebellion | Hacking; Social engineering; System intrusion, break-ins; Unauthorized system access
Computer criminal | Destruction of information; Illegal information disclosure; Monetary gain; Unauthorized data alteration | Computer crime (e.g. cyberstalking = harassment of a victim using electronic communication); Fraudulent act (interception, replay); Information bribery; Spoofing; System intrusion
Terrorist | Blackmail; Revenge; Destruction; Exploitation | Bomb/Terrorism; Information warfare; System attack (e.g. distributed denial of service); System penetration; System tampering
Industrial espionage (companies, foreign governments, other government interests) | Competitive advantage; Economic espionage | Economic exploitation; Information theft; Intrusion on personal privacy; Social engineering; System penetration
Insiders (poorly trained, disgruntled, malicious, dishonest, negligent or terminated employees) | Curiosity; Ego; Intelligence; Monetary gain; Revenge; Unintentional errors and omissions | Assault on an employee; Blackmail; Browsing of proprietary information; Computer abuse; Fraud and theft; Information bribery; Input of falsified, corrupted data; Interception; Malicious code; Sale of personal information; System bugs; System intrusion; System sabotage
Table 2.2 – Sources of threats, motivations and threat actions (source: [1])
According to network security experts, the majority of network attacks suffered by private
enterprise networks have not been initiated by attackers from the Internet or the extranet but by
some of their own employees:
• Malicious employees – they can seriously threaten the company since they have access to
internal information of the company, know its value and can use it to compromise
the network. Malicious employees can be angry or vengeful employees or even “snoops”, i.e.
people who spy, either because they take part in espionage, gaining unauthorized access to
confidential data in order to provide competitors with normally inaccessible information, or
because they are simply curious to access the private information of their colleagues.
• Unknowingly malicious employees – they can accidentally take actions which compromise
not only their computers but also the enterprise network. Enterprises should be wary of
human errors: employees can accidentally overlook warnings regarding security threats or
install software incorrectly.
Besides, extending the private enterprise network to telecommuters, branch offices and
business partners creates new vulnerabilities in the enterprise network, in particular if the
networking assets are not sufficiently monitored and secured.
Types of network threats and attacks

An attack is the fulfilment of a threat. It is possible to classify the attacks in different
ways. The main distinction is between a passive and an active attack.
A passive attack is one in which intruders eavesdrop but do not modify the message
stream. Interception and eavesdropping are passive attacks and more particularly attacks on
confidentiality.
An active attack is one in which the intruder may transmit messages under another
identity, replay old messages, modify messages in transit, or delete selected messages from the
wire. Interruption (attack on availability), modification (attack on integrity) and fabrication
(attack on authenticity) are the three main types of active attacks.
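The classification above can be summarised as a small lookup table. The following sketch only restates the mapping given in the text (attack type, passive or active, and the security requirement it targets); the names `Requirement`, `ATTACK_TYPES` and `classify` are hypothetical.

```python
from enum import Enum


class Requirement(Enum):
    """Security requirements targeted by the attack types named in the text."""
    CONFIDENTIALITY = "confidentiality"
    AVAILABILITY = "availability"
    INTEGRITY = "integrity"
    AUTHENTICITY = "authenticity"


# attack type -> (nature of the attack, requirement it violates)
ATTACK_TYPES = {
    "interception": ("passive", Requirement.CONFIDENTIALITY),
    "interruption": ("active", Requirement.AVAILABILITY),
    "modification": ("active", Requirement.INTEGRITY),
    "fabrication": ("active", Requirement.AUTHENTICITY),
}


def classify(attack_type):
    """Return (nature, violated requirement name) for a known attack type."""
    nature, requirement = ATTACK_TYPES[attack_type.lower()]
    return nature, requirement.value


for name in ATTACK_TYPES:
    nature, requirement = classify(name)
    print(f"{name}: {nature} attack on {requirement}")
```

Eavesdropping would be classified like interception here, since the text treats both as passive attacks on confidentiality.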
It is also possible to divide attacks into two types: malware attacks and hacker
attacks. Malware (“malicious software”) is software designed to infiltrate, damage or disrupt a
computer system without the knowledge of its owner. The most common malware are viruses, worms, Trojan
horses, spyware and adware. Generally, attackers do not use a single form of attack but a
combination of several attacks to bypass the many layers of protection and gain root
(administrative) access in a network.
Vulnerabilities
Enterprise networks have security vulnerabilities coming from the deficiencies and flaws
of the network protocols used, in particular the TCP/IP protocol suite which is the most
widespread. The paper Security problems in the TCP/IP protocol suite [2] presents an exhaustive
analysis of the TCP/IP protocol suite vulnerabilities.
2.1.3.3. Security solutions

Security solutions adopted by enterprises are multiple and various. The most commonly
adopted solutions are perimeter security systems such as firewalls, the Network Address
Translation (NAT) system, VPNs, resource access control, data authentication, user
authentication, intrusion detection systems, encryption of data traffic, malware protection and
spam protection. The description of these solutions is out of the scope of this master thesis.
2.1.3.4. Security policies

Enterprises generally establish security policies. A security policy is a formal document
that establishes security objectives and constraints for the enterprise network. It outlines rules for
computer network access, determines how policies are enforced and lays out some of the basic
architecture of the company security environment. It is a very complex document, meant to
govern data access, web browsing habits, use of passwords and encryption, email attachments etc.
It specifies these rules for individuals or groups of individuals throughout the company and
defines a hierarchy of access permissions. A security policy should protect the network from
malicious attackers and also exert control over potentially risky users within the enterprise. Before
creating a security policy it is important to understand what information and services are available
and to which users, what the potential is for damage and whether any protection is already in
place to prevent misuse.
2.1.4. Models of network infrastructure
Throughout this master thesis, various scenarios of VoIP deployment in private
enterprise networks will be studied and will serve as a basis for the further study and analysis of
the existing threats and vulnerabilities. Two types of private networks have been selected: a small
single-site enterprise network and a large multi-site enterprise network. In Chapter 3, the
deployment of VoIP systems in these two models of networks will be designed and modelled.
2.1.4.1. Single-site small private enterprise network

The first model of enterprise network in which VoIP deployment will be studied is a single-
site enterprise network. Figure 2.2 below represents a model of network infrastructure usually
adopted by small enterprises with limited needs and requirements. The enterprise illustrated in
Figure 2.2 is constituted by a single site, does not have any telecommuters and does not offer any
extranet to business partners. All these network characteristics explain why VPNs or leased lines
are not needed for site interconnection.
This single-site private enterprise is constituted by two totally separate networks:
• the traditional enterprise telephony network
• the private IP enterprise network
These two networks have different and specific cabling and are configured and maintained by
personnel with different competence and technical skills.
All the TDM phones (classical phones) of the company are connected to a Private Branch
eXchange (PBX), which provides connection to the public telephone network (PSTN/ISDN
networks). A legacy PBX is a telephone exchange that is owned by the enterprise and installed on
its premises. A PBX avoids connecting every enterprise phone to the public telephone network
with its own direct line, each of which would incur a connection and line charge. With a PBX, an
enterprise does not need as many external phone lines as phone extensions, but only a limited
number of external lines, called trunk lines, which are shared by all the employees. These trunk
lines should be sufficient for receiving and making external calls over the public telephone
network. Besides, PBXs are switching systems that manage calls between internal employees:
they establish internal calls without routing them out of the enterprise. ISDN PBXs also offer
features like conference calling, call forwarding, voice mail, music-on-hold and many other
services ([3], “PBX”). Enterprises can choose to manage and maintain their own PBXs
on their premises or to have PBXs hosted and managed externally by a service provider. A hosted
PBX saves the company maintenance and upgrade costs, but the company has to pay
the service provider for providing and running this service. In the scenario illustrated in Figure 2.2,
the small enterprise is in charge of the maintenance of its own PBX.
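To give an idea of why far fewer trunk lines than extensions can be sufficient, the following sketch uses the Erlang B formula, a standard teletraffic model that is not part of this thesis and is introduced here purely as an illustration. The number of extensions, the traffic per extension and the accepted blocking probability are invented figures.

```python
def erlang_b(offered_load, trunks):
    """Blocking probability for an offered load (in erlangs) on a given number of trunks.

    Uses the usual recurrence B(A, 0) = 1 and B(A, n) = A*B(A, n-1) / (n + A*B(A, n-1)).
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = (offered_load * blocking) / (n + offered_load * blocking)
    return blocking


def trunks_needed(offered_load, max_blocking):
    """Smallest number of trunks whose blocking probability does not exceed max_blocking."""
    trunks = 0
    while erlang_b(offered_load, trunks) > max_blocking:
        trunks += 1
    return trunks


# Hypothetical small enterprise: 100 extensions, each generating 0.05 erlang of
# external traffic in the busy hour, and at most 1 % of external calls blocked.
extensions = 100
offered_load = 5.0   # 100 extensions * 0.05 erlang
max_blocking = 0.01

print(f"{extensions} extensions need about "
      f"{trunks_needed(offered_load, max_blocking)} trunk lines")
```

Under these assumptions the shared trunk group is roughly a tenth of the number of extensions; a real dimensioning would of course rely on measured traffic rather than on these illustrative values.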
As for the private IP enterprise network, it is connected to the Internet by means of an edge
router. This model of private IP network is divided into two sections:
• A Demilitarized Zone (DMZ) – In this model, it is assumed that the small enterprise
offers a few services (web site or email service) to external users. Thus, to ensure security,
this network must implement a Demilitarized Zone (DMZ) protecting the public servers;
this means that public servers are placed between an external firewall and an internal
firewall.
• An Intranet – it allows employees to share common resources of the enterprise. It can
also include servers which do not offer public services like DHCP servers.
2.1.4.2. Large multi-site private enterprise network

The second model of enterprise network in which the VoIP deployment will be studied is
that of a large multi-site enterprise network. The enterprise, illustrated in Figure 2.3,
consists, as in the previous model, of two distinct networks:
• The traditional enterprise telephone network, whose core is the PBX connecting the
telephone sets to the public telephone network (PSTN or ISDN)
• An IP enterprise network, which itself comprises an intranet connecting the main
headquarters to its small remote offices and an extranet connecting the same
headquarters with the private enterprise networks of business partners such as customers
or suppliers.
In this model, the headquarters network comprises two parts: a Demilitarized Zone (DMZ),
which contains all public servers, and an Intranet, which is divided by routers (user segment routers)
into several sub-networks. The DMZ protects its hosts, which provide public
services like the Domain Name Service (DNS), email, web etc. to the external network.
The headquarters network is connected to remote offices and to customer and supplier
enterprise networks by means of leased lines, since it is important to have a high level of security
and a considerable guaranteed bandwidth between them. However, the headquarters network is
also connected to the networks of remote offices by means of VPN channels. For that, a VPN
server has been placed in the DMZ of the headquarters and in the small office network. When
the VPN server is placed in the DMZ, the external firewall must be configured with input
and output filters so that encrypted traffic can be tunnelled through it to the VPN server. Of
course, VPN servers can be placed in other positions [4], but it is advantageous to place
the VPN server behind the external firewall because:
• Enterprises take advantage of the strengths of firewalls, such as packet filtering and
logging
• The VPN server does not need to be highly secured since it is already secured from
the Internet by the firewall
• Additional IP addresses are not required for the VPN server. VPN connections can
connect to the same address (firewall address) that is used for other services like Web
services.
However, placing the VPN server behind the external firewall presents some disadvantages,
namely the configuration of more complex firewall rules and the bandwidth limitations since
VPN bandwidth is shared.
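The input filter mentioned above can be sketched as a first-match rule list with a default deny. The thesis does not state which VPN protocol is used; the sketch assumes IPsec, for which IKE uses UDP port 500, NAT traversal uses UDP port 4500 and the encrypted payload is carried by ESP (IP protocol 50). The address of the VPN server and the names `Rule`, `INBOUND_RULES` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VPN_SERVER = "192.0.2.10"  # hypothetical DMZ address (documentation address range)


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    protocol: str                   # "udp", "esp" or "any"
    dst_ip: str                     # destination address or "any"
    dst_port: Optional[int] = None  # None matches any port


# Input filter of the external firewall (assumption: IPsec-based VPN).
INBOUND_RULES = [
    Rule("allow", "udp", VPN_SERVER, 500),   # IKE key exchange
    Rule("allow", "udp", VPN_SERVER, 4500),  # IPsec NAT traversal
    Rule("allow", "esp", VPN_SERVER),        # encrypted tunnel payload
    Rule("deny", "any", "any"),              # everything else is dropped
]


def decide(rules, protocol, dst_ip, dst_port=None):
    """Return the action of the first rule matching the packet, or "deny" if none matches."""
    for rule in rules:
        if rule.protocol not in ("any", protocol):
            continue
        if rule.dst_ip not in ("any", dst_ip):
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        return rule.action
    return "deny"


print(decide(INBOUND_RULES, "udp", VPN_SERVER, 500))  # tunnel set-up reaches the VPN server
print(decide(INBOUND_RULES, "tcp", VPN_SERVER, 23))   # any other traffic is filtered
```

An output filter would mirror these rules for traffic leaving the VPN server. The additional rules also illustrate the disadvantage noted above: placing the VPN server behind the firewall makes the firewall configuration more complex.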
Mobile workers can connect to the intranet of their enterprise from the Internet via VPN
channels. In Figure 2.3, small remote office networks have rather simple architectures: they are
protected by a LAN access router with a firewall and include a VPN server and a user sub-network.
Figure 2.2 – Small single-site private enterprise network
Figure 2.3 – Large multi-site private enterprise network
2.2. Introduction of Voice over IP (VoIP) in enterprises
2.2.1. Definition of Enterprise Voice over IP

Let us first define Voice over IP and IP Telephony, and then consumer and enterprise VoIP.
2.2.1.1. Voice over IP vs. IP Telephony

Voice over IP (VoIP) is a generic term referring to the routing of voice traffic over IP-based
networks, i.e. packet-switched networks, as opposed to traditional telephony, which
routes voice traffic over circuit-switched networks like the PSTN or ISDN networks.
Very often, the term VoIP is used instead of IP Telephony; however, these terms are
not exact synonyms even if they are closely related. The difference is that:
♦ In Voice over IP, voice is encapsulated within IP packets between private IP sites
(exactly between VoIP gateways). Voice over IP does not imply the replacement of
all PSTN equipment and devices in private enterprise networks. On the contrary,
VoIP allows enterprises to keep their legacy telephone PBX and TDM phones. As
illustrated later in Figure 3.4, when an employee places a call from a TDM phone to
another TDM phone located in another enterprise for example, the voice traffic
passes through a VoIP gateway (this term will be defined in the next chapter)
located on the enterprise premises and is routed by this entity to the enterprise’s IP
network and then to the public Internet in order to reach its destination. At the
receiving side, the voice traffic passes through the destination private IP network
and then is routed through a VoIP gateway to the destination TDM phone. Voice
over IP thus makes it possible to bypass the PSTN/ISDN networks to save costs.
♦ In IP Telephony, voice is encapsulated end-to-end from phone to phone (these
phones are not TDM phones anymore but IP phones connected to the enterprise
LAN). IP Telephony means the replacement of the PSTN infrastructure by a
totally IP-based infrastructure, since IP Telephony tries to reproduce the
traditional telephony functionality in an IP environment. Therefore, in IP
Telephony, legacy PBXs are replaced by servers called IP-PBXs which have the
same role but in a packet-switched network; similarly, new components are
introduced in IP Telephony like voicemail servers, IP phones, unified messaging
servers and other telephony application servers. Thus, IP Telephony is a broader
concept than Voice over IP, and it is towards IP Telephony that enterprises should gradually
migrate.
Despite this difference between the terms Voice over IP and IP Telephony, the term “Voice
over IP” will be used interchangeably in this master thesis report to designate both.
2.2.1.2. Consumer VoIP vs. Enterprise VoIP

VoIP telephony can be further subdivided into two types or “flavours” which cover
different concepts: consumer VoIP and enterprise VoIP. Even though both use the IP
protocol as the underlying transport mechanism, they differ in the nature of the network by which
they are supported: while consumer VoIP uses public networks like the public Internet to
transport voice packets between calling parties, enterprise VoIP transports them over private
managed IP networks. Thus the major difference between them is that consumer VoIP is
neither reliable nor of guaranteed quality, whereas enterprise VoIP can provide
reliable and high-quality services. This is due to the fact that consumer VoIP relies on the
Internet, which does not guarantee any bandwidth reservation or timely delivery, whereas
enterprise VoIP relies on private managed IP networks whose network parameters can be
controlled.
In this master thesis, only enterprise VoIP will be studied.
2.2.2. Types of VoIP services
Four ways of using VoIP services can be distinguished, depending on the nature
of the terminals and the type of network. Figure 2.4 illustrates VoIP scenarios showing these
types of services, namely:
• Peer-to-peer or PC-to-PC services (1): Both calling and called parties use VoIP
applications, called softphones, installed on their computers. Communication is
achieved through access of both parties to the Internet via an Internet Service Provider
network. This type of VoIP does not involve any communication over the
PSTN/ISDN networks.
• PC-to-TDM phone or dial-out only services (2): The calling parties use a softphone on
their computers to call a user on the PSTN/ISDN networks. With such a service, users
using a softphone on their computers can call TDM phones but not vice-versa.
• TDM phone-to-PC or dial-in only services (3): The calling party is a PSTN/ISDN
subscriber who calls a user reachable on his computer. Since the calling party can call
only by dialling a classical phone number (E.164 number), the called computer must
be assigned an E.164 number, for example by the IP-PBX of the private network
where it is located or by a VoIP provider. With such a service, computers can be called
by TDM phones but not vice-versa.
• TDM phone-to-TDM phone (4): Both calling and called parties are PSTN/ISDN
subscribers. The voice traffic generated by the calling party passes through a VoIP
gateway so that this traffic is transported over IP-based networks and passes at the
receiving side through another VoIP gateway to reach the destination TDM phone.
Figure 2.4 – Four types of VoIP services
In these types of services, IP phones have exactly the same role as computers since they are
devices directly connected to the IP network and used to place and receive calls. So, the PC-to-PC
service is similar to an IP phone-to-IP phone service. Besides, a TDM phone connected to the IP
network through an analogue telephone adaptor (ATA), considered as a set, can also be seen as an
IP phone or a computer, and the communication between two such sets can be regarded as a
PC-to-PC communication.
Looking closely at the above-mentioned types of services, it is possible to separate them into
services which require the intervention of a telephone company (Telco) and the use of gateways,
like types 2, 3 and 4, and services which do not, like type 1, IP phone-to-IP phone services, TDM
phone with adaptor-to-TDM phone with adaptor services and all the possible combinations such as
computer-to-IP phone.
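The classification above can be summarised in a few lines of Python. This is only an illustrative sketch: the names `Terminal` and `classify_service` are hypothetical and do not come from any VoIP product, and a TDM phone behind an adaptor is assumed to count as an IP terminal, as argued in the text.

```python
from enum import Enum
from typing import Tuple


class Terminal(Enum):
    """Kind of terminal as seen from the network (hypothetical helper type)."""
    IP = "ip"    # softphone, IP phone, or TDM phone behind an ATA adaptor
    TDM = "tdm"  # phone attached to the PSTN/ISDN networks


def classify_service(caller: Terminal, callee: Terminal) -> Tuple[int, bool]:
    """Return (service type as numbered in Figure 2.4, whether a gateway is needed)."""
    table = {
        (Terminal.IP, Terminal.IP): 1,    # peer-to-peer / PC-to-PC
        (Terminal.IP, Terminal.TDM): 2,   # dial-out only
        (Terminal.TDM, Terminal.IP): 3,   # dial-in only
        (Terminal.TDM, Terminal.TDM): 4,  # TDM phone-to-TDM phone over IP
    }
    service_type = table[(caller, callee)]
    # Only type 1 avoids the Telco and the VoIP gateways.
    return service_type, service_type != 1
```

For instance, `classify_service(Terminal.IP, Terminal.TDM)` yields type 2 with a gateway required, whereas two IP terminals yield type 1 without any gateway.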
2.2.3. Importance of VoIP for enterprise networks
The adoption of Voice over IP technology by enterprises can present several advantages to
them, not only financially. The main incentives for VoIP deployment in enterprises are:
• Optimisation of wiring infrastructure: since VoIP uses the enterprise IP network,
there is no need for an enterprise to deploy two different networks, to hire staff with
different competences and to maintain two networks. This ability to transport voice
and data traffic over the same IP network leads to cost savings for the enterprise.
• Cost savings in PSTN/ISDN communications. Enterprises need to subscribe to fewer
costly telephone lines.
• Cost savings in inter-site communications. For multi-site enterprises, it is attractive
to adopt VoIP for inter-site calls, thereby bypassing the PSTN/ISDN networks and
their expensive tolls. The cost savings are a significant incentive in this case.
• Scalability and great flexibility in the management of phones. With VoIP, it is much
easier for users settling into a new office to keep the same phone number as they had
in their previous office (“Changing”). Besides, users can move from one office to
another and log in to the local phones so that they can be reached as if they
were in their own office under the same address or number (“Moving”). This is due to the
location-independent addressing scheme used in SIP-based VoIP. In addition,
adding new VoIP users is very easy (“Adding”), in contrast with legacy telephony,
which requires the reconfiguration of the PBX, a change of ports on the PBX and a
change in dialling numbers. With VoIP, Moves, Adds and Changes (MACs) are
easier and less expensive than in a legacy telephone network.
• Use of intelligent terminals. VoIP terminals like IP phones are more user-friendly and
more feature-rich than TDM phones. IP phones have a large display with menus
offering different telephony options to users in a more ergonomic way. Besides, with
VoIP deployed in enterprise, it is possible to create a flexible and mnemonic dialling
plan, based on names for example, making it easier to place calls to employees.
• Mobility of employees. By logging in to IP phones, whether they are located in the
enterprise, in a branch office or at home through a VPN, users can be
reached at their personal professional phone number (SIP URL) and have
access to the same services as if they were at the office, such as voice mailbox, agenda, address
book... In addition, VoIP makes the telephony service uniform across all sites.
• Integration of new applications and media. VoIP offers the same services as legacy
telephony, even advanced ones like Automated Attendant or Interactive Voice
Response. However, with VoIP, new services can be integrated and existing services
like Presence, Location Awareness or Instant Messaging can be used by IP phones.
Besides, new services like Unified Messaging, which save employees time, can also be
integrated with voice applications. VoIP additionally makes it possible to hold video conversations,
exchange data during ongoing calls, manage address books… The addition of services
in VoIP networks is not as difficult as in traditional telephone networks, since
• Improvement of teamwork. With VoIP, synergy is enhanced. For example, it is easier
for employees to work in teams even at a distance by exchanging documents while
talking on the phone.
2.2.4. Challenges of a VoIP migration in enterprises

Before proceeding to a VoIP deployment, enterprises must consider several challenges,
business challenges as well as technical challenges, to evaluate whether such a deployment is
worthwhile or not. First, concerning business challenges, enterprises have to determine the costs and
benefits of a VoIP migration, the impact on the corporate processes and employees and the
optimal time to do the migration.
As for the technical challenges, before migrating to VoIP it is crucial to determine the
general architecture of the VoIP network, for example, to define whether the network will
have centralized or distributed call processing and whether it will adopt a full-IP VoIP
architecture or an intermediary hybrid network configuration. Other technical challenges are to
decide whether some components of the legacy telephone network could be kept and integrated
in the new VoIP network and which vendor solutions and products to choose according to
specific needs. A major challenge is to determine whether the current data IP network is able to
support VoIP and which changes in the network configuration, in components, in bandwidth…
should be made accordingly.
To face these challenges, enterprises should carry out a general assessment to determine
and define the requirements of a possible VoIP system. This general assessment encompasses
several assessments [25], such as assessment of calling features and functionality, of performance, of
reliability, of capacity, of financial metrics… and of security. Security assessment will be studied
later in this report and is fundamental in VoIP migration because a lack of security measures
could jeopardize not only the VoIP network but the whole IP network.
Then, enterprises can use and analyse the results coming from this general assessment in
order to design a possible VoIP network architecture responding to the enterprise’s needs and
adapted to the enterprise’s already existing IP architecture. The design should be high-level and
take into account the new VoIP components to add in the network as well as the changes in the
IP network in order to host VoIP. The next step is to develop a migration strategy defining with
accuracy the several milestones of the VoIP migration like the VoIP equipment procurement, the
training of staff, the installation, the configuration and the testing. Last, a business case analysis
should be carried out to determine the total cost of ownership (TCO), which includes costs of
implementation and deployment, costs of training, costs of telephone calls… Once these
assessments and financial analyses have been performed, enterprises can decide whether to
deploy VoIP or not and how to proceed from the timing and budgeting point of view.
Finally, it is important to underline that not all enterprises have the appropriate in-house
expertise to carry out security assessments. Expertise is one of the major differences
between small and large enterprises presented earlier: while large enterprises have the financial
means to develop their own technical expertise by training their technical staff to deploy and
maintain VoIP networks, small enterprises generally have difficulties in building their own in-house
expertise, in particular in an emerging field like VoIP security. For that reason, small enterprises
are strongly advised to call on security experts and third-party consultants, for
example, to perform vulnerability assessments during implementation and to provide enterprises
with tools to perform such assessments regularly on their own after the deployment of their
VoIP network.
After having seen the different incentives for enterprises to deploy VoIP and the
importance of security assessment before, during and after the VoIP migration, it will be
interesting now to design and model VoIP deployments in enterprise networks, by using the two
models of enterprises, small and large enterprises, presented earlier in this chapter.
3. Designing and modelling VoIP deployments in
enterprise networks
Nowadays, an increasing number of enterprises, small and medium-sized enterprises
(SMEs) as well as large enterprises, decide to deploy a VoIP system in their existing enterprise
network. A VoIP system is the combination and/or superposition of four main infrastructures:
- An IP enterprise network infrastructure which is the fundamental underlying
infrastructure, enabling IP communications between network entities. This infrastructure
has already been described in 2.1.4 in the case of several networks.
- A VoIP infrastructure which is superimposed on the IP network
infrastructure and provides VoIP telephony service to users. This infrastructure
introduces new components, briefly presented in 3.1 and detailed in 3.2.
- A User Access infrastructure which aims at enabling the employees of the enterprise to
access the VoIP system from the Internet remotely or from within the intranet.
- A Management System infrastructure, needed to provide VoIP management functions to
network administrators who have to configure and maintain VoIP components.
This chapter aims at presenting these four infrastructures and, in particular, at illustrating through
examples the most interesting one: the VoIP infrastructure. I have first designed
several VoIP deployment scenarios which could take place in the enterprise networks previously
presented in 2.1.4 and which will serve as a basis for security analysis
throughout this thesis. These deployment scenarios are described in 3.1 and the components of
the VoIP infrastructure in 3.2. Since the Session Initiation Protocol (SIP) is the only signalling
protocol studied in this thesis, I considered it necessary, for the sake of clarity,
to explain which components belong to the VoIP architecture and which to
the SIP architecture, and to show how the SIP protocol and components can be integrated
in a VoIP architecture (3.2.3). Finally, the User Access infrastructure and the Management
System infrastructure will be presented in 3.3 and 3.4 respectively.
3.1. VoIP deployment scenarios

The scenarios described in this section correspond to several degrees of VoIP migration
and can be regarded as the different steps towards a full-IP voice system deployment.
3.1.1. Deployment in a small single-site enterprise

The enterprise network considered for this VoIP deployment is that of the small
enterprise presented in Figure 2.2. Two scenarios are presented with this underlying IP network
infrastructure. The first scenario corresponds to a so-called IP-enabled VoIP architecture, which
is an intermediary step to complete IP convergence, while the second scenario presents a full-IP
VoIP architecture. In both scenarios, it has been assumed that the small company has no direct
access to the PSTN/ISDN networks but an indirect access to them through its Service Provider.
This indirect connection to PSTN/ISDN networks is called SIP Trunking [5]. For enterprises,
SIP Trunking means that they no longer need to have costly local PSTN- and ISDN-gateways
since the local IP-PBX can connect to the service provider’s PSTN/ISDN gateways over the
Internet.
3.1.1.1. IP-enabled VoIP architecture

Small enterprises are not always in favour of deploying a full-IP VoIP system in their
enterprise network, in particular if they have already recently invested in a costly traditional
telephony network including new and generally expensive PBX equipment.
The scenario represented in Figure 3.1 illustrates how a small enterprise can adopt an
intermediary VoIP solution and, at the same time, prolong the useful life of its legacy PBX
equipment. Such an architecture is called an IP-enabled VoIP architecture.
Figure 3.1 – IP-enabled VoIP deployment in a small single-site enterprise network

The IP-enabled VoIP architecture keeps the legacy telephony system of an enterprise with its
PBX equipment and links it to the IP enterprise network infrastructure.
It is possible to link them in different ways:
- Either by IP-enabling the existing PBX system. It is possible to upgrade legacy PBXs to
make them IP-enabled by adding a gateway-on-a-line-card holding a TDM-to-VoIP
processor and a voice quality enhancement device.
- Or by keeping the existing PBX as it is and connecting it to a VoIP
media/signalling gateway. A VoIP media/signalling gateway is a device which interfaces
the circuit-switched network and the IP network. It translates the SIP signalling protocol
of the IP network side into the signalling protocols of the circuit-switched networks (SS7,
ISUP …) and vice versa, and converts the media traffic from one network to the other.
- Or by connecting the PBX to a router which has gateway functionalities.
In the scenario of Figure 3.1, it has been chosen to connect the legacy PBX to a VoIP
media/signalling gateway. Employees can only call from traditional TDM phones. This solution
of interfacing the circuit-switched networks to the IP network via a VoIP media/signalling gateway is
suited to small and medium-sized enterprises (SMEs) that require a low-cost and
low-maintenance solution. The gateway can be installed and configured easily and quickly and can
help enterprises reap the benefits of VoIP at low cost.
The deployment of an IP-enabled architecture mainly aims at transporting voice between
the different sites of an enterprise (not represented in this scenario) over an IP network, which is
free of additional costs for the enterprise. An IP-enabled solution can be either a simple solution
of call toll bypass or, at least, of cost reduction of big call volumes or international calls, or the
first step towards a full migration to VoIP.
However, toll bypass is not the only reason for SMEs to deploy VoIP solutions. SMEs
nowadays want to integrate multimedia convergence services like collaboration, voice mail or
unified messaging in their enterprise network.
To conclude, IP-enabled VoIP architectures have the advantage of letting small enterprises
keep their existing legacy telephone infrastructure while benefiting from the advantages of VoIP
for inter-site communications. The main advantages of an IP-enabled VoIP solution with a
VoIP media/signalling gateway are the following:
• A low-cost way to deploy a VoIP system in the enterprise
• The reduction of call tolls
• The protection of former investment in legacy telephony equipment
• The introduction of enhanced services aimed at employees and clients
However, this solution has some drawbacks: it does not offer as many services as
a full-IP VoIP system, and the enterprise has to keep its personnel with PBX competences while,
in parallel, developing VoIP knowledge (perhaps by hiring specialists or by training the existing
IP personnel). An analysis of the strengths, weaknesses, opportunities and threats (SWOT
analysis) of this solution has been carried out by Cesmo Consulting [6].
3.1.1.2. Full-IP VoIP architecture

Instead of adopting the IP-enabled VoIP architecture, SMEs, in particular those which
are just starting up and have not yet invested in a legacy telephony infrastructure, can adopt
the full-IP VoIP architecture immediately. To build such a system, SMEs can either fully replace
their legacy PBX system or build it from scratch.
Figure 3.2 – Full-IP VoIP deployment in a small single-site enterprise network
A full-IP VoIP system does not have any legacy PBX equipment anymore, as illustrated
in the scenario of Figure 3.2, but has an equivalent IP device called IP-PBX. This scenario
represents a model of a centralized IP-PBX system with a single IP-PBX core located at the
premises of the enterprise and working together with a location server and possibly with an
ENUM DNS server. Besides, employees can now place calls from IP phones, from telephony
applications installed on their computers, called softphones, or from TDM phones which connect
to the IP network through adaptors. Let’s briefly explain all these terms.
An IP-PBX (Internet Protocol Private Branch Exchange) is a telephone switching system within
an enterprise which manages calls over the Internet Protocol, called VoIP calls, and extensions
(phones). A more detailed definition of an IP-PBX is given in 3.2.1.1. IP-PBXs are
very advantageous for SMEs since they offer them features and
functionalities that, until now, only large enterprises could afford. Besides, legacy PBX systems
are very costly to run, maintain and upgrade, and have therefore usually been hosted and
managed remotely by service providers. In addition, PBX systems are not very flexible in their
configuration; for example, they are difficult to reconfigure when employees move or
change location. In contrast, IP-PBX systems are very scalable, i.e. it is easy to configure the
addition, removal or move of extensions. They are owned by enterprises and can be
managed in-house. In a full-IP VoIP system, only signalling traffic passes through the IP-PBX
core, whereas media traffic bypasses it.
IP-PBXs may need a location server and possibly an ENUM DNS server. The location
server will be a Session Initiation Protocol (SIP) location server, since SIP is the signalling
protocol for VoIP calls which is being studied in this Master Thesis. A location server receives
queries from the IP-PBX which ask for the resolution of a SIP address into an IP address. The
IP-PBX makes such queries when it has to forward a call to a SIP address and does not know the
corresponding IP address.
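The role of the location server can be illustrated by a minimal in-memory sketch. The class and method names below are hypothetical, and the addresses are documentation examples; a real SIP location service stores richer contact information than a single IP address.

```python
from typing import Dict, Optional


class LocationServer:
    """Toy location server mapping SIP addresses to current IP addresses."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, sip_address: str, ip_address: str) -> None:
        """Record (or update) where a SIP address can currently be reached."""
        self._bindings[sip_address] = ip_address

    def resolve(self, sip_address: str) -> Optional[str]:
        """Answer an IP-PBX query; None means the address is unknown."""
        return self._bindings.get(sip_address)


# Hypothetical usage: the same SIP address follows the user to a new phone.
server = LocationServer()
server.register("sip:hr.manager@example.com", "192.0.2.10")
server.register("sip:hr.manager@example.com", "192.0.2.77")
```

After the second registration, a query for `sip:hr.manager@example.com` returns the new IP address, which is the location-independent addressing behaviour that makes moves easy in SIP-based VoIP.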
As for ENUM DNS servers, it is useful to note first that ENUM (E.164 Number
Mapping) is a protocol defined in RFC 3761 [7]. According to RFC 3764 [8], ENUM
can be defined as “a system that uses DNS to translate telephone numbers, like '+12025332600', into
URIs, like 'sip:egar@example.com'. ENUM exists primarily to facilitate the interconnection of systems that
rely on telephone numbers with those that use URIs to route transactions”.
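As an illustration, the first step of an ENUM lookup, as described in RFC 3761, turns the E.164 number into a domain name by keeping only the digits, reversing them, separating them with dots and appending `e164.arpa`. The sketch below covers only this step; the function name is hypothetical, and the subsequent DNS NAPTR query that returns the URI is not shown.

```python
def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Build the domain name that an ENUM client would query in DNS."""
    # Keep only the digits: '+', spaces and dashes are dropped.
    digits = [c for c in number if c in "0123456789"]
    if not digits:
        raise ValueError("no digits found in E.164 number")
    # Reverse the digits and separate them with dots, then add the suffix.
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+12025332600"))
# prints 0.0.6.2.3.3.5.2.0.2.1.e164.arpa
```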
Employees use IP phones and/or softphones to place VoIP calls. IP phones are phones
which are connected directly to the IP network and which are intended to replace the classical
TDM phones without disturbing the habits of employees. Softphones are software
applications which run on computers and can place or receive VoIP calls. Typical softphones
are Skype, AIM or VoIPstunt; however, they are not SIP softphones. An example of a SIP
softphone is X-Lite, and many others exist [10].
Nevertheless, the adoption of a full-IP VoIP system does not exclude the use of classical
TDM phones (analogue phones). It is possible to keep some of them but, in that case, it is
necessary to connect them to the VoIP network via telephone adaptors (TAs). These telephone
adaptors are also called FXS Adaptors (Foreign Exchange Subscriber Adaptors) or ATA Adaptors
(Analogue Telephone Adaptors) and they interface the analogue phones, faxes or other analogue
devices with the VoIP network as represented in Figure 3.3. The most common ATA adaptors in
enterprise networks usually have one Ethernet jack used to connect the adaptor to the LAN and
one or more FXS ports. An FXS port is a telephone interface which provides battery power to the
connected phone, sends dial tone, and generates ringing voltage. Any analogue device has an
FXO port (Foreign Exchange Office port); a cable connects this port to one of the FXS ports of
the adaptor. Telephone adaptors communicate with the IP-PBX using a VoIP protocol such as SIP
and encode and decode the voice signal using voice codecs. A list of SIP telephone adaptors can be
found in [11].
Figure 3.3 – Analogue telephone adaptors
As for ISDN telephones, it is possible to connect them to the enterprise LAN by
using so-called ISDN-VoIP adaptors which connect one or more ISDN telephones or an ISDN
PBX to an ISDN S0 bus and to an Ethernet LAN interface with VoIP functionality. They
convert ISDN signals to SIP signals and voice streams are converted to VoIP packets.
By opting for a full-IP VoIP deployment, i.e. for a completed VoIP migration, enterprises
break with the traditional telephony network, which presents the following main
advantages:
• Already existing underlying infrastructure
• Single integrated voice and data network
• Simplified Moves, Adds and Changes (MACs)
• Flexible deployment: the different VoIP components can be located in different
communication rooms as long as there is IP connectivity between the locations
• Excellent application support: for example, it supports Unified Messaging, an application
that enables users to fully integrate their email and voice mail mailboxes
• Possibility to extend the VoIP services to the several sites of the enterprise
But, according to the SWOT analysis of [6], this solution also entails some weaknesses and risks:
• Longer and more complex phase of migration
• Replacement of all TDM equipment
• Potential hidden extra costs & difficulty to assess the ROI (Return on Investment)
3.1.2. Deployment in a large multi-site enterprise network

The enterprise network considered for this VoIP deployment is that of the large multi-site
enterprise presented in Figure 2.3, which is composed of headquarters to which a small remote
office is connected via a VPN, of remote workers also connected via VPNs and of an extranet.
Three scenarios are presented with this underlying IP network infrastructure. These
scenarios will correspond to the different steps towards a full-IP VoIP migration: IP-enabled
VoIP deployment, hybrid VoIP deployment and full-IP VoIP deployment.
In these three scenarios, it has been assumed that the enterprise has direct access to the
PSTN/ISDN networks and not only through a service provider (no SIP Trunking).
3.1.2.1. IP-enabled VoIP architecture

As seen previously in the case of a small enterprise network in 3.1.1.1, it is possible to
adopt an intermediary VoIP solution called IP-enabled VoIP architecture which allows the
coexistence and the intercommunication of its legacy telephone network and its IP network by
means of a VoIP media/signalling gateway. Large multi-site enterprises can adopt an IP-enabled
VoIP architecture by installing a VoIP media/signalling gateway or by IP-enabling their legacy
PBX (see 3.1.1.1).
Figure 3.4 illustrates the adoption of such an IP-enabled VoIP architecture by a large multi-site
enterprise. This enterprise has headquarters, composed of a traditional telephone network
linked to the LAN by a VoIP media/signalling gateway, and also has a small remote office,
composed, like the headquarters, of a traditional telephone network linked to its LAN via a VoIP
gateway. The two distant LANs are linked by a VPN. For such an enterprise, it is very
advantageous to adopt this solution before migrating to IP convergence, at least for toll
bypassing between the different sites of the enterprise. Indeed, with an IP-enabled solution, all
internal calls placed from a TDM phone located in the headquarters to a TDM phone located in
the small remote office and vice-versa are free of toll charges, since voice is transported through
the Internet, as illustrated in Figure 3.4. In this figure, a caller located in the headquarters of the
enterprise wants to call an employee working in the small remote office. Instead of passing
through the PSTN/ISDN networks (pink path), the voice follows a path within the enterprise IP
network (blue path). Of course, the voice is not transported in the same way in the two paths. In
the pink path, voice is transported in the form of an analogue or digital signal, whereas in the
blue path, between the two VoIP gateways, voice is transported in the form of packets.
Figure 3.4 – VoIP deployment with legacy PBX system in a large multi-site enterprise network
In this scenario, it makes no sense to install IP phones or softphones since VoIP calls cannot
be managed by the IP-enabled legacy PBX alone. This means that mobile workers cannot have
access to the VoIP system.
Let’s now look at a more flexible VoIP architecture.
3.1.2.2. Hybrid VoIP architecture

It is possible to integrate VoIP in a large enterprise network by keeping the legacy
telephone network and connecting it to the IP network, as presented in the previous scenario,
and at the same time, to take a step further towards full-IP convergence.
To achieve that, enterprises have to adopt the hybrid VoIP architecture. In such an
architecture, there is still interconnection and coexistence of the legacy telephone network and of
the IP network. However, a new device is introduced in the VoIP system: an IP-PBX core. In
this hybrid VoIP architecture, there is coexistence and cooperation between the multiple legacy
PBXs and the multiple IP-PBXs.
In the scenario illustrating this architecture in Figure 3.5, it appears that each site, the
headquarters as well as the small remote office, has its own IP-PBX: it is a distributed IP-PBX
architecture. If we consider a single site, with this new architecture, there are no changes between
the legacy PBX and the PSTN/ISDN network. The legacy PBX continues to route inbound and
outbound calls. The only changes are that IP-PBXs have been introduced while legacy PBXs
have been kept and that they have to be configured so that they can work together.
Figure 3.5 – Hybrid VoIP deployment in a large multi-site enterprise
How is this achieved? The collaboration between IP-PBXs and PBXs is tightly linked to the
challenge of how to translate circuit-switched calls from the traditional PBX into VoIP calls for
the new IP-PBX and vice versa. This task is handled by the VoIP media/signalling
gateway, which translates protocols from one network to the other (for example, translation of the SIP
signalling protocol on the IP network side into the Q.931 signalling protocol on the circuit-switched
network side) as well as media types from one network to the other; its operation is explained later in
detail in 2.3.2.1. Besides, to be able to cooperate with legacy PBXs, the IP-PBX must set up a dial
plan which is compatible with the existing dial plan set up on the legacy PBX. A call’s destination
may be local to the IP-PBX, on the traditional PBX, or out to the PSTN/ISDN networks, and all
these calls will have a prefix showing the type of their destination; for external calls, the IP-PBX
dial plan must comply with that of the PBX (E.164 phone numbers).
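The prefix-based dial plan described above can be sketched as a small longest-prefix lookup. The prefixes and destination labels below are purely hypothetical and chosen for illustration; an actual IP-PBX dial plan must mirror whatever plan is already configured on the legacy PBX.

```python
from typing import Dict

# Hypothetical dial plan: the prefix values are assumptions, not taken from the thesis.
DIAL_PLAN: Dict[str, str] = {
    "2": "ip-pbx",      # extensions served by IP phones
    "3": "legacy-pbx",  # extensions served by TDM phones behind the legacy PBX
    "0": "pstn",        # outside-line prefix for PSTN/ISDN destinations
}


def route(dialled: str, plan: Dict[str, str] = DIAL_PLAN) -> str:
    """Return the destination type for the dialled digits (longest prefix wins)."""
    for prefix in sorted(plan, key=len, reverse=True):
        if dialled.startswith(prefix):
            return plan[prefix]
    raise ValueError(f"no dial-plan entry matches {dialled!r}")
```

With this table, `route("3107")` selects the legacy PBX, so the IP-PBX would hand the call to the VoIP media/signalling gateway, while `route("2041")` stays local to the IP-PBX.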
In the scenario of Figure 3.5, we assume that a Business Manager working in the
enterprise headquarters wants to place calls from his IP phone to people located in different
places and having different kinds of phones, analogue or IP phones.
Call destination in the enterprise legacy phone network. Let’s assume that the
Business Manager wants to call his Accountant, who has a TDM phone. Figure 3.6
shows the messages exchanged between the different VoIP
components when the signalling protocol in the headquarters’ LAN is the SIP protocol
and the signalling in the legacy telephone network is the Q.931 protocol.
Q.931 is a signalling protocol for ISDN communications which is also used in VoIP.
This protocol is involved in the setup and termination of connections according to the
H.225 protocol for digital telephone services. The messages in Q.931 include: Setup
(establishment of a connection), Call Proceeding (call is being processed by the destination
terminal), Alerting (destination set is ringing), Connect (the call has been accepted at the
destination) and Release Complete (call termination).
In Figure 3.6, the digits that the Business Manager presses are collected by the IP
phone and are then inserted in the form of a SIP URL in a SIP INVITE message
destined to the IP-PBX. According to its call routing tables, the IP-PBX recognizes the
digits as an outbound call destined to a TDM phone belonging either to the private legacy
telephone network or to the public PSTN/ISDN networks. It forwards the message to
the VoIP media/signalling gateway. The gateway translates the SIP INVITE message
into a Q.931 SETUP message understandable by the PBX which receives this message.
According to its call routing tables, the PBX recognizes the digits as a local call destined
to a TDM phone directly linked to it. It forwards the Q.931 SETUP message to the
Accountant’s phone. For the tear-down of the call, the Business Manager’s IP phone sends a SIP
BYE message to the IP-PBX, which is forwarded to the VoIP media/signalling gateway
and then converted by the latter into a Q.931 DISCONNECT message.
The role of the VoIP media/signalling gateway is very important since it is a sort of VoIP
terminal from the IP-PBX’s point of view, representing all legacy phones.
Figure 3.6 – Call flow between an IP phone and a TDM phone in a hybrid VoIP architecture
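The translation performed by the gateway in this call flow can be sketched as a simple lookup. The Python fragment below is only an illustration and does not describe a particular product: the INVITE/SETUP and BYE/DISCONNECT pairs are taken from the description above, whereas the mapping of the Q.931 ALERTING and CONNECT messages to the SIP responses 180 Ringing and 200 OK, as well as all function and table names, are assumptions made for the example.

```python
# Minimal sketch of the signalling translation done by a VoIP gateway.
# Function and table names are hypothetical.

# SIP requests arriving from the IP-PBX and the Q.931 message sent to the PBX.
SIP_REQUEST_TO_Q931 = {
    "INVITE": "SETUP",
    "BYE": "DISCONNECT",
}

# Q.931 messages arriving from the PBX and the SIP response sent back.
# These two pairs are an assumption added for illustration.
Q931_TO_SIP_RESPONSE = {
    "ALERTING": "180 Ringing",
    "CONNECT": "200 OK",
}


def to_q931(sip_method: str) -> str:
    """Translate a SIP request method into a Q.931 message type."""
    if sip_method not in SIP_REQUEST_TO_Q931:
        raise ValueError(f"no Q.931 translation for SIP request {sip_method!r}")
    return SIP_REQUEST_TO_Q931[sip_method]


def to_sip_response(q931_message: str) -> str:
    """Translate a Q.931 message type into a SIP response line."""
    if q931_message not in Q931_TO_SIP_RESPONSE:
        raise ValueError(f"no SIP translation for Q.931 message {q931_message!r}")
    return Q931_TO_SIP_RESPONSE[q931_message]
```

A real gateway also has to translate message contents (called number, bearer capability, cause values), which this sketch leaves out.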
Call destination in the PSTN/ISDN networks. Let’s now assume that the Business
Manager wants to call a Consultant who works at home and has a legacy phone. The
call flow is almost the same as in the previous case of the Business Manager and the
Accountant, except that the legacy PBX, again according to its routing tables, does not
forward the call (i.e. the Q.931 SETUP message) to a local phone but passes it to the
PSTN/ISDN facilities for processing by the respective carrier.
Call destination local to IP-PBX. Let’s assume now that the Business Manager wants
to call an HR Manager who works in another department and has an IP phone. This is the
easiest case: the IP-PBX receives the SIP INVITE message from the Business Manager,
looks at the destination SIP address and forwards the message according to its routing table.
Many different types of call flows in such an architecture can be found in [12]. It is important to
underline that the IP-PBX has authorization and authentication functions and manages user
access rights. For example, some employees may be allowed to place calls to local TDM phones
but not to external TDM phones; it is the role of the IP-PBX to check the rights of the caller
and to accept or reject calls. In any case, VoIP media/signalling gateways should accept only SIP
messages coming from the IP-PBX. IP-PBX authentication is explained later in Chapter 6.
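The combination of call routing and rights checking described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the dial plan (first digit 0 for external TDM phones, 2 for local TDM extensions, 3 for IP phones), the user names and the returned strings are all hypothetical and serve only to show the order of the decisions.

```python
from dataclasses import dataclass, field

# Hypothetical dial plan: the first digit selects the destination class.
DIAL_PLAN = {
    "0": "external_tdm",  # PSTN/ISDN, reached through the gateway and the legacy PBX
    "2": "local_tdm",     # TDM extension behind the legacy PBX
    "3": "local_ip",      # IP phone registered with the IP-PBX
}


@dataclass
class Caller:
    name: str
    allowed: set = field(default_factory=set)  # destination classes the caller may reach


def route_call(caller: Caller, digits: str) -> str:
    """Return the IP-PBX decision for a dialled number."""
    destination = DIAL_PLAN.get(digits[:1])
    if destination is None:
        return "reject: unknown destination"
    if destination not in caller.allowed:
        return "reject: caller not authorized"
    if destination == "local_ip":
        return "forward INVITE to local IP phone"
    # Both TDM classes leave the IP network through the gateway.
    return "forward INVITE to VoIP gateway"


trainee = Caller("trainee", {"local_ip", "local_tdm"})
manager = Caller("manager", {"local_ip", "local_tdm", "external_tdm"})
```

In this model, `route_call(trainee, "0071112345")` is rejected while the same number dialled by `manager` is forwarded to the gateway, which mirrors the example of employees allowed to call local but not external TDM phones.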
Hybrid VoIP architecture is currently very popular with enterprises even though it is not
a full-IP VoIP architecture, because it keeps the enterprise legacy telephone network while
introducing new services, and it is an advanced step towards IP convergence.
3.1.2.3. Full-IP VoIP architecture

Enterprises can also opt for a full-IP VoIP architecture, also called a client-server
architecture. This type of architecture corresponds to the final stage of a VoIP migration since
no enterprise legacy telephone network remains. Completing this migration is financially
very difficult for large enterprises which have already invested in a private
legacy telephone network.
Figure 3.7 – Full-IP VoIP deployment in a large multi-site enterprise network
The full-IP VoIP architecture is a clean break with the private legacy telephone network:
it does not unify the two worlds, the circuit-switched and the VoIP worlds, as hybrid
architectures do, but makes the VoIP network predominant and eliminates the legacy
telephone network (no legacy PBXs anymore). Most TDM phones are replaced, and
those which are kept are connected to the VoIP system through FXS adaptors (Figure 3.3).
Employees located in the enterprise premises usually use IP phones, while mobile workers
communicate with the enterprise VoIP system through softphones installed on their laptops.
The scenario presented in Figure 3.7 shows a model of full-IP VoIP architecture
deployed in a large multi-site enterprise. Since the enterprise has many sites and many mobile
workers, the choice has been made to adopt a distributed IP-PBX model, i.e. to install an IP-
PBX in each individual site. Each IP-PBX manages the local IP calls, which means, for example,
that the IP-PBX in the headquarters manages only calls placed by employees located in the
headquarters. The IP-PBX in the headquarters will receive calls made by employees from the
headquarters and it will then forward them either to local employees or to the remote IP-PBX
situated in the small remote office. The most important advantage of a distributed IP-PBX model
is that if the IP-PBX located in the headquarters goes down, the outage affects only the
headquarters and not the other sites.
The IP-PBX has been placed in the Demilitarized Zone (DMZ) of the headquarters IP
network in order to process external calls safely and to protect the internal VoIP network.
However, this may introduce some latency, even if it is small. For example, calls coming from the
PSTN/ISDN networks and passing through the VoIP media/signalling gateway pass through the
internal firewall of the DMZ to reach the IP-PBX and to be forwarded by it. This means that
VoIP packets have to pass at least twice through the internal firewall, which is a well-known
source of problems and delay for VoIP traffic.
A location server and possibly an ENUM DNS server, present in each site, are also
included in this VoIP architecture to help the IP-PBX in its address resolution (see 3.1.1.2).
This architecture may seem ideal; however, it is difficult to deploy: building a uniform
dial plan for all the sites is hard, and the overall management of such a network is very
difficult. On top of that, if the IP-PBXs come from different manufacturers, their
interoperability becomes an even greater difficulty. These difficulties discourage enterprises
and very often lead them to opt for a hosted VoIP solution.
3.1.3. Hosted IP-PBX solution

“In the converged world, it may make more sense to rent rather than buy” [14]. This statement is
certainly debatable, but it summarizes the philosophy behind the adoption and the ever growing market
of a particular type of VoIP deployment: the hosted VoIP solution.
3.1.3.1. Hosted IP-PBX vs. IP-Centrex

In the literature and in marketing product descriptions, the term “hosted IP-PBX”
(also called “hosted PBX”) is often confused with the term “IP Centrex”, and both are
considered synonymous. Even though both are hosted VoIP solutions, which means that the
IP-PBX is no longer located in the customer’s network but installed in the premises of a VoIP
Service Provider, they differ considerably [15]. They also have to be distinguished from the
so-called “managed IP-PBX”, which refers to the deployment of an IP-PBX in an enterprise’s
premises with the VoIP Service Provider responsible for its remote management.
An IP Centrex solution is a Centrex solution integrated in an IP environment.
In legacy telephone networks, Centrex was adopted by enterprises which did not want
to manage a legacy PBX and preferred to sign a contract with a
Centrex provider (called Central Office) which owned and operated the equipment providing the call
control and call service functions (residing in a so-called Class 5 Switch in the provider’s
premises). A hosted IP-PBX is simply an outsourced IP-PBX, located in the premises of a service
provider.
The main difference between an IP Centrex solution and a hosted IP-PBX is their
richness in features and functions. As Table 3.1 shows, an IP Centrex offers far fewer
features than a hosted IP-PBX. A hosted IP-PBX is better suited to medium to
large enterprises, whereas IP Centrex mainly targets small to medium-size enterprises.
According to [15], “since the feature/function gap between Centrex and PBXs (IP or otherwise)
is so great, there is much to do before a Centrex variant can be more acceptable than a PBX-based
solution. IP Centrex may be acceptable to many SME customers, but more capability in a hosted
solution is needed for the larger, often distributed, enterprise or organization”. However, in the
future, the difference between these services will certainly blur and only the term hosted
IP-PBX service will be used to designate complete and feature-rich hosted VoIP solutions
destined to all types of enterprises.
IP Centrex
• Basic features: call forward, call transfer, call waiting, last-number redial, consultation
hold, calling-line ID, three-way calling
• Dialling features: extension dialling, speed dial
• Other features: hunt groups, voice messaging, voice portal, web-browser-based MACs
Hosted IP-PBX
• Same features as IP Centrex, plus:
• Call management: click-to-call, phone lists/directories
• Unified messaging: Outlook integration, voice mail, facsimile/e-mail
• Instant messaging and presence
• Calling plans: call screening, ringing priorities/style, call accept/reject
• Remote offices
• Other advanced features: alternate numbers/shared appearances, auto attendant/attendant
console, account/authorization codes, call centre applications
Table 3.1 – Comparison of IP Centrex and hosted IP-PBX features (according to [15])
Nevertheless, the distinction between IP Centrex and hosted IP-PBX is still confusing because
of the lack of a clear definition of their characteristics and because of the plurality of
definitions by vendors, which do not always agree with one another. For example, Cisco claims
in a white paper [17] that in a hosted IP-PBX solution the IP-PBXs located in the VoIP Service
Provider’s site are dedicated to individual enterprises, while in an IP Centrex solution the
IP-PBXs are not dedicated but shared between several enterprises.
Regardless of the exact difference, only the hosted IP-PBX solution will be considered in
this master thesis.
3.1.3.2. Deployment of a hosted IP-PBX solution

If a company does not want to invest in a new costly VoIP infrastructure and in its management
and maintenance, it can adopt a hosted IP-PBX solution, that is to say, it can turn to
a service provider (also called “host provider”) which hosts an IP-PBX at its premises
and provides telephony services for the enterprise. By placing its IP-PBX in a service
provider’s premises, an enterprise moves the intelligence into the core of the network.
The scenario of Figure 3.8 shows a hosted IP-PBX architecture adopted by a small single-site
enterprise. As illustrated, the telephony architecture is hosted off-site at the service provider
network whose access is protected by a Session Border Controller (SBC). The role of SBCs will
be explained later. Only signalling messages have to pass through the service provider’s network;
in contrast, the media traffic does not have to pass through it, so that there is no centralization of
voice media traffic. Besides, the enterprise is interconnected to the PSTN/ISDN networks via
the service provider’s network which is connected to these networks by means of gateways
(PSTN and ISDN gateways). To adopt this architecture, the enterprise must, however, purchase
new devices such as IP phones and must adapt its private enterprise network to voice traffic.
Generally, what enterprises expect from hosting are reduced capital expenditure, a single point of
contact, protection against obsolescence of VoIP equipment, security, rich PBX features accessed
by conventional-looking business phones and managed via web interfaces.
However, is a hosted IP-PBX solution better than an in-house IP-PBX solution?
It is very difficult to answer this question and opinions are generally divided. The
question of service centralization is the most controversial: the detractors of hosted IP-PBX
claim that centralization is negative for distributed IP networks because it does not exploit their
resiliency [16], whereas its advocates think that centralization of voice applications for all
locations and phones at any site is positive and desired by enterprises. Another
controversial question is that of the enterprise’s control over its telephony system. Many enterprises
do not outsource their VoIP system in order to keep control over it and prefer to purchase and
own their own IP-PBX. But the ownership of an IP-PBX implies maintenance, expertise, costs…
That is why, according to the analysis of the InfoTech research firm [14], the real question is
“How much control is necessary and how valuable is it?” and the answer is different for every enterprise.
For example, for enterprises where the ability to quickly modify telephony configurations is a
mission-critical requirement, control over the telephony system should not be outsourced.
Some advantages of hosted IP-PBX solutions are listed below but exhaustive and detailed
lists of advantages can be found in [14] and [18]:
• In a hosted solution, initial capital expenditures are lower because the amount of
customer-premise equipment is reduced (no IP-PBX to purchase).
• With a hosted solution, any problems that do occur can sometimes be fixed more quickly
because of the centralized nature of the service and because of the 24x7 monitoring and
management in hosted networks.
• Hardware and software upgrades can be managed more easily via a hosted solution than a
premise-based solution. Since service providers always have to keep up with technology
upgrades, enterprises protect themselves from the quick obsolescence of VoIP devices.
Figure 3.8 – Deployment of a hosted IP-PBX solution in a small single-site enterprise network
Nevertheless, the main advantage of hosted IP-PBX solutions is to allow enterprises to
concentrate on their business, rather than having to expend resources on the area of competence
of service providers.
On the other hand, the greatest advantage of an in-house IP-PBX solution is its ability to
integrate customized advanced applications more adapted to the needs of the enterprise.
However, this advantage has lost some of its strength given that, according to [18], “companies like
LignUp, Broadsoft, Tekelec, and Sylantro now offer customer site API interfaces that bridge the gap
considerably”.
However, hosted services present important challenges and problems. As C. Stredicke
mentions in his article [19], “Though registration, call-setup and other baseline features of business-class IP
telephones are largely insensitive to the presence of a heterogeneous, long-haul broadband IP network between the
phone and its associated communication server; and though intelligible media transport can, given sufficient
bandwidth between endpoints, normally be assumed, certain kinds of stimulus-based signalling, display
management and user-feedback features do not fare well when phones and their servers are far-separated from one
another.” And he gives some real examples.
Such technical problems, in addition to the high demand of SMEs for flexibility – access
to features, manageability and ability to integrate advanced customized applications – can be
solved by the adoption of a “hybrid strategy” by service providers. This means that service
providers provide some VoIP features from their host platform and others from the premises
of the customer enterprise. This solution would suit customer enterprises which
want to adopt a hosted IP-PBX solution while maintaining control over particular features. A
hybrid hosted solution could be enabled thanks to the SIP protocol whose distributed
architecture allows an extremely flexible distribution of functionality across the hierarchy of
proxies. According to [19], “in a SIP-based hybrid solution, local call-switching, media service, user interface
and feature support on VoIP hardphones, softphones, PC-based attendant stations and web-based
configuration/management consoles are supported locally, by premise proxy/registrar/media servers. Host
infrastructure, meanwhile, provides value in the form of 'SIP Trunking' (network-side proxy), NAT/firewall
traversal, PSTN gateway routing and internetwork signalling, IP inter-carrier peering, and high-value/high-
margin application support.” The advantages of such a solution for enterprises are numerous:
improved manageability and flexibility, ability to integrate advanced customized applications,
increased security and privacy of business-critical information and corporate directories, better
performance due to reduced latency of signalling messages… To adopt a hybrid hosted solution,
enterprises have to install proxy/registrar servers and media servers (components of a VoIP
network which provide media processing functions, such as audio announcements, IVR
(Interactive Voice Response), and multimedia conferencing) with high requirements, but they should
not be forced to change their existing IP network significantly to support VoIP. To achieve
that, the new devices to be installed should meet certain criteria [19].
To conclude, hosted VoIP solutions can be viable solutions for enterprises which are
ready to make some concessions, in particular about the control over their VoIP system, and which
just wish to reap the benefits of VoIP without a deep involvement in its installation and
maintenance.
3.2. VoIP infrastructure using the SIP signalling protocol
3.2.1. Components of the VoIP infrastructure

The deployment of VoIP in enterprise networks adds new components to the IP network
which are specific to VoIP. All these new components get a network address, i.e. an IP address,
by which they can be identified. However, adding more components means increasing the number
of vulnerabilities, since each component becomes a new target for attackers and the
network becomes more complex, which makes it difficult to pinpoint security loopholes.
Components constituting the VoIP ecosystem are presented below.
3.2.1.1. Internet Protocol Private Branch Exchanges (IP-PBXs)
Definition
IP-PBXs are very often also called call managers, call processors, controllers, servers or
IP call servers, which causes some confusion; however, all these terms are more or less
synonymous, and in this report only the term IP-PBX will be used for clarity.
An IP-PBX is an Internet Protocol Private Branch Exchange and provides services similar to
PBX services, but whereas IP-PBXs provide services over data networks like LANs or WANs,
PBXs provide services over circuit-switched networks. The evolution from PBX to IP-PBX has
been detailed in paper [20]. An IP-PBX is a customer premises equipment (CPE) phone system
owned by the enterprise; it is very often represented as a device but is actually software running
on a server and providing telephony services for users.
One of the main advantages of an IP-PBX is the fact that it employs converged data and
voice networks. This means that Internet access, as well as VoIP communications and traditional
telephone communications, are all possible using a single line to each user. This provides
flexibility as an enterprise grows, and can also reduce long-term operation and maintenance costs.
An IP-PBX is a telephone switching system within an enterprise which allows all users in the
enterprise network to share a limited number of external traditional phone lines and which
typically switches calls between two IP phones on the same local network, between IP and TDM
phones or even between two TDM phones, as represented in Figure 3.9.
Figure 3.9 – IP-PBX switching calls
IP-PBXs not only switch calls but, above all, check the access rights of calling
users and authorize them to place calls. Besides, IP-PBXs can collaborate with other IP-PBXs of
the company or with legacy PBXs. This is why an IP-PBX can be seen as a communication interface
between user terminals, VoIP gateways, other IP-PBXs and application servers.
Why is an IP-PBX useful to place a call? To place a VoIP call, the
calling terminal needs to know the IP address and the port number of the called terminal. However,
a calling terminal cannot store the IP addresses of all the terminals it might call (in
particular, the DHCP protocol makes this almost impossible). This is why an IP-PBX is useful: it stores
all telephone addresses along with the IP addresses of terminals and is therefore capable of
mapping a telephone address to the IP address of a terminal. For that, all IP terminals have to
register their telephone
addresses with the IP-PBX. When a VoIP user dials a telephone address, his IP phone forwards
the call to the IP-PBX which tries to resolve the destination telephone address into an IP address
(network address). To achieve this resolution, the IP-PBX may interact with other telephony
servers.
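The registration and resolution steps just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the data structure of a real IP-PBX: the class name, the extension number 3001 and the addresses (taken from a documentation address range) are hypothetical.

```python
class IpPbxRegistry:
    """Toy registration table: telephone address -> (IP address, port)."""

    def __init__(self):
        self._bindings = {}

    def register(self, telephone_address, ip_address, port):
        # A later registration overwrites the earlier one, e.g. after a DHCP change.
        self._bindings[telephone_address] = (ip_address, port)

    def resolve(self, telephone_address):
        # Returns None when the address is unknown; a real IP-PBX might then
        # interact with other telephony servers.
        return self._bindings.get(telephone_address)


registry = IpPbxRegistry()
registry.register("3001", "192.0.2.10", 5060)
registry.register("3001", "192.0.2.57", 5060)  # same phone, new DHCP lease
```

Because every terminal re-registers when its address changes, callers only need to know the telephone address; `registry.resolve("3001")` always returns the most recent binding.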
Functions
To provide telephone features, IP-PBXs have to:
• Maintain dial plans
• Perform phone number translation
• Coordinate the call signalling
• Provide the signalling and control services that coordinate the media gateway functions
• Perform the accounting
A dial plan establishes the expected length (number of digits) and format (country codes,
access codes, area codes…) of a typical, valid telephone number. IP-PBXs support variable-length
dial plans: for example, it is possible to create a local dial plan that handles only local calls
and an international dial plan that allows users to place international calls. International
dial plans must comply with E.164, an ITU-T recommendation which defines the
international public telecommunication numbering plan used in the PSTN and some other data
networks and which also defines the format of telephone numbers. E.164 numbers can have a
maximum of 15 digits and are usually written with a + prefix. To actually dial such numbers from
a normal fixed line phone, the appropriate international call prefix must be used. For users to dial
out, a dialling plan must be set up on the IP-PBX that is aligned with the existing dialling plan on
the traditional PBX. A call’s destination may be local to the IP-PBX (within the enterprise), on
the traditional PBX, or out to the PSTN (outside the enterprise). When a user dials a number that
has the same length and format as a predefined dial plan, the phone can recognize that dialling
pattern and route it to the appropriate SIP address automatically when it is complete.
Basic features of IP-PBXs
• PBX features: speed dial, call menus, call forwarding, call hold, call waiting, music on hold,
3-way conferencing, call park, anonymous call rejection, Calling Line ID (CLID) blocking,
last number redial
• Automated attendant: simultaneous calls, transfer to extensions, transfer to groups, direct
access from outside, distinctive ringing…
• Voicemail and voice mailboxes: password management, user management, forward message to
email, message notification by email, voice mail recording and storage, …
• Administration features: managing extensions (for administrators); branch office support
(remote administration of extensions in other offices); web-based management and administration
Advanced features of IP-PBXs
• Conferencing: multi-way conferencing, Conference Bridge, multiple codecs…
• Find-me-follow-me: selective call forwarding, simultaneous ringing, call path…
• Instant messaging
• Automatic Call Distributor (ACD)
• Group management: ring groups, hunt groups, …
• Administration features: rights management, scalability, …
• Voicemail to email
Table 3.2 – Basic and advanced features of IP-PBXs
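The two E.164 constraints mentioned in the dial plan discussion above (at most 15 digits, usually written with a + prefix) are easy to check mechanically. The following Python sketch is only an illustration: the function name is hypothetical, and the rule that the first digit is never 0 as well as the tolerance for spaces, hyphens and parentheses are assumptions of the example rather than statements from the text.

```python
import re

# "+" followed by 1 to 15 digits, the first of which is not 0 (assumption).
E164_PATTERN = re.compile(r"^\+[1-9]\d{0,14}$")


def is_e164(number: str) -> bool:
    """Check whether a written number looks like a valid E.164 number."""
    # Drop common visual separators before checking the digit count.
    compact = re.sub(r"[\s\-()]", "", number)
    return bool(E164_PATTERN.match(compact))
```

For instance, `is_e164("+49 711 6850")` holds, whereas a number written with a national prefix such as `"0711 6850"` is rejected because it lacks the + prefix and the country code.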
Features
IP-PBXs provide the employees of a company with basic telephony features as well as
with elaborate ones. Basic features of IP-PBXs are the same as those of simple legacy PBXs,
namely call switching, call completion, call connection, call termination and basic accounting.
There is no standard distinction between basic and advanced features of IP-PBXs. Very often,
this distinction is made by vendors but the line between them is unclear and many vendors have
different conceptions about what is basic and what is advanced. Some commonly accepted basic
and advanced features of IP-PBXs are listed in Table 3.2. Obviously, there are many other
advanced IP-PBX features and new features constantly appear since IP-PBXs can be extended
with new modules. Most of these features are described in an IP-PBX Buyer’s Guide 2006 [21].
3.2.1.2. VoIP media/signalling gateway

VoIP media/signalling gateways, or simply VoIP gateways, are telephony endpoints
that facilitate calls between endpoints that usually would not interoperate. They are responsible
for interfacing VoIP communications with the traditional circuit-switched network, i.e. for
connecting the enterprise’s IP-PBX to the external PSTN/ISDN telephony network. They
provide call origination, call detection, analogue-to-digital conversion of voice and creation of
voice packets. In addition, VoIP gateways may provide optional features, such as voice
compression, echo cancellation, silence suppression and statistics gathering. VoIP gateways
encompass two main types of features and services:
• Signalling Gateway (SG) – Signalling Gateways are only responsible for the call control,
which implies the translation of the VoIP signalling protocol of the IP network into the
legacy PSTN or ISDN signalling protocol on the PSTN/ISDN network side. In the case
of the scenarios presented above, the signalling gateway translates the SIP signalling
messages into Q.931 (ISDN) or SS7 ISUP signalling messages.
• Media Gateway (MG) – Media Gateways generally connect different types of networks
and thus, in particular, IP networks and traditional circuit-switched telephone networks.
They convert voice transported on the IP network using digital packet formats to
analogue or digital voice signals on the PSTN/ISDN network and vice versa. One of their
main functions is to convert between the different transmission and coding techniques (codecs)
of the two networks. Media streaming functions such as echo cancellation and
tone sending are also located in the Media Gateway.
These gateway functions can be distributed across different devices in the network or collocated in a
single device. In the previously presented scenarios, a single system provides both the media and
signalling gateway functions and interfaces between the VoIP network and the traditional phone
network: the VoIP media/signalling gateway.
3.2.1.3. Conference Bridge

A Conference Bridge (not represented in the figures of this report) is a server that allows
multiple users to talk to one another. It provides the means to hold 3-point or multi-point
conferences that can be either ad-hoc or scheduled. In an enterprise network, employees who
want to make a conference call with other internal or external employees or external customers
connect via their phones to the enterprise conference bridge. Conference bridges are usually
dedicated servers with special media hardware because they have to meet high resource
requirements.
3.2.2. SIP architecture and its components
Without going into the details of the SIP protocol, which will be explained later in 4.3, I
briefly introduce here the SIP architecture and its components in order to show
in 3.2.3 how the SIP architecture and the IP-PBX VoIP architecture can be combined.
The Session Initiation Protocol (SIP) is a signalling protocol for setting up sessions
between users in a network. SIP allows endpoints on the Internet to discover one another in
order to exchange signalling messages necessary to the establishment of a session between them.
SIP is a multi-service protocol capable of initiating sessions involving different forms of real-time
communications simultaneously. Users in a SIP network are identified by globally unique SIP
addresses, called SIP URLs. A SIP address is similar to an e-mail address and is in the format of
sip:userID@gateway.com. The user ID can be either a username or an E.164 address.
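The address format described above can be made concrete with a small parser. The Python sketch below is deliberately simplified and only covers the plain form sip:userID@host; ports, parameters and other schemes are ignored. The function name and the heuristic used to decide whether the user ID is an E.164 number (an optional + followed by at most 15 digits) are assumptions of the example.

```python
import re

# Simplified pattern for addresses of the form sip:userID@host.
SIP_ADDRESS = re.compile(r"^sip:(?P<user>[^@\s]+)@(?P<host>[^\s;:@]+)$")


def parse_sip_address(address: str) -> dict:
    """Split a simple SIP address into user ID and host, and classify the user ID."""
    match = SIP_ADDRESS.match(address)
    if match is None:
        raise ValueError(f"not a simple SIP address: {address!r}")
    user = match.group("user")
    # Heuristic: a user ID made only of digits (optionally with "+") is treated as E.164.
    kind = "e164" if re.fullmatch(r"\+?\d{1,15}", user) else "username"
    return {"user": user, "host": match.group("host"), "user_kind": kind}
```

Applied to `sip:bus_man@site1.com`, the function reports a username at host site1.com; applied to `sip:+4971112345@gateway.com`, it reports an E.164 user ID.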
Figure 3.10, Figure 3.11 and Figure 3.12 represent briefly the SIP architecture and
different ways of establishing a SIP session for placing a call. These figures show the components
defined by the SIP protocol which can be distinguished into two types: SIP User Agents (UAs),
endpoints of a call, and SIP servers, network devices that handle the signalling of several calls.
There are four different kinds of SIP servers: SIP proxy servers, SIP registration servers, SIP
location servers and SIP redirect servers. The main function of the SIP servers is to provide
name resolution and user location, since callers are unlikely to know the IP address or hostname
of their called parties, and to pass on messages to other servers using next hop routing protocols.
Figure 3.10 – SIP architecture – SIP session through a Proxy Server
SIP User Agents are endpoints which use the SIP protocol to find one another and to
negotiate the characteristics of a session established between them. User Agents are usually
applications running on a user’s computer but can also be IP phones. A User Agent consists of
two functionalities: a User Agent Client (UAC) and a User Agent Server (UAS). User Agent
Clients are used to initiate calls, whereas User Agent Servers respond to call requests. The
UAC and the UAS can be located on the same device such as an IP-phone or be part of the same
telephony application on a user’s computer. SIP calls can be made directly to another UA, or
through a SIP Redirect Server, or through a SIP Proxy Server.
SIP Proxy Servers are fundamental entities in the SIP architecture, since the SIP
architecture can be seen as a network of SIP Proxy Servers. They route session invitations (called
INVITE messages) according to the User Agent Server’s current location and can also provide
functions such as authentication, authorization, accounting, network access control, reliable
request retransmission, and security.
A SIP Proxy Server handles SIP invitation requests emitted by User Agent Clients. It
intercepts these requests and contacts a Location Server to resolve the User Agent Server’s SIP
address into a network address. If the User Agent Server has moved and is registered with a SIP
Registrar in a different domain, the Location Server will only return the address or domain name
of the next SIP Proxy Server to contact. The requesting Proxy Server will then forward the
invitation request to the SIP proxy server indicated by the Location Server, which will forward it
to the next Proxy Server and so on until this invitation request reaches a SIP Proxy Server which
knows the current actual location of the UAS. The last SIP Proxy Server will then forward the
request directly to the User Agent Server, which will accept or decline the session invitation.
Whatever its answer, the UAS sends a response message which travels back through all the
SIP Proxy Servers traversed by the corresponding invitation request.
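The hop-by-hop forwarding described above can be summarised in a short Python sketch. It models each Location Server as a dictionary that returns either the next proxy to contact or the final network address of the UAS. The proxy host names, the IP address and the data layout are hypothetical; only the two SIP addresses are taken from the scenario of this chapter.

```python
def route_invite(callee, first_proxy, location_servers):
    """Follow Location Server answers from proxy to proxy until the UAS is found.

    Returns the request path, the UAS network address and the response path.
    An unknown proxy or callee raises KeyError in this simplified model.
    """
    path = []
    proxy = first_proxy
    while True:
        if proxy in path:
            raise RuntimeError("routing loop detected")
        path.append(proxy)
        kind, target = location_servers[proxy][callee]
        if kind == "ua":
            # The response traverses the same proxies in reverse order.
            return path, target, list(reversed(path))
        proxy = target  # kind == "proxy": forward to the next Proxy Server


# Hypothetical Location Server contents, one table per proxy.
location_servers = {
    "proxy.site1.com": {"sip:it_man@site2.com": ("proxy", "proxy.site2.com")},
    "proxy.site2.com": {"sip:it_man@site2.com": ("ua", "192.0.2.20")},
}

request_path, uas_address, response_path = route_invite(
    "sip:it_man@site2.com", "proxy.site1.com", location_servers
)
```

In this example the invitation goes through proxy.site1.com and then proxy.site2.com, and the response returns through proxy.site2.com and then proxy.site1.com.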
SIP Proxy Servers can have two operating modes: stateless and stateful. A stateless
Proxy Server forgets all the information about a session once it has forwarded an invitation
request whereas a stateful Proxy Server remembers the incoming requests it receives, along with
the responses it sends back and the outgoing requests it sends on. Stateless Proxy Servers are
likely to be the backbone of the SIP architecture while stateful Proxy Servers are likely to be the
local devices close to the User Agents, controlling domains of users. There are also two
kinds of stateful Proxy Servers: Transaction Stateful Proxy Servers, which maintain the state of a
SIP transaction but may not see all transactions associated with a session (e.g. BYE messages),
and Call Stateful Proxy Servers, which are in the call path from the setup to the tear-down of a
session and which can provide information about the call to users or service providers.
There is another way to classify SIP Proxy Servers: they can be outbound proxies,
which are used by a UAC to route an outgoing request; inbound proxies,
which receive incoming requests; or forward proxies, i.e. intermediate
proxies.
SIP Redirect Servers perform the same resolution function as SIP Proxy Servers; this
means that when they receive a SIP invitation request, they ask a Location Server to resolve the
User Agent Server’s SIP address into an IP address, obtain the UAS’s IP address and return this
information in a Redirection Message to the original User Agent Client. Then, the User Agent
Client must send its invitation request directly to the resolved address.
SIP Redirect Servers and SIP Proxy Servers may be integrated. According to RFC 3261 [34], a
Redirect Server is a server that accepts a SIP request, maps the address into zero or more new
addresses and returns these addresses to the client. Unlike a proxy server, it does not generate SIP
requests on behalf of UAs, and it does not accept calls. Redirect Servers are used to reduce the
processing load on Proxy Servers that are responsible for routing requests, and to improve signalling
path robustness, by relying on redirection.
A SIP registration server, or SIP Registrar, is a SIP entity which receives registration
requests from User Agents present in its domain. From these registration requests, the SIP
Registrar extracts information about the current location of UAs such as their IP address, the
port and the username (or User ID of the SIP address) and provides it, upon request, to a
Location Server. The SIP Registrar is consulted by SIP Proxy Servers or SIP Redirect Servers so
that they can route SIP requests correctly. In order to increase the speed of processing, SIP Registrars
are often collocated with SIP Redirect or SIP Proxy Servers.
A SIP location server stores information about User Agents (IP address, port and
username) which has been supplied by multiple SIP registration servers. The SIP Registration
Servers are not the only sources of information of Location Servers [von2001-sip-location-
servers]. Besides, SIP Proxy Servers can consult DNS Servers (SRV Records) instead of SIP
Location Servers to locate inbound proxies.
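The interplay between Registrar, Location Server and Redirect Server described above can be modelled as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, bindings are held in memory, and the IP address used in the usage example is a documentation address.

```python
class LocationService:
    """Stores bindings between a SIP address and current contact addresses."""

    def __init__(self):
        self._bindings = {}  # SIP address -> list of contact URIs

    def add_binding(self, sip_address, contact):
        contacts = self._bindings.setdefault(sip_address, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, sip_address):
        return list(self._bindings.get(sip_address, []))


class Registrar:
    """Extracts username, IP address and port from a registration."""

    def __init__(self, location):
        self.location = location

    def handle_register(self, sip_address, ip, port):
        user = sip_address.split(":", 1)[1].split("@", 1)[0]
        self.location.add_binding(sip_address, f"sip:{user}@{ip}:{port}")
        return 200


class RedirectServer:
    """Answers an invitation with the resolved addresses instead of forwarding it."""

    def __init__(self, location):
        self.location = location

    def handle_invite(self, request_uri):
        contacts = self.location.lookup(request_uri)
        status = 302 if contacts else 404
        return {"status": status, "contacts": contacts}
```

For example, after `Registrar(loc).handle_register("sip:it_man@site2.com", "192.0.2.10", 5060)`, a Redirect Server sharing the same `loc` answers an invitation for that address with a 302 response listing the registered contact.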
It is important to note that SIP components are often only logical entities, which
explains why several of them are frequently collocated, in particular to increase the speed
of processing.
SIP operation
The above Figure 3.10 shows also the transactions between the different SIP components
when a SIP session is established through a SIP Proxy Server. More precisely, the User Agent
Client of a Business Manager (sip:bus_man@site1.com) sends an INVITE message (1) destined
to an IT Manager (sip:it_man@site2.com); this invitation request is intercepted by the Proxy
Server that serves the Business Manager’s domain. The Proxy Server queries the Location
Server of its network domain (2) for the domain in which the IT Manager is currently located.
Since the IT Manager is not registered in this domain and the IP address of the UAS does
not appear in the location database, the Location Server returns (3) the IP address of the next
proxy to the Proxy Server. The Proxy Server then forwards (4) the invitation request to the next
Proxy Server. The next Proxy Server does the same as its predecessor and asks a Location Server (5). It
then forwards the invitation request (7) according to the response sent by the Location Server (6).
The next Proxy Server, which is collocated with a SIP Registration Server, knows the actual
current location of the IT Manager and forwards the invitation request to the IT Manager’s UAS
(8). It is assumed that the UAS accepts the invitation request for a session establishment. Then,
the response message takes the same path as the request message in the reverse direction (9, 10, 11 and 12).
Upon receipt of the response message, the Business Manager’s UAC sends an acknowledgement
message (ACK), which is not represented in Figure 3.10, to the IT
Manager’s UAS passing through the same path. Then, a session is initiated and a media stream
can flow between the Business Manager and the IT Manager (13).
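The reason why the response retraces the path of the request can be illustrated with the Via headers that SIP entities add to a request. The sketch below is a simplified model: the host names are hypothetical and a plain Python list stands in for the stack of Via headers.

```python
def forward_request(via_stack, hop):
    """Each hop that forwards the request puts its own address on top."""
    return [hop] + via_stack


def response_path(via_stack):
    """Return the hops visited by the response, in order."""
    hops = []
    remaining = list(via_stack)
    while remaining:
        hops.append(remaining[0])   # the response is sent to the topmost Via
        remaining = remaining[1:]   # that hop strips its own Via before forwarding
    return hops


# The UAC adds the first Via; three Proxy Servers then forward the INVITE.
vias = ["uac.site1.example"]
for proxy in ["proxy1.example", "proxy2.example", "proxy3.example"]:
    vias = forward_request(vias, proxy)
```

Calling `response_path(vias)` then lists the three proxies in reverse order of traversal, followed by the UAC.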
Figure 3.11 – SIP architecture & SIP session establishment through a SIP redirect server
In contrast, the above Figure 3.11 shows the transactions between the different SIP
components when a SIP session is established through a SIP Redirect Server. The User Agent
Client of the Business Manager sends an INVITE message (1) destined to the IT Manager
(sip:it_man@site2.com); this invitation request is intercepted by the SIP Redirect Server that
serves the Business Manager’s domain. The SIP Redirect Server contacts (2) the Location Server
to determine the path to the IT Manager’s User Agent Server, and then the SIP Redirect Server
sends (4) that information back to the Business Manager’s UAC. The Business Manager’s UAC
then acknowledges receipt of the information (this acknowledgement message has not been
illustrated). The Business Manager’s UAC then sends a request (5) to the device indicated in the
redirection information (which could be the IT Manager’s UAS or another SIP Proxy Server that
will forward the request). Once the request reaches the IT Manager’s UAS, the IT Manager’s
UAS sends back a response message (6) and the Business Manager’s UAC acknowledges the
response (not represented). The communication session between the Business Manager and the
IT Manager is established. During the process of locating a SIP UAS, a SIP Proxy Server can
simply forward an invitation request to other Proxy Servers until it arrives at one that knows the
IP address where the called user can be found or it can consult a SIP Redirect Server which will
redirect an invitation request to the SIP Proxy Server originating this request. This case is shown
in Figure 3.12.
Figure 3.12 – SIP architecture & SIP session establishment through SIP Proxy servers and Redirect servers
In Figure 3.12, the Business Manager, with the sip:bus_man@site1.com SIP URI and
residing in the site1.com domain, starts his SIP User Agent Client on his computer and sends an
INVITE message (1) destined to sip:it_man@site1.com which is the SIP URL of an IT Manager
who usually resides within the same domain as the Business Manager. This message is intercepted
by the local Proxy Server of his domain which in turn sends (2) the INVITE message to a local
SIP Redirect Server to try and identify the current address of sip:it_man@site1.com. The
Redirect Server determines that the User Agent Server does not presently reside within the local
domain but can be found in another domain, the site2.com domain. The Redirect Server returns
(3) this information to the Proxy Server in a 302 “Moved Temporarily” response which lists the
new address to try for the IT Manager as sip:it_man@site2.com. This message represents a final
response to the INVITE, so the Proxy Server acknowledges this response (not represented).
Then the Proxy Server has a choice: it can either return the 302 response directly to the Business
Manager for him to try or it can try the suggested location itself on the Business Manager's
behalf. In this figure, the Proxy Server attempts to locate sip:it_man@site2.com by modifying the
original INVITE message and sending it on. As this Proxy Server does not know any other Proxy
Server that controls the domain site2.com, it routes (4) the invitation request to a so-called
stateless Proxy Server. This Proxy Server sends (5) a request to a Location Server to get the
address of the next Proxy Server to which it should route the invitation request. The Location
Server replies (6) by a response message containing the address of the next Proxy Server. The
invitation request is then sent (7) to this next Proxy Server. The next Proxy Server does control
the domain site2.com (this is a stateful Proxy Server). This last Proxy Server then locates the IT
Manager and sends (8) the INVITE message to sip:it_man@site2.com. It is supposed that the IT
Manager accepts the invitation request by responding (9) with a 200 OK message. This response
message follows the same path that has been taken by the invitation request back to the Business
Manager. Upon receipt of the 200 OK message, the Business Manager acknowledges by sending
an ACK message (not represented). This ACK message could be sent directly to the IT Manager
but, in this figure, there were two stateful Proxy Servers which have routed the INVITE request
and which want to remain in the signalling path for the duration of the session. As a
consequence, the ACK message is routed through both of these proxies, as will any subsequent
SIP messages related to this session, such as those for call tear down (BYE messages).
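The choice that the Proxy Server faces on receipt of the 302 response can be sketched as a small decision function. The function name, the `recurse` flag and the dictionary-based messages are assumptions made for illustration only.

```python
def handle_final_response(response, recurse):
    """Decide what a Proxy Server does with a final response to an INVITE.

    Returns ("retry", new_request) when the proxy tries the new address itself,
    or ("relay", response) when the response is passed back to the caller.
    """
    if response["status"] == 302 and recurse and response["contacts"]:
        new_request = {"method": "INVITE", "request_uri": response["contacts"][0]}
        return "retry", new_request
    return "relay", response


redirect = {"status": 302, "contacts": ["sip:it_man@site2.com"]}
action, message = handle_final_response(redirect, recurse=True)
```

With `recurse=True`, as in Figure 3.12, the proxy issues a modified INVITE towards sip:it_man@site2.com; with `recurse=False`, the 302 response would be returned to the Business Manager's UAC.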
Now that the SIP architecture has been presented, it will be interesting to see how the
VoIP architecture and the SIP architecture can work together.
3.2.3. VoIP architecture with IP-PBX using SIP

The main difference between a VoIP IP-PBX architecture and a SIP architecture is that
the former is a centralized client-server architecture with the IP-PBX at its centre whereas the
latter is peer-to-peer. A centralized architecture is the best suited to enterprise networks
since it is necessary to control communications, to keep call and billing records and to control
calls to be placed over the PSTN/ISDN network. Besides, the components of the two
architectures differ, as do their roles and their addressing systems: whereas IP-PBX
architectures work with numbering plans, SIP works with URLs. How can both architectures be combined?
In paper [13], it is suggested that in order to deploy the VoIP architecture with IP-PBX
using SIP, the SIP logical entities Proxy Server, Redirect Server and Registration Server should be
collocated and integrated in an IP-PBX core. That way, the IP-PBX would become the outbound
Proxy Server and the Registrar Server of all extensions in the enterprise network.
Figure 3.13 illustrates the combination of a VoIP architecture with IP-PBX and a SIP
architecture. An IP-PBX core encompasses the following SIP entities: Redirect Server, Proxy
Server and Registrar Server. The Business Manager wants to call the HR Manager
(sip:hr_man@site1.com) who is located in the same SIP domain; his SIP User Agent Client
initiates a session by sending (1) an INVITE message to the SIP Proxy which is now an
embedded function of the IP-PBX core. The IP-PBX consults a Location Server to resolve the
SIP URL sip:hr_man@site1.com and once it gets a response containing the corresponding IP
address (not represented), it forwards (2) the message to the User Agent Server of the HR
Manager. The HR Manager’s UAS sends (3) a 200 OK message to accept the call, which passes
through the IP-PBX and is forwarded (4) to the Business Manager’s UAC. The latter
acknowledges the message (not represented) and the call session is initiated.
Figure 3.13 – VoIP architecture with IP-PBX using SIP
However, there are some problems with this IP-PBX core structure. If the Proxy Server
is stateless, then when the call is torn down by one of the parties, the SIP BYE message
bypasses the SIP Proxy Server (see Figure 3.14), which is then unable to perform accounting and
billing functions, for example. A Call Stateful Proxy Server is one way to
solve this problem. Call Stateful Proxy Servers insert a Record-Route header into requests; this
header is used to force all requests between UAs to be routed through the Proxy Server (Figure
3.15). The Record-Route header is then used by UAs to route subsequent requests. The Request-
URI is set to the first Route header.
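The effect of the Record-Route header on the BYE message can be sketched as follows. This is a simplified model under stated assumptions: the function names and the proxy URI are hypothetical, and the route set is reduced to a list whose first entry is the next hop.

```python
def add_record_route(request, proxy_uri):
    """A Call Stateful Proxy Server records itself in the forwarded request."""
    updated = dict(request)
    updated["record_route"] = [proxy_uri] + request.get("record_route", [])
    return updated


def next_hop_for_bye(route_set, remote_contact):
    """Without a route set the BYE goes straight to the peer (Figure 3.14);
    otherwise it goes to the first recorded proxy (Figure 3.15)."""
    return route_set[0] if route_set else remote_contact


invite = {"method": "INVITE", "request_uri": "sip:hr_man@site1.com"}
peer = "sip:hr_man@192.0.2.20"

# Stateless proxy: nothing is recorded, so the BYE bypasses the proxy.
direct = next_hop_for_bye(invite.get("record_route", []), peer)

# Call stateful proxy: the BYE is routed through the recorded proxy.
recorded = add_record_route(invite, "sip:ippbx.site1.example")
via_proxy = next_hop_for_bye(recorded["record_route"], peer)
```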
Figure 3.14 – SIP call flow with a stateless Proxy Server
Figure 3.15 – SIP call flow with a Call Stateful Proxy Server
The paper [13] also suggests that instead of a Call Stateful Proxy Server, an IP-PBX can
use a Back-to-Back User Agent (B2BUA) entity.
A Back-To-Back User Agent (B2BUA) is a SIP logical entity which acts as a User Agent
to both end users of a SIP call. It receives requests as a User Agent Server (UAS) and in order to
respond to them, it acts as a User Agent Client (UAC) and generates requests. SIP
proxy servers are intermediary servers responsible for forwarding SIP requests to a
destination UAS on behalf of the UAC, i.e. they only handle the routing of SIP signalling and
do not initiate SIP messages (INVITE or BYE messages). In contrast, B2BUAs initiate SIP requests to create
or disconnect calls and can modify message or body contents. B2BUAs participate actively in all
calls from their establishment to their tear-down since all signalling messages pass through them.
They are responsible for handling all SIP signalling and maintain complete call state. Each call is
tracked from beginning to end, allowing the operators of the B2BUA to offer value-added
features to the call. B2BUAs may provide the following functionalities:
• Centralized call management (billing, automatic call disconnection, call transfer…). This
feature gives the network operator tight call management and is particularly important
when the SIP platform is administering PBX functionality, as well as general call control
such as automatic disconnect of calls (e.g., if the caller runs out of pre-paid minutes) or
call modification (e.g., changing codecs).
• Interworking with alternative networks by actively processing signalling and message
bodies throughout the duration of the call. The B2BUA may act as a bridge between SIP
and H.323 networks, where the B2BUA can convert SIP signalling to H.323 signalling,
and vice versa, an important feature for service providers with next generation networks
who are now having to support both types of IP end points and signalling, SIP and H.323
• Hiding of network internals (private addresses, network topology…)
A B2BUA is used in a wide array of SIP-based products to provide varying functionalities. For
example, a B2BUA can be used in IP-PBXs to provide features like Call Transfer or Music on
Hold. The wide range of features a B2BUA can implement can be found in [24].
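The way a B2BUA couples two separate call legs can be sketched as below. The class, the naming of the outbound Call-IDs and the dictionary-based messages are assumptions made for illustration; a real B2BUA also handles responses, media negotiation and error cases.

```python
import itertools


class B2BUA:
    """Terminates the caller's leg as a UAS and opens a new leg as a UAC."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.calls = {}  # inbound Call-ID -> outbound Call-ID

    def on_invite(self, invite):
        out_id = f"b2bua-leg-{next(self._counter)}"
        self.calls[invite["call_id"]] = out_id
        # A new INVITE is generated rather than the original being forwarded.
        return {"method": "INVITE", "call_id": out_id,
                "from": invite["from"], "to": invite["to"]}

    def on_bye(self, bye):
        out_id = self.calls.pop(bye["call_id"], None)
        if out_id is None:
            return None  # unknown call: nothing to tear down
        return {"method": "BYE", "call_id": out_id}
```

Because every call is present in `calls` from setup to tear-down, the B2BUA always knows which calls are active, which is what makes centralized functions such as billing possible.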
Figure 3.16 – IP-PBX with B2BUA architecture
Figure 3.16 illustrates the case of an IP-PBX with a B2BUA entity and the exchange of
messages between two UAs located on the same site. The Business Manager’s UAC sends (1) an
INVITE message to the IP-PBX core. The IP-PBX core’s UAC generates and sends (2) a new
INVITE message to the HR Manager’s UAS; the latter sends (3) after a while an OK message
received by the IP-PBX’s UAS. The IP-PBX core’s UAC generates and sends (4) a new OK
message to the Business Manager’s UAS. Once the session is established, voice can be
transmitted (5). The call tear-down is done when the Business Manager’s UAC sends (6) to the
IP-PBX’s core a BYE message; then, the IP-PBX’s UAC generates and sends (7) a new BYE
message to the HR Manager’s UAS. The latter will then acknowledge (8) and upon receipt of the
ACK message by the IP-PBX’s UAS, the IP-PBX’s UAC sends (9) in turn the ACK message to
the Business Manager’s UAS. Asterisk is a well-known example of a Back-to-Back User Agent.
3.2.4. VoIP architecture using the Asterisk server

These last few years have been marked by a significant proliferation of open source VoIP and
PBX software. Asterisk is one of the most interesting open source telephony platforms and has
started to be adopted massively by many enterprises.
Asterisk (www.asterisk.org), also called Asterisk server, is open source software which has
multiple telecommunication functions.
First of all, it is an implementation of a legacy PBX as well as an implementation of an IP-PBX.
This means that the Asterisk server can manage calls placed by analogue phones which are connected
directly to it, as well as calls placed by IP phones also directly connected to it. To manage these calls, Asterisk
must be configured to have a dial plan. Asterisk supports many communication protocols,
namely SIP, H.323, MGCP (Media Gateway Control Protocol) and even some limited proprietary
Cisco SCCP (Skinny Client Control Protocol) support. It has the particularity to have its own
communication protocol, the Inter-Asterisk Exchange (IAX) protocol [26]. This protocol
combines both control and media services, contrary to SIP which is only a signalling protocol.
IAX offers the attractive characteristics of the SIP protocol but, unlike SIP,
which uses one port for signalling and one port for voice transmission (out-of-band RTP stream),
IAX uses a single UDP port (usually 4569) for both signalling and voice transmission (in-band media
stream). This explains why IAX has fewer difficulties with firewall traversal and NAT. The IAX
protocol enables VoIP connections between Asterisk servers and between servers and clients
that also use the IAX protocol. Another benefit of IAX is its trunking capability. When trunking,
data from multiple calls are merged into a single set of packets, meaning that one IP packet can
deliver information for more than one call, reducing the effective IP overhead without creating
additional latency. This is a big advantage for VoIP, where IP headers account for a large percentage
of the bandwidth usage.
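The bandwidth argument for trunking can be made concrete with a small calculation. In the sketch below, the 28 bytes correspond to an IPv4 header (20 bytes) plus a UDP header (8 bytes), and the 20-byte payload corresponds to 20 ms of G.729 speech; the per-call header size and the trunk header size are illustrative placeholders and are not taken from the IAX specification.

```python
def ip_udp_share(calls, trunked, payload=20, ip_udp=28, call_hdr=4, trunk_hdr=8):
    """Fraction of transmitted bytes spent on IP/UDP headers.

    call_hdr and trunk_hdr are assumed values chosen for illustration only.
    """
    if calls < 1:
        raise ValueError("at least one call is required")
    if trunked:
        # One IP packet carries one frame per call.
        total = ip_udp + trunk_hdr + calls * (call_hdr + payload)
        headers = ip_udp
    else:
        # Every call is sent in its own IP packet.
        total = calls * (ip_udp + call_hdr + payload)
        headers = calls * ip_udp
    return headers / total


untrunked = ip_udp_share(10, trunked=False)
trunked = ip_udp_share(10, trunked=True)
```

Under these assumptions, the IP/UDP headers represent more than half of the bytes sent without trunking, and a much smaller share once ten calls are carried in one packet.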
Asterisk can serve as a VoIP signalling server, in particular as a SIP server: it is a SIP
Registration server, a SIP Location server and a SIP Back-to-Back User Agent (B2BUA) entity
[27]. Because it acts as a UA endpoint, it cannot be considered a SIP Proxy Server. Asterisk
controls a call from an IP phone to another IP phone by acting as the endpoint UAS of the call
originated by the caller’s UAC, and then, as a UAC, it creates a new call to the receiving phone.
Therefore, it can maintain call state and control the call. The RTP streams may go directly from
phone to phone or may go through Asterisk's media bridge.
Asterisk can also be a VoIP media/signalling gateway, allowing VoIP systems to
interface with analogue phones, faxes, PSTN and ISDN lines…
Asterisk also constitutes a platform for many services: for example, it is an Interactive
Voice Response (IVR) platform, i.e. a platform supporting a technology that automates
interaction with telephone callers and which is a way for enterprises to reduce the cost of service,
support and similar calls to and from their company. Asterisk includes many features like Voice Mail
services with directory, conference calling, and automatic call distribution.
Asterisk has become so popular that enterprises which decide to deploy a
VoIP system on their premises increasingly install Asterisk servers. To benefit from this surge in
demand, Digium, the company which has developed Asterisk, has released an Asterisk for
Business package.
Let’s now see briefly, based on [28], how it is possible to deploy a VoIP system using
Asterisk servers in a small single-site enterprise and in a large multi-site enterprise.
In the scenario of Figure 3.17, a small single-site enterprise has decided to install an
Asterisk server serving as IP-PBX and as ISDN Gateway. To attach ordinary telephones to an
Asterisk server, or to connect to PSTN trunk lines, the server must be fitted with special
hardware. Digium and a number of other companies sell PCI cards to attach telephones,
telephone lines, T1 and E1 lines, and other analogue and digital phone services to a server.
The Asterisk server is connected to the ISDN network by adding a PRI ISDN card (for example,
the TE110P of Digium, allowing 30 lines).
In the scenario of Figure 3.18, a large multi-site enterprise has decided to
install an Asterisk server acting as IP-PBX, ISDN and PSTN gateways in each of its branches.
The Asterisk server, installed in the headquarters (Asterisk 1), acts as server for the IP phones
and softphones of the headquarters, as well as for the Asterisk server installed in the small remote
branch. Asterisk 2 is located in the small remote office’s network and acts as a server for the
local IP phones and softphones and as a User Agent for Asterisk 1.
Figure 3.17 – VoIP deployment using Asterisk servers in a small single-site enterprise
Figure 3.18 – VoIP deployment using Asterisk servers in a large multi-site enterprise
3.3. User Access infrastructure: IP phones
User access infrastructure includes terminal devices and access paths through which users
make use of the VoIP service in enterprises. Users may access the VoIP system of the enterprise
from inside the enterprise, i.e. from the intranet or from outside the enterprise, i.e. the Internet.
Within the enterprise, generally users can access the VoIP infrastructure by having access to the
enterprise LAN either by a wired connection or a wireless connection. From outside the
enterprise premises, users can connect to the VoIP infrastructure via a remote access system like
virtual paths (IPsec tunnels, SIP Trunking…).
The main terminal devices in VoIP architecture are the so-called IP phones and
softphones. Of course, in VoIP architecture it is also possible to find analogue or ISDN phones
connected to enterprise LANs via adaptors (see Figure 3.3).
IP phones, also called hardphones, are specialized phones having the appearance of
normal phones and being able to connect directly to the IP network of the enterprise. They have
their own operating system with a TCP/IP stack, VoIP services and VoIP protocols (SIP,
RTP…). IP phones include a variety of network connectivity solutions such as Ethernet, IrDA,
Bluetooth and WiFi (802.11a/b/g) access. This is why there are several types of hardphones:
standalone Ethernet hardphones supporting only voice, cordless hardphones, WiFi phones,
hardphones supporting voice and video…
Standalone Ethernet hardphones are the most widespread IP phones and generally the
best solution for VoIP telephony in enterprises. Instead of having a conventional RJ-11 phone
connector, like classical TDM phones, they have an RJ-45 Ethernet connector allowing them to
directly communicate with a VoIP server, a VoIP gateway or another IP phone. Since they can
communicate with these devices, they can make and receive VoIP calls.
Nowadays, WiFi IP phones, also called WLAN IP phones, are already being
manufactured. WiFi IP phones are hardphones with a built-in WiFi transceiver unit instead of an
Ethernet port to connect to a WiFi base station and from there to a remote VoIP server. That
way, it is possible to make and receive VoIP phone calls from a wireless phone. For WiFi phones
to have access to the VoIP system of an enterprise, the enterprise must have several WiFi access
points.
Softphones are software implementing IP phones. A softphone is usually installed and
running on a computer and requires appropriate audio hardware to be present on the computer
it runs on, like a sound card with speakers or earphones and a microphone. Softphones, although
cheaper or even available as a free download, are not the best solution for enterprises because
employees must always be reachable or, at least, their phones must be able to record voice messages when they are
absent; but softphones work only when the employees have switched their computer on.
However, softphones are the best solution for mobile employees working on their laptops on the
road. Some softphones support only voice and others voice coupled with video.
Figure 3.19 – Initialization process of Cisco IP phones (source: [23])
Since this master thesis focuses on the SIP protocol, only SIP phones and SIP softphones
will be considered. SIP phones and softphones communicate with VoIP servers, gateways or
phones with SIP messages, i.e. they support the SIP protocol. From the SIP architecture point of
view, SIP phones and softphones have the same role: they are SIP clients or SIP User Agents.
(See definition in 2.3.2.2). SIP phones and softphones can initiate SIP requests or respond to SIP
requests, respectively acting as User Agent Clients or User Agent Servers. VoIP media/signalling
gateways act also as SIP User Agents to IP-PBXs, since they are the entities in the IP network
and VoIP system which communicate with SIP messages on behalf of analogue or digital TDM
phones.
For a later analysis of threats to IP phones, it is necessary to understand the initialization
process, i.e. how IP phones interact with other VoIP components as soon as they are connected
and how they initialize. The initialization phase of IP phones differs from manufacturer to
manufacturer; however the basic steps are the same. As illustrated in Figure 3.20, once plugged
in, the IP phone requests an IP address from the DHCP server responsible for the voice VLAN.
The DHCP server gives an IP address to the IP phone and additional information like the
address of the TFTP server and/or IP-PBX. Then, the phone connects to the TFTP server and
receives configuration information and/or updated firmware. In the next step, the IP phone
connects to the IP-PBX and registers with it. The IP-PBX configures the IP phone’s operational
features like the phone number and others. When a phone has registered with the IP-PBX, it is
then operational. The main difference between initialization processes implemented by different
manufacturers is about how an IP phone can find and contact the DHCP server of the voice VLAN. In
Figure 3.19, the initialization process of Cisco’s IP phones shows that IP phones and switch
exchange Cisco Discovery Protocol (CDP) data. CDP is a Cisco proprietary protocol which
allows interconnected Cisco devices to learn about each other. The switch uses CDP to tell the
IP phone which VLAN is used for voice traffic. Once the IP phone knows which VLAN is the voice VLAN,
it tags all its data frames with the appropriate 802.1q tag and so it can request an IP address from
the DHCP server of the voice VLAN. However, other manufacturers like Avaya or Nortel have
designed their IP phones to first request an IP address from the data DHCP server. The
DHCP server then gives the IP phone an IP address as well as configuration information
including the VLAN to use. On receipt of this data, the IP phone reboots and sends its Ethernet
frames with the right VLAN tag. Differences between the initialization processes of these
manufacturers’ products are well described in [23].
The knowledge of the initialization processes of IP phones will be useful later in
the detection of possible security vulnerabilities.
Figure 3.20 – Process flow of initialization process of IP phones
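The initialization steps described above, including the two ways of discovering the voice VLAN, can be summarized in a short sketch. The function name, the `"cdp"` and `"dhcp"` labels and the wording of the steps are assumptions made for illustration; actual products differ in detail.

```python
def init_steps(vlan_discovery):
    """Return the ordered initialization steps of an IP phone.

    vlan_discovery is "cdp" (the switch announces the voice VLAN) or
    "dhcp" (the data DHCP server announces it and the phone reboots).
    """
    if vlan_discovery == "cdp":
        steps = ["learn voice VLAN from the switch via CDP"]
    elif vlan_discovery == "dhcp":
        steps = [
            "request IP address from the data DHCP server",
            "receive configuration including the voice VLAN",
            "reboot and tag frames with the voice VLAN",
        ]
    else:
        raise ValueError(f"unknown discovery method: {vlan_discovery}")
    steps += [
        "request IP address from the voice VLAN DHCP server",
        "receive IP address and TFTP server / IP-PBX addresses",
        "download configuration and firmware from the TFTP server",
        "register with the IP-PBX",
        "operational",
    ]
    return steps
```

Each step names a component the phone depends on (switch, DHCP server, TFTP server, IP-PBX), which is the information needed for the later threat analysis.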
3.4. Management system infrastructure
A management system infrastructure provides the VoIP management functions. It
enables the system administrators of the enterprise to configure, customize and maintain every
entity of the VoIP system. All entities of a VoIP system like IP-PBXs, application servers, VoIP
gateways and others are monitored by a manager deployed in a Network Operation Centre
(NOC) which analyzes information gathered by sensors embedded in each of these devices. That
way, system administrators can manage VoIP components remotely; the easy management of
data collected and analyzed by sensors can be performed through the use of a web interface.
Remote management through a web interface can introduce additional security vulnerabilities
in a VoIP system.
An increasing number of VoIP management tools and solutions have emerged and
enterprises have a wide choice.
3.5. Introduction to security threats to VoIP

The previous paragraphs of this chapter aimed at introducing the infrastructure and the
deployment of SIP-based VoIP systems in enterprise networks. In recent years, VoIP deployment
has boomed and enterprises have increasingly adopted it. However, this
development is held back by security concerns. Indeed, VoIP introduces new security risks in
enterprises and new vulnerabilities to be exploited by attackers. Security concerns are thus
the main reason why enterprises hesitate to deploy VoIP.
The current paragraph aims at introducing the notion of VoIP security which is the core
subject of this master thesis and which will be deeply analyzed in Chapter 5 and gives an example
of a particular security attack on VoIP systems: ARP poisoning (also called ARP spoofing).
Let’s see how employees can listen to the VoIP conversations of other employees in the
same enterprise. Of course, conversations may be trivial, but they may also contain critical
information for the enterprise, which is why such a simple attack can have severe consequences for
the enterprise. Employees can listen to VoIP calls between two other employees who use phones
connected to the IP network by performing an ARP Poisoning attack also called ARP Poison
Routing (APR), or even ARP Spoofing attack; this kind of attack is made possible by the use of
program tools, such as Cain & Abel, available on the Internet as free download. Thus it is an
attack easy to perform. The ARP poisoning attack exploits a vulnerability of the Address
Resolution Protocol (ARP) and it can give way to another type of attack: the Man-in-the-Middle
attack (MITM). A MITM attack occurs when a third party, the attacker, poses as the other
party in a communication which allows an attacker to monitor, record, obstruct, or modify
passing information.
On an Ethernet-based IP network, when a host A wants to communicate with a host B
whose IP address (Layer 3 address) is known but MAC address (Layer 2 address) is unknown to
it (i.e. the MAC address is not stored in the ARP cache table of A), it broadcasts an ARP request
asking for the MAC address of host B given its IP address. This ARP request is received by all
hosts on the same segment as host A. If host B is on the same segment as host A, it is the only
one which will reply by sending a unicast ARP response including its MAC address and its IP
address. On receipt of the ARP response, host A will store in its ARP cache table the association:
MAC address of B ↔ IP address of B. Then, every time that host A will wish to send IP packets
to host B, it will first consult its ARP cache table and then insert the MAC address of host B that
it has found in the cache, as destination address in the Ethernet frames. This ARP cache table
will be used for future communications, until information is outdated.
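The resolution procedure with its cache and timeout can be modelled in a few lines. This is an in-memory sketch only: the class and function names, the 60-second lifetime and the `broadcast_request` callback (which stands in for the broadcast ARP request and the unicast reply) are assumptions made for the example.

```python
class ArpCache:
    """Maps IP addresses to MAC addresses and forgets outdated entries."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._entries = {}  # ip -> (mac, time stored)

    def store(self, ip, mac, now):
        self._entries[ip] = (mac, now)

    def lookup(self, ip, now):
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, stored_at = entry
        if now - stored_at > self.ttl:
            del self._entries[ip]  # information is outdated
            return None
        return mac


def resolve(cache, ip, now, broadcast_request):
    """Consult the cache first; ask the segment only on a miss."""
    mac = cache.lookup(ip, now)
    if mac is None:
        mac = broadcast_request(ip)
        if mac is not None:
            cache.store(ip, mac, now)
    return mac
```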
The vulnerability exploited in ARP spoofing is that the legitimacy of ARP Replies
is not checked, because there is no authentication between hosts for ARP
Requests/Replies. Besides, since ARP is a stateless protocol, a host cannot know if the ARP
Reply that it receives is the reply to an ARP Request it has previously sent. The idea behind ARP
poisoning is to force the victim hosts to update their ARP cache tables with false MAC address
information. To achieve that, the attacker sends ARP Replies with a false MAC address to victim
hosts, which immediately update their ARP cache tables, without checking whether they
had previously sent ARP Requests corresponding to these ARP Replies.
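The missing check can be illustrated with a purely in-memory model that involves no network traffic. `NaiveHost` behaves as described above; `CheckingHost` is a hypothetical variant, not taken from the sources of this thesis, which only accepts replies to requests it has actually sent. Both class names and all addresses are assumptions made for the example.

```python
class NaiveHost:
    """Updates its ARP cache on every reply, solicited or not."""

    def __init__(self):
        self.arp_cache = {}

    def on_arp_reply(self, ip, mac):
        self.arp_cache[ip] = mac  # no check against earlier requests
        return True


class CheckingHost(NaiveHost):
    """Hypothetical host that remembers its outstanding requests."""

    def __init__(self):
        super().__init__()
        self.pending = set()

    def send_arp_request(self, ip):
        self.pending.add(ip)

    def on_arp_reply(self, ip, mac):
        if ip not in self.pending:
            return False  # unsolicited reply is ignored
        self.pending.discard(ip)
        self.arp_cache[ip] = mac
        return True
```

Such a check would not be a complete defence, since a forged reply could still arrive while a genuine request is outstanding; it merely shows which information a stateless host lacks.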
Figure 3.21 – ARP poisoning attack in a small enterprise network

Let’s see now how ARP poisoning is performed in the enterprise network illustrated in
Figure 3.21. A malicious secretary from the management department wants to listen to VoIP calls
between the Business Manager and the Accountant, both of them located in the same user
segment demarcated by User Segment Router 1. The skilled malicious secretary can use the Cain
& Abel tool to perform an ARP poisoning and then a MITM attack; this tool contains a VoIP
call sniffer/recorder; the use of this tool is thoroughly described in [30] and [31].
With this tool, the secretary’s computer, which is connected to the same broadcast domain as her
victims, will, on the one hand, send to the Business Manager’s computer an ARP Reply
containing the IP address of the Accountant associated with her own computer’s MAC address
and, on the other hand, send to the Accountant’s computer an ARP Reply containing the
IP address of the Business Manager, also associated with her own computer’s MAC address. Thus
the ARP cache tables in the Business Manager’s and the Accountant’s computers will contain
the new fake information presented in Figure 3.22. (Of course, the attacker has to poison
her victims regularly, i.e. to send the same fake ARP Replies to the victim hosts at regular intervals, since their
computers dynamically and regularly update their ARP cache: after a timeout period, the contents
of ARP caches are removed from the tables).
After the poisoning phase, the malicious secretary can proceed to a MITM attack. This means
that when the Business Manager places a VoIP call to talk to the Accountant, the
information will be encapsulated in Ethernet frames with the MAC address of the attacker as
destination address. That way, the malicious secretary will receive all IP packets destined to the
Accountant and be able not only to eavesdrop but also to manipulate them (even if this is very
difficult because of the real-time nature of VoIP calls). The attacker will then forward the packets
to the Accountant so that neither party notices that the call is being eavesdropped on. Conversely,
the Accountant’s data destined to the Business Manager will also be redirected through the
attacker’s computer (see Figure 3.21). By intercepting all packets and by using Cain & Abel’s
functions, the secretary’s computer can record or play in real-time all SIP-based VoIP calls before
passing the packets to the actual intended receiver. Packets of the same SIP session are extracted
and reassembled by Cain & Abel and then saved into WAV files. The secretary can then use
the recorded conversations for malicious purposes such as blackmail or industrial espionage.
If now, still in the scenario of Figure 3.21, a malicious programmer wants to listen to a
conversation between the Business Manager and the IT administrator, whose computer is
connected to a different broadcast domain from the Business Manager’s, the attack becomes more
complicated. Since routers do not route ARP messages, it is not possible for the malicious
programmer to poison the ARP cache of the Business Manager’s computer; this is why the
malicious programmer cannot specifically intercept the VoIP calls between the Business Manager
and the IT administrator. However, according to [32], what the malicious programmer can do is to
poison the User Segment Router 2, the router demarcating the broadcast domain where the IT
administrator and he are located. That way, the malicious programmer can sniff all traffic between
the IT administrator’s computer and every host outside his subnet and all traffic between the User
Segment Router 2 and the IT administrator’s computer. (However, some considerations must be
taken into account [32].) By doing so, the malicious programmer can listen to conversations
between the IT administrator and any host from inside the company on other segments.
Figure 3.22 – Cache tables of ARP Poisoning victims
Experiment
The ARP Poisoning attack turned out to be successful and rich in information when my
colleague P. Lawecki and I performed it with the Cain & Abel tool on the premises of the
Alcatel student network. In this network, we installed the required SIP softphones (X-Lite)
on a few computers, and then installed and configured an Asterisk server and created a
dialling plan. In the next step, we installed the Cain & Abel software on the laptop of a
presumed attacker who would have infiltrated the Alcatel premises and connected
his computer to the network. This attack made it possible to eavesdrop easily
on all conversations on this network (as well as to steal passwords). This attack has made us
aware of the consequences of
To conclude, this chapter has presented the models of VoIP deployment that have been
designed to serve later as models for the study of VoIP security in enterprise networks. These
models of VoIP deployments represent the several steps of VoIP migration, namely the IP-
enabled, hybrid and full-IP VoIP stages, and rest on the networks of a small enterprise and a
large enterprise with different architectures and thus needing different VoIP deployments. Last,
the ARP Poisoning attack against an enterprise VoIP system is one type of attack among a
multitude of attacks and has just served in this chapter as an introductory example before the
presentation of security issues in Chapter 5.
4. Technological background of VoIP using the SIP protocol
4.1. Choice of the SIP protocol

Before addressing the thorny question of security in VoIP deployments, it is high time to
have a look at the technological background of VoIP. VoIP is not a protocol in itself but is a
communication service enabled by a set of numerous protocols, called VoIP protocols, which
govern the signalling and the transport of voice (see Figure 4.1).
In this master thesis, the Session Initiation Protocol (SIP) has been chosen as the focus
because it is a simple and promising protocol. It is not a standard protocol but it is touted as
the predominant VoIP signalling protocol of the future, gaining, through misuse of language,
almost the title of the standard protocol. The major standard opposed to SIP is H.323 [35], which
is the international standard for multimedia communication over packet-switched networks. H.323
is used in VoIP, Internet telephony and IP-based videoconferencing.
Figure 4.1 – VoIP protocols stack
H.323 belongs to a family of International Telecommunication Union (ITU)
recommendations for multimedia interoperability called H.32x. It was first defined by the ITU in
1996 and since then it has been regularly updated. The most recent version is H.323 version 5
released in 2003 but a new version, presented in June 2006, is to be published in the next months.
H.323 was originally created to promote compatibility in videoconference transmissions over IP
networks and to provide consistency in audio, video and data packet transmissions over LANs
without guaranteed quality of service. To achieve these goals, the H.323 standard provides a
description of how multimedia communications can occur between terminals, network
equipment and services. It is based on the Real-Time Transport Protocol (RTP) and the Real-
Time Control Protocol (RTCP) defined by the IETF, with additional protocols for call signalling,
and data and audiovisual communications.
Is H.323 better than SIP? This is a controversial question that has raised many passionate
debates which have not really reached a clear conclusion. The Comparison of H.323 and SIP by
Schulzrinne and Rosenberg [40] evaluates both protocols in terms of complexity, scalability,
extensibility and services, and concludes that SIP presents a significant advantage over H.323
because it provides far lower complexity, rich extensibility and better scalability, whereas H.323
and SIP are similar in terms of services provided. It would also be interesting to read other
comparisons, such as [38], [39] and [41], which compare both protocols according to different
criteria.
Even if H.323 was created the same year as SIP, i.e. 1996, it has been deployed earlier and more
widely than SIP. The reason is that SIP has lost the “battle of time”: delays in the
release of standards for SIP have delayed its market adoption and deployment, to the
benefit of H.323. The latter was published as a standard in early 1996, while SIP saw its
first draft published in 1996, its first standard, RFC 2543, published in 1999 and, after
revisions, its current standard published in 2002 as RFC 3261 [34].
H.323 continues to be the most widely deployed VoIP signalling protocol in use today; in
particular, it is widely deployed by service providers for transporting international voice calls.
Besides, H.323 is still very dominant in particular in VoIP deployments of large enterprises which
have adopted it before the advent of SIP in the marketplace and which are not flexible enough or
are not willing to re-invest to replace H.323 with SIP. However, SIP is rapidly gaining acceptance
in the market: for example, SMEs which decide nowadays to deploy a VoIP system are more
prone to adopt the SIP protocol than H.323. Another example showing this acceptance is that in
March 2006, Cisco, which usually promotes Skinny (its proprietary signalling protocol), MGCP
(Media Gateway Control Protocol) or even H.323, announced that, for the first time, Cisco VoIP
products would support SIP; for example, its Call Manager (IP-PBX) would be SIP-compliant
and its IP phones and communications software would have SIP capabilities. Besides, nowadays,
many manufacturers are producing SIP-based VoIP terminals and devices such as IP-PBXs.
The H.323 protocol is not detailed in this report since it is out of scope; for more
details, the reader is advised to refer to [36] and [37].
This introduction has aimed at justifying the choice made in this master thesis of SIP
whose main asset is its simplicity and its long-term flexibility to integrate new multimedia
applications. Figure 4.1 represents the stack of VoIP protocols with SIP as signalling protocol;
all these basic protocols will be presented in 4.2, 4.3 and 4.4.
4.2. Media transport of voice

Voice is real-time data, very sensitive to transport problems such as delay, jitter (variation
of the transmission time of packets), out-of-sequence delivery of packets, loss of packets… This
is why the transport of voice must be controlled and managed by appropriate protocols.
The Real-Time Transport Protocol (RTP) and the Real-Time Control Protocol (RTCP) make it
possible to transport time-sensitive traffic over the IP protocol and to supervise the quality of
service associated with this traffic. However, RTP and RTCP do not guarantee quality of service.
Figure 4.2 represents the protocols that will be presented in this section and which are involved
in the media transport of voice.
Figure 4.2 – Protocols assuring the transport of voice over IP
4.2.1. Real-Time Transport Protocol (RTP)

The Real-Time Transport Protocol (RTP), first defined by RFC 1889 in 1996 and, since 2003,
by the new RFC 3550 [42], is a media transport protocol for the transmission of a variety of
real-time data such as audio or video data. RFC 3550 defines RTP as a protocol “providing end-to-end
network transport functions suitable for applications transmitting real-time data, such as audio, video or
simulation data, over multicast or unicast network services.”
RTP does not ensure any quality of service (QoS) in the delivery of real-time data. That is
why other protocols or mechanisms are needed to guarantee the QoS, such as the Resource
Reservation Protocol (RSVP) or the Differentiated Services (DiffServ).
An RTP session is an association between users communicating via RTP. Each user
terminal communicates via two UDP ports per RTP session: one RTP port (or several if there are
several flows) and one RTCP port (supervision port). The fact that RTP uses a dynamic port range
makes it difficult for it to traverse firewalls. In order to get around this problem, it is often
necessary to set up a STUN server.
How is real-time data transported? Real-time data is segmented into Protocol Data Units
(PDUs) which should be as small as possible so that the loss of a packet does not decrease the
quality of reception (for example, 20 ms of uncompressed voice at 64 kbps occupies 160 bytes).
These PDUs are then encapsulated, as illustrated in Figure 4.3, in RTP packets, that is to say that
a 12-byte RTP header is added to each segment, which constitutes the payload of the RTP packet.
Then, the RTP packet is encapsulated in a UDP packet. UDP packets are then encapsulated in IP
packets.
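The encapsulation chain described above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: it assumes an IPv4 header without options (20 bytes) and an RTP header without CSRC entries, and the function names are hypothetical.

```python
RTP_HEADER = 12   # fixed RTP header, no CSRC entries
UDP_HEADER = 8
IPV4_HEADER = 20  # assumption: IPv4 without options


def voice_payload_bytes(bitrate_bps, packet_ms):
    """Size in bytes of the voice segment carried by one RTP packet."""
    return bitrate_bps * packet_ms // 8000


def ip_packet_bytes(payload_bytes):
    """Size of the IP packet once the RTP, UDP and IP headers have been added."""
    return payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER


payload = voice_payload_bytes(64_000, 20)      # 20 ms of 64 kbps voice
packet = ip_packet_bytes(payload)
overhead_ratio = (packet - payload) / packet   # share of the packet used by headers
```

With these assumptions, 20 ms of 64 kbps voice yields a 160-byte payload and a 200-byte IP packet, so one fifth of each packet consists of headers. Smaller PDUs reduce the impact of a lost packet but increase this share.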
Figure 4.3 – Encapsulation of real-time data into RTP packets

RTP is generally carried over the underlying UDP protocol in order to make use of its
multiplexing and checksum services. TCP is not suited to carrying RTP because mechanisms such
as acknowledgements and retransmission of lost packets are incompatible with the strict timing
requirements of the real-time data transported by RTP. Indeed, real-time data are less
sensitive to packet loss but are very sensitive to delays; this explains the preference for UDP
over TCP.
An RTP packet consists of an RTP header of variable size (with a fixed part of 12 bytes)
and the payload data. Typically, one packet of the underlying protocol (e.g. one UDP datagram)
contains a single RTP packet, but it may also contain several RTP packets (this depends on the
underlying encapsulation protocol), in the case of parallel RTP sessions.
The RTP header contains a 16-bit Sequence Number and a 32-bit (4-byte) Timestamp, which carry
information about time and synchronization exchanged between the sender and the receiver.
The Sequence Number allows the receiving side to restore the right sequence of PDUs
and get the right audio or video stream at the application layer. It increments by one for each
RTP data packet sent, and may be used by the receiver to detect packet loss. The initial value of
the sequence number is random (unpredictable) to make known-plaintext attacks on encryption
more difficult. The 32-bit Timestamp reflects the sampling instant of the first octet in the RTP
data packet. The sampling instant must be derived from a clock that increments monotonically
and linearly in time to allow synchronization and jitter calculations. The clock frequency is
dependent on the format of data carried as payload.
The RTP header also contains a 2-bit Version (V) field which corresponds to the version of
RTP (always set to 2). The 1-bit Padding (P) flag set to 1 indicates that there are padding bytes
at the end of the payload; the last byte of padding indicates the number of padding bytes to
ignore. The 1-bit Extension (X) flag indicates that the fixed header is followed by a header
extension. The interpretation of the 1-bit Marker (M) is defined by a profile; it is intended to
allow significant events such as frame boundaries to be marked in the packet stream. The 7-bit
Payload Type (PT) identifies the format of the RTP payload (see Codecs in 4.2.4).
The 32-bit Synchronization Source identifier (SSRC) uniquely identifies the multimedia
source generating the RTP packets for a session; this identifier is assigned randomly by the
source. It is also possible to combine different RTP streams with a mixer, i.e. an intermediate
system that combines RTP streams from different sources into a single stream. The Contributing
Source identifiers list (CSRC) identifies the contributing sources for the payload contained in a
packet. The number of CSRC identifiers is given by the 4-bit CSRC Count (CC) field, whose value
lies between 0 and 15. CSRC identifiers are inserted by mixers by using the SSRC identifiers of
contributing sources (see Figure 4.4). The CSRC field is not used by SIP.
Figure 4.4 – Mixing of several contributing sources in RTP
(source: http://medusa.sdsu.edu/network/CS596/Lectures/ch28_RT.pdf )
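The header fields discussed above can be made concrete with a short parser. The following Python sketch decodes the fixed 12-byte RTP header and the optional CSRC list according to the field widths of RFC 3550 (V: 2 bits, P: 1, X: 1, CC: 4, M: 1, PT: 7). It is a simplified illustration: the function name parse_rtp_header and the example packet are hypothetical, and header extensions and padding are not processed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and the CSRC list of one RTP packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet too short for the announced CSRC list")
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrcs,
        "payload_offset": end,  # ignores a possible header extension
    }


# Hypothetical packet: version 2, payload type 0, no CSRC, 160 bytes of payload
example = struct.pack("!BBHII", 0x80, 0x00, 4711, 160000, 0x1234ABCD) + bytes(160)
header = parse_rtp_header(example)
```

The receiver would use sequence_number to reorder packets and detect losses, and timestamp to schedule playout, as described in the text.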
According to RFC 3550, the services provided by RTP include:
• Payload-type identification – Indication of what kind of content is being carried
• Sequence numbering – PDU sequence number
• Time stamping – presentation time of the content being carried in the PDU
• Delivery monitoring
Delivery monitoring of the real-time data transmitted by RTP is done by the RTCP protocol.
4.2.2. Real-Time Control Protocol (RTCP)

RTP is a protocol that provides a basic transport layer for real-time applications but does
not provide any mechanism for error and flow control, congestion control, quality feedback and
synchronization. This is why RTP is supplemented by the Real-Time Control Protocol (RTCP)
which provides end-to-end monitoring of data delivery. RTCP is also defined by RFC 3550.
RTCP provides out-of-band control information for an RTP flow. It partners RTP in the
delivery and packaging of multimedia data, but does not transport any data itself. It is used
periodically to transmit control packets to participants in a streaming multimedia session. The
primary function of RTCP is to provide feedback on the quality of service being provided by
RTP: RTCP is only used for reporting on QoS, not for enforcing it.
RTP and RTCP use the same source and destination addresses for the transmission of
data and management functions, but they use two different ports: RTP should use an even port
and RTCP the next higher odd port. RTP is often configured to use ports 16384-32767.
RTCP is responsible for three main functions:
• Feedback on performance of the application and the network. For example, the receiver
of a RTP flow can send information about the quality of the received transmission: jitter
observed, loss packet ratio, out of sequence packet ratio.
• Correlation and synchronization of different media streams generated by the same sender
(e.g. combined audio and video)
• Transport information about the different participants in a session (email, name, …) and
association of RTP flows with participants
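Among the feedback values listed above, the jitter reported by a receiver is computed with the running estimate given in RFC 3550: for two consecutive packets, D is the difference between the spacing at arrival and the spacing of the RTP timestamps, and the estimate is updated as J = J + (|D| - J)/16. The following Python sketch applies this formula; the function names and the sample values are hypothetical, and both times are assumed to be expressed in RTP timestamp units.

```python
def update_jitter(jitter, prev_send, prev_recv, send, recv):
    """One update step of the interarrival jitter estimate of RFC 3550."""
    d = (recv - prev_recv) - (send - prev_send)
    return jitter + (abs(d) - jitter) / 16.0


def interarrival_jitter(packets):
    """packets: list of (rtp_timestamp, arrival_time) pairs in timestamp units."""
    jitter = 0.0
    for (s0, r0), (s1, r1) in zip(packets, packets[1:]):
        jitter = update_jitter(jitter, s0, r0, s1, r1)
    return jitter


# Hypothetical streams: one packet every 20 ms with an 8 kHz clock (160 ticks)
steady = [(0, 1000), (160, 1160), (320, 1320)]
late = [(0, 1000), (160, 1160), (320, 1336)]   # third packet arrives 16 ticks late
```

The division by 16 smooths the estimate, so a single late packet only moves the reported jitter by a fraction of its delay.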
These functions are achieved by different RTCP messages such as Sender Reports (SRs),
Receiver Reports (RRs), RTCP source description message, RTCP Bye Message and RTCP
Application-dependent packet. For more details on the structure of these messages, refer to
[44] and [45].
Figure 4.5 – RTCP packets
RTCP packets have a header of a minimum of 12 bytes which includes information about
the message length, the version of RTCP (currently set to 2), the presence of padding or not, the
packet type (SR, RR…) and reception reports count. To reduce the overhead due to headers,
multiple RTCP packets can be encapsulated in the same UDP datagram, as shown in Figure 4.5.
4.2.3. Real-Time Streaming Protocol (RTSP)

The Real-Time Streaming Protocol (RTSP) is a protocol defined by RFC 2326 [47] as “an
application-level protocol for control over the delivery of data with real-time properties. RTSP provides an extensible
framework to enable controlled, on-demand delivery of real-time data, such as audio and video”. So, RTSP is
designed to address the needs for efficient delivery of streaming media over IP networks.
Streaming media is media that is consumed (heard or viewed) whilst it is being delivered.
Streaming is more a property of the delivery system than the media itself. RTP, RTSP and RTCP
were specially designed to stream media over networks.
In a VoIP context, RTSP can be used in conjunction with the SIP protocol to achieve
Unified Messaging (UM). By Unified Messaging, it is meant the integration of different streams
of messages like e-mail, fax, voice and video into a single storage server, accessible from a variety
of different devices.
This UM solution with RTSP was suggested and presented in 2000 by H.
Schulzrinne and K. Singh in their paper entitled “Unified messaging using SIP and RTSP” [46]. The
proposed multimedia voice mail system uses a voice mail server, an RTSP media
server and a SIP proxy server. An RTSP media server is a storage server which handles the
recording and playback of messages. The voice mail server is a SIP interface to the RTSP media
server to allow connections from SIP user agents. On one side, it can receive VoIP calls using
SIP, and on the other side, it behaves as an RTSP client and can perform playback, recording and
other control on the multimedia mail residing at the remote RTSP server. Separating the voice
mail server from the storage server helps in building scalable systems.
In Figure 4.6, an example of voice mail recording has been illustrated. In this voice mail
system, SIP has been used for accepting voice calls and RTSP for the storage and access of voice
messages. In this example set in the context of an enterprise, an IT Manager wants to call an HR
Manager but the HR Manager is absent and does not pick up the phone. Since the SIP proxy
server has been configured to forward the call simultaneously to the voice mail server and to the
HR Manager’s phone, if the HR Manager does not respond after a few seconds, the voice mail
server acts as another phone for the HR Manager which accepts the call on his behalf. Before
accepting the call, the voice mail server sets up the media path with the RTSP server by sending
an RTSP SETUP message to the RTSP server to play back the voice prompt to the IT Manager
to leave a message and a second RTSP SETUP message to the RTSP server to record the
message. Once the IT Manager has left a message (RTP communication between her phone and
the RTSP server) and the call has been torn down by a SIP BYE message, the
voice mail server informs the RTSP server to stop recording. Then, the voice mail server sends
an email to the HR Manager informing him of the arrival of a new voice message. The HR
Manager can retrieve the voice message in four ways, detailed in [46]: RTSP media streaming
tool, SIP user agent, email-attachment or web page.
Figure 4.6 – Example of recording voice mail
4.2.4. Codecs
In PSTN/ISDN networks, voice is sampled at a fixed rate of typically 8 kHz and each
sample is coded to 8 bits, resulting in a 64 kbps voice coding. However, VoIP is not limited to
using this coding and can have higher or lower data rates depending on the codecs used, the
available bandwidth between the end points, and the user’s preferences.
A codec (coder/decoder) is an algorithm which encodes and decodes a data stream, generally
reducing its size in bytes. In VoIP, codecs reduce the size of voice packets. There are several
techniques of voice compression, the most famous of which are PCM (Pulse Code Modulation),
ADPCM (Adaptive Differential PCM), LD-CELP (Low Delay Code Excited Linear Prediction),
CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction), MP-MLQ (Multi Pulse
Multi Level Quantization) and ACELP (Algebraic Code Excited Linear Prediction). Popular voice
coding standards for packetized voice are represented in Table 4.1. An exhaustive list can be
found in [55].

Codec      Bitrate (kbit/s)   Standardised by   Method of coding
G.711      64                 ITU-T             PCM
G.723.1    5.3/6.3            ITU-T             ACELP/MP-MLQ
G.726      32                 ITU-T             ADPCM
G.728      16                 ITU-T             LD-CELP
G.729      8                  ITU-T             CS-ACELP
G.729a     8                  ITU-T             CS-ACELP
GSM        13                 ETSI              RPE-LTP
iLBC       13.3/15.2          IETF              LPC
Table 4.1 – Codecs
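The 64 kbps figure of the PSTN/ISDN coding and the gain obtained with other codecs follow from simple arithmetic. The Python sketch below reproduces it for some of the codecs of Table 4.1; the dictionary and function names are hypothetical, and only codecs with a single bitrate in the table are included.

```python
SAMPLING_RATE_HZ = 8000
BITS_PER_SAMPLE = 8

# 8000 samples per second, 8 bits each
pcm_kbps = SAMPLING_RATE_HZ * BITS_PER_SAMPLE / 1000

# Bitrates in kbit/s taken from Table 4.1
CODEC_KBPS = {"G.711": 64, "G.726": 32, "G.728": 16, "G.729": 8}


def compression_ratio(codec):
    """How many times less bandwidth the codec needs than 64 kbps PCM."""
    return pcm_kbps / CODEC_KBPS[codec]


ratios = {name: compression_ratio(name) for name in CODEC_KBPS}
```

These ratios only concern the voice payload; the RTP, UDP and IP headers added to each packet are not compressed by the codec.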
4.3. SIP Signalling
4.3.1. Introducing the Session Initiation protocol (SIP)

As already mentioned, the only VoIP signalling protocol which is studied in this master
thesis is the Session Initiation Protocol (SIP) as illustrated in Figure 4.7.
SIP was first defined in 1999 by RFC 2543 and is now specified by RFC 3261 [34]; it has
been developed by the MMUSIC Working Group within the IETF. In November 2000, SIP was
accepted as a 3GPP signalling protocol and permanent element of the IP Multimedia Subsystem
(IMS).
Figure 4.7 – SIP in the VoIP architecture
SIP is a signalling protocol for initiating, managing and terminating voice and video
sessions across IP networks. SIP is defined by RFC 3261 as follows: “SIP is an application-layer
control protocol that can establish, modify, and terminate multimedia sessions (conferences) such as Internet
telephony calls. SIP can also invite participants to already existing sessions, such as multicast conferences. Media
can be added to (and removed from) an existing session. SIP transparently supports name mapping and redirection
services, which supports personal mobility – users can maintain a single externally visible identifier regardless of
their network location.” Thus, SIP is not intended only for VoIP but also for many other applications
such as videoconferencing, instant messaging, video and distributed computer games. SIP is
responsible for:
• Locating users – SIP enables callers to locate called parties
• Establishing sessions – SIP enables callers to determine the availability of the callee and
its willingness to participate in a call. The callee can accept, reject or redirect a call.
• Negotiating session setup – SIP enables communicating parties to negotiate the set of
parameters to be used during the session.
• Modifying sessions – SIP allows communication parties to change session parameters
during the call.
• Tearing down sessions – SIP allows a session to be terminated.
SIP is not a complete communication protocol but rather a component of the VoIP
communication architecture which works in concert with other IETF protocols to build a
VoIP communication system. For example, it works with RTP, which carries the real-time
multimedia data, but it also acts as a carrier for the Session Description Protocol (SDP), which
will be presented in paragraph 4.4.1 and which describes the media content of the session.
SIP is a text-based request/response peer-to-peer protocol borrowing many elements
from Internet protocols like HTTP and SMTP. It is highly extensible, i.e. it can be extended to
accommodate new features and services. SIP provides the necessary protocol elements to provide
services such as call forwarding, personal mobility, multicast conferencing…
SIP has been designed to be easily implemented, scalable and very flexible.
Its flexibility lies mainly in its independence from the media type transported during the session.
With SIP, some or all of the intelligence is placed at the endpoints, i.e. IP phones. An “intelligent”
phone has the appropriate functionality to interact with different VoIP
components and network components; in contrast, conventional phones from the PSTN/ISDN
are only able to interact with their telephony switch.
4.3.2. SIP architectural components

The SIP architecture, which has already been presented in Chapter 3, is composed of two
major architectural components: SIP servers and SIP User Agents. These two classes of
components can be further divided into several types of entities.
There are four types of SIP servers: SIP proxy server, SIP redirect server, SIP location
server and SIP registration server. Servers listen on the default SIP ports, i.e. port 5060 for TCP
and UDP and port 5061 for TLS over TCP.
SIP User Agents are the end-user entities of the SIP architecture. A SIP IP phone is
regarded as a SIP User Agent, composed of a client entity called User Agent Client (UAC), and a
server entity called User Agent Server (UAS).
SIP components have already been described in detail in 3.2.2.
SIP addressing
All SIP components are identified in the VoIP network by a SIP address, also called SIP
URI (Uniform Resource Identifier). A SIP URI consists of two parts: a username part and a
domain name part separated by @. The username part can be the name of a user as well as a
conventional E.164 phone number. The domain name part can be a domain name or a host
name or the IP address of the host...
According to the nature of the username part and the domain name part, it is possible to
distinguish three types of SIP URIs:
• Address of Record (AOR) – it identifies a user, which means that the domain name part
is a domain name. Ex: sip:it_manager@site1.com
• Fully Qualified Domain Name (FQDN) – it identifies a specific device. Ex:
sip:it_manager@phone5.site1.com, sip:it_manager@246.5.5.3, sip:+497111234567@246.5.5.3
• Globally routable UA URIs (GRUU) – it identifies an instance of a user at a given UA
for the duration of the registration of the UA to which it is bound.
SIP URIs are similar to e-mail addresses; that way, it is possible to use an e-mail address to place
a SIP call, if the e-mail address is at the same time a SIP URI. This makes communications
easier, and it is in this spirit that the ENUM project was built (see later in 4.3.5).
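The structure of a SIP URI described above (username part, @, domain name part, optionally followed by a port) can be illustrated with a small parser. The following Python sketch is deliberately simplified: the function name parse_sip_uri is hypothetical, URI parameters and headers are discarded, and passwords and bracketed IPv6 hosts are not handled.

```python
def parse_sip_uri(uri):
    """Split a SIP URI into scheme, username part, host and optional port."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP URI: %r" % uri)
    rest = rest.split(";", 1)[0]   # drop URI parameters
    rest = rest.split("?", 1)[0]   # drop URI headers
    user, _, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    return {
        "scheme": scheme.lower(),
        "user": user or None,
        "host": host,
        "port": int(port) if colon else None,
    }


aor = parse_sip_uri("sip:it_manager@site1.com")
contact = parse_sip_uri("sip:it_admin@123.45.64.3:5060")
```

When no port is given, the default ports mentioned earlier (5060, or 5061 for TLS) apply.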
4.3.3. SIP messages

The SIP protocol defines two types of messages which allow it to achieve its purposes:
Request messages, sent from the User Agent Client (UAC) of the caller to the User Agent Server
(UAS) of the callee, and Response messages, sent by the callee’s UAS to the caller’s UAC.
Method     Description
INVITE     Initiates a call, changes call parameters (re-INVITE)
ACK        Confirms a final response for INVITE
BYE        Terminates a call
CANCEL     Cancels a not fully-established session, i.e. cancels the search for a user
OPTIONS    Queries the capabilities of servers
REGISTER   Registers with the Location Service
Table 4.2 – SIP Request methods defined by RFC 3261
Request messages, also called Methods, are used to initiate (INVITE), confirm (ACK),
modify (CANCEL, OPTIONS…) and terminate calls (BYE). A special Request message,
REGISTER, allows a SIP User Agent to register with a SIP registrar. The most important
methods, which are the only ones defined by RFC 3261, are listed in Table 4.2. Several SIP
extensions (presented in other RFCs) define additional methods (like PRACK, INFO…) which are
out of the scope of this report but which can be found in [43].
Response messages contain numeric response codes, which are partly based on HTTP
response codes. They are used to convey either provisional information indicating call progress
but not terminating any SIP session or final information terminating SIP sessions (for example,
because the callee is busy or does not exist). There are two types of Response messages and six
classes as illustrated in Figure 4.8.
Figure 4.8 – SIP Response messages
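The two types and six classes of responses can be derived from the first digit of the status code. The following Python sketch uses the class names of RFC 3261, which may be worded slightly differently in Figure 4.8; the function name classify_response is hypothetical.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(code):
    """Return (class name, 'provisional' or 'final') for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    response_class = code // 100
    kind = "provisional" if response_class == 1 else "final"
    return RESPONSE_CLASSES[response_class], kind


ringing = classify_response(180)   # call progress, session not terminated
ok = classify_response(200)
busy = classify_response(486)      # callee is busy
```

Only the 1xx class is provisional; every other class terminates the transaction with a final response.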
Request and Response messages have the same general structure; they
consist of four parts:
• A start line, which defines the message type and the protocol version. In the case of a request,
the start line specifies a Request-URI which indicates the callee’s URI whereas in the case
of a response, the start line specifies one of the status codes listed in Figure 4.8.
• Message headers. There are four types of message headers: General header, Entity header,
Request header and Response header. The attributes included in these headers can be
found in [34]. These headers convey message attributes that provide additional
information about the message. Headers have a syntax and semantics similar to those of
HTTP headers and have the format: <name>: <value>.
• A blank line
• A message body, which is used to describe the session to be initiated. It can appear both in
requests and responses. SIP makes a clear distinction between signalling information,
present in the start line and headers, and session information, present in the message body.
Header fields are exhaustively listed in [34].
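The four-part structure described above (start line, headers, blank line, body) can be illustrated by splitting a raw message. The following Python sketch is a simplified illustration: the function name parse_sip_message is hypothetical, the example request is abridged and omits header fields that a real INVITE must carry (Via, Max-Forwards…), and folded header lines are not handled.

```python
def parse_sip_message(raw):
    """Split a SIP message into its start line, header list and body."""
    head, _, body = raw.partition("\r\n\r\n")   # the blank line ends the headers
    lines = head.split("\r\n")
    first = lines[0]
    if first.startswith("SIP/"):
        version, code, reason = first.split(" ", 2)
        start = {"kind": "response", "version": version,
                 "status": int(code), "reason": reason}
    else:
        method, request_uri, version = first.split(" ", 2)
        start = {"kind": "request", "method": method,
                 "request_uri": request_uri, "version": version}
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start, headers, body


# Abridged, hypothetical INVITE between the two users of the examples
invite = (
    "INVITE sip:it_manager@site1.com SIP/2.0\r\n"
    "From: <sip:it_admin@site1.com>\r\n"
    "To: <sip:it_manager@site1.com>\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
start, headers, body = parse_sip_message(invite)
```

For a request, start holds the method and the Request-URI; for a response, it holds the status code and reason phrase.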
Figure 4.9 – INVITE and 200 OK messages
4.3.4. Typical SIP Dialogues
Let’s see now some typical SIP dialogues in the context of an enterprise network. Many other
examples of SIP dialogues can be found in [43] and in RFC 3665 [9].
Basic transactions
The most basic SIP dialogue is a direct session establishment between two users, let’s say
two employees of the same enterprise who know their respective SIP URIs and who decide to
bypass the enterprise IP-PBX. Figure 4.10 shows the basic SIP message exchange between
sip:it_admin@site1.com and sip:it_manager@site1.com. The IT administrator’s User Agent Client
first sends a SIP INVITE message to the IT manager’s User Agent Server. The IT manager’s UAS
sends a provisional 180 RINGING response and after a while a 200 OK message. The IT
administrator’s UAC acknowledges with an ACK and the session is established: the IT
administrator and the IT manager can then communicate. The communication is closed when the IT
administrator’s UAC sends a BYE request to the IT manager’s UAS, which accepts by sending a
200 OK message.
Figure 4.10 – Basic SIP session
The case of a communication session between two employees through a SIP proxy server and
a SIP redirect server has been addressed earlier (Figure 3.10 and Figure 3.11).
Registration
In order for a SIP proxy to find a user, it has to learn the user’s current location. To achieve
that, it has to ask the SIP Location Server, which is situated in the same local domain and maps
SIP URIs to IP addresses and ports. For example, if a SIP proxy server asks the Location Server to
resolve sip:it_admin@site1.com, the Location Server might send a response of the type
sip:it_admin@123.45.64.3:5060. The Location Server’s database of users is filled in by one or
several SIP Registrars. SIP Registrars are SIP entities which receive registrations from users,
extract information about their current location, such as IP address, port and username, and store
this information in a SIP Location Server.
Figure 4.11 – Registration of a user agent with a SIP Registrar
To place a call and be able to receive calls, a user agent (IP phone) has to register with a
SIP Registrar and provide it with information about its current location. To do so, the User
Agent (UA) sends a REGISTER request message to the SIP Registrar as shown in Figure 4.11. In
this example, the SIP Proxy server and the SIP Registrar are collocated, which is very often the
case, since they work closely together. The REGISTER message contains an Address of
Record (sip:it_admin@site1.com) and a Contact Address (sip:it_admin@123.45.64.3). The SIP
Registrar extracts the IP address and port and sends them to the Location Server. If the process is
successful, the UA receives a SIP 200 OK message from the Registrar.
Registrations must be renewed regularly since records in the Location Database have a limited
lifespan, which is specified in the expires parameter of the REGISTER message.
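The registration mechanism and the limited lifespan of records can be sketched as a small in-memory location database. This Python example is only an illustration under simplifying assumptions: the class name LocationService is hypothetical, a single contact is stored per Address of Record, and the current time is passed explicitly instead of being read from a clock. An expires value of 0 removes the binding, as in RFC 3261.

```python
class LocationService:
    """Toy location database filled in by a registrar."""

    def __init__(self):
        self._bindings = {}   # Address of Record -> (contact URI, expiry time)

    def register(self, aor, contact, expires, now):
        """Store or refresh a binding; expires=0 removes it."""
        if expires == 0:
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now):
        """Return the current contact of a user, or None if unknown or expired."""
        binding = self._bindings.get(aor)
        if binding is None or binding[1] <= now:
            self._bindings.pop(aor, None)
            return None
        return binding[0]


location = LocationService()
location.register("sip:it_admin@site1.com", "sip:it_admin@123.45.64.3:5060",
                  expires=3600, now=0)
fresh = location.lookup("sip:it_admin@site1.com", now=10)
stale = location.lookup("sip:it_admin@site1.com", now=3600)
```

The second lookup returns nothing because the record has reached the end of its lifespan without being refreshed by a new REGISTER.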
Call forking
Figure 4.12 – Call forking: parallel (left) and sequential (right)

After receiving an INVITE request, a SIP proxy server (which generally must be stateful) can
choose to forward it to multiple SIP URIs, thereby performing so-called call forking. In that
case, the SIP proxy server is called a SIP forking proxy. A SIP forking proxy can fork in several
ways:
• Parallel forking – as illustrated in Figure 4.12, the SIP Proxy forwards the INVITE
message simultaneously to several User Agents corresponding to the same user
• Sequential forking – as illustrated in Figure 4.12, the SIP Proxy forwards the INVITE
message to a first User Agent registered for a given user; if the latter is not responding,
the SIP Proxy forwards the INVITE message to a next UA, and so on…
• Mixed forking – the SIP proxy can perform parallel forking for some cases and
sequential forking for some others.
Although parallel forking finds a user faster than sequential forking, it is
a complex task for a SIP proxy server: the proxy has to handle multiple concurrent
transactions, possibly collect multiple responses from user agents and finally
make a decision based on them.
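The difference between the two main forking strategies can be sketched as follows. This is a
simplified illustration: the function names are hypothetical, try_contact stands for "send the
INVITE and return the final status code", and the parallel variant calls the contacts in a loop
instead of really sending the branches concurrently.

```python
def sequential_fork(contacts, try_contact):
    """Try contacts one after another; stop at the first one that answers."""
    for contact in contacts:
        if try_contact(contact) == 200:
            return contact
    return None


def parallel_fork(contacts, try_contact):
    """Send to every contact, collect all responses, then decide.

    A real proxy sends the branches concurrently and cancels the branches
    that did not win; only the decision step is shown here.
    """
    responses = {contact: try_contact(contact) for contact in contacts}
    answered = [c for c, status in responses.items() if status == 200]
    return (answered[0] if answered else None), responses
```

The parallel variant has to keep the state of every branch until all responses are known, which
reflects the extra complexity mentioned above.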
4.3.5. SIP address resolution and routing
In order to locate a user agent, SIP servers have to perform a lookup in a database which
maps SIP URIs to specific users at specific hosts/IP addresses. There are different kinds of
lookup: Location Server lookup, DNS SRV lookup and ENUM.
Location Server lookup

SIP Proxy servers can send requests to a local SIP Location Server asking for the
resolution of SIP URIs. SIP Location servers usually answer by giving back the IP address of the
next hop SIP Proxy server.
DNS SRV lookup

DNS lookups can be used many times during a call. In general, a SIP entity which wishes
to send a request may use a DNS lookup to determine the IP address, port number
and transport protocol of the next hop SIP entity.
For example, in order to locate a local SIP proxy, a SIP terminal has to look up
the DNS SRV records. In the same way, SIP proxy servers also have to make DNS lookups when
they want to resolve a SIP URI and find the next hop SIP proxy server. RFC 3263 [50],
entitled “Session Initiation Protocol (SIP): Locating SIP Servers”, details these processes of
locating SIP entities. DNS SRV records have been defined in RFC 2782 [49], entitled “A DNS RR for
specifying the location of services (DNS SRV)”.
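As an illustration of how SRV records are used, the sketch below builds the name queried for a SIP
service and orders a set of answers. The host names in the usage note are hypothetical, and the
ordering is a deliberate simplification: RFC 2782 selects among records of equal priority at
random in proportion to their weights, whereas a plain sort is used here to keep the example
deterministic.

```python
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def srv_query_name(domain, transport="udp"):
    """Build the owner name queried for SIP, e.g. _sip._udp.site1.com."""
    return f"_sip._{transport}.{domain}"


def order_candidates(records):
    """Order SRV records: lowest priority value first, then highest weight.

    Simplification: the weighted random selection of RFC 2782 is replaced
    by a deterministic sort.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))
```

For example, with the hypothetical records (10, 60, proxy1.site1.com), (10, 40, proxy2.site1.com)
and (20, 0, backup.site1.com), a SIP entity would try proxy1 first and fall back to the backup
host only if both priority-10 proxies are unreachable.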
ENUM
ENUM (Telephone Number Mapping or E.164 Number Mapping) is an IETF standard defined
by RFC 3761 [48] in order to unify the traditional telephone system with the Internet. It uses the
Domain Name System (DNS) to map E.164 phone numbers to a list of Uniform Resource Identifiers
(URIs). Although ENUM facilitates VoIP, it is not a VoIP requirement. It provides a user with a
domain name on an E.164 server in order to associate a common E.164 telephone number with a
SIP URI and provide other DNS-related services. ENUM allocates a specific zone, namely
“e164.arpa”, for use with E.164 numbers. Any phone number, such as +49 123456789, can be
transformed into a hostname by reversing the digits, separating them with dots and adding the
e164.arpa suffix, thus: 9.8.7.6.5.4.3.2.1.9.4.e164.arpa. DNS can then be used to look up Internet
addresses for services such as SIP VoIP telephony. In that case, DNS NAPTR records are used to
translate E.164 numbers into SIP addresses.
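The transformation of a telephone number into an ENUM domain name can be written in a few lines.
The function name is chosen for this illustration only; the subsequent NAPTR query is not shown.

```python
def e164_to_enum_domain(number, suffix="e164.arpa"):
    """Turn an E.164 number into the domain name queried by ENUM."""
    digits = [ch for ch in number if ch.isdigit()]   # drop '+', spaces, dashes
    if not digits:
        raise ValueError("no digits in number")
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+49 123456789"))
# prints 9.8.7.6.5.4.3.2.1.9.4.e164.arpa
```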
ENUM is already supported by SIP proxy servers like SER or SNOM 4S, VoIP gateways
like Asterisk, Swyx and Cisco, and SIP phones (SNOM 200). A SIP proxy server with ENUM support
will look up a dialled telephone number in DNS to see if there are alternate ways to set up the
call instead of calling out on the PSTN telephone line. An ENUM entry may contain a reference to
a SIP URL, a telephone number to dial, a web page or an e-mail address. By configuring a SIP
server to use ENUM directory look-ups, it is possible to find out if a dialled E.164 phone number
is listed in the directory and to obtain the DNS service record for this number. Using the service
record, a peer-to-peer VoIP call can then be established, bypassing the PSTN.
4.3.6. Presence service

Presence is a service widely used in VoIP applications that allows users to see the
availability of their “buddies”. In the context of an enterprise, employees can see if their
colleagues are available, busy, absent etc. and if they can call them.
The presence service is a service allowing users to make their reachability, availability and
willingness to communicate visible to other users. It has been defined by RFC 3856 [51].
Users, like employees in an enterprise, can provide information about their presence to a
Presence Agent. These users are called Presentities (presence entities) and may possess several
devices connected to the enterprise VoIP system, like IP phones, laptops, computers… On each
of these devices, at least one Presence User Agent (PUA), for example a VoIP telephony software
client, sends presence information to the Presence Agent. A Presence Agent collects all presence
information sent by PUAs. At the same time, each of these devices has
at least one Watcher, which is an entity requesting presence information about other
Presentities. To identify Presentities, pres URIs, like pres:it_manager@site1.com, are used. The SIP
presence architecture is represented in Figure 4.15.
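The interplay between Presence User Agents, the Presence Agent and Watchers can be sketched as a
small publish/subscribe structure. The class and method names are assumptions made for this
illustration; in a real deployment these interactions are carried by SIP messages rather than by
direct function calls.

```python
class PresenceAgent:
    """Collects presence from PUAs and notifies subscribed watchers."""

    def __init__(self):
        self._status = {}      # presentity URI -> latest status string
        self._watchers = {}    # presentity URI -> list of watcher callbacks

    def subscribe(self, presentity, callback):
        """Register a watcher; it immediately receives the current status."""
        self._watchers.setdefault(presentity, []).append(callback)
        callback(presentity, self._status.get(presentity, "unknown"))

    def publish(self, presentity, status):
        """Called by a PUA; stores the status and notifies every watcher."""
        self._status[presentity] = status
        for callback in self._watchers.get(presentity, []):
            callback(presentity, status)
```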
4.3.7. SIP-based mobility

In current enterprise environments, the number of employees working on the move or
changing their place of work from one company site to another is ever increasing, and these
people expect to be reachable on the enterprise VoIP network. This makes mobility
a requirement in SIP-based VoIP architectures.
There are four types of mobility that a SIP architecture can achieve [56]:
• Terminal mobility – it means that users can have access to the home subscribed services
from any location while moving with their mobile SIP terminal across heterogeneous
networks. The home VoIP network must be able to locate and identify the mobile SIP
terminal as it moves. This is achieved through different methods [56]. As the terminal
moves across different networks, it is temporarily and transparently assigned new IP
addresses without affecting any current VoIP calls. The mobile SIP terminal regularly
updates its location by sending REGISTER messages to the home SIP
Registrar, so that messages destined to it can be successfully redirected.
Terminal mobility is associated with wireless access.
• Service mobility – it means that the home VoIP network is capable of providing
personalized services to users independently of their current location. Thus, users can
have access to the same services (speed dial lists, address books, buddy lists, media
preferences…) as if they were located in their home office, despite their movement
between terminals and/or networks. The difficulties in providing service mobility are,
first, to maintain adequate QoS for the duration of the session, and second, to ensure that
users have access to all their personalized services.
• Personal mobility – it means that it is possible to address a single user no matter where
they are located (different terminals or networks). Thus, users are able to place and
receive calls and access VoIP services on any terminal in any location. Personal mobility
is based on the use of a unique personal identity. For example, a project manager
registered with the SIP Registrar of the site1.com domain must be reachable under the
sip:project_man@site1.com URI even if he has recently moved into the site2.com domain.
SIP can realise Personal Mobility through the use of SIP Registrars, Forking Proxies…
• Session mobility – it means that users can maintain an active session while switching
between terminals. For example, an employee may wish to continue, on the desktop computer
in his office, a session begun on a mobile phone. SIP can support session mobility in
at least three ways: through the use of re-INVITE messages, through third-party call
control or through the use of REFER messages (see [56]).
To allow mobility in a SIP-based VoIP network, the adoption of the Mobile IP protocol is not
the right solution [57], since, with triangular routing, it introduces excessive delays and an
extra layer of tunnelling, which are incompatible with the stringent requirements of real-time
communications such as VoIP calls. For a mobile worker who moves between different locations,
pre-call mobility can be ensured by different SIP entities: Redirect server, Forking proxy and
Presence server.
Redirect server
Mobility (Service mobility and Personal mobility) can be achieved by a SIP redirect server.
As illustrated in Figure 4.13, in the context of a large enterprise network which has two sites,
headquarters (site1.com) and a small remote office (site2.com), a Business Manager who is usually
registered with the SIP Registrar of the site1.com domain moves into the premises of the small
remote office. The UA of the softphone installed on his laptop will send a REGISTER message to the
SIP Registrar of the visited local domain, i.e. site2.com. When an HR Manager located in the
headquarters wants to contact the Business Manager, his UA will send an INVITE message to
the local SIP proxy server, which will respond with a SIP 302 “Moved Temporarily” message
specifying the Business Manager’s last registration contact. The HR Manager’s UA will then
directly contact the Business Manager’s UA. That way, triangular routing is avoided and the local
SIP proxy is less loaded.
Figure 4.13 – SIP mobility using a SIP redirect server
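The behaviour of the redirect server in this scenario can be summarised by the following sketch.
The function name, the dictionary used as location database and the contact address
(taken from the documentation range 192.0.2.0/24) are assumptions made for this illustration.

```python
def redirect_server(location_db, request_uri):
    """Answer an INVITE the way a redirect server would.

    location_db maps an address of record to its registered contacts.
    Returns (status code, reason phrase, contact list).
    """
    contacts = location_db.get(request_uri, [])
    if contacts:
        return 302, "Moved Temporarily", list(contacts)
    return 404, "Not Found", []


# Hypothetical registration made from the remote office
location_db = {
    "sip:business_man@site1.com": ["sip:business_man@192.0.2.17:5060"],
}
```

The calling UA receives the contact list in the 302 response and sends a new INVITE directly to
that contact, so the redirect server is not involved in the rest of the call setup.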
Forking Proxy
As seen earlier, a SIP Forking Proxy can send in parallel several SIP INVITE messages
destined to the same user, who may have several SIP URIs registered in the Registrar.
Mobility can be achieved by call forking. As represented in Figure 4.14, again in the
same enterprise context as in Figure 4.13, when the HR Manager, located in the headquarters,
wants to call the Business Manager, who has moved into the site2.com domain, his UA sends a SIP
INVITE message destined to the URI sip:business_man@site1.com. The SIP forking proxy will not
try to return the Business Manager’s current location to the HR Manager’s UA; instead, it will
forward the initial INVITE message to all domains registered in the Location server for the
Business Manager’s URI.
Figure 4.14 – SIP mobility using a SIP forking proxy
Figure 4.15 – SIP mobility using a Presence server
Presence server
In the context of an enterprise, employees may wish to see presence and availability
information about their peers on the display of their IP phone or SIP softphone before trying to
place calls. This presence service must be offered even if users registered in the headquarters
move to other networks. It means that, in the enterprise illustrated in Figure 4.15, the HR
Manager must be able to see the Business Manager’s availability even if the latter has moved into
the site2.com domain. To make this presence service available to employees located in the
site1.com domain, mobile workers have to register with the visited network’s Presence server. In
Figure 4.15, the Business Manager has moved to the remote office and his UA immediately sends
a REGISTER message to the local SIP proxy. The local SIP proxy then sends an UPDATE
STATUS message to the Presence server of the small remote office.
The most difficult type of mobility to achieve is Terminal mobility because it requires
that, during a call session, a user’s terminal can transparently change IP addresses while the
user moves across networks. This type of mobility has been described in [56] and it requires that
User Agents support roaming (a change in the IP address during a session), which is
not yet implemented by all SIP telephone manufacturers.
4.3.8. Security issues of SIP

The SIP protocol presents some very crucial issues concerning:
• Firewall traversal – Traditional data firewalls cannot handle dynamic UDP ports, which
causes problems for incoming calls originating from the public network. A solution
would be to leave open a wide range of UDP ports but in that case the network becomes
more vulnerable to external attacks.
• Network Address Translation (NAT) traversal – Private addresses are not routed in the
public Internet, so a translation of SIP URIs containing private addresses must be
performed at the edge of the private network. The main issue is that SIP messages can
contain private addresses and port numbers, but traditional data firewalls act only at
layer 3 and cannot modify SIP messages (application layer).
These issues will be addressed in detail later and their main solutions will be presented in
Chapter 6.
4.4. Supported protocols and languages
4.4.1. Session Description Protocol (SDP)

The Session Description Protocol (SDP) is used by SIP to provide an exact description of
an RTP session. It is defined by RFC 2327 and updated by RFC 3266 [52].
SDP is more a data format than a protocol. The message body of SIP messages
contains SDP information to describe the call to be established, as illustrated in Figure 4.9. SDP
can describe:
• Type of media (audio, video…) and media format (MPEG video, H.263 video…)
• Media destination: IP address and port number
• Session identification (username, session-id, session IP address)
• Time the session is active
• Information about the bandwidth to use in the session
• Contact information (URL, e-mail)
The exact and exhaustive SDP codes can be found in [52] and [43].
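To make the structure of an SDP description more concrete, the following sketch parses a small
SDP body into its session-level fields and its media sections. The example body, the user name
and the port number are invented for this illustration, and the parser is deliberately simplified:
every field other than "m=" and "a=" is treated as session-level, even though some fields, such as
"c=", may also appear inside a media section.

```python
EXAMPLE_SDP = """\
v=0
o=it_admin 2890844526 2890844526 IN IP4 123.45.64.3
s=VoIP call
c=IN IP4 123.45.64.3
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""


def parse_sdp(body):
    """Parse an SDP body into session-level fields and media sections."""
    session, media = {}, []
    for line in body.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "m":                      # a media line opens a new section
            kind, port, proto, *formats = value.split()
            media.append({"type": kind, "port": int(port), "proto": proto,
                          "formats": formats, "attributes": []})
        elif key == "a" and media:          # attribute of the current media
            media[-1]["attributes"].append(value)
        else:                               # treated as session-level here
            session.setdefault(key, []).append(value)
    return session, media
```

In this example, the "c=" and "m=" lines together give the media destination (IP address and
port), and the "a=rtpmap" attribute names the media format associated with payload type 0.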
4.4.2. Session Announcement Protocol (SAP)
The Session Announcement Protocol (SAP), defined by RFC 2974 [53], is not used for
simple VoIP calls but mainly for conferences. It was primarily intended for announcing multicast
multimedia sessions and providing participants with the information needed for session setup.
4.4.3. Call Processing Language (CPL)

The Call Processing Language (CPL) is an IETF standard for describing a callee’s
preferences. It is a simple XML-based language that allows users to determine how the local SIP
proxy server should handle calls for them, but it does not allow users to affect the behaviour of
the SIP proxy server in a way that could compromise its security.
CPL scripts define rules for call processing. For example, in a company, employees may
decide to receive all their business calls on the work phone at their office between 8:00 and 18:00
and, for the rest of the day, on their voicemail (see the example of a CPL script in Figure 4.16).
CPL scripts are used to route calls according to preferences about time, redirection, timeouts...
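The time-based rule of this example can be expressed as a short decision function. The sketch
below is written in Python and is not CPL itself; it only mirrors the logic that such a script
would encode. The two destination URIs are hypothetical.

```python
from datetime import time

WORK_PHONE = "sip:employee@site1.com"             # hypothetical destination
VOICEMAIL = "sip:employee@voicemail.site1.com"    # hypothetical destination


def route_business_call(call_time, start=time(8, 0), end=time(18, 0)):
    """Return the destination for a call arriving at call_time."""
    if start <= call_time < end:
        return WORK_PHONE
    return VOICEMAIL
```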
When a user dials a SIP URI, the SIP proxy server has first to resolve the given SIP URI into a
network address, either by interacting with other SIP servers (like SIP redirect servers) or
services (DNS, SIP location server), or provide further call routing mechanisms like CPL scripts…
Usually users do not write CPL scripts on their own, but use CPL Editors either integrated in their
SIP phone or in their softphone which allow them to generate them easily through pleasant user
interfaces.
Figure 4.16 – Example of CPL (source: [43])
4.5. VoIP, quality of service and security

In VoIP, Quality of Service (QoS) is a major issue, since users expect a quality of service
at least as good as that of circuit-switched networks. QoS in VoIP systems refers to the speed and
clarity of VoIP calls and the issue is how to guarantee that voice traffic will not be delayed or
dropped due to interference with other lower priority packet traffic. QoS is measured by criteria
like latency (delay of packet delivery), jitter (variations in delay), packet loss and burstiness of
loss and jitter (loss and discards due to jitter tend to occur in bursts) [57].
The SIP protocol does not ensure any QoS on its own but rather relies on the
underlying networks to ensure an end-to-end QoS. Mechanisms ensuring QoS are RSVP
(Resource Reservation Protocol), Differentiated Services (DiffServ), MPLS (Multi-protocol
Label Switching), SBM (Subnet Bandwidth Management) etc. [57].
The mechanisms to resolve security issues will be presented in the next chapters and we
will see that some mechanisms, like encryption or firewalls, negatively affect the QoS of VoIP
since they introduce latency and jitter. A VoIP call must not have a one-way latency greater
than 150 milliseconds; otherwise, it is unacceptable for users. Besides, VoIP is very sensitive to
packet loss (even a 1% loss rate is too much).
Thus, the right balance between QoS and level of security has to be found, which is a
difficult task.
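The QoS criteria named in this section can be computed from simple measurements, as the
following sketch shows. The jitter function uses the running estimate defined for RTP in
RFC 3550; the function names are chosen for this illustration, and the default thresholds are
simply the 150 ms and 1% figures quoted above.

```python
def interarrival_jitter(transit_times_ms):
    """Running jitter estimate in the style of RTP (RFC 3550):
    J = J + (|D| - J) / 16, where D is the change in transit time."""
    jitter = 0.0
    for previous, current in zip(transit_times_ms, transit_times_ms[1:]):
        jitter += (abs(current - previous) - jitter) / 16.0
    return jitter


def loss_rate(sent, received):
    """Fraction of packets that never arrived."""
    return (sent - received) / sent if sent else 0.0


def meets_targets(delay_ms, loss, max_delay_ms=150.0, max_loss=0.01):
    """Check a call against the latency and loss thresholds quoted in the text."""
    return delay_ms <= max_delay_ms and loss < max_loss
```

Such a check can be repeated after a security mechanism has been enabled in order to see how
much of the delay budget it consumes.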
5. VoIP threat and risk analysis in enterprise networks
5.1. Introduction
The deployment of a VoIP system in an enterprise does not go without security problems.
These problems must be efficiently addressed so as not to compromise the security of not
only the VoIP system itself but also the enterprise.
VoIP, being an IP-based service, has inherited the security vulnerabilities of data
networks as well as those of legacy telephony. To these vulnerabilities, it is also possible to add
the SIP-specific vulnerabilities. This is why the study of SIP-based VoIP is quite complicated.
Each enterprise has its own way of deploying its VoIP system, which means that each VoIP
deployment is unique and has its own security vulnerabilities, depending on the level of security
of the underlying IP network, the products used, the architecture design and the security policies.
This is why each enterprise, before making its VoIP network operational and public, has to
perform a serious and in-depth tailored risk assessment to identify the major threats and to be
aware of the vulnerabilities of the VoIP system. Without a serious risk analysis, which takes a
long time, enterprises could easily become a target for attackers, from within or outside the
enterprise, and could see their business and reputation jeopardized.
In this chapter, the security expectations of enterprises for their VoIP systems will be
presented in 5.2. Then, in 5.3, the processes of risk assessment will be studied and each of their
steps will be highlighted. In paragraph 5.4, on the basis of this study and following these steps,
threats and risks of VoIP deployments will be analysed for the cases of small and large
enterprises. In Chapter 7, the same cases will be studied from the point of view of the
security measures to implement after this threat and risk analysis. Finally, in 5.5, a selection of
significant attacks against enterprise VoIP deployments will be described in detail.
5.2. Assessing security requirements of VoIP enterprise networks
5.2.1. Definition of security requirements

When enterprises decide to develop a VoIP network on top of their IP infrastructure,
they expect that the new network will comply with some fundamental security goals, called
security requirements or security expectations. The general security requirements of IP networks
have already been presented in Chapter 2, and they are exactly the same as those of VoIP
networks. The most important security requirements for enterprise VoIP networks are called
primary requirements or primary goals and encompass the triptych Confidentiality, Integrity
and Availability (the so-called CIA requirements). On top of these fundamental security
requirements, additional requirements, called secondary, also have to be taken into consideration.
The main secondary requirements are authenticity, liability or non-repudiation, privacy,
authorisation… Table 5.1 lists the security requirements of VoIP networks, primary as well as
secondary, with their definitions and associated examples.
A breach of one of these requirements can lead to the breach of other requirements; in
particular, the breach of one requirement at a certain OSI layer can cause the breach of another
requirement at a higher OSI layer. For example, as seen in 3.5 with the
ARP poisoning attack, an attack on authenticity and integrity (fabrication of ARP Response
messages by a third party) at layer 2 leads to a Man-in-the-Middle attack and an eavesdropping
attack, which are attacks on confidentiality at the application layer (layer 7).
According to [58], the interaction between the different security requirements of the
several OSI layers renders the threat analysis quite difficult.
CIA requirements

Security requirement: Confidentiality
Definition in a VoIP context: In VoIP, confidentiality means the protection against unwanted and
unauthorised disclosure of:
- Voice calls (media traffic)
- Credentials of calling parties
- Encryption elements (keys…) used in the encryption of media traffic
Examples of threats:
- Eavesdropping on VoIP calls by wiretapping with a packet capture software or a protocol
  analyser like Ethereal
- Voice and data interception by a Man-in-the-Middle attack for example…

Security requirement: Integrity
Definition in a VoIP context: In VoIP, integrity means the protection against the alteration of
data like:
- VoIP calls (media traffic)
- Signalling traffic
- Voice mail messages (messages left can be erased or replaced)
and the protection against the manipulation of VoIP components
Examples of threats:
- Registration hijacking, i.e. the registration of an attacker as a legitimate end user and
  redirection of VoIP calls to this attacker
- Modification of the location information by inserting fake SIP addresses associated with a
  user agent

Security requirement: Availability
Definition in a VoIP context: In VoIP, availability means the protection of:
- The availability of the telephony service
- The quality of speech
- The availability of information such as accounting information…
Examples of threats:
- Denial of service (DoS) attacks against IP-PBXs, SIP servers…
- DoS attacks against IP phones

Secondary requirements

Security requirement: Privacy
Definition in a VoIP context: In VoIP, privacy means the protection of personal information about
calling parties, i.e. protection of meta-information, such as the identity of calling parties, the
duration of calls, the location of calling parties etc.
Examples of threats:
- Eavesdropping VoIP calls and extracting from SIP messages meta-information about calling
  parties

Security requirement: Authenticity
Definition in a VoIP context: In VoIP, authenticity is the reliable determination of the authentic
identity of calling parties. Authenticity depends on the integrity and authenticity of the
identities of calling parties.
Examples of threats:
- Theft of identity; for example, a customer calls the Financial department of a company but
  talks to an attacker instead
- Impersonation

Security requirement: Authorisation
Definition in a VoIP context: In VoIP, authorisation is the process of granting or denying the
access of users to:
- VoIP-critical resources like IP-PBXs, SIP servers, DNS/ENUM servers…
- Types of activities like the configuration of VoIP servers, the registration from User Agents
  external to the enterprise network…
- Services; certain VoIP application services are not provided to all the employees of a company
Examples of threats:
- Toll fraud
- Service abuse

Security requirement: Liability (non-repudiation)
Definition in a VoIP context: In VoIP, liability is the ability of:
- Calling parties to prove that they have really placed successful calls
- Callees to prove the identity of their callers (non-repudiation)
- Operators to prove that calls have been placed by a specific VoIP user
- VoIP users to prove that they have not placed calls, in case of unjustified bills
Examples of threats:
- The denial to have received a call; for example, a stockbroker can deny to have received an
  order coming from the CEO of a large enterprise
- The assumption of having placed a call, while this is untrue
- Operators charging VoIP users more than what they have consumed
- VoIP users decide not to pay their bills

Table 5.1 – Security requirements of VoIP systems in enterprises
5.2.2. Assessment of security requirements
In the context of a corporate environment, confidentiality and privacy seem to be the
most important security requirements for VoIP networks; indeed, a breach of information
security can jeopardize the reputation of an enterprise, which will not easily recover from
the damage caused by the resulting lack of trust. A leak of information can ruin a whole
business; information in the wrong hands can serve as a blackmailing weapon or an instrument
of pressure… This does not mean that availability is unimportant, but a breach of
confidentiality and privacy can cause higher losses for companies: it is much easier for
enterprises to recover from an availability attack against their VoIP system or from a power
outage, whose impact is the loss of productive hours during which employees cannot work
efficiently, than from information disclosure resulting in loss of reputation, loss of public
credibility and loss of trust from partners, shareholders... Authentication is also a major
security requirement since the identity of the communicating parties must be guaranteed. For
example, a chief executive must be sure that the stockbroker to whom he wants to pass an order is
actually the right stockbroker and not an attacker who has redirected the call session and who
impersonates the stockbroker. Identity theft and impersonation are crucial issues in enterprises,
in particular in large enterprises in which employees do not necessarily know each other; because
of this anonymity, employees cannot tell whether the voice they hear is that of
the person they want to talk to or that of an attacker.
Now that the most important security requirements of enterprise VoIP deployments have been
identified, the next step is to study the risk assessment processes that enterprises
have to go through before taking protection measures for the deployment of their VoIP systems.
5.3. Studying the threat and risk analysis processes for enterprise
VoIP systems
5.3.1. Processes in risk assessment

Once enterprises have identified the security expectations for their VoIP deployments,
they usually have to carry out a so-called risk assessment. Risk assessment is the first step of
Risk Management, which can be defined, according to the NIST Risk Management Guide for IT
systems [1], as “the process of identifying risk, assessing risk and taking steps to reduce risk to
an acceptable level”. Risk management is composed of three main processes:
• Risk assessment – it refers not only to the identification and evaluation of risks and risk
impacts but also to the establishment of a list of risk-reducing recommendations
• Risk mitigation – its aim is to attribute a priority to the risk-reducing recommendations
resulting from the risk assessment process and to implement them
• Evaluation and assessment – risks must be evaluated and assessed regularly (for example,
every 3 years) to detect possible new risks caused by the expansion of the VoIP system, by
changes to VoIP components, by the replacement or update of VoIP software applications…
Risk assessment plays an important role in determining which security measures and
mitigations to adopt later. In this report, the nine-step risk assessment methodology presented
in [1] has been reorganized and simplified to six steps, as illustrated in Figure 5.1.
In step 1 of the risk assessment methodology, a characterisation of the enterprise VoIP
system will be performed. This means that the boundaries of the VoIP system will be delineated
(see paragraph 5.3.2).
In step 2, a threat and vulnerability identification will be conducted in parallel to a
control analysis and an impact analysis. In risk assessment, the word control refers to the
risk-reducing measures; so, a control analysis means an analysis of the security measures that
have already been implemented. The impact analysis consists in measuring the adverse impact of
the successful exercise of a vulnerability. To perform such an analysis, a pre-condition is to
determine the value and importance of the VoIP system to the enterprise and to understand the
criticality and sensitivity of VoIP data and VoIP components. The adverse impact of a security
threat can be described in terms of loss or degradation of one of the CIA security requirements
or of a combination of these. Impact is not always tangible and cannot always be measured (e.g.
loss of reputation, loss of public confidence…): this is why impact can be described in terms of
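magnitude of impact (high, medium, low). Thus, the impact assessment is qualitative in nature:
its main advantage is that it prioritises risks and highlights areas for immediate improvement,
whereas the absence of quantifiable measurements makes a cost-benefit analysis more difficult [1].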
Figure 5.1 – Steps in risk assessment (according to NIST guide)
In step 3, the likelihood that a potential vulnerability may be exercised can be estimated
according to the motivations of attackers to compromise the enterprise VoIP system, the nature
of the vulnerability and the protective measures already implemented. The likelihood can only be
rated, and each threat is given a level of likelihood (high, medium, low) [1].
In step 4, a risk determination aims at assessing the level of risk to the enterprise VoIP
system. The risk can be expressed as a function of the likelihood of a threat, the
magnitude of its impact and the resistance of the security measures already deployed. Measuring
risk first requires the construction of a risk-level matrix. This matrix is
a synthesis of steps 2 and 3 since it is a 3×3 matrix of threat likelihood (high, medium, low) and
threat impact (high, medium, low); of course, it is possible to work with a finer granularity by
building larger matrices. The aim of this matrix is to establish a risk scale serving as a
reference and defining risk levels (high, medium, low). The result of the risk determination phase
is a list of risks and their associated risk levels.
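A risk-level matrix of this kind can be generated from two rating scales, as sketched below. The
numeric weights and the two thresholds are assumptions modelled on the example scale given in
[1]; each enterprise would choose its own values and granularity.

```python
LIKELIHOOD = {"low": 0.1, "medium": 0.5, "high": 1.0}   # assumed weights
IMPACT = {"low": 10, "medium": 50, "high": 100}         # assumed weights


def risk_level(likelihood, impact):
    """Combine a likelihood rating and an impact rating into a risk level."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"


def risk_matrix():
    """Build the full 3x3 matrix as {(likelihood, impact): risk level}."""
    return {(l, i): risk_level(l, i) for l in LIKELIHOOD for i in IMPACT}
```

With these assumed values, only a threat rated high for both likelihood and impact reaches the
high risk level, while a low-likelihood threat remains at the low level whatever its impact.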
In step 5, control recommendations are listed, i.e. security recommendations and best
practices are suggested in order to mitigate the risks determined in the previous step.
In step 6, the risk assessment is summarized in an official report. It contains the description
of the threats and vulnerabilities and the measurements of the risk, and provides recommendations
for security implementation.
Risk assessment is a long and difficult task since it requires an exhaustive
analysis of threats and vulnerabilities which are not always immediately detectable. However, it
is a process which must be carried out effectively in order to determine which technical security
solutions, best practices and security policies to adopt in order to protect enterprise VoIP
networks.
5.3.2. VoIP system characterization

As seen previously, it is important in the process of risk assessment to define and
delineate the enterprise VoIP system to secure.
To achieve that, information must be collected about:
• Hardware (VoIP devices such as IP phones, computers hosting VoIP/SIP servers, legacy
PBXs, routers, gateways…)
• Software (IP-PBXs, logical SIP servers, VoIP applications, softphones…)
• VoIP system missions (provide an internal telephony system replacing the legacy
telephone network, provide advanced telephony services to employees…)
• Data and information (storage and archiving of VoIP calls, storage of accounting
information…)
• System and data criticality and sensitivity (which subnets must have the highest security
because of a high need of confidentiality? Which subnets can have a medium-level
protection? …)
• VoIP system interfaces (is there any direct connection to the PSTN/ISDN networks?
Are there VPNs linking the VoIP systems of two company sites? Is the enterprise VoIP
system connected to partner enterprises of the extranet? ...)
• Other system-related information
One of the most crucial questions is whether the enterprise VoIP system will be
a closed or an open VoIP system. In other words:
Is the enterprise VoIP system a closed telephone system allowing only internal VoIP calls,
rejecting all VoIP calls coming from the public Internet and accepting only TDM calls
from the rest of the world? Or is it an open system which allows internal VoIP calls
as well as VoIP calls placed by user agents from the public Internet? Or is it a semi-open
system allowing internal VoIP calls and calls coming from branches, trusted partners and trusted
mobile workers? It is important to decide whether the system will be open or closed because the
threat and risk analysis in the risk assessment depends strongly on this choice. For example, for
an enterprise which has opted for the deployment of a closed VoIP system, Spam
over IP Telephony (SPIT) or DoS attacks from the public Internet do not have to be taken
into account in the threat identification step of the risk assessment, since VoIP calls coming from the
public Internet cannot enter the enterprise network.
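The dependence of the threat identification step on the openness of the system can be made explicit with a small lookup. The following Python sketch is only an illustration: the names Openness and internet_borne_threats_in_scope are hypothetical, and keeping these threats in scope for the semi-open case is an assumption, not a statement taken from the cited sources.

```python
from enum import Enum


class Openness(Enum):
    CLOSED = "closed"        # internal VoIP calls only, outside world reached via TDM
    SEMI_OPEN = "semi-open"  # plus branches, trusted partners, trusted mobile workers
    OPEN = "open"            # plus user agents on the public Internet


# Threats carried by VoIP traffic arriving from the public Internet
INTERNET_BORNE_THREATS = ("SPIT", "DoS from the public Internet")


def internet_borne_threats_in_scope(openness: Openness) -> tuple:
    """Return the Internet-borne threats to keep in the threat identification step."""
    if openness is Openness.CLOSED:
        # External VoIP calls cannot enter a closed system
        return ()
    # Assumption: any system accepting some external VoIP traffic keeps them in scope
    return INTERNET_BORNE_THREATS


print(internet_borne_threats_in_scope(Openness.CLOSED))
print(internet_borne_threats_in_scope(Openness.OPEN))
```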
After the characterization of the VoIP system of an enterprise, the next step is to
identify and classify threats. Let’s have a look first at the different ways of classifying threats.
5.3.3. Types of threat classifications/taxonomies

There are many ways to classify threats: the aim of this paragraph is to present and assess
these different types of classification of threats (called taxonomies) to VoIP systems.
First, it is important to specify the different criteria which distinguish threats, such as:
• Source of the threat (see Table 2.2): human, natural or environmental
• Motivation: Intentional or unintentional threat
• Internal attackers (employees, visitors, interns… ) or external attackers
• OSI layers of vulnerability and/or threat
• VoIP-specific or underlying-layer-specific attack
Reports, white papers and articles proposing a classification of VoIP threats have opted
for different approaches. To create Table 5.2, a selection of the most relevant
documents proposing a threat classification was first made, and these
classifications were then grouped according to the approach they
adopt. The analysis of the selected documents resulted in the identification of six
different classification approaches (there are of course others, but these are the
most significant):
Threat-to-security-requirement approach: threats are classified according to the security
requirement they put at risk
Layered classification: threats are usually classified according to VoIP architecture layers
they put at risk.
Infrastructure-oriented classification: threats are classified according to the parts of the
VoIP infrastructure they are likely to damage
Damage-oriented classification: threats are classified according to the types of damage
they can potentially provoke
Major threats classification: major threats are simply listed, without any ordered
classification
Multi-layered classification: threats are classified according to the different VoIP
architecture layers and threats at these layers are then further classified according to OSI
layers, VoIP components, VoIP functionalities…
Classification approach | Examples | Threat classification
Threat-to-security-requirement classification | Materna’s “A proactive approach to VoIP security” [67] | Threats to service availability; Threats to service integrity; Eavesdropping (threat on confidentiality); Spam over Internet Telephony (SPIT)
Threat-to-security-requirement classification | AT&T’s “VoIP security: what are the risks and solution?” [70] | Denial of service (threat to availability); Fraud and abuse (threat to integrity); Threats to data confidentiality and privacy
Layered classification | Mihai’s “VoIP – A layered approach” [68] | Signalling protocol layer; Transmission protocol layer; Application layer
Infrastructure-oriented classification | NSA’s “Security guidance for deploying IP Telephony systems” [62] | Threats to network infrastructure; Threats to network perimeter infrastructure; Threats to VoIP servers; Threats to IP phones
Damage-oriented classification | Roberts’ “Voice over IP security” [69] | Service disruption; Service interception; Service fraud and abuse
Damage-oriented classification | VOIPSA’s “Threat Taxonomy” [61] | Social threats; Eavesdropping; Interception and modification; Service abuse; Intentional interruption of service
Major threats classification (this is a simple listing of major threats) | NEC’s “VoIP security best practice I” [71] | Unauthorized access; Interception/eavesdropping; Fraud; Denial of service; Manipulation; Protocol attack; SPIT; VoIP components vulnerabilities
Major threats classification (this is a simple listing of major threats) | Conroy’s article “VoIP corporate vulnerabilities: facts without fears” [64] | Operating systems attacks; Protocol attack; Application manipulation; Unauthorized access; Spoofing/Eavesdropping; IP phones vulnerabilities; Firewall issues; Heat and power; Remote access
Multi-layered classification | BSI’s approach [58] | 1. Threats to the VoIP network layer (threats to VoIP network infrastructure; threats to the VoIP middleware (IP-PBXs and servers); threats to the VoIP end devices; threats to energy supply); 2. Threats to the VoIP application layer (malware; threats to applications of VoIP end devices; threats to applications of VoIP middleware); 3. Threats to VoIP WLANs
Table 5.2 – Different approaches in the classification of VoIP threats and vulnerabilities
All approaches to the taxonomy of threats to VoIP systems are valid. However, it seems that
layered and multi-layered classifications do not give a good overview of what the attack
motivations could be, and they divide attacks in an artificial manner: as said previously, the same attack
can affect several layers and would then be classified twice. An infrastructure-oriented
classification could be of great help for the implementation of security measures and for risk mitigation.
The VOIPSA “VoIP Security and Privacy Threat Taxonomy” [61] aims
to become the standard threat classification for VoIP security. It was drawn up
and published in October 2005 by the Voice over IP Security Alliance (VOIPSA), a consortium
of VoIP equipment vendors, security product vendors, VoIP providers... Let’s see this taxonomy.
5.3.4. VOIPSA Threat taxonomy

The aim of the current report is not to propose a new exhaustive taxonomy, but
to understand the process of securing VoIP networks in enterprise environments.
The VOIPSA’s “VoIP Security and Privacy Threat Taxonomy” [61] is the first document of
this kind and is the fruit of a great effort to find an appropriate classification for VoIP
security. Since it is probably the most important document published in recent years
in VoIP security, it seems judicious to adopt it in this report instead of suggesting or adopting
another one.
The VOIPSA taxonomy distinguishes six main types of threats according to the damage
they cause. These main threats, as well as their sub-categories, are illustrated in Figure 5.2, and are
namely:
Social threats – they encompass threats to privacy, theft of services which, for
example, cause enterprises to incur additional costs, unwanted contact…
Eavesdropping – it is the unauthorized monitoring but not manipulation of signalling
and/or media between two calling parties
Interception and modification – they are attacks in which attackers have
unauthorized access to the signalling and media of calls and have the ability to modify
them
Service abuse – they are attacks that involve improper use of services for example by
means of identity theft
Intentional interruption of service – it refers to a class of attacks that fully or partially
disrupt the operation of a VoIP system
Other interruptions of service – they are unintentional threats due to natural factors
or circumstances and encompass loss of power, resource exhaustion and performance
latency
These main types of threats are general and are divided into finer types
of threats, which are in turn further subdivided, so that the granularity of this
taxonomy is very fine. A useful summary of this taxonomy has been established by M. Collier [74].
The VOIPSA Taxonomy serves as a frame to classify threats and to set a common and
standard terminology in VoIP security.
In this report, I have used this frame to create exhaustive tables gathering the major
VoIP security threats present in enterprise networks. These tables appear in Annex 1 and have
been established by taking the VOIPSA taxonomy as frame and by making a synthesis of the
threats to VoIP systems mentioned in a selection of reference documents. SIP-specific security
issues have also been classified in this annex.
The VOIPSA Taxonomy does not focus on the most “trendy” threats to VoIP systems
but makes an effort to present many other threats, without forgetting those inherited from
data networks, like malware, which are very likely to occur.
One of the trendiest attacks, which has received much attention in recent years although it is not yet
a considerable problem in enterprise networks, is the famous SPIT, or Spam over IP
Telephony. It was one of the main security problems discussed at the 3rd Annual VoIP Security
Workshop organized by Fraunhofer FOKUS in Berlin in 2006. SPIT was the main threat
discussed since, even if it is currently largely hype, it will certainly become one of
the most annoying threats in the future. The VOIPSA has classified it as an “unwanted contact” threat
belonging to social threats.
Figure 5.2 – VOIPSA Threat taxonomy
Certainly, new threats to VoIP systems are to come in the near future and will have to be
classified according to the VOIPSA Taxonomy. Currently, attacks against enterprise VoIP
systems are quite rare and do not yet constitute a real problem for enterprises. The attack carried
out by Pena and Moore in June 2006 [75] is not a good omen for VoIP security, since it shows
that “unbreakable” security can always be broken, in particular in ways that VoIP network
designers would never have imagined. More precisely, it shows that enterprises are very often
not cautious enough and underestimate the importance of adopting VoIP security
measures to protect against external attacks targeting their VoIP systems.
Once the types of threats have been studied, using for example the VOIPSA Threat
Taxonomy, an enterprise can then proceed to conduct a threat analysis of its VoIP system.
5.3.5. Choice of a threat analysis model

In order to identify the security threats peculiar to a particular enterprise VoIP system, it
is necessary to perform a tailored threat analysis. The threat analysis process is part of Step 2 of
the NIST risk assessment methodology presented in 5.3.1. The threat analysis can be done in
parallel with a vulnerability analysis because, while looking for threats, the associated
vulnerabilities can also be identified, thereby creating vulnerability/threat pairs.
There are many ways to perform a threat analysis; the one which has been chosen and will
be used in paragraph 5.4 to determine the threats to the VoIP systems of a small and a
large enterprise is Schneier’s attack trees threat analysis model.
The attack trees model [73] is a formal methodology for describing and analyzing
the security of computer systems. Attack trees represent attacks against a system in a tree
structure. Each tree has a root node, which represents a goal for attackers, and leaf nodes, which
represent the possible attacks against the system to achieve this goal. Each leaf node can also be
the root node for other leaf nodes, thus detailing the ways of attacking the system.
The attack trees model will be applied in 5.4 with the VoIP network as system, and the
main threats of the VOIPSA Taxonomy as shown in Figure 5.2 as the root nodes.
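As an illustration, an attack tree can be represented by a simple recursive data structure. The following Python sketch is a minimal example under stated assumptions: the class name AttackNode and its methods are hypothetical and are not part of Schneier’s model [73], and the node labels are an abridged extract of the “Interception and manipulation” tree of Table 5.4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    """Node of an attack tree: the attackers' goal (root) or a way to reach the parent node."""
    label: str
    children: List["AttackNode"] = field(default_factory=list)

    def add(self, label: str) -> "AttackNode":
        """Attach a sub-node and return it so that the branch can be refined further."""
        child = AttackNode(label)
        self.children.append(child)
        return child

    def leaves(self) -> List[str]:
        """Return the labels of all nodes without children, i.e. the concrete attacks."""
        if not self.children:
            return [self.label]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# Abridged extract of the tree of Table 5.4 (labels shortened for the example)
root = AttackNode("Interception and manipulation")
hijacking = root.add("Conversation impersonation and hijacking")
records = hijacking.add("Manipulating registration records")
records.add("Registration hijacking attack (fake SIP REGISTER)")
alteration = root.add("Conversation alteration")
inserting = alteration.add("Inserting sounds into a conversation")
inserting.add("Using tools like rtpinsertsound/rtpmixsound")

for attack in root.leaves():
    print(attack)
```

Listing the leaves of each root gives the set of concrete attacks whose likelihood and impact have to be assessed afterwards.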
5.3.6. Risk assessment results

Once the threat and vulnerability analyses, along with the control, impact and likelihood analyses,
have been completed, the next important step is to summarize their results by performing a risk
determination for each of the vulnerability/threat pairs with the help of their impact and likelihood rates.
Each vulnerability/threat pair is
assigned a risk rate based on a previously established risk-level matrix (see example in Table 5.3)
and is given a recommended solution for reducing the risk (control recommendations step).
Table 5.3 – Risk-level matrix (source: NIST)
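The risk determination for a vulnerability/threat pair can be sketched in a few lines of Python. The numeric weights used below are those of the sample risk-level matrix of NIST SP 800-30 (likelihood 0.1/0.5/1.0, impact 10/50/100, risk scale low up to 10, medium up to 50, high up to 100); it is an assumption that they correspond to the example referred to as Table 5.3, and the function name risk_level is hypothetical.

```python
from fractions import Fraction

# Weights of the sample matrix in NIST SP 800-30 (assumed to match Table 5.3)
LIKELIHOOD = {"low": Fraction(1, 10), "medium": Fraction(1, 2), "high": Fraction(1)}
IMPACT = {"low": 10, "medium": 50, "high": 100}


def risk_level(likelihood: str, impact: str) -> str:
    """Combine the likelihood and impact ratings of a vulnerability/threat pair."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"


# Example ratings, chosen for illustration only
print(risk_level("high", "high"))
print(risk_level("medium", "high"))
print(risk_level("low", "high"))
```

Exact fractions are used instead of floating-point numbers so that products falling exactly on a threshold (10 or 50) are classified consistently.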
During the Control Recommendations step, technical solutions or best practices are
suggested to mitigate or eliminate the identified risks. The aim of this step is to bring the level of
risk to enterprise VoIP systems down to an acceptable level. The suggestions made during this step are
of the greatest importance to the next process: Risk Mitigation. Not all recommended solutions
will be implemented later because, very often, the cost of implementing them cannot
be justified by the reduction of risk they bring. To complete the risk
assessment, its results must be presented in an official, well-documented report to
be used for the implementation of VoIP security.
Having seen in this sub-chapter the process of risk assessment of a VoIP system, it
will be interesting to apply it to models of VoIP deployments in enterprise networks. The
models which will be studied are those presented in Chapter 3, which
correspond to a simple VoIP deployment in a small enterprise and to a more complex VoIP
deployment in a large enterprise. Let us see what the results of the risk assessment are.
5.4. Performing a risk analysis in two models of VoIP networks
5.4.1. Characterization of the enterprise VoIP systems to analyse

The models of VoIP deployments which will be analyzed in this sub-chapter are the
models presented earlier in Chapter 3. The ones selected to be analysed are:
• VoIP system in a small enterprise in a full-IP deployment stage (Figure 3.2)
• VoIP system in a large enterprise in a hybrid deployment stage (Figure 3.5)
Both systems are open VoIP systems; that is to say, these enterprise VoIP
networks are not VoIP islands within intranets that have no VoIP connection to the outside world and
rely on PSTN lines for external calls. External VoIP calls originating from the public
Internet can reach user agents internal to the enterprise network and vice versa. This has,
however, the disadvantage of introducing new threats to the enterprise VoIP networks, since
external attackers can launch attacks such as SPIT or DoS… Besides, it will be possible for a SIP
user agent located in one corporate branch to call another SIP user agent located in another
branch.
5.4.1.1. Model of a small single-site enterprise network
Scenario
A small enterprise of 20 people has decided to deploy a VoIP network on its premises
and to use it for internal purposes as well as for external communication, to reach and to be
reached by SIP User Agents from the public Internet and by users from the PSTN/ISDN
networks. The company does not have any PSTN or ISDN media/signalling gateway in
its local network but has access to the PSTN/ISDN networks through SIP Trunking between
the enterprise premises and its ISP. The enterprise does not support mobility for its workers,
who have to work using Ethernet or WLAN access on the premises of the company.
Indeed, thanks to Voice over WLAN (VoWLAN) technology [76], it is possible to place a
VoIP call from any wireless device which connects to the corporate LAN through wireless
access. VoWLAN is a technology, based on the IEEE 802.11 standards, for sending digitized
voice over wireless links, i.e. for delivering VoIP through wireless access.
In this small enterprise, the deployed WLAN allows mobility within and even outside
the enterprise premises. Employees can:
• Move within the enterprise premises by using PDAs, wireless handsets or softphones
installed on their laptops
• Use fixed mobile convergence (FMC) phones in order to use a wireless access at the
office and a cellular access outside of the office
The benefits of VoWLAN for enterprises are described in [77].
Areas to secure
To perform a security analysis, the VoIP system should be divided into several areas (see Figure
5.3) and studied in two steps:
• Step 1: Study the VoIP system isolated from the rest of the world. The threats and attacks
can only come from the inside of the enterprise and can affect the Perimeter area, the
VoIP servers, the IP phones and the wireless VoIP infrastructure.
• Step 2: Study the VoIP system connected to the rest of the world, vulnerable to attacks
coming from the Internet and exploiting vulnerabilities of the perimeter area and/or the
SIP Trunking.
Security requirements
The CIA requirements for the VoIP system of this small enterprise are:
Confidentiality
- Internal private and professional calls should be protected from interception
- Private and professional calls to/from user agents located in the outside world
should also be protected from interception
- Internal and external calls should be highly protected in the Direction office and
have a medium security in the other departments (see Figure 5.3)
Integrity/Authenticity
- Protection from manipulation of the totality of calls and high protection in the
Direction office.
- Protection from toll fraud (even if it does not seem to be the biggest problem)
Availability
- Maximum power outage of IP-PBX and other VoIP servers: ½ h per year (3
occurrences)
- Maximum power outage of IP phones: 1 h per year (10 occurrences)
- Possibility to place emergency calls
- ISDN-like quality of voice
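The maximum outage times above can be translated into availability percentages. The short computation below is only a sketch: it assumes a year of 8760 hours and interprets the number of occurrences as the maximum number of outages per year, which is an interpretation and not stated explicitly in the requirements; the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, leap years ignored


def availability_percent(max_outage_hours_per_year: float) -> float:
    """Availability implied by a maximum cumulated outage time per year."""
    return 100.0 * (1.0 - max_outage_hours_per_year / HOURS_PER_YEAR)


def mean_outage_minutes(max_outage_hours_per_year: float, occurrences: int) -> float:
    """Average duration of one outage if the yearly budget is spread evenly."""
    return max_outage_hours_per_year * 60.0 / occurrences


# IP-PBX and other VoIP servers: 1/2 h per year, 3 occurrences
print(f"servers: {availability_percent(0.5):.4f} %, "
      f"{mean_outage_minutes(0.5, 3):.0f} min per outage")
# IP phones: 1 h per year, 10 occurrences
print(f"phones:  {availability_percent(1.0):.4f} %, "
      f"{mean_outage_minutes(1.0, 10):.0f} min per outage")
```

Under these assumptions the servers must reach an availability of roughly 99.994 % and the IP phones of roughly 99.989 %.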
Figure 5.3 – Areas to secure in the VoIP system of a small single-site enterprise
Motivations for attackers

Internal attackers:
• Employees whose goal is to compromise the confidentiality of calls, whether they
are highly confidential or not, by eavesdropping with a view to blackmailing or
bribing other employees or members of staff, or just selling personal
information to competitors… These employees act surreptitiously and have
high IT skills
• Employees who act just out of curiosity – they are generally low-skilled
• Employees whose goal is to take revenge on colleagues or hierarchical
superiors… They can for example prevent them from receiving or placing calls,
interrupt their ongoing calls or simply make the VoIP system
unavailable (attack on servers…). These employees are generally quite skilled.
• Employees who act with negligence and do not comply with security policies; they
can be at the origin of the spread of malware infecting the VoIP components
External attackers:
• Self-employed hackers or crackers breaking into the VoIP system of the small
company to gain status in their community by showing their
competence, for example by taking down the VoIP network or remotely manipulating
IP-PBXs, VoIP servers or IP phones…
• Hackers or crackers working for competitors of the small enterprise and acting
with a view to damaging the company by taking down the VoIP systems, stealing
confidential information…
5.4.1.2. Model of a large multi-site enterprise network
Scenario
A large multi-site enterprise of 1000 people has opted for the adoption of a VoIP system
and has decided to deploy it in the headquarters as well as in its branch office. Each site has a
VoIP system for internal purposes but it is also possible to place VoIP calls to and receive VoIP
calls from external user agents connected to the public network. Besides, this deployment allows
mobile workers to access the internal VoIP system of the enterprise. That way, authorized
employees can work from home or on the road.
Each corporate site is also connected to the PSTN/ISDN networks via a
media/signalling gateway, which allows employees to call external users
connected to the PSTN/ISDN networks from their IP phones and, conversely, to receive calls
from users connected to the traditional telephony networks.
Areas to secure
After the deployment design, the security of the VoIP system is the next step. To perform
a security analysis, it is necessary to divide the VoIP system into several areas (see Figure 5.4) and
to proceed with a three-step study:
• Step 1: Study the VoIP system of each corporate site as a system isolated from the rest of
the world. The threats can only come from the inside of the site and can affect the VoIP
servers, the IP phones and the WLAN.
• Step 2: Study the set of VoIP systems as a single enterprise VoIP system still isolated from the
rest of the world. Attacks can be launched from one site and affect another, threats such
as malware can propagate in the whole VoIP network… Threats and attacks could exploit
vulnerabilities of the perimeter area as well as vulnerabilities due to mobility support outside
of the company’s premises
• Step 3: Study the single corporate VoIP system connected to the rest of the world and
thus, subject to attacks coming from the public Internet, attacks coming from or
launched by workers from the enterprise to the PSTN/ISDN networks or attacks and
threats coming from the extranet (WAN)
Security requirements
The CIA requirements for the VoIP system of this large enterprise are:
Confidentiality
- Private and professional calls…
…internal to a corporate site…
…between two corporate sites…
…between a mobile worker and the corporate sites…
…to/from user agents located in the outside world…
…should be protected from interception
- Internal and external calls should be highly protected in the Board of Directors
subnet and have a medium security in the other departments (see Figure 5.4)
Integrity/Authenticity
- Protection from manipulation of the totality of calls and high protection for calls
from/to the Board of Directors subnet.
- Protection from toll fraud
Availability
- Same requirements as small enterprises
- High protection against externally-launched DoS (high availability of public VoIP
servers)
Figure 5.4 – Areas to secure in the VoIP system of a large multi-site enterprise
Motivations for attackers
Motivations of attackers are more or less the same as for small enterprises. The only
important difference is the value of the gain at stake, whether it is reputation and status or
money.
On the one hand, attackers can be self-employed and perform attacks against a company’s
VoIP system for themselves. For example, attackers can act to satisfy their egos or to prove their
competence, and they have a much greater incentive to attack a renowned large company than a small
one. Another motivation for self-employed attackers is to gain money. The larger the company,
the more valuable its information is in their eyes. They can, for example, intercept and record
private confidential conversations and sell them at a very high price to the company’s
competitors, or blackmail members of the company who fear being compromised.
On the other hand, attackers can act on behalf of a third party who pays them to
perform attacks against the VoIP system of a large company. These third parties can have
different faces; they can be:
• Competitors who want to gain private information from the large company and would
like to listen to valuable confidential conversations, for example about strategies, new
products to be launched, new ideas… – this is a form of high industrial espionage.
• Competitors who want to ruin the reputation of the large company by divulging to the
press for instance corporate secrets, secret alliances between enterprises… extracted from
confidential conversations
• The large company itself (not a third party, but acting as one), which wants to test its
VoIP system and hires professional hackers to carry out a kind of audit
• Companies whose business is to secure VoIP systems and which, in order to sell their
services to the large company, prove that there are security breaches in its VoIP system
and convince it of the urgent need for the security solutions they can provide
Certainly, there are many other motivations, but it is difficult to identify them outside of a
specific context; those mentioned above are the most important.
5.4.2. Threat analysis of the VoIP systems in two models

Classification of threats in the threat analyses
In this master thesis, I have adopted Schneier’s attack trees model and adapted it to the
VoIP case in order to conduct a threat and risk analysis of the two enterprise models
characterized in the previous paragraph. The results of this threat and risk analysis can be found
in Annex 1 in tables which certainly do not list the totality of threats but strive to be as exhaustive
as possible. Since most threats are common to the two models, it has been decided
to create common tables for both models.
Threats have been classified according to the goals they aim at and according to the layers
they threaten.
Since the VOIPSA Taxonomy has been used as a frame for this threat analysis, the main
goals are those mentioned by this taxonomy and are namely:
• Constituting a social threat
• Eavesdropping
• Intercepting and Manipulating
• Abusing VoIP services
• Conducting Intentional Denial-of-Service attacks
Only two layers are considered: the network level and the application level. This is why two separate tables
have been created: Annex Table 1 and Annex Table 2 contain attack trees analysing the threats to
an enterprise VoIP system at the network level and at the application level respectively.
Network-level threats exploit vulnerabilities of the underlying IP network (up to OSI layer 4)
in order to conduct attacks against VoIP components and against the VoIP system more
generally. Application-level threats include all the threats exploiting the SIP and RTP protocols and
encompass all SIP-specific VoIP attacks.
Differences between threat analyses

The threat analyses of the VoIP system in the small enterprise of Figure 5.3 and in the
large enterprise of Figure 5.4 have large common parts which have been represented in two
tables, Annex Table 1 and Annex Table 2.
However, in the case of the small enterprise, and in contrast to the large enterprise, threats
to the VoIP media/signalling gateway should not be taken into account since the enterprise has
decided to have recourse to SIP Trunking instead of having its own gateway.
Another difference in the two threat analyses is that the threat analysis of the large
enterprise should take into account the threats due to the support of mobility; these threats have
been summarized in Annex Table 4.
Besides, since both enterprises have adopted the VoWLAN technology in their premises,
new threats, represented in Annex Table 3, are added to the threat analyses of both enterprises.
Interest of the VoIP-specific threat analysis of Annex Table 2

The part of the threat analysis which is the most interesting is that done at the application
layer and represented in Annex Table 2 because it lists the VoIP-specific threats exploiting
vulnerabilities of the VoIP protocols, namely SIP, RTP and RTCP.
Confidentiality, integrity and authenticity of voice messages being among the major
assets to protect in any enterprise, an analysis of how they can be compromised is of
great value for network designers wishing to secure an enterprise VoIP network.
The attack tree with “Interception and manipulation of VoIP” as root, i.e. as the goal
attackers want to achieve, provides valuable elements for such an analysis, as illustrated in Table
5.4. This attack tree shows that most attacks to intercept and manipulate SIP
signalling messages are based on identity spoofing in order to achieve Man-in-the-Middle attacks.
Other methods to intercept and manipulate signalling are based on falsification of registration
records, like the Registration Hijacking attack (see detailed description of the attack in 5.5.1.1), or on
caller identification spoofing, like the easy-to-perform caller-ID spoofing. Definitions of these
attacks can be found in Annex 1.
Goal: Interception and manipulation
1. Conversation impersonation and hijacking
  1.1. Manipulating Registration Records to hijack conversations
    1.1.1. Updating the SIP Location server entries with fake information to hijack the call and forwarding SIP messages to the intended user agent
      1.1.1.1. Sending a fake SIP REGISTER message with the victim user agent’s URI associated with a fake Contact information containing the IP address of the attacker (Registration hijacking attack)
  1.2. Spoofing SIP Registrar and performing man-in-the-middle attack
    1.2.1. Using SIP 301 “Moved Permanently” response (for all successive attacks) including the attacker’s IP address as redirection address to intercept all SIP messages and then forward them to the right destination
    1.2.2. Using SIP 302 “Moved Temporarily” response (for single-time attacks) including the attacker’s IP address as redirection address to intercept all SIP messages and then forward them to the right destination
  1.3. Impersonating SIP Proxy server and performing man-in-the-middle attack
    1.3.1. Spoofing SIP Proxy server and performing man-in-the-middle attack
      1.3.1.1. Same as 1.2.1.
      1.3.1.2. Same as 1.2.2.
      1.3.1.3. Using SIP 305 “Use Proxy” response including the attacker’s IP address as redirection address to intercept all SIP messages and then forward them to the right destination
      1.3.1.4. Performing DNS spoofing
    1.3.2. Changing the SIP proxy address in the configuration of SIP phones
  1.4. Spoofing SIP Redirect server and performing man-in-the-middle attack
    1.4.1. Same as 1.2.1.
    1.4.2. Same as 1.2.2.
  1.5. Spoofing SIP User Agent and performing man-in-the-middle attack
    1.5.1. Same as 1.2.1.
    1.5.2. Same as 1.2.2.
  1.6. Posing as voicemail server and tricking the caller into leaving a message
    1.6.1. Impersonating the voicemail
      1.6.1.1. Performing Registration hijacking attack
      1.6.1.2. Same as 1.2.1.
      1.6.1.3. Same as 1.2.2.
2. Conversation alteration
  2.1. Inserting words, phrases, sound effects into a conversation
    2.1.1. Getting access to the RTP stream of a conversation (not necessarily MITM attack)
      2.1.1.1. Using software tools like rtpinsertsound/rtpmixsound
Table 5.4 – Attack tree for “Interception and manipulation” in an enterprise VoIP network (extract from the
threat analysis I have performed in Annex 1)
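Branches 1.2 to 1.5 of this tree all rely on a forged 3xx response whose redirection address points to the attacker. From the defender's point of view, such responses can be spotted by checking the redirection target. The following Python sketch is a simplified illustration and not a complete countermeasure: the allow-list TRUSTED_HOSTS, the host names and the function names are hypothetical, and the parsing handles only the full-form Contact header of a well-formed response.

```python
import re
from typing import Optional

# Redirection responses used in branches 1.2 to 1.5 (reason phrases as in RFC 3261)
REDIRECT_CODES = {301: "Moved Permanently", 302: "Moved Temporarily", 305: "Use Proxy"}

# Hypothetical allow-list of the enterprise's own SIP servers
TRUSTED_HOSTS = {"proxy.example.com", "registrar.example.com"}


def contact_host(message: str) -> Optional[str]:
    """Extract the host of the first Contact header (simplified parsing)."""
    match = re.search(r"^Contact:.*?sips?:(?:[^@>\s;]+@)?([^:;>\s]+)",
                      message, re.IGNORECASE | re.MULTILINE)
    return match.group(1).lower() if match else None


def is_suspicious_redirect(message: str) -> bool:
    """Flag 301/302/305 responses redirecting to a host outside the allow-list."""
    status_line = message.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[0] != "SIP/2.0" or not parts[1].isdigit():
        return False  # not a SIP response
    if int(parts[1]) not in REDIRECT_CODES:
        return False
    host = contact_host(message)
    return host is None or host not in TRUSTED_HOSTS


forged = "SIP/2.0 302 Moved Temporarily\r\nContact: <sip:bob@192.0.2.66:5060>\r\n\r\n"
print(is_suspicious_redirect(forged))
```

Such a check only detects the symptom; authenticating the servers that send the responses remains the actual protection against spoofing.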
The difficulty of performing the threat analysis presented in Annex 1 was first to collect
from multiple sources all the types of threats that could put enterprises in peril, to analyse their
relevance, veracity and accuracy, and then to think about the goals they could achieve. Indeed, a
single attack can fulfil several goals, and thus it must be listed under several roots in the attack
trees.
Once the threat analysis had been carried out, further details about the listed threats proved
necessary. These details, such as whether the threat is internal or external, its
likelihood and its impact, contributed to the risk analysis performed in the next step.
5.4.3. Risk analysis and results
Subjectivity of the risk assessment
In Annex 1, the likelihood and the impact have been assessed for each of the threats listed
during the threat analysis.
The assessment is qualitative rather than quantitative; a quantitative assessment would have had
to be performed on real enterprises and not on models. Since each VoIP system differs
from one enterprise to another, a quantitative evaluation of the damage caused by threats and of
the probability of penetration or of occurrence would not have been of interest in this
report, since it would have been specific to a particular VoIP system.
Therefore, this assessment is highly subjective, which does not mean that it has been
performed without foundation. On the contrary, the evaluation of threats has been
based on the one hand on personal experience acquired during a few attack experiments
carried out with P. Lawecki at Alcatel, and on the other hand on literature and recent articles. Despite
its qualitative aspect, the risk analysis in Annex 1 shows which attacks are the most likely to occur
against enterprise VoIP systems.
Evaluation of the likelihood and impact of threats

The evaluation of the likelihood that threats occur against a VoIP system has been a
difficult task in particular due to the determination of the criteria to use. Which criteria have been
chosen to assess likelihood?
For network-based threats to VoIP deployments presented in Annex Table 1, likelihood
has been evaluated according to three criteria:
• The motivation of attackers; the more the attackers are motivated to perform an
attack, the more likely a threat is to occur
• The difficulty of performing attacks; the more difficult the attack is to perform
technically, the less likely it is.
• The basic security measures already implemented in the enterprise IP
infrastructure for data security. Indeed, since a VoIP system is built on an
underlying IP infrastructure, it also benefits from some of the security
mechanisms used for data security. Thus, some network-based threats to
VoIP systems cannot occur in a normally secured enterprise IP network and
therefore have a low likelihood of occurrence. For example, in today’s
enterprise networks, relationships of trust between computers cannot be built and
thus the IP spoofing attack has a very low likelihood of being performed.
Concerning the application-layer threats, likelihood has been evaluated in a different
manner. It has been assumed that the underlying layers have been sufficiently secured and that
no security mechanisms to protect VoIP have already been applied. Therefore, the likelihood of
application-layer threats has been evaluated, as shown in Table 5.5, according to the following
two criteria:
• The motivation of attackers
• The difficulty to perform attacks (degree of feasibility, skills and means
required…)
Table 5.5 – Threat likelihood matrix
The evaluation of impact of threats has been performed according to the most important
assets to protect in the enterprise defined in the evaluation of security requirements in 5.2.2. It
has been considered that threats that imperil enterprises due to the disclosure or theft of
information as well as the impersonation of legitimate users could have a high impact. However,
threats against availability have been regarded as having a medium impact. This does not
mean that availability is unimportant for a business; on the contrary, it is crucial. However, a
loss of availability is easier to recover from than a disclosure of information. The consequences
of the disclosure of information and of impersonation are irrevocable, whereas a loss of
availability is undoubtedly damaging but recoverable. Besides, it has been considered that the
impact of the VoIP service denied to a
user is lower than the impact of a VoIP service with vulnerabilities giving way to attacks like
interception or impersonation. For example, if a manager wants to place a call to a stockbroker to
pass an order, it is better not to be able to place the call at all and use other telecommunications
means than to be able to place a call that is intercepted by a third party… Of course, denial of
VoIP service may seem unacceptable to enterprises for running their business, but this is not
unrecoverable or hazardous.
Distinction between internal and external threats – Results

All threats listed in Annex 1 have also been distinguished according to their origin; they
can be internal, i.e. originating from the inside of the company, or also external, i.e. originating
from the public network to which enterprises are connected.
As shown in Table 5.4, interception and manipulation are goals that can generally be
achieved from the inside of the company and only in extreme cases from the outside, at least for
the following reasons:
- It is almost impossible for external attackers to successfully register as internal users to an
enterprise SIP Registration server.
- These attacks are “opportunity-based attacks” because they target particular situations whose
knowledge is difficult to gain outside of the company’s network
- The latency of traffic does not allow external attackers to be as responsive as internal ones
and to inject forged SIP messages exactly at the point of time they want to act
One of the other major types of attack is Denial of Service. Its effects can be disastrous
for an enterprise since it can deprive it of telephony service, which is unacceptable for a
business. This attack can take a great variety of forms (see Annex 1) and can affect the VoIP
system of an enterprise by directly targeting VoIP servers, which can cause a total denial
of telephony service, IP phones, which causes a limited DoS, or underlying network
components… This attack can be conducted externally as well as internally.
Even if all the attack goals listed in the Annex Tables have to be taken seriously into
consideration by VoIP network designers, special attention should be paid, in enterprise
environments, to eavesdropping which can be easily performed within the enterprise premises
and which can have serious consequences for the business like information leakage to
competitors, to the press, to shareholders…
Whether a threat is external or internal affects the risk it constitutes: it seems easier to
launch an attack against a VoIP system from the inside than from the outside, because external
attackers are impeded by the perimeter security measures protecting the network and lack
information about “opportunities”, such as calls that could turn out to be interesting,
information about people, roles…
Results of the analyses

Once likelihood and impact have been determined for each point of the threat analysis, the
risk can be also assessed according to the risk-level matrix presented earlier.
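The combination of likelihood and impact into a risk level can be sketched as follows. The
numeric weights and thresholds below are an assumption modelled on the widely used NIST SP
800-30 style of risk-level matrix; the matrix actually used in this report is the one presented
earlier and may differ. The two sample threats and their ratings are only illustrative.

```python
# Hypothetical scales: weights and thresholds are illustrative assumptions,
# not the matrix of this report.
LIKELIHOOD = {"low": 1, "medium": 5, "high": 10}   # in tenths: 0.1, 0.5, 1.0
IMPACT = {"low": 10, "medium": 50, "high": 100}


def risk_level(likelihood: str, impact: str) -> str:
    """Combine a qualitative likelihood and impact into a qualitative risk level."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact] // 10   # score between 1 and 100
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"


# Illustrative ratings only: easy, damaging eavesdropping versus currently unlikely SPIT.
samples = {"eavesdropping": ("high", "high"), "SPIT": ("low", "high")}
for threat, (likelihood, impact) in samples.items():
    print(threat, "->", risk_level(likelihood, impact))
```

With these assumed scales, a threat with a very high impact but a low likelihood still ends up
as a low-level risk, which is the reasoning applied to SPIT in the results.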
From the risk analysis of application-based threats to VoIP, which is the most interesting
analysis in comparison with the others, the following results have been deduced:
• Eavesdropping by conversation reconstruction is a high-level risk threat for
enterprises, in particular because it has become very easy to perform with possible
disastrous consequences
• Toll fraud, which is a social threat, is a high-level risk threat when it is achieved
through the impersonation and identity theft of a legitimate user in the enterprise.
For this type of threat, the motivation of attackers is high, the difficulty is relatively
low with appropriate software tools and the impact can be considered high in so far
that the consequences of this threat are financial losses which are contrary to the
main business mission which is to make money.
• Misrepresentation of legitimate employees of an enterprise can also present a high
level of risk because attackers can introduce themselves as employees and entice
their interlocutors to give information
• In spite of the hype around the SPIT attack, SPIT is currently a low-level threat to
enterprise VoIP systems because, although its impact on a business could be very
high, its likelihood is presently very low. However, in a couple of years, SPIT
attacks may be frequent and enterprises may appear to be interesting targets,
which would increase their likelihood.
• Identity theft by intercepting and stealing a legitimate user’s credentials is also a
serious security issue in an enterprise with a high risk
• There is a wide range of methods to perform interception and manipulation, in
particular conversation impersonating and hijacking, and their risk is high.
Besides, conversation alteration will certainly very soon become a high-risk threat,
since new software tools have appeared, like the recent rtpinsertsound or rtpmixsound
tools published at the end of October 2006 on the new www.hackingvoip.com website
(created in June 2006). Conversation alteration can be achieved by the insertion in a
conversation of sounds, words, and sound effects…; for instance, all the “sell”
words could be overwritten by “buy” during the call between a financial manager
and a stockbroker, which could have obvious disastrous consequences for the
enterprise.
• Denial of service has been assessed as a medium-to-high-level risk threat since,
as explained previously, its consequences are serious but recoverable. There is
a wide variety of VoIP-specific attacks achieving that goal, and victims can
be individual targeted users, groups of users or all users. Denial of service
imperilling the whole VoIP system and affecting all users, with a medium-to-high
risk, can be accomplished by putting the enterprise SIP Proxy server out of service
or by preventing users from placing external calls. The major DoS attacks listed in
Annex Table 2 target individual users, for example by interrupting ongoing call session
initiations, degrading the QoS of a particular call, or making a specific IP phone reboot
or crash… DoS attacks against a group of users may include preventing User
Agents from registering with the local SIP Registrar or provoking the loss of calls
destined to voicemail servers or VoIP gateways…
This is a brief summary of the results from the risk analysis detailed in Annex 1 of the
VoIP-specific threats. Network-based threats have also been analysed but the evaluation of their
risks depends strongly on the security of the underlying IP network. It is also possible to find the
risk analysis of the threats introduced by the support of Voice over WLAN and by the support of
mobility.
Let’s now see in detail a few major VoIP-specific attacks.
5.5. Analysing a selection of major attacks against enterprise
VoIP networks
In this sub-chapter, four major SIP-based attacks are presented in detail; the remaining
attacks are succinctly described in Annex 1. The presented attacks, namely the SIP CANCEL/BYE
DoS attack, Call hijacking, SIP INVITE Bombing and Registration hijacking, are the most
representative VoIP-specific attacks and, even if opportunity-based, are relatively easy to
perform.
5.5.1. Some VoIP-specific Denial of service attacks

As shown in Annex 1 and as represented in Figure 5.5, there are several ways to perform
Denial-of-Service attacks against VoIP components and at different layers.
DoS attacks can be regarded as the most challenging attacks against VoIP systems because of
their variety, unpredictability, origin and effect; they can seriously affect VoIP systems, in
particular because they can provoke the total shut-down of the VoIP service in a company by
targeting VoIP-critical components like IP-PBXs. Since VoIP is a real-time service, it is
vulnerable to DoS attacks caused by the deterioration of the quality of service.
Figure 5.5 – Layered overview of DoS attacks against VoIP systems (source: [80])
As illustrated in Figure 5.5, the three main classes of DoS attacks at the application layer are
Registration hijacking, Session tear-down attacks like SIP CANCEL DoS or SIP BYE DoS and
SIP INVITE flooding of IP-PBXs. Let’s see the first two of these attacks.
5.5.1.1. Registration hijacking

The Registration hijacking attack is one of the most serious SIP-based attacks because it
allows attackers to impersonate legitimate workers of an enterprise and gain their access rights.
Registration hijacking means that an attacker registers with the local SIP Registration server as a
legitimate user, replacing that way the legitimate Contact address with their own IP address. That
way, all the incoming calls that are destined to the legitimate user are sent to the attacker who can
record the calls or play a spoofed voice mail prompt inviting the calling party to leave a message.
Registration hijacking can be performed to achieve two main goals: a DoS attack or a
Man-in-the-Middle attack. A DoS attack is performed if the attacker does not forward the
packets to the legitimate user after having received them, whereas a Man-in-the-Middle attack
is performed when the attacker only intercepts the packets and forwards them immediately to
the legitimate user, who is not aware of the interception.
The impact of DoS achieved by registration hijacking depends on the VoIP component
which is impersonated: if the attacker registers as, and therefore impersonates, a SIP User
Agent, the impact is limited in that only a single User Agent is deprived of the VoIP service;
however, if the attacker impersonates a VoIP media/signalling gateway, all outbound calls could
be blocked, depriving many more people of access to the PSTN/ISDN networks.
Figure 5.6 – Registration hijacking attack in an enterprise environment
Figure 5.6 illustrates the Registration Hijacking attack performed either by an internal
attacker or by an external one. Although the procedure is the same in both cases, the
external attack is highly unlikely and happens only in extremely rare cases. Indeed, SIP
Registrars can be configured (and this is highly recommended!) so as not to accept registrations
coming from outside the company, and firewalls usually do not let SIP REGISTER messages enter
the enterprise network. However, the principle of rejecting external REGISTER requests is often
not applied, and external attackers can exploit this vulnerability by scanning methods. How can
the problem of mobile workers who need to register from the public network be handled? It is
possible to configure the firewall and the SIP Registrar so as to allow a limited number of
external UAs to register and to detect failed registration attempts [85].
In the above figure, an internal attacker wants to receive and block all calls destined to a
HR manager. To achieve that, the attacker must first disable the HR Manager’s registrations by
performing a DoS attack which deregisters him. In one of the methods to do so, the attacker
sends (1) a first SIP REGISTER request to the SIP Registrar (installed on the IP-PBX core) in
order to unbind all previous registrations of the HR Manager. This request has particular headers:
a Contact header containing the wildcard * and an Expires header set to 0 [85]. Once all legitimate
contacts for the sip:hr_man@site1.com SIP URI have been deleted, the attacker sends (2) a
second SIP REGISTER with a Contact header containing his own IP address and a new
value for Expires. His request is accepted by the SIP Registrar which updates the local SIP
Location server collocated in this example with the SIP Registrar. Let’s assume that a Business
Manager wants to call the HR Manager; his IP phone initiates a call by sending (3) an INVITE
request to the IP-PBX core; the latter makes a lookup in the SIP Location server which contains
an association between the sip:hr_man@site1.com URI and the attacker’s address. The IP-PBX
forwards then (4) the INVITE request to the attacker’s user agent instead of the HR Manager’s.
The hijacking process has been described in detail and illustrated in an interesting process flow
by M. Collier in [85].
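The effect of steps (1) to (4) on the location bindings can be sketched with a small in-memory
model. This is a minimal sketch, assuming a registrar that performs no authentication at all;
the class ToyRegistrar and the two contact addresses are hypothetical and serve only to show
why the lookup ends up pointing at the wrong contact. No SIP messages are built or sent.

```python
class ToyRegistrar:
    """In-memory stand-in for a SIP Registrar with its Location server (no network, no authentication)."""

    def __init__(self):
        self.bindings = {}  # address-of-record -> set of contact addresses

    def register(self, aor, contact, expires):
        contacts = self.bindings.setdefault(aor, set())
        if contact == "*" and expires == 0:
            contacts.clear()            # wildcard Contact with Expires 0 removes every binding
        elif expires == 0:
            contacts.discard(contact)   # Expires 0 removes a single binding
        else:
            contacts.add(contact)       # add or refresh a binding

    def lookup(self, aor):
        return sorted(self.bindings.get(aor, set()))


registrar = ToyRegistrar()
aor = "sip:hr_man@site1.com"
registrar.register(aor, "sip:hr_man@10.0.0.20", 3600)   # legitimate phone (hypothetical address)
registrar.register(aor, "*", 0)                          # step (1): all bindings are removed
registrar.register(aor, "sip:hr_man@10.0.0.66", 3600)   # step (2): foreign contact (hypothetical address)
print(registrar.lookup(aor))                             # steps (3)-(4): where INVITEs are now forwarded
```

The model makes the root cause visible: the registrar changes its bindings for any request it
receives, so the protection has to come from authenticating the REGISTER requests themselves.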
To forge a SIP REGISTER message, the attacker needs a software tool which generates
SIP-conformant messages. One of these tools is SiVuS, which has been
presented by P. Thermos in his famous article “Two well-known attacks on VoIP” [65]. This article
also explains how the attack works and suggests that, before sending the forged REGISTER
request, the attacker should disable the legitimate user’s registration by performing a DoS
attack, either by deregistering the user (as explained above) or by generating a registration
race condition in which the attacker repeatedly and regularly sends SIP REGISTER requests to
override the legitimate user’s requests.
The Registration hijacking attack is possible because of the connectionless nature of the UDP
protocol used for registration requests, which makes them easy to spoof, and because of the lack
or weakness of the authentication mechanisms used by SIP Registrars to authenticate UAs
requesting registration. When authentication is used, it is usually not strong and consists of an
MD5 digest of the username, password, and timestamp-based nonce sent in the authentication
challenge. Besides, passwords can be broken in several ways and do not provide any real
protection [85]. Thus, to protect an enterprise network against Registration hijacking, SIP
Registrars should use strong authentication, and VoIP-aware firewalls should be deployed to
detect and block attacks. The Internet Engineering Task Force (IETF) recommends using TLS,
MD5 digest and strong passwords.
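The MD5 digest mentioned above can be illustrated with a minimal sketch of the digest
computation in the style of RFC 2617, in its simplest form without the qop option. Besides the
username, password and nonce, the formula also covers the realm, the method and the request
URI. All the values used in the usage lines are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hexadecimal MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # credentials part
    ha2 = md5_hex(f"{method}:{uri}")                  # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def registrar_accepts(stored_password, received, username, realm, method, uri, nonce):
    """The registrar recomputes the response from the password it has stored."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return expected == received


# Hypothetical values for illustration only.
response = digest_response("hr_man", "site1.com", "s3cret", "REGISTER", "sip:site1.com", "n0nce-1")
print(registrar_accepts("s3cret", response, "hr_man", "site1.com", "REGISTER", "sip:site1.com", "n0nce-1"))
```

Since the response depends only on the password and on values visible on the wire, anyone who
captures a challenge and its response can test candidate passwords offline, which is why strong
passwords and TLS are recommended in addition to the digest.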
5.5.1.2. SIP CANCEL/BYE DoS

Another way to generate a VoIP-specific DoS attack is to tear down sessions before they are
properly set up or while they are still ongoing.
An attacker who detects that a calling user agent is establishing a SIP session with another
user agent can perform a SIP CANCEL DoS attack against the calling party in order to make
them believe that the called party is not available or is not at their office. This means that the
attacker sends a SIP CANCEL message to the called party before he picks up the phone, i.e.
before the callee’s user agent sends a 200 OK message.
Figure 5.7 – SIP CANCEL DoS (left) and SIP BYE DoS (right)
Besides, an attacker who detects an ongoing call can interrupt the conversation by injecting a
SIP BYE message. The calling party who receives this message will think that his interlocutor
has hung up. In Figure 5.7, both types of DoS have been represented.
The difficulty of the SIP CANCEL attack is the limited time left to the attacker to
act: the SIP CANCEL message must be sent in the time window between the sending of the
INVITE request by the calling party and the receipt of the SIP ACK message from the calling
party. Since this time window is very small, the attack is difficult to perform and can only be
an internal attack. From the point of view of the time window, the SIP BYE attack is much
easier to perform because the SIP BYE request can be sent by the attacker at any point of the
ongoing call. Another difficulty for both attacks is to generate fake messages with valid
headers, i.e. headers consistent with the session initiation parameters. The headers that
attackers must adapt are branch, From, tag, To, Call-ID and CSeq.
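Why these values matter can be seen from the receiving side. The following minimal sketch shows
the kind of check a user agent applies to an incoming BYE: the request is only accepted if it
carries the identifiers of an existing dialog. The dictionary keys are hypothetical names chosen
for this illustration, and the check is simplified (a CANCEL, for instance, is matched through
the branch parameter and the CSeq number of the INVITE instead).

```python
def matches_dialog(request: dict, dialog: dict) -> bool:
    """Return True if an in-dialog request such as BYE carries the identifiers of the dialog."""
    return (
        request.get("Call-ID") == dialog["call_id"]
        and request.get("From-tag") == dialog["remote_tag"]   # tag of the sender
        and request.get("To-tag") == dialog["local_tag"]      # tag of the receiver
        and request.get("CSeq", 0) > dialog["remote_cseq"]    # sequence number must advance
    )


# Hypothetical dialog state kept by the receiving user agent.
dialog = {"call_id": "a84b4c76e66710", "remote_tag": "1928301774", "local_tag": "314159", "remote_cseq": 1}

genuine = {"Call-ID": "a84b4c76e66710", "From-tag": "1928301774", "To-tag": "314159", "CSeq": 2}
guessed = {"Call-ID": "a84b4c76e66710", "From-tag": "0000000000", "To-tag": "314159", "CSeq": 2}
print(matches_dialog(genuine, dialog), matches_dialog(guessed, dialog))
```

An attacker therefore has to observe the signalling of the targeted call to learn these values,
which is one more reason why these attacks are performed from inside the network.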
These attacks are also opportunity-based attacks because they depend on particular
circumstances.
5.5.2. Call hijacking with 3xx code responses

Call hijacking is a form of Man-in-the-Middle attack in which usually an attacker intercepts
covertly calls between two calling parties and “hijacks” a conversation, for example to record it
and sell it to competitors. Call hijacking can be achieved in different ways as shown in Table 5.4
but one of the most common ways is to insert 3xx code response in the session setup. 3xx code
responses are messages which inform about redirections in case that User Agents have moved.
In call hijacking attacks with 3xx code responses, an attacker detects the initiation of a call
and sends immediately a 301 Moved Permanently or 302 Moved Temporarily code response to the
calling party with a new address to contact which is his own address. The calling party then
retransmits an INVITE message which is first sent to the attacker’s address; the attacker then
forwards the message to the legitimate destination, so that the call can be setup and intercepted
covertly.
Figure 5.8 – Call hijacking by using a 301 Moved permanently message
Figure 5.9 – Theft of credentials by performing call hijacking
In Figure 5.8, an attacker located on the premises of a company redirects the call to his
address by sending a 301 Moved Permanently message. The same result could have been achieved
by sending a 302 Moved Temporarily message; the only difference would be that with a 302 response
the attacker would intercept only the current call whereas with a 301 response, he would be able
to receive all subsequent calls that the calling party places to the same destination. In
Figure 5.8, the attacker usurps the identity of a User Agent; however, he could also impersonate
other VoIP components, in particular VoIP proxies (in that case the insertion of a 305 Use Proxy
code response would be more appropriate) or SIP Registrars. In Figure 5.9, the attacker usurps
the identity of a SIP Registrar, which has the fatal consequence that the victim calling party
gives up its credentials after being challenged. The attacker can then forward the messages sent
by the victim so that his action remains secret.
Due to the opportunity-based nature of the attack and the small time window for action, it is
difficult for attackers to generate valid messages to perform call hijacking. Call hijacking is a
pernicious attack which can be used for interception purposes as well as for identity theft
purposes which can then lead to toll fraud…
5.6. Discussion of results

To summarize, this chapter has focused on the process of risk assessment, the first step of
Risk Management, which is an unavoidable step to be taken before the deployment of enterprise
VoIP systems.
In this chapter, security requirements for VoIP systems in enterprises have been listed,
defined, illustrated by examples and finally assessed. To my mind, the most important
requirements in a company are the confidentiality and privacy of VoIP calls as well as the
integrity of VoIP conversations. Availability is without question a major security requirement,
but it ranks behind the three requirements just mentioned because availability issues are
recoverable, whereas disclosure of information or impersonation is irrevocable and can be
disastrous for businesses.
Then, an analysis of VoIP threat taxonomies has been performed resulting in the
constitution of Table 5.2 where different approaches of classification of VoIP threats and
vulnerabilities have been identified, named and presented. This analysis of VoIP threats
classification has served to make a choice about which threat classification to adopt for a threat
analysis in two models of VoIP deployments. The choice of the VOIPSA (VoIP Security
Alliance) threat classification has been motivated by its exhaustiveness, its granularity and its
aspiration to becoming a standard in the VoIP world.
In a next step, a risk analysis has been performed in two models of VoIP networks: on the
one hand, in a small enterprise with a full-IP VoIP deployment and on the other hand, in a large
enterprise with a hybrid VoIP deployment. For each model, the security requirements have been
defined as well as areas to be secured in the network and motivations of attackers. It seems that
attackers have approximately the same motivations in both models but with a difference in
ambition maybe, since financial gain and standard gain are greater in the case of attacks against
large enterprises. The exhaustive threat and risk analyses have been presented in Annex 1 and
have made a distinction between VoIP threats which are network-based, application-based, due
to the support of Voice over WLAN and due to the support of mobility. To conduct the threat
analysis, the Schneier’s attack trees model has been adopted and adapted to the analysis of VoIP
security. The roots of the trees appearing in that annex represent the goals followed by attackers
and correspond to the main threats defined by the VOIPSA Taxonomy which has served as a
frame for this study of threats. From the conducted qualitative risk analysis, certainly highly
subjective but not without foundation, the major risks for enterprises have been highlighted such
as interception and manipulation of VoIP calls with an exhaustive list of which methods to use
(ex: manipulation of Registration Records, spoofing of SIP Registrar…)
Finally, some of the major threats exploiting SIP vulnerabilities, namely SIP CANCEL/BYE
DoS attacks, Registration Hijacking and Call Hijacking using 3xx code responses, have been
presented.
Let’s now briefly see what the technical solutions to the threats listed in this chapter are.
6. General technical solutions for VoIP security
Whereas the threats to enterprise VoIP systems have been reviewed in the previous chapter,
the current chapter presents the technical solutions and security mechanisms which can be
applied to VoIP systems. They are very often inspired by HTTP and SMTP security mechanisms
due to the similarity of SIP to these protocols. The next chapter will show how these
solutions can be practically implemented in enterprises to protect their VoIP systems.
6.1. Encryption in VoIP
6.1.1. Introduction
Since VoIP is an IP-based service and the SIP specification does not recommend any
specific security mechanism, most of the security mechanisms used for IP-based applications are
also applied to VoIP security. One of the most popular security mechanisms is encryption, which
uses a special algorithm to obscure communication data so that third parties
cannot read it. Encryption-based mechanisms also make it possible to check the integrity and
authenticity of data.
As seen in the previous threat analysis, eavesdropping is one of the most serious threats to
VoIP in enterprise environments because it is easy to perform with an ever growing variety
of free software tools. To protect VoIP calls from eavesdropping, it is possible to encrypt the SIP
signalling and/or RTP packets, either in a hop-by-hop way, or in an end-to-end way.
In hop-by-hop encryption, the calling User Agent Client encrypts the SIP messages sent to a
trusted SIP proxy server, which decrypts them with the key it shares with the User Agent Client
(UAC). Then, it re-encrypts the messages with a key it shares with the next trusted SIP Proxy
server and sends the encrypted messages to it; this server decrypts them with the shared key,
and so on… until the messages arrive at the called User Agent. Hop-by-hop encryption is a
succession of encryptions/decryptions performed by a chain of trusted SIP Proxy servers with
user agents at the extremities of this chain. In end-to-end encryption, there is only a trust
relationship between the calling parties, and SIP messages are not decrypted by intermediary
entities but only by the user agents, which have authenticated one another. Mutual
authentication needs long-term keys such as pre-shared keys or certificates, and this requires
end users to store them, which is not well suited to weak terminals with limited resources,
such as mobile phones [87]. In both cases, packets are protected from integrity attacks.
However, encryption introduces additional processing time and in consequence, delays
which can strongly affect the quality of VoIP calls. This is why encryption is only used in
particular cases where the impact of eavesdropping could be very high.
In VoIP, both signalling streams and media streams can be encrypted. Let’s see how.
6.1.2. Encryption of media stream

There are several security mechanisms which can be used to encrypt the media stream:
the best known are the Secure Real-Time Transport Protocol (SRTP) and the IPsec
protocol. However, in contrast to SRTP, IPsec is an encryption protocol which secures traffic at
the network layer and adds a very large overhead to the packets to encrypt, thereby compromising
the QoS of voice. This is why this report focuses on the study of the SRTP protocol rather than
on IPsec for the encryption of media streams.
SRTP protocol
In VoIP, voice is usually carried over the RTP protocol, whose specification does not
define any encryption mechanism for the secure transport of media streams, over public networks
for example. To make up for that, RFC 3711 [86] was published in 2004 to define the Secure
Real-Time Transport Protocol (SRTP) and the Secure Real-Time Control Protocol (SRTCP),
which are respectively the secure versions of RTP and RTCP. SRTP ensures the confidentiality
of RTP and RTCP payloads by encryption, as well as the integrity of the entire RTP
and RTCP packets, which are also protected against replay attacks.
Figure 6.1 illustrates the format of SRTP and SRTCP packets.
The SRTP packet is constituted by the same header as the corresponding non-secure RTP
packet and its payload contains the:
• RTP payload (including RTP padding) in an encrypted form; the RTP and SRTP
payload sizes are the same. SRTP does not encrypt the headers of the equivalent RTP
packets and does not use any additional encryption headers. SRTP can be used in
conjunction with header compression without any effect on QoS.
• SRTP Master Key identifier (MKI) – it is an optional field which does not exist in
the corresponding RTP packet and which identifies the master key from which the
session keys were derived. The MKI can be used for re-keying.
• Authentication tag – this field is not mandatory but only recommended. It provides
authentication of the RTP header and payload since it is a cryptographic checksum
obtained from both of them. This tag protects the packets from unauthorized
modification
Figure 6.1 – Secure RTP packet (left) and secure RTCP packet (right)
The SRTCP packet, as shown in Figure 6.1, also has its payload encrypted, as for an SRTP
packet, but the major difference between these packets is that the Authentication Tag is
mandatory for SRTCP packets. This prevents Denial-of-Service attacks based on session
tear-down by insertion of a BYE message. Some fields are added to the equivalent RTCP packet:
• SRTCP index – it is used as a sequence counter preventing replay attacks
• SRTCP master key identifier (MKI) – same as for SRTP
• Authentication Tag – mandatory field
• Encryption flag – It is set to 1 if the RTCP body is encrypted
In a communication which has to be encrypted with SRTP, voice is segmented into Packet
Data Units (PDUs) and encapsulated in RTP packets. These packets are then transformed into
SRTP packets by encrypting the RTP payload and possibly adding an MKI field
and an Authentication Tag, which are not compulsory. Then, the SRTP packets are
encapsulated in UDP segments and then in IP packets, as illustrated in Figure 6.1. At the other
communication side, the IP packets are decapsulated in the reverse order.
Similarly, RTCP packets are transformed into SRTCP packets and then encapsulated by a
transport protocol.
For encryption and decryption of media packets, the cipher (i.e. the algorithm that performs
encryption) used by both SRTP and SRTCP is the Advanced Encryption Standard (AES) in
Counter Mode (AES-CTR). SRTP supports a second cipher, called “NULL cipher”, which
allows encryption to be disabled.
How are the payloads of RTP or RTCP packets encrypted? A logical XOR function is applied
between the RTP/RTCP payload and a so-called keystream, which is produced by an
AES keystream generator encrypting a special value called the Initialisation Vector with an
encryption key (for details, refer to [88] and [86]). The result is the encrypted payload. The
advantage of using AES in counter mode is that it minimizes the delay introduced by encryption,
since the keystream can be computed in advance, before the payload to encrypt is available.
Figure 6.2 – Encryption using AES (left) and authentication using HMAC-SHA1 (right) (from [88])
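The counter-mode principle described above, a precomputed keystream combined with the payload
by XOR, can be sketched as follows. Since the Python standard library does not include AES, the
block operation is replaced here by SHA-256 as a stand-in; this is an assumption made purely for
illustration and it is not the keystream generator of RFC 3711. Key, IV and payload values are
hypothetical.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Counter-mode keystream; SHA-256 stands in for the AES block operation (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_payload(payload: bytes, stream: bytes) -> bytes:
    """XOR each payload byte with the corresponding keystream byte."""
    return bytes(p ^ s for p, s in zip(payload, stream))


key, iv = b"k" * 16, b"i" * 16            # hypothetical session key and IV
stream = keystream(key, iv, 160)          # can be computed before the voice frame exists
payload = bytes(range(160))               # stands for one 20 ms G.711 frame (160 bytes)
encrypted = xor_payload(payload, stream)
print(len(encrypted) == len(payload), xor_payload(encrypted, stream) == payload)
```

The sketch shows two properties stated in the text: the encrypted payload has the same size as
the original one, and the only operation left to do when the voice frame arrives is the XOR,
which is also the operation used for decryption.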
AES is used to encrypt RTP messages, that is to say, to ensure confidentiality; however,
authentication and integrity are ensured by another algorithm, HMAC-SHA1 (RFC 2104).
The Hash Message Authentication Code (HMAC) is calculated using the SHA1 hash function
applied to the RTP/RTCP header and payload in combination with a secret key. The size of the
resulting HMAC is the same as that of the output of the underlying hash function (160 bits for
SHA1) but, in SRTP, the HMAC is then truncated to an 80-bit (or 32-bit) string called the
Authentication Tag and appended to the SRTP/SRTCP packet. This truncation is done to reduce
the overhead of SRTP/SRTCP packets.
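The computation of a truncated authentication tag can be sketched with the hmac and hashlib
modules of the Python standard library. This is a simplified sketch: the tag is computed here
over the header and the encrypted payload only, whereas RFC 3711 also includes a rollover
counter in the authenticated data. The key and packet bytes are hypothetical.

```python
import hashlib
import hmac

TAG_LENGTH = 10  # bytes, i.e. the 80-bit tag


def auth_tag(auth_key: bytes, header: bytes, encrypted_payload: bytes) -> bytes:
    """HMAC-SHA1 over header and payload, truncated to its leftmost 80 bits."""
    full = hmac.new(auth_key, header + encrypted_payload, hashlib.sha1).digest()  # 160 bits
    return full[:TAG_LENGTH]


def verify(auth_key: bytes, header: bytes, encrypted_payload: bytes, tag: bytes) -> bool:
    """Recompute the tag on reception and compare in constant time."""
    return hmac.compare_digest(auth_tag(auth_key, header, encrypted_payload), tag)


auth_key = b"a" * 20                       # hypothetical 160-bit authentication key
header, body = b"\x80\x00" + bytes(10), b"encrypted-voice"
tag = auth_tag(auth_key, header, body)
print(len(tag) * 8, verify(auth_key, header, body, tag), verify(auth_key, header, body + b"x", tag))
```

Truncation saves 10 bytes per packet compared with the full 160-bit HMAC, which matters for
voice packets whose payload is often only a few tens of bytes.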
For the AES encryption algorithm and the HMAC-SHA1 authentication algorithm, an
encryption key, an authentication key and a so-called salt key are needed for both SRTP and
SRTCP (6 keys in total for a cryptographic context). How are these keys generated? They are
derived from a single master key, once again by using AES, in a session key derivation process.
However, the major concern is not the generation of keys but the distribution of the master key
to user agents. RFC 3711 does not provide any solution for this issue; this is why SRTP has to
rely on an external key management protocol to securely distribute master keys, such as
Multimedia Internet Keying (MIKEY), ZRTP or SDP Security Descriptions (SDES).
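The idea of deriving the six session keys from one master key can be sketched as follows. The
label values and the default key lengths are those listed in RFC 3711, which also uses a master
salt next to the master key. The pseudo-random function, however, is an assumption: HMAC-SHA256
is used here as a stand-in because AES is not available in the Python standard library, so the
derived values are not those an SRTP implementation would produce.

```python
import hashlib
import hmac

# Purpose -> (label value, key length in bytes), as listed in RFC 3711.
LABELS = {
    "srtp_encryption": (0x00, 16),
    "srtp_authentication": (0x01, 20),
    "srtp_salt": (0x02, 14),
    "srtcp_encryption": (0x03, 16),
    "srtcp_authentication": (0x04, 20),
    "srtcp_salt": (0x05, 14),
}


def derive_session_keys(master_key: bytes, master_salt: bytes) -> dict:
    """Derive one session key per label from the master key and master salt."""
    keys = {}
    for name, (label, length) in LABELS.items():
        # Stand-in PRF (assumption): HMAC-SHA256 instead of the AES counter-mode PRF.
        material = hmac.new(master_key, master_salt + bytes([label]), hashlib.sha256).digest()
        keys[name] = material[:length]
    return keys


session_keys = derive_session_keys(b"\x01" * 16, b"\x02" * 14)   # hypothetical master key and salt
for name, value in session_keys.items():
    print(name, len(value) * 8, "bits")
```

Because every key is computed from the same master key with a different label, only the master
key (and master salt) has to be distributed by the key management protocol.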
MIKEY (Multimedia Internet Keying)
Multimedia Internet Keying (MIKEY) is a key management protocol defined in RFC
3830 [91] in 2004. It is intended for real-time applications and can therefore be used to supply
SRTP with key management. The exchange of keys is carried out in a single round-trip between the
two User Agents which are the end points of the encryption channel. Authentication between
User Agents can be achieved either by the use of pre-shared keys or by the use of digital
signatures. There are three types of key agreement: key transport with a pre-shared key, key
transport with public key encryption, and an authenticated Diffie-Hellman exchange. MIKEY has
the disadvantage of requiring either pre-shared keys or a global end-to-end Public Key
Infrastructure (PKI).
Even if MIKEY provides strong end-to-end security, when compared to SDES for
example, its complexity of implementation is the main handicap to its wide adoption by
manufacturers. Having been standardized earlier, MIKEY had some advantage over SDES
and had already been implemented in VoIP products. However, now that the
latter has also been standardized, MIKEY will certainly be superseded by SDES, which is
preferred by manufacturers. Due to the complexity of MIKEY, this master thesis will not go
into further details about this key management protocol but will describe two simpler protocols.
ZRTP
ZRTP is a key agreement protocol which has not been standardized yet and is still a draft
[92] proposed by P. Zimmermann. It uses the Diffie-Hellman key
exchange and relies neither on a PKI, trust model or certification, nor on SIP signalling
for the key exchange. Instead, key management and agreement are done in a peer-to-peer way
over the RTP stream.
Indeed, ZRTP defines a new RTP header extension. This is made possible by the specification for
RTP (RFC 3550) which allows for the addition of new headers to RTP messages; this addition is
signalled by the Extension (X) field of the RTP messages set to 1 and the header extension is
appended to the RTP header, following the CSRC list if present. This header extension defined by
ZRTP is used for the in-band key exchange inside the RTP stream; this means that ZRTP messages
are transported within the RTP packets forming the media stream to encrypt. The ZRTP exchange
starts at the same time that the first RTP packets are sent. With the Diffie-Hellman exchange,
two ZRTP endpoints can either exchange a new shared secret to generate later a master key and
salt, or discover if they have any shared secrets in common. If they have shared secrets in
common, this exchange allows them to discover how many and agree on an ordering for them.
Figure 6.3 – Establishment of an SRTP session using ZRTP
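The position of an RTP header extension, as described above, can be sketched in a few lines of Python. The layout follows the fixed RTP header of RFC 3550 (12 bytes, followed by the CSRC list, followed by the extension when the X bit is set); the function name, the profile identifier 0x1234 and the sample packet are hypothetical and do not reproduce the actual ZRTP message format.

```python
import struct


def locate_header_extension(packet: bytes):
    """Return (profile_id, extension_bytes) if the X bit is set, else None."""
    if len(packet) < 12:
        raise ValueError("an RTP packet has at least a 12-byte fixed header")
    first_byte = packet[0]
    version = first_byte >> 6                 # V field (2 bits)
    has_extension = bool(first_byte & 0x10)   # X bit
    csrc_count = first_byte & 0x0F            # CC field (4 bits)
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    if not has_extension:
        return None
    offset = 12 + 4 * csrc_count              # the extension follows the CSRC list
    profile_id, length_words = struct.unpack_from("!HH", packet, offset)
    start = offset + 4
    return profile_id, packet[start:start + 4 * length_words]


# Hypothetical packet: V=2, X=1, CC=0, then a one-word extension and a payload.
fixed_header = struct.pack("!BBHII", 0x90, 0, 1, 0, 0xDEADBEEF)
extension = struct.pack("!HH", 0x1234, 1) + b"\x01\x02\x03\x04"
sample_packet = fixed_header + extension + b"payload"
found = locate_header_extension(sample_packet)
```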
In Figure 6.3, the Business Manager’s User Agent initiates a SIP call with the IT Manager’s User
Agent. As soon as the call has been initiated and RTP messages are exchanged, the Business
Manager’s User Agent initiates the Diffie-Hellman key exchange by sending a ZRTP HELLO
message to the IT Manager’s User Agent, which acknowledges it and also sends a ZRTP HELLO
message. The key exchange starts with the sending of a ZRTP COMMIT message. Once keys
have been exchanged, an SRTP session can be established and the media stream is then encrypted
end-to-end. More details about the ZRTP key exchange can be found in [92] and [93].
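The principle of the Diffie-Hellman exchange on which ZRTP relies can be illustrated with a toy Python sketch. The group parameters below (p = 23, g = 5) are an assumption made for readability and are far too small for any real use; the sketch shows neither the ZRTP message formats nor how the shared secret is turned into an SRTP master key and salt.

```python
import secrets

# Toy group parameters, for illustration only (never use such small values).
P = 23  # prime modulus
G = 5   # generator


def dh_keypair():
    """Pick a private exponent x and compute the public value g^x mod p."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P-2]
    public = pow(G, private, P)
    return private, public


def dh_shared_secret(own_private: int, peer_public: int) -> int:
    """Combine the own private exponent with the public value of the peer."""
    return pow(peer_public, own_private, P)


# Each endpoint sends only its public value; the private exponents never leave the phones.
initiator_private, initiator_public = dh_keypair()
responder_private, responder_public = dh_keypair()
secret_initiator = dh_shared_secret(initiator_private, responder_public)
secret_responder = dh_shared_secret(responder_private, initiator_public)
```

Both endpoints end up with the same value although it has never been transmitted, which is why no PKI or signalling-path key transport is needed for the exchange itself.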
ZRTP is a key management protocol with multiple advantages: first, it does not rely on a
PKI or a key certification, which are generally very difficult to configure outside of a privately
managed network; then, it achieves end-to-end security without the intervention of VoIP
proxy/redirect servers and finally, it uses RTP in-band key management which does not
encounter issues with NAT/Firewalls. However, due to its late release, ZRTP has not yet been
widely implemented (only Zfone) and will certainly have difficulty penetrating the market.
SDP Security Descriptions (SDES) key management protocol

SDP Security Descriptions, or SDES, is the external key management protocol favoured by
manufacturers of VoIP components, such as Cisco, Broadcom, Ingate, Snom... For SRTP
communication using SDES, SIP phones and IP-PBXs have to support SRTP with SDES; this
support is nowadays present in many products like pbxnsip, snom phones…
SDES stands for Session Description Protocol Security Descriptions for Media Streams and was
standardized very recently, in July 2006, as RFC 4568 [89]. It is strongly believed to become
the predominant master key distribution protocol in the future, first due to its simplicity in
comparison with MIKEY, which is too complicated to implement, and second, due to its earlier
penetration of the market compared to ZRTP, which has come too late [90].
SDES is very easy to implement and consists in transporting the master key in the SDP of a SIP
INVITE message asking for the initiation of an encrypted call. However, the SIP INVITE
containing the key must itself be protected from confidentiality and integrity attacks by the
methods of signalling encryption described in the next paragraph. SDES has the advantage of
introducing low latency.
Figure 6.4 – INVITE message with master key information (source: [90])
As illustrated in Figure 6.4, the SDP part of the encrypted INVITE message sent by an IP
phone to the local SIP Proxy server contains a new cryptographic attribute (crypto) informing
about the cryptography method to use and the master key, under the following format:
a=crypto:<tag> <crypto-suite> <key-params> [<session-params>]
With: key-params = <key-method> “:” <key-info>
And with: key-method = “inline” and key-info = <key||salt> [“|” lifetime] [“|” MKI “:” length]
The most interesting and most widely used key method is the inline method, which contains the
keying material, i.e. the master key and the salt, and all policy related to that master key,
like its lifetime and whether a master key identifier (MKI) is used to associate an incoming
SRTP packet with a particular master key.
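The structure of the crypto attribute can be made concrete with a short Python sketch which splits such a line into its fields. It is a simplified illustration, not a complete RFC 4568 parser: the function name is hypothetical, only a single inline key is handled, session parameters are returned unparsed, and the split of the keying material into a 16-byte master key and a 14-byte salt assumes an AES_CM_128 crypto-suite. The key in the example is a dummy value.

```python
import base64


def parse_crypto_attribute(line: str) -> dict:
    """Split an SDP crypto attribute line into its main fields (simplified)."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("tag, crypto-suite and key-params are mandatory")
    tag, crypto_suite, key_params = fields[:3]
    key_method, _, key_info = key_params.partition(":")
    if key_method != "inline":
        raise ValueError("only the inline key method is handled here")
    parts = key_info.split("|")
    keying_material = base64.b64decode(parts[0])
    lifetime, mki = None, None
    for part in parts[1:]:
        if ":" in part:                       # MKI ":" length
            value, length = part.split(":")
            mki = (int(value), int(length))
        else:                                 # lifetime, e.g. "2^20"
            lifetime = part
    return {
        "tag": int(tag),
        "crypto_suite": crypto_suite,
        "master_key": keying_material[:16],   # assumes a 128-bit master key
        "master_salt": keying_material[16:],  # remaining bytes (112-bit salt)
        "lifetime": lifetime,
        "mki": mki,
        "session_params": fields[3:],
    }


# Dummy keying material (30 bytes), for illustration only.
dummy_key_and_salt = base64.b64encode(bytes(range(30))).decode("ascii")
example = f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{dummy_key_and_salt}|2^20|1:4"
parsed = parse_crypto_attribute(example)
```

The sketch also makes the weakness of SDES visible: whoever can read the SDP body obtains the master key directly, which is why the signalling must be encrypted.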
RFC 4568 is limited to unicast media streams where each source has a unique
cryptographic key. So, the SIP proxy server answers by giving its master key in an encrypted
200 OK response. Master keys have then been exchanged in a secure way by using SDES. However,
SDES allows only hop-by-hop key distribution and not end-to-end.
The next section describes how SIP messages are encrypted, since this is a prerequisite for
the use of the SDES key exchange protocol.
6.1.3. Encryption of signalling stream

As for media streams, it is possible to encrypt signalling streams to protect them from
eavesdropping and ensure privacy of calls, or to prevent attackers from modifying them, thereby
ensuring their integrity.
However, in contrast to media streams which can be encrypted end-to-end, signalling can
only be encrypted in a hop-by-hop way. This requirement is due to the fact that the signalling
path goes through several SIP proxy servers and redirect servers which have to read and modify
the headers of SIP messages. However, hop-by-hop encryption may encounter some problems
such as the lack of common trust models between SIP proxy/redirect servers located in different
administrative domains or the impossibility for firewalls to do protocol fix-ups, i.e. to modify
elements within encrypted SIP messages, such as private addresses into public addresses [90].
Several security mechanisms for SIP have been defined in its specification RFC 3261. One
of them is an encryption mechanism which sends SIP messages over a Transport Layer Security
(TLS)-encrypted channel.
SIP over TLS

To protect SIP messages from malicious modifications by attackers and to ensure the
confidentiality and privacy of SIP messages, whether or not they transit over public networks,
RFC 3261 has defined a new addressing scheme, sips URIs, which ensure that the Transport Layer Security
(TLS) protocol is used to transport SIP messages in a secure way. TLS provides transport-layer
security over TCP (Transmission Control Protocol). TLS encryption is done hop-by-hop instead of
end-to-end, because intermediate SIP Proxy servers have to read and modify the headers of
incoming SIP messages: SIP Proxy servers must be able to read the Request-URI, Route, and Via
fields to properly perform routing and must be able to add additional Via values in SIP headers.
Therefore, SIP Proxy servers must be trusted to some extent by SIP User Agents by
authenticating mutually with them, and SIP Proxies have to establish between them a relationship
of trust by authenticating against one another, forming that way a transitive chain of trust.
According to RFC 3261, SIP Proxy, Redirect and Registration servers must implement TLS and
must support both one-way and mutual authentication, in contrast with SIP User Agents which
are only strongly recommended to implement TLS. However, most of the SIP phones on the
market support TLS. All SIP entities which support TLS must implement the SIPS scheme.
The Secure SIP (SIPS) addressing scheme is similar to the SIP addressing scheme:
sips:user@host:port;uri-parameters?headers
Examples of sips URIs are sips:it_manager@site1.com, sips:it_manager@192.168.3.4:5060…
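The addressing scheme can be illustrated with a small Python sketch which splits a SIP or SIPS URI into its components and reports whether the secure scheme is requested. The regular expression and the function name are assumptions made for illustration; the sketch is deliberately simplified (no IPv6 literals, no escaping rules) and is not a full RFC 3261 URI parser. When no port is given, it falls back to 5060 for sip and 5061 for sips.

```python
import re

SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^@;?]+)@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<params>(?:;[^?]*)?)"
    r"(?:\?(?P<headers>.*))?$",
    re.IGNORECASE,
)

DEFAULT_PORTS = {"sip": 5060, "sips": 5061}


def parse_sip_uri(uri: str) -> dict:
    """Split a sip: or sips: URI into scheme, user, host, port and parameters."""
    match = SIP_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not a SIP or SIPS URI: {uri!r}")
    scheme = match.group("scheme").lower()
    port = match.group("port")
    return {
        "scheme": scheme,
        "secure": scheme == "sips",  # a sips URI requests TLS transport
        "user": match.group("user"),
        "host": match.group("host"),
        "port": int(port) if port else DEFAULT_PORTS[scheme],
        "params": [p for p in match.group("params").split(";") if p],
    }


first = parse_sip_uri("sips:it_manager@site1.com")
second = parse_sip_uri("sips:it_manager@192.168.3.4:5060")
```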
When callers want to make secure calls, they have to dial sips URIs. By doing so, their SIP
User Agent Client (UAC) contacts the local SIP Proxy server by initiating a TLS connection over
which SIP messages will be sent. The SIP Proxy server responds with a public certificate which is
validated by the SIP User Agent on receipt. In the next step, the SIP User Agent and the SIP
Proxy server exchange session keys to encrypt and decrypt data for a particular session. Then, the
SIP Proxy server contacts the next hop and negotiates in the same way a TLS session.
TLS does not allow SIP User Agents to authenticate against SIP Proxy/redirect servers if
they do not form a direct TCP connection with them.
The disadvantage of the use of TLS is that it requires a PKI and that it is based on the
underlying TCP transport protocol which is connection-oriented.
SIP over IPsec
TLS is not the only encryption method recommended by RFC 3261 to be used for the
protection of SIP signalling streams: IPsec is another alternative and can be used to provide
encryption for SIP messages at the network layer (OSI Layer 3), in contrast with TLS which
provides security at the transport layer (OSI Layer 4). With IPsec encryption, SIP messages can
be transported by UDP or TCP protocols.
Since the encryption of signalling requires hop-by-hop encryption, IPsec ESP
(Encapsulating Security Payload) and AH (Authentication Header) must be applied hop-by-hop. The
IPsec protocol relies on the Internet Key Exchange (IKE) key management protocol for key exchange
and negotiation of Security Associations (SAs). IKE is based on asymmetric cryptography;
this is not always appropriate since SIP phones are limited in resources and
processing capacity.
SIP and S/MIME

S/MIME stands for Secure/Multipurpose Internet Mail Extensions and has been
defined by RFC 2633. It provides a way to send and receive secure MIME data and ensures
integrity, authentication and non-repudiation by using digital signatures and privacy and
confidentiality by using encryption.
Since SIP messages can carry MIME bodies, SIP can make use of the already existing
security mechanisms for protecting the integrity and confidentiality of MIME contents. S/MIME
can be used to secure MIME bodies in a hop-by-hop way or in an end-to-end way. Secured
MIME contents can be of multipart/signed or application/pkcs7-mime types. End users are identified
by X.509 certificates on the basis of their email addresses which are part of SIP URIs.
This solution has the major disadvantage that it requires the deployment of a global
S/MIME PKI but there is no centralized authority which delivers certificates for end-to-end
applications at a global scale. Another disadvantage is that S/MIME generates considerable
overhead in SIP messages and does not protect the integrity and confidentiality of the entire SIP
messages.
6.1.4. Summary
To summarize, encryption is a security mechanism which, when applied to VoIP, can
considerably affect its quality of service, depending on the complexity of its implementation
(need for a PKI or not, certificates or not…) and on the overhead it adds to encrypted packets,
whether they are signalling or media packets.
For the encryption of signalling streams, S/MIME and IPsec seem to be too complicated
to implement and too expensive in the overhead they introduce, which can result in additional
latency in the signalling path. Even if TLS has the disadvantage of being based on the
connection-oriented TCP, it seems to be much better from the implementation point of view, in
particular in enterprises. As for the encryption of media streams, SRTP seems to be the
prevalent secure protocol to be used.
The controversy is about which key management protocol to use: MIKEY, ZRTP, SDES… In
spite of its flaws, but favoured by its great simplicity, SDES seems to be preferred by
manufacturers [94], while other solutions easy to implement are researched; for example, W.
Shao, in his paper [90], supports the idea to develop an identity-based key exchange protocol –
maybe it is already too late… Further research has been carried out to look for hybrid solutions
combining several protocols (at least [87] and [95]).
6.2. SIP authentication mechanisms
SIP-based VoIP uses several authentication mechanisms for verifying the identity of User
Agents placing or receiving calls. In SIP architectures, authentication can take place between User
Agents and SIP Proxy Servers, SIP Redirect Servers, SIP Registration Servers or other User
Agents. SIP Proxy/Redirect servers require SIP User Agents to authenticate themselves before
processing SIP INVITE messages while SIP Registration Servers may require the same
before accepting SIP REGISTER messages. User Agents can also ask SIP servers to authenticate
themselves: in that case, there is mutual authentication. Authentication can be required for
registration, call initiation, call modification or call tear-down and prevents registration
hijacking attacks, unauthorized access to services…
For authentication purposes, RFC 3261 defines specific headers in SIP messages like the
Authorization header which contains authentication credentials of a User agent, the Proxy-
Authenticate header which contains an authentication challenge and the Proxy-Authorization
header which allows a client to identify itself to a proxy server which requires authentication.
The main authentication scheme set by RFC 3261 [34] is the HTTP Digest Authentication
mechanism (RFC 2617). This mechanism provides only message authentication and replay
protection (no message integrity or confidentiality). An exhaustive study of HTTP Digest
authentication for SIP can be found in [111], as well as a suggestion for an extension. The HTTP
Digest Authentication is a challenge-based scheme using a nonce value. A valid response to the
challenge contains the Message Digest 5 (MD5) checksum of the username, the password, the
given nonce value, the HTTP method and the requested URI. As shown in Figure 6.5, in the case
of a SIP Registration, a User Agent sends a SIP REGISTER request with no credentials in the
Authorization header field. The SIP Registrar which requires authentication then sends a 401
Unauthorized response including the WWW-Authenticate header whose field includes a challenge
giving a nonce and the authentication scheme (Digest). The Business Manager is challenged to
give his username and password and his User Agent Client computes an MD5 checksum which is sent
as response in the Authorization header field of the second REGISTER message. Upon receipt, the
SIP Registrar performs the same digest operation performed by the User Agent Client and compares
the result with the given digest value in response. The process is almost the same between a SIP
User Agent Client and a SIP Proxy server, with the difference that the SIP Proxy Server sends
back a 407 Proxy Authentication Required response code instead of 401 and the WWW-Authenticate
challenge is replaced by a Proxy-Authenticate challenge.
Figure 6.5 – Registration with Digest Authentication
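The digest computation performed by the User Agent Client and repeated by the SIP Registrar can be sketched in Python as follows. The sketch follows the basic form of RFC 2617 without the optional qop parameter; the realm, which also enters the computation according to RFC 2617, is added to the inputs listed above. All sample values (username, realm, password, nonce, URI) and the function names are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """MD5 checksum of a string, as a lowercase hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri) -> str:
    """Response value for a Digest challenge without the qop parameter."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # credentials part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def registrar_accepts(stored_password, received, username, realm, nonce, method, uri) -> bool:
    """The registrar repeats the computation with the stored password and compares."""
    expected = digest_response(username, realm, stored_password, nonce, method, uri)
    return hmac.compare_digest(expected, received)


# Hypothetical values, for illustration only.
nonce = "84f1c1ae6cbe5ua9c8e88dfa3ecm3459"  # taken from the 401 challenge
response = digest_response(
    "business_manager", "site1.com", "secret", nonce, "REGISTER", "sip:site1.com"
)
```

Since the nonce and the response travel in clear text, an eavesdropper holding both can test candidate passwords offline with exactly the same function, which illustrates the vulnerability to dictionary-style attacks.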
HTTP Digest authentication is a weak mechanism which is vulnerable to Man-in-the-Middle
attacks due to the fact that the challenge and the response are in clear text and can be intercepted
by attackers. Besides, attackers can masquerade since there is no strong correlation between
usernames and SIP URIs. Additionally, it is considerably vulnerable to dictionary-style attacks.
For this reason, stronger authentication mechanisms should be used. Many suggestions have
been made and here is a brief overview of these authentication mechanisms:
• S/MIME [34]
• EAP (Extensible Authentication Protocol) authentication for SIP [114] & [115] (see Figure
6.6)
• SIP authentication using CHAP (Challenge Handshake Authentication Protocol) password –
mechanism to allow authentication of users by using backend RADIUS servers [113]
• DIAMETER SIP application [117]
• Authentication, Authorization and Accounting (AAA) for SIP [116] (see Figure 6.6)
• SIP authentication with Elliptic Curve Diffie-Hellman (ECDH) key exchange [112]
Figure 6.6 – EAP authentication and AAA
In order to authenticate and authorize users, SIP servers can communicate with AAA servers
rather than attempt to store user credentials and profiles locally and perform authorization and
accounting. SIP servers can communicate via a SIP AAA interface to access the AAA server (see
Figure 6.6). RFC 3702 [116] provides recommendations for this interface between SIP servers
and AAA servers. However, this interface is a new potential source of vulnerabilities and
messages between the SIP server and the AAA server can be subject to eavesdropping or
interception and modification.
6.3. Solutions to SIP NAT and firewall traversal issues
6.3.1. NAT and Firewall traversal issues

The NAT traversal issues
NAT (Network Address Translation) is a process enabling the translation of IP
addresses and port numbers within private address ranges into public IP addresses and port
numbers, as IP packets pass through a firewall from a private to a public network. The PAT
(Port Address Translation) is closely associated with the NAT and performs the translation of
private port numbers into public ones and vice versa (for detailed NAT description, see [104]).
In an enterprise environment, NAT is implemented on the perimeter firewall or in the
internal firewall if there is a Demilitarized Zone (DMZ), and allows enterprises not only to hide
the internal topology of their networks but also to use a limited set of public addresses shared by
all the devices within the enterprise. There are four types of NAT defined by the STUN
protocol (RFC 3489) according to the way UDP is treated by the NAT; the most widespread of
them in enterprise networks is the symmetric NAT, which means that for connections allowed
through the firewall, internal IP address/port pairs are converted, for each destination, into
distinct external IP address/port pairs.
However, the NAT traversal is quite problematic for signalling and media streams for, at
least, the following reasons:
1) The main problem introduced by NAT in VoIP/SIP communications is due to the inability
of NAT to detect and translate private IP addresses embedded in application-level messages
like SIP messages because it operates at the network layer. Indeed, NAT cannot translate
private addresses included in communication parameters used by signalling and media into
public addresses. These private addresses can be:
• Private IP addresses mentioned in the headers of SIP messages (Via, From, Contact fields)
• Private IP address values in the SDP part of SIP messages (ex: c=IN IP4 192.168.0.2)
This means that certain communication parameters included in SIP messages cannot be used by
the receiving party. When an internal User Agent sends a SIP INVITE message to an external User
Agent (see Figure 6.7), the NAT installed on the firewall fails to translate the source IP
address which is private and embedded in this message; when the external User Agent sends a SIP
response destined to a SIP URI containing a private IP address, the message will never arrive
at its destination and the communication will not be established.
Figure 6.7 – Private addresses embedded in INVITE message
2) Break in the pair relationship of RTP and RTCP port numbers. After the translation
of private port numbers for RTP and RTCP streams, it is very probable that the respective public
port numbers will no longer be consecutive, as they should be according to the specification of
the RTP protocol (RTCP port n° = RTP port n° + 1). This is due to the fact that NAT assigns
port numbers at random.
3) Encryption. In the case of encrypted signalling, new problems emerge, like, for
example, the classical problems between IPsec and the NAT.
4) Limited duration of NAT associations. NAT binds public IP addresses and ports to
private ones only for a limited period of time, after which bindings are deleted if there is no
traffic. Because of that, if there is, in a VoIP call, a silence period longer than this period, the call
is interrupted because the NAT binding has been deleted.
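The first of these problems, private addresses embedded in SIP headers and in the SDP body, can be made visible with a short Python sketch which scans a message for RFC 1918 addresses. The function names and the sample INVITE are hypothetical; the scan is a plain text search on IPv4 literals and not a SIP parser.

```python
import ipaddress
import re

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def field_name(line: str) -> str:
    """SDP lines look like 'c=...'; SIP header lines look like 'Via: ...'."""
    if len(line) > 1 and line[1] == "=":
        return line[0]
    return line.split(":", 1)[0].strip()


def private_addresses(sip_message: str) -> list:
    """Return (field, address) for every private IPv4 address in the message."""
    findings = []
    for line in sip_message.splitlines():
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. an octet larger than 255
            if any(address in network for network in PRIVATE_NETWORKS):
                findings.append((field_name(line), candidate))
    return findings


# Hypothetical INVITE sent by an internal User Agent.
invite = "\r\n".join([
    "INVITE sip:it_manager@site2.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.0.2:5060;branch=z9hG4bK776asdhds",
    "From: <sip:business_manager@site1.com>;tag=1928301774",
    "To: <sip:it_manager@site2.com>",
    "Contact: <sip:business_manager@192.168.0.2:5060>",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "o=- 1 1 IN IP4 192.168.0.2",
    "c=IN IP4 192.168.0.2",
    "m=audio 49170 RTP/AVP 0",
])
findings = private_addresses(invite)
```

A NAT operating at the network layer rewrites only the IP and UDP headers of the packet, so every address reported by this scan would reach the external User Agent unchanged.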
The Firewall traversal issues

Firewalls can be defined as perimeter devices or software components (usually installed
on perimeter devices such as routers) used to monitor and control traffic between private
networks and public untrusted networks. The control of traffic is made according to rule-based
policies determining which IP packets are allowed to enter the private networks and which ones
should be blocked. Firewalls are fundamental components in any network and in particular in
enterprises where they serve to protect the enterprise connection to the Internet from attacks
coming from this untrusted network.
However, traditional data firewalls pose several problems with SIP communications. The
most important ones are:
1) Separate signalling and media streams. In VoIP using SIP, signalling and media streams are
separate. The media is carried over multiple dynamically assigned UDP ports while signalling is
exchanged on port 5060. For firewalls, there is no relationship between a signalling stream and
the media flow it generates.
2) Incoming calls: problems arise from the fact that traditional data firewalls cannot handle
dynamic UDP ports and in order to accept incoming calls originating from the Internet, they
have to leave a wide range of UDP ports open, which renders private networks more vulnerable
to attacks.
3) QoS: firewalls can significantly degrade the quality of VoIP by introducing latency and jitter in
the signalling traffic as well as in the media traffic. The introduction of latency and jitter is
mainly due to high congestion at the firewall, which processes and filters multiple packets
without making any distinction of priority. Besides, firewalls may also remove QoS markings on
packets indicating high priorities [106]. In these cases, VoIP packets are treated equally to data
packets, thereby neglecting the real-time nature of VoIP.
4) Lack of application awareness. Traditional data firewalls cannot control SIP messages at the
application layer and therefore, they cannot detect application-level attacks like DoS, SPIT,
registration hijacking from the Internet…
6.3.2. Solutions to the Firewall/NAT traversal issues

The analytical study of solutions to the SIP NAT and Firewall traversal issues is out of the
scope of this master thesis, and since they have already been exhaustively addressed in the
literature (refer to [103], [58], [105] and [59] for more details about the following), this paragraph
will present a simple overview of solutions.
• Application Level Gateways (ALGs): An ALG is software embedded on a firewall
performing NAT or not. It allows dynamic configuration based on application
information. When a firewall has ALG software installed on it, it can act at an application
level, not only at a network level, by reading SIP messages, by understanding the
relationship between SIP signalling and the resulting media streams and by dynamically
opening and closing UDP ports. If the firewall performs NAT, ALG can read SIP
messages and in conjunction with NAT replace private IP addresses and port numbers
with public ones for outgoing calls and vice versa for incoming calls. ALG can also
replace in outgoing SIP messages, the private IP addresses by its own IP address in order
to “represent” externally the internal telephone terminal. The incoming traffic would then
be destined to the ALG which would then change public IP addresses and port numbers
into private ones. The main issue with ALG is that it can provoke an increase in the
latency of VoIP packets since it introduces a new processing time and can seriously affect
the performance of the firewall because it is embedded in the firewall itself. However,
ALGs remain the simplest and safest solution to allow secure external VoIP calls.
• IETF MIDCOM solution. This is a solution to the NAT traversal issue which separates
the ALG from the firewall (also called Middlebox) in order to avoid the decrease in the
performance of the firewall. The MIDCOM solution is called also decomposed
middleboxes solution [108]; the decomposed middleboxes architecture separates the
ALG from the firewall/NAT and introduces a new protocol of communication between
them, the MIDCOM protocol (still underway). The ALG can be installed on a SIP Proxy
server to be on the signalling path of VoIP calls. That way, it can parse SIP messages and,
accordingly, inform the firewall by using the MIDCOM protocol to open or close ports.
The MIDCOM solution is a project undertaken and conducted by the Middlebox
Communications Working Group of the IETF which has already published several RFCs,
among which the most important are RFC 3303, RFC 3304 and the recent RFC 3989.
The middlebox solution, although better than the ALG solution, has not been adopted
and implemented yet in the market.
• STUN. The Simple Traversal of UDP through Network Address Translators (STUN) is
a protocol defined in RFC 3489 which enables a SIP User Agent to discover whether it is
behind a NAT, to determine the type of NAT and to find out its public address. It
requires that IP phones or other terminals have STUN clients on them and it introduces a
new type of device, a STUN server, generally attached to the public Internet and receiving
and responding to STUN requests from STUN clients. In their responses, STUN servers
can inform STUN-enabled SIP User Agents located in private networks about the public
IP addresses and port numbers from which packets come. STUN servers do not sit in the
signalling or media flows. The main drawbacks of the STUN protocol are that it does not
provide any solution for SIP signalling based on TCP and it does not work with
symmetric NATs which are most commonly used by enterprises. Besides, STUN requires
STUN clients be installed on IP phones or other terminals, constraining enterprises to
change their pool of terminals. For these reasons, STUN is definitely not popular in
enterprises. Additionally, STUN introduces new security issues, since attackers could
intercept and manipulate STUN responses.
• TURN. This is a mechanism which has been designed to solve the media traversal issue
for symmetric NATs. TURN (Traversal Using Relay NAT) is a protocol allowing SIP
User Agents behind a NAT to receive SIP messages over TCP or UDP connections. It
introduces a TURN server located either in a corporate DMZ or in the Service Provider
network. Like STUN, it has the disadvantage of requiring the replacement or the upgrade of
the pool of terminals in a company. Generally, TURN and STUN alike are not widely
supported by vendors.
• ICE. ICE (Interactive Connectivity Establishment) is an IETF draft [109] which provides
a framework to combine several solutions like STUN, TURN and Realm Specific IP (RSIP)
to solve the NAT traversal issues. ICE is a complex solution which relies on
client/server-based protocols and which removes control from enterprises. However,
ICE is considered a viable and acceptable solution and Microsoft and Cisco
announced in 2005 their intention to adopt it in their products.
• UPnP (Universal Plug and Play). UPnP is a set of network protocols and is not
exclusively a VoIP solution. UPnP is becoming increasingly popular in home office or
residential environments and is limited to small installations; it is therefore not suited to
medium to large enterprises. If VoIP terminals are UPnP-enabled and want to initiate a
call, they can request the NAT supporting UPnP to give them back their public IP
address and port number to insert them in the VoIP signalling and media packets they
want to send. This ensures that VoIP packets contain routable public IP addresses and
port numbers. The disadvantage of UPnP is that VoIP terminals and NATs must support
UPnP and that, today, only a few VoIP terminals, NATs and firewalls do support it.
Besides, this solution does not satisfactorily solve the firewall problem: with UPnP, UPnP
clients dynamically control the opening of pinholes in the NAT to the public network and this
capability is likely to be contrary to most security policies.
• Session Border Controller solution. There is no exact definition for Session Border
Controllers; this is much more a commercial term which encompasses a wide variety of
technical proprietary solutions. However, these solutions have in common that they
usually act as B2BUAs located at the edge of networks, whether they are corporate
networks or provider networks. SBCs can be distinguished between SBCs at the Service
Provider Edge (out of the scope of this report) and Customer premise SBCs which are
adopted by enterprises. Customer premise SBCs control the SIP traffic to and from a
private enterprise network without involving the existing firewall. SBCs sit in the VoIP
signalling and media paths and act as B2BUAs; this means that all VoIP traffic passes
through them and that they act as if they were the called IP phones and re-initiate calls
destined to the called parties (see [103]). However, SBCs are quite controversial insofar
as they break any end-to-end confidentiality and integrity, in particular in the media
streams, and can introduce new security issues. Besides, end-to-end encryption for media
cannot be used unless SBCs have the appropriate keys; however, even with encrypted
VoIP streams SBCs can control certain parts of message headers.
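The address rewriting performed by an ALG on outgoing SIP messages, as described in the first item of the list above, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name, the addresses and the port mapping are hypothetical, the rewriting is done by text substitution rather than by a SIP parser, and a real ALG would also have to adjust the Content-Length header and open the corresponding UDP ports on the firewall.

```python
import re


def alg_rewrite_outgoing(sip_message: str, private_ip: str, public_ip: str,
                         port_map: dict) -> str:
    """Replace a private address and its mapped ports with public ones."""
    ip = r"(?<![\d.])" + re.escape(private_ip)

    def map_port(port: str) -> str:
        return str(port_map.get(int(port), int(port)))

    rewritten = []
    for line in sip_message.splitlines():
        # "address:port" pairs first, so that the port is translated with the address
        line = re.sub(ip + r":(\d+)",
                      lambda m: f"{public_ip}:{map_port(m.group(1))}", line)
        # then bare addresses, e.g. in the SDP connection line
        line = re.sub(ip + r"(?![\d.])", public_ip, line)
        if line.startswith("m="):  # SDP media line carries the RTP port
            fields = line.split(" ")
            if len(fields) > 1 and fields[1].isdigit():
                fields[1] = map_port(fields[1])
            line = " ".join(fields)
        rewritten.append(line)
    return "\r\n".join(rewritten)


# Hypothetical outgoing message and NAT bindings.
outgoing = "\r\n".join([
    "INVITE sip:it_manager@site2.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.0.2:5060",
    "Contact: <sip:business_manager@192.168.0.2:5060>",
    "",
    "c=IN IP4 192.168.0.2",
    "m=audio 49170 RTP/AVP 0",
])
bindings = {5060: 40000, 49170: 40002}
translated = alg_rewrite_outgoing(outgoing, "192.168.0.2", "198.51.100.7", bindings)
```

The sketch also shows why encrypted signalling is problematic for ALGs: the substitution requires the message to be readable by the firewall.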
To conclude, several solutions have been suggested to alleviate the NAT/Firewall traversal
issues; however, protocols like STUN cannot be used alone in an enterprise network because they
are incompatible with the widespread symmetric NATs. SIP ALG-based firewalls, using the ALG
mechanism, constitute nowadays the majority of VoIP-specific firewalls for SIP but they do not
provide full protection and flexible functionality for enterprises. Today, the most viable solutions
for enterprises are the adoption of SBCs (ex: Newport Networks 1460 SBC…) or the adoption
of SIP proxy-based firewalls like the Ingate Firewall 1180 with SIParator 18 which is the only
solution of this type in the market. SIP Proxy-based firewalls are firewalls which have a SIP
Proxy server and a SIP Registrar that dynamically control the firewalls.
Enterprises willing to protect their VoIP network and find a solution to the NAT/firewall
traversal issues should generally consider installing a new, VoIP-specific type of firewall to
replace their traditional data firewalls. The requirements of VoIP-specific firewalls can be found
in an article by Mark Collier [106].
6.4. SPIT prevention

As seen in the previous chapter, SPIT or Spam over Internet Telephony is usually an attack
coming from outside a corporate network and even if real SPIT attacks have not yet been
reported, by analogy with email spam, it is thought to become one of the major concerns of
enterprises in the future. Indeed, call spam can be defined as unsolicited and unwanted messages
for establishing voice sessions. Call spamming can take two forms: single unsolicited and
unwanted calls, for marketing purposes for example, or massive number of calls, certainly
automatically generated by several coordinated attacking computers. Single voice spams can be
calls placed by telemarketers, automated calls with advertisement messages or automatically
generated calls with nobody talking. The threat for enterprises is that with continuous spamming
attacks, the phones will keep on ringing, thus disturbing employees, exhausting their patience and
nerves and reducing their productivity, or another threat is that voicemail service is unavailable if
spammers attack massively at night for example by overfilling voicemail boxes. As represented in
Figure 6.8, spam calls generated by an external attacker not only disturb employees of the
enterprise but also enter in competition with good calls which can be lost or delayed while
employees are busy with call spams.
Due to the difficulty of detecting call spam in real time, no real solution has yet been found
for protecting against SPIT attacks; only prevention solutions exist. Some first basic prevention
solutions have been suggested, without having been developed, by J. Rosenberg in [101]. They are
generally inspired by solutions to email spamming, and some of them are succinctly
listed below:
• Content Filtering – It is one of the most successful solutions for email spam prevention;
however, it is useless for call spamming, because called parties must first pick up
the phone before the content is sent and can be filtered; so, disturbance is not avoided and
spamming not prevented.
• Black Lists – Spam filters maintain lists of the identities to be rejected; however, this
approach applied to call spam is almost as useless as it is when applied to email spam, due
to the fact that it is easy to forge a SIP address.
• White Lists – Spam filters maintain lists of identities allowed to call; this is a solution
preventing spam but it has the major disadvantage that callers who are not explicitly
mentioned in the white list cannot successfully place their calls. For an enterprise, this
solution, used alone, is unacceptable.
• Consent-Based Communications – A consent-based solution is used in conjunction with
white and black lists. When callers call employees of an enterprise, for example, for the
first time and are not on the white or black lists of these employees, their call
is first rejected and the called parties are notified of the call attempt. The called parties then have to
put the callers in the black or white lists. However, this solution is not well adapted to
VoIP, since it can provoke the impatience of callers and the impatience of callees who are
bothered with consent requests instead of calls.
Figure 6.8 – VoIP spamming against an enterprise’s personnel
• Reputation Systems – A reputation-based solution is a mechanism which builds trust
between users within a SIP community and prevents attackers from carrying out attacks.
In a corporate environment, this solution could be adopted by enterprises to create a
social network based on business relationships between companies.
• Address Obfuscation – In order to protect SIP URIs which appear publicly on web sites
from being collected and added to spam lists by spammers, address obfuscation is used to
make SIP URIs undetectable by address gatherers (spam bots) but readable by humans, like
sip: it_man at site1 dot com. This is not a very effective solution to SPIT.
• Turing Tests – These are tests used to detect whether calls have been
generated by machines or by humans. When receiving incoming calls, IP-PBXs perform
Turing tests, i.e. they challenge the caller to respond to a question that can be answered
only by a human and not by a machine. When combined with behaviour learning, Turing
tests can be used selectively rather than for every call.
• Circles of Trust – Creation of relationships of trust between enterprises, with penalties in
case of detection of spamming coming from one of the members of this relationship.
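The way white lists, black lists and consent requests can work together is sketched below in Python. This is only an illustrative sketch: the class `CallScreen`, its method names and the example SIP URIs are assumptions made for this example and are not taken from [101]. It also assumes that the caller identity has already been authenticated, since lists are of little use if SIP addresses can be forged.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"            # let the phone ring
    REJECT = "reject"            # drop the call silently
    ASK_CONSENT = "ask_consent"  # reject this attempt and notify the callee


@dataclass
class CallScreen:
    """Per-callee screening state (hypothetical structure, for illustration only)."""
    white_list: set = field(default_factory=set)
    black_list: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def screen(self, caller_uri: str) -> Decision:
        caller = caller_uri.strip().lower()
        if caller in self.black_list:
            return Decision.REJECT
        if caller in self.white_list:
            return Decision.ACCEPT
        # Unknown caller: the first attempt is rejected and the callee is asked.
        self.pending.add(caller)
        return Decision.ASK_CONSENT

    def record_consent(self, caller_uri: str, allowed: bool) -> None:
        """The callee classifies a pending caller into the white or black list."""
        caller = caller_uri.strip().lower()
        self.pending.discard(caller)
        (self.white_list if allowed else self.black_list).add(caller)


# Example run with made-up addresses.
employee = CallScreen(white_list={"sip:partner@site2.example"})
known = employee.screen("sip:partner@site2.example")
first_attempt = employee.screen("sip:unknown@example.org")
employee.record_consent("sip:unknown@example.org", allowed=True)
second_attempt = employee.screen("sip:unknown@example.org")
```

In this sketch the unknown caller is rejected once and accepted on the next attempt, which illustrates the drawback noted in the list: both parties pay for the consent step with delay.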
The solutions mentioned above are technical solutions; they do not efficiently prevent SPIT
unless they are combined. They usually rely on feedback from called parties after
the call. It is also thought that SPIT could be prevented by legal/regulatory
measures (prohibition of spamming and legal penalties); nevertheless, the problem remains
unresolved if legislation prohibiting SPIT is adopted only at a national level and not at a
global scale, since SPIT attacks could originate from countries where SPIT is not prohibited
and target countries where it is. Such a global legal framework is currently lacking.
Another way to prevent SPIT relies on the economic factor: by rendering massive spamming
prohibitively expensive, SPIT could be contained. One method to achieve this is payment at
risk, which means that callers put money in escrow that the callee refunds if the call is not
spam. For example, when an unknown caller calls an employee of an enterprise for the first time,
he must transfer a small amount of money to the employee’s account before the employee
picks up the phone. If the employee decides that the call is not spam, he refunds the money
to the caller; otherwise he does not. After this first call, payment at risk is no longer
necessary, since the caller’s identity has been classified in a black or white list. Besides, SIP
providers can also prevent the propagation of SPIT in their domains by creating inter-domain
relationships with other SIP providers and by establishing agreements which stipulate the
charging of SIP exchanges.
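The payment-at-risk exchange can be illustrated with a small ledger, sketched here in Python. The class `PaymentAtRiskLedger`, the deposit of 0.10 and the account names are hypothetical and only serve to make the sequence of steps concrete; no real payment mechanism is modelled.

```python
from decimal import Decimal


class PaymentAtRiskLedger:
    """Toy ledger for one callee (illustrative assumption, not a real payment system)."""

    def __init__(self, deposit: Decimal):
        self.deposit = deposit
        self.caller_balances = {}          # caller URI -> available money
        self.callee_balance = Decimal("0")
        self.escrow = {}                   # caller URI -> money held during the call
        self.white_list = set()
        self.black_list = set()

    def place_call(self, caller: str) -> bool:
        """Return True if the call is allowed to ring the callee's phone."""
        if caller in self.black_list:
            return False
        if caller in self.white_list:
            return True                    # already classified: no deposit needed
        if self.caller_balances.get(caller, Decimal("0")) < self.deposit:
            return False                   # unknown caller who cannot pay the deposit
        self.caller_balances[caller] -= self.deposit
        self.escrow[caller] = self.deposit
        return True

    def classify(self, caller: str, is_spam: bool) -> None:
        """After the first call, the callee refunds or keeps the deposit."""
        amount = self.escrow.pop(caller, Decimal("0"))
        if is_spam:
            self.black_list.add(caller)
            self.callee_balance += amount
        else:
            self.white_list.add(caller)
            self.caller_balances[caller] += amount


ledger = PaymentAtRiskLedger(deposit=Decimal("0.10"))
ledger.caller_balances["sip:customer@example.org"] = Decimal("1.00")
ledger.caller_balances["sip:spammer@example.net"] = Decimal("1.00")

customer_rang = ledger.place_call("sip:customer@example.org")
ledger.classify("sip:customer@example.org", is_spam=False)   # deposit refunded

spammer_rang = ledger.place_call("sip:spammer@example.net")
ledger.classify("sip:spammer@example.net", is_spam=True)     # deposit kept
spammer_retry = ledger.place_call("sip:spammer@example.net")
```

A single spam call costs the spammer only the deposit, so the deterrent effect comes from the volume of calls a spammer has to place.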
Nowadays, aware of the urgency of finding a viable solution and implementation for SPIT
prevention, manufacturers and researchers are increasingly investigating new ways of resolving
the SPIT problem before it becomes a reality. Some of the most recent and interesting
solutions are the following:
• SPIT prevention system at the NEC Laboratories [100]: Prevention system software has
been developed to perform a multi-stage filtering of calls. The prototype created has
been integrated in a SIP Express Router (SER); when incoming calls are received by
the SER, they are treated by the SPIT prevention application to detect if they are
spams or not.
• Creation of a network-level anti-spit entity [97]: An anti-spit entity is used to mitigate call
spams but instead of being installed on a SIP Proxy server, it is preferably installed
on a network traffic analyzer connected to the network to protect. The traffic
analyzer detects VoIP traffic and then forwards it to the anti-spit entity. As for the
previously presented solution, the anti-spit entity performs a multi-stage filtering
according to particular detection algorithms.
• SPIT prevention system with incorporation of active fingerprinting [98]: This solution consists
of the implementation of a firewall detecting SPIT. For SPIT detection,
fingerprinting techniques are used.
• SPIT-AL project [99]: It has developed an open-source SPIT filter which is effective,
legally compliant and which ensures privacy. The adopted approach combines
several filtering algorithms, black and white lists and uses several aspects of
reputation systems to implement a reachability management system.
• Reputation-based solution by Y. Rebahi and D. Sisalem [96]: this solution suggests a
mechanism which uses the reputation concept for building trust between members
of a SIP community and which is based on reputation ratings. A metric for
computing reputation is defined and an algorithm of evaluation of users is presented.
In this solution a new entity called Reputation Network Manager (RNM) has been
introduced to build the SIP social network and to compute reputation rate. When a
SIP server receives invitation messages, it first forwards them to the RNM in order
to determine if they are unsolicited or not.
SPIT prevention is an ever-growing field of research, given the prospect of the future
detrimental consequences of SPIT. Currently, there is no perfect solution, only flexible ways to
mitigate the problem. Even if SPIT is not yet a major concern for enterprises, it will certainly
become one; at that moment in the (near) future, enterprises will have no other choice but to
protect their VoIP systems by implementing some SPIT prevention system, which will very
probably build on one of the solutions mentioned above.
6.5. VoIP VPNs: secure interconnection of distant VoIP systems

Definition
In the case of a multi-site enterprise, it is necessary to interconnect in a secure way the
several VoIP systems of each distant branch as well as to ensure an adequate level of quality of
service (QoS) for VoIP communication. The major solution to resolve this problem is to create
VoIP VPNs interconnecting remote VoIP systems.
A VoIP VPN can be defined as an enhanced managed data VPN service offered to
enterprises. VoIP VPNs can be IP-based VPNs which have been enhanced to be able to carry
voice traffic with a high quality of service along with data traffic at the same time. In that case,
there are some mechanisms to prioritize voice traffic over data traffic. But, VoIP VPNs can also
be IP-based VPNs which are dedicated to voice and transport only voice traffic. Enterprises
which have large volumes of VoIP calls between their sites and want to have an optimal VoIP
QoS can opt for this second solution: to have one IP-based VPN dedicated to data and one for
voice only. VoIP VPNs have to be distinguished from voice VPNs, which have existed for a long time and
transport voice over TDM circuits [120].
Very few enterprises can create and manage end-to-end VoIP VPNs between
their sites on their own, because this is a difficult task which requires a high level of in-house
expertise that only a few very large enterprises possess. This is why enterprises generally have recourse to VoIP
service providers or carriers which offer to create VoIP VPNs over managed and unmanaged IP
networks and to manage these VPNs for their enterprise customers. Due to the intervention of
third parties in the management of VoIP VPNs, VoIP VPNs are considered as managed services.
Models of VoIP VPNs

According to [125], it is possible to distinguish two models of managed VoIP VPNs offered
by carriers:
• CPE (Customer Premises Equipment) Model, also called CPE-VPN
• Hosted Model or Provider-provisioned Model, also called PP-VPN
In the CPE model of VoIP VPN, the service provider hosts a VPN server in its own
network and enhances the edge router of the enterprise customer’s network (or replaces it with a
more powerful one, or installs additional software on it) so as to make it VPN-enabled. The
role of the service provider in the enterprise customer’s premises is then to manage this router, while the
personnel of the enterprise can still perform some administration activities through a web-based
management interface. The advantage of this model is that the service provider saves costs of
ownership and the enterprise avoids complex installation, configuration and management
tasks for the creation of VoIP VPNs. In a managed VoIP CPE-VPN, the enterprise customer
has to convey its dialling plan (at least the addresses of its different sites) to its service provider so
that the dial routes are configured on the service provider’s VPN Management Centre. As
illustrated in Figure 6.9, when an employee of the enterprise calls (1) an employee located in the
remote branch office, the IP-PBX forwards (2) the SIP INVITE message to the VPN-enabled
edge router of the enterprise, which in turn sends (3) a request to the centralized VPN
Management Centre located in the service provider’s site, asking for the IP address of the
destination site based on the dialled extension number or on the “dialled” SIP URL. The VPN
Management Centre looks into its database of configured routes and dialling plans and answers
by returning (4) the requested address. On receipt, the VPN-enabled edge router creates (5) a
secure channel on which the SIP INVITE message is transmitted. The VoIP CPE-VPN model
makes it possible to create end-to-end VoIP connectivity.
Figure 6.9 – CPE model of VoIP VPN
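Steps (3) and (4) of this exchange amount to a lookup in the dialling plan held by the VPN Management Centre. A minimal Python sketch of such a lookup is given below. The route tables, the function name `resolve_site`, the extension prefixes, the domains and the addresses (taken from the documentation address ranges) are all assumptions made for illustration; real products are likely to use their own protocols and data models.

```python
import ipaddress

# Hypothetical dialling plan conveyed by the enterprise to its service provider:
# extension prefix -> address of the VPN-enabled edge router of the site.
EXTENSION_ROUTES = {
    "1": ipaddress.ip_address("192.0.2.10"),      # headquarters, extensions 1xxx
    "2": ipaddress.ip_address("198.51.100.20"),   # branch office, extensions 2xxx
    "25": ipaddress.ip_address("203.0.113.30"),   # second branch, extensions 25xx
}

# Hypothetical mapping for "dialled" SIP URLs: domain -> edge router address.
SIP_DOMAIN_ROUTES = {
    "site1.example": ipaddress.ip_address("192.0.2.10"),
    "site2.example": ipaddress.ip_address("198.51.100.20"),
}


def resolve_site(dialled: str):
    """Return the edge router address of the destination site, or None if unknown."""
    if dialled.lower().startswith("sip:"):
        # Keep only the host part; URI parameters and ports are ignored here.
        host = dialled.rsplit("@", 1)[-1].split(";")[0].split(":")[0].lower()
        return SIP_DOMAIN_ROUTES.get(host)
    # Extension number: the longest matching prefix wins.
    for length in range(len(dialled), 0, -1):
        route = EXTENSION_ROUTES.get(dialled[:length])
        if route is not None:
            return route
    return None


branch = resolve_site("2417")
second_branch = resolve_site("2517")
by_url = resolve_site("sip:it_man@site1.example")
unknown = resolve_site("9000")
```

With the returned address, the edge router would then set up the secure channel of step (5), which is outside the scope of this sketch.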
The most widely used encapsulating protocol for VoIP CPE-VPNs is IPsec,
operating in tunnel mode to connect sites. IPsec has the advantage of being able to encapsulate
both UDP and TCP segments; thus, there is no compatibility problem with SIP, whose messages are usually
transported over UDP. The voice IP packets routed through the VPN are encapsulated into
other IP packets using the Authentication Header (AH) or the Encapsulating Security Payload (ESP).
In the Hosted Model or Provider-provisioned Model, the creation and management of VoIP
VPNs depend strongly on the service provider’s equipment and on its managed network.
Generally, the service provider makes its managed MPLS (Multiprotocol Label Switching) network
available for the establishment of MPLS VPNs. MPLS, defined in RFC 3031 [102], is used for
forwarding packets over the backbone and can be used at layer 2 or layer 3 of the OSI model
[54]. MPLS VPNs ensure a high quality of service for the transport of voice traffic and support
Service Level Agreements (SLAs) by providing scalable and robust QoS mechanisms, guaranteed
bandwidth and traffic-engineering capabilities. MPLS VPNs protect enterprises from externally
launched Denial of Service attacks against their VoIP networks since it is impossible to insert and
modify packets in these VPNs. However, MPLS VPNs do not provide any encryption
mechanisms and therefore, confidentiality is not guaranteed.
Figure 6.10 – Hosted Model of VoIP VPN with MPLS
Comparison of VoIP VPNs

The different types of VPN for VoIP traffic that can be used do not always fulfil the same
security requirements and present some drawbacks.
• IPsec ensures encryption at layer 3 and can be used for site-to-site VPNs as well
as for remote access VPNs (for mobile workers). However, it has the major
disadvantage of adding a large overhead to IP packets, which has repercussions on
quality of service.
• MPLS is a very powerful solution which makes it possible to guarantee bandwidth and thus
quality of service for voice traffic. However, it has the drawback of leaving voice traffic
in clear text, giving way to confidentiality attacks. This is why enterprises opting
for MPLS VPNs because of their scalability and their QoS
guarantees sometimes augment them with IPsec when they need additional security functions
such as data encryption. Layer 2 MPLS VPNs are widely appreciated and mainly used
for the protection of static routes, for example, between two enterprise sites.
• By choosing to encrypt signalling with the Transport Layer Security protocol (TLS) and
media streams with the Secure Real-time Transport Protocol (SRTP), a sort of TLS
VPN is created. This sort of VPN could be created for a call between an employee
of the enterprise and a customer on the Internet. The real problem with TLS
encryption is that it requires the deployment of a Public-Key Infrastructure (PKI),
which is very complex.
The choice of the VoIP VPN therefore depends on the specific requirements of enterprises.
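The size of the IPsec overhead mentioned in the comparison can be estimated with a short calculation, sketched below in Python. The figures rest on assumptions that are not taken from this report: ESP in tunnel mode over IPv4 without NAT traversal, AES-CBC encryption (16-byte IV and block size), a 12-byte HMAC-SHA1-96 integrity check value, and voice packets carrying 20 ms of G.729 (20 bytes) or G.711 (160 bytes) audio.

```python
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12

ESP_HEADER = 8     # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2    # pad length + next header fields


def voip_packet_size(codec_payload: int) -> int:
    """Size in bytes of an unprotected IP/UDP/RTP voice packet."""
    return IPV4_HEADER + UDP_HEADER + RTP_HEADER + codec_payload


def esp_tunnel_packet_size(inner_packet: int, block: int = 16,
                           iv: int = 16, icv: int = 12) -> int:
    """Size in bytes of the same packet once wrapped by ESP in tunnel mode."""
    # The inner packet plus the ESP trailer is padded up to a whole number of cipher blocks.
    encrypted = -(-(inner_packet + ESP_TRAILER) // block) * block
    return IPV4_HEADER + ESP_HEADER + iv + encrypted + icv


g729_plain = voip_packet_size(20)                 # 60 bytes
g729_ipsec = esp_tunnel_packet_size(g729_plain)   # 120 bytes
g711_plain = voip_packet_size(160)                # 200 bytes
g711_ipsec = esp_tunnel_packet_size(g711_plain)   # 264 bytes
```

Under these assumptions a G.729 packet doubles in size while a G.711 packet grows by about a third, which suggests that the relative overhead weighs most heavily on low-bit-rate codecs.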
SSL VPNs?
Although it has long been thought that it was impossible to use the Secure Sockets Layer
cryptographic protocol (SSL), the predecessor of TLS, for VoIP security, it has been recently
asserted that SSL VPNs could be used for VoIP traffic [60]. This assertion relies upon
experiments conducted on 10 SSL VPN solutions. If SSL VPNs can indeed be
compatible with voice applications and ensure the required QoS, as has been claimed in this
article, then SSL VPNs could prove very useful in the future for external mobile workers, who
could easily connect from any visited network.
6.6. VoWLAN security

As the introduction of the Voice over WLAN (VoWLAN) technology in enterprise
networks becomes increasingly widespread, particular attention must be paid to the security of
mobile wireless terminals like smartphones, laptops or PDAs, placing or receiving VoIP calls in
the enterprise premises.
VoWLAN requires the same security measures used to secure wired VoIP but also extra
security mechanisms. These mechanisms do not aim at providing additional security at the
application layer but at securing the wireless access. Indeed, the application-based, and in
particular the SIP-based, security issues are the same for all IP phones, irrespective of their
connectivity, wired or wireless. It is important to secure the wireless links from eavesdropping,
interception or DoS by attackers who could have penetrated, legally or not, the premises of the
enterprise, as well as to protect Access Points (APs) from physical attacks or from spoofing by
rogue APs (see Threat analysis in Table 10.3). By using tools like Kismet, an 802.11 layer 2
wireless network detector and sniffer, it is
very easy for malicious employees or visitors in an enterprise to detect wireless networks: by
passively collecting data packets, they can detect standard named networks as well as the presence
of hidden networks. Besides,
malicious employees or attackers can install rogue APs. To prevent this threat, enterprises can
install wireless intrusion detection systems, which monitor the radio spectrum to detect unauthorized
access points or tools like NetStumbler, and then block them by several methods [119].
As for the protection of confidentiality and integrity on wireless links, it is provided by the
following standards.
Wired Equivalent Privacy (WEP)
To protect the 802.11 MAC and PHY layers, the 802.11 standard provided the WEP (Wired
Equivalent Privacy) mechanism, but this mechanism has proved to be too vulnerable and
easily broken by attackers [118].
WEP aimed at providing security at the MAC layer (or link level) by securing the traffic
between a terminal and an Access Point (AP). WEP includes a shared key authentication process and
was intended to ensure confidentiality, access control and data integrity. In WEP, a wireless
device sends an authentication request to the AP requesting shared key authentication. Upon
receipt, the AP generates a random number (nonce) by using the WEP algorithm and sends back
a challenge with this number to the wireless device. The latter then uses its locally configured
WEP key to encrypt the challenge, which it sends in a new authentication request; when the AP receives it,
it tries to decrypt the encrypted challenge. If it succeeds, then the wireless device has been
authenticated. WEP does not provide a strong authentication mechanism and is vulnerable to
attacks. Attackers can easily impersonate a wireless device.
This is due to the fact that WEP encryption consists in applying the XOR function to
the clear text challenge and a keystream derived from the private WEP key; if the XOR function
is applied to the clear text challenge and the encrypted challenge, the result is this keystream.
That way, attackers who eavesdrop on a challenge and its encrypted response can reuse the
keystream to answer a later challenge without knowing the key; other known attacks on WEP
also allow the key itself to be recovered. Even with large keys, WEP is not recommended for enterprises.
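The weakness of shared key authentication can be reproduced with a few lines of Python. The sketch below assumes WEP's RC4 stream cipher keyed with an initialization vector (IV) followed by the WEP key, and it leaves out the CRC-32 integrity value that real WEP frames carry. The key, IV and challenge values are made up for the example. XORing an observed clear-text challenge with its encrypted form yields the keystream for that IV, which is enough to answer a later challenge.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Plain RC4: key scheduling followed by `length` bytes of keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def wep_encrypt_challenge(iv: bytes, wep_key: bytes, challenge: bytes) -> bytes:
    """Simplified station response: challenge XOR RC4(IV || key), without the ICV."""
    return xor(challenge, rc4_keystream(iv + wep_key, len(challenge)))


wep_key = b"SECRT"                  # made-up 40-bit key, unknown to the attacker
iv = b"\x01\x02\x03"                # 24-bit IV, sent in clear with the response

# 1. The attacker overhears one legitimate authentication exchange.
challenge_1 = bytes(range(128))
response_1 = wep_encrypt_challenge(iv, wep_key, challenge_1)

# 2. XORing clear text and cipher text gives the keystream for this IV.
stolen_keystream = xor(challenge_1, response_1)

# 3. The attacker answers a fresh challenge by reusing the same IV and keystream.
challenge_2 = bytes(reversed(range(128)))
forged_response = xor(challenge_2, stolen_keystream)

# The AP decrypts with the real key and finds its own challenge.
ap_view = xor(forged_response, rc4_keystream(iv + wep_key, len(forged_response)))
attacker_authenticated = ap_view == challenge_2
```

The attacker never learns `wep_key` in this sketch; reusing the keystream is sufficient because the station, not the AP, chooses the IV.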
802.1x and EAP (Extensible Authentication Protocol)

IEEE 802.1x is a standard performing authentication and key management for wired and
wireless networks and is highly recommended for wired IP terminals and wireless devices to
authenticate them at a network level. In wireless networks, 802.1x prevents unauthenticated and
unauthorized devices from gaining access to the wireless network. As shown in Figure 6.11,
the 802.1x standard makes use of the Extensible Authentication Protocol (EAP) between the wireless
terminal and the Access Point, and the latter communicates with a RADIUS server for AAA.
802.1x prevents rogue terminals from connecting to the wireless network but also rogue Access
Points from performing Man-in-the-Middle attacks.
802.1x is the minimum required for ensuring authenticity in VoWLAN, and laptops and other
wireless devices should authenticate themselves against Access Points with this standard.
Figure 6.11 – 802.1x authentication for VoWLAN
WPA2 (802.11i standard)

WiFi Protected Access 2 (WPA2), which implements the IEEE 802.11i standard, replaces
the WEP of the 802.11 standard. It provides strong encryption by using AES.
For authentication, the 802.11i standard uses the 802.1x standard; for keeping track
of associations, it uses RSN (Robust Security Network); and for integrity and origin
authentication, it uses AES encryption. Nowadays, support for WPA2 in wireless APs
and wireless devices is increasing.
In order to protect a sensitive application like wireless VoIP, WPA2 seems by far the
best link-level security solution, and enterprises are strongly recommended to adopt it.
6.7. Limits on the technical and commercial efforts

In this chapter, the major security mechanisms involved in VoIP security have been
reviewed. SIP-based VoIP security is a major area of research, and in recent years many RFCs have
been published to compensate for the lack of security mechanisms in RFC 3261 and to reinforce
VoIP security by using strong security mechanisms like encryption.
However, although many RFCs making security recommendations have been published,
many manufacturers of VoIP devices like IP phones, IP-PBXs, SIP servers or VoIP gateways do
not follow these recommendations and do not implement them in their products. For example, some
vendors support key exchange mechanisms like SDES but do not encrypt the signalling
which transports the key. This key then passes through the network in clear text and can be
intercepted easily. Another example is that TLS and S/MIME, which are explicitly recommended
by RFC 3261, are not supported in all IP phones, mainly because they demand high processing
performance from devices with limited processors. A third example is that the 802.1X standard,
which is an authentication standard not only for wireless devices but also for wired IP devices, is
not supported by the majority of IP phones, preventing the use of this technology for
network-level authentication.
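The SDES example can be made concrete with a Python sketch of what an eavesdropper sees when the SDP body of a SIP message is not encrypted. It assumes the `a=crypto` attribute syntax of SDES with the AES_CM_128_HMAC_SHA1_80 crypto suite, whose inline value is the base64 encoding of a 16-byte master key followed by a 14-byte master salt. The SDP body, the addresses and the dummy key material are invented for the example.

```python
import base64
import re

# Dummy key material standing in for a real SRTP master key (16 bytes) and salt (14 bytes).
master_key_and_salt = bytes(range(30))
inline_value = base64.b64encode(master_key_and_salt).decode("ascii")

# SDP body as it would appear inside an unencrypted SIP INVITE (illustrative values).
sdp_body = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/SAVP 0",
    f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{inline_value}",
]) + "\r\n"

CRYPTO_LINE = re.compile(
    r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)", re.MULTILINE
)


def sniff_srtp_keys(sdp: str):
    """Return (crypto suite, master key, master salt) for each crypto line found."""
    found = []
    for _tag, suite, encoded in CRYPTO_LINE.findall(sdp):
        raw = base64.b64decode(encoded)
        found.append((suite, raw[:16], raw[16:]))
    return found


captured = sniff_srtp_keys(sdp_body)
```

Anyone able to capture the signalling recovers the media key with a single base64 decoding, which is why SDES only makes sense when the signalling itself is protected, for example with TLS or S/MIME.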
Another problem in the implementation of security is the interoperability of VoIP
equipment. For example, TLS is unlikely to run smoothly across multi-vendor equipment; on top
of that, TLS is not even always supported by all the products of the same vendor.
Flaws in VoIP components, such as flaws in their operating systems or in message parsing,
also constitute a limit of commercial VoIP products. However, in recent years, the use of security
tools like PROTOS to probe VoIP flaws has helped vendors to discover and fix flaws in their
products.
Additionally, encrypted remote management and configuration of VoIP components is very
often neglected by manufacturers, which do not offer this option. That is why enterprise VoIP
deployment designers should take care to install VoIP devices enabling encryption-based
remote management. Thus, the major problem is not that security standards or technologies are
lacking but that they are not properly implemented.
Besides, another source of insecurity in VoIP is that enterprises do not always make use of
technologies aiming at protecting their VoIP systems. For example, it is fundamental that VoIP-
specific firewalls are adopted instead of relying on traditional data firewalls for the protection of
SIP networks.
7. Securing VoIP systems in enterprise networks
In the light of the threats specific to VoIP systems in enterprise networks identified
in Chapter 5 and of the technical solutions presented in Chapter 6, the current chapter studies
how enterprise VoIP systems can be secured in practice.
First, a brief overview of the risk mitigation process is presented in paragraph 7.1.
Then, in paragraph 7.2, four sets of recommendations for VoIP security published
by four major institutions are compared and analysed, and their main common
points as well as their points of divergence are highlighted and commented on. Finally, VoIP
security measures are applied to the two models of enterprises studied throughout this master
thesis report, namely the model of a small single-site enterprise and that of a large enterprise
network. The results of these analyses are presented below.
7.1. Studying the risk mitigation process

As a continuation of the risk assessment studied in 5.3.1, the so-called risk mitigation
process will now be briefly studied. As seen earlier, the risk assessment process, which includes
the threat and risk analysis processes, constitutes the first step of risk management, the second
step of risk management being the risk mitigation process.
To mitigate means to moderate, to lessen unwanted or harmful effects; thus, risk mitigation
can be defined as the moderation of the likelihood that threats occur as well as the moderation of
impacts that threats could have on a system, like a VoIP network. The risk mitigation process
includes the efforts taken to reduce likelihood or impact, no matter the nature of these efforts
(physical measures, practices, policies, financial measures…). According to the NIST [1], risk
mitigation “involves prioritizing, evaluating and implementing the appropriate risk-reducing controls
recommended from the risk assessment process”.
The main options to achieve risk mitigation in VoIP systems are:
• Risk avoidance – risks are mitigated by eliminating the risk causes or consequences.
For example, in order to avoid security issues generated by the use of softphones in
an enterprise VoIP network, a mitigation option could be to ban all softphones from
the enterprise. Another example of risk avoidance eliminating risk consequences is to
use Network Intrusion Detection Systems to detect the intrusion of attackers in the
VoIP network before they act harmfully.
• Risk assumption – risks are either not mitigated at all, if the risk is low, or risks are
brought to an acceptable level by taking some measures. For example, it is thought
currently that VoIP networks will not be able to totally protect themselves from SPIT
attacks; however, they could implement in the future SPIT prevention systems which
would bring SPIT attacks to an acceptable level of frequency. Since the risk
of SPIT attacks is currently very low, enterprises do not necessarily need to deploy
any SPIT prevention system yet (leaving aside the fact that no such system exists
on the market for the moment).
• Risk limitation – risks are mitigated by the application and implementation of
security measures to limit the impact of threats. For example, the risk of
eavesdropping can be mitigated by encrypting VoIP media and/or signalling streams.
Enterprises must choose which risk mitigation options to use to address the
major threats to their VoIP systems as identified by a previous threat analysis.
However, the choice of a mitigation option is not sufficient: appropriate technologies and policies
should also be implemented.
Mitigation is an important process which, according to B. Materna [67], should be part of
a “proactive security architecture for VoIP” which would build a systems-level approach to VoIP
security. The mitigation process should follow a prevention process and a protection process.
During the prevention step, enterprises perform a vulnerability assessment before the VoIP
deployment is begun, i.e. before any VoIP device or application is installed in the company.
The vulnerability assessment allows enterprises to verify vendor claims regarding the security
features of their products and to identify flaws in VoIP products or network design. This
assessment should be continued even after the deployment of the VoIP system. The next step is
protection: it is advised to deploy a multi-layer security infrastructure ensuring security at the
network perimeter as well as within the enterprise VoIP network. Mitigation is the last step, which
aims at remaining aware of the potential new threats that could occur and that have not been
taken into consideration during the two previous steps, either by omission or due to the novelty of
the threat. Systems providing real-time automated VoIP security mitigation are expected on the
market and are thought to become an essential part of the VoIP security infrastructure in the
near future. Vulnerability assessment systems have already appeared recently.
Which risk mitigation measures should be taken, and for which level of security, is often
described in recommendations or best practices serving as guidelines and published by
institutions, security experts, manufacturers, etc. However, because there are so many of them,
it is imperative to sort through them and make a selection. This has been done in the next
paragraph, leading to an interesting and fruitful comparison.
7.2. Comparing major VoIP security recommendation reports
7.2.1. Interest of a comparison

To mitigate security vulnerabilities in VoIP deployments of private networks like
networks of organizations, universities or enterprises, many recommendations, best practices or
security guidelines have been published to help managers, designers and administrators of VoIP
networks in their security mission. Throughout this master thesis report, the term security
recommendations will be used to denote guidance regarding the adoption of security measures in
order to mitigate, i.e. to reduce, the risk of vulnerability exploitation in VoIP systems. These
recommendations include guidelines for the implementation of security mechanisms, for appropriate
architecture construction, appropriate VoIP components to install, appropriate software…
Due to the current topicality of VoIP security and to its criticality in enterprise networks,
many institutions or VoIP experts have published their recommendations to help companies in
their deployment of secure VoIP systems. Four reports on VoIP security measures have drawn
my attention by their exhaustiveness or importance:
• “Internet Protocol Telephony & Voice over Internet Protocol – Security Technical Implementation
Guide – version 2” [63] published in April 2006 by the American Defense Information
Systems Agency (DISA)
• “VoIPsec – Studie zur Sicherheit von Voice over Internet Protocol” [58] published in October
2005 by the German Federal Office for Security in Information Technology (BSI)
• “Security considerations for Voice over IP systems” [59] published in January 2005 by the
American National Institute of Standards and Technology (NIST)
• “Security Guidance for Deploying IP Telephony Systems” [62] published in February 2006 by
the American National Security Agency (NSA)
Three of these reports have been published by American federal institutions and the
fourth one by a German federal office. All of them have been recently published and adopt
different approaches to present their security recommendations for VoIP deployments.
The lack of a common European VoIP security report for enterprises is regrettable. The
European Commission has started to take some timid steps to regulate VoIP and made in 2004
a public call for input asking several institutions, regulatory authorities, operators, ministries … to
provide comments on Voice over IP; most of them have underlined the importance of
VoIP security. Therefore, the publication of a European set of recommendations for VoIP
security would be welcome.
The VoIP security recommendations suggested by these four reports present some
common points as well as contradictions and, due to the various approaches adopted,
recommendations made in one report may be missing in the others and vice versa.
Therefore, enterprises which are considering securing their VoIP network might at
first be disoriented by the confusing plurality of approaches, the difference in the depth and
granularity of security measures and the divergence and contradictions between
recommendations. A detailed comparison of the four sets of recommendations therefore seems
necessary and highly profitable for enterprises, to guide them during the deployment of their
VoIP system.
In Annex 4, I have performed a thorough comparison of the four sets of
recommendations. A long comparison table summarizes this comparison by listing exhaustively
all the recommendations made in these reports and indicating for each of them which institutions
have suggested it. Each recommendation has been assigned a level of security (minimum,
medium or highest) which specifies the degree of security it introduces.
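As a rough illustration of how such a comparison table can be handled, the following Python sketch stores each recommendation with its security level and the institutions suggesting it. The class and function names, the sample rows and the majority threshold of three reports are my own assumptions and are not taken from Annex 4.

```python
from dataclasses import dataclass

REPORTS = ("DISA", "BSI", "NIST", "NSA")


@dataclass(frozen=True)
class Recommendation:
    text: str
    level: str               # "minimum", "medium" or "highest"
    suggested_by: frozenset  # subset of REPORTS


def common_to_majority(recs, threshold=3):
    """Return the recommendations suggested by at least `threshold` reports."""
    return [r for r in recs if len(r.suggested_by) >= threshold]


# Illustrative rows only. The first attribution follows the text (all four
# reports agree on voice/data separation); the second row is a placeholder.
table = [
    Recommendation("Separate voice and data traffic with VLANs", "minimum",
                   frozenset(REPORTS)),
    Recommendation("Hypothetical measure found in a single report", "highest",
                   frozenset({"NSA"})),
]
```

Filtering the table with `common_to_majority(table)` keeps only the rows shared by most reports, which is the kind of selection that leads to a list of common points.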
Let’s first see to what extent the approaches in these four reports differ and which are the
best suited and most helpful for enterprises.
7.2.2. Several approaches of recommendations

The DISA security implementation guide, the BSI security study, the NIST
recommendations guide and the NSA security guidance all suggest a series of security
recommendations for private VoIP networks; however, they do not adopt the same approach to
presenting them. “Approach” here covers at least the classification of recommendations, the
degree of detail in their description, the categorization of the security levels
associated with them and the focus on specific recommendations.
Classification of recommendations
The DISA report has chosen to classify its recommendations in 13 parts. It distinguishes:
• The physical protection of VoIP servers and network devices critical for VoIP
• The protection of VoIP components (VoIP servers, IP phones and their
configuration, voice mail servers…)
• The protection of processes: authentication and authorisation of IP phones during
registration
• Security policies for the use of softphones
• The segregation of data and voice virtual LANs (VLANs)
• The voice network protection: perimeter security, traffic control, wireless VoIP,
VoIP connection to the DSN (+ security of MGCP protocol, which is irrelevant in
this report)
• Call privacy and confidentiality
• The management of VoIP components
Even if slightly disordered, this structure is interesting and helpful for enterprises because it
focuses on the protection of components, physical or software, and on areas of protection (VLANs,
perimeter…). On the whole, this report is well-structured as regards the classification of its recommendations.
The BSI report has classified its recommendations in the same way as it did for the
threats to VoIP systems. A similar structure of classification for threats and for recommendations
helps to make it clear which security measures solve which security issues. This structure is built
in the following way:
• Security measures mitigating network-based vulnerabilities: physical protection,
power supply, separation of voice and data networks, authentication of VoIP
terminals, measures against MAC spoofing, ARP spoofing and other Layer 2 and 3
attacks, redundancy of critical VoIP components, protection against eavesdropping
and interception, protection against manipulation of signalling and toll fraud,
firewalls and NIDS…
• Protection of VoIP middleware, i.e. VoIP servers and gateways
• Protection of IP phones
• Security protocols (security of signalling and media streams)
The BSI report is, without any doubt, one of the most interesting VoIP security
recommendation reports owing to its rigour and its exhaustive list of recommendations on the
underlying network. It is the only report, along with the NSA
report, which details the measures to be taken for the security of the underlying network. This structure
for the classification of recommendations is also interesting and its study could be beneficial for
enterprises. However, its parts are disproportionate, the first one being by far the longest.
The NIST report suggests recommendations in 9 points and develops them
very briefly. These points include the development of an appropriate network architecture,
physical controls, power back-ups, VoIP-aware firewalls, softphones and wireless VoIP.
The NIST report thus has no particular structure. Although this report is considered a VoIP
security reference, the lack of structure and exhaustiveness in its recommendations renders it
almost useless for guiding enterprises in their deployment.
In contrast, the last report, the NSA report, is of great value for secure enterprise VoIP
deployments. Its guidelines are first divided into four infrastructure areas: network,
perimeter, VoIP servers and IP phones. For each of these areas, threats have been identified and
the associated mitigations have been detailed. The NSA report also has the advantage of
clearly summarizing its mitigations in tables for each area.
Granularity and depth of recommendations

The four reports which have been compared differ significantly in the degree of detail
and in the technical depth of their recommendations. While the NIST report remains at a
superficial level without really detailing its recommendations and mixes technical,
financial (point 5) and legal (point 9) recommendations, the three other reports give exclusively
technical recommendations or recommendations on best practices and policies. The lack of
rigour and depth of the NIST recommendations reduces their value and importance.
The BSI report is the only one which details its recommendations on the
protection of the network at layers 2 and 3 and goes into the same depth as it did previously
for the network-based attacks against VoIP systems. The NSA report also attempts this
but in less detail. As for the DISA report, it simply refers to a network
infrastructure security implementation guide. The security of VLANs is an example: while
the BSI gives some specific protective measures for VLANs (against VLAN hopping for
example), the DISA report refers only to the VLAN section of a network infrastructure security
implementation guide and the NSA and NIST do not mention them or refer to other documents.
Concerning recommendations about the VoIP servers and IP phones, the DISA and
NSA reports go deep into details, while the BSI report is not as exhaustive,
though it enters into a few specific technical points. The concept of backup is mentioned by all
the reports except for the DISA one, but the NIST report only alludes to it for legal
purposes, and the BSI report makes only a recommendation on the encryption of the backup
data. In contrast, the NSA report suggests several security measures concerning backup.
The most striking feature of these reports is that the DISA report is the
only one which analyses in depth the protective measures related to softphones, even though softphones
constitute a critical source of vulnerability in VoIP systems. It analyses different use cases, such as
the use of softphones on laptops outside of the company, in the company premises… and gives
recommendations for each of them. The need for a softphone security policy is strongly stressed.
The NSA report, by contrast, makes only a few recommendations relating to softphones, albeit of a
technical character, unlike the NIST, which simply advises against the use of
softphones but says nothing about the case where softphones have to be used in a company, for
mobile workers for example. Strangely, the BSI, whose recommendations are generally fine-grained,
does not make any special recommendation for softphones except the
possible creation of a VLAN for softphones.
Concerning the protection of management and configuration of VoIP components, the
NSA attaches great importance to it and makes several detailed recommendations, in particular for the
management and configuration of VoIP servers. In contrast, the three other
reports do not develop this aspect much. However, the BSI report makes a few
recommendations on the management and configuration of IP phones.
Other topics like the security of voicemail servers, VoIP gateways, E911 services or
wireless have not been covered by all reports. For example, the security of voice mail services
has been addressed only by the DISA report, and in relative depth.
Finally, while all reports recommend the use of encryption for the confidentiality and
privacy of VoIP calls, the ones which mention SRTP as the encryption protocol for media do not
make any recommendation on the key management protocol!
Categorization of security levels: two approaches

Except for the NIST report, all reports define, describe and associate security levels with
their recommendations.
The NSA report defines three security robustness levels which describe the strength of
mitigation achieved when recommendations are implemented. Highest robustness mitigations
encompass mitigations which protect the VoIP system in the best way within the limits of current
technology standards and without sacrificing the basic services of VoIP. Medium robustness
mitigations represent the best practices for designing and administering a VoIP network which
will provide the same security level as legacy telephony systems. The medium robustness level
should be implemented in all enterprises, because minimum robustness level is not enough.
Minimum robustness mitigations represent mitigations introducing the strict minimum security in
VoIP systems.
The BSI report defines three protection classes according to the
vulnerability severity. Protection classes and vulnerability severity evolve in the same direction:
the higher the vulnerability severity, the higher the security level which is needed to mitigate
the vulnerability. In the BSI report, Class 1 corresponds to the lowest level of security (needed to
mitigate low-severity vulnerabilities) whereas Class 3 corresponds to the highest level (needed to
mitigate severe vulnerabilities). The detailed description of these classes can be found in [58].
Similarly to the BSI report, the DISA report does not define security levels but
vulnerability severity codes. Four vulnerability severity categories have been defined with a
Category I corresponding to the highest severity (attackers can gain immediate access to a device
or have administrator rights or bypass firewalls) and Category IV corresponding to the lowest
(vulnerabilities which can be easily mitigated). Category II corresponds to vulnerabilities which
allow intrusions in the VoIP system and Category III to vulnerabilities which can compromise
VoIP components.
From this comparison, two different approaches emerge: security levels
defined by the strength of mitigation, or security levels defined by the severity of the vulnerability they
have to mitigate.
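The three categorization schemes can be put side by side in a small lookup table. The Python sketch below is only an illustration: the numeric ranks, the dictionary names and the ordering of DISA Categories II and III between the two extremes are my assumptions; the labels themselves come from the reports as described above.

```python
# Ordinal rank per scheme: a larger rank means a stronger mitigation (NSA)
# or a more severe vulnerability to mitigate (BSI, DISA).
SCALES = {
    "NSA": {"minimum": 1, "medium": 2, "highest": 3},
    "BSI": {"Class 1": 1, "Class 2": 2, "Class 3": 3},
    # DISA numbering runs the other way: Category I is the most severe.
    "DISA": {"Category IV": 1, "Category III": 2,
             "Category II": 3, "Category I": 4},
}

# What each scheme actually measures (the two approaches identified above).
BASIS = {
    "NSA": "strength of mitigation",
    "BSI": "vulnerability severity",
    "DISA": "vulnerability severity",
}


def rank(report, label):
    """Return the ordinal rank of `label` within the scheme of `report`."""
    return SCALES[report][label]
```

Ranks are comparable only within one scheme; the sketch does not claim that, for instance, BSI Class 3 and NSA highest robustness are equivalent.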
Focus on particular topics
The four reports do not all emphasise the same areas of security and therefore make
different recommendations. While the DISA report has suggested an exhaustive set of
recommendations on softphones or voice mail servers, BSI has preferred to focus on security
measures like redundancy, i.e. the introduction of redundant VoIP critical servers and gateways in
the VoIP network, or the development of a network intrusion detection system (NIDS) in the
VoIP network. In contrast, the NSA report has deemed it necessary to make exhaustive
recommendations on network availability (power cut mitigations or backups) or on the
management of VoIP components; for instance, it makes an inventory of security measures
related to the protection of web-based management interfaces of IP phones as well as VoIP
servers, to the security of remote management of VoIP servers by vendors and to the
management protocols used for IP phones.
The diversity of approaches in VoIP security reports and their juxtaposition in a
comparison table such as the one established in Annex 4 can be of great help to enterprises wishing to
deploy secure VoIP systems. Given that the reports do not always focus on the same aspects of
security and make different recommendations to mitigate various security issues, they
complement one another, and comparing them helps enterprises to
build, as far as possible, a complete picture of security measures.
7.2.3. Common points of recommendations

When comparing the four sets of recommendations in Table 10.6, one can
see that, fortunately, they share many common points, which are listed in the table below.
Protection of VoIP servers

• VoIP servers should be dedicated only to applications required for VoIP (do not use servers for general
Internet access, such as email and web browsing)
• Critical VoIP servers should be secured according to the security guidelines of the operating systems on
which they run
• Software and operating systems on VoIP servers should be kept up-to-date, and it should be possible to
download and install newly available software patches
Physical security and protection against data loss of critical VoIP servers
• All critical VoIP network and server components should be located in physically secured areas (in special
controlled rooms like server rooms or network-wiring closets)
• Only trusted authorized personnel should have access to VoIP network and server components (by using
smartcards, one-time passwords or biometrics). A log of people entering server rooms should be maintained
• Critical VoIP servers should be protected from power cuts by short-term backup like Uninterruptible
Power Supply (UPS)
Data and voice segregation

• The underlying network supporting the VoIP system should be configured using VLANs and at least
one voice VLAN and one data VLAN should be configured to segregate voice traffic from data traffic
• The voice network should be subdivided into multiple VLANs to segregate VoIP components by type and
function
• The voice VLAN should be subdivided into 2 VLANs: “producing VLAN” with VoIP servers and
VoIP gateways and a “consuming VLAN” with IP phones
• When using DHCP for address assignment, different DHCP servers should be used for voice components
and data components and these servers should reside in their respective voice or data address space
• The local network’s VLANs should be implemented in accordance with the VLAN security rules
• Port level security should be enabled (to allow dynamic or static mapping of MAC addresses to VLAN
ports) on all switches (for all security levels)
• 802.1x authentication should be used to authenticate devices in the voice VLAN
• IP phones (that do not contain a multi-port switch) and servers providing voice services should be connected
to switchports with membership only to the voice VLANs
• Data workstations (without approved softphones) should be connected to switchports with membership only
to the data VLANs
IP softphones
• For minimum security level, the installation of softphones can be accepted on workstations (fixed or
portable) intended for day-to-day use in the user’s normal workspace if a special VLAN dedicated to
softphones has been previously created
VoIP network protection and internal traffic control

• A Network Intrusion Detection System (NIDS) should be implemented and a sensor should be connected
to a switch port of every critical switch to filter and control traffic. These connections of sensors to switches are
possible if the switches have the Switched Port Analyzer (SPAN) functionality
• Voice or data traffic between the data and voice VLANs should be filtered and controlled by an
appropriate firewall
• Data or voice traffic between the data and voice VLANs should be controlled by a layer 3&4 stateful
firewall; filtering traffic is not required at the application layer (for minimum level)
• Traffic between voice VLANs should be filtered and controlled by a layer 3 switch/router ACL or a
layer 3&4 stateful firewall
• All mobile employees or employees from another site should connect to the enterprise voice VLANs
through a VPN and voice packets coming from outside the enterprise should pass through a firewall before
they reach the voice VLANs
• Interoffice VPNs should respect and maintain the separation of voice and data traffic
• For the control of external calls, ALG-based firewalls are not the only solution to adopt: Session Border
Controllers or other standards-based solutions can be used instead
Call privacy and confidentiality

• All VoIP traffic that is sent over WAN connections via an IP WAN network like the Internet…
should be encrypted via VPNs, and the VPNs should respect and maintain the separation of voice and data
traffic
Configuration/management of VoIP components
• All remote administrative connections (in-band or out-of-band) to critical VoIP servers should be encrypted
(use SSH instead of Telnet or FTP or use IPsec VPNs) and Telnet should be disabled
Table 7.1 – Recommendations common to the majority of selected recommendation sets
These common points can be regarded by enterprises as the minimum set of
recommendations which MUST be implemented in order to secure their VoIP deployment.
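The switchport rules of the “Data and voice segregation” part of Table 7.1 lend themselves to an automated check. The following Python sketch is a minimal illustration under assumptions of my own: the VLAN IDs, the port names, the device type labels and the function name are hypothetical and do not come from any of the four reports.

```python
# Hypothetical VLAN IDs; real numbering depends on the enterprise network.
VOICE_VLANS = {110, 120}
DATA_VLANS = {200}

# Device types and the VLANs their switchport may belong to (Table 7.1).
ALLOWED_VLANS = {
    "ip_phone": VOICE_VLANS,      # phone without a multi-port switch
    "voice_server": VOICE_VLANS,
    "workstation": DATA_VLANS,    # no approved softphone installed
}


def misconfigured_ports(ports):
    """Return names of ports whose VLAN membership breaks the segregation rule.

    `ports` is an iterable of (port_name, device_type, vlan_ids) tuples.
    """
    bad = []
    for name, device_type, vlan_ids in ports:
        allowed = ALLOWED_VLANS[device_type]
        # A port must carry at least one VLAN and none outside the allowed set.
        if not vlan_ids or not set(vlan_ids) <= allowed:
            bad.append(name)
    return bad


ports = [
    ("Gi0/1", "ip_phone", {120}),
    ("Gi0/2", "workstation", {200}),
    ("Gi0/3", "workstation", {200, 120}),  # data port leaking into a voice VLAN
]
```

Calling `misconfigured_ports(ports)` flags only the third port, whose membership spans both the data and the voice VLANs.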
7.2.4. Divergence in recommendations
From Table 10.6, which compares the recommendations of the four above-mentioned
reports, the following contradictory choices can be derived.
About IP phones with multi-port Ethernet switch

Many IP phones have an integrated multi-port Ethernet switch; this allows a computer to be plugged into the
IP phone and connected to the network through it. The IP phone and the computer can thus use the same
network port. This solution has been designed for an easier deployment of IP phones without the need for
additional cabling for phones. However, it presents many security problems: first, both the IP phone and the
computer will be aware of the network traffic meant for either device and second, an attacker who succeeds in
compromising the computer can have access to the IP phone and eavesdrop on calls or launch a DoS attack,
and vice versa.
Contradiction of choices: The NSA report recommends disabling data ports only to achieve the
highest security level; for a minimum or medium security level, data ports can be enabled, but only if the
phone’s multi-port switch supports VLANs. In contrast, DISA advises disabling data ports on IP phones
only if no computer is attached to them: it does not make any distinction between security levels and does not
recommend the disabling of data ports even for highest security. It only says that data ports should be disabled
if IP phones do not support VLANs (802.1Q Trunking). In opposition to both the DISA and NSA
reports, the BSI one suggests disabling all data ports whatever the security level.
About the subdivision of voice VLAN

The four reports agree on separating logically the voice and data traffic by creating two VLANs, one
data VLAN and one voice VLAN. In addition, apart from the NIST report which does not mention it, all
reports also agree on the need for a further subdivision of the voice VLAN.
Contradiction of choices: While the BSI restricts this subdivision to only 2 VLANs, the NSA
and DISA reports mention at least 5 VLANs. BSI defines a “consuming” VLAN on which there are
only IP phones and a “productive” VLAN which includes all VoIP servers, gateways and other servers.
Although NSA and DISA acknowledge that subdivision, they go further by subdividing the “productive”
VLAN. Although NSA and DISA both mention at least 5 VLANs for voice, their lists are not the
same and present a significant difference. The common VLANs are:
• VLAN of VoIP servers (IP-PBX, AAA servers, DHCP servers…)
• VLAN of IP phones
• VLAN of VoIP gateways
• VLAN of computers with softphones
As for the fifth VLAN, the DISA report mentions a VLAN of message servers like email, voice mail
and unified messaging, whereas the NSA one mentions an administrative VLAN (which is, by the way,
mentioned by DISA as an extra VLAN but not included in the 5 most important). This is an important
contradiction because NSA and DISA agree on creating a demilitarized zone (DMZ) in which all servers
which must be reached by both data and voice VLANs, i.e. messaging servers and convergent services, are
located. However, DISA recommends creating a specific VLAN for them, while NSA does not.
About softphones
The installation of softphones on computers located in the premises of the enterprise or on computers
connected to the enterprise VoIP from the outside world via a VPN may pose severe security problems. This
is due to the fact that softphones reside in the data network but require access to the voice network in order to
access the IP-PBX, place calls… The problem is that softphones are highly vulnerable to attacks due to the
greater number of possible entry points into the system (OS, resident applications, enabled services…) and also to
attacks targeting the data network itself (viruses, Trojan horses…).
Contradiction of choices: Curiously, BSI does not give any specific recommendation regarding the use of
softphones in private networks, except for the creation of a specific VLAN.
The NIST prohibits softphones for any security level, “if not practical”; in the same way, the DISA strongly
discourages the use of softphones but accepts it in “special situations”. In contrast, the NSA makes a
distinction by security level for the use of softphones: for the medium and highest security levels, softphones are
prohibited while for the minimum level, they are accepted on condition that a VLAN of softphones is created.
The contradiction between these recommendations is subtle: for NIST and DISA, exceptions for the use
of softphones are made if there is real necessity or practicality, while for NSA, they are made if there is no need for
high network protection. These are two different points of view.
About the configuration of IP phones at the terminal

The configuration of IP phones can be made in three main ways: configuration at the terminal,
configuration on a web interface through connection to the embedded web server of the IP phone or automatic
configuration through a central HTTPS server from which configuration files are downloaded.
Contradiction of choices: Apart from the NIST report which does not mention anything about this
topic, the three other reports disagree on which mode of configuration is best.
The BSI report recommends against configuration at the terminal while the DISA and NSA reports
recommend it, provided that it is password-protected. Besides, while BSI highly recommends the
configuration of IP phones through the web interface, protected by HTTPS or even certificates,
NSA is radical on this subject: deactivate the web interface! (DISA does not mention anything about it).
NSA recommends the deactivation of the Web interface if users can access necessary phone features through
the phone’s display and administrators can configure phone settings using downloaded configuration files. This
choice of NSA can be explained by the fact that web servers on phones raise several problems: first, web
servers are a new source of vulnerabilities and second, cryptographic methods should be used to authenticate
authorized users but IP phones do not always support them.
About automatic registration of IP phones with VoIP servers

IP phones may have the capability of registering themselves automatically with the IP-PBX. However,
this option is not always desirable.
Contradiction of choices: The NSA report recommends disabling automatic registration capability
after initial deployment of IP phones. The reason it gives is that with automatic registration, the IP-PBX (or
VoIP server) cannot verify whether the IP phone which registers is an actual phone or an attacker
masquerading as a phone. This recommendation has been made for all security levels.
The DISA report agrees that automatic registration of IP phones should
normally be disabled; however, it makes an exception for the deployment of large VoIP systems: in that case,
automatic registration is allowed but should be disabled within 5 days following initial system setup and/or
following any subsequent large redeployments or additions. The time margin thus differs between the two reports.
Table 7.2 – Points of divergence in the selected recommendation sets
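The last point of divergence, automatic registration, can be expressed as a simple rule. The Python sketch below is an illustration only: the function name and its parameters are hypothetical, the NSA rule is reduced to “allowed only while the initial deployment is still in progress”, and treating the DISA limit as strictly fewer than 5 days since setup is my reading of “within 5 days”.

```python
DISA_GRACE_DAYS = 5  # limit stated by DISA for large deployments


def auto_registration_allowed(report, days_since_setup,
                              large_deployment=False,
                              initial_deployment_done=True):
    """Tell whether automatic registration may still be enabled."""
    if report == "NSA":
        # NSA: disable once the initial deployment of IP phones is over.
        return not initial_deployment_done
    if report == "DISA":
        # DISA: normally disabled; large deployments get a short grace period.
        return large_deployment and days_since_setup < DISA_GRACE_DAYS
    raise ValueError(f"no rule modelled for report {report!r}")
```

For instance, three days after the setup of a large system the DISA rule still tolerates automatic registration, whereas the NSA rule does not once the initial deployment is finished.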
7.2.5. Suggestions and comments

Concerning the contradictory points, my suggestions would be:
• Disable the multi-port switch on IP phones for all users because it is an unnecessary
option and a source of additional vulnerabilities. Only administrators of the enterprise
VoIP system should be able to enable it, in order to plug into an IP phone a computer
serving as configuration monitor.
• Subdivide the voice VLAN into at least 6 VLANs: VoIP servers, IP phones, VoIP
gateways, softphones, management and convergent services. Servers providing
convergent services should be placed in a DMZ and have their own VLAN; this
would allow a geographic distribution of servers in the enterprise premises.
• Softphones should not be used in areas requiring a high security level, even if they
are needed. Softphones should not be used inside the enterprise but only by mobile
workers or home workers outside of the enterprise.
• If IP phones support strong authentication, the use of the web interface for their
configuration can be allowed; if not, only configuration at the terminal is allowed.
In any case, installing IP phones which support encryption mechanisms is
suggested.
• The automatic registration of IP phones during a VoIP deployment in a small
enterprise should be deactivated. In large deployments, a few days could be granted.
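The subdivision of the voice VLAN suggested in the second point of the list above could be recorded as a simple plan. This Python sketch is purely illustrative: the VLAN IDs and all names are hypothetical, and only the six roles and the placement of convergent services in a DMZ come from the suggestions.

```python
# Hypothetical VLAN IDs for the six-way subdivision of the voice VLAN.
VOICE_VLAN_PLAN = {
    "voip_servers": 110,
    "ip_phones": 120,
    "voip_gateways": 130,
    "softphones": 140,
    "management": 150,
    "convergent_services": 160,
}

# Roles whose servers sit in the DMZ in addition to having their own VLAN.
DMZ_ROLES = {"convergent_services"}


def placement(role):
    """Return (vlan_id, in_dmz) for a device role; unknown roles raise KeyError."""
    return VOICE_VLAN_PLAN[role], role in DMZ_ROLES
```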
This comparison of VoIP security recommendations by four major institutions is quite
interesting and may be useful for guiding enterprises. It has shown that there are several
approaches to mitigate security threats in VoIP networks and that different measures may be
taken to mitigate the same threats. However, enterprises should not be confused by the
divergence in recommendations; on the contrary, they should analyse the ideas behind the
recommendations, understand the divergence of points of view and make the right choice of
recommendation by always keeping in mind their own particular VoIP network architecture.
Recommendations are only hints which should be adapted and tailored to
specific VoIP architectures. Besides, enterprises should be cautious about the reliability
of certain security recommendations. Each recommendation should be closely analysed in order
to avoid introducing new (security) problems: for example, the BSI report recommends the use
of native VLAN as a measure against VLAN hopping but Cisco underlines in [121] that this
method should be avoided because it leads to loss of means of identification and loss of
classification of packets.
The comparison has also helped to build a full picture of areas to secure; some reports
have chosen not to focus on certain aspects whereas others have, so that the reports are
complementary. The most interesting recommendations which present some originality are the
recommendations on softphones by DISA and the ones on redundancy by BSI.
To conclude this comparison of recommendations, let’s see which major recommendations
should be applied in the two models of enterprises that have been studied throughout this report
and how it is possible to implement them.
7.3. Applying security concepts in two models of enterprises
7.3.1. Recommended secure VoIP architecture in a small enterprise

In the light of the security guidelines provided by the four above-mentioned major
institutions for the mitigation of security threats in a VoIP network, it is now time to see how to
apply some of the recommendations to the particular case of the model of a VoIP system in a
single-site enterprise which has been studied throughout this master thesis report and whose areas to
secure have been illustrated in Figure 5.3.
To secure the VoIP system of the small enterprise, as for any enterprise, many solutions
can be adopted and many different kinds of architecture can be envisaged, depending on the
security level requirements. Two architectures for secure VoIP systems will be presented below,
differing mainly in their layer-2 VLAN segregation. To reduce the probability
that a malicious attacker penetrates the network and affects the VoIP network, or even to reduce
the possibilities of eavesdropping by rendering broadcast domains smaller, it is necessary to
create several VLANs in the enterprise network to separate logically the VoIP and data
networks. VLANs are not in themselves a security mechanism but can be used to achieve some
security, such as controlling the flow of different types of traffic entering and leaving VLANs.
The number of VLANs to create depends on the level of security to achieve.
Let’s see a few suggestions of secure VoIP architecture.
Suggestion 1
Figure 7.1 illustrates a first suggestion of a secure VoIP deployment. The illustrated
architecture implements the recommendations for:
• The logical separation of the network into two virtual LANs (VLANs), namely a
voice VLAN and a data VLAN
• The filtering between the voice VLAN and the data VLAN through an application-
layer firewall
• A DHCP server and other servers (AAA server…) dedicated to the voice network
• The use of a VoIP-aware firewall at the perimeter of the enterprise network.
• The support of wireless VLANs to “place” wireless devices in the right VLAN
according to their nature
• The installation of a failure VoIP media/signalling gateway, in case the primary
connection to the Service Provider fails
In this architecture, it has been chosen to install a VoIP-aware firewall to augment the
traditional data firewall by adopting a so-called “DMZ configuration”. As explained in [106],
three different configurations are possible for the installation of a VoIP-aware firewall: in a
configuration “in series”, the VoIP-aware firewall is placed behind the data firewall; in a
configuration “in parallel”, the VoIP-aware firewall receives all the voice traffic already filtered by
the data firewall and in the “DMZ configuration”, an edge router routes VoIP traffic to the
VoIP-aware firewall and data traffic to the data firewall. The tunnelled traffic is routed to a VPN
termination device which can route incoming decapsulated traffic to the data or VoIP-aware
firewalls according to the nature of the traffic.
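The forwarding decisions of the “DMZ configuration” described above can be sketched in a few lines of Python. This is a simplified illustration, not the behaviour of any particular product: the device names, the packet representation as a dictionary and the set of protocols treated as VoIP traffic are assumptions.

```python
VOIP_PROTOCOLS = {"SIP", "RTP", "RTCP"}  # assumed set of VoIP protocols


def edge_router_next_hop(packet):
    """Pick the device to which the edge router hands an incoming packet."""
    if packet.get("tunnelled", False):
        # Tunnelled traffic goes to the VPN termination device first.
        return "vpn_termination"
    if packet["protocol"] in VOIP_PROTOCOLS:
        return "voip_aware_firewall"
    return "data_firewall"


def vpn_termination_next_hop(inner_packet):
    """Route decapsulated traffic according to its nature."""
    if inner_packet["protocol"] in VOIP_PROTOCOLS:
        return "voip_aware_firewall"
    return "data_firewall"
```

A tunnelled packet therefore passes through two decisions: the edge router sends it to the VPN termination device, which then forwards the decapsulated packet to the VoIP-aware firewall or to the data firewall.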
In a highly secured architecture, traffic flow between the voice and data VLANs should
not be possible. However, in order for enterprises to reap the benefits of services like
messaging services or other new convergent services which have nowadays become indispensable
in corporate environments and which contribute to the productivity of employees, traffic flow
between data and voice VLANs must be enabled. Examples of these traffic flows have been
represented in Table 7.3.
Traffic flow | Data VLAN | Voice VLAN
Data VLAN | √ | - calls from softphones residing on computers to call IP phones
          |   | - connections between the email server and the voicemail server
Voice VLAN | - connections between the voicemail server and the email server | √
Table 7.3 – Examples of traffic flows between data VLAN and voice VLAN
However, the traffic flow between both VLANs should be filtered and controlled at the
application-layer to provide a medium level of security; for a minimum level, a stateful layer 3&4
firewall could be configured to block all protocols except those required for VoIP. Moreover, the
use of an application-layer firewall is highly recommended for enterprises, which should aim not
merely at minimum security but rather at a medium level of security. In the architecture
suggested in Figure 7.1, on the one hand, the data VLAN includes all computers, with softphones
installed or not, as well as all data servers, and on the other hand, the voice VLAN includes all IP
phones as well as VoIP servers; both VLANs are separated by an application-layer firewall.
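A minimal sketch of the minimum-level policy described above, i.e. a default-deny filter between the data and voice VLANs that lets through only the protocols required for VoIP. The SIP ports 5060 and 5061 are the registered ones; the RTP port range and the function name are hypothetical, and the connection state tracking which a real stateful firewall performs is not modelled.

```python
# (transport protocol, destination port) pairs allowed between the VLANs.
ALLOWED_SIGNALLING = {("udp", 5060), ("tcp", 5060), ("tcp", 5061)}

# Hypothetical media port range; the actual range depends on the VoIP system.
RTP_PORTS = range(16384, 32768)


def permit(transport, dst_port):
    """Default deny: accept only SIP signalling and RTP media."""
    if (transport, dst_port) in ALLOWED_SIGNALLING:
        return True
    return transport == "udp" and dst_port in RTP_PORTS
```

With these rules, `permit("udp", 5060)` is accepted while a web connection such as `permit("tcp", 80)` is rejected. Such a filter only looks at layers 3 and 4; inspecting the SIP messages themselves requires the application-layer firewall recommended for the medium level.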
Figure 7.1 – Suggestion 1: Creation of a data and voice VLAN for a secure VoIP deployment in a small
enterprise
This simplistic way of securing the enterprise VoIP network provides some security but
does not resolve all problems. The following remain:
• Levelling of security level requirements: This architecture does not take into account the
different security level requirements of IP phones; all IP phones are placed in the
same broadcast domain, and it is easy for internal attackers to covertly eavesdrop on the calls
of colleagues. For example, a malicious employee of the HR department can
eavesdrop on all the calls of the Direction department. As mentioned in the Security
Requirements in 5.4.1.1, calls to and from the Direction department should have a high
security level whereas those of other departments should have a medium one. The suggested
architecture does not achieve this.
• Lack of security for VoIP servers: VoIP servers have been placed in the same broadcast
domain as IP phones. This leaves the door wide open for layer 2 attacks against VoIP
servers, such as man-in-the-middle attacks, eavesdropping on traffic between IP
phones and VoIP servers or even DoS attacks, all of them originating from an attacker
equipped with a laptop connected to the voice VLAN if there are no strong authentication
mechanisms for access control.
• Messaging and convergent services: Servers providing messaging and convergent services
fulfil roles on both data and voice VLANs and therefore should have access to both. This
is one of the reasons why traffic flow between both VLANs should be possible, provided
that it is filtered at the application level. To solve these issues with this kind of
servers, a sort of “Demilitarized Zone” should be created to host them, behind the
application-layer firewall separating voice and data VLANs.
• Softphones: Computers with softphones sit on the data VLAN; if a virus or other
malware attacks the voice application, the whole data network can also be affected.
For these reasons, a second, improved suggestion for a secure VoIP architecture is presented next.
Suggestion 2
To solve some of the above mentioned problems, some modifications to this architecture
could be made, as illustrated in Figure 7.2:
• Place all public servers in the Demilitarized Zone (DMZ); separate public VoIP
servers (IP-PBX core) and public data servers (Mail server, Web server…) into two
different VLANs. Public VoIP servers are thus accessible by incoming calls from the
outside of the enterprise as well as by incoming calls placed from the premises of the
company.
• Place servers fulfilling internal purposes (DHCP server, AAA server…) within the
Intranet part and separate data servers and VoIP servers into two separate VLANs.
• As for convergent services, place servers providing them in a dedicated “DMZ”,
accessible from data and voice VLANs (but not from the outside). DMZ in this sense
is used to designate a zone which is accessible by traffic coming from at least two
different VLANs.
• Keep the concept of voice and data VLANs, but to avoid levelling of security level
requirements, a special voice VLAN could be created for IP phones located in the
Direction department and another one for IP phones located in the IT and HR
departments. Similarly, two data VLANs could also be created for the same reasons.
• Since the Direction department should have a high security level, softphones should
not be installed and run on computers connected to the data VLAN of this
department (data VLAN 401). They could be accepted in the data VLAN of the IT
and HR departments, which makes it, however, vulnerable to VoIP attacks against
softphones and brings security at a minimum acceptable level.
Figure 7.2 – Suggestion 2: Installation of VoIP servers in the DMZ, in a VLAN separate from that of data servers, for a
secure VoIP deployment in a small enterprise
In this solution, softphones still constitute a serious security problem, even though they are
confined to a data VLAN. A third and last suggestion for securing this VoIP system in a small
enterprise follows.
Suggestion 3
A last suggestion, which is considered the best in comparison with the others, has been
represented in Figure 7.3.
Figure 7.3 – Suggestion 3: Creation of multiple DMZs for a secure VoIP deployment in a small enterprise
This architecture implements:

• The separation of data and voice networks
• The creation of a VLAN for VoIP servers separate from VLANs for IP phones
• The separation with VLANs of IP phones with different requirements for security
• The creation of two virtual DMZs: one for voice and one for data. The IP-PBX is placed
just after the VoIP-aware firewall (for incoming calls).
• The separation with VLANs of computers with softphones and computers without
softphones
• The control of traffic flows between VLANs through an application-layer firewall. The
traffic control between voice VLANs could be also achieved only with packet filtering
routers but the traffic control between voice and data VLANs can only be achieved
through application-layer firewalls.
• The creation of a “DMZ” hosting all messaging servers and other convergent services
which can be accessed by the VoIP servers, data servers, computers with or without
softphones and IP phones.
• DHCP, AAA, DNS … servers dedicated to VoIP
• The support of wireless VLANs to “place” wireless devices in the right VLAN according
to their nature. The use of wireless VoIP should be banned from areas of high security
level; however, due to the popularity of wireless VoIP, members of the Direction
department may require the use of VoWLAN in their office. In that case, the use of
WPA2 encryption is required and the creation of wireless VLANs is compulsory: Access
points should support wireless VLANs, i.e. they should be configured to map several
LANs. When multiple VLANs are enabled on a switch, several Service Set Identifiers (SSID),
i.e. several “network names” differentiating WLANs from other WLANs, should be
created.
Traffic flow                  Data         Data         Data servers  IP phones    IP phones    VoIP servers  Softphones  Messaging/unified
                              VLAN 1       VLAN 2       VLAN          VLAN 1       VLAN 2       VLAN          VLAN        servers DMZ
Data VLAN 1,                  Allowed      Allowed      Allowed       Not allowed  Not allowed  Not allowed   Allowed     Allowed (1)
Data VLAN 2,
Data servers VLAN
IP phones VLAN 1,             Not allowed  Not allowed  Not allowed   Allowed      Allowed      Allowed       Allowed     Allowed (1)
IP phones VLAN 2,
VoIP servers VLAN
Softphones VLAN               Allowed      Allowed      Allowed       Allowed      Allowed      Allowed       Allowed     Allowed
Messaging/unified             Allowed (1)  Allowed (1)  Allowed       Allowed (2)  Allowed (1)  Allowed       Allowed     Allowed
servers DMZ
(1) ex: Synchronisation of address books
(2) ex: Connection between email and voicemail servers
Table 7.4 – Regulation of traffic flow between VLANs and DMZs
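The regulation of Table 7.4 can be condensed into a small lookup function, for example as a
starting point for checking firewall rules against the intended policy. The Python sketch below is
only one possible encoding: the function name and the grouping of zones are choices made for this
illustration, and the footnotes (1) and (2) of the table are not represented.

```python
DATA_ZONES = ["Data VLAN 1", "Data VLAN 2", "Data servers VLAN"]
VOICE_ZONES = ["IP phones VLAN 1", "IP phones VLAN 2", "VoIP servers VLAN"]
SOFTPHONES = "Softphones VLAN"
MESSAGING_DMZ = "Messaging/unified servers DMZ"
ALL_ZONES = DATA_ZONES + VOICE_ZONES + [SOFTPHONES, MESSAGING_DMZ]


def is_flow_allowed(source: str, destination: str) -> bool:
    """Return True if Table 7.4 marks traffic between the two zones as allowed."""
    for zone in (source, destination):
        if zone not in ALL_ZONES:
            raise ValueError(f"unknown zone: {zone}")
    # The softphones VLAN and the messaging DMZ may exchange traffic with every zone.
    if SOFTPHONES in (source, destination) or MESSAGING_DMZ in (source, destination):
        return True
    # Otherwise traffic is allowed only when both zones are data zones
    # or both are voice zones; data and voice zones never talk directly.
    return (source in DATA_ZONES) == (destination in DATA_ZONES)


print(is_flow_allowed("Data VLAN 1", "IP phones VLAN 1"))        # False
print(is_flow_allowed("IP phones VLAN 2", "VoIP servers VLAN"))  # True
print(is_flow_allowed("Data VLAN 2", MESSAGING_DMZ))             # True
```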
To make this architecture in Figure 7.3 more secure, redundant VoIP servers should be
installed, but they should not be connected to the same switch as the others to avoid creating a
single-point of failure. These redundant VoIP servers have not been represented here. Besides, a
management VLAN has not been represented but should be created.
The traffic flow control table above, Table 7.4, shows which types of traffic are allowed to
enter a VLAN or a DMZ. It shows that, thanks to the creation of a DMZ for messaging and
unified services, no traffic is needed between any IP phones VLAN and any data VLAN (for
example, for the synchronization of PDAs and IP phones with the same address book), or
between VoIP servers and data servers (for example, for connections between voicemail
servers and email servers), and thus that no traffic flowing between these VLANs should be
allowed. For example, if an IP phone and a PDA are to synchronize with a personal address
book, the two devices should not synchronize directly but should access the address book from
a directory server located in the Convergent services DMZ (see Figure 7.4). This directory
server acts as a gateway between these devices and provides authentication and authorization.
Figure 7.4 – Synchronization with personal address book (source: [122])
In the architecture of Figure 7.3, it has been
chosen to allow softphones on computers but only in areas where there are no high requirements
for security level, i.e. in the IT and HR departments and to ban them from the Direction
department. A new VLAN has nevertheless been created for computers with softphones.
These computers could also have been placed in data VLAN 2; however, a separate VLAN is
needed to contain the traffic that is likely to present risks. If all computers had been placed
in data VLAN 2, malware infecting the softphone on one computer could gain control of that
computer and probably also affect other computers without softphones.
Area / Recommendation
Perimeter
  • Use a separate VoIP-aware firewall to augment the traditional data firewall; place the firewall in a
    “DMZ configuration”
VoIP servers
  • Make sure that the physical protection of VoIP servers is adequate (see more details in Table 10.6)
  • A DHCP server should be dedicated to VoIP to allocate IP addresses to IP phones
  • There should be a redundant IP-PBX and DHCP server; take care not to create a single point of
    failure
  • Public VoIP servers like IP-PBX, SIP servers… should be placed in a DMZ just after the external
    VoIP-aware firewall
  • Non-public VoIP servers like DHCP, DNS and AAA servers… should be placed in their own VLAN
  • Create a “demilitarized zone” (“DMZ”) for convergent services like voicemail server and unified
    messaging server
  • Non-public VoIP servers VLAN should be protected from other VLANs by application-level
    firewalls. Make sure that firewalls are also VLAN-aware firewalls (so that policies can be created
    relying on the 802.1Q tags that specify the membership of packets to a particular VLAN)
  • Deactivate in the IP-PBX the automatic registration of IP phones
  • Limit the number of failed registrations from IP phones and keep track of them (to detect
    registration hijacking for example)
  • Perform regular backups of VoIP servers
  • Replace any default passwords on VoIP servers or voice mail servers
IP phones
  • Ensure that all IP phones support VLAN technology
  • Ensure that IP phones support 802.1x authentication
  • Replace any default passwords on all IP phones
  • Place IP phones in their own VLAN and create different IP phone VLANs if some IP phones
    require a higher level of security
  • IP phones VLANs should be protected from other VLANs by application-level firewalls. Make
    sure that firewalls are also VLAN-aware firewalls
  • Ensure that IP phones are supplied with Power over Ethernet (PoE)
  • Multi-port Ethernet switches on IP phones should be disabled, without exception
  • Make sure, at least for specific users whose role is important in the enterprise, that their IP phones
    support encryption: TLS with the sips addressing scheme for the signalling and SRTP with the SDES
    key management protocol. Make sure that, if SDES is used, encryption of signalling is enabled
  • Ensure that IP phones use at least HTTP Digest authentication or another stronger authentication
    at the application layer
Softphones
  • If high security is required, prohibit softphones on any computer
  • Allow softphones only where there is no requirement for a high security level; in that case, place
    computers with softphones in their own separate VLAN
  • Softphones VLANs should be protected from other VLANs by application-level firewalls. Make
    sure that firewalls are also VLAN-aware firewalls
Wireless WLAN
  • The use of wireless VoIP should be banned from areas of high security level; however, if needed, it
    could be accepted only if WPA2 encryption is used and wireless VLANs are created
  • Configure access points to require 802.1x authentication from the wireless devices which connect to
    the voice network
Table 7.5 – Suggestion of a few important recommendations for the VoIP system of a small enterprise
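The recommendation to limit and keep track of failed registrations can be pictured with the
following Python sketch. It is a hypothetical illustration and not the behaviour of any particular
IP-PBX: the class name, the threshold of three failures and the decision to block a phone until an
administrator intervenes are all assumptions made for this example.

```python
from collections import defaultdict


class RegistrationMonitor:
    """Counts failed REGISTER attempts per phone and blocks repeat offenders."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = defaultdict(int)   # phone identifier -> consecutive failures
        self.blocked = set()               # phones awaiting administrator review

    def record(self, phone_id: str, success: bool) -> bool:
        """Record one attempt; return False once the phone is blocked."""
        if phone_id in self.blocked:
            return False
        if success:
            self.failures[phone_id] = 0    # a valid registration resets the counter
            return True
        self.failures[phone_id] += 1
        if self.failures[phone_id] >= self.max_failures:
            self.blocked.add(phone_id)     # possible registration hijacking attempt
            return False
        return True


monitor = RegistrationMonitor(max_failures=3)
results = [monitor.record("phone-17", success=False) for _ in range(3)]
print(results)                                   # [True, True, False]
print(monitor.record("phone-17", success=True))  # False
```

Keeping the per-phone counters also provides the audit trail that the recommendation asks for.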
To conclude on the secure VoIP deployment in the small enterprise, Table 7.5 above briefly
lists recommendations that should be applied in this particular case, classified according to the
areas to be secured shown in Figure 5.3, without claiming to be exhaustive. For network-level
security measures, the reader is referred to other documents such as the BSI or NSA reports.
7.3.2. Recommended secure VoIP architecture for a large enterprise

After these suggestions for securing the VoIP deployment of a small single-site
enterprise, it is interesting to focus on how to achieve the same aim for the VoIP
deployment in the model of a large multi-site enterprise presented in the previous chapter,
whose areas to be secured have been illustrated in Figure 5.4 and whose requirements have been
defined in Chapter 5. As already mentioned in the previous subchapter, many solutions can be
adopted and many different kinds of architecture can be envisaged, depending on the
security level requirements but not only on these. The implementation of solutions should be
tailored to the needs of the enterprise concerning security level but should also take into account
the design and the vulnerabilities of the underlying system, the services that should be supported,
the cost of investment, etc.
In Figure 7.5, a logical architecture of a secure VoIP deployment in the studied model has
been suggested. This architecture implements some measures similar to those applied to the small
enterprise’s VoIP system:
• The installation of a VoIP-aware firewall in “DMZ configuration” (see 7.3.1) at the
perimeter of the enterprise network
• The separation of data and voice networks in the headquarters as well as in the
interconnected branch office
• The creation of a DMZ hosting public VoIP servers like IP-PBX (SIP Proxy Server…)
separating these entities from IP phones and non-public VoIP servers
• The creation of a VLAN for non-public VoIP servers within the intranet
• The separation with VLANs of IP phones with different requirements for security
• The creation of a DMZ hosting public data servers (Web servers, Email servers…)
separating them from computers and non-public data servers
• The creation of a VLAN for non-public data servers within the intranet
• The separation with VLANs of computers with softphones and computers without
softphones
• The control of traffic flows between VLANs through an application-layer firewall. The
traffic control between voice VLANs could be also achieved only with packet filtering
routers but the traffic control between voice and data VLANs can only be achieved
through application-layer firewalls.
• The creation of a “DMZ” hosting all messaging servers and other convergent services
which can be accessed by the VoIP servers, data servers, computers with or without
softphones and IP phones.
• The support of wireless VLANs to “place” wireless devices in the right VLAN according
to their nature. The use of wireless VoIP should be banned from areas of high security
level; however, due to the popularity of wireless VoIP, it has been chosen to install
wireless access points even in these areas (e.g. Board of Directors offices). In that case, the
use of WPA2 encryption and the creation of wireless VLANs are required.
Some additional measures have been implemented:
• The creation of a management VLAN including all computers used by administrators to
manage remotely all VoIP devices of the network
• The deployment of a Network Intrusion Detection System (NIDS) by placing sensors
on all switches of the network to filter the network traffic and detect suspicious connections.
• The creation of a DMZ dedicated to VoIP media/signalling gateways
• The protection of voice traffic to and from WAN links (in this model, they are dedicated
leased lines) connecting the headquarters to private networks of customers by filtering
voice traffic with application-level firewalls collocated with the WAN access routers.
The separation of voice and data traffic should be preserved.
• The creation of Virtual Private Networks (VPNs) over the Internet, connecting the
headquarters to the branch office or to mobile workers.
In Figure 7.5, redundancy, which is fundamental in large enterprises, has not been represented
for the sake of clarity. However, this measure should not be overlooked because
it contributes to the availability of the VoIP network.
Table 7.6 lists the additional recommendations to be implemented in the large enterprise
network, in addition to those already presented for a small enterprise in the previous subchapter,
which still apply in this model.
Area / Recommendation
Perimeter
  • Ensure that all VoIP traffic circulating between the different enterprise sites over public
    networks like the Internet has been encrypted. For that, ensure that voice traffic is tunnelled
    through the VPNs interconnecting sites.
  • Ensure that the separation of voice and data traffic is maintained over VPNs
  • Place VoIP-aware firewall at the enterprise network perimeter to supplement the
    traditional data firewall. Use preferably the “DMZ configuration”.
Internal enterprise network
  • Develop a Network Intrusion Detection System in the enterprise network
Interface with circuit-switched networks
  • Creation of a VLAN where to place VoIP media/signalling gateway
  • VoIP media/signalling gateways should require authentication before completing calls
  • The only traffic flow which should be allowed to and from the VoIP media/signalling
    gateway is between the VoIP gateway VLAN and the VoIP servers VLAN
  • VoIP media/signalling gateways must validate and terminate all PSTN/ISDN
    signalling at the gateway
WAN access
  • Install VoIP-aware firewalls on the WAN access routers
  • Ensure that the separation of data and voice traffic is maintained on the dedicated leased
    lines
Table 7.6 – Suggestion of a few important recommendations for the VoIP system of a large enterprise
Figure 7.5 – Suggestion for a secure VoIP deployment in a large multi-site enterprise
To introduce the security measures that can be taken to protect an enterprise
VoIP system, the current chapter has first presented a selection of sets of recommendations
published by four major international institutions, namely the American National Security Agency
(NSA), National Institute of Standards and Technology (NIST) and Defense Information Systems Agency
(DISA), and the German Federal Office for Security in Information Technology (BSI). This selection
is intended to help enterprises discern the best sets of recommendations among a
multitude. A comparison of the recommendations, aiming to be exhaustive, has been performed
and commented on. One of the difficulties in establishing a comparison table (see Annex 4) was to
deal with the different security approaches, as explained earlier, but also to cope with the different
technical jargon used by the institutions. Besides, it has not always been easy to
determine the security level corresponding to every recommendation or to determine whether
security recommendations from different sets really matched and served the same purpose. From
this comparison table, several common points and points of divergence have been derived, listed,
described and commented on. Common recommendations should be considered by enterprises as
highly required for the security of their VoIP system. The main points of contradiction between
recommendations concerned IP phones with multi-port Ethernet switches, the subdivision
of the voice VLAN, the use of softphones, the configuration of IP phones at the terminal, and
the automatic registration of IP phones with VoIP servers. Therefore, these should be regarded
as crucial points that enterprises should not overlook but seriously consider before
deploying VoIP. The comparison table has shown that there is a great number of measures that
can be taken for VoIP security, but this does not imply that all of them should be applied. The
choice of the security measures to apply should depend on the results obtained from the threat and
risk analyses performed specifically for a given enterprise, the enterprise’s unique network
configuration and architecture, the enterprise’s objectives, etc. Thus, the security choices for VoIP
deployment should be tailored to the needs of the enterprise.
In the second part of this chapter, a few suggestions of secure architecture have been
presented for the VoIP deployment in the model of a small enterprise and in that of a large one.
These are not meant to be THE true secure solutions for these models but to show that there
is a plurality of ways to apply security recommendations and that all security implementations
should always be made with security requirements in mind. Besides, the creation of a table such as
Table 7.4, summarizing the traffic flows which are allowed to and from the VoIP network, or
even to and from sub-networks of the VoIP network, like the VoIP servers VLAN, IP phones
VLAN, VoIP gateways VLAN, softphones VLAN and others, is a good method contributing to
the design of a secure VoIP architecture.
8. Security concepts for SIP mobility in hosted VoIP
deployments
8.1. Setting the problem of hosted VoIP and mobility support

In order to reap the benefits of a VoIP deployment while remaining focused on their
business mission, small enterprises increasingly resort to outsourced solutions like the hosted
IP-PBX which, as detailed in Chapter 3, refers to the installation and management of an
IP-PBX on the premises of a VoIP Service Provider. The success of this VoIP solution among
small enterprises is due to the fact that these companies have limited personnel and resources
and are reluctant to develop costly in-house expertise specific to VoIP, which is not their
main mission. This explains the competition between IP-PBX vendors and hosted IP-PBX
providers in the enterprise VoIP market, in particular for small enterprises. However, more and
more vendors associate with VoIP Service Providers in order to provide better interoperability
and management between sites.
In a hosted IP-PBX, all SIP signalling has to pass through the IP-PBX located in the site
of the VoIP Service Provider. So, when an employee of a small enterprise wants to call a
colleague in the same enterprise, the SIP call signalling has to be routed out of the enterprise
network and then into this network to reach the destination SIP User Agent; in contrast, the
media stream will be routed within the enterprise network. Thus, all signalling streams travelling
between the enterprise and its VoIP Service Provider must be secured in some way; otherwise,
an external attacker could capture the traffic and learn the calling parties
of all calls, even internal ones, which could seriously jeopardize any company.
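The difference between the signalling path and the media path of an internal call can be made
explicit with a short Python sketch. It is a deliberately simplified model: the network names and
function names are invented for this illustration, and the media is assumed to flow directly
between the two endpoints without any relay.

```python
ENTERPRISE = "enterprise network"
PROVIDER = "VoIP Service Provider network"


def call_paths(caller_network: str, callee_network: str):
    """Return (signalling path, media path) as lists of traversed networks."""
    # With a hosted IP-PBX, every SIP message is relayed by the provider.
    signalling = [caller_network, PROVIDER, callee_network]
    # Assumption: media flows directly between the two endpoints.
    if caller_network == callee_network:
        media = [caller_network]
    else:
        media = [caller_network, callee_network]
    return signalling, media


def leaves_enterprise(path) -> bool:
    """True if any part of the path lies outside the enterprise network."""
    return any(network != ENTERPRISE for network in path)


signalling, media = call_paths(ENTERPRISE, ENTERPRISE)
print(leaves_enterprise(signalling), leaves_enterprise(media))  # True False
```

Even for a call between two colleagues, the signalling is therefore exposed outside the
enterprise while the media is not, which is why the signalling link needs protection.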
The issue that will be studied in this last chapter is that of the adoption of a hosted
IP-PBX solution in combination with the support of external mobile workers, as illustrated in
Figure 8.1.
Figure 8.1 – Small enterprise with hosted IP-PBX and supporting external mobile workers
Indeed, a few VoIP Service Providers have started to offer enterprises services
like hosting an IP-PBX for them in parallel with supporting external mobile workers. This
offering responds to the increasing need of some small enterprises whose mobile workers
regularly, and not only occasionally, travel outside of their premises for professional purposes.
However, this service is quite new and little documented, in particular regarding its security
aspects. Generally, VoIP Service Providers do not publish their methods of securing hosted IP-PBX
architectures with support of mobility, and it is incumbent on enterprises seeking this service to
ask VoIP Service Providers the appropriate questions. Some questions to Service Providers
about the security of their hosted services have been listed by M. Collier in [82], but without
taking into account the security issues arising with the support of mobility. This is why M.
Collier’s questions are presented in Table 8.1, followed in Table 8.2 by my suggestions for
the crucial points to clear up about security related to mobile workers.
• What steps are taken to protect IP PBXs, application servers, and VoIP media/signalling
gateways? What can you expect if these servers are attacked?
• Since IP phones/softphones must be visible to the network, what steps are taken to
prevent them from being attacked?
• How is NAT performed? Is it performed by a device on the enterprise site or at the Service
Provider, via far-end NAT traversal?
• Assuming that SIP is used, what security is provided against attacks such as Denial of
Service (DoS), registration hijacking, and other SIP-specific attacks?
• What process is used to patch IP phones and other components when vulnerabilities are
identified?
• Are strong authentication and encryption technologies, such as Transport Layer Security
(TLS) and the Secure Real-time Transport Protocol (SRTP), available? If not, are they planned
and supported on the IP phones being deployed now?
• How secure are all components against an attack that originates from within the enterprise
network? What configuration is recommended for enterprise switches, routers, firewalls,
and so forth?
• Are more secure configurations or components available?
Table 8.1 – Questions destined to VoIP Service Providers about the security of their hosted IP-PBX services
(according to [82])
• Which security mechanisms are used to ensure that mobile workers have
secure access to the enterprise VoIP service?
• What processes are used to protect the enterprise network in case the laptop of a
mobile worker has been stolen?
• How is it possible to control where mobile workers are: within the enterprise premises or
on the road?
• Which authentication and authorisation mechanisms are used to properly identify
mobile workers?
• What is the survivability plan in case the Internet connectivity goes down and the
VoIP service is no longer available? What about mobile workers?
• Which types of Virtual Private Networks (VPNs) are used? Who has to manage them?
• Which quality of service (QoS) for voice will be ensured to the mobile workers?
Table 8.2 – Suggestion of questions destined to VoIP Service Providers about the security of their hosted
IP-PBX services combined with support of mobile workers
These questions are of high value for enterprises wishing to adopt a hosted IP-PBX
solution when choosing a VoIP Service Provider, because they allow them to get
the right information about the security aspects of the offered hosted solutions. Enterprises must
make sure that the level of security that VoIP Service Providers offer is adapted to their business;
for example, enterprises in financial services, healthcare, banking… have high security
expectations. In hosted VoIP solutions, it is the responsibility of the VoIP Service Provider
to install, configure, manage and secure the VoIP architecture and infrastructure. Therefore, the
only safeguard available to customer enterprises is to be aware of the potential security risks of
hosted IP-PBX deployments and to be able to understand the security approach of several VoIP
Service Providers before deciding which one to cooperate with. Enterprises are
certainly not able to exert control over the VoIP Service Provider’s selection of components,
architecture and configuration, or to change them, but they are at least free to knowingly choose
their VoIP Service Provider. Making the right choice is crucial because it is very difficult for an
enterprise to change VoIP Service Provider and hosted solution once VoIP has already been
deployed [29].
In the next sub-chapter, two scenarios of hosted IP-PBX solution with mobility support
will be modelled and designed in an attempt to answer the questions presented in
Table 8.2. Then, these two scenarios will be compared in 8.3.
8.2. Designing and modelling two scenarios of secure hosted IP-PBX
solutions with mobility support
8.2.1. Introduction
A small enterprise decides to opt for a hosted IP-PBX solution and wishes to support
mobile workers outside of its premises. Figure 8.2 illustrates a suggestion for a hosted IP-PBX
solution that could be adopted. This figure does not represent how to support mobile workers,
since two scenarios for mobility support will be later presented, in 8.2.2 and 8.2.3.
Figure 8.2 – Suggestion for a hosted VoIP architecture adopted by a small enterprise with mobile workers
The security issues in hosted IP-PBX solutions are different from those in in-house IP-PBX
deployments. The internal attacks can be similar in both solutions but there are additional
external attacks in hosted IP-PBX solutions. This is due to the fact that all SIP signalling has to
be routed out of the enterprise network, even the signalling of internal calls; this could give
external attackers the opportunity to eavesdrop on the signalling of internal communications,
but not necessarily on the media streams.
In Figure 8.2, some security measures for hosted IP-PBX solutions have been represented.
These are the following:
Dedicated path between the enterprise network and the VoIP Service Provider offering
the hosted service. This dedicated path could be preferably a layer-2 MPLS VPN
guaranteeing a high bandwidth and thus a high QoS between the two networks. He
enterprise customer could benefit from a Service Level Agreement (SLA) ensured by the
VoIP Service Provider for its voice traffic. The choice of layer 2 MPLS VPN could also
be justified by the fact that the route between both networks is static and thus this type
of VPN is easier to manage for enterprise customers as well as for VoIP Service
Providers. However, layer 2 MPLS VPNs do not ensure confidentiality since there is no
encryption. This is why, for high security, the traffic travelling in these VPNs could be
encrypted at layer 3 with IPsec.
Besides MPLS VPNs, it is also possible to create IPsec VPNs, providing encryption,
between the VoIP Service Provider and the enterprise customer. Nevertheless, this has
the disadvantage to require the VPN-enhancement of the Customer Premise Equipment
(CPE) in order to support IPsec, but also to introduce additional large overhead to
packets.
In the following scenarios, MPLS VPNs will be assumed for the connection between
the enterprise customer and the VoIP Service Provider.
Two external public IP addresses are required for the enterprise: one public address
obtained from the VoIP Service Provider for a secure link dedicated to VoIP traffic
travelling between the VoIP Service Provider and the enterprise, and one public address
obtained from an Internet Service Provider for the rest of communications from the
Internet. The Enterprise edge router should then have at least two public Network
Interface Controller (NIC) cards.
The installation of a survivability IP-PBX and a failure VoIP media/signalling
gateway connecting directly to the PSTN/ISDN networks. These elements are the
response brought to one of the questions of Table 8.2 about the survivability plan in case
that the dedicated VoIP link to the VoIP Service Provider falls down. In that case, all
calls would be processed by the survivable IP-PBX and external calls would be routed
out of the enterprise through the VoIP media/signalling gateway. One of the
vulnerabilities of hosted IP-PBX solutions is the strong dependence of the enterprise
VoIP service on the Internet connectivity since call management and processing are
performed outside of the enterprise network.
One or more VPN gateways installed at the VoIP Service Provider site allow authorized
and authenticated people form the Internet to connect to the managed network of the
VoIP Service Provider. That way, this could be possible for remote workers to have
access to the enterprise VoIP service.
A VoIP-aware firewall/NAT router is located at the edge of the VoIP Service Provider’s
network. It is used to filter SIP calls destined to the enterprise customer. It is assumed
that the filtering rules are the same for all enterprises which benefit from the hosted
IP-PBX service of the VoIP Provider. However, the filtering rules must not be too
restrictive, presumably because of legal principles governing telecommunications.
Unified Messaging and voice mail servers are located on the premises of the VoIP
Service Provider; that way, the customer enterprise does not have to handle the
maintenance and updates of these servers, since this is the responsibility of the VoIP
Service Provider.
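The survivability element listed above can be illustrated with a minimal sketch. It assumes a hypothetical health check on the dedicated VoIP link; the element names are illustrative labels, not product names, and the logic only mirrors the fallback behaviour described in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinkStatus:
    """Result of a (hypothetical) health check on the dedicated VoIP link."""
    voip_link_up: bool


def select_call_path(status: LinkStatus, external_call: bool) -> list[str]:
    """Return the elements that handle a call, in order.

    The element names are illustrative labels only.
    """
    if status.voip_link_up:
        # Normal operation: call control is performed by the provider.
        return ["enterprise-edge-router", "hosted-ip-pbx"]
    # Dedicated link down: the local survivable IP-PBX processes all calls.
    path = ["survivable-ip-pbx"]
    if external_call:
        # External calls leave the enterprise through the local gateway.
        path.append("pstn-isdn-gateway")
    return path


print(select_call_path(LinkStatus(voip_link_up=False), external_call=True))
```

In a real deployment the decision would be taken by the edge router or the IP phones themselves; the sketch only makes the two operating modes explicit.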
Even though it is not represented in Figure 8.2, the VoIP Service Provider could play the
role of a VPN Provider as well, managing the interconnection via VoIP VPNs of the enterprise
with its remote sites in addition to managing the VoIP service.
After these general considerations about the security of hosted IP-PBX solutions, let us see
how the enterprise can support mobile workers and how security can be ensured.
Two suggestions are made in the form of two scenarios.
8.2.2. Scenario 1: Mobile workers connected via VPNs to the VoIP Service Provider network
In the first scenario, illustrated in Figure 8.3, the enterprise can take advantage of the VPN
services of the VoIP Service Provider in order to support mobility for its external workers.
In this scenario, external mobile workers connect via a VPN to the VPN gateway of
the VoIP Service Provider’s network and thereby gain access to the VoIP services of
their enterprise. This VPN is assumed to be an IPsec VPN, since IPsec VPNs are well
suited to remote access to the enterprise by teleworkers or mobile workers [33].
When a mobile worker wants to call an employee located on the premises of the enterprise,
he sends a SIP INVITE message which travels through a VoIP VPN over the untrusted Internet.
As illustrated in Figure 8.3, the SIP INVITE request arrives at the VPN Gateway located at the
edge of the VoIP Provider’s network and is forwarded to the hosted IP-PBX. The latter
processes this message and sends it to the enterprise network over the L2 MPLS VPN existing
between the VoIP Provider and the enterprise. The SIP INVITE message is therefore sent to the
public interface of the enterprise’s edge router dedicated to VoIP. There is no need for a
VoIP-aware firewall filtering SIP messages on this interface, since there is a relationship of trust
between the enterprise and the VoIP Service Provider and the VoIP traffic on this link has
already been filtered by the VoIP Provider. It would be difficult and dangerous to doubt the
honesty and professionalism of large VoIP Service Providers and to suspect them of launching
attacks against the VoIP systems of their customers; however, in the case of unknown or new
VoIP Service Providers, it is important to be vigilant and to obtain more information about their
background and security mechanisms. The main attack that VoIP Service Providers could
perform against their enterprise clients would be to intercept information and divulge it,
unbeknownst to the enterprises, to third parties (whether or not these third parties have
any governmental authority; lawful interception is allowed in some countries).
Figure 8.4, which represents some call flows for scenario 1, shows
how a user of the PSTN network can reach a mobile worker without knowing whether this
worker is in his office or on the road. The signalling of the TDM call arrives at the VoIP Service
Provider’s PSTN media/signalling gateway and is translated into a SIP INVITE message
forwarded to the hosted IP-PBX. The hosted IP-PBX processes this message and rewrites
the destination SIP URI, placing in its domain part the current IP
address or domain name of the mobile worker. Then, the hosted IP-PBX forwards this SIP
message to the VPN gateway, and the message travels through the VoIP VPN to reach the
mobile worker.
How does the IP-PBX know where to forward the SIP INVITE messages destined to mobile
workers? This question was addressed in paragraph 4.3.7 about SIP-based mobility, and
there are several ways for the IP-PBX to proceed depending on how it has been configured.
Figure 8.3 – Scenario 1: Mobile workers connected to the VoIP Service Provider via a VPN
For example, the IP-PBX might not know where the mobile worker currently is, so it “forks” any
SIP INVITE message destined to the mobile worker to all the SIP URIs which are registered
with the SIP Registrar. The mobile worker could either be in the enterprise (in that case the
registered SIP URI has the same domain as the enterprise) or outside of the enterprise,
connected to the VoIP Provider network via a VPN (in that case, the registered SIP URI has
the same domain as the VoIP Service Provider). In Figure 8.4, an Internet
user places a call from his softphone to the mobile worker. The SIP INVITE message
traverses the Internet and, after being filtered by the VoIP-aware firewall installed on the edge
router of the VoIP Service Provider’s network, reaches the hosted IP-PBX if it is accepted. The
latter can be configured so that, not knowing whether the mobile worker is currently on the
road or at his office, it forwards the SIP INVITE message to two locations, according to the SIP
URIs registered in the SIP Registrar (collocated with the IP-PBX in this scenario). This forking is
represented in the figure with yellow arrows.
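The forking behaviour described above can be sketched as follows. This is a toy model, not a SIP stack: the Registrar class, the fork_invite function and the example.com-style URIs are assumptions introduced for illustration, and a real IP-PBX would also handle binding expiry, responses and cancellation of the losing branches.

```python
class Registrar:
    """Toy location service mapping an address-of-record to contact URIs."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact):
        # A user may hold several bindings at once (office phone, softphone).
        self._bindings.setdefault(aor, set()).add(contact)

    def contacts(self, aor):
        return sorted(self._bindings.get(aor, set()))


def fork_invite(registrar, aor):
    """Return one (method, target URI) pair per registered contact."""
    return [("INVITE", contact) for contact in registrar.contacts(aor)]


# Hypothetical bindings: one in the enterprise domain, one in the
# provider domain (worker connected through the provider's VPN gateway).
registrar = Registrar()
aor = "sip:worker@enterprise.example"
registrar.register(aor, "sip:worker@phone17.enterprise.example")
registrar.register(aor, "sip:worker@vpn.provider.example")

for method, target in fork_invite(registrar, aor):
    print(method, target)
```

With two registered bindings the INVITE is sent to both locations, which corresponds to the two forked branches mentioned for Figure 8.4.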
In any case, for the mobile worker to be always reachable while on the road, he has to
register regularly with the hosted SIP Registrar dedicated to the enterprise, at least each time he
changes domain (see types of mobility in 4.3.7).
Let us now look at a second scenario for mobility support with a hosted IP-PBX solution.
Figure 8.4 – Establishment of calls in scenario 1
8.2.3. Scenario 2: Mobile workers connected via VPNs to the enterprise network
In this second scenario, illustrated in Figure 8.5, the enterprise does not use the
VPN services of its VoIP Service Provider but those of its Internet Service Provider.
Mobile workers connect directly to the enterprise via a VPN which is used to transport data as
well as voice. The VPN terminates at the VPN Gateway of the enterprise.
Figure 8.5 – Scenario 2: Mobile workers connected to the enterprise network via VPNs
As shown in this figure, when a mobile worker wants to call an employee located on the
premises of the enterprise, he sends a SIP INVITE message which travels through a VPN over
the public Internet. The SIP INVITE message then reaches the enterprise VPN Gateway. The
latter, seeing that this is a SIP message, forwards it to the enterprise Edge Router, which routes it
out over its VoIP-dedicated interface to the VoIP Service Provider’s network. The SIP INVITE
message is forwarded to the hosted IP-PBX, which processes it and then forwards it to the right
destination.
What happens when calls are destined to the external mobile worker? Since the mobile
worker connects directly to the enterprise network via a VPN, he is considered by the enterprise
as an internal user located on the premises of the enterprise. Moreover, from outside the
enterprise network, it is impossible to determine whether the mobile worker is currently in his office
or on the road. The connection of the mobile worker via a VPN allows him
to be reached from the enterprise at the private address he usually has when he is connected to
the enterprise network.
It is nevertheless essential for the enterprise network to know whether a
mobile worker is directly connected to the enterprise network or connected remotely via a VPN.
Therefore, there should be an entity in the enterprise network which keeps track of
the mobile workers who are currently outside of the enterprise. In this scenario, this entity is
assumed to be the VPN Gateway. The VPN Gateway keeps track in a table of the associations
between the internal private IP addresses of the external mobile workers and their current public IP
addresses. The internal IP addresses serve to route traffic to the mobile worker as if he were
on the enterprise premises, while the public IP addresses are used for the creation of the VPN.
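The association table kept by the VPN Gateway can be sketched as a simple mapping. The class and method names below are hypothetical, and the addresses are taken from private and documentation ranges purely for illustration; the sketch only shows how the presence of an entry distinguishes a remote worker from one on the premises.

```python
class VpnGatewayTable:
    """Associations between private (internal) and current public IP addresses."""

    def __init__(self):
        self._public_by_private = {}

    def tunnel_up(self, private_ip, public_ip):
        # Called when a mobile worker's VPN is established.
        self._public_by_private[private_ip] = public_ip

    def tunnel_down(self, private_ip):
        # Called when the VPN is torn down; the worker is no longer remote.
        self._public_by_private.pop(private_ip, None)

    def is_remote(self, private_ip):
        return private_ip in self._public_by_private

    def next_hop(self, private_ip):
        """Return ("vpn", public_ip) for remote workers, ("lan", private_ip) otherwise."""
        if self.is_remote(private_ip):
            return ("vpn", self._public_by_private[private_ip])
        return ("lan", private_ip)


table = VpnGatewayTable()
table.tunnel_up("10.0.0.42", "203.0.113.7")   # worker currently on the road
print(table.next_hop("10.0.0.42"))
print(table.next_hop("10.0.0.43"))            # no entry: worker is on the premises
```

A message destined to a private address with an entry in the table is sent through the corresponding VPN; any other private address is delivered on the local network.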
Figure 8.6 – Establishment of calls between external users and a mobile worker in scenario 2
With this in place, it is interesting to see how calls can be established
between external users and an external mobile worker.
Since the mobile worker remains in the same domain as the enterprise even when remote, he
can keep the same SIP URI as if he were at his office. This is why, when the mobile worker’s SIP
Client registers with the hosted IP-PBX, the registered SIP URI remains the same, with the
same domain part, independently of the mobile worker’s location. As a result, the hosted
IP-PBX cannot be aware of the current location of the mobile worker; for the hosted IP-PBX,
the mobile worker is directly connected to the enterprise network.
As illustrated in Figure 8.6, a user connected to the PSTN network or to the public
Internet can call an external mobile worker without knowing where he is at that moment. When a
user on the Internet wants to call a mobile worker, his SIP Client sends a SIP INVITE message to
the VoIP Service Provider network; if this message passes successfully through the VoIP-aware
firewall, it is forwarded to the hosted IP-PBX. The hosted IP-PBX processes the message and
sends it to the registered SIP URI of the mobile worker. The SIP URI contains
the domain IP address or name of the enterprise, so the message is forwarded to the
VoIP-dedicated interface of the enterprise edge router. The message is then received by the VPN Gateway,
which finds in its table an association between the private IP address to which this message
is destined and a public IP address. This means that the mobile worker is outside of the
enterprise. The VPN Gateway then forwards the SIP INVITE message through the VPN to the
mobile worker. That way, the call can be established.
What happens now if a call has to be established between two external mobile workers?
As shown in Figure 8.7, two VPNs have to be created, one to each of the mobile workers. External
Mobile Worker 1 wants to call External Mobile Worker 2. The SIP Client on his laptop sends
a SIP INVITE request to the VPN Gateway; on receipt, the VPN Gateway cannot simply
send the SIP INVITE message directly to External Mobile Worker 2, in particular for call
management and accounting/billing reasons. The message must first be forwarded to the hosted
IP-PBX and then sent back to the VPN Gateway. The latter will then forward the SIP message
through a VPN to the called mobile worker.
Figure 8.7 – Establishment of a call between two mobile workers in scenario 2
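To make the detour through the hosted IP-PBX concrete, the signalling path between two external mobile workers can be written down as a list of logical elements for each scenario. The lists below are my reading of the call flows described in Sections 8.2.2 and 8.2.3; the element names are illustrative, and a hop here is a logical forwarding step, not an IP hop or a latency measurement.

```python
# Scenario 1: both workers are attached to the provider's VPN gateway.
SCENARIO_1 = [
    "mobile-worker-1",
    "provider-vpn-gateway",
    "hosted-ip-pbx",
    "provider-vpn-gateway",
    "mobile-worker-2",
]

# Scenario 2: both workers are attached to the enterprise VPN gateway,
# and the INVITE must still visit the hosted IP-PBX at the provider.
SCENARIO_2 = [
    "mobile-worker-1",
    "enterprise-vpn-gateway",
    "enterprise-edge-router",
    "hosted-ip-pbx",
    "enterprise-edge-router",
    "enterprise-vpn-gateway",
    "mobile-worker-2",
]


def hop_count(path):
    """Number of logical forwarding steps along a signalling path."""
    return len(path) - 1


for name, path in (("scenario 1", SCENARIO_1), ("scenario 2", SCENARIO_2)):
    print(name, hop_count(path))
```

The count only indicates that the scenario 2 path traverses more elements; how much this matters in practice depends on the links involved.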
Now that these two scenarios of hosted IP-PBX VoIP deployment with support of mobile
workers have been presented, it is worth comparing them in order to
determine which solution is best suited to small enterprises from the security point of
view.
8.3. Comparing scenarios from the security point of view

Although both achieve the same goal of providing secure VoIP to external mobile workers in
hosted IP-PBX solutions, the two scenarios presented above differ in some respects, in
particular in their security level and in the influence of security choices on other factors.
The comparison focuses on four points:
The perception of mobile workers by the hosted IP-PBX: In scenario 1, external mobile
workers are seen by the hosted IP-PBX as external, whereas in scenario 2, they are
perceived as internal workers. Since in scenario 2 the IP-PBX cannot distinguish
between enterprise users depending on their location, the VoIP services offered to
external mobile workers are the same as those offered to users located within the enterprise.
This presents a risk in so far as, if the laptop of a mobile worker is stolen, the thief
can easily gain access to the complete range of VoIP services offered by the hosted
IP-PBX to enterprise users at their office, which gives the attacker considerable power. In
contrast, in scenario 1, since the hosted IP-PBX is aware of the location of the mobile
worker, it can restrict his access to VoIP services if he is outside of the enterprise.
From that point of view, scenario 1 seems much better because it reduces the impact of
remote attacks against the enterprise VoIP system if a laptop is stolen.
A remedy could of course be found in scenario 2 to mitigate this problem: for
example, the enterprise itself could define privilege classes for its workers according
to their location. This could probably be achieved by having the VPN Gateway, which
keeps records of the external mobile workers, make the IP-PBX aware of the
privileges attributed by the enterprise.
The resolution of the laptop theft security issue: The biggest security issue with the support
of mobility outside of the enterprise premises is the theft of a mobile worker’s
device, such as a laptop. This issue can be disastrous for the enterprise in so far as the
enterprise is vulnerable to any attack launched from this laptop, which is considered as
an internal user. When theft occurs, one solution is to block access to the VoIP
service from the stolen device. In scenario 2, the enterprise personnel themselves have to
configure the VPN Gateway to block the stolen device from remote access via VPN to
the enterprise. For example, the privilege class for this mobile device could be set to
“banned”. In scenario 1, by contrast, it is the responsibility of the enterprise, as soon as
it is aware of the theft, to inform the VoIP Service Provider so that it blocks any remote access
from the stolen device. From the enterprise’s perspective, scenario 1 should be easier.
Quality of Service: It is difficult to estimate the impact of the security measures taken in
both scenarios on quality of service. However, for calls between external
mobile workers, scenario 1 seems slightly better in that
signalling only has to travel from the caller to the VoIP Service Provider’s network to reach
the hosted IP-PBX and then directly from this network to the called party. In contrast,
in scenario 2, signalling between two external mobile workers has to travel through the
enterprise’s network, which adds signalling traffic and latency. In that case, the quality of
VoIP calls is likely to be (slightly) worse than in scenario 1.
DoS against external mobile workers: In both scenarios, it has been assumed that the
VPNs connecting external mobile workers to the VoIP Provider’s network (scenario 1)
or to the enterprise (scenario 2) are IPsec VPNs. However, this raises the
following issue: if mobile workers are unable to build a VPN channel, then the
enterprise VoIP service is denied to them. Unfortunately, some visited networks have
security policies that make it impossible for visiting hosts to use IPsec
for the creation of VPNs. This can be explained by the fact that some Internet Service
Providers for DSL and cable services police traffic for residential class
service, so protocols such as IPsec might be blocked unless the customer subscribes to
business class service. In that case, mobile workers cannot benefit from the VoIP
services of their enterprise. Other types of VPNs could be considered, in particular
SSL VPNs, if they really prove to improve voice quality as claimed in [60].
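The privilege classes mentioned in the first two points can be sketched as follows. Only the “banned” class comes from the text; the other class names, the service names and the choice of which services are withheld from remote workers are assumptions made for illustration.

```python
from enum import Enum


class PrivilegeClass(Enum):
    INTERNAL = "internal"   # worker on the enterprise premises
    REMOTE = "remote"       # worker connected through a VPN
    BANNED = "banned"       # device reported stolen


# Hypothetical service sets; a real policy would be defined by the enterprise.
ALLOWED_SERVICES = {
    PrivilegeClass.INTERNAL: {"internal-calls", "external-calls", "voicemail", "conference"},
    PrivilegeClass.REMOTE: {"internal-calls", "voicemail"},
    PrivilegeClass.BANNED: set(),
}


def classify(device_id, is_remote, stolen_devices):
    """Derive the privilege class from location and theft reports."""
    if device_id in stolen_devices:
        return PrivilegeClass.BANNED
    return PrivilegeClass.REMOTE if is_remote else PrivilegeClass.INTERNAL


def is_allowed(service, device_id, is_remote, stolen_devices):
    """True if the device may use the requested VoIP service."""
    return service in ALLOWED_SERVICES[classify(device_id, is_remote, stolen_devices)]


stolen = {"laptop-0042"}
print(is_allowed("external-calls", "laptop-0007", is_remote=True, stolen_devices=stolen))
print(is_allowed("voicemail", "laptop-0042", is_remote=True, stolen_devices=stolen))
```

In scenario 1 such a check could run at the hosted IP-PBX; in scenario 2 the location information would first have to be passed from the VPN Gateway to the IP-PBX, as suggested above.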

This last chapter has focused on the security aspects of hosted IP-PBX solutions
supporting mobile workers. The interest of this study lies in the increasing adoption of hosted
VoIP solutions by small and medium enterprises, which want to concentrate on their business
mission without spending too much effort on VoIP deployment on their premises.
Since in hosted IP-PBX solutions it is the responsibility of the VoIP Service Provider to
ensure the security of its solutions, the only defence available to enterprise customers, before
choosing a provider, is to be aware of the potential security issues that could occur and to ask the
right questions about the security mechanisms employed by the candidate VoIP Service Providers.
When mobility is also supported, additional questions have to be asked; my suggestions for such
questions can be found in Table 8.2.
Generally, hosted VoIP solutions are described only briefly by VoIP Providers, without
going into implementation details. Mobility support by such solutions is scarcely mentioned,
let alone its architecture. For that reason, two suggestions for mobility support
combined with hosted IP-PBX solutions have been modelled and designed in this chapter. These
suggestions can be found in Figure 8.3 and Figure 8.5. The first figure represents a suggestion
for the connection of mobile workers to the VoIP Service Provider network via a VPN, while the
second figure presents how mobile workers could connect directly to the enterprise network, also via VPNs.
The two solutions differ in their security approaches, for instance in the handling of the
theft issue or in the segregation of VoIP services depending on the current location of mobile
workers. If mobile workers are outside of the enterprise premises, they may not have access to all
the enterprise VoIP services they would have in their home network, for configuration
reasons. There is no “best” solution, only solutions that are more or less suited to real cases,
depending on the existing infrastructure of customer enterprises and VoIP Service
Providers.
9. Conclusion
To conclude, this master thesis has focused on how enterprises can proceed to the secure
deployment of Voice over IP, overcoming their reluctance and hesitation. Fundamental steps
such as the threat analysis, the risk analysis and the mitigation process have been presented and
discussed, and their application has been illustrated throughout using models of VoIP
deployments. Additionally, VoIP deployment models have been used to conduct thorough threat
and risk analyses (see Annex 1) listing, in the form of attack trees, the goals attackers may
pursue and the respective ways to achieve them. Besides, four major sets of security
recommendations for Voice over IP, published by international institutions, have been selected,
analysed, compared and evaluated. Last, since enterprises, in particular small ones, are expected to
prefer hosted solutions in the future over deploying their own VoIP networks,
new security concepts for mobility combined with hosted VoIP have been presented.
Conclusions
Before deploying their own in-house VoIP network, enterprises must keep in mind that
VoIP is not a common IP-based service which can be secured by using the same mechanisms as
for data networks: VoIP has inherited the security vulnerabilities of data networks as well as
those of legacy telephony. Besides, vulnerabilities inherent to the Session Initiation Protocol (SIP)
used as signalling protocol deserve close attention. Each enterprise VoIP network is
unique in its architecture and infrastructure and therefore, to be secured, it requires a serious and
in-depth tailored risk assessment. Even if this phase of risk assessment may take a long time, it is
crucial not to skip it or rush through it, in order to prevent future security attacks or issues.
The risk assessment is first based on the identification of fundamental security
requirements for VoIP networks. According to this report, the most important requirements in a
corporate environment are confidentiality and privacy (confidentiality of personal data) as well as
integrity and authenticity (integrity of identity). Availability has been regarded as important
but secondary, which can be justified by the fact that the consequences of disclosure
of information and of impersonation are irrevocable whereas the consequences of availability issues,
although damaging, are recoverable. Authenticity is fundamental because it guarantees the
identity of communicating parties and prevents theft of identity and impersonation. This is
important in particular in large enterprises where employees do not necessarily know each other
and where the risk of impersonation is high. Attacks on integrity like Registration Hijacking or
modification of user location information can be disastrous because they enable attackers to
eavesdrop, modify and interrupt calls, or initiate calls as legitimate users.
To classify threats during the threat analysis process, the VOIPSA (VoIP Security
Alliance) Threat Taxonomy is considered in this report to be the best one. This taxonomy
provides a framework for classifying threats into social threats, service abuse threats, eavesdropping
threats and others (see Figure 5.2). The VOIPSA Taxonomy has the advantage of a fine
granularity and of being flexible: since it is a damage-oriented
classification, it can accommodate future emerging threats. Other classifications can be
infrastructure-oriented, threat-to-security-requirement, layered, multilayered or major-threats
(see Table 5.2) but are not as appropriate as the damage-oriented VOIPSA classification. For
instance, layered or multilayered classifications do not really give a good overview of the
motivations of attacks and divide attacks in an artificial manner. Enterprises should preferably
use the VOIPSA Threat Taxonomy for conducting a threat analysis for their VoIP deployments.
From the threat analysis conducted in this report based on models, areas of vulnerability
in VoIP networks and motivations for attackers have been identified. Major areas to secure are
the Perimeter, VoIP servers, IP phones, softphones, wireless links supporting VoIP (VoWLAN),
external mobile workers, interconnection of VoIP systems of distant sites, WAN access routers
and VoIP media/signalling gateways. Motivations for attackers are more or less the same in
small and large enterprises, the only difference being that the gain at stake, whether
reputation, status or money, is higher in the case of attacks against large enterprises.
VoIP network designers should keep in mind that most attacks on VoIP systems originate from
within the enterprise and are often performed by employees. Indeed, employees may want to
compromise the confidentiality of calls by eavesdropping with a view to blackmailing or bribing
other employees, or selling personal information to competitors. In contrast, external attackers
can be self-employed and act out of vanity or to prove their competence or can act on behalf of a
third party who pays them to perform attacks; these third parties can be competitors aiming at
gaining private information or at ruining the reputation of a company, as well as companies
whose business is to secure VoIP systems and which want to sell their services by proving to
potential customer enterprises the existence of security breaches in their network.
The analyses show that the majority of attacks are successful only if they are launched
internally, such as interception and manipulation attacks or eavesdropping, which is very simple to
perform using a variety of methods and tools. DoS attacks are so far the only external attacks
which can be successful; SPIT attacks are not widespread for the moment but are expected to
become one of the major annoyances enterprises will encounter in the future.
High-risk threats at the application layer are eavesdropping by conversation
reconstruction, due to its simplicity and its disastrous consequences; toll fraud through
impersonation and identity theft of legitimate users; misrepresentation, where attackers
introduce themselves as legitimate employees and entice their interlocutors into giving information;
identity theft; conversation impersonation and hijacking; and conversation alteration. Denial of
Service has been assessed as a medium-to-high risk threat since its consequences are
recoverable. Targets of DoS attacks can be VoIP servers, particular IP phones or groups of IP
phones, affecting respectively all employees, particular users or groups of users. Major
application-based VoIP attacks are Registration Hijacking and Session Tear-downs by SIP
CANCEL or BYE messages.
Enterprises should be aware that technical solutions for securing VoIP deployments exist
and that they should implement them appropriately according to security recommendations.
These technical solutions encompass encryption of signalling and media traffic, authentication
mechanisms, SIP-aware firewalls resolving the NAT and firewall traversal issues, VoIP Virtual
Private Networks (VPNs) and security protocols for voice over WLAN; in the future, anti-SPIT
entities will probably be included as well. However, although technologies and protocols exist, they
are not always implemented in VoIP products by manufacturers and vendors; for this reason,
VoIP network designers should select VoIP devices carefully and make sure that they
implement sufficient security protocols.
Concerning the recommendations for VoIP security to adopt, enterprises might at first be
overwhelmed by the confusing plurality of approaches or their differences in granularity. For that
reason, four major sets of recommendations have been selected and compared in this master thesis in
order to identify which documents are the most valuable and helpful to enterprises and which
common points and points of divergence exist. After comparison of the recommendations, the
“Internet Protocol Telephony & Voice over Internet Protocol – Security Technical
Implementation Guide – version 2” of the DISA [63] and the “Security Guidance for Deploying
IP Telephony Systems” published by the NSA [62] turn out to be of much help and interest for
companies.
Recommendations which have been made by the majority of the four documents (see
Table 7.1) should be regarded by VoIP network designers as highly required for the security of
their VoIP system and should be treated as mandatory. The interest of these common points in
the recommendations is that they leave no doubt regarding their implementation. In contrast,
points of contradiction in the comparison (see Table 7.2) should be considered as controversial
points that must not be overlooked and that need to be analysed.
The comparison of recommendations represented in Annex 2 shows that there is a great
number of measures to take for VoIP security but enterprises should not necessarily apply all of
them. The choice of the security measures to apply should depend on the results obtained from
the threat and risk analyses performed specifically for a given enterprise, the enterprise’s unique
network configuration and architecture and the enterprise’s objectives. Therefore, security choices
should be tailored to and fit the needs of enterprises.
For small enterprises wanting to reap the benefits of VoIP while remaining focused on
their business mission without having to develop costly in-house expertise, hosted IP-PBX
solutions seem to be the best suited. However, at the same time, small enterprises increasingly
support external mobile workers. The combination of SIP mobility and hosted IP-PBX solutions
may be prone to new security issues. Therefore, small enterprises that opt for such a
combination must rely on a VoIP Service Provider to implement it. The role of enterprises
is to choose the best VoIP Service Provider and, to achieve that, to get the right information
about the security aspects of the hosted solutions on offer by asking appropriate and judicious
questions (see Table 8.2). Asking appropriate questions is fundamental because the only
defence that customer enterprises have is to be aware of the potential security risks of hosted
VoIP supporting mobility and to understand the security approach of several VoIP Service
Providers before deciding which one to work with. Since it is difficult for
enterprises to change VoIP Service Provider and hosted solution once one has been deployed, the
decision is crucial and enterprises should make an informed choice.
Further work
First, security aspects of Voice over IP could be analysed on other models of VoIP
deployments, such as campus networks, as has been done with the models of VoIP deployments
in small single-site and large multi-site enterprises. Then, hosted VoIP solutions could be studied
in detail from the security point of view and suggestions for the best way to create hosted VoIP
deployments could be made; for example, one could determine the best distribution of
services between VoIP Service Providers and customer enterprises, how to secure these services,
and what share of responsibility for security each of them bears.
Finally, continuing the work of Chapter 8 on security concepts for
SIP mobility in hosted VoIP deployments, new in-depth investigations could be carried out in
order to determine whether an optimized solution exists and whether new security issues
emerge.
The task we must set for ourselves is not to feel secure, but to be able to tolerate insecurity
Erich Fromm
Philosopher
1900-1980
Annexes
Annex 1 – VoIP security threats and vulnerabilities
1. Threat analysis in enterprise networks (no matter the size)

The threat analysis has been performed according to Schneier’s attack tree model, as
explained in detail in [73]. The main references used to collect information about
threats are:
• The report of the German Federal office of IT security (BSI) “VoIPsec” [58]
• The NSA’s report “Security guidance for deploying IP Telephony systems” [62]
• The master thesis report “VIP/SIP & Sécurité” [66]
• Mark Collier’s article “Basic vulnerability issues in SIP” [78]
• XMCO’s white paper “VoIP security – A layered approach” [68]
• Chris Roberts’ paper “VoIP security” [69]
• SNOCER’s deliverable “Towards a secure and reliable VoIP infrastructure” [79]
The threat analysis has been carried out by distinguishing threats at the network level and at the
VoIP application layer. The two tables below make this distinction explicit.
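The attack trees tabulated in this annex can also be held as a nested data structure, which makes it easy to list the concrete attack steps under a goal. The sketch below is only an illustration: the AttackNode class and the leaves function are hypothetical names, the sample data is an excerpt of goal A from Annex Table 1, and the column order (Int./Ext., likelihood, impact, risk) is my reading of the table rows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    label: str
    origin: str       # "I", "E" or "I/E" (the Int./Ext. column)
    likelihood: str   # "L", "M" or "H"
    impact: str
    risk: str         # the tables also use intermediate values such as "M+"
    children: List["AttackNode"] = field(default_factory=list)


def leaves(node):
    """Return the leaf nodes, i.e. the concrete attack steps, depth first."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Excerpt of goal A from Annex Table 1 (branches 2 and 4 only).
goal_a = AttackNode(
    "Gain control over VoIP components", "I", "H", "H", "H",
    [
        AttackNode(
            "Connecting physically to the enterprise data network and "
            "indirectly to the VoIP network", "I", "L", "H", "L",
            [AttackNode(
                "Gaining access to the VLAN including VoIP servers and/or "
                "VoIP IP phones", "I", "L", "H", "L",
                [AttackNode("Performing a VLAN Hopping attack", "I", "L", "H", "L")],
            )],
        ),
        AttackNode(
            "Gaining direct physical access to the console interfaces of servers",
            "I", "L", "H", "L",
            [AttackNode("Changing/cancelling configuration of servers",
                        "I", "L", "H", "L")],
        ),
    ],
)

for leaf in leaves(goal_a):
    print(leaf.label, leaf.risk)
```

The same structure could be filtered by risk value to extract the branches that deserve mitigation first.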
Annex Table 1 – Attack trees analyzing the threats to an enterprise VoIP system at the network level
Threats to the VoIP service at the network layer   Int./Ext.   Likelihood   Impact   Risk
A] Goal: Gain control over VoIP components I H H H
1. Connecting physically and directly to the VoIP network as an IP phone I H H H
1.1. Connecting the intrusive device (laptop, IP phone…) to the LAN and pretending to be an authorized user I H H H
1.1.1. Using the MAC address of an authorized user (MAC spoofing) I H H H
1.1.2. Unplugging an authorized device like an IP phone and using its MAC address I H H H
1.2. Registering as a legitimate user using an already connected end device I H H H
1.2.1. Intercepting registration information (username, password) between IP phones and VoIP server I H H H
1.2.1.1. Performing a MAC flooding I H H H
2. Connecting physically to the enterprise data network and indirectly to the VoIP network I L H L
2.1 Gaining access to the VLAN including VoIP servers and/or VoIP IP phones I L H L
2.1.1. Performing a VLAN Hopping attack I L H L
3. Gaining unauthorized control over the remote web management of servers I M H M+
3.1. Gaining administrator’s passwords I M H M+
3.1.1. Capturing network traffic to get the administrator’s password in case that the web session is not encrypted I M H M+
3.1.2. Gaining unauthorized access to administrator’s computer to get administrator’s passwords stored in the cache of web browsers I M H M+
3.2. Exploiting vulnerabilities in the applications and operating systems of servers I M H M+
4. Gaining direct physical access to the console interfaces of servers I L H L
4.1. Changing/cancelling configuration of servers I L H L
5. Modifying the configuration file of IP phones I L M L
5.1. Sending fake TFTP responses I L M L
5.1.1. Compromising the TFTP server storing the firmware and configuration files I L M L
5.2. Intercepting configuration files while IP phones download them from the TFTP server I L M L
and then replacing them with falsified files
5.2.1. Performing a Man-in-the-Middle attack I L M L
5.3. Gaining access to the web-based management interface of IP phones I L M L
5.3.1. Breaking passwords exploiting the fact that many IP phones are not capable I L M L
of providing strong cryptographic methods
5.3.2. Exploiting vulnerabilities of Web servers installed on IP phones I L M L
6. Disabling remote management functionality from IP phones I L M L
6.1. Exploiting vulnerabilities of the OS installed on the phones I L M L
6.2. Exploiting vulnerabilities of applications like web and SNMP servers for phone I L M L
management, address books, web browsers…
7. Launching malware attacks on any VoIP component (no IP phone-specific malware has been I L H L
identified yet)
7.1. Infect downloadable third-party software I L H L
7.2. Synchronizing IP phones with mobile phones which are infected I L H L
B] Goal: Eavesdrop I/E H H H
1. Capturing traffic I H H H
1.1. Using a sniffer, like Ethereal, running in promiscuous mode (for switched networks) I H H H
1.1.1. Forcing the switch to broadcast all incoming signalling and voice packets I H H H
1.1.1.1. MAC flooding attack I H H H
1.2. Using simply a sniffer in a non-switched network I L H L
1.3. Intercepting traffic (signalling and/or media traffic) between two users on the same I H H H
segment or between a user and a router and forwarding it to the right destination.
1.3.1. Pretending to be the intended called party or the router I H H H
1.3.1.1. Sending fake (or “spoofed”) ARP responses to the victim end users I H H H
containing a fake MAC address to make them update their ARP tables (ARP
spoofing attack)
1.4. Intercepting traffic (signalling and/or media traffic) between an IP phone and a VoIP I M H M+
gateway/VoIP server and forwarding it to the right destination.
1.4.1. Making the IP phone download a modified configuration file with fake VoIP I M H M+
gateway/VoIP server address from a rogue TFTP server
1.4.1.1. Making an IP phone send a TFTP GET configuration file message to I M H M+
a rogue TFTP server
1.4.1.1.1 Faking TFTP server information and address I M H M+
1.4.1.1.1.1. Performing a DHCP Rogue server attack I M H M+
1.5. Infecting IP phones with Trojan horses which can transfer to the attacker private I M H M+
information about calls
2. Call-pattern tracking I L L L
2.1. Performing traffic analysis of calls and call patterns I L L L
2.1.1. Penetrating the VoIP network and using a sniffer, like Ethereal I L L L
2.1.2. Capturing traffic (see Eavesdrop, 1.) I L L L
3. Conversation reconstruction I H H H
3.1. Intercepting traffic (signalling and/or media traffic) between two users on the same
segment or between a user and a router and forwarding it to the right destination (Man-in-
the-Middle attack) + using an appropriate software tool for recording and storing media
traffic, like the Cain & Abel tool.
3.1.1. ARP spoofing attack I H H H
3.1.2. ICMP Redirect attack I L H L
3.2. Intercepting encrypted traffic (signalling and/or media traffic) between two users on I L H L
the same segment or between a user and a router and forwarding it to the right destination
(Man-in-the-Middle attack) + using appropriate software tool for recording and storing
media traffic, like Cain & Abel tool + getting encryption keys
3.2.1. ARP spoofing attack + compromising servers storing key material for I L H L
encryption and authentication
3.2.2. ICMP Redirect attack + compromising servers storing key material for I L H L
encryption and authentication
3.3. Infecting IP phones with Trojan horses which can transfer to the attacker recorded I L H L
voice data
4. Voicemail reconstruction I H H H
4.1. Intercepting traffic (signalling and/or media traffic) between a user and a voicemail I H H H
server and forwarding it to the right destination (Man-in-the-Middle attack) + using
appropriate software tool for recording and storing voicemail messages, like Cain & Abel
tool.
4.1.1. ARP spoofing attack I H H H
4.2. Compromising voice mail servers I H H H
5. Number harvesting (collecting or determining phone numbers, URLs, or other I M M M
identifiers)
5.1. Scanning database servers I L M L
5.1.1. Gaining control over database servers I L M L
5.2. Scanning directories on IP phones I M M M
5.2.1. Gaining control over IP phones I M M M
6. Remotely activating integrated microphones in IP phones I/E L H L
6.1. Infecting victim IP phones with malware which can activate their microphones and I/E L H L
transfer all conversations in the office to the attacker
C] Goal: Intentional Denial of service I H H H
1. Preventing traffic from reaching the destined IP phone or from being sent by an IP phone I H H H
1.1. Intercepting traffic between two users on the same segment or between a user and a I H H H
router and not forwarding it
1.1.1. Pretending to be the intended called party or the router I H H H
1.1.1.1. ARP spoofing attack I H H H
1.2. Impersonating VoIP server and not forwarding traffic I H H H
1.2.1. Spoofing the VoIP server I H H H
1.2.1.1. MAC spoofing attack I H H H
1.3. Impersonating an IP phone and not forwarding traffic I H H H
1.3.1. Spoofing the IP phone I H H H
1.3.1.1. MAC spoofing attack I H H H
1.4. Redirecting traffic to non-existing or non-reachable destinations I L H L
1.4.1. Performing an ICMP Redirect attack, by sending an ICMP Redirect message I L H L
with no real reachable destination
1.4.2. Performing an IRDP (ICMP Router Discovery Protocol) spoofing attack, by I L H L
sending a fake IRDP packet in order to make the user agent overwrite its Default
Gateway entry
1.4.3. Performing a route injection attack (when dynamic routing protocols are I L H L
used, as often in enterprises or campuses)
1.5. Impersonating a VoIP gateway and not forwarding traffic in both directions (from IP I H H H
network to PSTN/ISDN and inversely)
1.5.1. Spoofing the VoIP gateway I H H H
1.5.1.1. Performing a MAC spoofing attack I H H H
2. Interrupt call sessions I L H L
2.1. Excluding a switch from an active communication I L H L
2.1.1. Provoking the recalculation of the spanning tree and excluding the switch I L H L
2.1.1.1. STP attack I L H L
3. Disconnecting VoIP servers/VoIP gateways I L H L
3.1. Excluding the switch to which a VoIP server/VoIP gateway is connected from the I L H L
network
3.1.1. Provoking the recalculation of the spanning tree by sending a fake BPDU to I L H L
exclude the switch
3.1.1.1. STP attack I L H L
3.2. Shutting down VoIP servers I L H L
3.2.1. Physical intrusion (System and equipment access) I L H L
3.2.2. Damage of cables / short-circuit I L H L
4. Putting VoIP servers out of service or making them malfunction I/E M H M+
4.1. Infecting network with viruses, worms to intrude in the VoIP network… I L H L
4.1.1. Having access to the VLAN in which they are located I L H L
4.1.1.1. Performing a VLAN Hopping attack I L H L
4.2. Manipulating VoIP servers remotely I/E M H M+
4.2.1. Having access to the VLAN in which they are located I L H L
4.2.1.1. Performing a VLAN Hopping attack I L H L
4.2.2. Impersonating a trusted user agent I/E L H L
4.2.2.1. Using an IP address that is within the range of trusted IP addresses I/E L H L
or a trusted external IP address
4.2.2.1.1. Performing an IP spoofing attack I/E L H L
4.2.3. Gaining unauthorized remote management of servers I/E M H M+
4.2.3.1. Gaining administrator password I M H M+
4.2.3.1.1. Capturing network traffic to get the administrator’s I M H M+
password in case that the web session is not encrypted
4.2.3.1.2. Gaining unauthorized access to administrator’s computer to I M H M+
get the administrator’s passwords stored in the cache of web browsers
4.2.3.2. Exploiting vulnerabilities in the applications and operating I/E M H M+
systems of servers
4.2.3.3. Launching malware attacks I/E M H M+
4.2.3.3.1. Remotely controlling target servers by Trojan horses I/E M H M+
4.3. Flooding servers with packets I M H M+
4.3.1. Performing a Ping flood attack I L H L
4.3.2. Performing a SYN flood attack I L H L
4.3.3. Performing a LAND flood attack I L H L
4.3.4. Request flooding of DHCP servers I M H M+
4.3.4.1. Make IP phones (DHCP clients) continuously request IP addresses I M H M+
until none is left for legitimate devices
4.4. Preventing VoIP servers from processing external calls I/E M H M+
4.4.1. Performing DoS attacks on other public servers consuming all available I/E M H M+
Internet bandwidth and provoking the loss of external VoIP services
4.5. Physically damaging VoIP servers I L H L
4.5.1. Physical intrusion (System and equipment access) and physical destruction I L H L
4.5.2. Intentional fire I L H L
4.6. Changing/cancelling configuration of servers while having physical access to VoIP I L H L
servers
4.6.1. Gain administrative access to the servers I L H L
4.6.1.1. Use jumpers on the motherboard to reset BIOS passwords and I L H L
change passwords
4.6.1.2. Use boot disks to load alternate operating systems I L H L
4.7. Provoking loss of power I L H L
4.7.1. Shutting down the power supply I L H L
4.8. Attacking underlying operating systems I/E M H M+
4.8.1. Exploiting vulnerability due to default passwords and accounts I/E M H M+
4.8.2. Exploiting vulnerability due to weak passwords of operating systems of I/E M H M+
servers
4.8.3. Exploiting insecure default configuration settings I/E M H M+
5. Putting IP phones out of service or making them malfunction I M M M
5.1. Infecting network with viruses, worms to intrude in the VoIP network… I L M L
5.1.1. Having access to the VLAN in which they are located I L M L
5.1.1.1. VLAN Hopping attack I L M L
5.2. Manipulate VoIP servers remotely I L M L
5.2.1. Having access to the VLAN in which they are located I L M L
5.2.1.1. VLAN Hopping attack I L M L
5.3. Flooding IP phones with packets I L M L
5.3.1. Performing a Ping flood attack I L M L
5.3.2. Performing a SYN flood attack I L M L
5.3.3. Performing a LAND flood attack I L M L
5.4. Disable IP phones I M M M+
5.4.1. Gaining control of IP phones I M M M+
6. Preventing IP phones from having access to the VoIP network I M M M
6.1. Exhausting all available IP addresses for IP phones I M M M
6.1.1. Performing a DHCP Starvation attack I M M M
6.2. Providing fake IP addresses for IP phones I M M M
6.2.1. Performing a DHCP Rogue attack I M M M
6.3. Providing fake information about the TFTP server to contact I M M M
6.3.1. Send a “DHCP Offer” message with fake TFTP and router information I M M M
6.3.1.1. Performing a DHCP Rogue attack I M M M
7. Provoking loss of power in the whole enterprise network or part of it I L H L
7.1. Shutting down the power supply I L H L
D] Goal: Unintentional Denial of Service I L H L
1. Unintentional damaging or destruction of VoIP servers I L H L
1.1. Fire I L H L
1.2. Floods from broken water pipes I L H L
1.3. Natural disasters I L H L
2. Loss of power I L H L
E] Goal: Service abuse I/E H H H

1. Identity theft I/E M H M+
1.1. Intercepting registration information (username, password) between IP phones and I M H M+
VoIP server
1.1.1. MAC flooding I M H M+
1.1.2. ICMP Redirect attack I L H L
1.1.3. Route injection attack I L H L
1.1.4. MAC spoofing I H H H
1.2. Performing IP spoofing I/E L H L
F] Goal: Social threat I/E H H H
1. Theft of services I/E M H M+
1.1. Performing toll fraud I/E M H M+
1.1.1. Impersonating a legitimate user I/E M H M+
1.1.1.1. Intercepting registration information (username, password) between I M H M+
IP phones and VoIP server
1.1.1.1.1. MAC flooding I M H M+
1.1.1.2. Performing IP spoofing I/E L H L
1.1.2. Impersonating a legitimate VoIP server I/E M H M+
1.1.2.1. Intercepting registration information (username, password) I M H M+
between VoIP server and VoIP gateway
1.1.2.1.1. MAC flooding I M H M+
1.1.2.2. Performing IP spoofing I/E L H L
1.2. Adding, after the end of the call, additional costs of communication if the session is I/E M H M+
not rightly torn down
1.2.1. Infecting IP phones with Trojan horses which can maintain a session open I/E M H M+
2. Misrepresentation (of identity, authority & rights, and content) I/E H H H
2.1. Registering as a legitimate user at a VoIP server I/E M H M+
2.1.1. Intercepting registration information (username, password) between IP I M H M+
phones and VoIP server
2.1.1.1. MAC flooding I M H M+
2.1.2. Performing an IP spoofing attack I/E L H L
2.2. Initiating sessions and tearing them down as legitimate users I H H H
2.2.1. Stealing identity (see Service abuse, 1.) I H H H
2.2.2. Infecting IP phones with malware able to initiate calls on their own I M H M+
2.3. Misrepresenting the source or destination of the media and/or signalling streams I L H L
2.3.1. Performing IP spoofing I/E L H L
G] Goal: Interception and modification I H H H
1. Conversation impersonation and hijacking I H H H
1.1. Intercepting traffic (signalling and/or media traffic) between two users on the same
segment or between a user and a router and forwarding it to the right destination.
1.1.1. MITM attack by ARP spoofing attack I H H H
1.1.2. Performing an ICMP Redirect attack and forwarding VoIP traffic to the right I L H H
destination
2. Call rerouting (sink holing) I M H M+
2.1. Modifying the default VoIP gateway/VoIP server address in the configuration of an IP I M H M+
phone and redirecting calls to attacker’s end device
2.1.1. IRDP spoofing attack I L H L
2.1.2. Route injection attack (when dynamic routing protocols are used, as often in I L H L
enterprises or campuses)
2.1.3. Making the IP phone download a modified configuration file with fake VoIP I M H M+
gateway/VoIP server address from a rogue TFTP server
2.1.3.1. Making an IP phone send a TFTP GET configuration file message I M H M+
to a rogue TFTP server
2.1.3.1.1. Faking TFTP server information and address I M H M+
2.1.3.1.1.1. Performing a DHCP Rogue server attack I M H M+
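The ratings in the table above can be read as a small lookup from likelihood and impact to risk, applied along each attack tree. The following Python sketch illustrates this reading and is not part of the original analysis: the RISK matrix is inferred from the rating combinations that occur in the rows, and the rule that a parent node takes the highest risk among its children is an assumption that most, but not necessarily all, rows follow. The names Node, RISK and ORDER are hypothetical.

```python
from dataclasses import dataclass, field

# Risk as a function of (likelihood, impact). Assumption: inferred from the
# combinations that occur in the tables; combinations that never occur there
# (for example likelihood H with impact L) are deliberately left out.
RISK = {
    ("L", "L"): "L", ("L", "M"): "L", ("L", "H"): "L",
    ("M", "M"): "M", ("M", "H"): "M+",
    ("H", "M"): "M+", ("H", "H"): "H",
}

# Risk levels from lowest to highest, used to compare ratings.
ORDER = ["L", "M", "M+", "H"]


@dataclass
class Node:
    """One line of an attack tree: a goal, a sub-goal or a concrete attack."""
    label: str
    likelihood: str = ""
    impact: str = ""
    children: list = field(default_factory=list)

    def risk(self) -> str:
        # Leaves are rated directly from the matrix.
        if not self.children:
            return RISK[(self.likelihood, self.impact)]
        # Assumption: any child suffices to reach the parent goal (OR node),
        # so the parent takes the highest risk found among its children.
        return max((child.risk() for child in self.children), key=ORDER.index)


eavesdrop = Node("Conversation reconstruction", children=[
    Node("ARP spoofing attack", "H", "H"),
    Node("ICMP Redirect attack", "L", "H"),
])
print(eavesdrop.risk())
```

Keeping the matrix in one place makes it straightforward to re-rate a whole tree when an enterprise revises its own likelihood or impact estimates during the risk assessment.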
Threats to the VoIP system at the application layer (threats to SIP and RTP applications)  Int./Ext.  Likelihood  Impact  Risk
A] Goal: Eavesdropping I H H H
1. Call-pattern tracking
1.1. Performing traffic analysis of calls and look for SIP INVITE and possibly for SIP I L L L
BYE messages
2. Conversation reconstruction
2.1. Intercepting traffic between calling parties or between a calling party and SIP Proxy I H H H
servers…
2.1.1. Conversation impersonation and call hijacking (see D], Conversation I H H H
impersonation and hijacking) + using an appropriate software tool to record and
decode RTP streams
B] Goal: Social threat I/E H H H
1. Theft of services
1.1. Performing toll fraud I H H H
1.1.1. Connecting directly to the VoIP media/signalling gateway and placing I L H L
unauthorized calls
1.1.1.1. Bypassing the access control provided by the IP-PBX core I L H L
1.1.1.1.1. Sending an INVITE message directly to the VoIP I L H L
media/signalling gateway
1.1.2. Impersonating legitimate users and placing calls as if they were doing it I H H H
1.1.2.1. Stealing identity (see C], identity theft) I H H H
1.2. Causing the enterprise to incur additional costs I/E L H L
1.2.1. Performing a SPIT attack on office IP phones which probably, during the absence I/E L H L
of workers from their offices, forward calls to mobile phones
2. Misrepresentation
2.1. Spoofing identity/impersonating I H H H
2.1.1. Stealing identity (see C], identity theft) and pretending to be somebody else I H H H
2.1.2. Using a false caller identification which appears on the phone display of the I/E M H M+
receiver
2.1.2.1. Falsifying the calling number (caller-ID) I/E M H M+
2.1.2.1.1. Using service offered by local enterprise IP-PBX core like I L H L
Asterisk server (by malicious employees)
2.1.2.1.2. Using websites offering caller-ID spoofing services E M H M+
3. Unwanted contact
3.1. Annoying or wasting time of victims I/E L M L
3.1.1. Initiating unwanted call sessions for telemarketing purposes for example or I/E L M L
other.
3.1.1.1. Performing SPIT E L M L
3.1.1.1.1. Using a software tool like spitter in conjunction with Asterisk E L M L
to set up a SPIT generation platform (new software, so not yet widely
used)
3.1.1.1.2. Just calling for telemarketing or other purposes E L M L
3.1.1.2. Performing call masquerading and enticing victims to pick up the I/E L M L
phone by hiding the caller’s identity, for telemarketing or other purposes
3.1.1.2.1. Performing Caller-ID spoofing with the intent to mislead I/E L M L
3.1.2. Making prank calls I/E L L L
3.1.2.1. Performing Call masquerading and enticing victims to pick up the I/E L L L
phone by making them see a familiar calling number on their phone display
or a calling number of employees, boss, colleagues…
3.1.2.1.1. Performing Caller-ID spoofing with the intent to mislead I/E L L L
3.1.3. Harassing called parties with constant ringing of phones with no real calls I/E L L L
3.1.3.1. Setting up VoIP calls and tearing down as soon as the victim picks I/E L L L
up the phone
3.1.3.2. Setting up VoIP calls by replaying old SIP messages I L L L
3.2. Leaving messages (prank, harassment, unwanted…) in the voice mail boxes E L L L
3.2.1. Performing call masquerading and presenting to the voicemail E L L L
box as trusted caller [84]
C] Goal: Service abuse I H H H
1. Identity theft and use for access
1.1. Intercepting and stealing victim’s credentials I H H H
1.1.1. Impersonating the SIP Registrar by spoofing it and performing Man-in-the- I H H H
middle attack + challenging the victim user agent for its credentials
1.1.1.1. Sending SIP 301 “Moved Permanently” response including the I H H H
attacker’s IP address as redirection address (by using the redirectpoison
tool, for example) to intercept all SIP messages and forward them to the
right destination, and then send to victim’s user agent a “401
Unauthorized” SIP response to ask for credentials
1.1.1.2. Sending SIP 302 “Moved temporarily” response including the I M H M+
attacker’s IP address as redirection address to intercept all SIP messages
and forward them to the right destination and then send to victim’s user
agent a “401 Unauthorized” SIP response to ask for credentials
1.1.2. Performing a Man-in-the-middle attack between the victim SIP User Agent I M H M+
and the SIP Registrar + collecting credentials
1.1.2.1. Performing Registration hijacking attack and intercepting I M H M+
credentials
1.1.3. Scanning network for SIP header lines like REGISTER, INVITE, OPTIONS, I M H M+
From, Authorization, Proxy-Authorization and breaking MD5 digest
1.1.3.1. Using a software tool like authtool (new tool) I M H M+
1.2. Replaying valid user ID and password information to have access to services I M H M+
1.2.1. Stealing victim’s credentials, even encrypted and resending them I M H M+
1.2.1.1. Capturing valid packets which can be replayed, even if they are I M H M+
encrypted
D] Goal: Interception and manipulation I H H H
1.1. Manipulating Registration Records to hijack conversations I H H H
1.1.1. Updating the SIP Location server entries with fake information to hijack the I H H H
call and forwarding SIP messages to the right destined user agent
1.1.1.1. Sending a fake SIP REGISTER message with the victim user I H H H
agent’s URI associated with a fake Contact information containing the IP
address of the attacker (Registration hijacking attack)
1.1.1.1.1. Using software tool like SiVus or RegistrationHijacking I M H H
1.2. Spoofing SIP Registrar and performing Man-in-the-Middle attack I H H H
1.2.1. Using SIP 301 “Moved Permanently” response (for all successive attacks) I H H H
including the attacker’s IP address as redirection address to intercept all SIP
messages and then forward them to the right destination
1.2.2. Using SIP 302 “Moved temporarily” response (for single-time attacks) I M H M+
including the attacker’s IP address as redirection address to intercept all SIP
messages and then forward them to the right destination
1.3. Impersonating SIP Proxy server and performing Man-in-the-Middle attack I H H H
1.3.1. Spoofing SIP Proxy server and performing Man-in-the-Middle attack I H H H
1.3.1.1. Same as 1.2.1. I H H H
1.3.1.2. Same as 1.2.2. I M H M+
1.3.1.3. Using SIP 305 “Use Proxy” response including the I M H M+
attacker’s IP address as redirection address to intercept all SIP messages
and then forward them to the right destination
1.3.1.4. Performing DNS spoofing I L H L
1.3.2. Changing the SIP proxy address in the configuration of SIP phones I L H L
1.4. Spoofing SIP Redirect server and performing Man-in-the-Middle attack I H H H
1.4.1. Same as 1.2.1. I H H H
1.4.2. Same as 1.2.2. I M H M+
1.5. Spoofing SIP User Agent and performing Man-in-the-Middle attack I H H H
1.5.1. Same as 1.2.1. I H H H
1.5.2. Same as 1.2.2. I M H M+
1.6. Posing as voicemail server and tricking the caller into leaving a message I M H M+
1.6.1. Impersonating the voicemail I M H M+
1.6.1.1. Performing Registration hijacking attack I L H L
1.6.1.2. Same as 1.2.1. I M H M+
1.6.1.3. Same as 1.2.2. I M H M+
2.1. Inserting words, phrases, sound effects into a conversation I H H H
2.1.1. Getting access to the RTP stream of a conversation (not necessarily MITM I H H H
attack)
2.1.1.1. Using software tools like rtpinsertsound/rtpmixsound I H H H
E] Goal: Intentional Denial of Service I/E M H M+
1. Interrupting call session initiation I M M M
1.1. Sending spoofed messages to interrupt call setups before the called party picks up I M M M
1.1.1. Sending faked call tear down messages I M M M
1.1.1.1. Injection of SIP CANCEL message (successful only internally) I M M M
1.1.1.2. Injection of SIP BYE message (successful only internally) I M M M
1.1.1.2.1. Using a software tool like BYE Teardown I M M M
1.1.2. Sending calling IP phones fake error messages supposedly coming from I M M M
legitimate VoIP servers
1.1.2.1. Sending a 4xx message I M M M
2. Preventing user agents from registering with the SIP Registration server I M H M+
2.1. Provoking a complete fill-in of registration table of SIP Registrar I M H M+
2.1.1. Flooding SIP Registration server with a great number of SIP REGISTER I M H M+
requests (Directory service flooding) coming from different
2.1.1.1. Performing a SIP REGISTER flooding attack against SIP Registrar I M H M+
3. Setting the local SIP proxy server out of service I/E M H M+
3.1. Exhausting processing resources of the SIP Proxy server I/E M H M+
3.1.1. Flooding local SIP Proxy server with a great number of SIP INVITE messages I/E M H M+
3.1.1.1. Performing a SIP INVITE Flooding attack (also called SIP Bombing) I/E M H M+
3.1.2. Sending a large number of malformed packets provoking the non-stop E L H L
processing of bad packets
3.1.2.1. Performing a distributed DoS attack with malformed packets from the E L H L
outside of the company
3.1.2.2. Performing a simple DoS attack with malformed packets from the E L H L
outside of the company
4. Setting IP phones out of service making them crash or reboot I M M M
4.1. Exhaust processing resources of IP phones I M M M
4.1.1. Flooding IP phones with a great number of SIP INVITE messages I M M M
4.1.1.1. Performing a SIP INVITE Flooding attack against IP phones I M M M
4.2. Completely degrading the QoS of calls I M M M
4.2.1. Performing actions described in 5. I M M M
4.3. Making the IP phone reboot I M M M
4.3.1. Transmitting a NOTIFY request to a SIP phone which causes it to reboot I M M M
4.3.1.1. Using a software tool like check-syn-reboot I M M M
5. Degrading the QoS of calls (QoS abuse) I M H M+
5.1. Downgrading the codec quality in order to degrade the streams’ audio quality and I M H M+
render the conversation impossible
5.1.1. Performing a mid-session codec change by sending SIP UPDATE request [83] I M H M+
5.1.2. Performing a mid-session codec change by sending SIP re-INVITE request I M H M+
with change in the SDP part
5.2. Upgrading the codec quality in order to provoke a higher bandwidth usage, causing an I M H M+
increase in packet loss and crippling the conversation so as to render it impossible
5.2.1. Performing a mid-session codec change by sending SIP UPDATE request [83] I M H M+
5.2.2. Performing a mid-session codec change by sending SIP re-INVITE message I M H M+
with change in the SDP part
6. Provoking loss of calls destined to IP Phones by hijacking calls I M H M+
6.1. Performing registration hijacking to register as the victim legitimate user agent I M H M+
6.1.1. Updating the SIP Location server with fake information about a SIP User I M H M+
Agent
6.1.1.1. Sending a fake SIP REGISTER message with the victim user agent’s I M H M+
URI associated with a fake Contact information containing the IP address of the
attacker and deregistering all older SIP URIs
6.1.1.2. Modifying a SIP REGISTER request that contains more than one I M H M+
Contact address which uses “q values” for the priority of these addresses
6.1.1.2.1. Setting to zero the “q value” of original SIP URIs so that they I M H M+
cannot receive any INVITE messages and setting a higher priority for a
forged SIP URI
6.1.1.3. Modifying a SIP REGISTER request by setting the Expires header to I M H M+
zero
6.2. Spoofing SIP Registrar to redirect traffic to another destination which does not I M H M+
forward traffic to the right destination
6.2.1. Using SIP 301 “Moved Permanently” response including the attacker’s IP I M H M+
address as redirection address (by using redirectpoison tool for example)
6.2.2. Using SIP 302 “Moved temporarily” responses including the attacker’s IP I M H M+
address as redirection address
6.2.3. Using SIP 305 “Use Proxy” response including the attacker’s IP address as I M H M+
redirection address to intercept all SIP messages
6.3. Impersonating SIP Proxy server and not forwarding traffic to the right destination I M H M+
6.3.1. Spoofing SIP Proxy server and blocking traffic I M H M+
6.3.1.1. Same as 6.2.1. I M H M+
6.3.1.2. Same as 6.2.2. I M H M+
6.3.1.3. Same as 6.2.3. I M H M+
6.3.1.4. Performing DNS spoofing I M H M+
6.3.2. Changing the SIP proxy address in the configuration of SIP phones I L M L
6.4. Spoofing SIP Redirect server and not forwarding traffic to the right destination I M H M+
6.4.1. Same as 6.2.1. I M H M+
6.4.2. Same as 6.2.2. I M H M+
6.4.3. Same as 6.2.3 I M H M+
6.5. Spoofing SIP User Agent and not forwarding traffic to the right destination I M H M+
6.5.1. Same as 6.2.1. I M H M+
6.5.2. Same as 6.2.2. I M H M+
6.6. Playing mid-session tricks / “Re-INVITE or Session Replay” I M H M+
7. Provoking loss of calls destined to VoIP media gateways, voice mail server, Interactive I M H M+
Voice Response and Automated Attendant by hijacking calls
7.1. Performing registration hijacking by faking information for these components I M H M+
7.1.1. Updating the SIP Location server with fake information about a VoIP I M H M+
component
7.1.1.1. Sending a fake SIP REGISTER message with the victim user agent’s I M H M+
URI associated with a fake Contact information containing the IP address of the
attacker
8. Interrupting ongoing calls I/E M H M+
8.1. Observing the signalling for a call and sending spoofed SIP BYE message to the I M H M+
calling parties of the call (Session Tear-down attack)
8.1.1. Injection of SIP BYE message (successful only internally) I M H M+
8.2. Tearing down UDP ports opened for legitimate external calls (Collier) I M H M+
8.2.1. Flooding the firewall with SIP BYE messages I M H M+
8.3. Preventing the firewall from properly managing ports for legitimate external calls I/E M H M+
8.3.1. Flooding the firewall with calls I/E M H M+
8.4. Inserting forged valid RTCP BYE packets in an RTP conversation I M H M+
8.5. Modifying the state of the dialog-session I M H M+
8.5.1. Sending a forged SIP re-INVITE message redirecting the call to a non-existent I M H M+
destination, for example
9. Preventing users from placing external calls I M H M+
9.1. Preventing the firewall from properly managing ports for legitimate external calls I M H M+
9.1.1. Flooding the firewall with calls (SIP INVITE messages) I M H M+
9.2. Tearing down UDP ports opened for legitimate external calls (Collier) I M H M+
9.2.1. Flooding the firewall with SIP BYE messages I M H M+
10. Hijacking RTP streams by injecting rogue RTP packets in the RTP streams I M H M+
10.1. Making fake RTP packets be read by the called user agent before genuine RTP I M H M+
packets from legitimate caller are read
10.1.1. Learning the SSRC of one of the calling parties and injecting in an ongoing I M H M+
RTP stream forged RTP packets with the same SSRC but with higher sequence
number and higher value of timestamp than the legitimate packets. The receiver’s
RTP application will process the attacker’s packets first and discard the legitimate
packets since they have invalid older timestamps.
11. Ejecting a calling party from an ongoing call I L H L
11.1. Provoking a SSRC collision I L H L
11.1.1. Stealing the SSRC of one of the calling parties and sending own RTP I L H L
messages, thus leading the receiver to choose to accept packets originating from a
single source
11.1.2. Sending to a victim RTP messages labelled with the victim’s SSRC, thus I L H L
forcing the victim to abandon the current RTP streams in order to choose a new
collision-free SSRC
12. Setting voicemail servers out of service I H M M+
12.1. Performing voicemail bombing attack I H M M+
Annex Table 2 – Attack trees analyzing the threats to an enterprise VoIP system at the application level
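Threat E] 10.1.1 in the table above relies on forged RTP packets that carry a legitimate SSRC together with a much higher sequence number than the genuine stream. As a minimal sketch of how a receiver or a monitoring probe might notice this, the Python code below parses the fixed 12-byte RTP header and flags large forward jumps of the sequence number per SSRC. The header layout is the standard fixed RTP header; the class name RtpJumpDetector and the threshold max_jump are hypothetical, and a real deployment would rather rely on SRTP than on this kind of heuristic alone.

```python
import struct


def parse_rtp_header(packet: bytes):
    """Return (sequence number, timestamp, SSRC) from the fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed 12-byte header")
    # Byte 0: version/padding/extension/CSRC count, byte 1: marker/payload type,
    # then 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (network order).
    _flags, _marker_pt, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return seq, timestamp, ssrc


class RtpJumpDetector:
    """Flags packets whose sequence number jumps far ahead of the stream."""

    def __init__(self, max_jump: int = 50):
        # Hypothetical threshold: forward jumps larger than this are suspicious.
        self.max_jump = max_jump
        self.last_seq = {}  # SSRC -> last accepted sequence number

    def is_suspicious(self, packet: bytes) -> bool:
        seq, _timestamp, ssrc = parse_rtp_header(packet)
        previous = self.last_seq.get(ssrc)
        if previous is None:
            self.last_seq[ssrc] = seq
            return False
        # Sequence numbers wrap at 2**16; distances of 32768 or more are
        # treated as late or reordered packets, not as forward jumps.
        jump = (seq - previous) % 65536
        if self.max_jump < jump < 32768:
            return True  # keep the old baseline so the genuine stream continues
        if jump < 32768:
            self.last_seq[ssrc] = seq
        return False
```

The detector keeps its baseline when it flags a packet, so a genuine stream that continues with the expected sequence numbers is still accepted while further injected packets keep being reported.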
2. Threats coming with wireless VoIP
References:
• Mark Collier’s article “Wireless VoIP security fundamentals”
• http://bigwill.mit.edu/80211hacking2005.pdf
• BSI report [58]
Additional threats to the VoIP system due to the adoption of wireless VoIP within the premises of the enterprise  Int./Ext.  Likelihood  Impact  Risk
A] Goal: Eavesdropping I H H H
1.1. Exploiting vulnerabilities of the WEP (Wired Equivalent Privacy) I H H H
1.1.1. Decrypting all traffic over the air I H H H
1.1.1.1. Recovering the secret key shared by the wireless terminal and the access I H H H
point
1.1.1.1.1. Collecting a large sample of packets protected by WEP and using I H H H
appropriate tools to recover the secret key (tools like airsnort, WEPcrack,
Wepattack)
B] Goal: Interception and modification I H H H
1. Impersonation of wireless terminals
1.1. Exploiting vulnerabilities of the WEP (Wired Equivalent Privacy) I H H H
1.1.1. Using a recovered WEP key of a legitimate terminal to impersonate it and I H H H
initiate calls from a SIP User Agent installed on this device
1.2. Performing a MAC spoofing attack and impersonating a legitimate wireless device I H H H
1.2.1. Stealing a legitimate MAC address I H H H
2. Impersonation of legitimate Access Points (APs) and Man-in-the-Middle attack
2.1. Installing Access Points with the same ESSID as legitimate APs I M H M+
2.1.1. Using appropriate tools to obtain a legitimate ESSID I M H M+
C] Goal: Denial of Service I H M M+
1. Introducing rogue wireless Access Points (APs) in the enterprise premises
1.1. Generating covertly a local Access point and blocking all messages I H M M+
1.1.1. Using tools like Fake AP I H M M+
2. Spoofing legitimate Access Points (APs) in the enterprise premises and blocking all
messages sent by wireless terminals
2.1. Installing Access Points with the same ESSID as legitimate APs and blocking I M M M
traffic
2.1.1. Using appropriate tools to obtain a legitimate ESSID I M M M
3. Access Point deactivation
3.1. Physical intrusion and physical deactivation I M M M
3.2. Saturating the Access Point I M M M
3.2.1. Forging and sending multiple associations I M M M
4. Denying the access of a wireless terminal to the wireless network
4.1. Disassociating the user by forging disassociations I M M M
4.2. Overflowing an Access Point which only accepts a limited number of users I M M M
D] Goal: Gaining access to the wireless network I H H H
1. Detecting passively wireless networks, hidden or not
1.1. Sniffing packets on the air in the local premises of an enterprise (“leeching access”) I H H H
1.1.1. Using tools like Kismet or NetStumbler for sniffing L2 data on the air and I H H H
detecting wireless networks
Annex Table 3 – Attack trees presenting extra threats to VoIP systems due to wireless VoIP
3. Threats coming with support of external mobile workers
Additional threats to the VoIP system due to the support of mobile workers with IP devices out of the premises of the enterprise    Int./Ext. Likelihood Impact Risk
A] Goal: Gaining unauthorised access to the private enterprise VoIP network E M H M+
1. Infiltrating in the VoIP network as a legitimate mobile worker
1.1. Stealing the mobile device of an external mobile worker and impersonating them E M H M+
1.1.1. Logging in as the legitimate SIP user E M H M+
1.2. Using the mobile device of an external mobile worker without their knowledge (e.g., E M H M+
laptop left unprotected for a while in a conference room…) and impersonating them
1.2.1. Logging in as the legitimate SIP user E M H M+
1.3. Using the mobile device of an external mobile worker with their knowledge (e.g., E L H L
laptop lent to colleagues or friends…) and impersonating them
1.3.1. Logging in as the legitimate SIP user E L H L
B] Goal: Eavesdropping E H H H
1.1. Exploiting the lack of use of encryption of call conversations by mobile workers E H H H
1.1.1. Sniffing conversations on the air with appropriate software tools like E H H H
Kismet or NetStumbler
2. Traffic capture
2.1. Same as 1.1. E M H M+
2.1.1. Same as 1.1.1. E M H M+
C] Goal: Interception and modification E M H M+
1.1. Gaining unauthorized access and performing a Man-in-the-Middle attack as if the E M H M+
attacker were in the premises of the enterprise (see Annex Table 2)
2.1. Gaining unauthorized access, performing a Man-in-the-Middle attack as if the attacker E L H L
were in the premises of the enterprise (see Annex Table 2) and inserting words, phrases,
sound effects into legitimate ongoing conversations
2.1.1. Getting access to the RTP stream of a conversation (not necessarily MITM E L H L
attack)
2.1.1.1. Using software tools like rtpinsertsound/rtpmixsound E L H L
D] Goal: Intentional interruption of Service E M H M+
1. Launching DoS attacks against the enterprise VoIP network from the mobile device
1.1. Gaining unauthorised access to the VoIP network (see A]) E M H M+
E] Goal: Social threat E M H M+
1. Misrepresentation of a legitimate user
1.1. Gaining unauthorized access and impersonating the victim mobile worker in E M H M+
telephone conversations (see A] for unauthorized access)
2. Theft of services
2.1. Gaining unauthorized access and impersonating the victim mobile worker in order to E M H M+
get access to all the services the legitimate worker is allowed (see A] for unauthorized
access)
F] Goal: Service abuse E M H M+
1. All types of service abuse (Call conference abuse, Premium Rate Service Fraud, Improper
bypass…)
2.1. Gaining unauthorized access by stealing the mobile device of an external mobile E M H M+
worker
G] Goal: Unintentional Denial of Service E M H M+
1. Making it impossible for mobile workers to connect to the enterprise VoIP network
1.1. Visited networks denying the creation of IPsec VPNs E M H M+
1.2. Home network denying the access of external mobile workers (problem with the VPN E L H L
gateway)
Annex Table 4 – Attack trees presenting threats to VoIP systems due to mobility support
4. Some definitions of attacks against VoIP systems
• MAC spoofing
MAC spoofing is the theft of the identity of a computer: the attacker alters the MAC address of their own Network
Interface Controller (NIC) to match that of the victim. For example, intruders can change the MAC address of their
computer to gain access to an IP network as an authorized user, whether or not this authorized user is already
connected to the network.
• MAC flooding
MAC flooding is a technique used to compromise the security of network switches. In such an attack, a switch
is flooded with packets, each containing a different source MAC address. The intention is to consume the limited
memory set aside in the switch to store the MAC address-to-physical port translation table. As a result, the switch
enters a state called fail-open mode, in which all incoming packets are broadcast out on all ports.
A malicious user could then use a packet sniffer running in promiscuous mode to capture sensitive data from other
computers, which would not be accessible were the switch operating normally.
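The fail-open behaviour described above can be illustrated with a toy model. The following is only a minimal sketch, assuming a hypothetical learning switch whose address table holds a fixed number of entries; the class name `ToySwitch`, the capacity and all addresses are illustrative and do not describe any real product.

```python
class ToySwitch:
    """Toy model of a learning switch with a bounded MAC table (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical table size
        self.table = {}           # MAC address -> port

    def learn(self, src_mac, port):
        # A source address is learned only while the table still has room.
        if src_mac in self.table or len(self.table) < self.capacity:
            self.table[src_mac] = port

    def forward(self, dst_mac, ports):
        # Known destination: a single port. Unknown destination: flood to all ports.
        if dst_mac in self.table:
            return [self.table[dst_mac]]
        return list(ports)


switch = ToySwitch(capacity=4)
ports = [1, 2, 3]
# Frames with forged source addresses arriving on port 3 fill the table.
for i in range(4):
    switch.learn(f"de:ad:be:ef:00:{i:02x}", 3)
# A legitimate host on port 1 can no longer be learned ...
switch.learn("00:11:22:33:44:55", 1)
# ... so frames addressed to it are flooded to every port.
flooded = switch.forward("00:11:22:33:44:55", ports)
```

In this model the frame for the legitimate host reaches every port, which is the condition a sniffer on any port can exploit.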
• ARP Spoofing/Poisoning
ARP spoofing (or ARP poisoning) is a technique used to attack an Ethernet network which may allow an
attacker to sniff data frames on a switched local area network (LAN) or stop the traffic altogether (Denial of Service
attack). The principle of ARP spoofing is to send fake, or ‘spoofed’, ARP messages to an Ethernet LAN. These
frames contain false MAC addresses, which corrupt the ARP caches of the devices receiving them. As a result, frames
intended for one machine can be mistakenly sent to another machine or to an unreachable host (DoS attack). ARP
spoofing can also be used in a man-in-the-middle attack, in which all traffic is redirected through the attacker’s host
and analyzed for information. See more details in 3.5.
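Since ARP spoofing shows up as a change in the MAC address bound to an IP address, a monitoring component can look for such changes. The sketch below is an assumption-laden illustration: the function name `find_arp_conflicts` and the list of observed (IP, MAC) pairs are hypothetical, and a changed binding can also have a legitimate cause such as a replaced network card.

```python
def find_arp_conflicts(observations):
    """Return (ip, old_mac, new_mac) for every IP address whose MAC binding changes."""
    bindings = {}   # IP address -> last MAC address seen for it
    conflicts = []
    for ip, mac in observations:
        known = bindings.get(ip)
        if known is not None and known != mac:
            conflicts.append((ip, known, mac))
        bindings[ip] = mac
    return conflicts


# Hypothetical sequence of (IP, MAC) pairs taken from observed ARP replies.
observed = [
    ("192.0.2.1", "00:11:22:33:44:55"),   # gateway, first seen
    ("192.0.2.10", "00:aa:bb:cc:dd:01"),  # IP phone
    ("192.0.2.1", "de:ad:be:ef:00:01"),   # same IP, different MAC: suspicious
]
conflicts = find_arp_conflicts(observed)
```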
• STP attacks
The Spanning Tree Protocol (STP) prevents the creation of loops in a redundant switched network
environment. STP builds a hierarchical tree-like topology with a root switch at the top. The switch with the lowest
configured priority is elected as root. When a switch boots up, it begins a process of identifying other
switches and determining the root bridge. After a root bridge is elected, the topology is established from its
perspective of the connectivity. The switches determine the path to the root bridge, and all redundant paths are
blocked. STP sends configuration and topology change notifications and acknowledgments using bridge protocol
data units (BPDUs). An STP attack involves an attacker spoofing the root bridge in the topology. The attacker
broadcasts an STP configuration change BPDU in an attempt to force an STP recalculation. The BPDU
announces that the attacker’s system has a lower bridge priority. The attacker can then see a variety of frames
forwarded from other switches to it. STP recalculation may also cause a denial-of-service (DoS) condition on the
network by causing an interruption of 30 to 45 seconds each time the root bridge changes.
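The election rule explains why announcing a lower priority is enough to take over the root role. The following sketch is a simplification: it only compares bridge identifiers (priority first, MAC address as tie-breaker) and ignores BPDU exchange and timers; the bridge names, priorities and addresses are made up for illustration.

```python
def elect_root(bridges):
    """Return the bridge with the lowest (priority, MAC) identifier."""
    # MAC addresses are compared as equal-length lowercase hex strings.
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))


bridges = [
    {"name": "core", "priority": 4096, "mac": "00:00:00:00:00:0a"},
    {"name": "access-1", "priority": 32768, "mac": "00:00:00:00:00:0b"},
    {"name": "access-2", "priority": 32768, "mac": "00:00:00:00:00:0c"},
]
root_before = elect_root(bridges)["name"]

# A rogue device announcing priority 0 wins the next election.
bridges.append({"name": "rogue", "priority": 0, "mac": "00:00:00:00:00:ff"})
root_after = elect_root(bridges)["name"]
```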
• VLAN hopping attack
VLAN hopping (Virtual Local Area Network hopping) is a method of attacking a network by sending packets
to a port that is not normally accessible from a given end system. A VLAN hopping attack can occur in either of two
ways: switch spoofing and double tagging.
Switch spoofing can occur if a network switch is set for autotrunking; in that case, an attacker turns it into a
switch that appears as if it has a constant need to trunk (that is, to access all the VLANs allowed on the trunk port).
In double tagging, the attacker transmits data through one switch to another by sending frames with two 802.1Q
tags, one for the attacking switch and the other for the victim switch. This fools the victim switch into thinking that
the frame is intended for it. The target switch then sends the frame along to the victim port. VLAN hopping can be
used to steal passwords and other sensitive information from specific network subscribers. VLAN hopping can also
be used to modify, corrupt, or delete data, install spyware or other malware programs, and propagate viruses,
worms, and Trojans throughout a network.
• IP spoofing
IP address spoofing is the creation of Internet Protocol (IP) packets with a forged (spoofed) source IP
address.
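One case of a forged source address can be recognised at the network border: a packet that arrives from outside but claims an internal source address. The following is a minimal sketch of such a check; the address block `10.0.0.0/8` and the function name `is_spoofed_internal` are assumptions made for the example, not values taken from this report.

```python
import ipaddress

# Assumed internal address block of the enterprise (illustrative value).
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


def is_spoofed_internal(src_ip, arrived_from_outside):
    """True if a packet from outside claims a source address from the internal block."""
    return arrived_from_outside and ipaddress.ip_address(src_ip) in INTERNAL_NET


external_with_internal_source = is_spoofed_internal("10.1.2.3", True)
internal_packet = is_spoofed_internal("10.1.2.3", False)
ordinary_external_packet = is_spoofed_internal("198.51.100.7", True)
```

Such a check only catches this particular case; a forged external source address passes it unchanged.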
• ICMP Redirect attack
ICMP (Internet Control Message Protocol) Redirect messages (RFC 792) were introduced to make routing
more efficient. They are messages which tell a host to update its routing information, i.e. to send packets on an
alternate route. For example, let us assume that a host A tries to send data through a router R1 and then through
another router R2 to reach host B; if a direct path from host A to R2 is available, an ICMP Redirect message sent by
router R1 will inform the host of such a route.
The ICMP Redirect attack consists of sending to host A a spoofed ICMP Redirect message which contains the
IP address of the attacker, presented as the IP address of the best gateway through which to send the packets. That
way, host A adjusts its routing and sends IP packets to host B through the attacker. The attacker can then receive
and block these packets, causing a DoS attack against host B, or receive and forward these packets to host B,
performing a MITM attack. The spoofed ICMP Redirect message can also contain an unreachable IP address instead
of the attacker’s IP address. In that case, IP packets sent by host A are lost and host B is the victim of a DoS attack.
• Route injection
When dynamic routing protocols are used in campuses or large enterprises, configuration errors can occur
which make it possible to inject fake routes and thus to divert IP packets from their intended destination.
• DHCP starvation
A DHCP starvation attack works by broadcasting DHCP requests with spoofed MAC addresses to the DHCP
server. This is easily achieved with attack tools such as gobbler. If enough requests are sent, the network attacker can
exhaust the address space available to the DHCP servers for a period of time.
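The exhaustion of the address space, and the effect of limiting requests per switch port, can be shown with a small simulation. This is a sketch under stated assumptions: `ToyDhcpPool`, the pool of four addresses and the per-port quota are hypothetical, and the quota here simply refuses further leases rather than shutting down the port.

```python
class ToyDhcpPool:
    """Toy address pool with an optional per-port lease quota (illustrative only)."""

    def __init__(self, addresses, per_port_limit=None):
        self.free = list(addresses)
        self.leases = {}      # MAC address -> leased IP address
        self.per_port = {}    # switch port -> number of leases granted
        self.per_port_limit = per_port_limit

    def request(self, mac, port):
        if mac in self.leases:
            return self.leases[mac]
        count = self.per_port.get(port, 0)
        if self.per_port_limit is not None and count >= self.per_port_limit:
            return None       # this port has used up its quota
        if not self.free:
            return None       # pool exhausted
        address = self.free.pop(0)
        self.leases[mac] = address
        self.per_port[port] = count + 1
        return address


def run(per_port_limit):
    pool = ToyDhcpPool([f"192.0.2.{i}" for i in range(10, 14)], per_port_limit)
    for i in range(4):
        # Requests with forged MAC addresses, all arriving on port 7.
        pool.request(f"de:ad:be:ef:00:{i:02x}", port=7)
    # A legitimate phone on port 1 asks for an address afterwards.
    return pool.request("00:11:22:33:44:55", port=1)


unprotected = run(per_port_limit=None)
protected = run(per_port_limit=2)
```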
• DHCP Rogue server
A DHCP Rogue server is an unauthorized DHCP server installed in a network by an attacker and
which acts as a legitimate one. This DHCP Rogue Server can be the basis of attacks in a network. For example, a
rogue DHCP server could provide legitimate clients with fake IP addresses that prevent the users from
communicating on the network. A DoS condition then results, and users are unable to connect to network resources
to perform their work.
• Ping Flood
A broadcast storm of pings overwhelms the target system so it cannot respond to legitimate traffic.
• SYN Flood
A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a
target system. This is a well-known type of attack and is generally not effective against modern networks. It works if
a server allocates resources after receiving a SYN packet, but before it has received the ACK packet. There are two
methods, both of which result in the server not receiving the ACK packet. In the first method, a malicious client skips
sending this last ACK packet. In the second method, the attacker spoofs the source IP address in the SYN packet, so that
the server sends the SYN-ACK to the falsified IP address; that way, the server will never receive the ACK packet.
In both cases the server will wait for the acknowledgement for some time, since simple network congestion could
also be the cause of the missing ACK packet. If these half-open connections bind resources on the server, it may be
possible to take up all these resources by flooding the server with SYN packets. Once all resources set aside for
half-open connections are reserved, no new connections, legitimate or not, can be made, resulting in DoS. Some systems
may malfunction badly or even crash if other operating system functions are starved of resources this way.
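The resource exhaustion described above can be modelled with a bounded set of half-open connections. The sketch below is a toy model, not a TCP implementation: `ToyListener`, the backlog size of three and the addresses are assumptions chosen for illustration.

```python
class ToyListener:
    """Toy model of a server with a bounded queue of half-open connections."""

    def __init__(self, backlog):
        self.backlog = backlog
        self.half_open = set()

    def on_syn(self, client):
        # Resources are reserved as soon as the SYN arrives.
        if len(self.half_open) >= self.backlog:
            return False          # no room: the new connection is refused
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        # The handshake completed: the half-open slot is released.
        self.half_open.discard(client)

    def on_timeout(self):
        # After the waiting period, stale half-open entries are dropped.
        self.half_open.clear()


listener = ToyListener(backlog=3)
# Three SYNs that are never followed by an ACK fill the backlog.
for i in range(3):
    listener.on_syn((f"203.0.113.{i}", 40000))
refused = not listener.on_syn(("198.51.100.5", 50000))
listener.on_timeout()
accepted_after_timeout = listener.on_syn(("198.51.100.5", 50000))
```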
• LAND attack
A LAND attack is a DoS attack that consists of sending a specially crafted spoofed packet to a computer, causing
it to lock up. The attack involves sending a spoofed TCP SYN packet (connection initiation) with the target host’s IP
address and an open port as both source and destination. A LAND attack works because it causes the
machine to reply to itself continuously.
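Because the packet carries the same address and port as both source and destination, a packet filter can recognise it with a single comparison. A minimal sketch, with the hypothetical function name `is_land_packet`:

```python
def is_land_packet(src_ip, src_port, dst_ip, dst_port):
    """True if source and destination address and port are identical."""
    return src_ip == dst_ip and src_port == dst_port


land = is_land_packet("192.0.2.20", 5060, "192.0.2.20", 5060)
normal = is_land_packet("192.0.2.30", 40000, "192.0.2.20", 5060)
```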
• Denial of Service
A denial-of-service (DoS) attack is an attack on a computer system or network that causes a loss of service to
users, typically the loss of network connectivity and services by consuming the bandwidth of the victim network or
overloading the computational resources of the victim system. A DoS attack can be perpetrated in a number of ways.
There are three basic types of attack:
1. Consumption of computational resources, such as bandwidth, disk space, or CPU time
2. Disruption of configuration information, such as routing information
3. Disruption of physical network components
• Call hijacking
A call hijacking attack refers to a situation where one of the intended end points of the conversation is replaced
by the attacker. Call hijacking can have consequences in common with eavesdropping attacks (access to confidential
information). Generally, an attacker who is able to conduct a call hijacking attack will turn it into a MITM attack
to avoid raising suspicion. See more details in 5.5.2.
• Registration hijacking
In a registration hijacking attack, an attacker registers with the local SIP Registrar as a legitimate user. See more
details in 5.5.1.1.
• DoS CANCEL/BYE attack
See definition in 5.5.1.2
• SIP INVITE/REGISTER flooding attack
In a SIP INVITE flooding attack, attackers generate large volumes of SIP INVITE requests and send them to a
SIP Proxy Server in order to overwhelm it, for example to make it crash. A SIP REGISTER flooding attack is based on
the same principle but is launched against a SIP Registrar with SIP REGISTER messages.
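A flood differs from normal signalling mainly in its request rate, so a proxy or a device in front of it can count requests per source over a sliding time window. The following sketch illustrates that idea only; `InviteRateMonitor`, the threshold of three requests per second and the source addresses are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class InviteRateMonitor:
    """Sliding-window counter of requests per source (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)   # source -> timestamps of accepted requests

    def allow(self, source, now):
        times = self.history[source]
        # Forget requests that have fallen out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False                     # over the limit: drop or challenge
        times.append(now)
        return True


monitor = InviteRateMonitor(max_requests=3, window_seconds=1.0)
burst = [monitor.allow("203.0.113.9", t) for t in (0.0, 0.1, 0.2, 0.3)]
later = monitor.allow("203.0.113.9", 1.5)
other = monitor.allow("198.51.100.5", 0.3)
```

A per-source counter of this kind does not help against floods with many spoofed sources; it is shown here only to make the rate-based nature of the attack concrete.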
• Session tear-down attack
Session tear-down occurs when an attacker observes the signalling for a call, and then sends spoofed SIP BYE
messages to the participating UAs. Most SIP UAs do not require strong authentication, which allows an attacker to
send properly crafted SIP BYE messages to the two UAs, tearing down the call.
• SSRC Collision attack
Upon receiving packets from two different sources with the same SSRC, RTP is put in a collision situation.
RTP’s collision management mechanism is simple: if a source discovers at any time that another source is using the
same SSRC identifier as its own, it must send an RTCP BYE packet (detailed later in this section) for the old
identifier and choose another random one; if a receiver discovers that two other sources are colliding, it may keep
the packets from one and discard the packets from the other; the two sources are expected to resolve the collision so
that the situation does not last.
Analyzing how RTP handles collisions, two DoS attacks are foreseeable:
• The attacker “steals” the SSRC of one of the peers and sends its own RTP messages to the other peer. Upon
receiving packets with the same SSRC, the receiver chooses to accept packets originating from a single source. The
attacker can thus effectively eject a VoIP user from a session.
• The attacker sends to a victim RTP messages labelled with the victim’s SSRC. The victim is forced to abandon
its current RTP streams in order to choose a new, collision-free SSRC. This results in an interruption of any
conversation the victim user is involved in.
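The source-side rule that the second attack abuses can be written down in a few lines. This is a simplified sketch of the behaviour described above, not an RTP stack: the function `on_rtp_packet` and the SSRC values are hypothetical, and the returned flag merely stands for "an RTCP BYE must be sent for the old identifier".

```python
import random


def on_rtp_packet(own_ssrc, packet_ssrc, rng):
    """Return (ssrc_to_use, bye_required) after seeing a packet with packet_ssrc."""
    if packet_ssrc != own_ssrc:
        return own_ssrc, False            # no collision: nothing changes
    new_ssrc = own_ssrc
    while new_ssrc == own_ssrc:           # pick a different random 32-bit identifier
        new_ssrc = rng.getrandbits(32)
    return new_ssrc, True                 # collision: BYE for the old SSRC, then switch


rng = random.Random(0)
ssrc = 0x1234ABCD
ssrc_after_normal, bye_normal = on_rtp_packet(ssrc, 0x0BADF00D, rng)
# A packet carrying the victim's own SSRC forces the victim to change identifier.
ssrc_after_collision, bye_collision = on_rtp_packet(ssrc, ssrc, rng)
```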
• SPIT
See definition in 6.4.
• Voicemail bombing attack
This attack is also called Vbombing. It refers to the delivery of thousands of voice mail messages to a VoIP
device. This could cause a DoS or service disruption.
• Caller ID spoofing
Caller ID (CID) allows the person being called to see the number or the name of the person calling them, when
they have the correct equipment to receive the Caller ID information. Caller ID is often a value-added service from
the VoIP provider and is sent with the call setup. Caller ID spoofing is an attack in which fake CID information is sent
in order to mislead the called party.
Annex 2 – Comparison of VoIP security recommendations
A comparison of four major reports on VoIP security measures published by different
institutions and authorities is presented below. These reports are:
• “Internet Protocol Telephony & Voice over Internet Protocol – Security Technical Implementation
Guide – version 2” [63] published in April 2006 by the American Defense Information
Systems Agency (DISA)
• “VoIPsec – Studie zur Sicherheit von Voice over Internet Protocol” [58] published in October
2005 by the German Federal Office for Security in Information Technology (BSI)
• “Security considerations for Voice over IP systems” [59] published in January 2005 by the
American National Institute of Standards and Technology (NIST)
• “Security Guidance for Deploying IP Telephony Systems” [62] published in February 2006 by
the American National Security Agency (NSA)
In the comparison table below, a recommendation by a particular institution has been
marked by a √ symbol if it has been expressly mentioned and by NM (Not Mentioned) if it
has not been mentioned.
The security level achieved by the implementation of each of the following
recommendations has been specified by the following acronyms:
Security level Description

m Minimum These recommendations represent the minimum that VoIP
networks should implement; if not implemented, the risk level
is unacceptable.
M Medium These recommendations represent the best practices for the
deployment of a VoIP system in an enterprise. They provide a
level of security as good as that of legacy telephony.
Enterprises should try to implement this level of security.
H Highest These recommendations represent the best way to protect a
VoIP system. However, they might be difficult to implement
because they do not scale or because all products in the market
do not support them, or they might be too expensive.
This comparison table of recommendations has been commented on and evaluated in
Chapter 7. The contradictory recommendations have been coloured in the table in pink and the
common points in the majority of reports have been coloured in turquoise.
Annex Table 5 – Comparison between the recommendations of the DISA, BSI, NIST and NSA reports
Recommendations    DISA BSI NIST NSA Sec. Level
Protection of the underlying infrastructure
Ensure that the underlying IP network supporting the VoIP systems complies with the basic √ √ NM NM all
security rules for data networks
Ensure that the underlying IP network supporting the VoIP systems possesses the adequate √ NM NM NM all
bandwidth, reliability, survivability, and prioritization capabilities
Ensure that the underlying IP network is a switched network NM NM NM √ all
Protection against MAC spoofing and MAC flooding
Use security measures like: static mapping between MAC addresses of terminals or critical NM √ NM √ M+H
VoIP components and switchports, 802.1x authentication …
Ethernet switchports are configured to only allow known MAC addresses OR Ethernet NM NM NM √ M+H
switches only allow traffic that matches the IP address and MAC address assigned to a port
during a DHCP lease
Configure switches to alert administrators when MAC address tables overflow NM NM NM √ M+H
Protection against ARP spoofing
Deactivate Gratuitous ARP in all VoIP end devices and VoIP servers NM √ NM NM all
Deactivate the Proxy ARP mechanism on router interfaces NM √ NM NM all
Protection against DHCP attacks
To prevent DHCP starvation, configure the DHCP server with explicit users or ensure a NM √ NM NM m+M
limit of DHCP requests on switchports which, if exceeded, triggers the shutdown of the
switch port
Protection against STP attacks
Use switches which allow the control of the origin of data packets or the control of the NM √ NM NM all
forwarding of data packets
Anti-spoofing filter
Use access control lists (ACLs) to prevent external attackers from using IP addresses internal to the NM √ NM √ all
network, i.e. from sending packets with an internal source IP address
Protection against Layer 3 attacks
Against IP spoofing, use of anti-spoofing filters in the routers and switch-based ACLs NM √ NM √ all
Against ICMP redirect, deactivate the processing of Redirect messages in VoIP components NM √ NM NM all
like gateways, VoIP servers and IP phones
Against ICMP redirect, enable the ICMP Redirect filters in routers NM √ NM NM all
Against IRDP spoofing, deactivate IRDP protocol on all VoIP components and activate NM √ NM NM all
ACLs (ICMP-type 9) on router interface
Against route injection, ensure that IP terminals as well as unauthorized routers cannot NM √ NM NM all
send protocol information like RIP2 or OSPF messages. Enable ACLs that control and
filter the source and destination address of routing protocol messages. Enable passwords
between routers.
Against Ping, SYN and LAND floods, use packet-filtering systems like firewalls, even if NM √ NM NM all
these systems can also be vulnerable to these attacks
Protection of VoIP servers
Ensure that VoIP servers are dedicated to only applications required for VoIP (do not use √ √ NM √ all
servers for general Internet access, such as email and web browsing)
Ensure that critical VoIP servers have been secured according to the security guidelines of √ √ NM √ all
the operating systems on which they run
Choose to install VoIP servers on operating system versions which have strong security like NM √ NM NM M+H
SINA-Linux or SE-Linux
Deactivate on VoIP servers all unnecessary network services NM √ NM √ all
Ensure that unnecessary functionality on VoIP servers is deactivated (e.g. tear-down of NM √ NM NM all
calls…)
Ensure that the security of critical VoIP servers is evaluated and controlled according to NM √ NM NM M+H
Common Criteria for Information Technology Security Evaluation (ISO/IEC 15408)
Ensure that software on VoIP servers and OS are up-to-date and that new available √ √ NM √ all
software patches can be downloaded and installed
Ensure that software patches for critical VoIP servers come from the system manufacturer √ NM NM √ all
and are applied in accordance with the manufacturer’s instructions
Require that software updates be cryptographically signed by the manufacturer NM NM NM √ all
Ensure that all security alerts applicable to the general-purpose systems and applications √ NM NM NM all
used by VoIP systems are referred to the system manufacturer for approval and patch
distribution
Install anti-virus software on servers and update virus definitions regularly NM NM NM √ all
Limit user accounts on VoIP servers to only the administrators of the server NM NM NM √ all
On VoIP servers, assign to user accounts only the privileges necessary for the user to NM NM NM √ M+H
complete required tasks
Remove or disable unused default accounts on VoIP servers NM NM NM √ all
Ensure that complex and hard to guess passwords are required for server user accounts NM NM NM √ all
and that the number of failed login attempts is limited
Change all default passwords before connecting the VoIP servers to the network NM NM NM √ all
Audit all default configuration settings before connecting the VoIP servers to the network NM NM NM √ all
On VoIP servers, enable system logging and logging of call detail records (CDRs) NM NM NM √ all
Regularly review logs for discrepancies NM NM NM √ M+H
Send all logs to a hardened log server which should only accept log messages from NM NM NM √ M+H
authorized servers
Ensure the use of a secure channel to connect the VoIP and database servers, and limit NM NM NM √ m
access to database server to IP addresses of the VoIP servers
Ensure the use of a dedicated communication channel such as separate physical network, to NM NM NM √ M+H
connect VoIP and database servers
Physical security and protection against data loss of critical VoIP servers
Ensure all critical VoIP network and server components are located in physically secured √ √ √ √ all
areas (in special controlled rooms like server rooms or network-wiring closets). This does
not apply to IP terminals
For high security, all network closets should be under video surveillance NM √ NM √ H
Ensure that the secured areas protecting VoIP network and server components are NM √ NM √ all
protected from power cuts, floods or fire
Ensure that only trusted authorized personnel have access to VoIP network and server √ √ NM √ M
components (by using smartcards, one-time passwords or biometrics). Maintain log of
people entering room
Ensure that critical cabling in the enterprise premises is protected from physical NM √ NM NM all
access by concealing cables or by covering them with protection against fire
Ensure that fire suppression systems are installed in the server rooms NM NM NM √ m+M
Ensure that fire suppression systems safe for electronic equipment are installed in the server NM NM NM √ H
rooms
Install alarms on all entry points to the server room NM NM NM √ H
Disable booting from removable media NM NM NM √ M+H
Enable password protection of BIOS settings NM NM NM √ M+H
Ensure that no water or sewage mains run through server rooms NM NM NM √ all
Ensure that critical VoIP servers are protected from power cuts by short-term backup like NM √ √ √ M+H
Uninterruptible Power Supply (UPS)
Ensure that critical VoIP servers are protected from power cuts by long-term backup like NM NM √ √ H
electrical generators for long power outages
Ensure that all network equipment has configurations and software backed up regularly NM √ NM √ all
(for recovering from availability attacks)
Ensure that backup and recovery policy is in place NM NM NM √ all
Ensure that backup and recovery processes have been tested NM NM NM √ all
Ensure that full backups are performed weekly NM NM NM √ all
Ensure that incremental backups are performed daily NM NM NM √ M+H
Ensure that backups are periodically archived NM NM NM √ M+H
Ensure that backups are stored in an environmentally controlled and locked room NM NM NM √ all
Ensure that backups are stored in an encrypted form NM √ NM NM all
Ensure that backups are encrypted when not under physical control of the enterprise NM NM NM √ all
For highest security, keep configuration and software backups off site NM NM NM √ H
Ensure that VoIP servers use RAID disk mirroring and are configured with hot standbys NM NM NM √ m
Ensure that servers use RAID 5 disk mirroring and striping, have redundant power NM NM NM √ M+H
supplies, have ECC memory, are configured with hot standbys and that a spare server is
always available to replace a failed server
Redundancy of critical VoIP components
Create a redundant sub-network containing all redundant critical network and VoIP NM √ NM NM M+H
components like DHCP server, TFTP server, FTP server, IP-PBXs, accounting servers,
Firewalls and VoIP media/signalling gateways.
Ensure that the VLAN with VoIP servers, DHCP server, gateway, IP-PBX, accounting NM √ NM NM M+H
server and its redundant VLAN are protected by their own firewall and are connected to the
VoIP VLAN with IP terminals via two distinct switches which are not connected by a third
switch representing a single point of failure
Physical security of IP phones
Ensure that IP phones support Power-over-Ethernet (PoE) supply (PoE is standardized by NM √ NM √ all
802.3af)
Enable security mechanisms in all infrared (IrDA) applications NM NM NM √ m
Disable IrDA port in IP phones’ configurations NM NM NM √ M+H
Cover IrDA port on phones using metallic tape NM NM NM √ H
Protection of IP phone software
Ensure that all unnecessary features and applications are disabled on IP phones NM √ NM √ all
Ensure that IP phones have been secured according to the security guidelines of the NM √ NM √ all
operating systems on which they run
Ensure that security updates are regularly applied NM NM NM √ all
For IP phones with automatic firmware and configuration files download, ensure that IP NM √ NM √ all
phones download firmware updates after authentication. CRC check is not sufficient.
Authentication based on digital signature from hardened servers is recommended
Use redundant and physically dispersed servers for distributing firmware and configuration NM NM NM √ M+H
files
Do not allow users to install software on their IP phones NM NM NM √ H
Block direct Internet access from IP phones NM NM NM √ all
Allow on IP phones only features that have been tested in laboratory environments NM NM NM √ m
Limit applications on IP phones to those that have been tested in laboratories, NM NM NM √ M
cryptographically signed by the enterprise and made available through an internal server
Ensure that embedded microphones have been physically disabled NM NM NM √ M+H
Use dedicated phones which physically disconnect the microphone when the handset is in NM NM NM √ M
the cradle
Use push-to-talk handsets or headsets NM NM NM √ H
For high security areas, use IP phones which have been evaluated against the Common Criteria for NM √ NM NM H
Information Technology Security Evaluation (ISO/IEC 15408)
VoIP phones registration, access control and authentication
For large VoIP deployments, automatic registration of VoIP phones with the IP-PBX (VoIP √ NM NM no m+M
servers) is allowed. However…
Ensure that automatic registration of VoIP terminals is disabled within 5 days following
initial system setup and/or following any subsequent large redeployments or additions
Disable automatic registration capabilities of IP phones with the IP-PBX (VoIP servers) no NM NM √ all
Ensure that an inventory of authorized VoIP terminals is documented and maintained √ NM NM NM m
Ensure that the VoIP server only registers authorized VoIP phones. This can be through an √ NM NM NM m
automated authorization process during automatic registration or by comparing the
registration logs to the documented authorized inventory allowed to auto-register.
Ensure that DHCP is enabled for all phones with link layer authentication NM NM NM √ M
Assign static IP addresses to all phones NM NM NM √ H
Assign static IP addresses to critical servers and phones NM NM NM √ all
For high security, disable DHCP after initial deployment of IP phones NM NM NM √ H
Enable DHCP but only with anti-spoof protections NM √ NM √ m+M
Ensure that a two-way authentication between IP phones and VoIP servers is performed NM NM NM √ M+H
Require users to authenticate themselves to the phone before making calls NM NM NM √ H
Ensure that 802.1X authentication is used by switchports to detect and prevent the NM √ NM √ H
intrusions of attackers in the voice VLAN. Use phones and switches which support 802.1X
authentication
Ensure that IP phones authenticate to server using a challenge-response protocol (e.g. NM NM NM √ m
HTTP-digest) with long and complex passwords
Ensure mutual authentication between phones and VoIP servers is achieved by using strong NM NM NM √ M+H
cryptography (e.g. public keys)
Ensure mutual authentication between phones NM NM NM √ M+H
Data and voice network segregation
Ensure that the underlying data network supporting the VoIP system is configured using √ √ √ √ all
VLANs and configure at least one voice VLAN and one data VLAN to segregate voice
traffic from data traffic
Ensure that the voice network is subdivided into multiple √ √ NM √ M+H
VLANs to segregate VoIP components by type and function.
Subdivide the voice VLAN into 5 VLANs at minimum: IP-PBXs, message servers (voice- √ no NM no M+H
mail, email, unified)/converged services, VoIP gateways, IP phones, and computers with
softphones. In the extra VLANs suggested by DISA, there is a VoIP components
management VLAN
Subdivide the voice VLAN into 2 VLANs: “producing VLAN” with VoIP servers and VoIP √ √ NM √ M+H
gateways and a “consuming VLAN” with IP phones
Subdivide the voice VLAN into 5 VLANs: VoIP servers (IP-PBX, AAA servers, DHCP no no NM √ M+H
servers…), VoIP gateways, IP phones, softphones, and administrative VLANs. Message
servers/convergence services like voicemail, email and unified servers are not placed in
their own VLAN but in a DMZ placed between the voice and the data VLANs. A new VLAN
is not necessary (if servers are placed in the same sub-network and not geographically
distributed)
Ensure that servers or devices that are to be accessed from both the voice and data 9 NM NM no m+M
networks like unified message servers or workstations with softphones reside in their own
protected VLANs.
Place mutually accessible servers (message servers, converged services…) in the DMZ of a 9 NM NM 9 m+M
dedicated stateful firewall placed between the voice and data networks per voice/data
network protection requirements
Install a stateful layer 3&4 firewall between the data, voice and converged services 9 NM NM 9 m
networks
Install a firewall between the data, voice and converged services networks which performs 9 NM NM 9 M
stateful layer 3&4 and application layer filtering
Use strong authentication with all converged services NM NM NM 9 m+M
For high security, do not allow IP phone interaction with non-telephony devices NM NM NM 9 H
Scan for malicious code in the DMZ between voice and data networks and on the firewall NM NM NM 9 M
between the networks
Ensure that all VoIP components are deployed on their own dedicated IP networks or 9 NM (but Y) 9 NM (but Y) all
sub-networks that utilize separate IP address blocks from the normal data address blocks
Ensure that all local VoIP components are deployed using private address space (RFC 1918) 9 NM (but Y) 9 NM (but Y) all
Ensure that when using DHCP for address assignment, different DHCP servers are used 9 NM 9 9 all
for voice components and data components and ensure that these servers will reside in their
respective voice or data address space.
Ensure that the voice network and the data network have separate DNS and NTP servers NM NM NM 9 all
For higher security, separate the data and voice networks physically and not only logically NM 9 NM 9 H
Ensure that the local network’s VLANs are implemented in accordance with the VLAN 9 9 NM 9 all
security rules
To eliminate the VLAN hopping attack, use native VLANs, use switches which support NM 9 NM NM all
VACLs (VLAN access control lists) to analyse Ethernet frames, or configure ACLs on the
promiscuous port of routers
Enable port level security (to allow dynamic or static mapping of MAC addresses to VLAN 9 9 NM 9 all
ports) on all switches (for all security levels)
802.1X authentication should be used to authenticate devices in the voice VLAN 9 9 NM 9 H
VMPS (VLAN Management Policy server) should not be used to authenticate devices at the 9 NM NM NM all
access layer in the voice VLAN
Ensure that IP phones (that do not contain a multi-port switch), and servers providing voice 9 9 NM 9 all
services are connected to switchports with membership only to the voice VLANs
Ensure that data workstations (without approved softphones) are connected to switchports 9 9 NM 9 all
with membership only to the data VLANs
For highest security, ensure that all IP phones having a multi-port switch for connecting no NM NM 9 H
external devices such as a workstation have the data ports disabled
For any security level, ensure that all IP phones having a multi-port switch for connecting 9 no NM no all
to external devices which do not utilize 802.1Q Trunking to separate voice and data traffic
have their data ports disabled.
Ensure that, when allowed, all IP phones having a multi-port switch for connecting to 9 NM NM 9 m+M
external devices, support 802.1Q Trunking to separate voice and data traffic
For any security level, ensure that all IP phones with a multi-port switch have the data port no 9 NM no all
disabled
For any security level, ensure that all IP phones with a multi-port switch supporting VLANs 9 no NM no all
have the data port disabled if a PC is not normally attached
Ensure that all unused ports are disabled and are placed in an unused VLAN 9 NM NM NM all
Ensure that the maximum number of MAC addresses that can be dynamically configured 9 NM NM NM m
on a given switch port is limited to that which is required
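The addressing rules of this section (dedicated address blocks for voice, private RFC 1918 address space) lend themselves to an automated check. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and the sample subnets are hypothetical and only serve as illustration.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(subnet: str) -> bool:
    """True if the IPv4 subnet lies entirely inside an RFC 1918 block."""
    net = ipaddress.ip_network(subnet)
    return net.version == 4 and any(net.subnet_of(b) for b in RFC1918_BLOCKS)


def check_addressing(voice_subnets, data_subnets):
    """Return a list of findings; an empty list means no issue was found."""
    findings = []
    for v in voice_subnets:
        if not is_rfc1918(v):
            findings.append(f"voice subnet {v} is not in RFC 1918 space")
        for d in data_subnets:
            if ipaddress.ip_network(v).overlaps(ipaddress.ip_network(d)):
                findings.append(f"voice subnet {v} overlaps data subnet {d}")
    return findings


if __name__ == "__main__":
    # Hypothetical plan: one voice subnet, one data subnet.
    for finding in check_addressing(["10.20.0.0/24"], ["10.10.0.0/16"]):
        print(finding)
```

The explicit block list is used instead of the module's `is_private` attribute because the latter also covers ranges outside RFC 1918, such as loopback and link-local addresses.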
IP softphones
Ensure that approval from the IT department is obtained prior to the use of any IP 9 NM NM NM all
softphone. The IT department will maintain documentation pertaining to such approval for
inspection by auditors
Ensure that devices on which IP softphones are installed comply with all security rules 9 NM NM NM all
related to OS, Applications…
Ensure a local IP softphone policy exists and is being enforced 9 NM NM NM all
IP softphone policy
For medium and high security, prohibit the installation and use of IP softphones on no NM no 9 M+H
workstations (fixed or portable) intended for day-to-day use in the user’s normal workspace
For any security level, allow the installation and use of IP softphones on workstations 9 NM 9 no all
(fixed or portable) intended for day-to-day use in the user’s normal workspace only
exceptionally
For the minimum security level, the installation of softphones on workstations 9 NM 9 9 all
(fixed or portable) intended for day-to-day use in the user’s normal workspace can be accepted if a
special VLAN dedicated to softphones has been previously created
Prohibit the use of IP softphones in the user’s normal workspace even if they have been 9 NM NM 9 M+H
approved and installed on a laptop for the purpose of VoIP communications while
travelling
Require prior justification and approval for the use of any IP softphone 9 NM NM NM m
Require that the justification and approval of IP softphone use is reviewed annually and 9 NM NM NM m
that approval is renewed if justified
Prohibit the installation and use of IP softphones that are independently configured by end 9 NM NM NM M+H
users for personal use or that are provided by commercial VoIP service providers.
When the use of softphones in the enterprise LAN is approved, the host computer must 9 NM NM 9 m
contain a Network Interface Card (NIC) that is 802.1Q (VLAN tagging) and 802.1P
(Priority tagging) capable
When the use of softphones in the enterprise LAN is approved, the host computer, the NIC 9 NM NM 9 m
and the softphone must be configured to use separate 802.1Q VLAN tags for voice and
data.
When the use of softphones in the enterprise LAN is approved, dual NICs may be used 9 NM NM NM m
where voice traffic is routed to one NIC and data traffic is routed to the other. Each NIC is
connected to an access switch port residing in the appropriate VLAN
When the use of softphones in the enterprise LAN is approved, the host computer will be 9 NM NM 9 m
connected to a separate data VLAN dedicated to hosts with IP softphones installed
When the use of softphones in remote places is approved, the host computer connects to the 9 NM NM NM m
enterprise “home LAN” through a Virtual Private Network (VPN) connection.
When the use of softphones in remote places is approved, the VPN is terminated at the 9 NM NM NM m
network boundary in accordance with the security rules for VPNs
When the use of softphones in remote places is approved, the voice and data traffic is 9 NM NM NM m
routed appropriately to separate voice and data
VLANs in the enterprise “home LAN”
When the use of softphones in remote places is approved, the IP softphone connects to the 9 NM NM NM m
IP-PBX on the enterprise “home LAN” through the VPN using private enterprise “home
LAN” IP addressing
VoIP network protection and internal traffic control
Implement a Network Intrusion Detection System (NIDS) and connect a sensor to a switch NM 9 9 9 M+H
port of every critical switch to filter and control traffic. Connecting sensors to
switches is possible if the switches provide Switched Port Analyzer (SPAN) functionality
Ensure that the NIDS monitors both VoIP and data networks NM NM NM 9 M+H
Install an NIDS sensor in front of and behind the firewall protecting the enterprise network NM 9 NM NM M+H
from the Internet
For highest security, ensure that no network traffic between voice VLANs and data VLANs 9 NM NM 9 H
is allowed
Ensure that voice or data traffic between the data and voice VLANs is filtered and 9 9 NM 9 m+M
controlled by an appropriate firewall, so that only traffic which is planned and approved
between authorized devices using approved ports, protocols, and services is permitted
Data or voice traffic between the data and voice VLANs should be controlled by a layer 9 9 NM 9 m
3&4 stateful firewall; filtering traffic is not required at the application layer
Data or voice traffic between the data and voice VLANs should be controlled by an NM 9 NM 9 M
application firewall
Ensure that traffic between voice VLANs is filtered and controlled by a layer 3 9 9 NM 9 all
switch/router ACL or a layer 3&4 stateful firewall
Ensure that traffic between the VLAN containing mutually accessible servers or 9 NM NM 9 m
devices (hosts with softphones) to and from the voice VLANs or the data VLAN is filtered
and controlled by a layer 3&4 stateful firewall so that VoIP traffic is allowed from and to
the VoIP servers VLAN and the IP phones VLAN, and that data traffic is allowed to and
from the data VLAN. This firewall will block traffic between the voice and data VLANs
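The inter-VLAN filtering rules of this section amount to a default-deny policy: only planned and approved flows pass, and at the highest security level nothing crosses the voice/data boundary. The sketch below models that logic in a few lines; the VLAN names, the approved flows and all identifiers are assumptions made for illustration and do not describe any particular firewall product.

```python
from typing import NamedTuple

# Hypothetical VLAN names used only for this illustration.
VOICE_VLANS = {"ip-phones", "voip-servers", "voip-gateways"}
DATA_VLANS = {"data"}


class Flow(NamedTuple):
    src_vlan: str
    dst_vlan: str
    protocol: str
    dst_port: int


# Hypothetical allow-list of planned and approved flows.
APPROVED_FLOWS = {
    Flow("ip-phones", "voip-servers", "udp", 5060),  # SIP signalling
    Flow("data", "voip-servers", "tcp", 443),        # e.g. HTTPS user portal
}


def crosses_voice_data_boundary(flow: Flow) -> bool:
    """True if the flow goes from a voice VLAN to a data VLAN or back."""
    return ((flow.src_vlan in VOICE_VLANS and flow.dst_vlan in DATA_VLANS) or
            (flow.src_vlan in DATA_VLANS and flow.dst_vlan in VOICE_VLANS))


def is_permitted(flow: Flow, high_security: bool = False) -> bool:
    """Default deny: a flow passes only if it is explicitly approved.

    With high_security=True, no traffic between voice and data VLANs
    is allowed, whatever the allow-list says.
    """
    if high_security and crosses_voice_data_boundary(flow):
        return False
    return flow in APPROVED_FLOWS
```

For example, `is_permitted(Flow("data", "voip-servers", "tcp", 443))` is accepted at the minimum and medium levels but rejected once `high_security=True` is set.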
Protection from external calls made by employees of the enterprise (from home, other enterprise
branch or on the road)
Ensure that all calls into and out of the enterprise VoIP network are routed via a media 9 NM NM 9 all
gateway to the traditional TDM networks like DSN or PSTN. An exception is made for
approved remote VoIP terminals and softphones that connect to the enterprise VoIP
network via a VPN and are therefore part of the VoIP network
Ensure that the VPN protects the WAN link and supports the separation of voice and data NM NM NM 9 all
traffic either by supporting VLANs or creating individual VPNs for each network
Ensure that all mobile employees or employees from another site connect to the enterprise 9 9 NM 9 all
voice VLANs through a VPN and that the voice packets coming from outside the enterprise
pass through a firewall before they reach the voice VLANs
Ensure that interoffice VPNs respect and maintain the separation of voice and data traffic 9 9 NM 9 all
Keep logs (syslog) of all external connections to the enterprise voice VLANs NM 9 NM 9 all
Ensure that approval is obtained prior to the implementation of IP Trunking connections 9 NM NM NM m
from the enterprise VoIP network to the WAN. Documentation must be maintained about
approvals for inspection by auditors
Ensure that VoIP-aware firewalls are deployed in the enterprise VoIP network at the 9 9 NM NM all
boundary with a WAN so that WAN connections can provide VoIP call connectivity. Such
firewalls must employ stateful packet inspection and dynamic port mapping
Ensure that NAT is implemented at the VoIP network WAN connection point (to maintain 9 9 NM NM all
the private addresses scheme on the VoIP LAN)
Ensure all VoIP security perimeter firewalls are dedicated to VoIP traffic to reduce 9 NM NM NM all
transmission latency caused by firewall operations
Ensure MS-SQL (port 1433) is blocked at the VoIP network perimeter 9 NM NM NM all
Ensure the Network Time Protocol (NTP) (port 123) is blocked at the VoIP network 9 NM NM NM all
perimeter
Ensure Terminal Services or Remote Desktop Protocol (port 3389) is blocked at the 9 NM NM NM all
network perimeter or that these connections are encrypted.
Ensure that all remote HTTP access to the VoIP network perimeter firewalls is proxied. 9 NM NM NM all
HTTP access from the VoIP network, if required, should route through the data network.
Additionally, HTTPS should be used instead of HTTP if possible
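The port-related perimeter rules above (MS-SQL, NTP, Terminal Services) can be summarised as a small lookup. The following sketch is only an illustration of the decision logic; the function name and the three verdict strings are hypothetical, and a real deployment would express the same rules in the firewall's own configuration language.

```python
# Ports that the rules above require to be blocked at the VoIP perimeter.
BLOCKED_PERIMETER_PORTS = {
    1433: "MS-SQL",
    123: "NTP",
    3389: "Terminal Services / Remote Desktop Protocol",
}


def perimeter_verdict(dst_port: int, encrypted: bool = False) -> str:
    """Return 'block', or 'continue' to hand over to the remaining rules.

    Port 3389 is the only exception: the rule accepts it when the
    connection is encrypted, so it is then passed on for further checks.
    """
    if dst_port == 3389 and encrypted:
        return "continue"
    if dst_port in BLOCKED_PERIMETER_PORTS:
        return "block"
    return "continue"
```

A verdict of "continue" does not mean the packet is accepted; it only means that these three rules do not reject it.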
Protection from external calls made by any external caller
For the control of external calls, ensure that traffic between VLANs is filtered and NM 9 9 NM all
controlled by a firewall in which an Application Level Gateway (ALG) is embedded.
Prefer firewalls with an ALG over plain stateless or stateful firewalls
For the control of external calls, ALG-based firewalls are not the only solution to adopt: NM 9 9 9 all
Session Border Controllers or other standards-based solutions can be used instead
Determine the number of incoming external calls the VoIP network can handle and still NM NM NM 9 M+H
adequately support internal calls. Use network perimeter devices to allocate bandwidth
only sufficient for that number of external calls (protection against DoS)
Limit the number of external calls accepted by the IP-PBX NM NM NM 9 M+H
Use routers to prioritize VoIP VLAN traffic over data VLAN traffic NM NM NM 9 M+H
Use anti-virus software and promptly apply software security policies NM NM NM 9 all
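The DoS-related rule above (allocate only as much perimeter bandwidth as the planned number of external calls requires) implies a small calculation. The sketch below assumes G.711 (64 kbit/s) with 20 ms packetization and 40 bytes of IPv4/UDP/RTP headers per packet, which gives 80 kbit/s per call and direction at the IP layer; layer-2 overhead, other codecs and silence suppression are deliberately ignored. All function names and the sample link figures are hypothetical.

```python
def ip_bandwidth_per_call_bps(codec_bitrate_bps: float = 64_000,
                              packet_interval_ms: float = 20,
                              header_bytes: int = 40) -> float:
    """IP-layer bandwidth of one call in one direction, in bit/s.

    Defaults assume G.711 with 20 ms packets and IPv4 (20) + UDP (8)
    + RTP (12) header bytes per packet.
    """
    packets_per_second = 1000 / packet_interval_ms
    payload_bytes = codec_bitrate_bps / 8 / packets_per_second
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def bandwidth_for_external_calls(call_count: int, per_call_bps: float) -> float:
    """Bandwidth to allocate at the perimeter for a given number of calls."""
    return call_count * per_call_bps


def max_external_calls(link_bps: float, reserved_internal_bps: float,
                       per_call_bps: float) -> int:
    """Largest number of external calls that fits beside the internal reserve."""
    available = link_bps - reserved_internal_bps
    return max(0, int(available // per_call_bps))


if __name__ == "__main__":
    per_call = ip_bandwidth_per_call_bps()
    # Hypothetical link: 2 Mbit/s, of which 1.2 Mbit/s is kept for internal calls.
    calls = max_external_calls(2_000_000, 1_200_000, per_call)
    print(per_call, calls, bandwidth_for_external_calls(calls, per_call))
```

The same call limit can then be configured on the IP-PBX, so that the bandwidth allocation and the number of accepted external calls stay consistent.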
Call privacy and confidentiality: protection of media streams

Ensure that all VoIP traffic that is sent over WAN connections via an IP WAN such as 9 9 NM 9 all
the Internet is encrypted via VPNs and that the VPNs respect and maintain the separation
of voice and data traffic
If VoIP terminals are not computationally powerful enough to perform encryption, NM NM 9 NM all
place this burden at a central point to ensure that all VoIP traffic emanating
from the enterprise network has been encrypted
For high confidentiality of internal voice traffic, ensure that VoIP terminals support the NM 9 NM 9 H
Secure RTP (SRTP) protocol for the end-to-end encryption of voice traffic (RTP traffic)
Protection of signalling streams and protection from toll fraud
Ensure that all VoIP components, SIP terminals as well as SIP servers, support the TCP NM 9 NM 9 M+H
and TLS protocols as recommended by RFC 3261, and the sips addressing scheme
Configuration/Management of VoIP components
Ensure that all VoIP systems are managed in accordance with the security rules for system 9 NM NM 9 all
management
For VoIP servers (IP-PBXs)
Ensure all remote administrative connections (in-band or out-of-band) to critical VoIP 9 9 9 9 all
servers are encrypted (use SSH instead of Telnet or FTP, or use IPsec VPNs) and that Telnet
is disabled
For high security, avoid remote use of the Web-based configuration interface altogether. Only NM NM 9 9 H
allow access to the Web-based configuration interface from the IP-PBX (VoIP servers)
itself. Disable remote access.
If practical, avoid remote use of the Web-based configuration interface altogether NM NM 9 NM all
Limit access to Web-based configuration interfaces to the IP addresses of administrator NM NM NM 9 m
workstations
Isolate web interfaces on a separate administrative network NM NM NM 9 M
Ensure that the remote Web-based configurations of network and VoIP systems are made NM 9 NM 9 m+M
by using the TLS protocol and the https addressing scheme
Disable web browser password caching features unless the passwords are strongly NM NM NM 9 all
encrypted
Regularly apply security updates to the web server and web applications NM NM NM 9 all
Ensure VoIP firewall administrative/management traffic is blocked at the perimeter, or is 9 NM NM 9 M
tunnelled and encrypted using VPN technology at the network perimeter (e.g. for the vendor’s
remote configuration), or is out-of-band
Do not allow remote management of VoIP servers by the vendor NM NM NM 9 H
Use a dedicated connection, such as a PSTN or ISDN line, for vendor access to VoIP NM NM NM 9 m+M
servers and encrypt traffic over the dedicated connection
Physically disable the vendor’s connection when not in use NM NM NM 9 m+M
Require that the administrator initiate the connection to the vendor NM NM NM 9 M
Monitor and log all actions performed by the vendor on the server NM NM NM 9 m+M
For IP phones
Configuration of the IP phone at the terminal should be password-protected 9 9 NM NM m
Configuration of the IP phone at the terminal should rarely be used, even if no 9 NM no m
password-protected; if not necessary, configuration at the terminal should be deactivated
The web interface for IP phone configuration should be deactivated if users can access NM no NM 9 H
necessary phone features through the phone’s display and administrators can configure
phone settings using downloaded configuration files
If configuration through the web interface is necessary (more features), access to the NM NM NM 9 H
web interface should be limited to physical ports on the IP phone. That is, access to the web
interface should only be available through a direct connection from the user’s computer, which
can connect directly to the IP phone using the phone’s PC port or USB
For the Web-based management of IP phones through access to the Web server integrated NM NM NM 9 m
in the IP phones, use router or switch ACLs to limit access to the phone’s web interface to
authorized IP addresses only
For Web-based management, ensure that HTTP-Digest password authentication NM 9 NM 9 m
(authentication with the integrated Web server) is required
Ensure that a policy is in place to ensure that the IP phone configuration and display 9 NM NM NM all
password respects the password policies (e.g. password complexity, expiration, reuse,
protection and storage)
For Web-based management, if the IP-PBX has the option to perform all phone configuration NM NM NM 9 M+H
on a central server and have the phone automatically download new signed configuration
files, disable remote access to the phone’s web interface
For Web-based management, ensure that the use of strong passwords is required and that NM 9 NM 9 M
the remote connection to the IP phone is encrypted with TLS using the https addressing
scheme
For Web-based management and for high security requirements, client NM 9 NM NM H
certificates could be used for the authentication of authorized users
Automatic IP phone configuration through the “pull” of configuration files from TFTP NM 9 NM NM m
servers should not be adopted and should be deactivated
Automatic IP phone configuration through the “pull” of configuration files from HTTPS NM 9 NM NM M
servers should require the authentication of the HTTPS server with a certificate which can
be validated by the IP phone before the download of the configuration file
Ensure that SNMP v3 is used to manage the phone settings NM NM NM 9 m+M
Disable SNMP or set read-only if SNMP is necessary NM NM NM 9 H
Use router or switch ACLs to limit access to the phone’s SNMP port to authorized IP NM NM NM 9 all
addresses only
Voicemail services
Ensure text-to-speech is disabled if the voice mail platform is configured to interact with a 9 NM NM NM all
legacy corporate email system and both systems are not collocated in the same or adjoining
VLANs as required under the VLAN section above.
Ensure that the operating system on which the Voice Mail Server is running is properly 9 NM NM NM all
secured
Ensure the application services (SQL, Apache, Oracle, etc.) supporting the voice mail 9 NM NM NM all
service are properly secured
Ensure that users can only change their voice mail settings via the phone interface or 9 NM NM NM all
through an SSL connection. HTTP and Telnet services will be disabled on the voice mail
platform
Wireless VoIP
Ensure that, if wireless VoIP is used, the security requirements specified in the WLAN security 9 NM NM 9 all
policy for wireless connections are applied to the wireless VoIP environment
Ensure that wireless VoIP terminals implement WiFi Protected Access (WPA) rather than NM 9 9 NM all
802.11 Wired Equivalent Privacy (WEP)
Disable WLAN access points built into phones NM NM NM 9 M+H
Disable wireless PAN (Personal Area Network) connectivity to IP phones NM NM NM 9 M+H
Ensure that approval is obtained prior to the implementation of VoIP over WLAN. 9 NM NM NM m
Documentation about approvals should be maintained for inspection by auditors
Voice gateways interfacing with PSTN
Use strong authentication and/or access control on the voice gateway system. VoIP NM NM 9 9 all
gateways require client authentication before completing calls
Ensure that VoIP gateways are placed on a separate VLAN and that signalling traffic is 9 NM NM 9 M+H
only accepted into the VLAN from authorized servers
Ensure that all PSTN signalling messages are validated and terminated at the gateway NM NM NM 9 all
E-911 services
Give special consideration to E-911 emergency services communications and ensure that NM NM 9 9 M+H
VoIP servers support the E-911 automatic location service. VoIP servers should maintain
phone location information and location should be updated manually (thus phone location
must be static)
Ensure that IP phones, once they have been activated with a password, allow E-911 service NM 9 9 NM M+H
Provide each office area with an emergency phone connected to the PSTN NM NM NM 9 m
VoIP archiving
Carefully review statutory requirements regarding privacy and record retention with NM NM 9 NM all
competent legal advisors. Archive and store call records for legal purposes
References
[1] Stoneburner G., Goguen A., Feringa A. – Risk Management Guide for Information Technology Systems (Special
Publication 800-30) – National Institute of Standards and Technology (NIST) – Jun. 2002
http://csrc.nist.gov/publications/nistpubs/800-30/sp800-30.pdf
[2] Bellovin S. – Security Problems in the TCP/IP Protocol Suite – ACM Computer Communications Review –
Volume 19, Number 2, pp. 32-38 – Apr. 1989
http://www.cs.columbia.edu/~smb/papers/ipext.pdf
[3] Wikipedia – article Private Branch Exchange
http://en.wikipedia.org/wiki/Private_branch_exchange
[4] Microsoft TechNet – Enterprise Design for Remote Access – Mar. 2005
http://www.microsoft.com/technet/itsolutions/wssra/raguide/RemoteAccessServices/igrabp_2.mspx?mfr=true
[5] Avaya – Enterprise SIP Trunking – white paper – Jun. 2005
http://www.avaya.com/master-usa/en-us/resource/assets/whitepapers/lb2749.pdf
[6] Cesmo Consulting & France Telecom – Téléphonie sur IP, Livre Blanc – Jun. 2004
http://www.itu.int/ITU-D/afr/events/Dakar-2006_Regulatory_Challenges_of_VoIP_Africa/Additional_Reading/livre_blanc_toip.pdf
[7] Faltstrom P., Mealling M. – The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation Discovery System
(DDDS) Application (ENUM) – IETF RFC 3761 – Apr. 2004 – http://www.ietf.org/rfc/rfc3761.txt
[8] Peterson J. – enumservice registration for Session Initiation Protocol (SIP) Addresses-of-Record – IETF RFC 3764 –
Apr. 2004 – http://www.ietf.org/rfc/rfc3764.txt
[9] Johnston A., Donovan S., Cunningham C., Summers K. – Session Initiation Protocol (SIP) Basic Call Flow
Examples – IETF RFC 3665 – Dec. 2003 – http://www.ietf.org/rfc/rfc3665.txt
[10] SIP Center – SIP Client Software – http://www.sipcenter.com/sip.nsf/html/SIP+Client+Software
[11] voip-info.org – Analog Telephone Adapters –
http://www.voip-info.org/wiki/view/Analog+Telephone+Adapters
[12] Cisco Systems – SIP Call-Flow Process for the Cisco VoIP Infrastructure Solution for SIP –
http://www.cisco.com/univercd/cc/td/doc/product/voice/sipsols/biggulp/bgsipcf.htm
[13] Chang S. – IP-PBX and enterprise VoIP solution using SIP – CCL/ITRI –
http://phoenix.labri.fr/documentation/sip/Documentation/Papers/SIP/Presentation/IP-PBX.pdf
[14] Williams W., Dolsky K. – Hosted versus Premises IP-Telephony – Business Communications Review – Nov. 2005
http://www.broadsoft.com/pdf/BCR_Article_November_pdf1.pdf
[15] Brandstadter J. – Hosted services: It’s not your father’s Centrex - ACUTA Journal – Winter 2003
http://www.broadsoft.com/pdf/Not_Your_Fathers_Centrex_BroadWorks_Hosted_Services.pdf
[16] Basart E. – Don’t go with hosted VoIP rather than an IP-PBX – NetworkWorld – Jul. 2006
http://www.networkworld.com/columnists/2006/071706-hosted-voip-no.html
[17] Cisco Systems – Distributed managed IP Telephony services for small and medium sized businesses – white paper – 2004
http://www.cisco.com/application/pdf/en/us/guest/netsol/ns458/c654/cdccont_0900aecd801ba5cc.pdf
[18] Ludwick T. – Why customers prefer hosted VoIP – VoIPNews – May 2006
http://www.voip-news.com/news/why-customers-may-prefer-hosted-voip-050306/
[19] Stredicke C. – IP PBX vs. hosted VoIP? – Converge! Network Digest – Jul. 2005
http://www.convergedigest.com/blueprint/ttp04/bp1.asp?ID=216&ctgy=4%3FID=216&ctgy=4?topic=299072
[20] Long T. – Eavesdropping an IP-Telephony call - SANS Institute – Feb. 2003
http://www.sans.org/reading_room/whitepapers/telephone/318.php?portal=1d2ce5a16dfab285908c81f815b9945f
[21] VoIP News – IP-PBX Buyer’s guide – 2006
http://www.voip-news.com/whitepaper/voip-ip-pbx-buyers-guide/
[22] SANS Institute – SANS Top 20 Internet Security Attack Targets (2006 Annual Update) – Nov. 2006
http://www.sans.org/top20/
[23] Trunk R. – IP Telephony Troubleshooting – Netcraftsmen – 2005
http://www.netcraftsmen.net/rtrunk/Troubleshooting%20IP%20telephony%20Problems.pdf
[24] Zmora A. – Bringing Telephony Features into SIP Networks with Back To Back User Agent – SIP Center –
http://www.sipcenter.com/sip.nsf/html/Bringing+Telephony+Features+into+SIP+Networks+with+Back+To+Back+User+Agent
[25] Ohl K. – Ready, Set, Assess: Preparing for a VoIP Migration – Lucent Technologies – Nov. 2005
http://www.lucent.com/knowledge/documentdetail/0,1983,inContentId+090094038009c70f-inLocaleId+1,00.html
[26] Spencer M., Capouch B., Guy E., Miller F., Shumard K. – IAX2: Inter-Asterisk eXchange Version 2 draft-guy-iax-02 –
Internet Draft – Oct. 2006 – http://www.ietf.org/internet-drafts/draft-guy-iax-02.txt
[27] voip-info.org – Asterisk SIP not-proxy – article
http://www.voip-info.org/wiki/index.php?page=Asterisk+SIP+not-proxy
[28] Serrano S. – Asterisk en español – Astricon Europe 2005 – 2005
http://www.asterisk-es.org/modules/sections/index.php?op=viewarticle&artid=11
[29] O’Boyle S., Caron J. - IP Centrex and IP PBX: is it either/Or - or both? – Advisory Report of CurrentAnalysis –
Aug. 2005 - http://www.currentanalysis.com/r/2005/reports/files/CIR_14379.pdf
[30] Koster R. – Own your LAN with ARP Poison Routing – InfosecWriters – Apr. 2006
http://www.infosecwriters.com/text_resources/pdf/Arp_Rkoster.pdf#search=%22voip%20arp%20spoofing%20%22
[31] Wallingford T. – VoIP Hacks – O’Reilly Media – Dec. 2005
(excerpt) http://www.macvoip.com/resources/voip_record_calls.php
[32] Montoro M. – Introduction to Arp Poison Routing – Jun. 2001 – http://www.oxid.it/downloads/apr-intro.swf
[33] Cisco Systems – Managed VPN: comparison of MPLS, IPsec, and SSL architectures – white paper
http://www.cisco.com/en/US/netsol/ns341/ns121/ns193/networking_solutions_white_paper0900aecd801b1b0f.shtml
[34] Rosenberg J., Schulzrinne H., Camarillo G., Johnston A., Peterson J., Sparks R., Handley M., Schooler E. –
SIP: Session Initiation Protocol – IETF RFC 3261 - Jun. 2002 - http://www.ietf.org/rfc/rfc3261.txt?number=3261
[35] International Telecommunication Union (ITU) – ITU-T Recommendation H.323, version 5 – Jun. 2006
[36] Terena – IP Telephony Cookbook – Terena Report – Mar. 2004
http://www.terena.nl/activities/iptel/contents1.html
[37] International Engineering Consortium (IEC) – H.323 Tutorial – http://www.iec.org/online/tutorials/h323/
[38] Dalgic I., Fang H. – Comparison of H.323 and SIP for IP Telephony Signaling – 1999
http://www.cs.columbia.edu/~hgs/papers/others/1999/Dalg9909_Comparison.pdf
[39] Packetizer – H.323 versus SIP: a comparison - http://www.packetizer.com/voip/h323_vs_sip/
[40] Schulzrinne H., Rosenberg J. – A comparison of SIP and H.323 for Internet Telephony – Jul. 1998
http://www.cs.columbia.edu/IRT/papers/Schu9807_Comparison.pdf
[41] SIP Center – SIP and H.323 - Article
http://www.sipcenter.com/sip.nsf/html/SIP+and+H.323
[42] Schulzrinne H., Casner S., Frederick R., Jacobson V. - RTP: A Transport Protocol for Real-Time Applications - IETF RFC 3550
– Jul. 2003 - http://www.ietf.org/rfc/rfc3550.txt
[43] Rupp S., Siegmund G., Lautenschlager W. – SIP Multimedia Dienste im Internet - 2002- dpunkt. Verlag
[44] Multimedia over IP (RTP, RTCP, SI, RSTP) – Department of Computer Science of the San Diego State University
http://medusa.sdsu.edu/network/CS596/Lectures/ch28_RT.pdf
[45] Mattila J. – Real-Time Transport Protocol – University of Helsinki – Oct. 2003
http://www.cs.helsinki.fi/u/jmanner/Courses/seminar-papers/rtp.pdf
[46] Singh K., Schulzrinne H. - Unified messaging using SIP and RTSP - IP Telecom Services Workshop - Sept. 2000
http://www1.cs.columbia.edu/~kns10/publication/vmail.pdf
[47] Schulzrinne H., Rao A., Lanphier R.- Real Time Streaming Protocol (RTSP) - IETF RFC 2326 - Apr. 1998 -
http://tools.ietf.org/html/rfc2326
[48] Faltstrom P., Mealling M. - The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation Discovery System
(DDDS) Application (ENUM) – IETF RFC 3761 – Apr. 2004
http://www.ietf.org/rfc/rfc3761.txt
[49] Gulbrandsen A., Vixie P., Esibov L. - A DNS RR for specifying the location of services (DNS SRV) – IETF RFC
2782 – Feb. 2000
[50] Rosenberg J., Schulzrinne H. - Session Initiation Protocol (SIP): Locating SIP Servers – IETF RFC 3263 – Jun.
2002 - http://www.ietf.org/rfc/rfc3263.txt
[51] Rosenberg J. - A Presence Event Package for the Session Initiation Protocol (SIP) - IETF RFC 3856 - Aug. 2004
[52] Olson S., Camarillo G., Roach A. B. - Support for IPv6 in Session Description Protocol (SDP) - IETF RFC 3266 –
Jun. 2002 - http://www.ietf.org/rfc/rfc3266.txt
[53] Handley M., Perkins C., Whelan E. - Session Announcement Protocol – IETF RFC 2974 – Oct. 2000
[54] Wu T. – MPLS VPNs: Layer 2 or layer 3? Understanding the choice – Riverstone Networks white paper -
http://www.riverstonenet.com/solutions/mpls_vpns_layer2_or_layer3.shtml
[55] Wikipedia – article List of Codecs - http://en.wikipedia.org/wiki/List_of_codecs
[56] Schulzrinne H., Wedlund E. – Application-layer mobility using SIP - Mobile Computing and Communications Review
– Jul. 2000 - http://www.cs.columbia.edu/IRT/papers/Schu0007_Application.pdf
[57] http://www.hep.ucl.ac.uk/~ytl/qos/index.html
[58] Bundesamt für Sicherheit in der Informationstechnik (BSI) – VoIPsec, Studie zur Sicherheit von Voice over Internet
Protocol – Oct.2005 - http://www.bsi.de/literat/studien/VoIP/index.htm
[59] Kuhn D.R., Walsh T.J., Fries S. – Security Considerations for Voice Over IP Systems (Special Publication 800-58) –
National Institute of Standards and Technology (NIST) – Jan. 2005 - http://csrc.nist.gov/publications/nistpubs/800-58/SP800-58-final.pdf
[60] Snyder J. – Test shows VoIP call quality can improve with SSL VPN links – Article in NetworkWorld - Feb. 2006
http://www.networkworld.com/reviews/2006/022006-ssl-voip-test.html
[61] Voice Over IP Security Alliance (VOIPSA) - VoIP Security and Privacy Threat Taxonomy – Oct. 2005
http://www.voipsa.org/Activities/VOIPSA_Threat_Taxonomy_0.1.pdf
[62] National Security Agency (NSA) – Security Guidance for Deploying IP Telephony Systems – Feb. 2006
http://www.nsa.gov/snac/voip/I332-016R-2005.PDF
[63] Defense Information Systems Agency (DISA) – Internet Protocol Telephony & Voice over Internet Protocol, security
technical implementation guide (version 2) – Apr. 2006 - iase.disa.mil/stigs/stig/voip_stig_v1r1.pdf
[64] Conroy J., Zaatari M. – Voice over Internet Protocol Corporate Vulnerabilities: facts without fears – The
Telecommunications Review – 2006 - www.mitretek.org/Paper_09_TR2006.pdf
[65] Thermos P. – Examining two well-known attacks on VoIP – VoIPonder – Apr. 2006
http://www.voiponder.com/posts/examining_two_well_known_attacks_on_voip/
[66] Maret L. – Etude de cas: VoIP/SIP & Sécurité – Master thesis (EIVD) – Apr. 2005
www.iict.ch/Tcom/Projets/VoIP/Maret.pdf
[67] Materna B. – A proactive approach to VoIP security – VoIPshield white paper – Apr. 2006
http://www.voipshield.com/index.php?option=com_content&task=view&id=27&Itemid=47
[68] Mihai A.-S. – Voice over IP security, A layered approach – XMCO partners white paper - Mar. 2006
www.xmcopartners.com/whitepapers/voip-security-layered-approach.pdf
[69] Roberts C. – Voice over IP Security – Centre for Critical Infrastructure Protection (New-Zealand) – May 2005
www.ccip.govt.nz/ccip-publications/ccip-reports/voice_over_ip_security.pdf
[70] AT&T - VoIP security: what are the risks and solution? – point of view – Oct. 2005
whitepapers.techrepublic.com.com/abstract.aspx?docid=170713&promo=300111
[71] Miyachi T. – Univerge VoIP security Best practice (vol. I), principles of VoIP security – NEC - 2006
www.necunifiedsolutions.com/Downloads/WhitePapers/NEC_VoIP_SecurityBestPractice_Vol_1_WhPpr.pdf
[72] Miyachi T., Serada T. – Univerge VoIP security Best practice (vol. II), principles of VoIP security – NEC - 2006
www.necunifiedsolutions.com/Downloads/WhitePapers/NEC_VoIP_SecurityBestPractice_Vol_2_WhPpr.pdf
[73] Schneier B. – Modeling security threats - Dr. Dobb's Journal – Dec. 1999
http://www.schneier.com/paper-attacktrees-ddj-ft.html
[74] Collier M. – VOIPSA Threat Taxonomy Summary – VoIP Magazine Online – Jan. 2006
http://www.voip-magazine.com/content/view/1373/
[75] Endler D. – Hackers crack Net phone provider for gain – Voice of VOIPSA - Jun. 2006
http://voipsa.org/blog/2006/06/07/hacker-cracks-net-phone-providers-for-gain/
[76] International Engineering Consortium (IEC) - Including VoIP over WLAN in a Seamless Next-Generation Wireless
Environment - http://www.iec.org/online/tutorials/ti_voip_wlan/
[77] Ridley J. – VoWIFi: calling for mobility – Converge! Network Digest – Oct. 2005
http://www.convergedigest.com/bp-c2p/bp1.asp?ID=262&ctgy=
[78] Collier M. - Basic vulnerability issues in SIP – VoIP Magazine Online – Mar. 2005
[79] Sisalem D. and others - Towards a Secure and Reliable VoIP Infrastructure – SNOCER Project – May 2005
http://www.snocer.org/Paper/COOP-005892-SNOCER-D2-1.pdf
[80] Collier M. - VoIP vulnerabilities: Denial of Service – VoIP Magazine Online – Aug. 2005
[81] Collier M. – VoIP vulnerabilities: Registration hijacking – VoIP Magazine Online – Jun. 2005
[82] Collier M. - IP security in a hosted environment – VoIP Magazine Online – Dec. 2005
[83] Rosenberg J. - The Session Initiation Protocol (SIP) UPDATE Method – IETF RFC 3311 – Sep. 2002
[84] Reed H. – Hacking mobile voicemail with Asterisk and caller ID spoofing – article – Sep. 2006
http://www.nata2.org/2006/09/24/hacking-voicemail-with-asterisk-and-caller-id-spoofing/
[85] Collier M. – SIP Vulnerability: Registration hijacking – white paper – Jan. 2005
download.securelogix.com/librarydownload.htm?downloadfilename=Registration_hijacking_060105.pdf
[86] Baugher M., McGrew D., Naslund M., Carrara E., Norrman K. - The Secure Real-time Transport Protocol (SRTP)
- IETF RFC 3711 – Mar. 2004 - http://www.ietf.org/rfc/rfc3711.txt
[87] Jiang W. - Lightweight secure SIP model for end-to-end communication - Proc. of the 10th International Symposium
on Broadcasting Technology (ISBT 2005) – 2005 - security.riit.tsinghua.edu.cn/share/ISBT_SIP.pdf
[88] Steffen A., Kaufmann D., Stricker A. - SIP Security – Zürcher Hochschule Winterthur – 2004
[89] Andreasen F., Baugher M., Wing D. - Session Description Protocol (SDP) Security Descriptions for Media Streams –
IETF RFC 4568 – Jul. 2006 - http://www.ietf.org/rfc/rfc4568.txt
[90] Shao W. - Key Management for VoIP Media Encryption - 3rd Annual VoIP security workshop Berlin 2006
[91] Arkko J., Carrara E., Lindholm F., Naslund M., Norrman K. – MIKEY: Multimedia Internet Keying – IETF RFC
3830 – Aug. 2004 - http://www.ietf.org/rfc/rfc3830.txt
[92] Zimmermann P. – ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP – Internet draft expired
in Sept. 2006 - http://zfoneproject.com/docs/draft-zimmermann-avt-zrtp-01.html
[93] Sotillo S. – Zfone: a new approach for securing VoIP communication – Spring 2006
http://www.infosecwriters.com/text_resources/pdf/Zfone_SSotillo.pdf
[94] Stredicke C. – Securing VoIP media communication – 3rd Annual VoIP security workshop Berlin 2006 -
http://old.iptel.org/voipsecurity/doc/17%20-%20Stredicke%20%20Securing%20VoIP%20Media%20Communication.pdf
[95] Orrblad J. – Alternatives to MIKEY/SRTP to secure VoIP – Master Thesis – March 2005
www.minisip.org/publications/Thesis_Orrblad_050330.pdf
[96] Rebahi Y., Sisalem D. – SIP service providers and the SPAM problem - Voice over IP
Security Workshop Proceedings, Washington, USA - June 2005
www.snocer.org/Paper/SIP%20Service%20Providers%20and%20The%20Spam%20Problem_rebahi.pdf
[97] Mathieu B., Loudier Q., Gourhant Y., Bougant F., Osty M. – SPIT Mitigation by a network level anti-spit entity -
3rd Annual VoIP security workshop Berlin 2006
[98] Yan H. and others – Incorporating active fingerprinting into SPIT prevention systems - 3rd Annual VoIP security
workshop Berlin 2006
[99] Hansen M., Hansen M., Möller J. – Abwehr von Spam over Internet Telephony (SPIT-AL) – white paper TNG The
Net Generation – Jan. 2006 - http://www.spit-filter.com/Whitepaper_SPITAL_20060310.pdf
[100] Niccolini S. - SPIT prevention: state of the art and research challenges – VoIP security seminar NEC - 2006
http://www.iptel.org/voipsecurity/doc/07%20-%20Niccolini%20-
%20SPIT%20prevention%20state%20of%20the%20art%20and%20research%20challenges.pdf
[101] Rosenberg J., Jennings C., Peterson J. - The Session Initiation Protocol (SIP) and Spam (draft-rosenberg-sipping-spam-
01) – Internet Draft expired: Apr. 2005 - http://www1.tools.ietf.org/html/draft-rosenberg-sipping-spam-01
[102] Rosen E., Viswanathan A., Callon R. – Multiprotocol Label Switching Architecture – IETF RFC 3031 – Jan. 2001
[103] Newport Networks – NAT Traversal for Multimedia over IP – 2006
http://www.newport-networks.com/cust-docs/33-NAT-Traversal.pdf
[104] Huston G. – Anatomy: a look inside Network Address Translators - Cisco Systems – The Internet Protocol
Journal, Volume 7, number 3 – Sep. 2004
http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_7-3/anatomy.html
[105] Mächler P. – SIP architecture with NAT v1.0 – Siemens white paper – 2004 - http://www.mysip.ch/
[106] Collier M. – Voice over IP and Firewalls – VoIP Magazine Online – May 2005
http://www.voip-magazine.com/index.php?option=com_content&task=view&id=42&Itemid=52
[107] InteropLabs – SIP and Firewalls – May 2006
http://www.interop.com/lasvegas/exhibition/interoplabs/voip/sipsecurity.pdf
[108] NEC Europe Network Laboratories – MIDCOM, Middlebox Traversal – Jul. 2005
http://www.ccrle.nec.de/Projects/midcom.htm
[109] Rosenberg J. - Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT)
Traversal for the Session Initiation Protocol (SIP) (draft-rosenberg-sipping-ice-01) – Internet Draft expired: Dec. 2003 -
http://www.jdrosen.net/papers/draft-rosenberg-sipping-ice-01.html
[110] Greenway J. – The evolution of Session Border Controllers – Converge! Network Digest - Aug. 2004 -
http://www.convergedigest.com/blueprint/ttp04/z2kagoor3.asp?ID=96&ctgy=2
[111] Qiu Q. – Study of Digest Authentication for Session Initiation Protocol (SIP) – Master Thesis – Dec. 2003
http://www.site.uottawa.ca/~bob/gradstudents/DigestAuthenticationReport.pdf
[112] Durlanik A., Sogukpinar I. – SIP Authentication Scheme using ECDH – Enformatika – 2005
http://www.enformatika.org/data/v8/v8-68.pdf
[113] Byerly B., Williams D. – SIP Authentication using CHAP- password – Internet Draft – expired: Mar. 2001
http://old.iptel.org/info/players/ietf/aaa/draft-byerly-sip-radius-00.txt
[114] Niemi A., Torvinen V., Arkko J. - EAP Authentication for SIP (draft-torvinen-http-eap-01.txt) – presentation – Jan.
2001 - http://www3.ietf.org/proceedings/01dec/slides/sip-5/index.htm
[115] Niemi A., Torvinen V., Arkko J. – EAP Authentication for SIP & HTTP, AAA considerations – 2001
http://www3.ietf.org/proceedings/01aug/slides/aaa-2/sld007.htm
[116] Loughney J., Camarillo G. - Authentication, Authorization, and Accounting Requirements for the Session Initiation
Protocol (SIP) – IETF RFC 3702 – Feb. 2004 - http://www.ietf.org/rfc/rfc3702.txt
[117] Garcia Martin M., Belinchon M., Pallares Lopez M., Canales C., Tammi K. – Diameter Session Initiation Protocol
(SIP) Application (draft-ietf-aaa-diameter-sip-app-11) – Internet Draft expired: Aug. 2006
http://bgp.potaroo.net/ietf/all-ids/draft-ietf-aaa-diameter-sip-app-11.txt
[118] Wireless Research - http://www.cs.umd.edu/~waa/wireless.html
[119] ManageEngine - What are Rogue Access Points? – article
http://manageengine.adventnet.com/products/wifi-manager/rogue-access-point.html
[120] Nortel Networks - VoIP VPN: the low cost, low risk migration path to convergence – white paper -
http://www.nortel.com/products/01/succession/cs/services/vvpn/collateral/nn_voipvpn_020604.pdf
[121] Cisco Systems – Virtual LAN Security Best Practices – white paper - 2002
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper09186a008013159f.shtml
[122] Cisco Systems – Understanding Personal Directory - 2002
http://www.cisco.com/univercd/cc/td/doc/product/voice/serv_fea/config/intro.pdf
[123] Collier M. – Session Initiation Protocol (SIP) Vulnerabilities – IPCOMM 2006 – Sep. 2006
http://www.hackingvoip.com/presentations/IPCOMM_SIP.pdf
[124] Btnaccess – Managing security issues in the hosted IP Telephony environment – white paper – 2005
http://www.btnaccess.com/us/about/papers/pdfs/btnaccess_IP_Telephony_Security_White_Paper_3-14-2005.pdf
[125] Van Sickle R. – VoIP Virtual Private Networks: Bringing the benefits of convergence to the enterprise –
Telecommunications Magazine – Sep. 2002
http://www.tspt.net.et/documentation/Final_07_02_Voice_VPN_Bob.pdf
Abbreviations
AAA     Authentication, Authorisation, Accounting
ACL     Access Control List
AES     Advanced Encryption Standard
AH      Authentication Header
ALG     Application Level Gateway
AOR     Address of Record
AP      Access Point
APR     ARP Poison Routing
ARP     Address Resolution Protocol
ATA     Analogue Telephone Adaptor
B2BUA   Back-to-Back User Agent
BRI     Basic Rate Interface
CDP     Cisco Discovery Protocol
CHAP    Challenge Handshake Authentication Protocol
CIA     Confidentiality, Integrity, Availability
CPE     Customer Premises Equipment
CPL     Call Processing Language
CVS     Concurrent Versions System
DHCP    Dynamic Host Configuration Protocol
DMZ     Demilitarized Zone
DNS     Domain Name System
DoS     Denial of Service
DSL     Digital Subscriber Line
EAP     Extensible Authentication Protocol
ENUM    Telephone Number Mapping
ESP     Encapsulating Security Payload
FMC     Fixed Mobile Convergence
FQDN    Fully Qualified Domain Name
FXO     Foreign Exchange Office
FXS     Foreign Exchange Subscriber
GRUU    Globally Routable User Agent URI
HMAC    Hash-based Message Authentication Code
IAX     Inter-Asterisk Exchange Protocol
ICE     Interactive Connectivity Establishment
ICMP    Internet Control Message Protocol
IETF    Internet Engineering Task Force
IKE     Internet Key Exchange
IMS     IP Multimedia Subsystem
IP      Internet Protocol
IP-PBX  Internet Protocol Private Branch Exchange
IPsec   Internet Protocol Security
IrDA    Infrared Data Association
ISDN    Integrated Services Digital Network
ISP     Internet Service Provider
ISUP    ISDN User Part
ITU     International Telecommunication Union
IVR     Interactive Voice Response
LAN     Local Area Network
LL      Leased Line
MAC     Media Access Control
MAC     Moves, Adds, Changes
MAN     Metropolitan Area Network
MD5     Message Digest 5
MGCP    Media Gateway Control Protocol
MIKEY   Multimedia Internet Keying
MITM    Man-in-the-Middle attack
MKI     Master Key Identifier
MPEG    Moving Picture Experts Group
MPLS    Multiprotocol Label Switching
NAT     Network Address Translation
NIC     Network Interface Controller
OS      Operating System
OSI     Open Systems Interconnection
PAT     Port Address Translation
PBX     Private Branch Exchange
PDA     Personal Digital Assistant
PDU     Protocol Data Unit
PKI     Public-Key Infrastructure
PoE     Power over Ethernet
PRI     Primary Rate Interface
PSTN    Public Switched Telephone Network
PUA     Presence User Agent
QoS     Quality of Service
RADIUS  Remote Authentication Dial In User Service
RFC     Request for Comments
ROI     Return on Investment
RSVP    Resource Reservation Protocol
RTP     Real-Time Transport Protocol
RTCP    RTP Control Protocol
RTSP    Real-Time Streaming Protocol
SA      Security Association
SAP     Session Announcement Protocol
SBC     Session Border Controller
SCCP    Skinny Client Control Protocol
SDES    SDP Security Descriptions
SDP     Session Description Protocol
SER     SIP Express Router
SHA1    Secure Hash Algorithm 1
SIP     Session Initiation Protocol
SIPS    SIP Security
SLA     Service Level Agreement
SME     Small and Medium-sized Enterprises
SPAN    Switched Port Analyzer
SPIT    Spam over IP Telephony
SRTCP   Secure RTCP
SRTP    Secure RTP
SSH     Secure Shell
SSID    Service Set Identifier
SSL     Secure Sockets Layer
STUN    Simple Traversal of UDP Through Network Address Translators
SS7     Signalling System 7
SWOT    Strengths, Weaknesses, Opportunities, and Threats
TCP     Transmission Control Protocol
TDM     Time Division Multiplexing
TFTP    Trivial File Transfer Protocol
TLS     Transport Layer Security
TOC     Total Cost of Ownership
TURN    Traversal Using Relay NAT
UA      User Agent
UAC     User Agent Client
UAS     User Agent Server
UDP     User Datagram Protocol
UM      Unified Messaging
UPnP    Universal Plug and Play
UPS     Uninterruptible Power Supply
URI     Uniform Resource Identifier
URL     Uniform Resource Locator
VLAN    Virtual Local Area Network
VoIP    Voice over IP
VOIPSA  Voice over IP Security Alliance
VoWLAN  Voice over WLAN
VPN     Virtual Private Network
WAN     Wide Area Network
WAV     Waveform Audio Format
WEP     Wired Equivalent Privacy
WLAN    Wireless Local Area Network
WPA2    Wi-Fi Protected Access 2