TR1546
Simplified Infrastructure
In-Service Performance Analysis
Priyanki Vashi
Mälardalen University
Västerås, Sweden
Abstract
Ericsson has always strived for technology leadership by designing products
based on the latest technology. In line with this ambition, it started exploring
the idea of running a mobile core application on a Simplified Infrastructure (SI)
to eventually enable Cloud based solutions. In order to run these types of
applications in the Cloud, however, the in-service performance (ISP) provided by
such an SI should match that of the native infrastructure so that the mobile core
application's QoS is maintained. "High availability" of the infrastructure is
one of the measures of the ISP, and from the ISP point of view such a migration
would be considered feasible only if the SI is able to maintain the same level
of availability as the native infrastructure solution without introducing any
major architecture changes within the SI. Hence this master thesis project
investigates the feasibility of achieving the same availability as before if the
mobile core application is migrated from the native infrastructure to the SI.
This feasibility exploration, executed through this master thesis project, was
the very first attempt of its kind with respect to the SI within Ericsson. To
achieve the goal of the project a detailed system study was carried out, which
focused on the native infrastructure architecture, how it maintains "high
availability" and how it differs from the SI.
In the end, it was possible to confirm that the level of availability of
infrastructure services provided through the SI will be higher than that of the
native infrastructure after the migration, provided the suggestions proposed in
this master thesis project are implemented successfully. These implementations
also do not change the architecture of the SI in any major way. The end results
of this thesis project were highly appreciated by Ericsson and are now part of
the development plan for the next mobile core infrastructure solution at Ericsson.
Acknowledgements
The memories associated with this master thesis work will always have a special
place in my heart, and for such an amazing feeling about my involvement in the
work I would like to start by thanking my Ericsson mentors and technical
supervisors Leif Johansson, Nikhil Tikekar and Niklas Waldemar. Without their
belief and trust in my capabilities it would not have been possible to reach the
expected outcome. In addition, I would also like to thank the designers, system
managers and previous master thesis students (Isaac and Manuel) at Ericsson, who
provided me with valuable information that was not so evident in the available
documentation and helped me reach the expected outcome of this thesis project.
Among the learnings I really want to highlight are, first, why to bring in a
simplification, and second, how to bring in the simplification in a systematic
way for complex products such as the one studied as a part of this thesis
project. In this case, the simplification is mainly driven by the need to enable
compatibility with the latest technology involving multicores, virtualization
and hence Cloud Computing, and then to leverage the benefits of Cloud
technology. Working in this area was not only technically rewarding for me, but
interacting with the humble and yet very talented people of Ericsson was also a
motivating and inspiring experience.
I would also like to equally thank professor Thomas Nolte for all his support
and clear guidance on my queries during this thesis work. I will honestly admit
that I felt very happy and honoured when Thomas agreed to be my thesis
supervisor based on just an initial phone call, without even meeting me in
person. Interacting with him was a great experience. I am also very grateful to
professor Damir Isovic for encouraging me throughout my master's education as my
study advisor. Both of them have always answered my questions precisely and
provided me with very valuable feedback and suggestions.
Last but not least, I would also like to convey my deepest regards and sincere
thanks to my family, and more specifically to my mother, Kantaben Vashi, and my
best friend, Ravikumar Darepalli, who is also my life partner. Their words were
a constant source of encouragement throughout my life, and sharing the master's
education experience with them was no different!
Contents

List of Figures
List of Tables

1 Introduction
1.1. Overview
1.2. Related Work
1.3. Problem Description
1.4. Goals
1.5. Methodology
1.6. Scope
1.7. Limitations
1.8. Target Audience
1.9. Thesis Outline
2 General Background
2.1. Ericsson MSC Server Blade Cluster (MSC-S BC)
2.1.1. Overview
2.1.2. MSC-S BC Hardware Architecture
2.1.3. MSC-S BC Software Architecture
2.1.4. MSC-S BC Blade States
2.1.5. MSC-S BC Hardware Management
2.1.6. Link and Plane Handling for MSC-S BC
2.1.7. MSC-S BC Functional View
2.2. In-Service Performance (ISP)
2.2.1. ISP Overview
2.2.2. Availability Measurements
2.3. SI Prototype Summary
2.3.1. Overview
2.3.2. Verification Environment in Prototype
3 Evaluation
3.1. Approach for Theoretical Study
3.1.1. Analysis from ISP Perspective
3.1.2. Current System Design Perspective
3.2. Theoretical Study Findings
3.2.1. Interfaces Identified
3.2.2. List of Functions using NON-IP Interfaces
3.3. Analysis of Unavailability of Identified Functions
3.3.1. Function-1: Automatic Boot
3.3.2. Function-2: Supervision of Suspected Faulty Blade
3.3.3. Function-3: Link Fault Detection and Recovery
3.3.4. Function-4: Plane Fault Detection and Recovery
3.3.5. Remaining functions: Function-5 to Function-10
3.3.6. Summary on Proposals for Different Functions
3.4. Verification of Proposals using Prototype
3.4.1. Verification Strategy
3.4.2. Test Case Description
3.4.3. Test Execution
4 Conclusions and Future Work
Bibliography
List of Figures

BSOM signal flow diagram between MSC blades and a SIS blade.
Connectivity between CP Blades and Infrastructure Blades.
Analysis of unavailability of the automatic boot function.
Analysis of unavailability of the MSC-S BC blade supervision function.
Analysis of unavailability of the link management function.
Analysis of unavailability of the plane handling function.
Analysis of unavailability for the rest of the functions.
Summary of the proposed alternatives.
List of Tables

1.1. Global Mobile Data Traffic Growth
3.1. Compilation of test results where one of the prototype MSC-S BC blades was added to an existing MSC-S BC cluster.
3.2. Compilation of test results where one of the prototype MSC-S BC blades was removed from an existing MSC-S BC cluster.
Acronyms

API      Application Programming Interface
APG      Adjunct Processor Group
AUC      Authentication Center
BSC      Base Station Controller
BSS      Base Station Subsystem
BTS      Base Transceiver Station
BW       Bandwidth
CAPEX    Capital Expenditure
CPU      Central Processing Unit
CS       Circuit-Switched
CSCF     Call Session Control Function
EIR      Equipment Identity Register
eNB      Evolved Node B
EPC      Evolved Packet Core
ETSI     European Telecommunications Standards Institute
GB       Gigabyte
Gbps     Gigabits per second
GHz      Gigahertz
GPRS     General Packet Radio Service
GSM      Global System for Mobile Communications
GUI      Graphical User Interface
HLR      Home Location Register
HSS      Home Subscriber Server
IaaS     Infrastructure as a Service
IEEE     Institute of Electrical and Electronics Engineers
IMS      IP Multimedia Subsystem
IMSI     International Mobile Subscriber Identity
I/O      Input/Output
IP       Internet Protocol
ISO      International Organization for Standardization
ISP      In-service Performance
IT       Information Technology
KVM      Kernel-based Virtual Machine
LAN      Local Area Network
LTE      Long Term Evolution
MAC      Media Access Control
MB       Megabyte
Mbps     Megabits per second
MGW      Media Gateway
MIPS     Million Instructions Per Second
ms       millisecond
MSC      Mobile Switching Center
MSC-S    Mobile Switching Center Server
MSC-S BC MSC Server Blade Cluster
MSISDN   Mobile Station International Subscriber Directory Number
NGN      Next Generation Network
NIST     National Institute of Standards and Technology
NMC      Network Management Center
NMS      Network Management System
NSS      Network Switching Subsystem
OMC      Operation and Maintenance Center
OPEX     Operational Expenditure
OS       Operating System
OSI      Open Systems Interconnection
OSS      Operation Support System
PaaS     Platform as a Service
PC       Personal Computer
PS       Packet-Switched
PSTN     Public Switched Telephone Network
QoE      Quality of Experience
QoS      Quality of Service
RAM      Random Access Memory
RAN      Radio Access Network
RNC      Radio Network Controller
SI       Simplified Infrastructure
SIM      Subscriber Identity Module
SIS      Site Infrastructure Support
SMS      Short Message Service
SPX      Signaling Proxy
SSH      Secure Shell
UDP      User Datagram Protocol
UPS      Uninterruptible Power Supply
UMTS     Universal Mobile Telecommunications System
UTRAN    UMTS Terrestrial Radio Access Network
VLAN     Virtual Local Area Network
VLR      Visitor Location Register
VM       Virtual Machine
VPN      Virtual Private Network
Chapter 1
Introduction
The aim of this chapter is to introduce a wider group of readers to the work
carried out in this master thesis project. As a first step, an overview of the
subject and its related work is given so that the readers can connect and follow
the rest of the thesis easily and logically. After that, the problems which
triggered this work are described, followed by a statement of the goals. Next,
the methodology used to solve the identified problems is described. Thereafter
the scope, limitations and target audience of the project are clearly stated.
Finally, an outline of the thesis is presented to highlight its structure.
1.1. Overview
Today the Global System for Mobile Communications (GSM) and the Universal Mobile
Telecommunications System (UMTS) are two of the most widely used mobile core
network architectures. GSM represents a second generation (2G) digital mobile
network architecture [1] and UMTS is a third generation (3G) mobile cellular
technology standard [2]. At a high level, both architectures are composed
of three subsystems. The mobile core application (the MSC-S application) and its
infrastructure (the Ericsson MSC-S Blade Cluster), which are the focus of this
master thesis project, are part of a subsystem common to both architectures.
This subsystem is the Network Switching Subsystem (NSS), indicated as the
Switched Core Network subsystem in the UMTS network topology in Figure 1.1.
The NSS is composed of units such as the Mobile Switching Center Server (MSC-S),
the Home Location Register (HLR) and the Visitor Location Register (VLR), so that
the different functions of this subsystem can be realized by different functional
entities in the network [3]. Typically, an MSC-S node is responsible for the setup,
supervision and release of calls as well as for handling SMSs and managing terminal
mobility. It also collects call billing data and sends it to the Billing Gateway,
which processes this data to generate bills for the subscribers.
(SMS) with as optimal a cost as possible. Since the end users also expect the
same Quality of Experience (QoE) as they obtain while using wired devices [4]
for some of these services, this in turn puts high demands on network performance
while delivering these services through the mobile networks. Additionally, in
the given case, expansion of the mobile network is directly proportional to the
growing demand for such services and is very dynamic. Hence the CAPEX and OPEX
required to build and sustain such a deployment are becoming a major concern for
the telecom operators [5].
Furthermore, the demands in terms of bandwidth are also increasing [6] (as can
be seen from Table 1.1), especially due to the emergence of new services and
applications requiring Internet access [7]. Therefore developing a flexible, cost
optimal and future proof network solution is a challenging task. Currently the
solutions to boost mobile network bandwidth are being addressed by the
Long Term Evolution/Evolved Packet Core architecture (LTE/EPC) [8] [9]. LTE
introduces sophisticated radio-communication techniques enabling faster and
more efficient access networks, while EPC involves the deployment of a packet-based
core network capable of dealing with future traffic increases [10]. Additionally,
the IP Multimedia Subsystem (IMS) [11] [12] [13] is the main framework to provide
voice and SMS services over IP. Hence exploring Cloud Computing technology
for hosting various telecommunication applications could be very future proof and
worth the effort [14].
Table 1.1. Global Mobile Data Traffic Growth

Year              Annual Increment
2009              140%
2010              159%
2011 (expected)   133%
2012 (expected)   110%
2013 (expected)    90%
2014 (expected)    78%
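Taken together, the annual increments of Table 1.1 imply roughly a hundredfold
increase in traffic over the six years. A short check of that arithmetic
(illustrative only, computed purely from the table's figures):

```python
# Annual traffic increments from Table 1.1, 2009-2014, as growth fractions.
increments = [1.40, 1.59, 1.33, 1.10, 0.90, 0.78]

multiplier = 1.0
for inc in increments:
    # An annual increment of 140% means traffic grows by a factor of 2.40.
    multiplier *= 1.0 + inc

print(round(multiplier, 1))  # -> 102.9, i.e. roughly a 103x increase
```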
To achieve this, as a first step a Simplified Infrastructure (SI) prototype was
built at Ericsson (which eventually enables the migration of telecom applications
to the Cloud), considering the important applications of the mobile core network
(MSC-S and HLR). The complete activity was divided into three phases as indicated
in Figure 1.2. The first two phases of the Simplified Infrastructure mainly
focused on the design of different variants of the SI prototype; this constitutes
related work for this master thesis project and is described in Section 1.2.
It is important to note that the successful implementation of the SI prototype
previous to this project played a very crucial role during the verification phase
of the current master thesis project; without such a prototype in place, it would
not have been possible to practically demonstrate the end results of the current
master thesis project.
The study done as a part of the current master thesis, which represents the third
and final phase, mainly focuses on an analysis of the "high availability" criteria
with respect to the proposed SI solution. The method used during this study
provides clear logic and absolute clarity on what actions need to be taken in
order to achieve the same or a better level of availability if a mobile core
application is to be migrated from an Ericsson native infrastructure (MSC-S BC)
to the SI, and eventually to the Cloud.
Using the end results obtained from the current master thesis project, it is
possible to say that there is a huge potential for hosting this type of mobile
core application using the SI with an improved level of availability. This proof
of concept would eventually help to secure a higher level of availability while
migrating to the Cloud as well. Of course, all the drawbacks that use of the
public Internet may introduce when providing these services must be kept in mind
(Oredope and Liotta also regarded this as an important concern in [16]).
1.2. Related Work
During the first two phases of the study, the SI prototype was designed [17].
Three different variants of the SI prototype were explored. All of the variants
were built by virtualizing the cluster based distributed system, which is
considered one of the most successful core infrastructure platforms within
Ericsson, and which communicates over IP based connectivity with the rest of the
components within the SI. In Ericsson's terms, this core infrastructure platform
is named the "Ericsson Blade Cluster", and when the MSC-S application is run on
this platform, it is identified as "MSC-S BC" (Figure 1.3). The different
variants of the SI prototype, and the purpose of each, are presented below.
Hybrid Ericsson MSC-S BC topology (1st variant): The purpose of this design was
to demonstrate the correct operation of the system when placing a prototype
MSC-S blade in an emulated Cloud environment (outside the racked architecture).
Figure 1.4 depicts this topology of an Ericsson MSC-S hybrid blade cluster,
where a prototype MSC-S blade is implemented on an external server located
outside the rack.
Figure 1.4. Ericsson MSC-S hybrid cluster topology (1st variant of SI prototype).
Ericsson MSC-S external cluster topology (2nd variant): The external cluster
topology is represented in Figure 1.5. This prototype design consisted of an
Ericsson MSC-S BC implementation whose only MSC-S blades are prototype MSC-S
blades located in an emulated Cloud environment. The purpose of this prototype
variant was to verify the correct operation of the cluster protocols in the
presence of network impairments, as well as the system's stability with this
network configuration.
Figure 1.5. Ericsson MSC-S external cluster topology (2nd variant of SI prototype).
Figure 1.6. Ericsson MSC-S split cluster topology (3rd variant of SI prototype).
The test results from all three variants of the SI prototype succeeded in
practically demonstrating that one of the mobile core applications, in this case
the MSC-S, can run on the SI. More details about each of the variants and their
respective tests can be found in [17].
1.3. Problem Description
1.4. Goals
The main goal of this master thesis project was to study the feasibility of
migrating one of the mobile core applications from the native infrastructure
to the Simplified Infrastructure to enable Cloud based solutions. Such a
migration would be considered feasible only if the Simplified Infrastructure is
able to maintain the same level of availability as provided by the native
infrastructure solution without bringing in any major architecture changes
within the Simplified Infrastructure.
Before explaining the detailed goals of this thesis project, it is necessary to
elaborate on the meaning of some important terms. In the given context,
In-service performance defines the measure of availability, which is measured
using the in-service performance statistics collected internally within Ericsson.
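As a general illustration (this is the textbook formula, not Ericsson's internal
ISP definition), the steady-state availability A of a system is commonly
expressed as

    A = MTTF / (MTTF + MTTR)

where MTTF is the mean time to failure and MTTR is the mean time to repair. For
example, a "five nines" system (A = 0.99999) is unavailable for at most about
0.00001 x 525 600 = 5.3 minutes per year.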
Cloud based solutions here represent geographically separated resources. In the
current project these are represented by a group of virtual blades running as a
distributed cluster with only IP based connectivity. This configuration is
equivalent to a distributed cluster formed by physical blades running within
the native infrastructure. In this case there exist two variants: one is called
Integrated Site (IS) and the other is Ericsson Blade System (EBS).
The main goal is divided into three subgoals as presented below.
Goal-1: Study the architecture of the native infrastructure, understand how it
maintains high availability, and how it differs from the Simplified
Infrastructure in maintaining high availability.
Goal-2: Based on the identified differences between the two infrastructure
solutions, analyze whether a solution can be proposed so that the same level of
availability is achieved before and after the migration without bringing in
major architecture changes within the Simplified Infrastructure.
Goal-3: If there is a suitable solution, conduct various tests using the existing
Simplified Infrastructure prototype to practically demonstrate that the proposed
solution works as expected, and hence help to provide a concrete conclusion
on the feasibility of this migration.
1.5. Methodology
In order to fulfill the goals of this thesis project, a qualitative approach was
utilized. Secondary research was used as the qualitative method, which also
included understanding the work done as part of the previous studies. Moreover,
this research provided material for the background chapter and made it possible
to obtain a full state-of-the-art overview of the subject. This literature
review also provided a solid foundation upon which the various ideas for the
different proposals were built.
Step-1: As a first step, a study was done in order to understand what defines
the in-service performance and what kind of data is available as a part of
1.6. Scope
Within Ericsson, there exist different variants of the processor and
infrastructure blades. A certain combination of the processor and infrastructure
blades together forms one of the core infrastructure components within a core
network solution. As part of this thesis project, one such variant (the IS based
Blade Cluster) was studied, and the mobile core application considered was
the MSC-S.
A similar study would need to be carried out for the other variants of
processor and infrastructure blades, such as the EBS (Ericsson Blade System),
but the method used in this master's thesis could be equally effective for
those as well.
The practical experiment was carried out using an Ericsson proprietary MSC
application prototype with limited functionality. In the future, further studies
should be conducted to verify the correct behaviour of a fully functional
Ericsson MSC-S BC application, as well as other (related) applications, to see
if the results of this study can be generalized to other (similar) applications.
The study of certain software components (even though they are part of the
chosen variant) was out of the scope of this master's thesis. One such software
component is the IP stack designed by Telebit (TIP stack).
Troubleshooting of the prototyping problems was also kept outside the scope of
this thesis work.
1.7. Limitations
One of the main limitations of this thesis work was the use of a simulated
environment during the verification phase.
During the last step, which focused on verification of the proposed
alternatives, the GSM and UMTS types of mobile calls were generated using a
simulated environment. However, since the main goal of this thesis was to
demonstrate that the proposed idea works (as a proof of concept), a simulated
environment was sufficient for this initial verification.
1.8. Target Audience
The primary audience of this work is Ericsson's internal design and systems
group within Evolved Infrastructure. The idea is to present the proposed
methodology and derived results as one approach to simplifying such a complex
platform without impacting its in-service performance. Through such an approach
it would be possible to have an open discussion on the proposed alternatives.
Another important target audience is Ericsson's customers, who wish to leverage
the benefits of cloud technology with respect to their current mobile core
network solution.
In addition to these readers, researchers interested in acquiring knowledge
about telecom network performance in the Cloud, such as that studied in this
thesis project, can also take advantage of the described methodology.
1.9. Thesis Outline
The thesis is structured in a linear manner, where the earlier chapters provide
a general overview of the subjects necessary to understand the remaining
chapters. It is strongly recommended that the reader thoroughly study the
introduction and background chapters in order to gain an appropriate context
for the subsequent experimental work.
Chapter 1 provides an introduction to the thesis. Chapter 2 provides related
background information. Chapter 3 describes the evaluation part of this thesis
work, covering the theoretical study findings and the various conclusions drawn
from them. It also discusses details about the prototype, the verification
strategy and the test cases used for verifying the findings of the theoretical
study. Chapter 4 presents final conclusions and suggested future work.
Appendix A briefly explains the architecture of different types of mobile core
networks (a GSM and UMTS introduction). Appendix B (confidential) is a manual
for configuring the prototype testing environment used during this thesis work.
Chapter 2
General Background
The purpose of this chapter is to give a brief overview of the technologies and
concepts involved in this thesis project so that the readers can easily
understand and visualize how the work has been carried out. The information
provided here focuses only on the important areas of the subject which are
directly related to this project, without going into unnecessary details.
Since the purpose of this thesis project was to analyze whether one of the
crucial infrastructure components of a mobile core network could be migrated
to a Simplified Infrastructure without any impact on its in-service performance,
the important concepts of the MSC-S BC architecture are described at the
beginning of the chapter. The architecture description includes both the HW and
SW components (Section 2.1). Next, the important concepts, definitions and
terminology with respect to the in-service performance of the platform
(Section 2.2) are described. In the end, a theoretical description of Ericsson's
MSC-S BC prototype and test environment (Section 2.3) is presented.
2.1. Ericsson MSC Server Blade Cluster (MSC-S BC)
2.1.1. Overview
The Ericsson Mobile Switching Center Server (MSC-S) [18] forms one of the
important components within Ericsson's Mobile Softswitch solution [19].
Important functions of this server include the setup and release of end-to-end
calls, handling mobility and hand-over of calls between different mobiles, call
charging etc. However, it has recently been replaced by a more sophisticated
state-of-the-art solution called the MSC-S Blade Cluster (MSC-S BC), which is
designed on the principle of a cluster based distributed system.
All the components of the Ericsson MSC-S BC are implemented as a racked
architecture. As part of this racked type of architecture, the MSC-S BC can have
either one or two cabinets depending upon the capacity requirements it needs to
serve. The first cabinet hosts all the mandatory components, while the second
cabinet provides for an optional expansion of the components to support
additional capacity. Figure 2.1 presents the racked view of the MSC-S BC,
whereas Figure 2.2 gives a more detailed view of the same at blade level, where
BC0 represents the mandatory cabinet and BC1 the optional one.
2.1.2. MSC-S BC Hardware Architecture
The main components of the MSC-S BC are the IS infrastructure blades (MXB,
EXB and SIS), MSC-S BC blades, a signaling proxy (SPX), an IP Line Board
(IPLB) and IS Attached Systems.
2.1.2.1 IS Infrastructure: IS is an Integrated Site, which consists of subracks
and switches. It includes the subracks with MXB, EXB, SIS and several
MSC-S BC blades.
The IS infrastructure blades such as the MXB and EXB provide the data link
layer (L2) connectivity for the MSC-S BC blades and the IP Line Boards
(IPLBs). The main reason for using an IS infrastructure in the MSC-S BC is that
the IS can co-host different types of telecom application Blade Systems. It
was a future vision that one node based on an IS infrastructure could house
an MSC Server Blade System as well as an IP Multimedia Blade System.
This was seen as part of the solution for the main requirement to support a
migration path from a circuit switched core network to an IMS network.
2.1.2.1.1 Site Infrastructure Support Blade System (SIS): SIS is a central
management system in an IS infrastructure. It provides a number of important
IPLB pair for operation and maintenance. The IPLBs reside within the IS L2
infrastructure.
2.1.2.2 IS Attached System: Not all components in the MSC-S BC fulfill the
requirements to reside in the L2 infrastructure provided by the IS framework.
These requirements state that certain L2 connectivity facilities, such as Link
Layer Aggregation with an Ericsson proprietary extension, must be supported.
Components in the MSC-S BC which do not support these requirements are the SPXs
and the I/O system. They are connected to the IS infrastructure as an IS
Attached System.
L2 connectivity of the components in an IS Attached System is provided
by the Switch Core Board (SCB) as shown in Figure 2.2. For redundancy
purposes two SCBs are present per subrack. To achieve connectivity between
the components of an IS infrastructure and an IS Attached System, the EXBs
in the IS infrastructure are connected with the SCBs of an IS Attached System.
2.1.2.2.1 Signalling Proxy (SPX): The SPX is part of the IS Attached System
and is responsible for distributing external SS7 signaling traffic over the
MSC-S BC so that it can be processed. The traffic distribution to the MSC-S BC
blades is done on an algorithmic basis (e.g. using a Round Robin scheduling
algorithm).
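The round-robin distribution concept can be sketched as follows. This is a
generic illustration only, with hypothetical names; it is not Ericsson's SPX
implementation:

```python
from itertools import cycle

class SignalingProxy:
    """Distributes incoming signaling messages across cluster blades in
    round-robin order (a simplified stand-in for the SPX's distribution)."""

    def __init__(self, blades):
        self._blades = cycle(blades)  # endless round-robin iterator

    def dispatch(self, message):
        # Pick the next blade in turn and hand it the message.
        blade = next(self._blades)
        return blade, message

spx = SignalingProxy(["blade-0", "blade-1", "blade-2"])
targets = [spx.dispatch(f"msg-{i}")[0] for i in range(6)]
# Each blade receives every third message in turn.
```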
The SPX is based on a double-sided processor, which in turn uses two GEP boards
as a hardware platform. The double-sided processor offers 1+1 redundancy. The
MSC-S BC consists of two SPXs, which can be used either in a load-sharing
manner or in a redundant manner, depending on the network configuration.
2.1.2.2.2 I/O system: As the name suggests, the I/O system provides the
input/output functionality for the MSC-S BC blades and the SPXs. The MSC-S BC
contains two I/O systems: one is meant for basic input/output and performance
management, while the second is used for charging and accounting data collection
from all the MSC-S BC blades and SPXs. Each I/O system is also based on GEP
hardware, running a Microsoft Windows Cluster Server as the operating system.
This provides 1+1 redundancy for each I/O device. The I/O system also
communicates with the Operation Support System (OSS) of a network.
2.1.3. MSC-S BC Software Architecture
The software structure of the MSC-S BC system is designed with the aim of
upholding functional modularity in order to simplify the installation, operation
and maintenance of the system, while still fulfilling the required functional
behavior.
4 An online ASA compiler (ASAC) that operates in two compilation modes, basic
and optimized. The compiler is a Just-In-Time (JIT) compiler. The compilation
mode is selected at block level. Basic mode is used for most blocks and
provides additional information for fault finding.
5 The APZ OS (the central processor operating system), which provides the service
functions for the application software and the functions for administration,
operation and maintenance of the software and hardware.
6 The application SW layer.
7-10 The I/O system software layers.
By combining the software layers described above, different subsystems are
formed. The important ones with respect to this thesis are:
CP Hardware Subsystem (CPHW) This subsystem contains the CP hardware
platform. Software layers 1 and 2 in Figure 2.4 together form the Central
Processor Hardware Subsystem. The main responsibilities of the CPHW
subsystem are:
To provide the central processor board (CPUB), with the ENUX OS
To provide an execution platform for the PLEX Engine subsystem (PEs)
services such as ASAC and APZ-VM
To provide the support functions for other subsystems such as the PLEX
Engine subsystem (PEs) and the Maintenance subsystem (MAS) to
create a central processor that fulfills the telecom domain requirements
To provide the physical interfaces (NIC) towards the other MSC-S BC
cluster blades, SPX or IS components via the IS infrastructure
To provide different protocol stacks (like the Telebit IP stack (TIP) and
the OS Kernel IP stack (KIP))
To provide an execution platform for the Extra Processing Units (XPU)
applications
Maintenance Subsystem (MAS) This subsystem is responsible for providing the functions for automatic HW and SW fault handling for individual
MSC-S BC blades during live traffic, as well as the important maintenance
functions performed through manual intervention by an exchange technician. Fault
management is provided through a Blade Fault Tolerance (BFT) architecture.
More details on the types of blade level fault tolerance are covered as a part
of Chapter 3 (Evaluation).
Cluster Quorum Subsystem (CQS) This subsystem is responsible for
making a group of individual MSC-S BC blades operate as a cluster.
2.1.4.
Each MSC-S BC blade has a certain status within the MSC-S BC. The status of
a MSC-S BC blade is described by a Cluster Central Processor State (mostly just
called CP state or state). In addition to the CP state, an optional CP state and
application substates also exist. These optional states describe the current situation
of a blade in more detail than the CP state does. In this section only CP
states are discussed, since this is believed to be sufficient with respect to the
scope of this thesis work.
The possible CP states are:
ACTIVE: The blade is part of the quorum and is used for normal traffic execution.
Blades in state ACTIVE are part of the Operative Group (OG) and are kept
consistent from the configuration point of view.
PASSIVE: The blade is part of the quorum but is not used for traffic
execution. The blade is either not activated yet or has been put to PASSIVE
due to inconsistency reasons.
INTERMEDIATE: A previously ACTIVE blade that is temporarily out of the
quorum, either due to blade recovery or because this was ordered by
a command. The blade is expected to return to the ACTIVE state either
automatically or by a command, respectively.
RECOVERY: A previously ACTIVE blade that is temporarily out of the quorum
due to extended recovery activities, a previously PASSIVE blade that is
temporarily out of the quorum due to blade recovery activities, or a blade
that has failed to rejoin the quorum during an Automatic Quorum Recovery
(AQR), is in the state RECOVERY. Typically, the RECOVERY state is a
transient state and the blade is expected to automatically return to its
previous state without manual intervention.
NON-OP: The blade is non-operational either due to the permanent failure or
because this was ordered by a command.
UNDEFINED: This is not a real state. The blade is not a member of the cluster
and it is unknown to the other blades.
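As an illustration, the CP states above can be captured in a small sketch. This is not Ericsson code: the enum values mirror the state names in the text, while the quorum set and the traffic helper are simplifications assumed for the example.

```python
from enum import Enum

class CPState(Enum):
    """Cluster CP states of an MSC-S BC blade, as described above."""
    ACTIVE = "active"              # in quorum, executing traffic (part of the OG)
    PASSIVE = "passive"            # in quorum, but not used for traffic execution
    INTERMEDIATE = "intermediate"  # temporarily out of quorum, expected back
    RECOVERY = "recovery"          # transient; out of quorum during recovery
    NON_OP = "non-op"              # non-operational (permanent failure or command)
    UNDEFINED = "undefined"        # not a cluster member; unknown to other blades

# Hypothetical helper: states in which a blade still belongs to the quorum.
IN_QUORUM = {CPState.ACTIVE, CPState.PASSIVE}

def serves_traffic(state: CPState) -> bool:
    """Only ACTIVE blades contribute to normal traffic execution."""
    return state is CPState.ACTIVE
```

For instance, a blade in RECOVERY is outside the quorum and serves no traffic, whereas a PASSIVE blade is in the quorum but still serves no traffic.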
2.1.5.
Figure 2.5. BSOM signal flow diagram between MSC-S blades and SIS blade.
2.1.6.
2.1.6.1 Introduction
The internal communication between all the MSC-S BC components is critical
for proper operation of the system. Therefore the IS L2 infrastructure provides two
redundant Ethernet switch planes (the left MXB and the right MXB). Each MSC-S BC blade is connected to both MXB switch planes. The two links
operate in an Ericsson variant of IEEE 802 Link Aggregation. A Rapid Link
Supervision Protocol (RLSP) is used between the MSC-S BC blade (CPUB) and
the MXB for link fault detection. This is depicted in Figure 2.6.
Even though each MSC-S BC blade is physically connected to both MXB
switch planes, every MSC-S BC blade normally sends messages over the left
switch plane as long as the left-plane link is operational. When a particular blade's
left link becomes unavailable, it starts to transmit on the right MXB plane.
Received packets are always accepted on both links. When a complete
left MXB plane fails, all blades fail over to the right MXB switch. Thus,
the L2 infrastructure is protected against an MXB failure in a single switch plane.
However, the IS does not provide protection against a single (left) link failure
between a blade and the MXB switch. The MSC-S BC blade can still send messages
over the right plane, but it will no longer receive packets from the other MSC-S BC
blades, as they continue to send on the left MXB switch plane. Hence
a MSC-S BC blade with a link failure must be taken out of operation immediately.
Link failures are detected and handled by the IS LANFM application running
on the MXB and the SIS. If several link failures are detected on the same MXB
plane (usually the left) within a short time, the entire switch plane
is locked. This in turn results in a failover to the redundant switch plane
(usually the right MXB plane). Otherwise, the SIS informs the active BSOM
instance on an MSC-S BC blade, which broadcasts the link failure indication to
all blades in the cluster. Both notifications are sent through both switch
planes to ensure that the information reaches the faulty blade as well.
2.1.6.2 Types of Link Faults
2.1.6.2.1 Single Link Fault: In the case of a single link fault, the MSC-S BC blade
loses communication with the other MSC-S BC blades of the cluster, since its left
link towards the MXB is down. The blade with a single link fault will send
messages to the rest of the blades in the cluster through the right link of the
MXB switch. Although the other blades will receive the messages from this
suspected faulty blade, their replies will not reach it. There are
two types of single link faults, as described below.
a) Temporary Fault: If a link is down for a period between 0 and 250
seconds, it is categorized as a temporary fault. The link downtime
value of 250 seconds was found to be the limit that differentiates
a temporary single link fault from a permanent single link fault in the
MSC-S BC. When a temporary single blade link fault occurs, the affected
blade automatically restarts and switches to the "recovery" state. Then,
as soon as the connectivity is recovered, the faulty MSC-S BC blade
returns to the cluster in an "active" state and continues to handle traffic
as it did before the fault occurred.
b) Permanent Fault: As mentioned above, if the link is down for more
than 250 seconds, it is considered a permanent link fault. When a
permanent single blade link fault occurs, the affected blade
automatically restarts and switches to the "recovery" state. Then, when
the connectivity is recovered, the faulty MSC-S BC blade is automatically
reinserted into the cluster using the cloning process.
Multiple Link Fault: In the case of a multiple link fault (usually on the left
side), all MSC-S BC blades with a broken link lose communication
with the other MSC-S BC blades within the cluster. All blades with a link
fault will send messages to the other blades within the cluster using the
non-broken links, i.e. the right-side links. Although the
other blades will receive traffic from the suspected faulty blades, their replies
will not reach these faulty blades.
Multiple link faults can also be temporary or permanent, as
described above for the single link fault.
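The 250-second limit that separates temporary from permanent link faults can be expressed as a tiny classifier. The function below is an illustrative sketch only; the comments summarize the recovery behaviour described in the text rather than actual MSC-S BC logic.

```python
TEMP_FAULT_LIMIT_S = 250  # downtime limit separating temporary from permanent

def classify_link_fault(downtime_s: float) -> str:
    """Classify a blade link fault by its downtime, using the
    250-second limit described above."""
    if downtime_s <= TEMP_FAULT_LIMIT_S:
        # Blade restarts, enters "recovery", then returns to "active"
        # as soon as connectivity is recovered.
        return "temporary"
    # Blade restarts, enters "recovery", then is reinserted into the
    # cluster via the cloning process once connectivity is recovered.
    return "permanent"
```

The same classification applies to multiple link faults, since they are also divided into temporary and permanent types.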
2.1.7.
2.1.7.1 Introduction
The MSC-S BC, based on the hardware architecture described above, has the
following functional requirements.
Load Sharing: Since several MSC-S BC blades exist, the load must be distributed
equally over all the available MSC-S BC blades.
Scalable: Scalability must be achieved. It means that one or multiple MSC-S BC
blades can be added or removed without any in-service performance impact
and without any additional operation and maintenance configuration.
Redundant: Redundancy must be achieved. It means that one MSC-S BC blade
can fail or be temporarily taken out of service without any in-service
performance impact. Although several physical MSC-S BC blades exist,
logically all MSC-S BC blades must be visible as one single node in
the network as well as during operation and maintenance activities.
To achieve the above requirements, the MSC-S BC consists of several functions,
which run on these blades in co-operation with the rest of the components.
More details about scalability and redundancy concepts are explained in further
subsections.
Loss of functionality
M+N redundancy on MSC-S BC blades does not mean that there is a spare
group of stand-by MSC-S BC blades. In normal operation, all blades evenly
share all the roles and processing tasks. Furthermore, there is no hot stand-by
blade in this scheme. When a particular MSC-S BC blade fails, the tasks (e.g.
mobile calls) it was currently handling are lost and cannot be continued seamlessly
by the other blades.
It is important to understand that even the simultaneous failure of multiple
MSC-S BC blades does not render the MSC-S BC or any of its functions unavailable.
It only implies a capacity loss that increases with the number of failed blades.
Temporarily, a multi-blade failure can also mean a loss of service accessibility for
those calls (subscribers) that had both their primary and buddy records on the failed
blades. Only when the number of available active blades falls below a minimum of
two does the MSC-S BC fail as a node, after which it is recovered through the
cluster recovery procedure.
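The graceful capacity degradation and the two-blade limit described above can be illustrated with a small sketch. The helper functions are hypothetical, and an even load sharing across blades is assumed, as stated in the text.

```python
def remaining_capacity(total_blades: int, failed_blades: int) -> float:
    """With M+N load sharing there is no standby group: each failed
    blade removes its even share of capacity (and its ongoing calls
    are lost), but the node itself stays available."""
    active = total_blades - failed_blades
    return active / total_blades

def node_failed(total_blades: int, failed_blades: int) -> bool:
    """The MSC-S BC fails as a node only when fewer than two active
    blades remain, triggering the cluster recovery procedure."""
    return (total_blades - failed_blades) < 2
```

For example, losing 2 blades out of 8 leaves 75% of the call-handling capacity, while the node itself fails only once 7 of the 8 blades are gone.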
2.2.
2.2.1.
ISP Overview
ISP, the in-service performance, gives an idea of how a node performs while in
service. The performance is measured through the availability and serveability of
a node (the MSC-S BC).
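The text does not give the exact formula used in Ericsson's ISP statistics; as an illustration, the classical availability definition from the dependability literature is sketched below.

```python
def availability(uptime_h: float, downtime_h: float) -> float:
    """Classical availability: the fraction of total time a node is in
    service. This standard definition is used for illustration only;
    the exact ISP formula is internal to Ericsson."""
    return uptime_h / (uptime_h + downtime_h)

# Example: roughly five minutes of downtime over a year of operation.
yearly = availability(8760 - 5 / 60, 5 / 60)
```

With such a measure, the availability goal of the Simplified Infrastructure can be compared directly against that of the native infrastructure.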
2.2.2.
Availability Measurements
Planned SONE is collected only once a year, whereas unplanned SONE is collected
every month.
Planned SONE: Under planned SONE only one category exists. The statistics
collected under this category are named PLM, which stands for planned-manual,
and include downtime causes for software upgrades, software updates and hardware
upgrades or updates.
Unplanned SONE: Unplanned SONE is further divided into the following four
categories. In the current thesis scope, only the automatic type of unplanned
SONE was considered during analysis and evaluation of the results.
Automatic (AUT): This type covers downtime causes due to
software faults and/or configuration faults that bring the blade
completely down, and where the system recovers from the fault on its own,
either by restart or reload. Network or link faults are not counted here,
since they bring down only part of the blade rather than the complete
blade.
Manual (UPM): This type covers downtime causes where an automatic
recovery has failed and an operator intervention is needed. It also
covers the cases where the automatic recovery is not triggered.
Examples include hanging devices, hanging software, etc.
CEF-Eric: Complete exchange failure due to Ericsson equipment.
CEF-Cust: Complete exchange failure due to the customer's own
equipment.
2.3.
SI Prototype Summary
2.3.1.
Overview
If applied to the MSC-S BC, the same idea looks as presented in Figure 2.9.
2.3.2.
Processor frequency: 2.83 GHz
Number of processors: 4, with 4 cores each
RAM memory: 12 GB
Operating System: Ubuntu 10.04.3 LTS, 64-bit version
The Cloud machine was a Genuine Intel computer with the configuration
outlined in Table 2.2.
Stockholm Laboratory B: The Stockholm Laboratory B contained two physical
machines that were used to implement the prototype MSC-S BC blades in
the tests. Both were Intel Xeon machines with the configuration outlined in
Table 2.3.

Processor frequency: 2.53 GHz
Number of processors: 16, with 4 cores each
RAM memory: 32 GB
Operating System: Ubuntu 10.04.3 LTS, 64-bit version
Processor frequency: 2.4 GHz
Number of processors: 24, with 6 cores each
RAM memory: 60 GB
Operating System: OpenSUSE 11.4, 64-bit version
Once the virtual machines were created, the Stockholm Laboratory B network
topology was modified so that all four new virtual machines were able
to communicate locally with each other, through VPN with the Bridge
machine in the Stockholm Laboratory A, and, by extension, with the whole
test network. Figure 2.10 illustrates the Stockholm Laboratory B network
topology. As can be observed in this figure, the blade numbers chosen for the
machines were 13, 14, 15, and 16, although they could be modified as needed.
A more detailed configuration can be found in a previous master thesis [17].
Chapter 3
Evaluation
This chapter describes the evaluation part of this thesis project. First, a
discussion of how the theoretical study was carried out is presented (Section 3.1),
followed by the various findings of this theoretical study (Section 3.2). Next,
an analysis of each individual finding, along with a suitable proposal for it, is
presented (Section 3.3). After that, the test strategy and the designed test
cases for verifying the important proposals of the study are discussed. Then the test
execution, the test results and the challenges encountered during test execution are
stated (Section 3.4.3). Finally, an evaluation summary is presented.
3.1.
Since the higher goal of this thesis project was to find out the
requirements that enable the migration of one of Ericsson's platforms to a
Simplified Infrastructure without any impact on its in-service performance,
the main focus area of the theoretical study was derived by understanding the
functions directly or indirectly contributing to the platform's in-service performance.
With this idea in mind, the important concepts of in-service performance as well
as the overall architecture of the platform (the MSC-S BC in this case) were studied
very thoroughly (considering both the HW and SW functions). The exact details
are presented further on, where Section 3.1.1 discusses the functional areas of the
platform that come into the picture with respect to in-service performance, and
Section 3.1.2 sheds some light on the current design of the platform and what it
means for such a platform to migrate to a Simplified Infrastructure with no impact
on its in-service performance.
3.1.1.
From the detailed study of the in-service performance concepts, it can be said
that very good ISP statistics exist internally at Ericsson. Hence detailed
3.1.2.
The detailed study of the platform indicated that many important system
functions, including different groups of fault tolerant functions such as fault
detection, fault recovery and logging, were closely coupled to its existing
HW. It was also observed that such close coupling had increased the number
of interfaces, both IP-based and NON-IP-based, that became mandatory to utilize
in order to complete the connectivity for the expected operation of its fault tolerant
architecture. Additionally, these interfaces had created more than one way
to access some of the most crucial functions as well as the data storages, whose
consistency played a crucial role in every decision taken by the fault tolerant
architecture of the platform (for both blade level and cluster level
fault handling).
Since the Simplified Infrastructure could provide only IP-based connectivity
between all of its components, this became the principal differentiator in the
analysis of the various fault tolerant functions of the blade, and a
governing factor for further analysis of the results.
Hence, while carrying out this study, only those interfaces and fault tolerant
functions were considered which made use of NON-IP based connections.
They were analyzed with respect to their unavailability within the Simplified
Infrastructure, and it was firmly believed that this methodology would help in
analyzing the impact of such a migration on the platform's in-service performance.
Figure 3.1 and Figure 3.2 demonstrate this close coupling between various
system functions of two MSC-S BC blades (BC0 and BC1 in this case) and the
infrastructure blades (MXB and SIS in this case).
Figure 3.1. BSOM signal flow diagram between MSC blades and a SIS blade.
3.2.
As can be seen from Figure 3.1, one of the ways a particular MSC-S BC
blade could reach another MSC-S BC blade within the cluster was by using
the BSOM function. As described previously, BSOM is the "Blade System Operation
and Maintenance Master" and forms part of the Plex Engine Subsystem. BSOM
uses a communication channel that is a combination of IP and NON-IP interfaces
when a particular MSC-S BC blade wants to reach another MSC-S BC blade within
the cluster. This interface is identified as "SNMP together with IPMI interface" in
the context of this study (indicated by the path between point 1 and point 12 in
Figure 3.1). Similarly, for communication only between the SIS blade and the
MXB blade, an "IPMI interface" is used (indicated by the path between points
11 and 12 in Figure 3.1).
Based on this, the identified interfaces and the functions using these interfaces
are listed below. These functions were analyzed further with respect to the impact
of their unavailability on the in-service performance of the platform after
migrating to the Simplified Infrastructure. The identified functions are mainly
3.2.1.
Interfaces Identified
SNMP together with the IPMI interface - a combination of IP and NON-IP
interfaces
IPMI, the Intelligent Platform Management Interface, whose management bus
is based on the I2C protocol - a pure NON-IP interface
3.2.2.
Function-1: An automatic boot (both hard and soft) ordered by the Blade
Recovery Manager function for the suspected faulty blade(s)
Function-2: Supervision of the suspected faulty blade(s)
Function-3: Link fault detection and the recovery (part which is done through
NON-IP interface)
Function-4: Plane fault detection and the recovery for different switches
Function-5: Various test functions for determining the availability of NON-IP interfaces
Function-6: Boot order as a part of the manual repair for the suspected faulty
blade(s) with the help of the Blade Recovery Manager function
Function-7: HW clock synchronization
Function-8: Inventory management
Function-9: Support processor (IMC) supervision
Function-10: Various logging via SNMP-IPMI interface
3.3.
3.3.1.
3.3.1.1 Analysis:
An automatic boot is a part of the fault recovery function within the blade
fault tolerant architecture of the platform. The impact of the unavailability of the
automatic boot function was analyzed by studying the probability of the occurrence
of this function in nodes installed in the field. The probability of these
occurrences was derived from the in-service performance statistics of
the nodes. These statistics are available only internally and are not shared or
published outside Ericsson.
For a cluster based distributed system like the MSC-S BC, every occurrence
of the automatic boot means a certain percentage reduction in the capacity of
the platform in terms of the number of mobile calls it can handle, and hence a
certain reduction in the availability percentage of the platform within a network.
When a MSC-S BC blade undergoes a reboot (automatic or manual), it leaves the
active group of CPs (the quorum) and hence does not contribute to serving any
mobile calls. This means that if the probability of occurrence of this function turns
out to be ZERO or close to ZERO in the installed base, it would be fair to say
that the impact of the unavailability of this function on the in-service performance
of the platform is negligible when it is migrated to the Simplified
Infrastructure.
ISP Statistics collected as a part of Events from the Installed Base: By counting
the number of occurrences of an automatic boot with the help of the available
in-service performance statistics collected regularly from the currently installed
base.
Benefits of a Clustered Architecture: By studying the benefits of the cluster
architecture with its M+N redundancy principle, where the impact on the
availability of the platform of losing a single blade due to an automatic
boot is ZERO, and losing more than one blade was considered
negligible, since the platform's capacity to handle the total number of mobile
calls degrades very gracefully.
The graphical representation of the same is shown using Figure 3.3.
3.3.1.2 Discussions:
As can be seen from Figure 3.3, the outcome for the two inputs considered in
the analysis of the unavailability of the automatic boot function gave the following
results.
ISP Statistics collected as a part of Events from the Installed Base: The
number of times the automatic boot executed in the currently installed base
turned out to be ZERO.
Benefits of Cluster Architecture: Due to the M+N redundancy principle, the
impact of losing one or more MSC-S BC blades due to an automatic boot
could be considered negligible.
3.3.1.3 Proposal:
Given these outcomes, it can be concluded that the impact on the platform's
in-service performance of the automatic boot function being unavailable after
migration to the Simplified Infrastructure is negligible. Hence
one of the alternatives could be to take no action for this unavailability and to
continue to run the system without the automatic boot function after migration.
3.3.2.
3.3.2.1 Analysis:
This function is part of the fault recovery as well as the logging group within
the blade level fault tolerant architecture. Its impact was analyzed by understanding
the contribution made by this function during blade recovery.
As can be seen from Figure 3.4, the function essentially served two purposes,
mainly:
Figure 3.4.
function.
3.3.2.2 Discussions:
After a detailed analysis, the following could be said concerning the unavailability of this function.
Decision of Escalating Blade Level Recovery to an Automatic Boot: This
part of the function would become obsolete, since the automatic
boot function would not be present in the Simplified Infrastructure, as
discussed in the Function-1 analysis.
Fault Reporting through Logging and Raising an Alarm: Since this part
of the function was common to all the other recovery escalations (including
the automatic boot), it is necessary to keep it, and its unavailability could
be compensated by minor changes to the existing function.
3.3.2.3 Proposal:
Summarizing the above analysis, it would be enough to partly compensate for
the unavailability of the above function (only the logs and alarms part) with
the help of an alternative implementation, so that the same functionality
continues for the other recovery steps before and after the migration.
3.3.3.
3.3.3.1 Analysis:
During a detailed analysis, it was understood that link management was
done by the cluster protocols as part of the Cluster Quorum Subsystem (CQS)
as well as by the fault handling functions within the Maintenance Subsystem
(MAS). CQS used pure IP-based connectivity (UDP packets) for link
fault handling, whereas MAS used NON-IP interfaces (with help from the BSOM
function). The reason behind MAS having such an implementation was to comply
with the requirements of the IS infrastructure blades of the platform, as discussed in
detail in Chapter 2.
Additionally, the next version of the infrastructure blades (called the "Ericsson
blade system" (EBS)) left the responsibility of implementing this function
entirely up to the platform. This means that the platform has the freedom to
choose the implementation of this function as well as the type of interface it
wants to use.
3.3.3.2 Proposal:
Considering the information presented in the analysis section of the link
management function, as well as sufficient test results from the previous prototype
testing [17], it was decided to handle link management in only one way:
through the cluster protocols (using pure IP-based connectivity).
Furthermore, in order to reconfirm this decision, it was also decided to perform
a substantial amount of testing as a part of the current master thesis project (using
the Simplified Infrastructure prototype).
The same thought-process is depicted using Figure 3.5.
3.3.4.
3.3.4.1 Analysis:
During a detailed study, it was understood that from the beginning the MSC-S
BC was provisioned to perform plane fault detection as well as recovery
through the cluster protocols (through a pure IP-based interface), but this had never
been used. Instead, plane management was the responsibility of the IS
infrastructure blade (SIS and MXB), which made use of a combination of IP and
NON-IP interfaces as described in more detail in Section 2.1.
Furthermore, it was also learnt that the latest infrastructure (EBS based) does
not pose any such requirement on the MSC-S BC blades, and it is completely
up to the platform to decide how to perform the plane fault detection
and recovery.
3.3.4.2 Proposal:
Based on the points presented in the analysis section of this function, it
was decided to attempt to use the already implemented plane management
functionality through the cluster protocols.
In order to demonstrate that plane management through the cluster
protocols works as expected, it was also decided to perform a sufficient amount
of verification as a part of the current master thesis project (using the Simplified
Infrastructure prototype).
The same thought-process is depicted using Figure 3.6.
3.3.5.
3.3.5.1 Analysis:
The functions identified as Function-5 to Function-10 (listed in Section 3.2.2)
were not directly part of the Fault Tolerant Architecture of the blade and hence
were not studied further as a part of this work. All these functions are
also indicated in Figure 3.7.
3.3.5.2 Proposal:
Even though these functions did not directly form part of the Fault Tolerance
Architecture (neither at the cluster level nor at the blade level), they were still
identified and considered crucial with respect to the complete platform, and
hence they would require further study in order to provide an analysis similar to
the one provided for Function-1 to Function-4.
3.3.6.
3.4.
Many of the proposals of this theoretical study were verified using the
Simplified Infrastructure prototype described in Section 2.3. The same prototype
could be applied to this study with minor modifications, so that it would be possible
to draw concrete decisions on the various proposals of this study (after a sufficient
amount of verification).
3.4.1.
Verification Strategy
To verify the different proposals presented in Section 3.3, and to keep the
verification as simple as possible by following a step-by-step manner, the test
execution was carried out by dividing the tests into the following groups.
Group-1: Verification when both MXB planes are up and running, without
any mobile traffic (normal case)
Group-2: Verification when one of the MXB planes is down, without any mobile
traffic (redundancy situation case)
3.4.2.
For each of the groups listed in Section 3.4.1, a specific set of test cases was
designed. They were divided into two main categories depending upon their
purpose, mainly:
Cluster Scalability Tests
Fault Recovery Tests
3.4.3.
Test Execution
In this section, the test execution procedure is presented for both
categories of test cases, covering the cluster scalability tests and the fault
recovery tests.
3.4.3.1 Cluster Scalability Tests - Execution
1) Formation of a Cluster from Scratch
The cluster is formed by adding the MSC-S BC blades one after the other, using
the procedure described in the section on MSC-S blade addition. This loop is
repeated until all the required blades are added to the cluster.
Topology description: Active external cluster with prototype MSC-S BC blades
13 and 14 in the Stockholm Laboratory A.
Test: Prototype MSC-S BC blade 15 is added to the cluster.

Topology description: Active external cluster with prototype MSC-S BC blades
13, 14 and 15 in the Stockholm Laboratory A.
Test: Prototype MSC-S BC blade 16 is added to the cluster.
The leader of the cluster is designated when the Ericsson MSC-S BC cluster is created. Specifically, the blade for which the MSC-S application is specified first becomes the cluster leader.
Table 3.2. Compilation of test results where one of the prototype MSC-S BC blades
was removed from an existing MSC-S BC cluster.
Topology description: Active external cluster with prototype MSC-S BC blades
13, 14, 15 and 16 in the Stockholm Laboratory A.
Test: Prototype MSC-S BC blade 13 is removed from the cluster.

Topology description: Active external cluster with prototype MSC-S BC blades
14, 15 and 16 in the Stockholm Laboratory A.
Test: Prototype MSC-S BC blade 14 is removed from the cluster.
1) Laboratory Setup
Every prototype MSC-S BC blade was configured with the two interfaces
following Appendix B (confidential).
In addition, the different types of link faults (single blade link and multiple
blade link faults) were simulated by disconnecting the machines' Ethernet
interfaces using a simple script that brought down the interfaces, left them down
for a specified time, and brought them up again.
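The actual fault-injection script is not reproduced in the thesis; a minimal sketch of the described approach might look like the following. The use of the Linux `ip link` command and the interface name are assumptions, and with `dry_run=True` the commands are only collected rather than executed, which keeps the sketch safe to run.

```python
import subprocess
import time

def simulate_link_fault(interface: str, downtime_s: float, dry_run: bool = True):
    """Bring an Ethernet interface down, keep it down for downtime_s
    seconds, then bring it up again - the approach described in the
    text. Returns the commands; executes them only when dry_run=False."""
    commands = [
        ["ip", "link", "set", interface, "down"],
        ["ip", "link", "set", interface, "up"],
    ]
    if dry_run:
        return commands
    subprocess.run(commands[0], check=True)
    time.sleep(downtime_s)  # e.g. <= 250 s for a temporary fault
    subprocess.run(commands[1], check=True)
    return commands
```

Choosing a downtime below or above the 250-second limit reproduces a temporary or a permanent link fault, respectively.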
The same procedure was repeated for the MSC-S BC blades 13, 14 and 16
as well. They each recovered in the same manner as described above for the
MSC-S BC blade 15.
MSC-S BC Blade Passive: The MSC-S application supervision function did not
behave consistently when handling the prototype MSC-S BC blades.
The main responsibility of this supervision function is to decide whether a
newly added MSC-S BC blade should be activated and hence contribute to
serving traffic. But whenever a new prototype MSC-S BC blade was added to
an active cluster after a successful cloning process, its state ended up as either
"passive" or "active", with no discernible pattern. This inconsistency also
appeared when a cluster was created from the prototype MSC-S BC blades,
where one or more of the prototype MSC-S BC blades forming the cluster
sometimes remained in the "passive" state after the activation process.
Chapter 4
4.1.
Conclusions
4.1.1.
System Study
During the system study phase (which included both the ISP statistics analysis
and the identification and analysis of the various fault tolerant functions using
NON-IP interfaces), a number of discussions were carried out with Ericsson
designers and system managers in addition to the self-study. It was a very positive
experience. The final outcome of these studies was the identification of a total of
TEN functions along with FIVE different proposals for the identified functions.
The proposed action on each of the functions would enable a smooth migration of the
considered mobile core application from the native infrastructure solution to the
simplified infrastructure solution, and to the Cloud as well, with an improved
in-service performance. Out of the FIVE proposals, the first FOUR were also
verified using the SI prototype, while the last proposal requires further study of the
identified functions. The details of the various conclusions related to the
verification phase are presented in the following section.
4.1.2.
Laboratory Tests
4.2.
Future Work
Based on the challenges faced during the current master thesis project, a number
of suggestions for future work are presented to extend the work carried out
here.
As has been mentioned several times throughout this report, the main
purpose of this project has been to investigate whether one of the mobile core
applications could be migrated to the Simplified Infrastructure, and hence to the
Cloud, without any impact on the in-service performance of one of its core
infrastructure components. In this process some issues directly connected to the
MSC-S application software affected some of the basic functionality, such as the
scalability of
65
Bibliography
[1]
[2]
[3]
[4]
[5]
[6] Cisco and/or its affiliates. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2010-2015, February 2011. Available from: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.pdf. [Accessed 25 September 2012].
[7] J.D. Chimeh. Mobile services: Trends and evolution. In Advanced Communication Technology, 2009. ICACT 2009. 11th International Conference on, volume 02, pages 946-948, February 2009.
[8]
[9] Ericsson AB. Voice and Video Calling over LTE, February 2012. Available from: http://www.ericsson.com/res/docs/whitepapers/WP-Voice-Video-Calling-LTE.pdf. [Accessed 27 January 2013].
[10] Srini Rao. Mobile Broadband Evolution - LTE and EPC, April 2010. Available from: http://ieee-boston.org/presentations_society/lte_epc_ieee_comsoc_rao_april_8_2010.pdf. [Accessed 20 January 2013].
[11] Ericsson AB. Introduction to IMS, March 2007. Available from: http://www.facweb.iitkgp.ernet.in/~pallab/mob_com/Ericsson_Intro_to_IMS.pdf. [Accessed 25 January 2013].
[12] Korinthios Georgios. Mobile Services Network Technology Evolution and the role of IMS. Available from: http://www.ict-fireworks.eu/fileadmin/events/FIREweek/2nd-WS-Converged-Networks/03-Georgios_Korinthios.pdf. [Accessed 27 January 2013].
[13] G. Camarillo and M.A. García-Martín. The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds. John Wiley & Sons, 2007.
[14] Yashpalsinh Jadeja and Kirit Modi. Cloud Computing - Concepts, Architecture and Challenges. In International Conference on Computing, Electronics and Electrical Technologies [ICCEET], 2012 IEEE, pages 877-880, 2012.
[15] Peter Mell and Timothy Grance. The NIST Definition of Cloud Computing, September 2011. Available from: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf. [Accessed 5 October 2012].
[16] Adetola Oredope and Antonio Liotta. Plugging 3G Mobile Networks into the Internet: A Prototype-Based Evaluation. In Computing, 2006. CIC '06. 15th International Conference on, pages 406-411, November 2006.
[17] Isaac Albarran and Manuel Parras. Telecommunication Services Migration to the Cloud: Network Performance Analysis, TRITA-ICT-EX-2012:54, April 2012. Available from: http://web.it.kth.se/~maguire/DEGREE-PROJECT-REPORTS/120429-Isaac_Albarran_and_Manuel_Parras-with-cover.pdf. [Accessed 2 February 2013].
[18] Petri Maekiniemi and Jan Scheurich. Ericsson MSC Server Blade Cluster, March 2008. Available from: http://www.ericsson.com/res/thecompany/docs/publications/ericsson_review/2008/Blade.pdf. [Accessed 10 February 2013].
[19] Ericsson AB. Softswitch in fixed networks, 2005. Available from: http://kambing.ui.ac.id/onnopurbo/library/library-ref-eng/ref-eng-2/physical/wireless/trend-gsm/softswitch.pdf. [Accessed 25 January 2013].
Appendix A
A.1.
Two of the most widely used mobile core architectures are the Global System
for Mobile Communications (GSM) and the Universal Mobile Telecommunications
System (UMTS). They are presented with a focus specifically on the core
functionality of the MSC-S.
The MSC-S's main task is to implement the call services' signalling functionality.
Hence the architecture description covers only those elements that intervene in the
call service provisioning.
A.1.1.

entities in the network. Figure A.1 illustrates the topology of a GSM network and
the connections between the different functional units it comprises.
The NSS is responsible for call control, mobility management, signaling, billing
data collection and subscriber data handling [3]. The NSS consists of the following
units:
Mobile services Switching Center (MSC)
The MSC node is responsible for the setup, supervision, and release of calls
within the GSM mobile network, as well as for handling SMSs and managing the
terminals' mobility. It also collects call billing data and sends it to the
Billing Gateway, which processes this data to generate bills for the subscribers.
Additionally, a MSC may act as a bridge between its own GSM network and the
Public Switched Telephone Network (PSTN), or another operator's GSM network.
Such a MSC is called a Gateway MSC.
Home Location Register (HLR)
The HLR database stores and manages all mobile subscriptions belonging to a
specific operator. The HLR is the main database containing permanent subscriber
information for a mobile network. It stores all the information related to the
subscriber: the International Mobile Subscriber Identity (IMSI), the Mobile Station
Integrated Services Digital Network (MSISDN) number, which is the subscriber's phone
number, information about the subscriber's supplementary services, authentication
data, and location information.
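The permanent subscriber record described above can be sketched as a simple data model. This is purely illustrative: the field names are assumptions made for this sketch, not an actual HLR schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class HlrRecord:
    """Illustrative model of a permanent HLR subscriber entry (field names assumed)."""
    imsi: str                                   # International Mobile Subscriber Identity
    msisdn: str                                 # subscriber's phone number
    supplementary_services: Set[str] = field(default_factory=set)
    auth_data: bytes = b""                      # authentication parameters (from the AUC)
    current_vlr: Optional[str] = None           # location information: serving VLR address

record = HlrRecord(imsi="240991234567890", msisdn="+46701234567",
                   supplementary_services={"call-forwarding"})
record.current_vlr = "vlr.stockholm.example"    # updated when the subscriber roams
```

The only mutable part in normal operation is the location information; the identity and service data change only through provisioning.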
Visitor Location Register (VLR)
The Visitor Location Register (VLR) is another database in a mobile communication
network, associated with a specific MSC. It contains information about all
mobile terminals currently located in this MSC's service area. The VLR maintains
the temporary subscriber information needed by the MSC to provide service (e.g.
route a call to the correct base station) to visiting subscribers. The information in
the VLR changes dynamically as the subscribers move from cell to cell and network
to network. In order to obtain service, a mobile terminal registers with the network;
this information enables the MSC to determine whether the terminal can be
authenticated and whether it is authorized for each specific service, and it also
enables calls to and from this mobile terminal to be effectively handled and routed.
The VLR can be seen as a distributed subset of the HLR: when a mobile station
roams into a new MSC's service area, the VLR connected to this MSC requests data
about the mobile station from the HLR and stores the response locally. If the mobile
station makes or receives another call without changing its service area, then the
VLR will use the subscriber's information that it already has for authentication
and call set-up. The database entry of the subscriber may be deleted once the
subscriber leaves the MSC service area (note that the deletion may or may not occur
quickly, depending upon the probability that the subscriber will return to this
service area).
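The view of the VLR as a distributed subset of the HLR can be sketched as a small cache in front of the HLR. This is a simplified illustration; the class and method names are invented for the sketch and do not correspond to any real implementation.

```python
class Vlr:
    """Illustrative sketch: the VLR as a local cache of HLR subscriber data."""

    def __init__(self, hlr):
        self._hlr = hlr      # backing HLR database, here a plain dict IMSI -> record
        self._cache = {}     # temporary entries for subscribers visiting this MSC area

    def register(self, imsi):
        """Roam-in: request the subscriber's data from the HLR and store it locally."""
        self._cache[imsi] = self._hlr[imsi]

    def lookup(self, imsi):
        """Call set-up in the same area: served from the cached entry, no HLR query."""
        return self._cache[imsi]

    def deregister(self, imsi):
        """Subscriber left the MSC service area: the entry may be deleted."""
        self._cache.pop(imsi, None)

hlr = {"240991234567890": {"msisdn": "+46701234567"}}
vlr = Vlr(hlr)
vlr.register("240991234567890")          # roam-in triggers one HLR fetch
entry = vlr.lookup("240991234567890")    # subsequent calls reuse the cached data
```

The design point is the same as for any cache: repeated call set-ups in one service area avoid round trips to the central HLR.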
Authentication Center (AUC)
The Authentication Center (AUC) is a database connected to the HLR.
It provides the HLR with the subscriber's authentication and encryption
parameters, which are used to verify the subscriber's identity and to ensure the
confidentiality of each call. The AUC protects network operators from fraud and
is used to authenticate each Subscriber Identity Module card (SIM card) when
a terminal with this SIM card attempts to connect to the network (typically
when the phone is powered on in the network operator's service area). Once the
authentication is successful, an encryption key is generated, and this key is used in
all communications between the mobile terminal and the network.
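The challenge-response exchange behind this can be sketched as follows. In GSM the AUC uses the A3/A8 algorithms with a secret key shared with the SIM; in this sketch an HMAC stands in for both algorithms, so it only illustrates the shared-secret flow, not the actual GSM cryptography.

```python
import hashlib
import hmac
import os

def auc_triplet(ki: bytes):
    """Illustrative GSM-style authentication triplet; HMAC stands in for A3/A8."""
    rand = os.urandom(16)                                              # random challenge
    sres = hmac.new(ki, rand, hashlib.sha256).digest()[:4]             # expected response
    kc = hmac.new(ki + b"cipher", rand, hashlib.sha256).digest()[:8]   # session cipher key
    return rand, sres, kc

def sim_response(ki: bytes, rand: bytes) -> bytes:
    """What the SIM computes from the same shared secret when challenged."""
    return hmac.new(ki, rand, hashlib.sha256).digest()[:4]

ki = b"secret-shared-by-sim-and-auc"      # never transmitted over the air
rand, sres, kc = auc_triplet(ki)
authenticated = sim_response(ki, rand) == sres   # network compares the responses
```

Only the challenge and the response cross the radio interface; the shared secret and the derived encryption key never do, which is what makes the scheme resistant to eavesdropping.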
Equipment Identity Register (EIR)
The Equipment Identity Register (EIR) is an optional database that contains
mobile equipment identity information, which helps to block calls from stolen,
unauthorized, or defective mobile stations. Operators can maintain three different
lists of International Mobile Equipment Identities (IMEI) in their EIR: a white list
containing valid mobile terminals, a grey list where dubious mobile equipment is
included, and a black list containing the mobile devices to which service is to
be denied.
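The three-list lookup can be sketched as a simple function. The return values and the precedence between the lists are assumptions made for the illustration; operators define their own policies.

```python
def eir_check(imei: str, white: set, grey: set, black: set) -> str:
    """Illustrative EIR lookup over the three IMEI lists (precedence assumed)."""
    if imei in black:
        return "denied"    # stolen, unauthorized, or defective equipment
    if imei in grey:
        return "tracked"   # dubious equipment: served, but monitored
    if imei in white:
        return "allowed"   # known valid terminal
    return "unknown"       # not provisioned in any list

verdict = eir_check("490154203237518", white={"490154203237518"},
                    grey=set(), black=set())
```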
The Base Station Subsystem (BSS) is responsible for all radio-related features.
Typically, a MSC controls several BSSs, covering a large geographical area that is
divided into many cells [3]. This subsystem is composed of the Base Station
Controller (BSC) and the Base Transceiver Station (BTS).
Base Station Controller (BSC)
The Base Station Controller (BSC) provides all the radio-related functions and
physical links between the MSC and the Base Transceiver Station (BTS). It also
implements functions such as handover, cell configuration data, channel assignment,
and control of radio frequency and power levels in each connected BTS.
Base Transceiver Station (BTS)
The Base Transceiver Station (BTS) handles the radio interface to the mobile
station. It facilitates the communication between the mobile devices and the
network. The BTS is the radio equipment (transceivers and antennas) needed to
serve each cell in the network. Normally, a group of BTSs are controlled by a BSC.
Finally, the Network Management Subsystem (NMS) is the entity via which the
network operator monitors and controls the whole system. The Operation Support
System (OSS) can be divided into a two-level management function formed by a
Network Management Center (NMC) responsible for the centralized control of the
system, and subordinate Operation and Maintenance Centers (OMCs) focused on
regional maintenance issues. The functions of the NMS can be divided into three
groups [3]:
Fault management: to ensure the correct operation of the network and a rapid
recovery when failures occur.
Configuration management: to maintain updated information about the operation
and configuration of the network elements.
Performance management: the NMS collects data from the network elements so
that the actual performance of the network can be compared to the expected
performance.
A.1.2.
complete system composed of the radio access network and the core network. The
core network initially preserved the GSM architecture in order to enable a graceful
evolution of the GSM networks. Regarding the UMTS radio access network, two
new elements were introduced to replace GSM's BSCs and BTSs, respectively: the
Radio Network Controller (RNC) and the Node B. The remaining GSM network elements are
compatible with the UMTS network. Figure A.2 illustrates the UMTS network
architecture.