Self-Healing Mechanism On Switch-Controller Connections in SDN
by
Takuma Watanabe
Supervisor
Submitted to
Department of Communications and Computer Engineering
in fulfillment of the requirements for the degree of
Master of Engineering
at the
TOKYO INSTITUTE OF TECHNOLOGY
February 2015
Summary
Modern networking infrastructure is becoming increasingly complex in the face of rapidly growing demands from modern applications, and must meet major challenges including manageability, flexibility, and extensibility. Software-Defined Networking (SDN) is an emerging paradigm of computer networking that meets these demands. On the other hand, besides its desired advantages, SDN has a considerable disadvantage in reliability due to its centralized architecture. Much research has been performed to overcome this reliability problem. However, none of it can protect networks, especially their control logic, under large-scale, unexpected link failures. Networking infrastructures are now a fundamental basis of modern society, so they must remain reliable even under severe failures such as those caused by disasters.
We have found that centralized protection and restoration mechanisms, in which only controllers take action to recover the control logic after link failures, are incapable of recovering from large-scale link failures, so we decided to use a distributed mechanism in which every switch is able to maintain its own control logic. We propose ResilientFlow, a self-healing mechanism in which switches can manage their control channels by their own means. We introduce a module, the Control Channel Maintenance Module (CCMM), that enables a switch to detect a control channel failure and restore the channel via an alternative path. Inside each switch, a CCMM 1) monitors the link status of the switch with heartbeat packets, 2) exchanges network topology maps with the switch's controller(s) and with neighboring CCMMs, and 3) sets up flow entries in the switch to establish a path from the switch to the controller(s).
In this paper, we design and implement ResilientFlow. For our implementation, we use the OSPF daemon in Quagga to monitor link status and to exchange network topology maps, and we utilize the Internal Ports of Open vSwitch so that the OSPF daemon can work correctly. We then implement the CCMM's flow entry installer in Python. We use the Linux kernel's multiple routing table functionality so that the switch's routing converges to the SDN manner, in which forwarding follows only the flow table, not the conventional TCP/IP mechanisms.
To prove our concept and show how ResilientFlow recovers control channels, we conducted a series of experimental evaluations in two different scenarios: one in which a single specified link fails in a dedicated topology, and one in which multiple random links fail in a real-world topology. We show that ResilientFlow recovers a control channel within 300 ms after a single link failure. We also show that ResilientFlow can restore control channels after multiple, severe link failures, taking time on the order of seconds.
We also extended the CCMM to address the domain-splitting problem, in which a switch has no available path to the controllers, by enabling the CCMM to act as an emergency alternative controller. Our experiments show the applicability of the CCMM to the domain-splitting problem, with a restoration time of approximately 1 to 2 seconds.
As future work, we suggest further applications of ResilientFlow to the SDN bootstrapping problem and discuss a better controller selection method in a split-domain environment.
Acknowledgement
Contents
Summary i
Acknowledgement iii
Table of Contents iv
List of Figures vi
1 Introduction 1
3 ResilientFlow 11
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1) Monitoring Link Statuses . . . . . . . . . . . . . . . . . . . . . . . . 13
2) Exchanging Network Topology Maps . . . . . . . . . . . . . . . . . . 13
3) Installing Flow Entries for Control Channel . . . . . . . . . . . . . . . 13
4 Implementation 15
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Monitoring Link Status and Exchanging Network Topology Maps . . . . 16
4.3 Installing Flow Entry for Control Channel . . . . . . . . . . . . . . . . . 17
5 Experimental Evaluation 19
5.1 Experimental Environment . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.1.1 Emulated Network with Mininet . . . . . . . . . . . . . . . . . . 19
5.1.2 Switch Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.1.3 Controller Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2 Evaluation Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2.1 Single Specified Link Failure Scenario . . . . . . . . . . . . . 22
5.2.2 Random Multiple Links Failure Scenario . . . . . . . . . . . . . 27
7 Discussion 36
7.1 Controller–CCMM Coordination Problem . . . . . . . . . . . . . . . . . 36
7.2 SDN Domain-Splitting Problem . . . . . . . . . . . . . . . . . . . . . . 36
Controller Election . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Controller–Application Re-coordination . . . . . . . . . . . . . . . . . . 37
7.3 SDN Bootstrapping Problem . . . . . . . . . . . . . . . . . . . . . . . . 37
8 Conclusion 38
Bibliography 40
Publications 44
English Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Japanese Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
List of Figures
List of Tables
Chapter 1
Introduction
Today's networking infrastructure is becoming increasingly complex in the face of rapidly growing demands from modern applications, and must meet major challenges including manageability, flexibility, and extensibility. Despite their wide adoption today, conventional networks are known to be hard to manage due to their lack of manageability [6]. Moreover, modern diversified network services and applications not only generate large amounts of data traffic [12] but also have multifarious resource demands, including bandwidth, latency, and other kinds of QoS requirements. Enforcing high-level policies throughout the network is required in today's datacenter and inter-datacenter networks [8], which leads to a strong need for efficient support of flexible network management, i.e. the flexibility of the network [9]. Additionally, the conventional networking architecture lacks the extensibility to adapt to new applications and their demands, due to its vertically integrated, hardware-implemented architectural design [13].
Software-Defined Networking (SDN) is an emerging paradigm of computer networking that addresses these problems [1][2][3][10]. The key idea of SDN is to separate the network's control logic (control plane) from its packet forwarding logic (data plane), which are vertically integrated in the same switch box in the conventional architecture, and to converge the control logic into a few centralized instances called controllers [13]. In SDN, switches only perform packet forwarding, while controllers take care of all remaining functions, including maintaining the switches' forwarding tables. Through the controllers, administrators and approved applications take control of the network and can manipulate it flexibly. Because SDN controllers are basically implemented as software, we can easily extend the networking infrastructure to meet additional demands from new applications without upgrading physical switches. Also, only one or a few controllers are assumed to be placed in each administrative domain of the network. This architecture significantly reduces the work of managing networks [1].
While SDN has advantages in manageability, flexibility, and extensibility, it also has a notable disadvantage in reliability. Having split and centralized the control logic, SDN may lose its control capability when the controller fails or the links between switches and controllers fail.
Today's networking infrastructure needs to be manageable and extensible, as noted above, which SDN can realize; it also needs to be reliable, as it has become crucial to modern society. Networking infrastructure must remain reliable while being exposed to disasters, many of which are reported to cause large-scale link failures affecting unexpected parts of the network. For example, a hurricane can cause persistent link failures [20], and an earthquake can cause even worse failures, such as international fibre disconnections and large-scale re-routing [18][19].
Much research has thus been done to improve the reliability of SDN [5], which we will introduce in the next chapter. Fonseca et al. [15] proposed approaches that deploy multiple controllers as backups in a single administrative domain of the network. Sharma et al. [17][16] proposed protection methods that use prepared backup links to keep controllers and switches connected. These preparation methods only work well when some expected node or link fails.
The SDN failure recovery mechanisms introduced so far have been bound to the core idea of SDN, the centralized paradigm, and therefore have limited recovery ability; i.e., the existing approaches cannot deal with unexpected, large-scale link failures. SDN removed the switches' capability to self-manage their forwarding information base, including the entries for their own control channels. This design is believed to have been introduced for the simplicity, and thus cost efficiency, of switches in the very early days of SDN [13], and this simplicity is exactly what reduces the reliability of the control channel, as described above. As SDN becomes more widely used and application demands grow and become more complex, its control logic must be reliable, and this reliability rests on both the reliability of the controller and that of the control channel. The reliability of the control channel is hard to maintain when switches have no self-management capability, so we believe a switch has to be able to manage its own control channel by its own means. SDN's promise is that controllers can manage their switches (and thus the entire network) [10]; this does not mean that only controllers may manage the switches. As a fundamental basis, we must keep switches and controllers connected even under unexpected, large-scale link failures, to keep the network alive.
To overcome these limitations and make SDN tolerant of the large-scale, unexpected link failures caused by disasters, we introduce a distributed paradigm to protect and restore the control logic in SDN; that is, to make SDN resilient to link failures. We propose ResilientFlow, a self-healing mechanism that utilizes distributed link failure detection and restoration. Our proposal is to enable switches to manage their own logical connections to the controllers, called control channels. We introduce a module called the Control Channel Maintenance Module (CCMM) and deploy it to every switch and controller. Each CCMM monitors the link status between neighboring switches and controllers and exchanges neighboring information with neighboring CCMMs, so that the switch can detect disconnection of its control channel. Upon detecting a link failure, the CCMM modifies the forwarding table of the switch to keep the control channel alive. Note that the CCMM only maintains the forwarding table entries that establish the control channel, not any entries for data forwarding, so that centralized control remains at the controller.
The rest of this paper is organized as follows. Chapter 2 describes the technical details of SDN and its reliability, and introduces preceding research on improving the reliability of SDN. Chapter 3 describes the design of ResilientFlow and its CCMM, while Chapter 4 shows the implementation of the CCMM based on that design. Chapter 5 presents experimental results from our implementation in two evaluation scenarios: a performance evaluation in a small-topology scenario and a resiliency demonstration in a real-world-topology scenario. Chapter 6 shows a further application of the CCMM, with a specific extension to the domain-splitting environment. Chapter 7 discusses technical issues of SDN and the CCMM and suggests advanced applications of the CCMM, and Chapter 8 concludes the paper.
Chapter 2
hard task due to their lack of manageability [6]. In conventional networks, applying specific high-level network policies, which is essential for modern complex networks, requires network operators to use low-level command interfaces on each network device separately. As a result of this distributed design, a manageability improvement is highly required in the conventional architecture.
Also, no standardized automatic configuration framework exists in the conventional networking architecture, which means that flexible management and configuration of networks in a dynamic environment is highly challenging. Today's uses of networking, especially in data center networks, lack efficient support for flexible network management, and thus the flexibility of the network is required [9]. Modern diversified network services and applications not only generate large amounts of data traffic [12] but also have multifarious resource demands, including bandwidth, latency, and other kinds of QoS requirements, and accommodating these applications needs efficient support for flexible network management in such dynamic networking environments. Enforcing high-level policies throughout the network is required in today's datacenter and inter-datacenter networks [8]. For example, handling delay-sensitive interactive user traffic and delay-insensitive background traffic separately improves Quality of Service and Quality of Experience while making better use of the network's capacity, which we cannot do in the current networking architecture [4].
Additionally, in the conventional architecture, network control functionality, including distributed protocols and algorithms, is implemented in the hardware of switching equipment, due to the vertically integrated model of the data plane and control plane. This architectural design makes it hard to introduce, extend, and innovate new protocols, designs, and architectures, as doing so may require the existing networking hardware to be replaced entirely; the architecture thus lacks extensibility, and enabling network extensibility is desired [13].
As a solution to these problems of modern networking infrastructure, SDN has been proposed and is growing widely. Later in this section, we describe the design of SDN in detail.
[Figure: Architecture comparison — in the conventional architecture, each switch integrates a control plane and a data plane; in SDN, the control plane is converged into a controller that manages the switches' data planes over control channels, with applications on top]
plane, which is responsible for forwarding data packets on the basis of its forwarding table, called a flow table. In SDN, switches never manage their own flow tables by themselves. In its early days, SDN was intended to be used without replacing any hardware devices [13]; for this reason, SDN removed the self-control capability from switches.
Controllers, on top of the switches, handle the rest of the functionality required for the networking infrastructure to work, including managing their switches' flow tables. The component that keeps the networking infrastructure working is called the control plane. The switch interface through which a controller manages its switches is called the Southbound API. Controller nodes are usually implemented as software, so that they can easily be replaced to meet upcoming demands from new applications. This realizes the extensibility of networks. Also, in SDN, a few controller instances are assumed to be placed in a single administrative network; these controllers entirely manage their own networks. This architectural design significantly reduces the management tasks of network operators and thus brings better manageability.
Applications, on top of the controllers, are given an abstract view and programmability of the networking infrastructure. The controller interface through which applications take control of their given network is called the Northbound API. Through this controller–application interface, SDN realizes flexible management of networks.
[Figure: SDN switch–controller connections — (1) in-band, via another switch, and (2) out-of-band, via a dedicated link]
via a dedicated link (used only for control channels), in which case the channel is called an out-of-band connection, or via another switch and a path to that switch, in which case the channel is called an in-band connection. In other words, switches can be configured to connect to controllers via links that are also used for data forwarding.
[Figure: SDN switch design — a datapath with a flow table and physical ports, connected to the controllers]
Figure 2.4: Different aspects and different parts of the reliability in SDN
Sharma et al. [16] utilized this fast failover mechanism to achieve data plane failure
recovery. They proposed and evaluated two different recovery methods: restoration and
protection.
controller to the switch in order. The authors focused on fast failure recovery at carrier-grade speed, i.e. within 50 ms. However, the centralized controller takes action to restore the control channel by using port status messages from the other switches to detect the failed link and by installing flow entries from the controller. Therefore, the controller may fail to calculate a recovery path correctly under large-scale link failures. Indeed, the authors only conducted single-link failure experiments.
The research described above focuses on the reliability of the controller and of the switch–controller connection under a few expected link disconnections, and especially on fast failover mechanisms. Today's networks are exposed to disasters in which severe link disconnections occur. Under such conditions, the current failover mechanisms cannot be expected to work correctly, so a strong recovery mechanism is required against large-scale, unexpected link failures.
Chapter 3
ResilientFlow
We introduced the SDN architecture and discussed its reliability in Chapter 2, where we surveyed control channel restoration proposals that handle only single link failures. To make SDN stably reliable by enabling it to recover from multiple and unexpected link failures, which a centralized recovery mechanism cannot handle, we decided to use a distributed mechanism in which every switch is able to self-heal its own control channel, so that switches keep working under multiple and unexpected link failures. We introduce ResilientFlow, a failure recovery mechanism that utilizes distributed link failure detection and restoration to recover from control channel failure under multiple and unexpected link failures. In this section, we present an overview and the detailed design of ResilientFlow.
3.1 Overview
ResilientFlow is a control channel failure recovery mechanism that protects and restores the control logic in SDN. To enable SDN to recover from control channel failures autonomously, switches have to 1) detect a control channel failure and 2) restore the control channel by calculating and establishing an alternative path. Our main idea is to enable all switches to maintain their control channels by themselves. More specifically, we introduce a module, the Control Channel Maintenance Module (CCMM), that enables a switch to detect a control channel failure and restore the channel via an alternative path, so that switches can maintain their control channels by their own means, as shown in Fig. 3.1. A CCMM 1) monitors the link status of the switch with heartbeat packets, 2) exchanges network topology maps with the switch's controller(s) and with neighboring CCMMs, and 3) sets up flow entries in the switch to establish a path from the switch to the controller(s), as shown in Fig. 3.3. The design of an SDN switch with a CCMM is shown in Fig. 3.2, annotated with numbered labels indicating these three functionalities.
Note that a CCMM only modifies those of its switch's flow entries that are mandatory for the control channel to be established and for the CCMM to work correctly, and not the rest of the
Figure 3.1: Overview of ResilientFlow
[Figure 3.2: Design of an SDN switch with the CCMM — the control channel endpoint and the Control Channel Maintenance Module, which 1) monitors link status via heartbeats on the physical ports, 2) exchanges network topology maps with controllers and neighboring CCMMs, and 3) installs flow entries into the flow table]
flow entries, including those for data forwarding. The entire capability and flexibility of network management remains held by the domain-administrative controllers. This is why a CCMM is a module, not a controller.
3.2 Design
We introduced the three functionalities of the CCMM in Sect. 3.1. In this section, we describe the details of these functionalities, shown in Fig. 3.3.
[Figure 3.3: CCMM functionality — 1) heartbeat-based link monitoring between neighboring switches, 2) exchange of topology maps revealing a failed link, and 3) re-establishing the control channel to the controller via a new path]
illustrates re-establishing a new control channel via a new alternative path. When a switch detects a control channel failure, the switch's CCMM calculates and determines a path from the switch to the controllers on the basis of the collected network topology maps. After calculating the path, the CCMM sets up flow entries in the switches in accordance with the determined path, as shown in Fig. 3.2 (3).
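As a concrete illustration of the path calculation in step 3), the restoration amounts to a shortest-path search over the collected topology map, followed by per-hop flow installation. The following is a sketch under our own assumptions (function and node names are hypothetical, not the thesis's actual code):

```python
import heapq

def shortest_path(topology, src, dst):
    """Dijkstra over a collected topology map, given as an adjacency
    dict of link costs, to pick an alternative control channel path."""
    dist = {src: 0}
    prev = {}
    queue = [(0, src)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in topology.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(queue, (nd, neighbor))
    if dst not in dist:
        return None  # domain split: no path to the controller remains
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

Each hop along the returned path then receives the flow entries needed to forward control channel traffic toward the controller.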
Chapter 4
Implementation
In the previous chapter, we presented the design of ResilientFlow and its CCMM. In this chapter, we describe our implementation of the CCMM-enabled switch and how each of its components realizes the functions of the CCMM described in Chapter 3.
4.1 Overview
We decided to use Linux as the basis of our switch nodes and Open vSwitch as our OpenFlow switch software. As described in Section 3.2, the CCMM has three functionalities: 1) monitoring link statuses, 2) exchanging network topology maps, and 3) installing flow entries for the control channel. Given that the behaviour of a link-state routing protocol matches the former two functionalities, monitoring link statuses and exchanging network topology maps, we decided to use the OSPF protocol and an OSPF routing daemon to implement them. For the third functionality, installing flow entries, we built our flow installer in Python. The whole structure of the switch node implementation is shown in Fig. 4.1. On the Linux node that represents a switch node, we run Open vSwitch. On top of Open vSwitch, the OSPF daemon monitors the link and physical port statuses and exchanges network topology maps with neighboring CCMMs' OSPF daemons. The OSPF daemon generates and dumps routing entries to the Linux kernel's routing table in accordance with its collected network topology maps. The flow entry installer watches for routing table change notifications; upon a notification, it generates and installs flow entries in accordance with the Linux kernel's routing table. Finally, Open vSwitch establishes a control channel in accordance with its own flow entries, not with the routing table of the Linux kernel.
[Figure 4.1: Switch implementation — on a Linux node, the OSPF daemon (Quagga's ospfd) monitors link status and exchanges network topology maps through the Internal Ports of Open vSwitch and generates the Linux kernel's routing table; the flow entry installer retrieves the routing table and installs flow entries into Open vSwitch, which has internal and external ports]
Figure 4.2: OSPF daemon to use with OpenFlow switch: Problem description
Figure 4.3: OSPF daemon to use with OpenFlow switch: Solution using Internal Ports
interface on the Linux kernel, as shown on the left-hand side of Fig. 4.3. We then install flow entries that pass OSPF packets through between each physical port and its corresponding internal port, as shown on the right-hand side of Fig. 4.3. The OSPF daemon is then assigned to the internal ports that appear in the Linux kernel's network interface list and uses them to exchange hello packets and network topology maps.
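A minimal sketch of how such passthrough entries could be generated (the port numbers are placeholders, and this is an illustration rather than the actual installer): OSPF is IP protocol 89, which is what the `ip,nw_proto=89` match selects in ovs-ofctl flow syntax.

```python
def passthrough_flows(pairs):
    """Generate ovs-ofctl flow entries that shuttle OSPF traffic
    (IP protocol 89) between each physical port and its paired
    internal port, in both directions."""
    flows = []
    for phys, internal in pairs:
        # physical port -> internal port (incoming OSPF packets)
        flows.append(f"in_port={phys},ip,nw_proto=89,actions=output:{internal}")
        # internal port -> physical port (outgoing OSPF packets)
        flows.append(f"in_port={internal},ip,nw_proto=89,actions=output:{phys}")
    return flows
```

Each generated string would then be installed with a command such as `ovs-ofctl add-flow <bridge> <flow>`.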
suite.
We implemented the CCMM's flow entry installer in Python. The CCMM uses the ip-monitor command as a trigger to watch for network topology map changes and then retrieves the Linux kernel's routing table. The CCMM then calculates flow entries in accordance with the routing entries and installs them into the switch.
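As an illustration of the trigger side of this pipeline, the installer might parse `ip monitor route` output roughly as follows (the line format here is an assumption based on typical iproute2 output, and the helper is hypothetical, not the thesis's code):

```python
import re

# Matches lines like:
#   "10.0.0.0/24 via 10.0.1.2 dev int1 proto zebra metric 20"
#   "Deleted 10.0.0.0/24 via 10.0.1.2 dev int1"
ROUTE_RE = re.compile(
    r"^(?P<deleted>Deleted\s+)?(?P<prefix>\S+)"
    r"(?:\s+via\s+(?P<via>\S+))?\s+dev\s+(?P<dev>\S+)"
)

def parse_route_event(line):
    """Parse one `ip monitor route` line into a route-change event,
    or return None for lines we do not recognize."""
    m = ROUTE_RE.match(line.strip())
    if not m:
        return None
    return {
        "op": "del" if m.group("deleted") else "add",
        "prefix": m.group("prefix"),
        "nexthop": m.group("via"),
        "dev": m.group("dev"),
    }
```

On each parsed event, the installer would re-read the dedicated routing table and translate its entries into flow entries for the switch.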
To let the OSPF daemon hand routing information to the flow entry installer, we utilize the Linux kernel's multiple routing tables and set the OSPF daemon to dump its calculated routes into a dedicated routing table other than the main one. This change avoids two unwanted side effects. The OSPF daemon generates routing entries on the basis of its (internal) ports and their addresses, forcing applications to use an internal port as the source port and its address as the source address. This causes two problems. First, it breaks the TCP/IP 5-tuple, forcibly disconnecting the control channel when the source interface of the control channel connection changes as a result of a link failure. Instead, we want Open vSwitch to use a dedicated address as the source address when establishing the control channel, so that we can measure our CCMM's restoration performance exactly. Second, it would make the switch behave like a hybrid switch, i.e. a switch that has the capabilities of both an SDN switch and a conventional switch (IP router). As we intend to propose an SDN failure recovery mechanism, our implementation should be a normal SDN switch, which follows its flow table for packet forwarding. We want the switch's control channel to simply and only follow its flow table, as in the normal operation of an OpenFlow switch. As the Linux main routing table is used for routing ordinary traffic, including Open vSwitch's, we keep the main table simple: it routes all ordinary traffic into the Open vSwitch bridge, so that all routing converges onto OpenFlow's mechanisms.
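For illustration, the dedicated table could be configured in zebra roughly as follows (the table id 100 is an arbitrary unused number chosen for this sketch, not necessarily the value used in our implementation):

```
! zebra.conf — dump OSPF-computed routes into a dedicated kernel
! routing table instead of the main table
table 100
```

The flow entry installer then reads the OSPF routes with `ip route show table 100`, while the kernel's main routing table continues to route all ordinary traffic into the Open vSwitch bridge.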
Chapter 5
Experimental Evaluation
Chapters 3 and 4 detailed the design and the corresponding implementation of ResilientFlow. To prove our concept and show how ResilientFlow recovers control channels, we run our implementation in emulated network environments and perform a series of experiments. We also evaluate the recovery performance of ResilientFlow. In this section, we describe our evaluation environment, the scenarios, and our two sets of evaluation results.
[Figure: Experimental evaluation with Mininet — a host Linux node runs Mininet, which creates the network topology (a controller and switch nodes running Open vSwitch, connected by tc links) and manages the nodes]
develop a set of special configurations and scripts for Mininet that enables individual networking functions and runs an individual Open vSwitch daemon on each node. The original Mininet does not separate each switch node's networking functions; all switches share the same network interface list and routing table. For this reason, the original Mininet cannot run an individual Open vSwitch on each node without our special configuration. Second, we develop functions that add internal ports and their links to a node's instance in Mininet and that bring both a physical port and its corresponding internal port up or down simultaneously. As the OSPF daemon can only monitor its assigned ports, it cannot detect physical port status changes and thus relies only on hello packet loss, which is detected after the OSPF dead interval (one second when Fast Hello is enabled). With our modification, the OSPF daemon can detect port status changes and is thus able to broadcast LSAs immediately.
Table 5.1: Implementation and evaluation environment
OS
Linux distribution Ubuntu 14.04.1 Trusty Tahr
Linux kernel 3.13.0
Hardware specification
CPU Intel(R) Core(TM) i7-3770K @ 3.5 GHz
RAM 16GB
Software version
Mininet 2.1.0
Open vSwitch 2.0.2
Ryu 3.13
Quagga 0.99.22.4
Python 2.7.6
Nping 0.6.40
included in the Nmap software suite [32]. We set nping to generate UDP packets at a
specified constant rate.
recovery performance in this scenario.
We take the link disconnection time as the epoch time of an experiment, so when we choose multiple links to disconnect, we disconnect all the specified links at once and use the time at which we finish disconnecting all the specified links as the link disconnection time.
While we keep track of the Open vSwitch daemon log so that we can detect control channel disconnection and reconnection due to timeout, we expected (and later found to be true) that control channel restoration is so fast that we cannot use the underlying TCP disconnection and reconnection times detected by Open vSwitch to measure control channel restoration performance; we therefore decided to use packet-in messages for this measurement. We generate packet-in messages at a fast, constant rate on the switches and log the time at which these messages are received at the controller. We take the time between the link disconnection and the resumption of packet-in messages from a switch as that switch's control channel restoration time, or link restoration time. We also take the latest link restoration time among all restored switches as the network restoration time.
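This measurement rule can be sketched as follows (a hypothetical helper for clarity, not our actual evaluation scripts):

```python
def restoration_times(fail_time, packet_in_log):
    """Per-switch control channel restoration time: the gap between the
    link disconnection epoch and the first packet-in received from that
    switch afterwards. packet_in_log is an iterable of
    (timestamp, switch_id) pairs logged at the controller."""
    first_after = {}
    for ts, switch in sorted(packet_in_log):
        if ts > fail_time and switch not in first_after:
            first_after[switch] = ts - fail_time
    return first_after

def network_restoration_time(fail_time, packet_in_log):
    """The latest link restoration time among all restored switches."""
    per_switch = restoration_times(fail_time, packet_in_log)
    return max(per_switch.values()) if per_switch else None
```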
In both scenarios, we follow the same steps. First, we set up the switches, a controller, and a network connecting them in accordance with a given topology map. We then establish control channels from the specified switches to the controller and start sending packet-in messages from the specified switches to the controller. After this initial setup finishes, we disconnect the determined links. Finally, we keep the controller, the switches, and the packet-in message generators running until the desired switches' control channels are restored.
In the following sections, we describe the scenarios and their results in detail.
[Figure: Dedicated topology for the single specified link failure scenario — target switch 1, switches 2 and 3, and controller C; all links are 1000 Mb/s with 5 ms delay, with OSPF costs of 10 and 15]
is an in-band path, 1–2–C. In this case, we disconnect link 1–2. After the link is disconnected, the switch is reconfigured to connect via a different in-band path using a different source interface on the switch side, 1–3–C, as shown in Fig. 5.3 (b). In the third case, in-to-in-middle, the switch is initially connected via the in-band path 1–2–C, and after we disconnect link 2–C, the switch is connected via a different in-band path through the same source interface, 1–2–3–C, as shown in the upper and lower parts of Fig. 5.3 (c). In this case, only the middle links along the path change from the switch's point of view. In all three cases, we ran the experiment 20 times.
Figures 5.4, 5.5, and 5.6 show the evaluation results. Each figure shows, in both a plot and a table, the measured duration from the link disconnection to each event in the corresponding evaluation case. Each figure also includes the 95% confidence intervals of the measurements, and the sample points of each event are plotted beneath each result point.
The results show that link restoration takes 250 to 300 ms. Between the link disconnection and the routing table modification, only the OSPF daemons are at work; the other component, the flow entry installer, simply waits for the routing table change notification. The results suggest that collecting the network topology maps and calculating the routing tables take up most of the restoration time. This also suggests that we may improve network restoration by tuning the OSPF configuration (e.g., more frequent Hello and LSA exchanges). Note that in this scenario, the control channel remained connected throughout the experiment.
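As an example of such tuning, the failure detection delay is governed by the OSPF Hello and Dead timers. The following is a hypothetical Quagga ospfd.conf fragment, shown only as a sketch of the idea; the interface name and values are illustrative:

```
! Hypothetical Quagga ospfd.conf fragment: shorten the Hello and Dead
! intervals from their 10 s / 40 s defaults so link failures are
! detected, and LSAs flooded, sooner.
interface eth1
 ip ospf hello-interval 1
 ip ospf dead-interval 4
```

More aggressive timers detect failures faster at the cost of higher control traffic and a greater risk of false failure detection on congested links.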
We also see that the in-to-in-middle case and the in-to-in case take longer than the
out-to-in case.

[Figure 5.3: Three failure cases in single specified link failure experiments — (a) out-to-in, (b) in-to-in, (c) in-to-in-middle; each panel shows the control channel path before (upper) and after (lower) the link disconnection, over controller C and switches 1, 2, and 3.]

As described in the previous sections, the CCMM in each switch asynchronously collects the network topology map, calculates the path, and installs the flow
entries. This is likely because the in-to-in and in-to-in-middle cases use a longer path than the out-to-in case after the disconnection. A switch can restore connectivity to the controller only after all the switches along its path to the controller have been configured. This explains the time between the installation of the flow entries and the resumption of packet-in messages.
Event                                  Time from link down [ms]
Packet-in stopped                      0.0 ± 0.0
Network topology map generated         211.5 ± 0.3
Installing of flow entries started     214.5 ± 0.4
Installing of flow entries finished    222.8 ± 0.5
Packet-in restarted                    246.3 ± 7.0

Figure 5.4: Link restoration performance in single specified link failure experiments: out-to-in case
[Figure 5.5: Link restoration performance in single specified link failure experiments: in-to-in case.]
Event                                  Time from link down [ms]
Packet-in stopped                      0.0 ± 0.0
Network topology map generated         217.9 ± 8.7
Installing of flow entries started     221.0 ± 8.8
Installing of flow entries finished    233.8 ± 8.7
Packet-in restarted                    292.7 ± 9.3

Figure 5.6: Link restoration performance in single specified link failure experiments: in-to-in-middle case
5.2.2 Random Multiple Link Failure Scenario
In this scenario, we use a large-scale, real-world topology from the Topology Zoo [30], [31], mainly to study and demonstrate our proposal's feasibility. We also measure the network failure recovery time in this scenario.
We use the topology of the BT North America network, which consists of 36 nodes and 76 links, shown in Fig. 5.7. To initialize our network, we set the link speed to 1000 Mb/s, with link delays based on the distances calculated from the nodes' latitudes and longitudes given in the Topology Zoo topology file and the speed of light in optical fiber. We then choose the node with the largest degree and the lowest node number in the original topology file, which is node 28 in Fig. 5.7. This node is configured as the controller, and all the other nodes are configured as switches.
In this scenario, we randomly choose the links to be disconnected in each experiment. The number of links to be disconnected is given as a ratio of the total number of links, which we call the link disconnection rate. We then determine the nodes that will still have alternative paths to the controller after all the chosen links have been disconnected; we call these nodes the controller-reachable switches. We run packet generators on all the controller-reachable switches, which generate packet-in messages at a rate of 10 packets/s.
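The link selection and reachability computation above can be sketched as follows. This is an illustrative reimplementation of ours, not our experiment harness; all names are hypothetical:

```python
import random
from collections import deque

def disconnect_random_links(links, rate, seed=None):
    """Return the links that survive when a fraction `rate` (0..1)
    of them, chosen at random, is disconnected."""
    rng = random.Random(seed)
    failed = set(rng.sample(range(len(links)), int(len(links) * rate)))
    return [link for i, link in enumerate(links) if i not in failed]

def controller_reachable(nodes, links, controller):
    """Controller-reachable switches: nodes with a path to the
    controller over the remaining links (breadth-first search)."""
    adj = {n: [] for n in nodes}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {controller}, deque([controller])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {controller}
```

Only the switches returned by `controller_reachable` can ever restore their control channels, which is why they bound the restoration results below.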
We perform 20 experiment runs for each link disconnection rate: 10, 20, 40, 60, and 80 percent. We measure both the link restoration time of every switch and the network restoration time, as well as the number of controller-reachable switches.
Figure 5.8 shows the network restoration times at each link disconnection rate, with 95% confidence intervals; the sample points are plotted beneath each result point. The network restoration time increases as the link disconnection rate rises to 40 percent and decreases beyond that. This is due to two opposing factors: the number of controller-reachable switches and the number of changed links along the control channel paths. Figure 5.9 shows the number of controller-reachable switches against the link disconnection rate. As the link disconnection rate increases, the number of links changed by the disconnection increases, but the number of controller-reachable switches decreases. Compared to the former scenario, the larger topology increases the OSPF convergence time and thus the restoration time. In addition, the control channels of some switches are disconnected by TCP timeouts, making the restoration time even longer.
We also show the restoration progress of one experiment from this series, with a 40 percent link failure rate, in Fig. 5.10. The network topology after all the chosen links have been disconnected is shown in Fig. 5.11, with solid lines indicating connected links and dashed lines indicating disconnected links. Figure 5.10 shows the cumulative number of restored switches; the horizontal dashed line at the top indicates the number of controller-reachable switches, which is the upper limit of the number of restored switches. In this figure, we can see that some switches appear
[Figure 5.7: The BT North America network topology (36 nodes, 76 links); node 28 is the controller.]
to wait for other switches to be configured: for a switch to connect to the controller, all the switches along its path to the controller must first be configured.
Another example, with an 80 percent link failure rate, shows a notable result. Figure 5.12 shows the network topology in this case, in which the network is split into multiple disconnected domains. The CCMM is capable of modifying flow entries, which is a partial and limited, but fundamental, capability of an SDN controller. This means that, with further extensions that turn the CCMM into a basic mini controller, we can use the CCMMs as emergency alternative controllers in these disconnected domains. To address this domain-splitting problem, we extend the CCMM, implement the extension, and perform further experiments in Chapter 6.
[Figure 5.8: Network restoration time [s] against link failure rate [%].]
[Figure 5.9: Number of controller-reachable switches against link failure rate [%].]
[Figure 5.10: Restoration progress (cumulative number of restored switches over time) in an experiment with a 40 percent link failure rate.]
Figure 5.11: An example topology for large-scale link failure after the links have been disconnected
Figure 5.12: An example topology with 80% link failure after the links have been disconnected
Chapter 6
[Figure: The SDN domain-splitting problem — before the links are disconnected, all switches (S) reach the controller (C); after the links are disconnected, the network splits into a reachable domain and unreachable domains.]
For Open vSwitch, we can use OVSDB [34], Open vSwitch's switch configuration protocol. There is also an alternative protocol, OF-Config [35], standardized by the Open Networking Foundation, which we may use with switches other than Open vSwitch. As our implementation uses Open vSwitch, we decided to use OVSDB for the implementation of the extension.
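To illustrate the kind of reconfiguration this enables, the `ovs-vsctl set-controller` command writes a new controller target into the OVSDB database, which Open vSwitch then acts on. The following sketch (our own wrapper; bridge name and address are illustrative) shows how an extended CCMM could repoint a switch at an alternative controller:

```python
import subprocess

def set_controller_cmd(bridge, controller_ip, port=6633):
    """Build the ovs-vsctl invocation that points `bridge`
    at a new OpenFlow controller target."""
    return ["ovs-vsctl", "set-controller", bridge,
            f"tcp:{controller_ip}:{port}"]

def repoint_controller(bridge, controller_ip, port=6633):
    """Apply the new controller target via OVSDB.
    Requires a running Open vSwitch with root privileges."""
    subprocess.check_call(set_controller_cmd(bridge, controller_ip, port))

# e.g. repoint_controller("br0", "10.0.0.28")
```

A switch other than Open vSwitch would perform the equivalent reconfiguration through OF-Config instead.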
[Figure: CCMM-based recovery from the SDN domain-splitting problem — before the links are disconnected, the CCMMs (M) of switches 1–3 connect to the controller (C); after the disconnection, a CCMM in the unreachable domain takes over as an emergency alternative controller.]
Chapter 7
Discussion
Controller Election
We have noted that, since centralization is the core idea of SDN, we keep a centralized control mechanism even in a split-domain environment, so we have to select one controller among the controllers in a domain as the split domain's alternative controller. In this selection, called master controller election, how to choose the alternative controller can be an important issue. In our proof-of-concept demonstration, we use a simple selection algorithm that chooses the reachable node with the lowest ID as the controller. As a more sophisticated solution to this alternative controller election problem, Heller et al. [36] describe this controller selection problem and suggest that it is best to choose the node with the minimum average delay.
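The two policies can be sketched as follows. This is an illustration of ours, not our implementation; the function names and the pairwise delay-table format are hypothetical:

```python
def lowest_id_election(reachable_nodes):
    """Simple rule used in our proof of concept: elect the
    reachable node with the lowest ID as alternative controller."""
    return min(reachable_nodes)

def min_average_delay_election(reachable_nodes, delay):
    """Rule along the lines suggested by Heller et al.: elect the
    node minimizing the average delay to all other reachable nodes.
    `delay[(a, b)]` is the delay between nodes a and b."""
    def avg_delay(candidate):
        others = [n for n in reachable_nodes if n != candidate]
        return sum(delay[(candidate, n)] for n in others) / len(others)
    return min(reachable_nodes, key=avg_delay)
```

The lowest-ID rule needs no measurement and always converges to the same node in every switch, while the delay-based rule trades extra measurement traffic for lower control latency.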
Controller–Application Re-coordination
In a split-domain environment, controlling applications that work with the controller may lose connectivity to the controller due to link failures. In such cases, the controlling applications should re-establish their connections to a controller, which may be the original controller or a new alternative one. For the controlling applications to keep working with controllers and controlling the network, they need to learn of the new alternative controller, which calls for some controller advertisement mechanism. Such a mechanism is out of the scope of this research and is left as future work.
Chapter 8
Conclusion
narios: a scenario in which a single specified link fails in a dedicated topology, and a scenario in which multiple random links fail in a real-world topology. We showed that ResilientFlow recovers the control channel within 300 ms after a single link failure. We also showed that ResilientFlow can restore control channels after multiple, severe link failures, taking a time on the order of seconds. We further extended the CCMM to address the domain-splitting problem, in which a switch has no available path to the controllers, by making the CCMM an emergency alternative controller. We performed experiments and showed the applicability of the CCMM to the domain-splitting problem, with restoration times of approximately 1 to 2 seconds. As future work, we suggested further applications of ResilientFlow to the SDN bootstrapping problem and discussed a better controller selection method in a split-domain environment.
Bibliography
[3] N. Feamster, J. Rexford and E. Zegura, “The Road to SDN,” ACM Queue, vol. 11,
no. 12, p. 20, Dec. 2013.
[4] I. Akyildiz, A. Lee, P. Wang, M. Luo and W. Chou, “A Roadmap for Traffic Engi-
neering in SDN-OpenFlow Networks,” Computer Networks, Elsevier, vol. 71, pp. 1–
30, Oct. 2014.
[6] T. Benson, A. Akella, and D. Maltz, “Unraveling the complexity of network manage-
ment,” Proc. 6th USENIX Symp. Networked Syst. Design Implement., pp. 335-348.,
2009.
[7] B.M. Leiner, V.G. Cerf, D.D. Clark, R.E. Kahn, L. Kleinrock, D.C. Lynch,
J. Postel, L.G. Roberts and S. Wolff, “Brief History of the Inter-
net,” http://www.internetsociety.org/internet/what-internet/
history-internet/brief-history-internet, last accessed at 5 Feb. 2015.
[9] M.F. Bari, R. Boutaba, R. Esteves, L.Z. Granville, M. Podlesny, M.G. Rabbani,
Q. Zhang and M.F. Zhani, “Data Center Network Virtualization: A Survey,” IEEE
Communications Surveys & Tutorials, vol. 15, no. 2, pp. 909–928, Second Quar-
ter, 2013.
[12] Cisco VNI Forecast, “Cisco Visual Networking Index: Forecast and
Methodology, 2013–2018,” Cisco Public Information, http://www.
cisco.com/c/en/us/solutions/collateral/service-provider/
ip-ngn-ip-next-generation-network/white_paper_c11-481360.pdf,
Jun. 2014.
[19] K. Cho, C. Pelsser, R. Bush, and Y. Won, “The Japan earthquake: The Impact on
Traffic and Routing Observed by a Local ISP,” Proc. ACM Special Workshop on
Internet and Disasters (SWID 2011) in Conjunction with International Conference
on emerging Networking EXperiments and Technologies (CoNEXT 2011), pp. 2:1–
2:8, Dec. 2011.
[26] D. Katz and D. Ward, “Bidirectional Forwarding Detection (BFD),” IETF RFC 5880
(Proposed Standard), Jun. 2010.
[27] Cisco Systems, “OSPF Support for Fast Hello Packets,”
http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_ospf/
configuration/xe-3s/iro-xe-3s-book/iro-fast-hello.pdf, last ac-
cessed at 28 Oct. 2014.
[29] Mininet Team, “Mininet: An Instant Virtual Network on Your Laptop (or Other PC)
- Mininet,” http://mininet.org/, last accessed at 21 Oct. 2014.
[31] The University of Adelaide, “The Internet Topology Zoo,” http://www.
topology-zoo.org/, last accessed at 21 Oct. 2014.
[32] Nmap.org, “Nmap — Free Security Scanner For Network Exploration & Security
Audits,” http://nmap.org/, last accessed at 12 Nov. 2014.
[34] B. Pfaff and B. Davie, “The Open vSwitch Database Management Protocol,” IETF
RFC 7047 (Informational), Dec. 2013.
[35] Open Networking Foundation, “OF-Config 1.2 — OpenFlow Management and Con-
figuration Protocol”, https://www.opennetworking.org/images/stories/
downloads/sdn-resources/onf-specifications/openflow-config/
of-config-1.2.pdf, Jun. 2014.
Publications
English Publications
[A1] T. Watanabe, T. Omizo, T. Akiyama and K. Iida, “ResilientFlow: Deployments of
Distributed Control Channel Maintenance Modules to Recover SDN from Unex-
pected Failures,” to be presented at IEEE International Conference on the Design
of Reliable Communication Networks (DRCN 2015), Mar. 2015 (Accepted).
Japanese Publications
[B1] T. Watanabe, T. Omizo, T. Akiyama and K. Iida, “Self-Healing Mechanism on
Switch-Controller Connections in SDN,” to be presented at IEICE General Confer-
ence 2015, Proceedings of the 2015 IEICE General Conference, BS-2-2, Mar. 2015
(Submitted).
[B2] T. Watanabe, T. Omizo, T. Akiyama and K. Iida, “Design and Evaluation of Self-
Healing Mechanism on Switch-Controller Connections in SDN,” to be presented
at IEICE Technical Committee on Internet Architecture, IEICE Technical Report,
Mar. 2015 (Submitted).