Network Survivability


Brief Intro
A connection is often routed through many nodes in the network between its source and
its destination, and there are many elements along its path that can fail.
To obtain 99.999% (five 9s) availability, we need to make the network survivable.
Protection switching is the key technique used to ensure survivability. It involves
providing some redundant capacity within the network and automatically rerouting traffic
around the failure using this redundant capacity (restoration).
Protection is usually implemented in a distributed manner without requiring centralized
control in the network.
Common causes of failure include human error, failure of active components inside network
equipment, node failures, and catastrophic events.
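To make the five 9s target concrete, the sketch below converts an availability figure into downtime per year and shows why a path through many elements needs protection. It assumes that elements fail independently and that the working and protect paths are fully diverse; the function names and the per-element figure of 99.99% are illustrative, not taken from the source.

```python
import math
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def series_availability(elements: Sequence[float]) -> float:
    """A path works only if every element on it works (independence assumed)."""
    return math.prod(elements)


def protected_availability(working: float, protect: float) -> float:
    """The connection is up if either of two independent paths is up."""
    return 1.0 - (1.0 - working) * (1.0 - protect)


# Hypothetical path: ten elements, each 99.99% available.
unprotected = series_availability([0.9999] * 10)
protected = protected_availability(unprotected, unprotected)
```

Five 9s corresponds to roughly 5.3 minutes of downtime per year. Under the assumed figures, the unprotected ten-element path reaches only about 99.9%, while the same path with a diverse protect path exceeds five 9s.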
9.1 Basic Concepts
Paths

Working Paths: They carry traffic under normal operations.

Protect Paths: They provide an alternate path to carry the traffic in
case of a single failure.
Both are diversely routed so that both paths are not lost in case of
a single failure.
Protection Schemes
They are designed to operate over a range of network topologies, including point-to-point
(P2P) links, rings (popular in SONET/SDH), and meshes.
They are designed to succeed under likely physical failure scenarios. It is assumed that
the most likely failures are single failures.
A physical failure will lead to one or more links failing at the client layers.
Transceiver failure can lead to single link failures.
A fiber cut can lead to multiple link failures at the client layer if the fiber carries
multiple wavelengths.
Shared Risk Link Groups (SRLGs): Links that fail together due to a single failure event.
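A diversity check that accounts for SRLGs could look like the sketch below. It assumes a path is a list of link identifiers and that a hypothetical `srlg_of_link` mapping gives the risk groups each link belongs to; none of these names come from the source.

```python
def risks(path, srlg_of_link):
    """Collect every failure event that would take down some link of the path."""
    found = set()
    for link in path:
        found.add(("link", link))               # the link itself can fail
        for group in srlg_of_link.get(link, ()):
            found.add(("srlg", group))          # so can any group it belongs to
    return found


def is_srlg_diverse(working, protect, srlg_of_link):
    """True if no single link or SRLG failure hits both paths."""
    return risks(working, srlg_of_link).isdisjoint(risks(protect, srlg_of_link))


# Hypothetical example: two link-disjoint paths that share one duct.
srlg_of_link = {"A-B": {"duct1"}, "A-D": {"duct1"}}
working = ["A-B", "B-C"]
protect = ["A-D", "D-C"]
diverse = is_srlg_diverse(working, protect, srlg_of_link)
```

Here the two paths share no link, yet `diverse` is False because links A-B and A-D sit in the same duct and would fail together.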
Protection may be dedicated or shared.
Dedicated Protection: Each working connection is assigned its own dedicated
bandwidth in the network over which it can be rerouted in case of a failure.

Shared Protection: Multiple working connections can share protection bandwidth.
This works because not all working connections in a network fail simultaneously.

Advantages of shared protection:

1. It reduces the amount of bandwidth needed in the network for protection.
2. Protection bandwidth is available to carry low-priority traffic under normal
conditions. This traffic is discarded in the event of a failure.
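The bandwidth saving from sharing can be illustrated with a small calculation. The sketch assumes single-link failures only and a hypothetical `Connection` record holding a demand plus the sets of links used by its working and protect paths. Dedicated protection reserves the sum of all demands on a protection link, whereas shared protection reserves only the worst case over single failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    demand: int            # bandwidth units
    working: frozenset     # links on the working path
    protect: frozenset     # links on the protect path


def dedicated_capacity(link, connections):
    """Every connection protected over this link gets its own reservation."""
    return sum(c.demand for c in connections if link in c.protect)


def shared_capacity(link, connections):
    """Reserve only what the worst single working-link failure would need."""
    users = [c for c in connections if link in c.protect]
    failures = set().union(*(c.working for c in users))
    return max(
        (sum(c.demand for c in users if f in c.working) for f in failures),
        default=0,
    )


# Hypothetical connections whose protect paths both cross link C-B.
c1 = Connection("c1", 1, frozenset({"A-B"}), frozenset({"A-C", "C-B"}))
c2 = Connection("c2", 1, frozenset({"D-B"}), frozenset({"D-C", "C-B"}))
```

For these two connections, link C-B needs 2 units under dedicated protection but only 1 unit when shared, because no single link failure affects both working paths.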
Protection schemes can be revertive or non-revertive.
In both schemes, traffic is switched to the protect path in case of a failure; they
differ in what happens once the working path is repaired.
Non-revertive scheme: The traffic remains on the protect path until it is manually
switched back onto the original working path, usually by a user through the
network management system.
Revertive scheme: Once the working path is repaired, the traffic is
automatically switched back onto it.
Dedicated protection schemes can be non-revertive or revertive, but shared
protection schemes are usually revertive so that the protection bandwidth is freed
to protect other connections in the event of another failure.
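The difference between the two behaviors can be sketched as a tiny state machine. The class and method names are hypothetical. Real equipment typically waits for a hold-off period before reverting; that delay is omitted here for brevity.

```python
class ProtectionSwitch:
    """Tracks which path carries traffic for one protected connection."""

    def __init__(self, revertive: bool):
        self.revertive = revertive
        self.active = "working"

    def on_working_failed(self):
        # Both schemes switch to the protect path on failure.
        self.active = "protect"

    def on_working_repaired(self):
        # Only a revertive scheme switches back on its own.
        if self.revertive:
            self.active = "working"

    def manual_switch_back(self):
        # Operator action, e.g. through the network management system.
        self.active = "working"


revertive = ProtectionSwitch(revertive=True)
non_revertive = ProtectionSwitch(revertive=False)
for switch in (revertive, non_revertive):
    switch.on_working_failed()
    switch.on_working_repaired()
```

After the repair event, the revertive switch is back on the working path, while the non-revertive one stays on the protect path until `manual_switch_back()` is called.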
Protection schemes can be unidirectional or bidirectional.

Don't confuse this with unidirectional or bidirectional transmission!

Unidirectional: Each direction of traffic is handled independently. In the event of
a single fiber cut, only one direction of traffic is switched over to the
protection fiber.

Bidirectional: Both directions are switched over to the protection fibers.

In the case of bidirectional transmission over a single fiber, switching becomes
bidirectional by default because both directions of traffic are lost when the fiber
is cut (this is not true in the case of an equipment failure).
Why protocols?
Unidirectional protection switching is used in conjunction with dedicated
protection schemes because
- It can be easily implemented by switching the traffic at the receiving end from
the working to the protect path, without requiring a signaling protocol between
the receiver and transmitter.
- If the traffic is transmitted simultaneously on both the working and protect
paths, the receiver at the end of the paths simply selects the better of the two
arriving signals.
However, if bidirectional switching is required, the receiver needs to inform the
transmitter that there has been a cut. This requires a signaling protocol: an
Automatic Protection-Switching (APS) protocol.
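The receiver-side selection used with simultaneous transmission can be sketched as follows. The quality metric is a hypothetical number where higher is better and `None` stands for loss of signal; the source does not say how "better" is measured.

```python
from typing import Optional


def select_signal(working_quality: Optional[float],
                  protect_quality: Optional[float]) -> Optional[str]:
    """Return which path the receiver should listen to, or None if both are down."""
    candidates = {"working": working_quality, "protect": protect_quality}
    alive = {path: q for path, q in candidates.items() if q is not None}
    if not alive:
        return None
    # max() keeps the first maximal key, so a tie stays on the working path.
    return max(alive, key=alive.get)
```

The decision is purely local to the receiver, which is why no signaling toward the transmitter is needed in this case.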
Automatic Protection-Switching (APS) Protocol
Simple APS works as follows:

1. If a receiver in a node detects a fiber cut, it turns off its transmitter on the
working fiber and then switches over to the protection fiber to transmit traffic.
2. The receiver at the other node then also detects the loss of signal on the
working fiber and switches its traffic over to the protection fiber.
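The two steps above can be simulated for a pair of nodes A and B joined by one working fiber per direction. This is a toy model of the simple scheme only, not of the APS protocols used in SONET; the class and function names are made up for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.tx_fiber = "working"   # fiber this node currently transmits on

    def check_receiver(self, working_signal_present):
        """Switch the transmitter to protection on loss of the working signal."""
        if self.tx_fiber == "working" and not working_signal_present:
            self.tx_fiber = "protect"   # also turns off working-fiber transmit
            return True
        return False


def simulate_cut(cut_fiber):
    """cut_fiber is 'A->B' or 'B->A'. Returns final states and the event log."""
    a, b = Node("A"), Node("B")
    intact = {"A->B": True, "B->A": True}
    intact[cut_fiber] = False
    log = []
    changed = True
    while changed:
        changed = False
        # B sees a working signal only if A transmits on it and the fiber is intact.
        if b.check_receiver(a.tx_fiber == "working" and intact["A->B"]):
            log.append("B switches to protect")
            changed = True
        if a.check_receiver(b.tx_fiber == "working" and intact["B->A"]):
            log.append("A switches to protect")
            changed = True
    return a.tx_fiber, b.tx_fiber, log
```

Cutting only the A->B fiber makes B switch first; because B stops transmitting on the working fiber, A then sees a loss of signal and switches as well, so both directions end up on the protection fibers without any explicit message.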

Actual APS protocols used in SONET and optical networks are quite a bit more
complicated because they have to deal with many different possible scenarios.
Even where no APS protocol is necessary to deal with fiber cuts, an APS protocol is
still needed to deal with equipment failures and to support maintenance
functions.

In the case of shared protection schemes, an APS protocol is required to
coordinate access to the shared protection bandwidth. Therefore, most shared
protection schemes use bidirectional switching because it is easier to control and
manage in a more complex network.
Switching
How and where is the traffic rerouted in the event of a failure?

Path Switching: The connection is rerouted end to end from its source to its
destination along an alternate path.

Span Switching: The connection is rerouted on a spare link between the nodes
adjacent to the failure.

Ring Switching: The connection is rerouted on a ring between the nodes adjacent
to the failure.
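The three options can be compared on a small ring. The sketch assumes a four-node ring listed in clockwise order, a working path that runs clockwise, and a failed span given as a pair of adjacent nodes in the working direction; all names are illustrative.

```python
def ring_route(ring, src, dst, step):
    """Walk the ring from src to dst; step is +1 (clockwise) or -1."""
    i = ring.index(src)
    route = [src]
    while ring[i] != dst:
        i = (i + step) % len(ring)
        route.append(ring[i])
    return route


def path_switch(ring, src, dst, working_step):
    """End-to-end reroute: go the other way around the ring."""
    return ring_route(ring, src, dst, -working_step)


def span_switch(ring, src, dst, working_step, failed):
    """Same route, but the failed span is carried on a spare link."""
    route = ring_route(ring, src, dst, working_step)
    return [(u, v, "spare" if (u, v) == failed else "working")
            for u, v in zip(route, route[1:])]


def ring_switch(ring, src, dst, working_step, failed):
    """Loop back at the nodes adjacent to the failure, the long way around."""
    u, v = failed
    head = ring_route(ring, src, u, working_step)
    detour = ring_route(ring, u, v, -working_step)
    tail = ring_route(ring, v, dst, working_step)
    return head + detour[1:] + tail[1:]


ring = ["A", "B", "C", "D"]
failed = ("B", "C")   # working path A-B-C loses span B-C
```

For a connection from A to C, path switching gives A-D-C, ring switching gives A-B-A-D-C (the traffic still reaches B and is looped back there), and span switching keeps A-B-C with the B-C hop on the spare link.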
9.4 Why Optical Layer Protection
Reasons for protection in the optical layer:

● SONET/SDH networks have extensive protection functions, whereas other
networks such as IP networks do not provide the same level of protection.
One way to protect such data networks is to rely on optical layer protection.

● Cost savings can be realized by making use of optical layer protection
instead of client layer protection.
Fig-9.18- WDM ring built using optical add/drop multiplexers (OADMs), supporting two
interconnected SONET line terminals (LTEs) and two interconnected IP routers using protection
provided by the SONET and IP layers, respectively.
The SONET and IP boxes do not share protection bandwidth.

Fig-9.19- The configuration is the same as that of Figure 9.18.
However, the optical layer now uses a single wavelength around the ring to protect
both the SONET and IP connections.
● Optical layer can handle some faults more efficiently than the client layers.

All the protection is handled by the routers. Two diversely routed WDM links are used.
Each IP router uses three working ports and three protect ports to protect against both fiber
cuts and equipment failures.

Single WDM line system is deployed, with protection against fiber cuts handled by the
optical layer. Equipment failures are handled by the IP layer.
The IP routers now use three working ports and an additional protect port in case one
of the working ports fails.
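The port counts in the two configurations above follow a simple pattern, sketched here. Generalizing from three working ports to N is an assumption made for illustration; the source only gives the three-port case.

```python
def ports_per_router(working_ports: int, optical_layer_handles_fiber_cuts: bool) -> int:
    """Router ports needed in each of the two configurations described above."""
    if optical_layer_handles_fiber_cuts:
        # One extra protect port covers the failure of any single working port.
        return working_ports + 1
    # Every working port needs its own protect port on the diverse route.
    return 2 * working_ports
```

With three working ports this gives six ports per router when the routers handle all protection, and four when the optical layer takes care of fiber cuts.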
● Optical layer protection can be used to provide an additional degree of
resilience in the network, for instance, to protect against multiple failures.

● Protection in SONET is currently based on rings (UPSR/BLSR). Ring-based
schemes require that the capacity in the network reserved for protection be
equal to the capacity used for working traffic.
Within the optical layer, a variety of mesh-based protection schemes are
being developed. These offer the promise of requiring significantly less
protection capacity than ring-based schemes.
Limitations of optical layer protection

1. Not all failures can be handled by the optical layer. If a laser in an attached
client terminal fails, the optical layer cannot do anything about it.
2. The optical layer may not be able to detect the appropriate conditions that
would cause it to invoke protection switching.
3. The optical layer protects traffic in units of lightpaths, and it cannot protect
part of the traffic within a lightpath and not protect other parts. Such functions
need to be performed by the client layers.
4. Protection routes in the optical layer may be longer than the primary routes,
and the choice of alternate routes may be severely limited due to link budget
considerations.
5. We need to pay careful attention to the interworking of protection schemes
between the different layers.
9.4.1 Service Classes Based on Protection
The optical layer can provide multiple classes of service based on the type of
protection provided.

1. Platinum: Provides the highest level of availability and the fastest restoration
times (around 60 ms). Example: a dedicated 1 + 1 protection scheme.
2. Gold: Provides high availability and fast restoration times (around 100 ms).
Example: a shared mesh protection scheme.
3. Silver: Sits below gold in terms of availability and restoration time.
Example: a protection scheme that provides “best-effort” restoration.
4. Bronze: The optical layer provides unprotected lightpaths. In the event of a failure
of the working path, the connection is lost.
5. Lead: Has the lowest availability and the lowest priority among all classes.
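One way to hold these classes in software is a small lookup table, sketched below. The restoration times are the approximate figures quoted above; fields left as `None` are values the list does not state. The names `ServiceSpec`, `SERVICE_CLASSES`, and `classes_meeting` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceSpec:
    protection: Optional[str]        # example scheme named in the list, if any
    restoration_ms: Optional[int]    # approximate restoration time, if stated


# Ordered from highest to lowest class, as in the list above.
SERVICE_CLASSES = {
    "platinum": ServiceSpec("dedicated 1+1", 60),
    "gold": ServiceSpec("shared mesh", 100),
    "silver": ServiceSpec("best-effort restoration", None),
    "bronze": ServiceSpec("unprotected", None),
    "lead": ServiceSpec(None, None),
}


def classes_meeting(max_restoration_ms: int):
    """Names of classes whose stated restoration time is within the limit."""
    return [
        name
        for name, spec in SERVICE_CLASSES.items()
        if spec.restoration_ms is not None
        and spec.restoration_ms <= max_restoration_ms
    ]
```

A client that needs restoration within 80 ms would be limited to platinum under these figures, while a 100 ms requirement also admits gold.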

Which applications use which service class depends on the application itself
and the user.

Telephony and SONET/SDH use platinum-type service.
