Huawei Optical Network Maintenance Reference-WDM ASON-20140826-C
Issue
02
Date
2014-08-26
Huawei Technologies Co., Ltd. provides customers with comprehensive technical support and service.
Contact our local office or company headquarters.
Website:
http://www.huawei.com
Email:
support@huawei.com
Notice
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, expressed or implied.
Acknowledgement
This document is prepared and reviewed by the ASON R&D Maintenance Team, Information Development
Dept, Customer Support Dept, and Technical Support Dept together.
Editor:
Feng Junjie, Zhu Fei, Feng Haoyu, Wang Chaokai, Feng Chao, Liu Yuan, Zhang Meng, Zhou Yuxing,
Zheng Fan, Li Weiping
Others:
ASON R&D Maintenance Team: Jiang Yi, Bai Zhongqiang, Li Qingsong
Information Development Dept: Fan Xiaoke, Pei Xin
Technical Support Dept: Dou Yongtan, Xie Bing
Customer Support Dept: Zhang Junguang, Fu Ming, Ma Qingquan
Quality Assurance Dept: Xue Xiuhua
Special acknowledgements to Jin Yuzhi, Mu Jianhong, Feng Zhigang, Wu Gang, Niu Shouchang, and
Chen Bin
Issue 02 (2014-08-26)
FAQs
This chapter provides frequently asked questions about NG WDM ASON operation and
maintenance.
Change History
Issue    Date          Description
02       2014-08-26
01       2011-09-29
Contents
About this Document
1 SOP for Maintaining NG WDM ASON Devices
1.1 Introduction to SOP for ASON Maintenance
1.2 ASON Device SOP Checklist
4.1.7 Case 7: A Newly Created ASON Service Fails to Traverse a Node or an Existing ASON Service Fails to Traverse a Node During Trail Optimization or Rerouting
4.1.8 Case 8: LMP Protocol Check Fails Due to DCN Errors and Consequently Service Deployment Fails
4.1.9 Case 9: ASON Services Fail to Be Deployed Because Line Attenuation Is Excessively High
4.1.10 Case 10: An Error Message Is Displayed When Users Attempt to Create Virtual TE Links
4.1.11 Case 11: Route Computation Fails After Explicit Resources Are Specified to Create or Optimize an ASON Service
4.2 ASON Service Restoration
4.2.1 Case 1: An ASON Service Is Interrupted Because OPA Fails
4.2.2 Case 2: An ASON Service Is Interrupted Because Protection Switching Fails After a Second Fiber Cut
4.2.3 Case 3: An ASON Service Is Interrupted After Being Rerouted Because of Incorrect Fiber Connections
4.2.4 Case 4: An ASON OCh Trail in a Slave Subrack Is Interrupted but Not Rerouted After the Slave Subrack Is Powered Off
4.2.5 Case 5: An ASON Service Is Frequently Rerouted Among Multiple Trails
4.2.6 Case 6: An Optical ASON Service Fails to Be Automatically Restored in Case of a Wavelength-Level Fault
4.2.7 Case 7: An ASON Service Enabled with Scheduled Reversion Fails to Be Reverted to Its Original Trail After the Scheduled Reversion Time Elapses
4.2.8 Case 8: Route Computation Fails When an ASON OCh Service Traverses a Regeneration Board
4.2.9 Case 9: An ASON OCh Service Is Interrupted but Not Rerouted
5 FAQs
5.1 Operations on the NMS
5.1.1 How to Distinguish Between ASON Services and Traditional Services on the NMS?
5.1.2 How to Identify the First and Last Nodes of an ASON Service?
5.1.3 How to Identify the Original Trail and Preset Restoration Trail of an ASON Service?
5.1.4 How to Manually Optimize an ASON Service?
5.1.5 How to Manually Optimize ASON Services in Batches?
5.1.6 How to Change a Wavelength to Optimize Trails?
5.1.7 How to Obtain Fiber Connections of Boards that an ASON Service Traverses?
5.1.8 How to Quickly Locate the Board Where an OCH_SER_INT Alarm Is Generated?
5.1.9 How to Quickly Query the Current Trail or Preset Restoration Trail of an ASON Service that Traverses a Specific NE or Board?
5.1.10 How to Check Whether a Preset Restoration Trail Is Available?
5.1.11 How to Check Whether an Optical Cross-Connection Is Successfully Created for an ASON Service?
5.1.12 How to Quickly Restore an Interrupted Service?
5.1.13 How to Quickly Create Fiber Connections Between Sites?
5.2 Configuration Rules
5.2.1 What Are Common Attributes and Recommended Configurations for ASON Services?
5.2.2 What Are the Risks if ODU0, ODU1, and ODU2 ASON Services Are Concurrently Configured?
5.2.3 What Are the Basic Rules for Configuring Preset Restoration Trails?
5.2.4 What Are the Recommended Configurations for the Preset Restoration Trails and Revertive Attributes of SDH/OTN/WDM ASON Services?
5.2.5 What Are the Restrictions on Regeneration Boards When Optical NEs and Electrical NEs Are Separated and Why?
5.2.6 How to Ensure the Rerouting Function When No OA Board Is Configured Between an FIU Board and a WSS Board?
5.2.7 Why Are TN52SCC Boards Instead of TN51SCC or TN11SCC Boards Recommended for ASON NEs?
5.2.8 What Are the Rules for Configuring Node IDs for ASON NEs?
5.2.9 Why Must the Node ID and IP Address of an NE Be in Different Network Segments?
5.2.10 Why Does the LMP or OSPF Protocol Need to Be Disabled on Electrical Links on OTU Boards Adding or Dropping WDM ASON Services?
5.2.11 What Are the Application Scenarios and Configuration Method for Resource Reservation?
5.2.12 Why Does the LMP Protocol Need to Be Disabled for Optical Ports on Tributary Boards?
5.2.13 How to Disable the LMP Protocol for the Optical Ports that Are Not Used by ASON Services?
5.2.14 How to Disable the OSPF Protocol for the Optical Ports that Are Not Used by ASON Services?
5.2.15 How to Split a Large DCN Subnet into Smaller DCN Subnets?
5.2.16 Do Diamond, Gold, Silver, and Copper ASON Services Support Hitless Conversion?
5.3 ASON Principles
5.3.1 What Overheads Are Used by Control Channels on the Control Plane?
5.3.2 What Is the Difference Between the Menu Items "Revert To Port" and "Revert to Channel"?
5.3.3 What Is the Difference Between Trail Overlap and Trail Sharing?
5.3.4 What Is Associated Sharing?
5.3.5 What Is the Relationship Between SRLGs and Associated Services?
5.3.6 Why Is a Revertive Service Reverted to the Original Trail 5 Minutes After Rerouting and How to Revert the Service to the Original Trail Within 5 Minutes?
5.3.7 Does an OPA Adjust Failure Affect Rerouting of ASON Services?
5.3.8 Why Cannot Revertive Services Be Downgraded After Rerouting?
5.3.9 What Is the Difference Between the Function of Downgrading ASON Services in an NE Explorer and the Function of Downgrading ASON Services in the Trail Management Window?
5.3.10 Do Service Optimization Must Be Performed at the First Node?
5.3.11 Why Do Services Fail to Be Reverted to the Original Trail?
5.3.12 Why Does Synchronization Between the NE and NMS Need to Be Performed During Each Query of ASON Service Information?
5.3.13 Does a CPW_XXX_INT Alarm on the Control Plane Mean Service Interruption?
5.3.14 What Is the Difference in Database Backup and Restoration Between an ASON Network and a Non-ASON Network?
5.3.15 Why Do I Need to Periodically Check Whether Rerouted ASON Services Are Reverted to the Original Trails?
5.3.16 What Are Residual Cross-Connections and How to Delete Residual Cross-Connections?
5.3.17 What Do the CPW_OCH_SER_INT and CPW_ODUk_SER_INT Alarms Mean?
Precautions
ASON networks must be appropriately planned and designed so that the network is
robust and services can survive multiple fiber cuts. For the detailed process, see
Figure 1-1.
Normal ASON network operation and service security depend on meticulous routine
maintenance and on periodic comprehensive assessment and optimization.
Purpose
This chapter provides the SOP for preventive maintenance personnel so that they can discover
and eliminate network risks, thereby ensuring network stability and security.
Intended Audience
System maintenance personnel
Application
Device maintenance personnel perform the standard operations and activities for preventive
maintenance at suggested intervals.
Item: ASON databases
Sub-Item: Checking backup databases for ASON NEs
Time Required: 5 min/NE
Frequency: NE databases must be backed up once network changes occur.
Inspection Method: Manual
Priority: Minor
Purpose: To ensure that users can download the NE database to recover the node once a node fault occurs on the network.
Procedure:
1. Verify that the NMS can successfully back up NE databases at specified intervals. It is recommended that the NMS start backing up databases for network-wide NEs at 2 a.m. every day.
2. After a network change occurs (for example, many services are deployed or rerouting occurs), request the customer to arrange a time window for manually backing up databases for all ASON NEs on the network.

Item: ASON resources
Sub-Item: Checking the status of control links
Time Required: 5 min/100 links
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To ensure that control links are in normal state. If the control link topology is incorrect, you may not be able to create, optimize, delete, or reroute an ASON service.
Procedure:
1. On the NMS client, navigate to the ASON control link management window and synchronize the control link information network-wide.
2. Check whether isolated NEs are present in the control link topology view. If there are isolated NEs, pinpoint the cause and restore the NEs to normal state. In addition, check for alarms on the isolated NEs and clear them one by one.
3. Check whether abnormal alarms are generated on the control links. If there are abnormal alarms, locate the boards that report the alarms and clear the alarms one by one.
4. Export the control link information into an Excel file. The next time you perform the preventive maintenance, compare the control link information with this file and check whether the control links are the same in the two inspections.

Item: ASON resources
Sub-Item: Checking virtual TE links
Time Required: 2 min/one link
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To ensure that virtual TE links are available.
Procedure:
1. On the NMS client, navigate to the ASON TE link management window and synchronize the TE link information network-wide.
2. Check the value of Extend Type of TE links. If the value is not Automatically Discovered, check for alarms on the source and sink boards of the TE links. If alarms have been generated on the source and sink boards, clear the alarms according to the NMS online help.

Item: ASON resources
Sub-Item: Checking status of TE links
Time Required: 5 min/100 links
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To ensure that TE links are up. If alarms are generated on TE links, recovery of ASON services will be affected.
Procedure:
1. On the NMS client, navigate to the ASON TE link management window and synchronize the TE link information network-wide.
2. Check whether abnormal alarms are generated on the TE links. If there are abnormal alarms, locate the boards that report the alarms and clear the alarms one by one.
3. Check the value of Link Status of the TE links. If the value is not Up, check for alarms on the boards where the TE links are down. Then clear the alarms.
4. Export the TE link information into an Excel file. The next time you perform the preventive maintenance, compare the TE link information with this file and check whether the TE links are the same in the two inspections.

Item: ASON services
Sub-Item: Checking for residual cross-connections
Time Required: 5 min/one cross-connection
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To ensure that there are no residual cross-connections. The CPW_XXXX_TEL_PATHMIS alarm may affect recovery of ASON services.
Procedure:
Check whether a CPW_XXXX_TEL_PATHMIS alarm is generated throughout the network. If there is a CPW_XXXX_TEL_PATHMIS alarm, clear it according to the NMS online help.

Item: ASON services
Sub-Item: Checking ASON services
Time Required: 10 min/network
Frequency: Daily
Inspection Method: Manual
Priority: Critical
Purpose: To ensure that ASON services are normal.
Procedure:
1. On the NMS client, navigate to the ASON trail management window; then synchronize ASON trail information.
2. Check whether ASON trails are activated. If an ASON trail is displayed as Inactive, check whether a client service is sent to the ASON trail. If no client service is sent to the ASON trail, no further action is required. If a client service is sent to the ASON trail, contact the customer for further confirmation and record the confirmation result.
3. Check whether rerouting lockout is disabled for ASON trails. If it is disabled, contact the customer to check why it is disabled and record the reason.
4. Check whether alarms are generated on the ASON trails. If yes, clear them according to the NMS online help.
5. Export the ASON information into an Excel file and save it for future reference.

Item: Alarms
Sub-Item: Clearing control plane alarms
Time Required: 5 min/one alarm
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To ensure that there is no control plane alarm. A control plane alarm may prevent ASON services from running normally and may even directly affect ASON trails.
Procedure:
Check whether an alarm starting with "CP" or "CPW" is generated throughout the network. If there is such an alarm, clear it according to the NMS online help.

Item: ASON events
Sub-Item: Checking for abnormal ASON events
Time Required: 20 min/50 ASON services
Frequency: Monthly
Inspection Method: Manual
Priority: Critical
Purpose: To monitor network operation. If an ASON rerouting or re-creation failure is reported frequently during a time period, identify the cause and take corresponding measures.
Procedure:
1. Check whether an ASON service rerouting failure has occurred lately. (To verify this information, click the new events icon in the main topology of the NMS, then browse the events in the Browse Events Logs [New Events] window.)
2. Check whether an ASON service re-creation failure has occurred lately. (To verify this information, click the new events icon in the main topology of the NMS, then browse the events in the Browse Events Logs [New Events] window.)

Item: Inspection of potential risks according to pre-warning notices
Sub-Item: Checking potential risks based on pre-warning notices
Time Required: 30 min/network
Frequency: Monthly
Inspection Method: Manual
Priority: Major
Purpose: To remove potential risks according to officially released pre-warning notices.
Procedure:
Download pre-warning notices from the http://support.huawei.com website and perform the workarounds, preventive measures, or solutions provided in the notices.

Item: Preventive maintenance inspection
Sub-Item: Preventive maintenance inspection
Time Required: 60 min/NE
Frequency: Monthly
Inspection Method: Tool + manual
Priority: Major
Purpose: To ensure that the ASON resources and services on an ASON NE are in good condition.
Procedure:
1. Perform preventive maintenance inspection (PMI) of ASON networks using a PMI tool (download the latest tool from the http://support.huawei.com website) and provide the PMI result to Huawei HQ for filing.
2. Analyze the PMI result according to the PMI guide. If there are any problems, rectify them immediately.
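The control link and TE link checks both end by exporting the link information and comparing it with the export from the previous inspection. The following is a minimal sketch of that comparison, assuming the Excel export has been saved as CSV. The column names `Source` and `Sink` and the sample NE names are hypothetical, because the actual export layout of the NMS is not described here; `Link Status` is the field named in the TE link check.

```python
import csv
import io


def load_links(csv_text, key_fields=("Source", "Sink")):
    """Return {link key: row} from CSV text. Column names are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[f] for f in key_fields): row for row in reader}


def compare_exports(previous, current, status_field="Link Status"):
    """Report links that disappeared, appeared, or changed status."""
    missing = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    changed = sorted(
        key for key in set(previous) & set(current)
        if previous[key][status_field] != current[key][status_field]
    )
    return {"missing": missing, "added": added, "changed": changed}


# Hypothetical exports from two consecutive monthly inspections.
last_month = """Source,Sink,Link Status
NE1-FIU,NE2-FIU,Up
NE2-FIU,NE3-FIU,Up
"""
this_month = """Source,Sink,Link Status
NE1-FIU,NE2-FIU,Down
NE3-FIU,NE4-FIU,Up
"""

report = compare_exports(load_links(last_month), load_links(this_month))
print(report)
```

Any entry under `missing` or `changed` is a candidate for the alarm checks described in the procedures above.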
PMI Item / PMI Sub-Item / Result / Remarks
ASON databases: OK NOK POK
ASON resources: OK NOK POK
                OK NOK POK
                OK NOK POK
                OK NOK POK
ASON services: OK NOK POK
Alarms: OK NOK POK
ASON events: OK NOK POK
Checking potential risks based on pre-warning notices: OK NOK POK
Preventive maintenance inspection / Preventive maintenance inspection: OK NOK POK
2.1 Overview
This chapter provides guidelines for recovering ASON services from service interruptions or
node faults on an ASON network that is already in operation. It guides network maintenance
personnel through the service recovery and fault diagnosis processes.
This chapter assumes that the network maintenance personnel are skilled in fault diagnosis for
traditional WDM services on transport equipment. It therefore focuses on how maintaining
ASON-capable WDM transport equipment differs from maintaining traditional WDM transport
equipment.
Customer personnel: members of the operator's maintenance team. They are responsible
for routine maintenance of the ASON and handling of common faults.
Customer service personnel: members of Huawei's GTS team. They provide technical
support for the customer, and assist the customer in handling network faults.
R&D personnel: members of Huawei's R&D team. They assist the customer service
personnel in handling network faults.
6. Create records for software and hardware versions of each network node.
Fiber cuts
Fiber cuts may result in shortage of network resources, which will then cause service
interruption. If services are interrupted due to shortage of network resources, they cannot
be recovered through rerouting. To recover the services, users must repair the fibers first.
Configuration errors
A configuration error, such as an incorrect clock configuration or incorrect port attributes, can
also lead to an interruption of ASON services.
Domain
OCH optical
layer
MUT_LOS or BD_STATUS
alarms are generated on FIU
boards.
2.
An OTU board reports an R_LOS, OTUk_SSF, DEG, or EXC alarm or an FIU board reports a
MUT_LOS or BD_STATUS alarm when one or multiple services are abnormal.
The following ASON alarms have no direct impact on ASON services:
The root causes of other ASON alarms (for example, CPW_CLNT_SER_NOTOR,
CPW_OCH_SER_SLADEG, CPC_NODE_ID_CONFLICT, and
CPC_RSVP_NB_DOWN) are not related to traditional alarms, nor are they directly
related to interruption of ASON services. For details about their causes and
handling methods, refer to the alarm reference manual.
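When triaging a mixed alarm list, it can help to separate control plane alarms from traditional alarms first. The sketch below relies only on the naming patterns that appear in this document: control plane alarms start with "CP" or "CPW", and the ASON service interruption alarms end with `_SER_INT`. Treating every other name as a traditional alarm is an assumption of this sketch, not a rule from the alarm reference manual.

```python
def classify_alarm(name):
    """Classify an alarm by name, using only the naming patterns in this document."""
    if name.startswith("CP"):  # also covers the "CPW" and "CPC" prefixes
        if name.endswith("_SER_INT"):
            return "control plane: service interruption"
        return "control plane: other"
    # Assumption: anything without a control plane prefix is a traditional alarm.
    return "traditional"


for alarm in ("R_LOS", "MUT_LOS", "CPW_OCH_SER_INT", "CPC_RSVP_NB_DOWN"):
    print(alarm, "->", classify_alarm(alarm))
```

Alarms in the "control plane: other" group are the ones to look up in the alarm reference manual rather than treat as direct evidence of a service interruption.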
1. Locate the interrupted service, and optimize the service onto the preset restoration trail or
revert the service to the original trail (or switch the service to the protection trail)
after confirming that the trail is available.
2. If the preceding step fails, deactivate the service. Then create the service as a traditional
service.
3. Check for traditional alarms that have caused the service interruption, and clear them.
The preceding steps are the three essential actions for promptly recovering ASON services.
For the detailed service recovery procedure, see the next section.
To determine whether a trail is available, users can check whether an end-to-end trail is
available based on the network topology, fiber interruption symptoms, ASON topology
(TE link and control link), and service source or sink node.
NOTE
Check whether any configurations are changed before or after the service is interrupted.
These configurations include protocol parameter settings (OSPF, LMP, and RSVP parameters) that
affect the functions of the control plane, port attribute settings (port loopback, FEC/AFEC mode, and
port rate mode) that are related to the service, and service configurations (for example, service trail
configurations on the client side).
Check whether any service-affecting major alarm is reported after the service is interrupted.
For example, check whether there is a new hardware damage alarm (HARD_BAD), fiber break
alarm (R_LOS or MUT_LOS), traditional service interruption alarm (LOF or AIS), or ASON
service interruption alarm (XXX_SER_INT).
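The alarm check in the note above can be sketched as a filter that keeps only alarms raised at or after the interruption time and groups them by the four categories the note names. The alarm records here are plain `(name, time)` tuples with made-up timestamps; how alarms are actually exported from the NMS is not covered by this document, so that input format is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Categories and alarm names as listed in the note above.
CATEGORIES = {
    "HARD_BAD": "hardware damage",
    "R_LOS": "fiber break",
    "MUT_LOS": "fiber break",
    "LOF": "traditional service interruption",
    "AIS": "traditional service interruption",
}


def categorize(alarm_name):
    """Map an alarm name to one of the categories from the note."""
    if alarm_name.endswith("_SER_INT"):
        return "ASON service interruption"
    return CATEGORIES.get(alarm_name, "other")


def alarms_after(alarms, interrupted_at):
    """Group alarms raised at or after the interruption time by category."""
    grouped = defaultdict(list)
    for name, raised_at in alarms:
        if raised_at >= interrupted_at:
            grouped[categorize(name)].append(name)
    return dict(grouped)


# Hypothetical sample data.
interrupted_at = datetime(2014, 8, 26, 10, 0)
alarms = [
    ("R_LOS", datetime(2014, 8, 26, 9, 0)),  # raised before the interruption
    ("MUT_LOS", datetime(2014, 8, 26, 10, 1)),
    ("CPW_OCH_SER_INT", datetime(2014, 8, 26, 10, 2)),
    ("HARD_BAD", datetime(2014, 8, 26, 10, 5)),
]
result = alarms_after(alarms, interrupted_at)
print(result)
```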
3. What alarms (ASON or traditional alarms) are reported for the interrupted service?
Method: Check whether there are any new major alarms.
5. If multiple services are interrupted, do they all carry client services? Which services are
the most important? Are all interrupted services in the same direction?
Method: Directly confirm this information with the customer.
6. Which services are interrupted (source/sink nodes and boards)? What are the service
protection levels? Which trails do the services traverse? Do the interrupted services pass
any regenerators?
Method: View the information directly in the WDM ASON Trail Management window.
Click Original trail and Current trail in the lower left part of the window, and take
screenshots of the required information.
7. Has rerouting been triggered for the interrupted service? Has the rerouting succeeded or
failed?
Method: Directly click Event to view the detailed event information (alternatively, select
the specific event screenshot).
8. Have any network configurations been modified, have any services been added or deleted,
or has a service cutover been performed before the fault occurred?
Method: Directly confirm this information with the customer or maintenance personnel.
1. According to the customer feedback, determine when the service interruption occurred, how
many services are interrupted, the source and sink nodes of those services, and the fiber
status (for example, fiber breaks) of the network.
2. Choose Configuration > WDM ASON > WDM Trail Management to display the
WDM ASON trail management window. Identify the interrupted ASON service trail
according to the service source or sink node information and ASON service alarms. If
the CPW_OCH_SER_INT alarm is generated, clear the alarm to recover the services.
there are inactive services. If there are inactive services, check whether they are the
faulty services. If yes, activate the services to recover them.
5. If no ASON service interruption alarm has ever been reported, check for traditional
alarms that are generated on the source or sink NE during the service interruption time
period as specified by the customer. Then determine the affected boards and
corresponding ASON service trails according to the alarm information.
NOTE
For optical-layer services on WDM devices, only BD_STATUS or MUT_LOS alarms on FIU boards can
trigger rerouting. In other words, alarms on OTU boards cannot trigger rerouting.
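The rule in this note is a simple two-condition check, sketched below. The function name and the way board types are passed in as strings are illustrative assumptions; only the rule itself (BD_STATUS or MUT_LOS, and only on FIU boards) comes from the note.

```python
# Alarms that can trigger rerouting of optical-layer services, per the note above.
REROUTE_TRIGGER_ALARMS = {"BD_STATUS", "MUT_LOS"}


def triggers_optical_rerouting(board_type, alarm):
    """Return True only for BD_STATUS or MUT_LOS alarms reported by an FIU board."""
    return board_type == "FIU" and alarm in REROUTE_TRIGGER_ALARMS


print(triggers_optical_rerouting("FIU", "MUT_LOS"))
print(triggers_optical_rerouting("OTU", "R_LOS"))
```

This is useful when an interrupted optical-layer service has not been rerouted: if the only alarms present are on OTU boards, the absence of rerouting is expected behavior rather than a control plane fault.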
1. In the WDM Trail Management window, select the service and save the service
information (for example, the service attributes, current trail, and original trail) as an
Excel file. The information will be subsequently used as reference for fault diagnosis
and service recovery.
2. In the WDM Trail Management window, select the service. Check whether preset
restoration trails are configured for the service. If they are configured, take screenshots
of the two preset restoration trails, because this information can be used as reference for
reconfiguring preset restoration trails for the service in future fault diagnosis. In addition,
check whether the current trail of the service is the same as one of the preset restoration
trails. Ensure that the preset restoration trails are explicit trails during service
optimization.
The following figure shows a screenshot for one of the preset restoration trails.
3. If no restoration trail is configured for the service, optimize the trail for the service.
During trail optimization, try to select a trail that shares nodes with the original trail
and specify it as the explicit trail for the service. Then check whether the service is
recovered. If a failure message is displayed or if a success message is displayed but the
service is not recovered, go to the next step.
4. If the service fails to be optimized or reverted to the original trail, check for
configuration errors (such as rerouting lockout, incorrect optical parameter settings, and
link verification failure or unavailability of control links not triggered by alarms). Then
assess the current network topology, fiber status (for example, fiber breaks), and service
source or sink node. If idle trails are available for the service, deactivate the service and
then try to recover the service as a traditional service by creating a WDM trail for it. If
the service still fails to be recovered, go to the next step.
5. Query the traditional alarms corresponding to all ASON service trails. If there are
traditional alarms (for example, R_LOS, MUT_LOS, VOADATA_MIS, and
OPA_FAILED), clear them according to the traditional alarm handling
process. Then check whether the service is recovered. If not, go to the next step.
6.
For WDM ASON services, disable the function of computing optical parameters if the optical
parameter check fails when you are optimizing the service. Then optimize the service again.
To disable the optical parameters, right-click the service in the ASON trail management window, and
choose Disable Optical Parameters from the shortcut menu.
----End
1. According to the customer feedback, determine when the service interruption occurred, how
many services are interrupted, the source and sink nodes of those services, and the fiber
status (for example, fiber breaks) of the network.
2. Choose Configuration > WDM ASON > WDM Trail Management to display the
WDM ASON trail management window. Identify the interrupted ASON service trail
according to the service source or sink node information and ASON service alarms. If a
CPW_ODU3_SER_INT, CPW_ODU2_SER_INT, CPW_ODU1_SER_INT, or
CPW_ODU0_SER_INT alarm is generated, clear the alarm to recover the services.
5. If no ASON service interruption alarm has ever been reported, ensure that the static
cross-connections are configured correctly for the client side of the NE. If alarms have
been generated on the client side, ensure that the client-side boards are online and free
of faults. Then, check for alarms and performance events on OTU boards. If there
are alarms, locate the affected ASON services and rectify the faults.
2.
In the WDM Trail Management window, select the service. Check whether preset
restorations are configured for the service. If they are configured, take the screenshots
for the two preset restoration trails, because this information can be used as reference for
reconfiguring preset restoration trails for the service in future fault diagnosis. In addition,
check whether the current trail of the service is the same as one of the preset restoration
trails. Ensure that the preset restoration trails are explicit trails during service
optimization.
The following figure shows a screenshot for one of the preset restoration trails.
3.
If no restoration trail is configured for the service, optimize the trail for the service.
During trail optimization, try to select a trail that shares nodes with the original trail
and specify it as the explicit trail for the service. Then check whether the service is
recovered. If a failure message is displayed or if a success message is displayed but the
service is not recovered, go to the next step.
4.
If the service fails to be optimized or reverted to the original trail, check for
configuration errors (such as rerouting lockout, link verification failure, or unavailability of
control links not triggered by alarms). Then assess the current network topology, fiber
status (for example, fiber breaks), and service source or sink node. If idle trails are
available for the service, deactivate the service and then try to recover the service as a
traditional service by creating a trail for it. If the service still fails to be recovered, go to
the next step.
5.
Query the traditional alarms corresponding to all ASON service trails. If there are some
traditional alarms (for example, R_LOF, R_LOS, LOM, and BUS_ERR), clear the
traditional alarms according to the traditional alarm handling process. Then check
whether the service is recovered. If not, go to the next step.
6.
----End
For an external network alarm, locate and handle the service faults on the client side. For
example, check whether the service receiving boards are faulty or offline and whether
errors have been generated in client signals.
2.
For an internal network alarm, handle the alarm as instructed in section 2.3.2 "Quick
Recovery Process for ASON Services (OTN Electrical-Layer Service)".
If no control plane alarm indicating service interruption is reported, check for corresponding
traditional alarms according to the service interruption time, and clear them to recover the
services.
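The time-based alarm check above can be sketched as a simple filter. This is a minimal illustration only: the record keys "name" and "raised_at", the five-minute window, and the sample data are assumptions, not an NMS export format.

```python
from datetime import datetime, timedelta


def alarms_near_interruption(alarms, interrupted_at, window=timedelta(minutes=5)):
    """Return alarms raised within `window` of the interruption time, oldest first.

    `alarms` is a list of dicts with the hypothetical keys "name" and "raised_at".
    """
    hits = [a for a in alarms if abs(a["raised_at"] - interrupted_at) <= window]
    return sorted(hits, key=lambda a: a["raised_at"])


# Illustrative data only; not taken from a real network.
alarms = [
    {"name": "R_LOS", "raised_at": datetime(2014, 8, 26, 10, 0, 30)},
    {"name": "BUS_ERR", "raised_at": datetime(2014, 8, 25, 22, 15, 0)},
    {"name": "R_LOF", "raised_at": datetime(2014, 8, 26, 9, 58, 0)},
]
related = alarms_near_interruption(alarms, datetime(2014, 8, 26, 10, 0, 0))
print([a["name"] for a in related])
```

Widen or narrow the window to match how precisely the interruption time is known.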
Workarounds
According to the fault occurrence time and the customer feedback, identify the interrupted
services and promptly recover the services by, for example, optimizing the services or
reverting them to their original trails. In the event of a route computation failure or optical
parameter check failure, first check whether there are sufficient network resources.
Many factors can cause a shortage of network resources, including
configuration errors, fiber breaks, residual cross-connections, device hardware faults, node
faults, and software bugs. Currently, rerouting protection is provided for optical-layer services
using preset restoration trails. If a service is interrupted and cannot be automatically recovered
because of a route computation failure, manually recover it using the following methods:
1.
Find the current ASON service trail (including the preset restoration trail) of the service,
and deactivate the service.
2.
Create a static service over a traditional WDM trail and recover the service using
traditional commissioning methods. For an optical-layer service, try to use the original
trail or configure two preset restoration trails.
NOTE
If the preceding methods cannot recover the service, isolate the fault according to the symptoms (for
example, fiber breaks, residual cross-connections, hardware faults, node faults, and software bugs) and
then recover the service. For details about the troubleshooting procedure, refer to XXX Fault Location
and Handling Procedure.
Troubleshooting Procedure
Step 1 Review the ASON service events that are generated during the time period in which the fault
occurs, and check whether the "Route computation failed" or "Optical parameter check failed"
error message is displayed.
Step 2 If the "Optical parameter check failed" error message is displayed, it is possible that the preset
restoration trails or the specified optical parameters about trails for service optimization are
configured incorrectly. If this is the case, disable the optical parameter check function for the
service, and optimize the service to recover it.
NOTE
Route computation for an optical-layer service may report an optical parameter check
failure. The following optical parameters are commonly configured incorrectly: TE
link distance, FIU dispersion compensation, dispersion coefficient, PMD coefficient, and rated power of
the optical amplifier.
Step 3 If the "Route computation failed" error message is displayed, check whether all TE links are
in normal state in the ASON TE link management window. If there are some TE link alarms
(for example, CPW_XXX_TEL_DOWN, CPW_XXX_TEL_DEG, CPW_OMS_TEL_DEG,
and CPW_OMS_TEL_DOWN), check the fiber connection status on the network. If some
fibers have been broken, repair them. If no fiber has been broken, go to the next step.
Step 4 If all TE links on the network are in normal state, check for node faults. If the NE is
unreachable by the NMS or if there is a control plane alarm (for example,
CPC_OSPF_NB_DOWN and CPC_RSVP_NB_DOWN), then you can determine that a node
fault has occurred. Locate the node fault and rectify it. If no node fault has occurred, go to the
next step.
Step 5 If all network nodes are in normal state, check whether network resources are sufficient in the
following ways:
1.
Check whether there are any reserved resources or residual cross-connections on the
preset restoration trails or specified explicit trails.
2.
Check whether there are any resource inconsistency alarms (for example,
CPW_XXX_TEL_PATHMIS).
If there are some reserved resources or residual cross-connections, manually release the
reserved resources or delete the residual cross-connections.
If there are no residual network resources, go to the next step.
Step 6 If there are sufficient network resources, check whether network configurations are correct.
For an optical-layer service, check the fiber connections at the optical layer, for example, the
fiber connections between OAs, WSSs, OSCs, and FIUs. For an electrical-layer service, check
whether the configured cross-connect capacity is beyond the limit permitted by the tributary,
line, and cross-connect boards, and check whether any electrical boards are faulty. If the NE
configurations are correct, contact Huawei for support.
----End
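The order of checks in the preceding procedure can be summarized as a small decision function. This is a simplified sketch, not a tool shipped with the product: the function name, its parameters, and the returned strings are illustrative assumptions. Step 3 is reduced to a single branch here, so the case in which TE link alarms exist but no fiber is broken is not modeled.

```python
def next_action(event, te_link_alarms, node_alarms, ne_reachable, residual_resources):
    """Map the observed symptoms to the next action, following the step order above.

    All parameter names are hypothetical; the caller gathers the facts manually
    or from NMS exports.
    """
    if event == "Optical parameter check failed":
        return "Disable the optical parameter check and optimize the service"
    if event != "Route computation failed":
        return "Not covered by this procedure"
    if te_link_alarms:  # for example, CPW_OMS_TEL_DOWN
        return "Check fiber connections and repair broken fibers"
    if not ne_reachable or node_alarms:  # for example, CPC_OSPF_NB_DOWN
        return "Locate and rectify the node fault"
    if residual_resources:
        return "Release reserved resources or delete residual cross-connections"
    return "Check network configurations; contact Huawei if they are correct"


print(next_action("Route computation failed", ["CPW_OMS_TEL_DOWN"], [], True, False))
```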
Workarounds
For "RSVP egress or ingress port is down", explicitly specify different egress ports to
optimize the service onto another trail so as to quickly recover the service.
For "Failure of label allocation", if the faulty node is the
source or sink node of the ASON service, attempt to recover the interrupted service by
optimizing the tunable wavelength for the optical layer. For the non-tunable NE
configurations, identify the resources that conflict with the cross-connections or
wavelengths of the current service and delete the conflicting resources to recover the
interrupted service.
Troubleshooting Procedure
Step 1 Review the ASON service events that are generated during the time period when the fault
occurs, and check whether the "Creation timeout", "RSVP egress or ingress port is down" or
"Label allocation failure" message is displayed.
Step 2 If the "Creation timeout" message is displayed, check whether network nodes successfully
communicate with each other. The nodes fail in control plane communication if the
CPC_OSPF_AUTH_ERR, CPC_OSPF_NB_DOWN, CPC_RSVP_AUTH_ERR, or
CPC_RSVP_NB_DOWN alarm is generated on the control plane. To restore the control plane
communication, clear the alarm. If the network nodes communicate successfully with each
other, check whether traffic congestion has occurred at the source node, sink node, or an
intermediate node of the service using the following methods:
1.
Run the nbb-set-debug-mode:open command on the node to enable the debug function
of the node.
2.
If the same triplet information is displayed in the command output every time, then traffic
congestion has occurred at the node.
If traffic congestion has occurred at the source or sink node, deactivate/activate the service or
perform a soft reset on the node to rectify the fault. If traffic congestion occurs at an
intermediate node, recover the service by optimizing it to an available trail, then perform a
warm reset on the node to rectify the fault. At this point, the "creation timeout" fault is
rectified.
Step 3 If the "RSVP egress or ingress interface Down" error message is displayed, first determine the
faulty node according to the rerouting failure event. Then check whether the interface control
block of the control plane is in normal state and whether the RSVP interface information is
complete using the following methods:
1.
Run the mpls-get-if:all command to query the interface information. Check whether
there is an interface corresponding to the interrupted service and whether the interface
information is correct (DEFINED=0x7, ADMIN=UP, OPER=UP; the remote
information is not null).
2.
At this point, the "RSVP egress or ingress interface Down" fault is rectified.
Step 4 If the "Failure of label allocation" error message is displayed, check whether there are
conflicting resources on the network. Handle the fault using the following methods:
1.
Check whether there are any reserved resources or residual cross-connections on the
preset restoration trails or specified explicit trails.
2.
Check whether there are any resource inconsistency alarms (for example,
CPW_XXX_TEL_PATHMIS).
If there are some reserved resources or residual cross-connections, manually release the
reserved resources or delete the residual cross-connections.
At this point, the label allocation failure is resolved.
----End
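Two of the checks in the preceding procedure are mechanical and can be sketched in code: the interface criteria (DEFINED=0x7, ADMIN=UP, OPER=UP, remote information not null) and the "same triplet every time" congestion symptom. The sketch assumes the command output has already been parsed by hand or by a separate tool. The dictionary keys, including "REMOTE", and the tuple representation of a triplet are assumptions, because the actual output format is not described here.

```python
def interface_problems(record):
    """List the criteria that a parsed interface record fails to meet.

    `record` is a dict with the hypothetical keys DEFINED, ADMIN, OPER, REMOTE.
    """
    problems = []
    if record.get("DEFINED") != 0x7:
        problems.append("DEFINED is not 0x7")
    for field in ("ADMIN", "OPER"):
        if record.get(field) != "UP":
            problems.append(f"{field} is not UP")
    if not record.get("REMOTE"):
        problems.append("remote information is null")
    return problems


def looks_congested(triplets):
    """True when at least two samples were taken and all of them are identical."""
    return len(triplets) >= 2 and len(set(triplets)) == 1


# Illustrative values only.
bad = {"DEFINED": 0x3, "ADMIN": "UP", "OPER": "DOWN", "REMOTE": ""}
print(interface_problems(bad))
print(looks_congested([(1, 2, 3), (1, 2, 3), (1, 2, 3)]))
```

An empty list from interface_problems means the record meets all four criteria.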
Workarounds
After the ASON service is interrupted and the NE is unreachable by the NMS, identify the
cause. If the NE is unreachable by the NMS because the gateway NE is faulty, check whether
the secondary gateway is available or reconfigure another gateway NE for the unreachable NE.
Then try to recover the service using the Navigator if the NE can be managed by the
Navigator. The recovery steps are as follows:
Step 1 Query alarms on the add/drop boards and the source/sink nodes of the affected services.
Step 2 According to the source or sink node of the interrupted service, run a command to query and
determine the corresponding affected ASON service (determine the affected ASON service
according to the add/drop service board and ASON service alarm).
Step 3 Recover the interrupted service quickly by optimizing the service (or reverting the service
over to the original trail) manually.
NOTE
Recover the service quickly by optimizing the service or reverting the service over to the original
trail.
Step 4 If neither the NMS nor the Navigator is available but trails are available for the service, you
can disable the SC2 and OA laser of the reachable node on an available trail to trigger service
switching, or perform a cold reset on the FIU and remove the line fiber to trigger service
rerouting, finally restoring the interrupted service.
----End
Troubleshooting Procedure
An NE may become unreachable by the NMS for many reasons: IP address conflict, subnet
mask errors, OSC/ESC communication fault between NEs, SCC hardware or software faults
of the NE, ECC storm, and faults of the customer's outband DCN. You can simply
troubleshoot this fault by determining whether the NE becomes unreachable by the NMS
because of the fault of the gateway NE. If yes, replace the faulty gateway NE. For other types
of faults, refer to the DCN Recovery Guide.
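Two of the causes listed above, IP address conflicts and subnet mask errors, can be screened offline from an exported NE list. The following is a minimal sketch that uses only the Python standard library. The record keys "name", "ip", and "mask" and the sample addresses are assumptions for illustration, not an NMS export format.

```python
import ipaddress
from collections import defaultdict


def dcn_address_issues(nes):
    """Report invalid address/mask pairs first, then IP addresses used by more than one NE."""
    issues = []
    by_ip = defaultdict(list)
    for ne in nes:
        try:
            # strict=False accepts a host address combined with a dotted mask.
            ipaddress.IPv4Network(f"{ne['ip']}/{ne['mask']}", strict=False)
        except ValueError:
            issues.append(f"{ne['name']}: invalid address or mask")
            continue
        by_ip[ne["ip"]].append(ne["name"])
    for ip, names in sorted(by_ip.items()):
        if len(names) > 1:
            issues.append(f"{ip} is used by {', '.join(names)}")
    return issues


# Illustrative data only.
nes = [
    {"name": "NE-A", "ip": "129.9.0.1", "mask": "255.255.0.0"},
    {"name": "NE-B", "ip": "129.9.0.2", "mask": "255.255.0.0"},
    {"name": "NE-C", "ip": "129.9.0.1", "mask": "255.255.0.0"},
    {"name": "NE-D", "ip": "129.9.0.4", "mask": "255.0.255.0"},
]
print(dcn_address_issues(nes))
```

This screening only covers static configuration data. Communication faults, ECC storms, and hardware faults still require the DCN procedures.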
NOTE
Workarounds
For details about the workarounds, see section 2.4.1 "Troubleshooting Process for Route
Computation Failures."
NOTE
If the interrupted single-wavelength ASON service does not initiate the rerouting process, it is possible
that no trail meets the requirement or a component (for example, board or fiber) is faulty (the current
ASON service only supports the rerouting switchover triggered by a line-class fault rather than a
channel-class fault).
Troubleshooting Procedure
Step 1 According to the fault information (for example, fault time, faulty node, and current alarm)
provided by the customer, determine whether a single ASON service or multiple ASON
services are interrupted through the MUT_LOS alarm and the R_LOS alarm of multiple
boards. If a single ASON service is interrupted, proceed to the next step.
Step 2 If the ASON service has the protection capability (traditional protection plus ASON
associated service), check whether the service is locked to an abnormal channel forcibly. If
yes, unlock the service and check whether the service is recovered. If a normal channel is
available, switch the service toward the normal channel forcibly. Alternatively, check whether
the switching conditions are configured correctly (SD). If not, correct the switching condition
configurations and check whether the service is recovered. If the service is still not recovered,
proceed to the next step.
Step 3 Check the interrupted ASON service. If rerouting is locked for the ASON service, unlock
rerouting. The ASON service may still fail to initiate the rerouting process (it is possible that a
channel alarm causes the service interruption; if the service has an available trail, optimize the
service manually), or the ASON service is still interrupted even after the ASON service
initiates the rerouting process (by default, an optical-layer alarm does not trigger service
rerouting). In this case, proceed to the next step.
Step 4 According to the interrupted service information, check whether the service trail must traverse
a repeater, and check whether the current trail has traversed a repeater. If the current trail does
not traverse a repeater, you can specify a repeater to optimize the service for restoring the
service. If the service trail does not need to traverse a repeater or the current trail has traversed
a repeater, but the ASON service is still interrupted, proceed to the next step.
Step 5 Determine whether a service of the same wavelength causes any crosstalk. You can disable
the OTU at the source or sink node of the interrupted service or disable the laser of the trunk
board to determine whether another service board generates any same-wavelength crosstalk
section by section. Alternatively, ask the customer's maintenance personnel to check whether
the wavelength of a board is adjusted at the OTU of the source or sink node, on a trunk board,
or in an intermediate site of the service, or whether any service board is inserted. According to
the check results, adjust the conflicting wavelength to recover the interrupted service. If the
ASON service is still interrupted, or there is no conflicting wavelength, proceed to the next
step.
Step 6 Check whether the OTU configurations at the source node are consistent with those at the sink
node and whether the trunk board configurations (for example, wavelength, rate, FEC, and
TDC) between two ends are consistent with each other. If some configurations are
inconsistent, correct the configurations immediately. If the configurations are all consistent
but the ASON service is still interrupted, proceed to the next step.
Step 7 Check whether a power adjustment alarm is generated for the service trail, for example, an
OPA adjustment failure alarm (OPA_FAIL_INDI). If yes, check the corresponding
configuration information (including the WSS board, optical amplifier, and preset insertion
loss) of the OPA reference section. If there are some configuration errors, correct the
configurations as instructed by the System Commissioning Guide. If no adjustment failure
alarm is generated, or if the corresponding configurations are correct but the ASON service is
still interrupted, proceed to the next step.
Step 8 Determine the current trail for the interrupted service. Check the power of the OTU (including
the trunk board) at the source or sink node of the single-wavelength service. Then query the
current and historical performance of the OTU at the receiving or transmitting end. If the
current network is configured with an MCA board, use the MCA board to scan the
single-wavelength power in the service trail and check whether any power is abnormal in the
service trail. If the current network is not configured with any MCA board, check whether the
receiving power of the corresponding port is normal (performing a hardware loopback) on site.
If a faulty fiber or board is identified, replace the faulty fiber or board. Then, the ASON
service is recovered.
NOTE
When checking the faulty trail, note the following alarms and configuration information:
Multiplexer or demultiplexer board alarm and optical amplifier board alarm: R_LOS, MUT_LOS,
BD_STATUS, MODULE_ADJUST_FAIL, MOD_COM_FAIL, WAVEDATA_MIS, IN_PWR_LOW,
IN_PWR_HIGH, OUT_PWR_HIGH, and OUT_PWR_LOW
Multiplexer or demultiplexer board and optical amplifier board configurations: insertion loss,
attenuation, rated power, gain, and dispersion compensation.
Rectify a board fault (if available) as instructed by the Board Replacement Guide, and rectify a fiber
fault (if available) as instructed by the Fiber Repair Guide.
----End
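The consistency check in Step 6 compares the same set of parameters at both ends of the service. A minimal sketch follows, assuming the configurations have been collected into dictionaries. The key names and sample values are hypothetical.

```python
# Parameters named in Step 6; the key spelling is an assumption.
FIELDS = ("wavelength", "rate", "fec", "tdc")


def config_mismatches(source, sink, fields=FIELDS):
    """Return {field: (source_value, sink_value)} for every field that differs."""
    return {
        f: (source.get(f), sink.get(f))
        for f in fields
        if source.get(f) != sink.get(f)
    }


# Illustrative values only.
source_otu = {"wavelength": "192.100 THz", "rate": "10G", "fec": "AFEC", "tdc": 0}
sink_otu = {"wavelength": "192.100 THz", "rate": "10G", "fec": "FEC", "tdc": 0}
print(config_mismatches(source_otu, sink_otu))
```

An empty result means the two ends agree on every listed parameter, in which case the procedure continues with the power adjustment check.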
In the ASON service management window, multiple ASON services are interrupted. You can
determine whether the service initiates the rerouting process according to the corresponding
performance event (rerouting success or failure event) and service alarm (ASON service
interruption alarm, "not over the original trail" alarm, and rerouting lockout alarm). The
possible symptom of service interruption is that the rerouting process is not initiated or the
ASON service is still interrupted even after the rerouting process is initiated.
Workarounds
For details about the workarounds, see section 2.4.1 "Troubleshooting Process for Route
Computation Failures."
Troubleshooting Procedure
Step 1 According to the fault information (for example, fault time, faulty node, and current alarm)
provided by the customer, determine whether a single ASON service or multiple ASON
services are interrupted by checking the related alarms, for example, the MUT_LOS alarm
and the R_LOS, IN_PWR_HIGH, or IN_PWR_LOW alarm of multiple OTUs. If multiple
ASON services are interrupted, proceed to the next step.
Step 2 Check each interrupted ASON service. If rerouting is locked for the ASON service, unlock
rerouting. If the ASON services initiate the rerouting process and are recovered, the fiber on
the line side is faulty. Therefore, rectify the line fault. If the ASON services still fail to initiate
the rerouting process or the ASON services are still interrupted even after the ASON services
initiate the rerouting process, proceed to the next step.
Step 3 If the ASON services are still interrupted even after the rerouting or switching process is
initiated, a board on the site is faulty, the line attenuation is extremely large, or the pigtail is
faulty. If multiple OTUs generate the R_LOS, IN_PWR_HIGH, or IN_PWR_LOW alarm, the
input optical power is obviously changed as compared with the historical performance. Check
whether the input optical power of the demultiplexer unit in the upstream of the OTU is
changed as compared with the historical performance. If the input optical power of the
demultiplexer board is not changed, check the pigtail between the demultiplexer board and the
OTU. If the pigtail is abnormal, replace the abnormal pigtail. If the pigtail is normal, check
whether the demultiplexer board is faulty and replace the faulty demultiplexer board.
Otherwise, proceed to the next step.
Step 4 In the drop direction on the site, check whether the input optical power of the demultiplexer
board is changed obviously. If yes, continue to check whether the input power and output
power of the optical amplifier board in the upstream are changed. If the output power of the
optical amplifier board remains stable, clean or replace the pigtail between the optical
amplifier board and the demultiplexer board. If the output power of the optical amplifier
board is changed and the input power of the optical amplifier board is stable, check whether
the optical amplifier board is configured with any gain and whether the laser is disabled. For
the OAU, also check whether the connection insertion loss between the TDC and the RDC is
changed. If the settings are all normal, you can determine that the optical amplifier board is
faulty and replace the faulty optical amplifier board. Otherwise, proceed to the next step.
Step 5 Check whether the input power of the optical amplifier board is abnormal. If the input power
is abnormal, check whether the input power or output power of the FIU is changed. If the
output power of the FIU is stable, check the connection insertion loss between the TDC and
the RDC and check whether the adjustable attenuation is changed for the OAU. If the
insertion loss and configurations are normal, clean or replace the pigtail between the optical
amplifier board and the FIU. If the output power of the FIU is changed, check the input power
of the FIU. If the input power is stable, you can determine that the FIU is faulty and replace
the faulty FIU. Otherwise, proceed to the next step.
Step 6 Check whether the input power of the FIU is changed. If the input power is changed, check
whether the output power of the upstream FIU is changed. If the output power of the upstream
FIU is stable, check whether the line attenuation between the FIUs of two sites is changed.
Otherwise, proceed to the next step.
Step 7 If the output power of the upstream FIU is changed, perform the preceding steps to check the
upstream site continuously. Observe the following principles for troubleshooting: In the
direction contrary to the signal flow, check the power reported by the FIU, optical amplifier
board, and multiplexer board. According to the power change point, determine the board or
pigtail where the output power begins to change. If a board causes the power change, replace
the board. If a pigtail causes the power change, clean or replace the pigtail.
Step 8 If an attenuator board is available in the line, check whether the attenuator board is configured
correctly. If the actual attenuation of the attenuator board does not match the configured
attenuation, replace the attenuator board.
----End
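The principle in Step 7 (walk against the signal flow and find where the power begins to change) can be sketched as follows. The measurement tuples, the point names, and the 2 dB tolerance are assumptions for illustration. Use the thresholds that apply to the actual network.

```python
def locate_power_change(points, tolerance_db=2.0):
    """Walk against the signal flow and find where the power starts to deviate.

    `points` is ordered from upstream to downstream; each item is
    (name, historical_dbm, current_dbm). Returns (last_stable, first_changed):
    the fault lies between these two points. last_stable is None when every
    point has changed, which means the check must continue at the upstream site.
    """
    first_changed = None
    for name, historical, current in reversed(points):
        if abs(current - historical) > tolerance_db:
            first_changed = name
        else:
            return name, first_changed
    return None, first_changed


# Illustrative readings only.
points = [
    ("FIU input", -8.0, -8.2),
    ("FIU output", -9.0, -9.1),
    ("OA input", -10.0, -16.5),
    ("OA output", 4.0, -2.5),
    ("demux input", 3.0, -3.4),
]
print(locate_power_change(points))
```

In this example the power is stable at the FIU output but changed at the optical amplifier input, which points to the pigtail between the FIU and the optical amplifier board, as described in Step 5.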
Workarounds
If the unreachable node is an intermediate node of the interrupted service and a redundant trail
is available for service restoration, recover the interrupted service as instructed in section
2.4.1 "Troubleshooting Process for Route Computation Failures."
If the unreachable node is the source or sink node, or essential intermediate node of the
interrupted service, you must rectify the node fault while restoring the interrupted service.
Troubleshooting Procedure
Step 1 If the unreachable NE is an intermediate node of the interrupted service, handle the fault as
instructed in section "Workarounds". If the unreachable NE is the source or sink, or essential
intermediate node of the interrupted service, proceed to the next step.
Step 2 Check whether the fibers connected to the NE are all broken according to the alarm
information of an adjacent NE. If the fibers are all broken, restore the interrupted fiber links
first. If all fibers are normal, proceed to the next step.
Step 3 Check whether the DCN configurations of the network or the NE have been recently modified.
If yes, check whether the new IP address, mask, gateway, and DCC pass-through are correct,
and recover the interrupted service according to the correct configurations. If the DCN
configurations are not modified, proceed to the next step.
Step 4 Check whether the unreachable NE is only unreachable by the NMS intermittently. An NE
may be reset repeatedly because of abnormal software or configurations. If the NE is
configured with double SCC boards, attempt to recover the interrupted service by initiating
active/standby switchover for the SCC boards. If the NE is configured with only a single SCC
board or if the ASON service is still interrupted even after the active/standby switchover is
initiated, proceed to the next step.
Step 5 If you cannot locate and rectify the fault by taking all the preceding steps, check the hardware
on site and make preparations for replacing boards and restoring the database. For details,
refer to the Appendix: ASON Node Troubleshooting Process.
----End
If a channel on an intermediate node is unavailable, you can switch the service on the channel to another
channel to preferentially recover the service. For details on how to recover the service, see the service
quick recovery process. This section only provides the service recovery process for add/drop channel
faults at the source or sink node.
Workarounds
Take the following workarounds when an add/drop channel at the source or sink node is
unavailable:
If other trails are available on the access side for carrying the service, help the customer
switch the service to one of the trails to preferentially recover the service.
If no other trails are available on the access side, replace the board that provides the
add/drop channel at the source or sink node and reconfigure an end-to-end trail to first
recover the service.
Troubleshooting Procedure
Step 1 Identify the interrupted ASON service according to the fault symptom (including the fault
occurrence time, faulty node, and current alarms).
Step 2 Switch the service to another available trail using the trail optimization function of the NMS,
or directly delete the ASON service and then reconfigure an end-to-end trail using the original
channel resources. If the control plane alarm or traditional alarm that indicates the service
interruption persists, go to the next step.
Step 3 Check for traditional alarms and determine whether the faulty channel is used for adding or
dropping a service according to the traditional alarm handling method. After identifying the
type of the faulty channel, go to the next step.
Step 4 If other trails are available on the access side for carrying the service, help the customer
switch the service to one of the trails and ensure that the service is recovered. If no other trails
are available on the access side, go to the next step.
Step 5 Replace the board that provides the add or drop channel at the source or sink node and
reconfigure an end-to-end trail for the service. After ensuring that the new trail is free of fault,
switch the service to the new trail.
----End
Recover the interrupted service first if possible, and then rectify the node fault.
If the interrupted service cannot be recovered temporarily, rectify the node fault first and
then attempt to recover the interrupted service.
Simply speaking, if the faulty node is the source node or sink node of the interrupted service
or is an essential intermediate node, rectify the node fault first and then attempt to recover the
interrupted service. A node-class fault may be caused for the following reasons:
1.
Because the database configurations are lost or corrupted, the NE is reset repeatedly and
cannot be started. Subsequently, the NE enters the BIOS state, DCN mode, installation
mode, or database protection mode. In this scenario, the management plane, control
plane, and service plane all fail.
2.
Because the NE software is corrupted or lost, the SCC board undergoes abnormal warm
resets and cannot start. In this scenario, the management plane, control plane, and service
plane all fail.
Check whether a single NE, multiple NEs, or all NEs in the same subnet are unreachable
by the NMS.
2.
Check whether the NEs are unreachable by the NMS persistently or intermittently.
When many NEs are unreachable by the NMS on a large scale, the usual cause is a DCN fault. If a
single NE or few NEs are unreachable by the NMS persistently, the probable cause is a node
fault. Therefore, further locate and handle the fault on the site. On the faulty site, you can
query the operating status of the NE according to the LED indicator of the SCC board or
using the LCT/Navigator to connect to the device.
NOTE
Check whether the operating status of the NE is Running by running the cfg-get-nestate command.
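The rule of thumb above (many unreachable NEs suggest a DCN fault; one or a few persistently unreachable NEs suggest a node fault) can be written as a small classifier. The threshold for "few" is an assumption and should be tuned to the size of the subnet.

```python
def probable_cause(unreachable_count, persistent, few=3):
    """Classify the likely cause from the scale and persistence of unreachability.

    `few` is a hypothetical threshold; the guide does not give a number.
    """
    if unreachable_count <= 0:
        return "no unreachable NE"
    if unreachable_count > few:
        return "DCN fault"
    if persistent:
        return "node fault"
    return "intermittent: check for repeated NE resets"


print(probable_cause(unreachable_count=1, persistent=True))
print(probable_cause(unreachable_count=40, persistent=True))
```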
service node, check whether the backup database is up to date. If the ASON
service is not interrupted, the database restoration process may cause service interruption.
For the preceding reasons, assess the risks of the database restoration solution when a node
fault occurs. The assessment items are as follows:
1.
Current service status (no service is interrupted, few services are interrupted, and many
services are interrupted)
2.
Extent to which the customer can tolerate the impacts of service interruption, and
whether the customer can provide a service maintenance window when services are
interrupted
If no service is interrupted, and the customer cannot tolerate the impacts of service
interruption, use a more complex recovery solution. For details, see section 2.5.4 "Restoring
the Configurations Manually." The section only describes the scenario in which a backup
database is available. The scenario in which a backup database is unavailable is equal to
Scenario 3. For details, see section 2.5.5 "Restoration Process If No Backup Database Is
Available."
NOTE
As configured in the SCC boards, the following database restoration scenarios are available:
Scenario 1: Use the backup database in the CF card or on the NMS side
Scenario 2: New user configurations are available after the database is backed up
The prerequisite for the database restoration solution is that a backup database is available. Normally,
the latest backup contains the data of the previous day. Therefore, scheduled database backup and
manual database backup should be an integral part of routine maintenance of ASON services. The NE
data should be backed up every day, and backups covering at least the most recent month should be
retained. After an operation is performed on an ASON service, the database must be backed up manually.
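The daily backup and one-month retention rule can be verified with a short script. The sketch below assumes that one month means 30 days and that the backup dates have already been extracted from the backup file names or the NMS. Both are assumptions for illustration.

```python
from datetime import date, timedelta


def missing_backup_days(backup_dates, today, days=30):
    """Return the days in the last `days` days (ending yesterday) that have no backup."""
    have = set(backup_dates)
    wanted = [today - timedelta(days=n) for n in range(1, days + 1)]
    return sorted(d for d in wanted if d not in have)


# Illustrative data: backups exist for every day except 3 and 10 days ago.
today = date(2014, 8, 26)
backups = [today - timedelta(days=n) for n in range(1, 31) if n not in (3, 10)]
print(missing_backup_days(backups, today))
```

Any date in the result is a day for which the NE could not be restored from its own daily backup.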
Prepare a PC where the LCT, DC, or Navigator tool is installed, and copy the backup
database of the unreachable node (if there are multiple databases that were backed up at
different times on the NMS side, copy all the backup databases to the portable computer
and prevent corruption of the updated backup data).
2.
Make preparations for replacing the damaged SCC board as instructed in the section
"Replacing the SCC Board" in the Parts Replacement.
NOTE
Both active and standby SCC boards are available for each ASON node. If an ASON node is faulty, the
possible cause is that both the active and standby SCC boards are damaged. Preferably, prepare two
standby SCC boards.
3.
Obtain the DCN configuration data (including the NEID, NEIP, subnet mask, NODEID,
and OSPF IP address) from the NMS side, and use such configuration data to set basic
parameters of the system after the damaged SCC hardware is replaced.
4.
Save the alarm information of the faulty node on the NMS side and use the alarm
information for alarm comparison after the node fault is rectified. Check whether any
new alarms are reported after the node fault is rectified.
5.
Save the ASON service information on the NMS side if the faulty node is the source
node of the ASON service; the ASON service information can be exported as a report.
After arriving at the site, perform the following operations to replace hardware:
1.
Replace the faulty SCC board or clear the database (for details, see section "Replacing
the SCC Board" in the Parts Replacement)
2.
After reaching the site, check whether the system works normally according to the LED
indicator status of the SCC board (for details, refer to the Hardware Description about
indicators) or the access device.
3.
If the active SCC board cannot work normally and the indicator status of the standby
SCC board is normal, remove the active SCC board and use the standby SCC board as
active to rectify the fault.
4.
If the standby SCC board works normally, check whether the interrupted service is
recovered through the NMS. If the standby SCC board also cannot work normally,
replace the SCC board (return the faulty SCC board to the R&D department for fault
location) as instructed in the section "Upgrading the SCC Board" in the Parts
Replacement.
5.
If no standby SCC board is available, clear the database of the SCC board through the
DIP as instructed in section "Upgrading the SCC Board" in the Parts Replacement (if the
SCC board nevertheless cannot be started normally, replace the SCC board).
Connect the portable computer to the Ethernet management port of the NE (NM_ETH of
the WDM product), and log in to the device by using the WEBLCT or Navigator.
2.
Set basic parameters for the NE, including the NEID, NEIP, subnet mask, NODEID, and
OSPF IP address.
Run the cm-set-submask command to set the subnet mask of the NE.
Run the cm-set-neid command to set the NEID parameter (Note: For certain products,
the SCC board is automatically reset once after the NEID parameter is set).
If the NE is started after the database is cleared, run the sftm-show-dir command to
query the backup database in the dbbackup directory in the CF card.
If the NE can be started normally, proceed to the next step to check whether the
interrupted service is recovered. If the NE cannot be started normally, clear the database
and attempt another database backup. If none of the backup databases is available,
proceed to the next step.
2.
Log in to the faulty NE through the NMS (because the SCC board is replaced or the
database is cleared, the NE user information is all lost; therefore, you must reconfigure
After logging in to the NE, choose System > NE software management > NE data
backup or restoration from the main menu of the NMS client. Then, the NE view is
displayed.
Select the faulty NE, click Restore. The Restore view is displayed.
In the Filename drop-down list, select the file to be restored. Select the file that is
backed up on the most recent date for restoration. You can select Browse to view the
file.
After the database is restored successfully, proceed to the next step to check whether the
interrupted service is recovered. If the NE cannot be started normally, clear the database
and attempt another database backup. If none of the backup databases is available,
restore the configurations manually.
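The fallback order described above (most recent backup first, then older backups, then manual restoration) can be sketched as follows. This is an illustration only: `try_restore` is a hypothetical callback that stands for the restore operation performed on the NMS, and backups are assumed to be listed as `(ISO date, filename)` pairs.

```python
def restore_from_backups(backups, try_restore):
    """Try each backup, newest first.

    backups     -- iterable of (iso_date, filename) pairs, e.g. ("2014-08-26", "ne.db")
    try_restore -- hypothetical callback; returns True if the NE starts normally
                   after that file has been restored
    Returns the date of the backup that worked, or None if every backup failed
    and the configurations must be restored manually.
    """
    # ISO dates sort correctly as strings, so reverse order means newest first.
    for backup_date, filename in sorted(backups, reverse=True):
        if try_restore(filename):
            return backup_date
    return None
```

A `None` result corresponds to the last branch in the text: none of the backup databases is usable.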
In the ASON Trail Management window of the NMS, synchronize the ASON service,
check whether the ASON service alarm is cleared, check whether the service attributes
are correct, and check whether all interrupted static services are recovered. If some
ASON services are nevertheless interrupted, attempt to restore the ASON services as
instructed in section 2.3 "Quick Recovery Process for ASON Services". If the service
trails do not meet the original planning requirements, adjust the service trails through
optimization.
2.
Perform health check in the entire network, and ensure that the node fault is rectified
without causing other faults.
NOTE
During the node fault, the database on the NMS does not store the updated data if a dynamic service is
changed (for example, newly created, optimized, or rerouted). After the database is restored through a
backup database, it is possible that the service is lost or the service trail does not conform to the planning.
In this case, manual intervention is required.
----End
----End
----End
Restoring the Configurations When the Database Is Not Backed Up in Real Time
Step 1 If the static cross-connections are not consistent with the information saved on the NMS, add
the missing static cross-connections.
Step 2 If the ASON cross-connections are not consistent, degrade the nodes in a mirroring
environment and add the missing static cross-connections.
----End
Step 2 Use the database backup that is handled in real time to restore the database for the faulty
node.
Step 3 Upgrade the previously degraded services into ASON services again.
----End
3.1.2 Flowchart
Specific service that is interrupted, protection level and route of the service, and whether
the service traverses regeneration boards.
Whether the service is rerouted if the service is an ASON service and whether alarms
indicating that the service is not on the original path are reported.
Step 2 Right-click in the Browse History Alarm window that is displayed, choose Select All from
the shortcut menu. Then choose Save > Save All Records to save all historical alarms into an
Excel file.
----End
Identifying OTU Boards Where Abnormal Alarms Are Reported During the
Service Interruption
Check the abnormal alarms in the queried historical alarms to identify the OTU boards where
service-affecting alarms are reported during the service interruption and the start time and end
time of these alarms. Check for the following alarms on OTU boards:
If the listed alarms are not in the historical alarm list, the WDM side functions properly. When this
occurs, check for an electrical-layer service fault or a client-side fault. The detailed fault diagnosis is not
provided in this document.
If the service is not in the ASON service list, the service may be an end-to-end static service. When this
occurs, check for a static service fault. The detailed fault diagnosis is not provided in this document.
If the ASON service is successfully rerouted, the service became unavailable after being
rerouted to the current path from another path. When this occurs, see section 3.2
"Diagnosing the Fault that a Service Is Unavailable After Being Successfully Rerouted"
to check for a system fault rather than a fault in the ASON protocol.
2.
If the ASON service is rerouted but rerouting fails, see section 3.3 "Diagnosing the Fault
that Service Rerouting Fails" to find the cause for the rerouting failure.
3.
If the ASON service is not rerouted (in other words, no rerouting event is reported), see
section 3.4 "Diagnosing the Fault that a Service Is Not Rerouted" to diagnose the fault
accordingly.
If there is no event during the service interruption in the historical event list, it is probable that
the communication between the NE and the NMS is abnormal or there are too many network
events. When this occurs, contact Huawei R&D engineers to collect the ASON log at the first
node of the faulty service to obtain the service rerouting information.
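The branching above can be summarized as a small decision function. This is only a sketch of the triage logic; the function name and the three boolean inputs are hypothetical and would be filled in by the engineer after reading the historical event list.

```python
def next_diagnosis_step(has_events, rerouting_attempted, rerouting_succeeded):
    """Map the rerouting outcome to the section that continues the diagnosis."""
    if not has_events:
        # No event during the interruption: NE-NMS communication may be
        # abnormal, or there are too many network events.
        return "collect the ASON log at the first node of the faulty service"
    if not rerouting_attempted:
        return "3.4 Diagnosing the Fault that a Service Is Not Rerouted"
    if rerouting_succeeded:
        return "3.2 Diagnosing the Fault that a Service Is Unavailable After Being Successfully Rerouted"
    return "3.3 Diagnosing the Fault that Service Rerouting Fails"
```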
3.2.2 Flowchart
When this occurs, see section 3.5.1 "Diagnosing OPA Adjust Failures" to diagnose the fault.
Step 5 Check whether MUT_LOS alarms are reported on some optical-layer boards and R_LOS
alarms are reported on OTU boards on the path.
If the preceding alarms are reported, the possible causes are as follows:
Physical fibers are incorrectly connected. For example, fibers between the OSC and FIU
boards are incorrectly connected, fibers between optical and electrical subracks are
incorrectly connected in optical-electrical subrack separation scenarios, and fibers at a
site are incorrectly connected.
The OPA function incorrectly delivers attenuation. See section 3.5.2 "Diagnosing
Incorrect Attenuation Delivery of OPA" to diagnose and rectify the fault accordingly.
Step 6 Check for the following faults if no R_LOS but OTUx_LOF alarms are reported on the OTU
boards where the service is added and dropped or on the regeneration board:
The FEC types or ODUk rates on the OTU boards where the service is added and
dropped mismatch.
The wavelength configuration at the receive end mismatches that at the transmit end. Do
as follows to check the wavelength configuration of 40G boards at the receive end:
Choose Configuration > WDM Interface in the NE Explorer of the NMS, click By
Board/Port (Channel), and click the Advanced Attributes tab. If the returned
wavelength value is not 0xff (this means that the wavelength configuration at the receive
end has been manually modified) and the returned wavelength value is different from the
wavelength value configured at the transmit end, you can confirm that the fault cause is
wavelength configuration inconsistency.
There are wavelength conflicts. When this occurs, check whether physical fibers on the
TN11RMU9 board are correctly connected.
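The receive-end wavelength rule in Step 6 reduces to a two-part condition. The sketch below assumes that the value read from the Advanced Attributes tab and the transmit-end wavelength are both available as integers; the function and constant names are hypothetical.

```python
# Per the text, 0xff means the receive-end wavelength has not been manually modified.
RX_NOT_MODIFIED = 0xFF


def wavelength_config_inconsistent(rx_value, tx_value):
    """True if the receive end was manually set to a wavelength that differs
    from the wavelength configured at the transmit end."""
    return rx_value != RX_NOT_MODIFIED and rx_value != tx_value
```

For example, a receive-end value of `0xFF` never indicates an inconsistency, whatever the transmit-end wavelength is.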
Step 7 Check for the following faults if no R_LOS but OTUx_Exc and OTUx_Deg alarms are
reported on the OTU boards where the service is added and dropped or on the regeneration
board:
Step 8 Check for the following faults if COMMUN_FAIL, MOD_COM_FAIL, HARD_ERR, and
MODULE_ADJUST_FAIL alarms are reported on some boards on the path:
3.3.2 Flowchart
Whether the feature of ASON route computation using optical parameters is enabled. If
the feature is enabled, disable the feature.
Basic configuration information such as FEC type, ODUk service rate, and optical
module information on OTU boards (including regeneration boards).
----End
The service is locked. (Confirm this possibility immediately after the service is
interrupted.)
The service is not a silver service. (Confirm this possibility immediately after the service
is interrupted.)
To diagnose the fault, collect the ASON logs on all NEs on the path.
3.4.2 Flowchart
Step 7 Check whether OPA adjust failure alarms are reported on the path. If yes, see section 3.5.1
"Diagnosing OPA Adjust Failures" to diagnose the fault and obtain operation logs on the NE
where the alarms are reported.
Step 8 Check whether COMMUN_FAIL, MOD_COM_FAIL, HARD_ERR, and
MODULE_ADJUST_FAIL alarms are reported on some boards on the path. If yes, the boards
reporting the alarms are malfunctioning.
----End
Permitted attenuation adjustment range and the specific adjustment values of the VOAs
and EVOAs on the path. The navigation path on the NMS is Configuration > WDM
Interface.
Insertion loss of the boards on the path. For the obtaining method, contact Huawei R&D
engineers.
Nominal input and output optical power of the OA boards on the path. The navigation
path on the NMS is Configuration > WDM Interface.
Input and output optical power of the malfunctioning OTU board. The navigation path
on the NMS is Configuration > Optical Power Management.
OPA preset insertion loss (available for OptiX OSN 8800 V100R005 and later versions)
on the path. For the obtaining method, contact Huawei R&D engineers.
DCN configurations for optical and electrical NEs in separated optical and electrical NE
scenarios. For the obtaining method, contact Huawei R&D engineers.
Step 2 Assess whether the path information satisfies OPA adjust requirements.
For the OPA working principle, see OPA in the Feature Description manual for OptiX OSN
8800.
----End
The bb1.log, OPLOG, and ERRLOG.log files in the OFS1 area on the active SCC
board.
The bb10.log file in the OFS1 area on the WSS board. If the WSS board is TN11 series,
obtain the bb10.log file of the board; if the WSS board is TN12 or TN13 series, obtain
the bb10.log file of the SCC board in the subrack housing the WSS board.
For example, to obtain the bb10.log file of the TN11WSM9 board, run
the :log-query:bid,"bb0.log" command. To obtain the bb10.log file of the TN12WSM9
board housed in slot 1 in subrack 2, run the :log-query:2-18,"bb0.log" command in
which 2-18 indicates the ID of the slot housing the SCC board in subrack 2.
Check for communication exceptions, abnormal board resets, abnormal intra-board routes,
and abnormal attenuation delivery records in the logs.
If faults occurred at the time when the service was interrupted, they are probable
causes of the service interruption. Generally, a communication exception occurs because a
board is not properly inserted or cables between subracks are incorrectly connected. Abnormal
board resets may be caused by inappropriate manual operations.
If there is no route information on the WSS board, the cause may be a communication failure.
Step 6 If no exception is found in the logs, check for the following faults:
1.
The logical fiber connections are inconsistent with the physical fiber connections.
2.
A board is malfunctioning.
3.
----End
Step 2 In the Filter window that is displayed, select only the OCh check box in the Level area and
click Filter All.
The ASON Trail Management window is displayed.
The following describes each area in the ASON Trail Management window:
Area 1 displays the sink and source NEs of each service, the OTU boards where services are
added and dropped, the wavelengths that carry services, and the current alarm status of each
service.
Area 2 displays the current path, associated path, preset restoration path, and original path of a
specific service after you select the service in area 1.
Area 3 displays the path of a specific service on the network topology after you select the
service in area 1.
Step 3 Locate the source and sink NEs, service adding and dropping OTU boards, and wavelength of
a service whose Alarm Status is Critical Alarm.
Step 4 After locating the service path, check whether Activation Status is Active, Class is Silver,
and Rerouting Lockout is Unlocked for the service. The service can be rerouted only when
these attributes are the specified values.
Step 5 In area 2, locate the FIU boards that the service traverses in two directions. Obtain the IDs of
slots housing the FIU boards and check whether MUT_LOS and BD_STATUS alarms are
reported on the FIU boards in the historical alarm list.
CAUTION
----End
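The rerouting preconditions checked in Step 4 (Activation Status, Class, and Rerouting Lockout) can be expressed as a simple attribute comparison. This is a minimal sketch that assumes the service attributes have been copied from the ASON Trail Management window into a dictionary; the function name is hypothetical.

```python
# Attribute values that Step 4 requires before a service can be rerouted.
REQUIRED_ATTRIBUTES = {
    "Activation Status": "Active",
    "Class": "Silver",
    "Rerouting Lockout": "Unlocked",
}


def rerouting_blockers(service):
    """Return the attributes that prevent the service from being rerouted."""
    blockers = []
    for attribute, required in REQUIRED_ATTRIBUTES.items():
        actual = service.get(attribute)
        if actual != required:
            blockers.append(f"{attribute} is {actual!r}, expected {required!r}")
    return blockers
```

An empty list means that these three attributes do not explain why the service was not rerouted.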
Historical alarms
Current alarms
Events
Logs in the mfs/log, mfs/ion, ofs1/log, ofs2/log, and ofs2/gcp directories on the SCC
board
CAUTION
The space for saving log files is limited, and logs are overwritten after they have been saved for a
specified time. Therefore, if a service interruption occurs, collect the required logs as soon as
possible so that the logs recorded during the service interruption are not
overwritten.
3.7 References
The following lists the reference documents and manuals:
Network Topology
The following figure shows the network topology.
Cause Analysis
A fault diagnosis shows that Resource Reservation is enabled for wavelength 50 at
SITE_A. After the reservation of wavelength 50 at SITE_A is released, the protection trail
is created successfully.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check for abnormal alarms at the sites on the protection trail. If there are no abnormal alarms,
go to the next step.
Step 2 Check TE links and ensure that they are up.
Step 3 Optimize the service that is successfully created (the service over wavelength 10) to the
protection trail. If this operation is successful, go to the next step.
Step 4 Check NE-level optical cross-connections of the sites on the protection trail. If no optical
cross-connections are configured for No. 50 wavelength, go to the next step.
Step 5 Check whether the resources (wavelength 50 in this example) of a site on the protection trail,
for example, SITE_A, are reserved. If they are reserved, release them.
----End
occupied by an ASON service during rerouting. If the service plan is changed, users must
release this wavelength for the FIU board at the site.
Network Topology
N/A
Cause Analysis
A fault diagnosis shows that two NEs on the network use the same OSPF IP address. In this
situation, the OSPF protocol runs abnormally. As a result, NEs on the network reset.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Review the reset logs to check whether the OSPF protocol runs normally when separate
optical and electrical NEs are configured. If the OSPF protocol runs abnormally, go to the
next step.
Step 2 Check the OSPF IP addresses of network-wide NEs. If two NEs use the same OSPF IP
address, correct the OSPF IP addresses for the two NEs.
----End
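The network-wide check in Step 2 amounts to looking for a value that is shared by more than one NE. The sketch below assumes that the OSPF IP addresses have already been exported from the NMS into a mapping of NE name to address; the function name and the sample names are hypothetical.

```python
from collections import defaultdict


def find_shared_values(value_by_ne):
    """Return {value: [NE names]} for every value used by more than one NE."""
    owners = defaultdict(list)
    for ne_name, value in value_by_ne.items():
        owners[value].append(ne_name)
    return {value: sorted(names) for value, names in owners.items() if len(names) > 1}


# Hypothetical export: two NEs were given the same OSPF IP address.
ospf_ips = {"NE_1": "10.1.1.1", "NE_2": "10.1.1.2", "NE_3": "10.1.1.1"}
conflicts = find_shared_values(ospf_ips)
```

Each entry in the result names the NEs whose addresses must be corrected.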
Network Topology
N/A
Cause Analysis
A fault diagnosis shows that duplicated node IDs are used for site D. As a result, the OSPF
protocol repeatedly creates and deletes links and too many link change events are reported to
the NMS. Eventually, a timeout occurs because a large amount of data has been generated on
the NMS.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check the NMS logs to verify that an NE reports a large number of events (for example,
7000) within one second.
Step 2 Query the performance events. If there are link bandwidth change events and link basic
information events, check the basic attributes of the links by checking the parameters in the
performance events. If the basic attributes remain unchanged, go to the next step.
Step 3 Check link-related operations. If there are records showing that some links are deleted or
added for the same NE, it is probable that duplicated node IDs have been used for the links.
Step 4 Query the node IDs of all NEs using the NMS. If two NEs use the same NODEID,
reconfigure the node IDs for the two NEs according to the network plan.
----End
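The symptom checked in Step 1, an NE reporting thousands of events within one second, can be detected by counting events per NE per second. This is a sketch under assumptions: the NMS log is taken to have been parsed into `(ne_name, unix_timestamp)` pairs, and the default threshold of 7000 simply reuses the example figure from the text.

```python
from collections import Counter


def find_event_storms(events, threshold=7000):
    """Return sorted (ne_name, second) pairs in which an NE reported at least
    `threshold` events within that one-second bucket."""
    # Truncating the timestamp groups events into whole-second buckets.
    counts = Counter((ne_name, int(timestamp)) for ne_name, timestamp in events)
    return sorted(key for key, count in counts.items() if count >= threshold)
```

NEs returned by this function are the first candidates for the node ID check in Step 4.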
More than one NE in one ASON domain is configured as the communication NE for
communicating with the NMS. (In general, only one NE should be configured as the
communication NE in one ASON domain. The following figure shows an example of
correct NE configuration in one ASON domain.)
2.
ECC messages are transmitted between different ASON domains. To verify this
information, you can check whether the source and sink NEs are in the same domain and
whether the OSPF protocol is enabled, as shown in the figure below.
3.
Duplicated node IDs are used for the NEs in the same ASON domain.
Network Topology
The following figure shows the network topology.
Cause Analysis
A fault diagnosis shows that the logical board for the ND2 board in slot 12 at SITE_C is
retained after the ND2 board is moved from slot 12 to slot 2. As a result, the signaling
interface cannot be updated accordingly, leading to a failure to change the service protection
level back to diamond. After the logical board is deleted, the signaling interface is updated
accordingly. At this point, an operation of changing the service protection level back to
diamond can be performed successfully.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check the site name for which the NMS reports an error message. (In this example, the site
name is SITE_C.)
Step 2 Check information about the TE links between SITE_C and SITE_B to ensure that the TE
links are in normal state. In addition, verify that the ND2 board in slot 2 at SITE_C and the
ND2 board in slot 7 at SITE_B are mutual remote ends.
Step 3 Check the signaling interface information. If the remote end of the TE link on the ND2 board
in slot 2 at SITE_C is empty but the remote end of the TE link on the ND2 board in slot 12 at
SITE_C is the ND2 board in slot 7 at SITE_B, go to the next step.
Step 4 Check whether the logical board for the ND2 board that is originally installed in slot 12 at
SITE_C is retained. If yes, delete it, then change the service protection level back to diamond.
----End
moving the line board, reconfigure the trail on which the line board is moved to another slot
as the working trail.
Network Topology
N/A
Cause Analysis
A fault diagnosis shows that a 10G optical-layer ASON service is configured for the OTU
board. This ASON service leads to a failure to delete the fiber connection between the OTU
board and the M40 board.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Identify the source and sink nodes, slots, and ports of the fiber connection to be deleted.
Step 2 Check the configurations of the source and sink nodes, boards, and ports. If an optical-layer
ASON service is configured for either the source or sink port, downgrade the ASON service
to a traditional service. Then delete the fiber connection.
----End
If an ASON service is configured for the fiber connection, first downgrade the ASON
service into a traditional service. Then delete the fiber connection.
2.
If no ASON service is configured but TE links at either end of the fiber connection are
reserved for ASON services, change the value of Revertive Mode to Non-Revertive for
all the ASON services on the NE, as shown in the following figure. Then you can delete
the fiber connection.
Network Topology
N/A
Cause Analysis
There are the following possible causes for a route computation failure:
1.
The control link is unreachable. In other words, there are no physical paths from the
source to the sink.
2.
A link fault has occurred. For example, if a link break or downgrade fault occurs when
the ASON software searches for routes between the source and sink nodes, the ASON
software will be unable to find the sink node.
3.
No idle channels are available. For optical-layer ASON, no end-to-end uniform idle
timeslots are available.
4.
The add wavelengths are duplicated with the drop wavelengths at the source or sink
node.
5.
Fiber connections are configured incorrectly for the source and sink nodes or for the
intermediate nodes.
6.
If regeneration boards are configured at the optical layer, the possible causes are:
a) Logical fiber connections are incorrectly configured for the regeneration boards.
b) The optical module types of the regeneration boards do not match the optical module
types of the add/drop boards.
c) The service rates of the regeneration boards do not match the service rates of the
add/drop boards.
d) The FEC settings of the regeneration boards are inconsistent with the FEC settings
of the add/drop boards.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check whether the control link is reachable. In the NE Explorer, choose Configuration >
WDM ASON > WDM Control Link Management to check information about the control
link. If a node that the ASON service has to traverse is isolated, handle the control link fault to
ensure that the control link is reachable.
Step 2 In the NE Explorer, choose Configuration > WDM ASON > WDM Control Link
Management. Check all of the TE links that the ASON service may traverse and ensure that
Alarm Status is No Alarm and Link Status is Up for the TE links. If there are any TE link
faults, handle them before performing the next step.
Step 3 Check whether idle channels are available. First, determine the trails that the ASON service
may traverse through visual inspection. Then check the channel status (either on FIU or OTU
boards) for the trails one by one. In addition, ensure that channels are not reserved. The
following figure shows an example for navigating to the channel information on FIU boards.
Step 4 Check whether add wavelengths are duplicated with drop wavelengths at the source or sink
node. Ensure that each wavelength is used for carrying only one service in the same direction.
Step 5 Ensure that all fiber connections are configured correctly. Focus on checking the fiber
connections for the newly inserted boards after the deployment commissioning.
Step 6 If regeneration boards are used, check the configurations of the regeneration boards. Ensure
that the fiber connections of the regeneration boards are configured correctly. Then check the
optical module types, service rates, and FEC settings of the regeneration boards to ensure that
they match those of the add/drop boards.
----End
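The regeneration board check in Step 6 compares three settings against the add/drop boards. The following sketch assumes that the board settings have been noted down as dictionaries; the key names (`optical_module_type`, `service_rate`, `fec`) and the sample values are hypothetical.

```python
# Settings that must match between a regeneration board and the add/drop boards.
COMPARED_SETTINGS = ("optical_module_type", "service_rate", "fec")


def regeneration_mismatches(regen_board, add_drop_board):
    """Return the names of the settings that differ between the two boards."""
    return [
        name
        for name in COMPARED_SETTINGS
        if regen_board.get(name) != add_drop_board.get(name)
    ]


# Hypothetical example: only the FEC setting differs.
regen = {"optical_module_type": "type-A", "service_rate": "ODU2", "fec": "AFEC"}
add_drop = {"optical_module_type": "type-A", "service_rate": "ODU2", "fec": "FEC"}
mismatches = regeneration_mismatches(regen, add_drop)
```

Any name in the result corresponds to one of the possible causes listed under cause 6.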
Network Topology
N/A
Cause Analysis
An end-to-end trail can be computed successfully as long as the information about
network-wide links is complete, regardless of whether information about a link interface of a
node is missing. However, when a trail for a service (either a newly created service or an
optimized or rerouted service) traverses the node, the control plane checks the correctness of
the interface information for the node. The check fails because the interface information is
missing, and therefore the end-to-end service fails to be created.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check the error code using the NMS or by running commands and verify that the error code
indicates that the outbound interface of the service is down.
Step 2 Identify the node where the interface is missing. Then use the NMS or run commands to
retrieve information about the node.
Step 3 Check whether the node information contains the interface information and whether the
interface information is complete.
Step 4 If the interface information is incomplete, perform a warm reset on the SCC board on the NE.
----End
4.1.8 Case 8: LMP Protocol Check Fails Due to DCN Errors and
Consequently Service Deployment Fails
Fault Description
A network uses Huawei's OptiX OSN 6800 devices, for which electrical-layer ASON is
enabled. NEs on the network are connected using optical amplifier boards.
SITE-A is the first site. When an attempt is made to create an ODU2 ASON silver service
between SITE-A and SITE-B, the ASON software responds with a route computation failure
message, and the NMS displays an error code of 40497.
Network Topology
The following figure shows the network topology.
The following figure shows the fiber connections inside SITE-A and SITE-B.
Cause Analysis
A fault diagnosis shows that at SITE-A the line board connecting to SITE-B has DCN errors.
As a result, LMP protocol check fails for the line board and the TE links on the line board are
in abnormal state.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Specify SITE-A's line board that connects to SITE-B as the explicit board and create an
ASON service. If the system responds with a route computation failure message and the NMS
displays an error code of 40497, go to the next step.
Step 2 Check the links generated on the line board. If ODU2 links on the line board are not displayed
in the TE link management window, go to the next step.
Step 3 Check information about the line board. If lots of DCN errors are generated, replace the
board.
----End
Therefore, before deploying an ASON service, ensure that the boards that the planned service
trail traverses are working properly.
Network Topology
The following figure shows the networking diagram.
NE2
NE1
NE3
Cause Analysis
A fault diagnosis shows that line attenuation between NE1 and NE2 is excessively high and
therefore the input optical power of FIU boards on NE2 is lower than the minimum value. As
a result, TE links between NE1 and NE2 are interrupted and the ASON services fail to be
deployed.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check TE links along the service flow. If a TE link is faulty, go to the next step.
Step 2 Create traditional services from NE1 to NE3. If creating traditional services fails, go to the
next step.
Step 3 Check optical power along the planned path. If the input optical power of a board (for
example, the FIU board on NE2) is lower than the minimum value, go to the next step.
Step 4 Check the upstream NE of the board (NE1 in this example) and ensure the optical power at the
transmit end is within the permitted range. If no cross-connection is configured on the
upstream NE, the optical power at the transmit end may be insufficient. At this point, create
two NE-level static cross-connections on the upstream NE (NE1 in this example) as planned
so that the input optical power of the board (the FIU board on NE2 in this example) is within
the permitted range.
Step 5 After TE links are working properly, configure ASON services.
----End
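The optical power check in Step 3 can be sketched as a scan of the boards along the planned path. This is an illustration only: the readings are assumed to be collected manually as `(board, input_power_dbm, minimum_dbm)` tuples, and the sample values are invented for the example rather than taken from any board specification.

```python
def boards_below_minimum(readings):
    """Return the boards whose input optical power is below the minimum value.

    readings -- iterable of (board_name, input_power_dbm, minimum_dbm) tuples,
                listed in path order
    """
    return [
        board
        for board, input_power_dbm, minimum_dbm in readings
        if input_power_dbm < minimum_dbm
    ]


# Hypothetical readings along the NE1 -> NE2 -> NE3 path.
path_readings = [
    ("NE1 FIU", -12.0, -25.0),
    ("NE2 FIU", -31.5, -25.0),
    ("NE3 FIU", -14.0, -25.0),
]
low_power_boards = boards_below_minimum(path_readings)
```

The first board in the result is the point at which to start checking the upstream NE, as described in Step 4.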
Network Topology
N/A
Cause Analysis
The possible causes are as follows:
1.
No OSPF IP address is configured for the out-band channel that is used for the
communication between optical NEs and electrical NEs.
2.
ETH control ports on optical NEs and electrical NEs are disabled.
3.
The Link Management Protocol (LMP) is enabled for the link where boards involved in
the required virtual TE links are located.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Verify that an OSPF IP address is configured for the out-band channel that is used for the
communication between optical NEs and electrical NEs.
Step 2 Verify that OSPF is enabled on the ETH control ports on optical NEs and electrical NEs.
Step 3 Verify that the LMP is disabled for the boards that function as edge points between optical
NEs and electrical NEs.
----End
Step 2 Set OSPF Protocol Status of the ETH control ports on optical NEs and electrical NEs to
Enabled.
Step 3 Set LMP Protocol Status of the boards that function as edge points between optical NEs and
electrical NEs to Disabled.
----End
Cause Analysis
Explicit resources must be specified along the direction from the source to the sink of a
service. If explicit resources are specified in the reverse direction, reroute computation will
fail. If explicit resources are randomly specified without a specific direction, the ASON
software has to compute many possible routes, which extends computation time and
deteriorates performance.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Verify that explicit resources are specified in the direction from the source to the sink of
services. For example, for an ASON service from SITE_1 to SITE_5, route computation fails
if SITE_3 is specified as the first explicit node and SITE_2 as the second explicit node. This
is because explicit resources are not specified in the direction from the source to the sink.
----End
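The direction rule can be checked mechanically once the intended route is known. The sketch below assumes a simple linear route from SITE_1 to SITE_5, which is a hypothetical topology chosen to match the example; it verifies that the explicit nodes appear in the same order as they do on the route.

```python
def explicit_nodes_follow_direction(route, explicit_nodes):
    """True if every explicit node lies on the route and the nodes are listed
    in source-to-sink order."""
    position = {node: index for index, node in enumerate(route)}
    last_position = -1
    for node in explicit_nodes:
        # A node that is off the route or goes backwards breaks the direction rule.
        if node not in position or position[node] <= last_position:
            return False
        last_position = position[node]
    return True


# Hypothetical linear route for the SITE_1 -> SITE_5 service in the example.
route = ["SITE_1", "SITE_2", "SITE_3", "SITE_4", "SITE_5"]
reversed_order_ok = explicit_nodes_follow_direction(route, ["SITE_3", "SITE_2"])
forward_order_ok = explicit_nodes_follow_direction(route, ["SITE_2", "SITE_3"])
```

Here the `SITE_3`, `SITE_2` ordering is rejected, which mirrors the failing case described in Step 1.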
Network Topology
The following figure shows the network topology.
Cause Analysis
A fault diagnosis shows that the attenuation between the OTU board and the FIU board at
SITE_A is larger than the maximum value. As a result, optical power adjust (OPA) fails and
therefore the service is interrupted after being rerouted to the preset restoration trail.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check for alarms on the trail to which the service is rerouted. If an OPA_FAIL_INDI alarm is
generated at a site (SITE_A in this example), go to the next step.
Step 2 On the NMS, obtain the optical power, attenuation, and insertion loss of each board, and the
permitted adjustment range of each EVOA over the trail
(TN12OBU2-TN11RDU9-TN13WSM9-TN12OAU1 in this example).
Step 3 Check whether the attenuation of the trail satisfies OPA requirements based on OPA rules. If
the attenuation of the trail is beyond the OPA range and only the service is transmitted over
the trail, change the attenuation of the VOA on the trail (the VOA of the TN12OAU1 in this
example) so that the attenuation of the trail is within the OPA range. Then activate the
service.
----End
Network Topology
The following figure shows the network topology.
Cause Analysis
A fault diagnosis shows that the SNCP protection type is incorrect (SNC/N should be
configured rather than SNC/I). As a result, protection switching fails after a second fiber cut.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Analyze the fault and check the two fiber cuts.
In this example, the first fiber cut occurs on the link between SITE_1 and SITE_B, triggering
SNCP protection switching. The service is successfully rerouted to a new trail (the green line)
and this trail becomes the working trail.
The second fiber cut occurs on the link between SITE_A and SITE_C. No other trails are
available and therefore rerouting fails, further leading to a protection switching failure on the
client side. Consequently, the service is interrupted.
Step 2 Pinpoint the cause for the protection switching failure.
In this example, after the first fiber cut occurs, the service is rerouted and traverses SITE_C,
which is an electrical regeneration site. SNC/I protection is configured for the service. Note
that the regeneration site regenerates SM overheads, and therefore SNC/I protection fails to
detect the fault through SM overheads. Consequently, protection switching fails to occur at
SITE_B.
----End
Network Topology
The following figure shows the network topology.
Cause Analysis
A fault diagnosis shows that this fault results from an incorrect fiber connection on the
network. In this example, a logical fiber connection is configured between port 1 on the OSC
board in slot 1 and the FIU board on the green line, but port 1 on the OSC board is physically
connected to the FIU board on the blue line. As a result, a cross-connection is created from
SITE_A to the FIU board on the blue line instead of the FIU board on the green line.
Consequently, the source and sink of the service have different cross-connection information
and the service is interrupted after being rerouted.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Check for abnormal alarms on the trail to which the service is rerouted. (In this example, the
OTU board on the green line but not the FIU board reports an R_LOS alarm.) If an abnormal
alarm is generated, go to the next step.
Step 2 Check cross-connections at each site on the trail. In this example, optical cross-connections
configured at SITE_A do not match the physical fiber connections. If a cross-connection is
incorrectly configured, go to the next step.
Step 3 Check the ASON control link topology. If the topology is correct, go to the next step.
Step 4 Compare the physical fiber connections with the logical fiber connections for each site. In this
example, the physical and logical fiber connections of FIU boards are inconsistent. After the
fiber connections are corrected, the fault is rectified.
----End
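The comparison in Step 4 amounts to finding ports whose logical peer differs from their physical peer. The sketch below assumes both sets of connections have been written down as simple port-to-peer mappings; the port labels and the function name are hypothetical and do not correspond to any NMS export format.

```python
def fiber_mismatches(logical, physical):
    """Return {port: (logical_peer, physical_peer)} for ports whose peers differ.

    logical, physical: dicts mapping a local port label to its peer port label.
    A port missing from one side is reported with None on that side.
    """
    ports = sorted(set(logical) | set(physical))
    return {
        port: (logical.get(port), physical.get(port))
        for port in ports
        if logical.get(port) != physical.get(port)
    }


# Hypothetical labels modelled on the example in the Cause Analysis.
logical = {"OSC-slot1-port1": "FIU-green", "OSC-slot1-port2": "FIU-blue"}
physical = {"OSC-slot1-port1": "FIU-blue", "OSC-slot1-port2": "FIU-blue"}
print(fiber_mismatches(logical, physical))
# {'OSC-slot1-port1': ('FIU-green', 'FIU-blue')}
```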
Network Topology
N/A
Cause Analysis
Rerouting of an ASON OCh trail is triggered when an FIU board on the trail is offline or
reports a MUT_LOS alarm. After a slave subrack is powered off, the FIU board in the slave
subrack cannot report alarms to the main control board in the master subrack. As a result,
service rerouting is not triggered.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 In NE Panel on the NMS, check whether all boards in a slave subrack of an NE on the OCh
trail are offline. If all boards in a slave subrack are offline, go to the next step.
Step 2 Check the power cable connections and network cable connections of the NE. If there is a
cable connection error, correct it.
----End
traverses are offline in NE Panel. Lastly, pinpoint the root cause based on the abnormal
alarms.
Network Topology
The following figure shows four points for monitoring channel alarms:
Point A monitors client-side alarms along the direction of the signal flow.
Point B monitors WDM-side alarms along the direction of the signal flow.
Cause Analysis
When multiple service trails have channel-level faults, an ASON service is rerouted from its
original trail, where channel alarms are generated, to a new trail. The alarms on the original
trail are then cleared. Channel alarms, however, are also generated on the new trail, triggering
service rerouting a second time. The service may be rerouted back to its original trail (because
the original trail is optimal), which triggers rerouting yet again. As a result, the service is
frequently rerouted among multiple trails.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Lock the rerouting function of the ASON service.
Step 2 Check for traditional alarms on the original trail of the ASON service and pinpoint the root
cause.
Step 3 Check for traditional alarms on the preset restoration trail (for automatic routing) of the
ASON service and pinpoint the root cause.
----End
Network Topology
N/A
Cause Analysis
The ASON software monitors the optical layer from two aspects: lines (OMS TE links) and
OTU boards. See the following figure.
In case of a wavelength-level fault, such as a fault on a pigtail or a board (an optical, OTU, or
regeneration board), the ASON software cannot detect the fault and therefore does not initiate
the automatic service recovery process.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Verify that a wavelength-level fault occurs and an ASON service is not automatically
recovered.
Step 2 Locate the fault by checking performance and alarms on boards on the trail that the service
traverses.
Step 3 Clear traditional alarms.
----End
Network Topology
N/A
Cause Analysis
The ASON software attempts to revert a rerouted ASON service enabled with scheduled
reversion to the original trail at the scheduled reversion time. If the original trail fails to be
restored within the time, the ASON software no longer attempts to revert the service to the
original trail.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Verify that the original trail is restored.
Step 2 Specify the scheduled reversion time again.
After the specified time elapses, the service is reverted to the original trail.
----End
Cause Analysis
ASON OCh services have the following requirements on regeneration boards:
1. Regeneration boards must be configured only on optical NEs when optical NEs and
electrical NEs are separated.
2. Regeneration boards are classified into two types: unidirectional regeneration boards
(such as LSXR and LSXLR) and bidirectional regeneration boards (such as ND2).
Unidirectional regeneration boards in two directions of a service trail must be configured
in paired slots; otherwise, the ASON software fails to identify the two regeneration
boards as a pair and therefore creates incorrect cross-connections or fails to create
cross-connections.
3. The rate of a regeneration board and the type of the optical module on the regeneration
board must match those on boards adding or dropping services; otherwise, services are
unavailable. This is because the ASON software selects only regeneration boards whose
rate and optical module type match the rates and optical module types of boards that add
or drop services.
4. The FEC type of regeneration boards must be the same as the FEC type of boards adding
or dropping services. If the two FEC types are different, services may be unavailable.
5. The ODUk rate of a regeneration board must be the same as the ODUk rate of boards
adding or dropping services. If the two rates are different, for example, a regeneration
board has a lower rate (10.7G) while a board adding or dropping services has a higher
rate (11.1G), services are unavailable.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Verify that the regeneration board for the ASON OCh service is configured on an optical NE.
Step 2 If unidirectional regeneration boards are configured on the ASON OCh trail, verify that the
unidirectional regeneration boards in the two directions of the trail are configured in paired
slots.
Step 3 Check whether the rate of the regeneration board and the type of the optical module on the
board match those of boards adding or dropping services. If the rate and the optical module
type of the regeneration board mismatch those of boards adding or dropping services, replace
the regeneration board.
Step 4 Check whether the FEC type of the regeneration board is the same as that of boards adding or
dropping services. If they are different, specify the same FEC type.
Step 5 Check whether the ODUk rate of the regeneration board is the same as that of boards adding
or dropping services. If they are different, specify the same ODUk rate.
----End
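Steps 3 to 5 compare the same set of attributes between the regeneration board and the boards that add or drop services. The comparison can be sketched as follows; the class, the field names, and the attribute strings are hypothetical placeholders, and only the 10.7G/11.1G pair is taken from the Cause Analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardProfile:
    """Attributes compared in Steps 3 to 5 (field names are illustrative)."""
    line_rate: str
    module_type: str
    fec_type: str
    oduk_rate: str


def regen_mismatches(regen, add_drop):
    """Return the names of attributes that differ between the two boards."""
    fields = ("line_rate", "module_type", "fec_type", "oduk_rate")
    return [f for f in fields if getattr(regen, f) != getattr(add_drop, f)]


# Placeholder values; read the real ones from the board configuration.
regen = BoardProfile("10G", "module-A", "fec-A", "10.7G")
add_drop = BoardProfile("10G", "module-A", "fec-A", "11.1G")
print(regen_mismatches(regen, add_drop))  # ['oduk_rate']
```

A mismatch in line_rate or module_type maps to Step 3 (replace the board), while fec_type and oduk_rate map to Steps 4 and 5 (specify the same value).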
Network Topology
The following figure shows the network topology.
Cause Analysis
An ASON OCh service is rerouted when an FIU board is offline or reports a MUT_LOS
alarm. When an ASON OCh service is interrupted but not rerouted, and the cause cannot be
found using common methods, check optical power on the trail. The service interruption may
result from abnormal optical power on the trail.
Troubleshooting Procedure
Use the following steps to diagnose the fault:
Step 1 Choose Configuration > WDM ASON > ASON Trail Management from the main menu.
In the WDM ASON Trail Management window that is displayed, right-click the interrupted
ASON OCh service and choose Query Relevant Optical Power from the shortcut menu.
Then export the optical port information of the trail from the NMS.
Step 2 Among the optical power information, check whether there is a board whose Input Power or
Output Power is -60. If such data is found, no light is input to or output by the board, which
is abnormal.
Step 3 Locate the abnormal board and node, and check the attenuation of boards on the node along
the signal flow. Check for the following issues that may result in the fault:
1. The attenuation of VOAs on OA boards is excessively high.
2. The attenuation of the port on a multiplexer board (RMU9 or WSM9) is excessively high.
3. The attenuation of EVOA boards is excessively high.
----End
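When the exported optical power list is long, the check in Step 2 can be scripted. The sketch below assumes the export has already been loaded as a list of rows keyed by column name. Only the Input Power and Output Power column names come from this procedure; the Board column, the sample rows, and the no-light reading passed as a parameter are assumptions to adapt to the actual export.

```python
def ports_without_light(rows, no_light_value):
    """Return (board, column) pairs whose power reading equals the no-light value.

    rows: list of dicts with 'Board', 'Input Power', and 'Output Power' keys.
    no_light_value: reading that the NMS displays when no light is present.
    """
    flagged = []
    for row in rows:
        for column in ("Input Power", "Output Power"):
            if float(row[column]) == no_light_value:
                flagged.append((row["Board"], column))
    return flagged


# Hypothetical rows; the sentinel -60.0 is an assumption for this example.
rows = [
    {"Board": "board-1", "Input Power": "-3.2", "Output Power": "1.0"},
    {"Board": "board-2", "Input Power": "-60.0", "Output Power": "-60.0"},
]
print(ports_without_light(rows, -60.0))
# [('board-2', 'Input Power'), ('board-2', 'Output Power')]
```

The first flagged board along the signal flow is usually the place to start the attenuation checks in Step 3.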
5 FAQs
FAQs
1. Choose Configuration > WDM ASON > ASON Trail Management from the main
menu. The services that are displayed are ASON services.
2. Choose Service > WDM Trail > Manage WDM Trail from the main menu. Among the
trails that are displayed, trails whose WDM ASON Trail is Yes are ASON services;
trails whose WDM ASON Trail is No are traditional services.
Step 2 In the displayed service list, NEs in the Source column are the first nodes of services and NEs
in the Sink column are the last nodes. Alternatively, you can obtain the first and last nodes of
a service from the Route View field. In this field, the NE with an upward arrow is the first
node and the NE with a downward arrow is the last node.
----End
Step 2 In the red box shown in the following figure, the Actual Route, Original Route, Associated
Route, Preset Restoration Trail 1, and Preset Restoration Trail 2 tabs are displayed.
----End
On the NMS, however, you cannot query the preset restoration trails of ASON services that
traverse a specific NE or board.
Directly viewing the TE links that the preset restoration trail traverses:
In the WDM ASON Trail Management window, select a service to view the preset
restoration trail, and check whether all the TE links that the preset restoration trail
traverses are available. The preset restoration trail is unavailable if any of these TE links is
interrupted. If no TE link is interrupted, the preset restoration trail is available.
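The availability rule above reduces to "every TE link on the trail must be up". A minimal sketch, assuming the link states have been noted manually as a name-to-state mapping (the link names below are hypothetical):

```python
def interrupted_te_links(te_link_up):
    """Return the TE links on the preset restoration trail that are interrupted.

    te_link_up: dict mapping a TE link name to True (available) or False.
    """
    return [name for name, up in te_link_up.items() if not up]


def preset_trail_available(te_link_up):
    """The trail is available only if none of its TE links is interrupted."""
    return not interrupted_te_links(te_link_up)


# Hypothetical link names and states for one preset restoration trail.
links = {"SITE_A-SITE_B": True, "SITE_B-SITE_C": False}
print(preset_trail_available(links))  # False
print(interrupted_te_links(links))    # ['SITE_B-SITE_C']
```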
Check whether the trail of the ASON service is correct in the WDM ASON Trail
Management window
service to the other LSP. If you cannot restore the service on the NMS, you can disable the
main optical path and the laser at the transmit end of the SC2 board. You can also remove the
pigtail from the line side of the FIU board so that a MUT_LOS alarm is reported on the FIU
board. The MUT_LOS alarm triggers ASON service rerouting and then you can select a new
service trail.
5.2.2 What Are the Risks if ODU0, ODU1, and ODU2 ASON
Services Are Concurrently Configured?
In automatic mode, network resource distribution policies become complex when
multi-granularity services (that is, ODUk services, where k can be 0, 1, or 2) are configured.
Small-granularity services may occupy bandwidth discontinuously and fragment the bandwidth
needed by large-granularity services, which affects the survivability of large-granularity
services.
5.2.3 What Are the Basic Rules for Configuring Preset Restoration
Trails?
The basic rules for configuring preset restoration trails are as follows:
1. Preset restoration trails must be planned to ensure that end-to-end service performance such
as OSNR satisfies requirements.
2. If possible, you are advised to configure two preset restoration trails for each service.
1. For electrical cross-connections, configure SNCP with the dual feeding and selective
receiving function to ensure that ASON services are automatically reverted and the
service performance is reliable.
2. For optical cross-connections on optical paths, the dual feeding and selective receiving
function is restricted. To create cross-connections at service adding or dropping sites or
regeneration sites, you must delete the original cross-connections. In addition,
optical-layer services occupy large bandwidth and service trails cannot be frequently
switched until the network is stable. Therefore, manual reversion is recommended for
optical-layer services.
5.2.8 What Are the Rules for Configuring Node IDs for ASON
NEs?
Before the ASON feature is enabled on an NE, you must set the node ID for the NE because
the node ID is the unique identifier of an NE on the control plane. Comply with the following
rules when configuring the node IDs of ASON NEs:
Ensure that the node ID of an ASON NE is in the format of x.x.x.x (x ranges from 1 to
254) and is not in the same network segment as the IP address or the OSPF IP address
of the NE.
Configure the node ID for an NE before enabling the ASON feature on the NE.
Do not change node IDs after a network is in use. If you need to change node IDs, ensure
that there are no ASON services.
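The format and network-segment rules can be checked before a node ID is applied. The sketch below uses the Python standard library ipaddress module. The prefix length that defines "the same network segment" is not stated in this document, so the /24 default is an assumption, as are the function name and the sample addresses.

```python
import ipaddress


def node_id_problems(node_id, ne_ip, ospf_ip, prefix_len=24):
    """Return a list of rule violations for a candidate node ID (empty if none).

    prefix_len is an assumed mask length used to decide whether two
    addresses share a network segment; adjust it to the real plan.
    """
    parts = node_id.split(".")
    if len(parts) != 4 or not all(
        p.isdigit() and 1 <= int(p) <= 254 for p in parts
    ):
        return ["format"]

    # Normalize each field so that leading zeros do not upset ipaddress.
    normalized = ".".join(str(int(p)) for p in parts)
    segment = ipaddress.ip_network(f"{normalized}/{prefix_len}", strict=False)

    problems = []
    for label, addr in (("ne_ip", ne_ip), ("ospf_ip", ospf_ip)):
        if ipaddress.ip_address(addr) in segment:
            problems.append(label)
    return problems


# Hypothetical addresses for illustration only.
print(node_id_problems("10.1.1.1", "129.9.0.5", "10.1.1.9"))  # ['ospf_ip']
print(node_id_problems("10.0.1.1", "129.9.0.5", "129.9.1.5"))  # ['format']
print(node_id_problems("10.1.1.1", "129.9.0.5", "129.9.1.5"))  # []
```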
You are advised to disable the electrical-layer ASON feature on the network for which only
optical-layer ASON is enabled. The procedure is as follows: In the WDM ASON Topology
Management window, click the Enable Electrical-Layer ASON Feature tab. On the tab,
select No from the Enable Electrical-Layer ASON Feature drop-down list for the required
NE.
Application Scenarios
Scenario 1: Some wavelengths on a link are configured for carrying static services, but no
static services are added currently. The wavelengths must be reserved to prevent them from
being used by ASON services.
Scenario 2: When ASON is enabled in multiple domains such as SDH, OTN, and WDM
ASON domains, some resources need to be reserved to ensure that resources used by ASON
services at multiple layers are independent of each other and therefore avoid association
between ASON services at multiple layers.
5.2.13 How to Disable the LMP Protocol for the Optical Ports that
Are Not Used by ASON Services?
You can disable the LMP protocol to avoid unnecessary link verification on an ASON
network. Perform the following operations to disable the LMP protocol: In the NE Explorer,
choose ASON > Advanced Maintenance in the navigation tree, click the LMP Protocol
Status tab, and select Disabled from the LMP Protocol Status drop-down list.
NOTE
On an optical-layer ASON network, you are advised to enable the LMP protocol for optical ports on the
FIU board and disable the LMP protocol for optical ports on other boards.
On an electrical-layer ASON network, you are advised to enable the LMP protocol for optical ports on
the OTU boards adding or dropping ASON services and disable the LMP protocol for optical ports on
other boards.
5.2.14 How to Disable the OSPF Protocol for the Optical Ports that
Are Not Used by ASON Services?
On a network where only optical-layer ASON is applied, you can disable the OSPF protocol
for optical ports of boards at the electrical layer to lighten NE load. In other words, you can
disable the OSPF protocol for the optical ports on OTU boards. Perform the following
operations to disable the OSPF protocol: In the NE Explorer, choose ASON > Advanced
Maintenance, click the OSPF Protocol Status tab, and select Disabled from the OSPF
Protocol Status drop-down list.
If all the NEs on a DCN subnet are OptiX OSN 8800 V100R002, OptiX OSN 6800
V100R004C04, or later versions, a maximum of 200 NEs is allowed when the ASON
protocol is disabled and a maximum of 100 NEs is allowed when the ASON protocol is
enabled for all the NEs on the DCN subnet.
When some NEs on a DCN subnet are OptiX OSN 8800 V100R001, OptiX OSN 6800
V100R004C02, or earlier versions, a maximum of 100 NEs is allowed, regardless of
whether the ASON protocol is enabled.
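The two rules above can be condensed into a small lookup. This is a sketch with invented parameter names. The case where all NEs run the later versions but ASON is enabled on only some of them is not covered by the rules, so the function conservatively treats it as "enabled"; that choice is an assumption.

```python
def max_nes_on_dcn_subnet(all_nes_later_versions, ason_enabled):
    """Return the maximum number of NEs allowed on one DCN subnet.

    all_nes_later_versions: True if every NE runs OptiX OSN 8800 V100R002,
        OptiX OSN 6800 V100R004C04, or a later version.
    ason_enabled: True if the ASON protocol is enabled (partial enablement
        is treated as enabled here, which is an assumption).
    """
    if all_nes_later_versions and not ason_enabled:
        return 200
    return 100


print(max_nes_on_dcn_subnet(True, False))   # 200
print(max_nes_on_dcn_subnet(True, True))    # 100
print(max_nes_on_dcn_subnet(False, False))  # 100
```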
Multiple GNEs can be configured on a DCN subnet. The GNEs can share the traffic between
non-GNEs and the NMS. The non-GNEs under each GNE are specified manually by
configuring the GNE as their primary GNE. A GNE can connect to at most 60 non-GNEs (50
non-GNEs recommended). If there are more than 60 non-GNEs, another GNE must be
configured. The non-GNEs refer to equivalent NEs.
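The GNE limits translate into a simple ceiling division. The sketch below computes how many GNEs a subnet needs under the hard limit (60 non-GNEs per GNE) and under the recommended load (50). The function name is invented, and returning at least one GNE for an empty subnet is an assumption.

```python
import math


def gnes_needed(non_gne_count, per_gne=60):
    """Return the number of GNEs needed for the given number of non-GNEs.

    per_gne: 60 is the stated maximum per GNE; pass 50 for the recommended load.
    """
    return max(1, math.ceil(non_gne_count / per_gne))


print(gnes_needed(60))               # 1
print(gnes_needed(61))               # 2
print(gnes_needed(130))              # 3
print(gnes_needed(130, per_gne=50))  # 3
print(gnes_needed(120, per_gne=50))  # 3
```

For 120 non-GNEs, the hard limit allows two GNEs, while the recommended load calls for three.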
When the number of NEs on a DCN subnet exceeds the upper limit, the DCN subnet must be
split into smaller DCN subnets. Two methods are available for splitting a DCN subnet:
horizontal split and vertical split.
In the horizontal split method, a DCN subnet is split based on the service domains to which
NEs belong. NEs in a WDM domain can be grouped into a subnet, and NEs in an SDH
domain can be grouped into a subnet. In the vertical split method, a DCN subnet is split based
on the physical locations of NEs or based on the network topology, regardless of whether NEs
belong to the same service domain.
To revert an electrical-layer ASON service to its original trail, you can choose either Revert
To Port or Revert To Channel. To revert an optical-layer ASON service to its original trail,
you can choose only Revert To Wavelength.
Revert To Port: If you choose Revert To Port, the port of the new trail is the same as the
port of the original trail, but the channels may be different.
Revert To Channel: If you choose Revert To Channel, the ports and channels of the new
trail are the same as those of the original trail.
Revert To Wavelength: If you choose Revert To Wavelength, the ports and wavelengths of
the new trail are the same as those of the original trail.
You are advised to choose Revert To Channel to revert an electrical-layer ASON service to
its original trail and choose Revert To Wavelength to revert an optical-layer ASON service to
its original trail.
OPA function is triggered to automatically adjust optical power to ensure successful service
trail computation. Therefore, even though the rerouting of ASON services is successful, OPA
adjustment may fail, and OCh optical paths of ASON services may be unavailable.
other reasons need to be deleted manually. Therefore, you must delete residual
cross-connections in a timely manner.