OptiX RTN Microwave Equipment Maintenance Cases 04
V100
Maintenance Cases
Issue 04
Date 2019-06-20
HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Contents
5 Typical Cases................................................................................................................................ 61
5.1 List of Cases................................................................................................................................................................. 61
5.2 Radio Link Faults......................................................................................................................................................... 62
5.2.1 Radio Link Interruptions Due to Multipath Fading (1)............................................................................................. 63
5.2.2 Service Bit Errors Due to Interference to Radio Links..............................................................................................64
5.2.3 Intermittent Link Interruptions Caused by IF Interference........................................................................................65
A Appendix.......................................................................................................................................97
Overview
For assisting maintenance engineers in troubleshooting, this document describes how to
troubleshoot OptiX RTN products, and is organized as follows:
l Basic principles and common methods for locating faults
This chapter describes basic principles and common methods for locating faults. Each
method is illustrated using an example.
l Troubleshooting process and guide
This chapter describes the general troubleshooting process, fault categories, and how to
diagnose each category of faults.
l Equipment interworking guide
This chapter provides criteria for correct interworking between OptiX RTN products and
other products, and methods used for locating interworking faults.
l Typical cases
This chapter provides typical troubleshooting cases for helping maintenance personnel
improve their fault diagnosis capabilities.
l Appendix
This chapter provides references.
Intended Audience
This document is intended for:
l Technical support engineers
l Maintenance engineers
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
General Conventions
The general conventions that may be found in this document are defined as follows.
Convention Description
Update History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Description
Because transmission equipment faults affect services over a large area, fault locating aims to
narrow down the most likely fault areas.
Table 2-1 lists the basic principles for locating faults. These principles are summarized based
on characteristics of transmission equipment.
External first, transmission next: Rule out external faults first, for example, faults on power
supply equipment or interconnected equipment, or cable damage.
Network first, NE next: Locate a fault to a radio site or a radio hop based on fault symptoms.
High-severity alarms first, low-severity alarms next: Handle high-severity alarms, such as
critical alarms and major alarms, first. Then handle low-severity alarms, such as minor alarms
and warnings.
Signal flow analysis (applicable to all scenarios): This method helps locate a fault to a radio
site or radio hop. Familiarity with service signal flows, cable connections, and air-interface
link connections helps analyze fault symptoms and locate possibly faulty points.
Alarm analysis (applicable to all scenarios): Alarms well illustrate fault information. Handle
alarms reported by faulty points immediately after analyzing service signal flows.
Receive and transmit power analysis (applicable to locating radio link faults): By analyzing
the current and historical receive and transmit power on a radio link, determine whether any
errors, for example, interference and fading, exist on the radio link.
Loopback (applicable to locating a fault to a component or site, section by section): This
method is fast and independent of alarm and performance event analysis. However, it affects
embedded control channels (ECCs) and normal service running.
Replacement (applicable to locating a fault to a component or board, or identifying external
faults): This method does not require sound theoretical knowledge or skills but requires spare
parts. It applies to nearly all sites.
Tests using instruments and tools (applicable to isolating external faults and addressing
interworking issues): This method provides accurate results. Before using this method,
interrupt services.
Use this method if you need to locate a fault to a site or link on a network or locate a fault to a
module.
Fault Symptoms
As shown in Figure 2-2, a microwave chain network was set up, and all 2G and 3G base
station services in an area were interrupted for approximately 10 minutes.
Procedure
Step 1 Checked the distribution of the NEs on which services were interrupted and the service flow
direction.
NE1704 converged the interrupted services, so the service interruption was related to
NE1704.
NE1704 reported an MW_CFG_MISMATCH alarm, and the Hybrid radio E1 capacity was
changed on NE1704 right before the services were interrupted. It was inferred that the
services were interrupted due to an E1 capacity mismatch between NE1704 and NE1705.
----End
Checking current and historical alarms, fault symptoms, and fault time helps narrow down the
most likely areas for faults, and helps locate a fault to a hop, site, or module.
The alarm and performance analysis method entails capabilities in using the NMS and
analyzing service signal flows.
Procedure
Step 1 Checked alarms.
Boards in slots 1, 5, 6, and 7 reported the HARD_BAD alarm.
l The PXC board in slot 1 reported a HARD_BAD alarm, whose parameters indicated that
the 38M clock was lost and the analog phase-locked loop (PLL) was unlocked.
l The boards in slots 5, 6, and 7 reported the HARD_BAD alarm, whose parameters
indicated that the 38M clock was lost and the PXC board in slot 1 was faulty. The fault
caused loss of the first 38M clock.
Step 2 Checked the XCP_INDI alarm.
The HARD_BAD alarm reported by the board in slot 1 triggered a switchover, causing the
SCC board to report an XCP_INDI alarm.
Step 3 Replaced the PXC board in slot 1.
The alarms cleared.
----End
Procedure
Step 1 Checked the ODU receive power that was recorded during the alarm period.
The difference between the maximum receive power and the minimum receive power was
more than 40 dB, and the minimum receive power was close to or less than the receiver
sensitivity. Therefore, it was inferred that the fault was caused by spatial fading.
----End
l Minimize the impact of multipath fading by using one of the following methods,
depending on the actual conditions:
– Use low capacities, low-order modulation schemes, and low bandwidths.
– Increase the height difference between the antennas at both ends, provided that line of
sight (LOS) is guaranteed.
– Add two antennas and configure a space diversity (SD) protection group.
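The receive-power judgment used in this case (a fade depth of more than 40 dB with the minimum receive power close to the receiver sensitivity) can be sketched as a simple check. This is an illustrative helper, not part of the product software; the threshold and margin values are assumptions:

```python
def multipath_fading_suspected(power_dbm, sensitivity_dbm,
                               depth_threshold_db=40.0, margin_db=3.0):
    """Flag likely multipath fading from historical ODU receive power samples.

    power_dbm: list of receive power readings (dBm) recorded during the alarm period.
    sensitivity_dbm: receiver sensitivity (dBm) for the current modulation scheme.
    """
    fade_depth = max(power_dbm) - min(power_dbm)
    near_sensitivity = min(power_dbm) <= sensitivity_dbm + margin_db
    return fade_depth > depth_threshold_db and near_sensitivity

# Example: power swings from -38 dBm down to -85 dBm, sensitivity -84 dBm.
samples = [-38.0, -45.0, -60.0, -85.0, -52.0]
print(multipath_fading_suspected(samples, -84.0))  # True: 47 dB swing, minimum near sensitivity
```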
2.6 Loopback
2.6.1 Application Scenarios
After a loopback is enabled at a point, signals that would normally be forwarded are instead
routed back to the signal source. If services are interrupted, loopbacks can be performed to
narrow down fault areas by checking whether each network section is in good condition.
Loopbacks can be software loopbacks or hardware loopbacks. Software loopbacks can be
inloops or outloops. For detailed loopback definitions, operation methods, and usage
restrictions, see the Maintenance Guide.
Procedure
Step 1 Analyzed the service signal flow.
Step 3 Set an inloop at the tributary board (point 1) on NE2, and connected an E1 bit error rate
(BER) tester to point A (third-party SDH equipment).
The E1 BER tester at point A detected no bit errors. It was suspected that the radio link
between NE1 and NE2 was faulty.
Step 5 Tested the radio link performance by setting an inloop at the tributary board (point 1) on NE2
and connecting an E1 BER tester to point B (OptiX OSN equipment).
NOTE
The E1 BER tester was connected to the OptiX OSN equipment and the corresponding E1 cross-
connections were modified, because NE1 had no E1 tributary port.
Step 6 Checked the interconnection configuration data on the OptiX RTN equipment, OptiX OSN
equipment, and third-party SDH equipment.
The preceding equipment used their own clock sources, and the clocks were not synchronized.
NOTE
All equipment on an SDH network must trace the reference clock. In the preceding example, the OptiX
RTN equipment, OptiX OSN equipment, and third-party SDH equipment are interconnected through
SDH ports. After E1 services are encapsulated and mapped several times, serious jitter may be
generated and cause bit errors. To resolve similar issues, plan and implement clock solutions
when building SDH or microwave transmission networks.
----End
2.7 Replacement
Fault Symptoms
See the following figure. Two sites, site A and site B, were interconnected using 2+0 radio
links. At each site, ODUs of the same type (with the same sub-band but different working
frequencies) were used. NE B-2 at site B frequently reported service alarms such as R_LOC
and R_LOF.
Procedure
Step 1 Checked historical performance events and the receive power within the period of alarm
reporting.
The receive power was normal. Because the alarms did not persist, loopback tests were
inapplicable. The replacement method could be used for fault locating. The receive end was
suspected faulty. However, it was difficult to replace an ODU. Because the 2+0 links used the
same type of ODUs, the IF cables at site B could be interchanged for fault locating.
Step 2 Interchanged the IF cables at site B and checked for alarms for two days.
NE B-2 still reported service alarms. Therefore, site B was not faulty, and site A was possibly
faulty.
Step 3 Restored the IF cable connections at site B, interchanged the IF cables at site A, and checked
for alarms for two days.
NE B-1 reported service alarms. Therefore, the IF cable connecting NE A-2 and ODU A-2
was faulty.
----End
Procedure
Step 1 Checked alarms.
The CONFIG_NOSUPPORT alarm indicating an incorrect frequency caused the
RADIO_MUTE alarm.
Step 3 Changed the Tx frequency to a correct value based on the network planning information.
----End
Fault Symptoms
In the network shown in the following figure, the NMS set up data communication network
(DCN) communication with NE1 and NE2 through the multiprotocol label switching (MPLS)
network. NE1 was connected to the MPLS network using a hub and communicated with the
MPLS network through the Open Shortest Path First (OSPF) protocol. The NMS pinged NE1
successfully but failed to ping NE2. Therefore, the NMS could not reach NE2. The routing table
of NE1 indicated that NE1 did not learn routes to upstream NEs. The MPLS network had
multiple radio hops at its edge, but the fault occurred only between NE1 and NE2.
Procedure
Step 1 Connected the hub to a PC and used the data service packet sniffer to analyze the OSPF
packets received by NE1.
The designated router (DR) IP addresses in the OSPF packets were xx.xx.xx.1, but the IP
address of the NE that sent the DR packets was xx.xx.xx.2. Therefore, NE1 did not receive
any DD packets sent by the DR elected on the OSPF subnet. As a result, NE1 could not create
an adjacency with the DR and could not learn OSPF routes.
Step 2 Sniffed and analyzed OSPF packets at another OptiX RTN NE that was connected to the
MPLS network and was operating normally.
The OptiX RTN NE received OSPF packets from the DR. Therefore, an OptiX RTN NE fault
was ruled out.
Step 3 Increased the priority of NE1's gateway (IP address: xx.xx.xx.2) so the gateway became the
DR on the subnet.
NE1 learned OSPF routes, and NE2 became reachable from the NMS.
----End
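The root cause in this case, Hello packets advertising a DR address other than the intended gateway, can be checked mechanically once OSPF packets are captured. A minimal sketch, assuming the sniffed Hello packets have already been parsed into (source IP, advertised DR IP) pairs; the function name and the addresses are illustrative:

```python
def dr_mismatch(hellos, expected_dr_ip):
    """Return the set of advertised DR addresses that differ from the expected gateway.

    hellos: iterable of (source_ip, advertised_dr_ip) tuples taken from sniffed
    OSPF Hello packets on the subnet.
    """
    return {dr for _, dr in hellos if dr != expected_dr_ip}

# In this case the gateway (xx.xx.xx.2) should have been the DR,
# but the Hellos advertised xx.xx.xx.1. Illustrative addresses:
captured = [("10.0.0.2", "10.0.0.1"), ("10.0.0.3", "10.0.0.1")]
print(dr_mismatch(captured, "10.0.0.2"))  # a non-empty set means another router won the DR election
```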
l Storing all the statistics on the agent side and supporting offline manager operations
l Storing historical data to facilitate fault diagnosis
l Supporting error detection and reporting
l Supporting multiple manager sites
The OptiX RTN equipment achieves RMON using the following management groups:
Fault Symptoms
Figure 2-8 shows a mobile network, where OptiX RTN 600 V100R003s provided backhaul
transmission. Packet loss occurred when BTS1 at site 1 and BTS2 at site 2 were pinged from
the RNC, but did not occur when BTS3 at site 3 was pinged.
Procedure
Step 1 Suspected that the radio bandwidth between NE 3-002 and NE 3-003 was insufficient,
causing the loss of ping packets.
Step 2 Analyzed the RMON data of NE 3-002 to check whether packet loss was caused by
insufficient radio bandwidth between site 2 and site 3.
The maximum traffic volume of NE 3-001 already reached its maximum air interface
bandwidth (25 Mbit/s). Therefore, packet loss was caused by congestion. For details, see the
following figure.
----End
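The check in Step 2 amounts to comparing peak RMON traffic against the configured air-interface bandwidth. A minimal sketch of that comparison (the 25 Mbit/s figure comes from this case; the function name is an assumption):

```python
def link_congested(traffic_samples_mbps, air_interface_capacity_mbps):
    """Return True if peak RMON traffic reaches the air-interface bandwidth,
    i.e. ping packet loss can be explained by congestion rather than a fault."""
    return max(traffic_samples_mbps) >= air_interface_capacity_mbps

# Peak traffic hits the 25 Mbit/s air-interface limit, so congestion is the cause.
print(link_congested([12.0, 18.5, 25.0, 21.0], 25.0))  # True
```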
l Multipath fading prediction methods. Generally, the following methods are available:
– ITU-R-P.530-7/8 method: It is globally applicable.
– ITU-R-P.530-9 method: It is applicable to areas with high reflection gradients, for
example, the Middle East, the Mediterranean Sea, and West Africa. It is used together
with the ITU-R-P.530-7/8 method, and the lower predicted availability is used as the
calculation result.
– KQ factor method: It is applicable to China (seldom used).
– Vigants-Barnett method: It is applicable to North America.
l Rain fading prediction methods. Generally, the following methods are available:
– ITU: It is globally applicable.
– R.K. Crane: It is applicable to North America.
– For a link covering several rain zones, it is recommended that you select the zone
with the heaviest rainfall for calculation.
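The method-selection rules above can be restated as a small lookup. This sketch only mirrors the guidance in this section; it is not a planning tool, and the region names are illustrative:

```python
def multipath_method(region):
    """Pick a multipath fading prediction method per the guidance above."""
    # Areas with high reflection gradients combine ITU-R-P.530-9 with
    # ITU-R-P.530-7/8 and keep the lower predicted availability.
    high_reflection = {"Middle East", "Mediterranean", "West Africa"}
    if region in high_reflection:
        return "ITU-R-P.530-9 (with ITU-R-P.530-7/8)"
    if region == "North America":
        return "Vigants-Barnett"
    if region == "China":
        return "KQ factor"        # seldom used
    return "ITU-R-P.530-7/8"      # globally applicable default

print(multipath_method("Middle East"))    # ITU-R-P.530-9 (with ITU-R-P.530-7/8)
print(multipath_method("North America"))  # Vigants-Barnett
```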
Fault Symptoms
A radio link frequently but intermittently reported MW_RDI, R_LOC, and RPS_INDI alarms,
and HSB switchovers were triggered.
Procedure
Step 1 Queried historical receive power values of the radio link.
The receive power decreased to a value close to the receiver sensitivity when an alarm was
reported. Most alarms were reported during the night or in the early morning. When the
weather was favorable at noon, the receive power was normal. Therefore, intermittent radio
link interruptions were caused by multipath fading.
Step 2 Checked annual interruption time predicted for the radio link.
The actual annual interruption time was longer than the predicted time of 1877 seconds.
Therefore, the fading margin was insufficient.
Step 3 Checked the multipath fading prediction method used in network planning.
The ITU-R-P.530-7/8 method was used. The area covered by the radio link was in the Middle
East, and therefore the ITU-R-P.530-9 method should have been used.
Step 4 Used the ITU-R-P.530-9 method to predict annual interruption time without changing other
conditions.
The obtained value was about 175833 seconds, which was longer than the value obtained
using the ITU-R-P.530-7/8 method.
According to the preceding analysis, the actual annual interruption time was much longer than
the predicted time because an incorrect multipath algorithm was used in network planning.
Step 5 Planned this link using a correct algorithm and deployed 1+1 SD protection for the link. The
link availability met service requirements.
----End
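For reference, an annual interruption time converts to link availability as follows; the outage figures from this case are used as inputs, and the helper name is illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def availability_percent(outage_seconds_per_year):
    """Convert predicted annual outage time into a link availability percentage."""
    return 100.0 * (1.0 - outage_seconds_per_year / SECONDS_PER_YEAR)

# Predictions from this case:
print(availability_percent(1877))    # ITU-R-P.530-7/8 prediction, roughly 99.994%
print(availability_percent(175833))  # ITU-R-P.530-9 prediction, noticeably lower
```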
Mark Explanation
Mark 3: Find causes of a fault with reference to section 2.2 Common Methods for Locating
Faults, determine the category of the fault with reference to section 3.2 Fault Categories, and
rectify the fault as instructed in the corresponding section listed below:
l 3.3 Troubleshooting Radio Links
l 3.4 Troubleshooting TDM Services
l 3.5 Troubleshooting Data Services
l 3.6 Troubleshooting Microwave Protection
l 3.7 Troubleshooting Clocks
l 3.8 Troubleshooting DCN Communication
Mark 4: Contact the Huawei local office or dial the Huawei technical service hotline for
problem reporting and technical support.
When handling critical problems such as a service interruption, exercise the following
precautions:
l Restore services as soon as possible.
l Analyze fault symptoms, find causes, and then handle faults. If causes are unknown,
exercise caution when you perform operations to prevent the problems from becoming more severe.
l If a fault persists, contact Huawei engineers and coordinate with them to handle the fault
promptly.
l Record the operations performed during fault handling and save the original data related to
the fault.
Radio link fault: Radio links report link-related alarms such as MW_LOF and
RADIO_RSL_LOW, or have bit errors.
Time division multiplexing (TDM) service fault: Radio links work normally but their carried
TDM services are interrupted or deteriorate.
Data service fault: Radio links work normally but their carried data services have packet loss
or are unavailable.
Protection fault: Protected radio links or their carried services are faulty, or protection
switching fails (no switchover is performed or services are unavailable after switching is
complete).
Fault Causes
Causes of radio link faults are classified into the following categories:
l Equipment faults, including indoor unit (IDU) faults, outdoor unit (ODU) faults, and
power faults
l Propagation faults, including fading, interference, and poor LOS
l Poor construction quality, including poor antenna/component installation, poor
grounding, and poor waterproofing
Troubleshooting Process
Figure 3-3 illustrates the process for diagnosing a radio link fault.
Rain fading: When it rains, a link may be interrupted or deteriorate. Increase the link fading
margin, use low frequency bands, or use vertical polarization:
l Increase the link fading margin for rain zones L, M, N, P, and Q.
l Rain fading impairs radio links that operate at high frequency bands, especially
frequency bands higher than 18 GHz. Radio links operating at frequency bands lower
than 10 GHz are not affected. If rain fading is severe, change the radio links' operating
frequency bands, if necessary.
l Rain fading in horizontal polarization is more severe than that in vertical polarization.
Poor LOS: The receive power is always lower than the designed power:
l If radio links or antennas are blocked, adjust antenna mounting heights or positions to
bypass obstacles.
l Adjust deviated antennas.
Cause 1: The hardware is faulty. Analyze alarms and perform loopbacks to check whether
board hardware is faulty. If a board is faulty, replace it.
Cause 2: A radio link is faulty. On the NMS, find the occurrence period of the fault and check
whether any service alarm is generated on the radio link. If a radio link alarm is generated,
first rectify radio link faults.
Cause 5: The power supply voltage fluctuates, the grounding is improper, or external
interference exists. Check whether the voltage of the external power supply fluctuates or
whether the equipment is grounded improperly.
Cause Analysis
If services at all base stations on an entire network or in an area are interrupted, faults
probably occur at the convergence nodes that are interconnected with BSCs/RNCs. Therefore,
check for the following faults at convergence nodes:
l Board hardware fault
l Port fault
l Configuration error
l Equipment interconnection fault
l If this type of fault occurs, contact the maintenance personnel for the interconnected
equipment.
Before locating faults, collect data of all NEs that are possibly faulty, if possible.
1. Rule out hardware faults and radio link faults with reference to sections 3.2 Fault
Categories and 3.3 Troubleshooting Radio Links.
2. Check whether upstream convergence ports at the convergence nodes report equipment
alarms.
If these ports report any of the following equipment alarms: ETH_LOS, LASER_MOD_ERR,
LASER_NOT_FITED, or ETH_NO_FLOW, clear the alarms as instructed in "Alarms and
Handling Procedures" in the Maintenance Guide.
3. Check RMON statistics about upstream convergence ports at the convergence nodes.
If the ports receive data but do not transmit data, the boards where the ports are located may
be faulty. In this case, go to the next step.
If the ports do not receive data, the interconnected equipment is faulty. In this case, rectify the
fault by following the instructions in chapter 4 Equipment Interworking Guide.
4. Check the Ethernet bandwidths provided by radio links at the convergence nodes.
If attributes of the service ports are incorrectly set, set the attributes again (including port
enabled/disabled, tag attribute, and default VLAN) and check whether the services recover. If
not, go to the next step.
If VLAN settings are inconsistent with actual services, re-set VLANs for the services and
check whether the services recover. If not, go to the next step.
----End
If the fault persists after all the preceding steps are performed, dial Huawei technical service
hotline or contact Huawei local office.
Fault Symptoms
Services at all base stations on an entire network or in an area experience packet loss. For
example, all Internet service users experience a low access rate, calls are delayed, ping
packets between BSCs/RNCs and base stations are lost, or artifacts appear in video services.
Cause Analysis
If services at all base stations on an entire network or in an area experience packet loss, faults
probably occur at convergence nodes (possibly OptiX PTN 1900 or OptiX RTN 950) that are
interconnected with BSCs/RNCs. Therefore, check for the following faults at the convergence
nodes (the possibility of service configuration errors is eliminated because the services are not
interrupted):
l Incorrect parameter setting (for example, mismatched working modes) for Ethernet ports
l Network cable or fiber fault
l Service traffic exceeding preset bandwidth
l Member link fault in link aggregation groups (LAGs)
l Oversized burst traffic
l Broadcast storm
l Inappropriate quality of service (QoS) parameter setting
Before locating faults, collect data of all NEs that are possibly faulty, if possible.
If the convergence nodes report alarms like ETH_LOS or experience alarm jitters, clear the
alarms as instructed in "Alarms and Handling Procedures" in the Maintenance Guide. If the
alarms clear, check whether the fault is rectified. If the alarms persist, go to the next step.
2. At the convergence nodes, check whether the ports used for interconnection and their
peer ports at the interconnected equipment are consistently set.
If the ports' working modes are inconsistent with their peer ports' working modes, change the
working modes to match and check whether the fault is rectified. If not, check the next item.
If the ports' physical states are different from the settings, verify fiber connections or network
cable connections at the ports. Then, enable the ports again and check whether the fault is
rectified. If not, check the next item.
If the ports' maximum transmission unit (MTU) settings are different from actual packet
lengths, change the value of the MTU parameter to 9600 bytes and check whether the fault is
rectified. If not, check the next item.
3. Check the traffic volume at each convergence port and each convergence node.
If the total volume of traffic converged to a convergence node exceeds the maximum
bandwidth configured for the convergence node, split the traffic or increase the maximum
bandwidth configured for the convergence node. If only a few service packets are lost
(generally due to oversized burst traffic), check for historical threshold-crossing events.
Check whether the fault is rectified. If not, check the next item.
If the burst traffic volumes at the convergence nodes exceed the maximum bandwidths
configured for the convergence nodes, enable traffic shaping at the convergence ports that are
interconnected with BSCs/RNCs, and check whether the fault is rectified. If not, check the
next item.
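Traffic shaping, recommended above for oversized burst traffic at convergence ports, is conventionally implemented as a token bucket. The following is a minimal sketch of the mechanism, not the equipment's actual implementation; the rate and bucket size are illustrative:

```python
class TokenBucket:
    """Shape traffic to a committed rate while absorbing bounded bursts."""

    def __init__(self, rate_bytes_per_s, bucket_bytes):
        self.rate = rate_bytes_per_s   # committed information rate
        self.capacity = bucket_bytes   # maximum burst size
        self.tokens = bucket_bytes     # bucket starts full
        self.last = 0.0                # timestamp of the last decision

    def allow(self, packet_bytes, now_s):
        """Return True if the packet may be sent now; otherwise it is delayed or dropped."""
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now_s - self.last) * self.rate)
        self.last = now_s
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False

# A 1 Mbit/s shaper with an 8 KB burst allowance.
tb = TokenBucket(rate_bytes_per_s=125_000, bucket_bytes=8_192)
print(tb.allow(8_192, 0.0))  # True: the burst fits the full bucket
print(tb.allow(1_500, 0.0))  # False: bucket drained, no time has elapsed
print(tb.allow(1_500, 0.1))  # True: tokens refilled during 0.1 s
```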
4. Check whether QoS settings are appropriate if QoS policies are configured for the
convergence nodes or BSCs.
If the fault persists after all the preceding steps are performed, dial Huawei technical service
hotline or contact Huawei local office.
----End
Fault Symptoms
Services at some base stations in an area are interrupted. For example, all data, voice, and
video services at a base station or at a node that converges services from several base stations
are interrupted, or all ping packets between a BSC and its subordinate base stations are lost.
Cause Analysis
If services at some base stations are interrupted, certain equipment on the transmission link is
faulty. To diagnose the fault, check service continuity on the link and RMON counts of
service ports, determine the fault scope, and check for the following faults at those possibly
faulty nodes:
Before locating faults, collect data of all NEs that are possibly faulty, if possible.
1. Check service continuity on each branch of the faulty link to determine the fault scope.
If the services from base stations or OptiX RTN NEs to an NE on the faulty link are available,
but the services from the faulty link to the NE are interrupted, the NE or its next-hop NE on
the faulty link is faulty. In this case, go to the next step.
If an NE on the faulty link receives data but does not transmit data, or transmits data but does
not receive data, check the traffic counts of its next-hop NE. Repeat this operation until you
locate the NE that does not transmit data. The located NE is considered a faulty NE. Then, go
to the next step.
3. At the faulty NE, check whether the port used for interconnection and its peer port at the
interconnected equipment are consistently set.
If the port's working mode is inconsistent with its peer port's working mode, change the
working modes to match and check whether the fault is rectified. If not, check the next item.
If the port's MTU setting is different from the actual packet length, change the value of the
MTU parameter to 9216 bytes and check whether the fault is rectified. If not, go to the next
step.
If the services are not configured or are incorrectly configured, re-configure the services and
check whether the services recover. If not, go to the next step.
If attributes of the service ports are incorrectly set, set the attributes again (including port
enabled/disabled, tag attribute, Layer 2/Layer 3 attribute, and default VLAN) and check
whether the services recover. If not, go to the next step.
7. Check the service VLAN. If the service VLAN is incorrectly set, re-set it.
If the fault persists after all the preceding steps are performed, dial Huawei technical service
hotline or contact Huawei local office.
----End
Fault Symptoms
Services at some base stations in an area experience packet loss. For example, some users
experience a low Internet access rate, calls are delayed, some ping packets between a BSC
and its subordinate base stations are lost, or artifacts appear in video services.
Cause Analysis
If services at some base stations experience packet loss, certain equipment on the transmission
link is faulty. To diagnose the fault, check service continuity on the link and RMON counts of
service ports, determine the fault scope, and check for the following faults at those possibly
faulty nodes:
Before locating faults, collect data of all NEs that are possibly faulty, if possible.
1. Check RMON counts of ports on the faulty link, and determine the fault scope by
comparing traffic volumes at involved NEs.
If the volume of traffic received by an NE is greater than the volume of traffic transmitted by
the NE, consider the NE as a faulty NE and go to the next step.
If the volume of traffic received by an NE is equal to the volume of traffic transmitted by the
NE, but both volumes are too low, check the traffic volume at the next-hop NE. Repeat this
operation until you locate the NE whose volume of received traffic is largely different from
its volume of transmitted traffic. The located NE is considered a faulty NE. Then, go to the
next step.
If the NE reports alarms like ETH_LOS or experiences alarm jitters, clear the alarms as
instructed in "Alarms and Handling Procedures" in the Maintenance Guide. If the alarms
clear, check whether the fault is rectified. If not, go to the next step.
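The hop-by-hop traffic comparison in item 1 can be sketched as follows; the NE names, counter values, and the 1% tolerance are illustrative assumptions:

```python
def find_lossy_ne(rmon_counts, tolerance=0.01):
    """Walk a chain of NEs and return the first one whose transmitted traffic
    falls noticeably short of its received traffic (suspected packet-loss point).

    rmon_counts: ordered list of (ne_name, rx_bytes, tx_bytes) along the link.
    """
    for name, rx, tx in rmon_counts:
        if rx > 0 and (rx - tx) / rx > tolerance:
            return name
    return None  # no NE shows a significant rx/tx gap

chain = [("NE1", 1_000_000, 999_500),  # normal: rx is close to tx
         ("NE2", 999_500, 920_000),    # about 8% of received traffic is lost here
         ("NE3", 920_000, 920_000)]
print(find_lossy_ne(chain))  # NE2
```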
3. At the faulty NE, check whether the port used for interconnection and its peer port at the
interconnected equipment are consistently set.
If the port's working mode is inconsistent with its peer port's working mode, change the
working modes to match and check whether the fault is rectified. If not, check the next item.
If the port's MTU setting is different from the actual packet length, change the value of the
MTU parameter to 9600 bytes and check whether the fault is rectified. If not, check the next
item.
If the total volume of traffic converged to an upstream service port exceeds the maximum
bandwidth configured for the port, split the traffic or increase the maximum bandwidth
configured for the port. Then check whether the fault is rectified. If not, check the next item.
If the burst traffic volume at an upstream service port exceeds the maximum bandwidth
configured for the port, enable traffic shaping for the port, and check whether the fault is
rectified. If not, check the next item.
If the fault persists after all the preceding steps are performed, dial Huawei technical service
hotline or contact Huawei local office.
----End
Fault Symptoms
A switchover in microwave 1+1 protection, triggered by a radio link fault or an equipment
fault, fails or is delayed.
Cause 1: The microwave 1+1 protection group is in the forced or lockout switching state, causing a switchover failure.
Handling: Check the current switching state and switching records of the microwave 1+1 protection group.
Cause 2: In the microwave 1+1 protection group, both the main and standby links are interrupted or both the main and standby units are faulty, resulting in a switchover failure.
Handling: Check the alarms reported by boards in the microwave 1+1 protection group, and the current switching state of the microwave 1+1 protection group.
Cause 3: The NE is being reset or a switchover between the main and standby system control boards has just occurred, resulting in a switchover failure or a delayed switchover.
Handling: Check the alarms reported by the NE, the switchover records of the main and standby system control boards (OptiX RTN 950/980 NEs support main and standby system control boards), and the current switching state of the microwave 1+1 protection group.
Cause 4: An RDI-caused switchover is triggered immediately after a switchover is complete. Because the RDI-caused switchover must wait for the expiration of the wait-to-restore (WTR) timer (in revertive mode, the waiting time is the preset WTR time; in non-revertive mode, the waiting time is 300s), the switchover is delayed.
Handling: Check the alarms reported by the NE, and the parameter settings, current switching state, and switching records of the microwave 1+1 protection group.
Cause 5: In OptiX RTN 600 V100R005/OptiX RTN 900 V100R002C02 and later versions, anti-jitter is provided for switchovers triggered by RDIs and service alarms, to prevent repeated microwave 1+1 protection switchovers caused by deep and fast fading. As a result, some switchovers are delayed.
Handling: Check the alarms reported by the NE, and the current switching state and switching records of the microwave 1+1 protection group.
Cause 1: The microwave 1+1 protection group works in non-revertive mode.
Handling: Check whether the revertive mode is enabled for the microwave 1+1 protection group. If not, enable it.
Cause 2: The current switching state of the microwave 1+1 protection group is RDI, so an automatic revertive switchover cannot take place.
Handling: Check whether the current switching state of the microwave 1+1 protection group is RDI. If yes, manually clear the RDI state.
Cause 3: When the microwave 1+1 protection group is in the WTR state, the microwave 1+1 protocol detects that the main unit is faulty. As a result, the revertive switchover to the main unit fails.
Handling: Check whether boards in the microwave 1+1 protection group report hardware alarms. If yes, handle the alarms.
Cause 1: The SNCP protection group is in the forced or lockout switching state, causing a switchover failure.
Handling: Check the current switching state and switching records of the SNCP protection group.
Cause 2: Both the working and protection channels in the SNCP protection group are unavailable, resulting in a switchover failure.
Handling: Check the alarms reported by boards in the SNCP protection group, and the current switching state of the SNCP protection group.
Cause 3: The NE is being reset or a switchover between the main and standby system control boards has just occurred, resulting in a switchover failure or a delayed switchover.
Handling: Check the alarms reported by the NE, the records of switchovers between the main and standby system control boards, and the current switching state of the SNCP protection group.
Cause 4: On an SNCP ring formed by NEs using both SDH and Hybrid boards, some NEs run NE software earlier than OptiX RTN 600 V100R005 or OptiX RTN 900 V100R002C02, or E1_AIS insertion is disabled for some NEs.
Handling: Find the NEs whose NE software versions are earlier than OptiX RTN 600 V100R005 or OptiX RTN 900 V100R002C02, and the NEs for which E1_AIS insertion is disabled.
Possible Causes
The OptiX RTN equipment provides various clock alarms to help locate clock faults. When a clock system becomes faulty, rectify the fault based on the reported alarms. Possible causes, grouped by alarm, are as follows:
EXT_SYNC_LOS
No. Possible Cause Handling Procedure
Cause 1: The clock input mode (2 Mbit/s or 2 MHz) configured for an external clock source is different from the actual clock input mode.
Handling: On the NMS, check whether the clock input mode configured for the external clock source is the same as the actual clock input mode. If not, change the clock input mode for the external clock source. Then, check whether the alarm clears.
Cause 2: A system control, switching, and timing board is faulty.
Handling: On the NMS, check whether the system control, switching, and timing board reports hardware alarms such as HARD_BAD. If yes, clear the hardware alarms and then check whether the EXT_SYNC_LOS alarm clears.
Cause 3: A clock input cable is connected incorrectly.
Handling: Verify that the clock input cable is correctly connected. Verify that the port impedance of the equipment providing the clock source is the same as the impedance of the clock input port; if not (for example, a 75-ohm port is connected to a 120-ohm port), install an impedance coupler between the two ports. Check whether the clock input cable is disconnected or damaged.
Cause 4: The equipment providing a clock source is faulty.
Handling: Check whether the equipment providing the clock source is working correctly. If not, use other equipment to provide a clock source and then check whether the alarm clears.
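As background on the 75-ohm/120-ohm mismatch mentioned under Cause 3, the severity of such a mismatch can be quantified with the standard voltage reflection coefficient; this sketch is illustrative only and is not part of the handling procedure.

```python
import math

def reflection_coefficient(z_load: float, z_source: float) -> float:
    """Voltage reflection coefficient at an impedance discontinuity."""
    return (z_load - z_source) / (z_load + z_source)

def return_loss_db(z_load: float, z_source: float) -> float:
    """Return loss in dB; smaller values mean a worse mismatch."""
    gamma = abs(reflection_coefficient(z_load, z_source))
    return -20 * math.log10(gamma)

# Driving a 120-ohm port from a 75-ohm port reflects about 23% of the
# signal voltage (~12.7 dB return loss), which is why an impedance
# coupler is required between the two ports.
```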
SYNC_C_LOS
No. Possible Cause Handling Procedure
Cause 2: Service signals tracing a clock source are lost.
Handling: On the NMS, check whether any signal loss alarms such as ETH_LOS, MW_LOF, R_LOC, and T_ALOS are reported. If yes, clear these alarms and then check whether the SYNC_C_LOS alarm clears.
LTI
No. Possible Cause Handling Procedure
Cause 2: A line/tributary/link clock source is lost.
Handling: On the NMS, check whether any signal loss alarms such as ETH_LOS, MW_LOF, R_LOC, and T_ALOS are reported. If yes, clear these alarms and then check whether the LTI alarm clears.
Cause 3: Clock sources are set to work in non-revertive or locked mode. As a result, after the currently traced clock source is lost, an automatic switchover to a normal clock source fails.
Handling: On the NMS, check whether clock sources are set to work in non-revertive mode. If yes, change the mode to revertive and then check whether the LTI alarm clears. On the NMS, check whether a SYNC_LOCKOFF alarm is reported. If yes, clear the SYNC_LOCKOFF alarm and then check whether the LTI alarm clears.
SYN_BAD
No. Possible Cause Handling Procedure
Cause 1: The quality of the traced clock source deteriorates or clock sources are interlocked.
Handling: Replace the currently traced clock source and then check whether the alarm clears. If the alarm persists, check whether the input clock is correctly configured. If the configuration is incorrect, correct the clock configuration and then check whether the alarm clears.
Cause 2: The alarmed board is faulty.
Handling: On the NMS, check whether the alarmed board also reports hardware alarms such as HARD_BAD and TEMP_OVER. If yes, clear the hardware alarms and then check whether the SYN_BAD alarm clears.
CLK_NO_TRACE_MODE
No. Possible Cause Handling Procedure
Cause 1: No system clock source priority list is configured, and the NE uses its default system clock source priority list.
Handling: On the NMS, check whether a system clock source priority list is configured. If not, configure a system clock source priority list and add available clock sources to the list.
Fault symptoms: NEs connected through service ports such as air interfaces, Ethernet ports, and SDH ports are unreachable to their NMS.
Illustration
Handling measures:
l For cause 1, rectify service faults, including hardware faults and radio link faults.
l For cause 2, check whether the IDs, IP addresses, DCC channel attributes, and inband DCN attributes of the NEs were modified before they became unreachable to their NMS.
l For cause 3, replace the faulty system control boards.
Table 3-5 NEs connected through NMS/COM ports are unreachable to their NMS
Item Description
Fault symptoms: NEs connected through NMS/COM ports are unreachable to their NMS.
Illustration
Handling measures:
l For cause 1, check the network cable of the NMS. If it is faulty, replace it.
l For cause 2, check whether the IDs, IP addresses, DCC channel attributes, and inband DCN attributes of the NEs were modified before they became unreachable to their NMS.
l For cause 3, replace the faulty system control boards.
Item Description
Illustration
Handling measures:
l For cause 1, check whether the ID, IP address, DCC channel attributes, and inband DCN attributes of the NE were modified before it became unreachable to its NMS.
l For cause 2, verify that each NE on the DCN subnet has a unique ID and IP address.
l For cause 3, check the routing table of the gateway NE. If the gateway NE manages more than the recommended number of NEs, divide the ECC subnet into several smaller ones.
l For cause 4, replace the faulty system control board.
6. Some NEs may occasionally become unreachable to their NMS.
Handling: Verify that a minimum of 192 kbit/s of bandwidth is allocated to inband DCN. If the allocated bandwidth is lower than 192 kbit/s, packets from the NMS may be lost.
7. Direct connection from the faulty NE to its NMS fails.
Handling: Search for the IP address of the faulty NE on the NMS. If the IP address is not found, or if the IP address is found but the NMS still cannot reach the faulty NE, press the reset button on the system control board of the faulty NE.
If an OptiX RTN NE interworks with another NE that does not support auto-negotiation, configure an identical working mode (full-duplex or half-duplex) and an identical rate for the interworking ports at both ends.
E1/E3 port: Cable grounding. The shield layer of the coaxial cable connecting two 75-ohm ports must be grounded in the same mode at the two ends. If the coaxial cable is grounded in different modes at the two ends, electric potential difference and bit errors may occur.
5 Typical Cases
Fault Symptoms
The received signal levels (RSLs) at both ends of a 1+1 SD cross-ocean radio link fluctuated
dramatically, leading to bit errors or even link interruptions.
Procedure
Step 1 Checked the alarms reported by NEs at both ends of the radio link.
The NEs did not report any hardware alarms but frequently reported radio link alarms and
service interruption alarms.
Step 2 Checked the RSLs of the main and standby ODUs at each end.
The RSLs of the main and standby ODUs at each end fluctuated dramatically, with a
fluctuation range over 30 dB. Therefore, the fault was possibly caused by multipath fading.
Step 3 Checked the network plans and the mounting height difference between the main and standby
antennas at each end.
The mounting height difference between the main and standby antennas at each end was only
4 meters, so space diversity performance was poor.
NOTE
To protect long-distance cross-ocean radio links against multipath fading, take the following measures
during network planning:
l Ensure that the fading margin is greater than or equal to 30 dB.
l Increase the mounting height difference between the main and standby antennas at both ends of a
1+1 SD radio link.
Step 4 Adjusted the mounting heights of the main antennas to 24 meters and those of the standby
antennas to 10 meters.
The following figure shows the simulation result and illustrates satisfactory diversity
compensation.
NOTE
The value of K generally ranges from 0.67 to 1.33. In this case, the RSLs of the main and standby
antennas are not correlated with each other. When designing mounting heights for main and standby
antennas, keep appropriate antenna spacing for minimizing the impact of reflection on radio links. When
reflection causes high attenuation on the main path, the attenuation on the standby path is low.
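The note above can be made concrete with a textbook two-ray model (direct ray plus sea-reflected ray): over a flat reflecting surface the path-length difference is roughly 2*h1*h2/d, so changing antenna heights shifts the interference pattern at the receiver and decorrelates the main and standby RSLs. The formula and the example numbers below are a generic approximation, not values from this document.

```python
import math

def two_ray_phase_diff(h_tx_m: float, h_rx_m: float,
                       dist_m: float, freq_hz: float) -> float:
    """Phase difference (radians) between the direct and the
    surface-reflected rays in a flat-earth two-ray model."""
    wavelength = 3e8 / freq_hz          # free-space wavelength
    path_diff = 2 * h_tx_m * h_rx_m / dist_m  # flat-earth approximation
    return 2 * math.pi * path_diff / wavelength

# Example with the heights from Step 4 (24 m main, 10 m standby) over an
# assumed 40 km path at an assumed 15 GHz: the two height pairs see
# different phase offsets, which is what makes diversity effective.
phase = two_ray_phase_diff(24, 10, 40_000, 15e9)
```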
----End
Fault Symptoms
Bit errors occurred in the services carried by a 2.5 km long radio link between NE A and NE
B. Both NEs used antennas with a diameter of 0.6 meters and 15 GHz ODUs. The IF1 boards
on both NEs worked in mode 5 (28 MHz/QPSK).
Procedure
Step 1 Checked the alarms and logs of the two NEs.
The NEs did not report any hardware alarm. NE A reported an MW_FEC_UNCOR alarm, but
NE B did not.
The RSL at NE A was –62 dBm and that at NE B was –70 dBm. These two values were
greater than the receiver sensitivity (–85 dBm) in mode 5.
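The comparison in this step is effectively a fade-margin calculation (RSL minus receiver sensitivity); a minimal sketch using the values from this case:

```python
def fade_margin_db(rsl_dbm: float, sensitivity_dbm: float) -> float:
    """Fade margin: how far the received signal level sits above
    the receiver sensitivity."""
    return rsl_dbm - sensitivity_dbm

# Values from this case (mode 5 receiver sensitivity: -85 dBm):
margin_a = fade_margin_db(-62, -85)  # NE A: 23 dB of margin
margin_b = fade_margin_db(-70, -85)  # NE B: 15 dB of margin
```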
Step 4 Used one of the following methods for eliminating interference signals:
l Using frequencies that are not affected by interference signals (tests showed that all the sub-bands supported by the ODU suffered interference).
l Using antennas with a diameter greater than 0.6 meters (the workload is heavy and
interference signals are also amplified).
l Changing a polarization direction (cross-polarization discrimination of 30 dB can be
achieved).
----End
Procedure
Step 1 Checked the historical receive power at the two ends of the link.
Step 2 It was found that the receive power was stable when the link was interrupted. Therefore, the
interruption was not caused by rain fading or multipath fading. Generally, if a link is
interrupted when the receive power is higher than sensitivity, the interruption is caused by
interference.
Step 3 Queried the MSEs of related IF boards. The MSEs were greater than -30 dB, which indicated
that there was interference.
Step 4 Started the frequency scanning function provided with the equipment. No interference was
found around the operating frequency of the link.
Step 5 Configured inloops at IF ports. The MSEs were normal during the inloops. Therefore, the
interruption was not caused by an IF board fault.
Step 6 Replaced the IF cable. The MSEs were still poor. Therefore, the interruption was not caused
by an IF cable failure.
Step 7 Queried the MSEs of adjacent sites, and found that there was interference.
Step 8 Used the spectrum analyzer to scan the intermediate frequency, and found that there was
interference on multiple frequencies near the 140 MHz downstream intermediate frequency.
These frequencies were found to be frequencies used by civil aviation.
Step 9 Replaced the IF cable with an enhanced shielded cable because the intermediate frequency could not be changed. The MSEs then improved.
----End
Procedure
Step 1 Analyzed the cause why protection switching did not occur. Checked the 1+1 SD protection
group configurations, and found that reverse switching was enabled and the WTR time of the
1+1 SD protection was set to 10 minutes.
If a member link in a 1+1 protection group fails but no switching occurs, the cause is usually
that the reverse switching timer has not expired.
Step 2 Queried the receive power of the standby ODU of NE A, and found that the receive power
was -40 dBm, which was a normal value. Forcibly switched services to the standby link on
NE A. The services recovered.
Step 3 Reverse switching usually occurs because a fault (such as a hardware fault) occurs in the
transmit part at the source end but the equipment cannot detect the fault. Queried the transmit
power and receive power of the main ODU of NE A. The transmit power was 23 dBm, which
was a normal value; the receive power was still -90 dBm.
Step 4 Replaced the main ODU of NE A and queried the receive power of the main ODU. The
receive power was still -90 dBm.
Step 5 Replaced the flexible waveguide connected to the main ODU and queried the receive power
of the main ODU. The receive power was still -90 dBm.
Step 6 Checked the connections between the ODUs and antenna, and found a sign of water at the
antenna feed port and then found water accumulated in the antenna feed. The accumulated
water caused a failure to transmit RF signals.
Step 7 Emptied the water in the antenna feed, dried it, and installed it again. Queried the receive
power of the main ODU. The receive power was -35 dBm, which was a normal value.
----End
Procedure
Step 1 Checked the distribution of the NEs on which services were interrupted and the service flow
direction.
NE1704 converged the interrupted services, so the service interruption was related to
NE1704.
Step 2 Checked alarms and operation records on NE1704.
NE1704 reported an MW_CFG_MISMATCH alarm, and the Hybrid radio E1 capacity was
changed on NE1704 right before the services were interrupted. It was inferred that the
services were interrupted due to an E1 capacity mismatch between NE1704 and NE1705.
Step 3 Corrected the Hybrid radio E1 capacity on NE1704.
The fault was rectified.
----End
Procedure
Step 1 Checked the ODU receive power that was recorded during the alarm period.
The difference between the maximum receive power and the minimum receive power was
more than 40 dB, and the minimum receive power was close to or less than the receiver
sensitivity. Therefore, it was inferred that the fault was caused by spatial fading.
----End
l Minimize the impact of multipath fading by using one of the following methods,
depending on the actual conditions:
– Use low capacity, low-order modulation schemes, and low bandwidths.
– Increase the height difference between antennas at both ends providing that line-of-
sight (LOS) is guaranteed.
– Add two antennas and configure an SD protection group.
Procedure
Step 1 Checked historical performance events and the receive power within the period of alarm
reporting.
The receive power was normal. Because the alarms did not persist, loopback tests were
inapplicable. The replacement method could be used for fault locating. The receive end was
suspected faulty. However, it was difficult to replace an ODU. Because the 2+0 links used the
same type of ODUs, the IF cables at site B could be interchanged for fault locating.
Step 2 Interchanged the IF cables at site B and checked for alarms for two days.
NE B-2 still reported service alarms. Therefore, site B was not faulty, and site A was possibly
faulty.
Step 3 Restored the IF cable connections at site B, interchanged the IF cables at site A, and checked
for alarms for two days.
NE B-1 reported service alarms. Therefore, the IF cable connecting NE A-2 and ODU A-2
was faulty.
----End
Fault Symptoms
After an OptiX RTN 600 NE was configured, it operated normally. However, its services were
interrupted after the NE restarted following a power failure.
Procedure
Step 1 Checked alarms.
Step 3 Changed the Tx frequency to a correct value based on the network planning information.
The fault was rectified.
----End
Procedure
Step 1 Queried historical receive power values of the radio link.
The receive power decreased to a value close to the receiver sensitivity when an alarm was
reported. Most alarms were reported during the night or in the early morning. When the
weather was favorable at noon, the receive power was normal. Therefore, intermittent radio
link interruptions were caused by multipath fading.
Step 2 Checked annual interruption time predicted for the radio link.
The actual annual interruption time was longer than the predicted time of 1877 seconds.
Therefore, the fading margin was insufficient.
Step 3 Checked the network planning methods.
The ITU-R-P.530-7/8 method was used. The area covered by the radio link was in the Middle
East, and therefore the ITU-R-P.530-9 method should be used.
Step 4 Used the ITU-R-P.530-9 method to predict annual interruption time without changing other
conditions.
The obtained value was about 175833 seconds, which was longer than the value obtained
using the ITU-R-P.530-7/8 method.
According to the preceding analysis, the actual annual interruption time was much longer than
the predicted time because an incorrect multipath algorithm was used in network planning.
Step 5 Planned this link using a correct algorithm and deployed 1+1 SD protection for the link. The
link availability met service requirements.
----End
Procedure
Step 1 Checked historical alarms. It was found that site F reported MW_LOF and MW_BER_SD
alarms, and site E reported an MW_RDI alarm when transient link interruption occurred. This
indicated that a unidirectional link fault occurred.
Step 2 Analyzed link performance curves and found that the waveforms of MSE performance curves
were almost the same for the four IF directions at site F. It was unlikely that the four IF boards
and ODUs were faulty at the same time. It was suspected that a fault occurred during space
propagation.
Step 3 Analyzed the link performance curves and link interruption time. It was found that even slight
decrease of receive power caused link interruption. This indicated that the demodulation
threshold signal level of receiver had degraded, which was often caused by interference.
According to the link performance curve, transient link interruption occurred at night. During
the day, the receive power was stable and no transient link interruption occurred.
Step 4 Applied for a maintenance window, performed frequency scanning when transient link
interruption occurred at night, and found that co-channel interference existed.
Step 5 Checked for interference sources. Because frequencies were strictly managed and the links
spanned over remote areas, it was unlikely that the links were interfered by devices from other
carriers. Therefore, it was necessary to check for interference within the microwave network.
Step 6 Analyzed the network plan. According to the network plan, the A-B and E-F links operated at
the same frequency. In addition, sites A, B, E, and F were almost on a line. No angle was
formed to avoid co-channel interference, resulting in over-reach interference.
Step 7 Changed the frequency at which the E-F link operated. The fault was cleared.
----End
Fault Symptoms
In an XPIC-enabled 4+0 long haul microwave link group, continuous bit errors occurred on a
channel whereas the performance of the other three channels was normal.
Procedure
Step 1 Checked the RSLs of the four channels. All of them met requirements in the network plan.
This indicated that antennas were properly aligned.
Step 2 Checked the link MSE curves. The MSE value for the faulty channel dramatically fluctuated
and the MSE values for the other three channels were stable, which indicated that interference
might exist.
Step 3 Muted the peer RFU on the faulty channel and checked the local RSL value. The local RSL
value was -90 dBm, which indicated that no interference existed and no signal leakage
occurred.
Step 4 Suspected that crossmodulation occurred on the channel. Checked whether elliptic
waveguides and flexible waveguides were properly routed and connected. It was found that an
elliptic waveguide was fixed using angle iron instead of required fixing clamps, resulting in
deformation of the elliptic waveguide.
Step 5 Replaced the deformed elliptic waveguide. The fault was cleared.
----End
l Route elliptic waveguides as designed, ensure that the waveguide bend radius meets requirements, and use the matching fixing clamps to fix the waveguides, preventing the waveguides from being deformed.
l Ensure that no copper scale enters a waveguide when making connectors for the
waveguide.
l Properly connect and waterproof waveguides.
Fault Symptoms
Figure 5-6 shows the network topology, where two OptiX PTN 3900s transmitted services to
each other through radio links set up by five OptiX RTN 620s. After a broadcast storm
occurred on the network, NE A became unreachable to its NMS and its converged services
were interrupted.
Procedure
Step 1 Checked for equipment alarms and radio link alarms on the NEs.
No equipment alarm or radio link alarm was found. Therefore, it was suspected that NE data
was incorrectly configured.
Step 2 Checked operation logs of the OptiX RTN 620s on the U2000.
On NE5, a bridge service was configured between the EMS6 board in slot 4 and the EMS6
board in slot 8 when the fault occurred.
Step 3 Checked the cable connection between the four ports and the service configuration data of
NE5.
Port 1 and port 2 on the EMS6 board in slot 4 were respectively connected to port 1 and port 2
on the EMS6 board in slot 8 using network cables. Parameter Hub/Spoke, however, was
incorrectly set for the four ports. As a result, a loop formed among the four ports and packets
were forwarded among the four ports, leading to the broadcast storm. For the cable
connection between the four ports, see the following figure.
----End
Fault Symptoms
Figure 5-7 shows the network topology, where an OptiX RTN 605 1F and an OptiX RTN 620
set up a radio link. Each NE was configured with EPLAN services and connected to a
computer. The NEs did not pass a ping test but did not report an alarm.
Procedure
Step 1 Checked for equipment alarms and radio link alarms on the NEs.
No equipment alarm or radio link alarm was found. Therefore, it was suspected that NE data
was incorrectly configured.
Step 2 Checked the working mode parameters for the IF boards at both ends of the radio link.
The E1 capacity was set to different values, resulting in different bandwidths for data services
and finally service interruptions. No alarm indicating E1 capacity inconsistency was provided.
Step 3 Changed E1 capacities to ensure that both NEs had the same E1 capacity.
----End
Fault Symptoms
A mobile backhaul network was reconstructed to a packet network. The original 2G BTS
services were transmitted through CES E1 services. After services of site A were cut over, the
services became unavailable and the BTS failed to be started.
Procedure
Step 1 Checked the service configuration. To be specific, performed an LSP ping test and PW ping
test to check packet service configuration. The test results showed that the packet service
configuration was correct.
Step 2 Checked the CES service configuration. The frame format of the port on the RTN equipment
was set to CRC4 multiframe. Changed the frame format to unframed and the CES emulation
mode from CESoPSN to SAToP. The BTS services recovered.
Step 3 Communicated with wireless engineers and found that the frame format was set to double
frame for E1 signals of the BTS. Different frame formats caused the service interruption.
----End
Fault Symptoms
Figure 1 shows a mobile network, where OptiX RTN 600 V100R003s provided backhaul
transmission. Packet loss occurred when BTS1 at site 1 and BTS2 at site 2 were pinged from
the RNC, but did not occur when BTS3 at site 3 was pinged.
Procedure
Step 1 Suspected that the radio bandwidth between NE 3-002 and NE 3-003 was insufficient,
causing the loss of ping packets.
Step 2 Analyzed the RMON data of NE 3-002 to check whether packet loss was caused by
insufficient radio bandwidth between site 2 and site 3.
The maximum traffic volume of NE 3-001 already reached its maximum air interface
bandwidth (25 Mbit/s). Therefore, packet loss was caused by congestion. For details, see the
following figure.
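The congestion conclusion amounts to a simple capacity check: once the offered traffic reaches the 25 Mbit/s air-interface capacity, excess packets (including the ping packets) are dropped. The drop-rate formula below is a generic steady-state approximation for illustration, not a value measured in this case.

```python
def approx_drop_rate(offered_mbps: float, capacity_mbps: float) -> float:
    """Approximate fraction of traffic dropped under sustained overload."""
    if offered_mbps <= capacity_mbps:
        return 0.0
    return 1.0 - capacity_mbps / offered_mbps

# Air-interface capacity in this case was 25 Mbit/s:
assert approx_drop_rate(20, 25) == 0.0   # below capacity: no congestion loss
assert approx_drop_rate(30, 25) > 0.15   # overloaded: packets are dropped
```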
----End
Procedure
Step 1 Checked IF alarms. The NE did not report IF alarms, indicating that the IF link worked
properly.
Step 2 Queried the ARP entry table. The port failed to learn the IP address of the peer port.
Step 3 Queried tunnel configurations. The next hop IP address was incorrectly configured (not
configured as the IP address of the peer port) for the tunnels.
Step 4 Rectified the next hop IP address for the tunnels. Services were restored, and the port could
properly send packets.
----End
Procedure
Step 1 Ran the traceroute command to check the connectivity of the MPLS tunnel. The tunnel was
interrupted between two transit nodes.
Step 2 Queried ARP entries. The ARP entry mapped to the IP address of the interconnected port on
Transit_B did not exist on Transit_A, resulting in service interruption.
Step 4 Checked port configurations. The tag attributes of interconnected ports were inconsistently
configured (one hybrid, the other tag-aware).
Analyzed VLAN attributes of ARP packets:
l A hybrid port sends ARP packets that do not carry VLANs, and receives ARP packets
with default VLANs or without VLANs.
l A tag-aware port sends ARP packets with default VLANs and receives ARP packets
with default VLANs.
Therefore, a hybrid port can receive and process ARP packets sent from a tag-aware port, but
a tag-aware port cannot receive ARP packets sent from a hybrid port or parse out the
corresponding MAC address.
Step 5 Changed the tag attribute of the peer port to tag-aware. ARP learning was normal and services
were restored.
----End
Procedure
Step 1 Checked whether the NE reported any LAG-related alarm. The NE did not report such an
alarm.
Step 2 Checked LAG configurations at both ends. The RTN NE was configured to work in revertive
mode, whereas the switch was configured to work in non-revertive mode.
After the port on the switch was enabled, the RTN NE reverted to the master port, whereas the
working port of the switch remained the slave port, resulting in service interruption.
Step 3 Configured the switch to work in revertive mode. Services recovered.
----End
Procedure
Step 1 Because the 3G services carried by Ring 2 were normal, checked the traffic rates on NE 101,
which is the convergence node of Ring 1.
The received and transmitted traffic rates on ports 4, 5, and 6 were 97 Mbit/s, close to the
FE bandwidth, but only port 6 was connected to the switch.
Step 2 Queried the usage of ports 4 and 5.
According to alarms and operation logs, engineers had earlier enabled ports 4 and 5 and
performed loopbacks on them.
Step 3 Disabled ports 4 and 5. Services recovered.
----End
Procedure
Step 1 Checked the end-to-end configuration and connections related to NodeB19 and found incorrect
operations.
l According to the network plan, the services of NodeB19 should be backhauled to the CX
through gateway NE11.
l The actual gateway of NodeB19 was NE12.
l To match the plan, engineers connected NE11 and NE12 through Ethernet service ports
and configured the services of NodeB19 over the route NodeB19 -> relay nodes ->
NE12 -> NE11 -> CX.
As a result, an E-LAN loop formed among NE11, NE12, and the CX, causing a broadcast
storm and intermittent service interruptions.
Step 2 Removed the interconnection between NE11 and NE12, modified the plan, and re-configured
the services so that the services of NodeB19 were backhauled to the CX directly through
NE12. All site services were restored.
----End
Procedure
Step 1 Checked alarms on nodes of Ring 1.
During the fault period, there were no link-related alarms, no hardware alarms, and no
software alarms.
Step 2 Checked ERPS parameter configurations. Guard Time was set to an excessively small value
(10 ms) for Ring 1. If the time for ERPS protocol packets to travel around the ring once is
greater than 10 ms, a link interruption on the ring may cause an ERPS node to receive
outdated R-APS (SF) packets. The node may then unblock its ERPS port incorrectly,
forming a loop on Ring 1 and causing a network storm.
Step 3 Changed Guard Time to the default value (500 ms). The fault was rectified.
----End
Guard Time must not be set to a value smaller than the time required for ERPS protocol
packets to travel around the ring once.
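This rule can be sketched as a lower-bound check; the per-hop delay figure below is an assumed placeholder, since the real round-ring time depends on the equipment and should be measured.

```python
def min_guard_time_ms(num_hops: int, per_hop_delay_ms: float) -> float:
    """Lower bound for Guard Time: the time for R-APS packets to
    travel around the ring once (assumed per-hop forwarding delay)."""
    return num_hops * per_hop_delay_ms

# Example: a 10-hop ring at an assumed 2 ms per hop needs Guard Time
# above 20 ms, so the 10 ms setting in this case was unsafe while the
# 500 ms default leaves ample margin.
assert min_guard_time_ms(10, 2.0) == 20.0
```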
Fault Symptoms
The radio link in 1+1 protection between NE549 and NE606 became faulty, resulting in a
service interruption. The faulty radio link automatically recovered 5 minutes later.
Procedure
Step 1 Checked historical alarms of the two NEs.
NE549 reported a RADIO_MUTE alarm when the radio link was interrupted.
A command of muting an ODU was executed before the radio link was interrupted. This
misoperation triggered the RADIO_MUTE alarm.
Step 3 Checked the switching state of the 1+1 protection group because a RADIO_MUTE alarm
should have triggered a 1+1 protection switchover.
The 1+1 protection group on NE549 was in the forced switching state and was kept working
on the main channel, so the RADIO_MUTE alarm could not trigger a 1+1 protection
switchover.
NOTE
An NE automatically unmutes its ODU 5 minutes (the default time) after the ODU is muted. This
explains why the radio link between NE549 and NE606 automatically recovered 5 minutes after the link
interruption.
Step 4 Cleared the forced switching state of the 1+1 protection group on NE549 so the protection
group entered the automatic switching state.
----End
Procedure
Step 1 Checked alarms.
Boards in slots 1, 5, 6, and 7 reported the HARD_BAD alarm.
l The PXC board in slot 1 reported a HARD_BAD alarm, whose parameters indicated that
the 38M clock was lost and the analog phase-locked loop (PLL) was unlocked.
l The boards in slots 5, 6, and 7 reported the HARD_BAD alarm, whose parameters
indicated that the 38M clock was lost and the PXC board in slot 1 was faulty. The fault
caused loss of the first 38M clock.
Step 2 Checked the XCP_INDI alarm.
The HARD_BAD alarm reported by the board in slot 1 triggered a switchover, causing the
SCC board to report an XCP_INDI alarm.
Step 3 Replaced the PXC board in slot 1.
The alarms cleared.
----End
interworked with the board in slot 8 on NE B through an air interface. Certain base stations
traced clock signals from NE A and NE B, and the clock signals became abnormal.
Procedure
Step 1 Checked the alarms of the NEs.
l NE A reported HARD_BAD alarms from February 28 to April 11 and SYN_BAD
alarms in May. The value of the first parameter of HARD_BAD alarms was 6, indicating
that the digital phase-locked loop (PLL) was abnormal. The SYN_BAD alarms indicated
that the traced clock source deteriorated.
l NE B reported an RP_LOC alarm, indicating that the clock signals received from the
PLL were lost.
The clocks of the two NEs were abnormal.
Step 2 Analyzed the clock configuration data.
- Checked the system clock source priority list of NE A and the clock source that NE A
was tracing.
– Priority 1: 5-IFH2-(SDH) air-interface link clock
– Priority 2: internal clock source
– Clock source that NE A was tracing: 5-IFH2-(SDH) air-interface link clock
- Checked the clock configuration data of NE B.
Synchronous Ethernet was not enabled for NE B. In this case, the RTN 605 1F traced the
air-interface link clock of the RTN 620 by default. If two RTN 605 1F/2F NEs are
interworking, the Tx low NE traces the clock of the Tx high NE by default.
The preceding information showed that NE A traced the link clock from its IF board in slot 5
and that NE B traced the link clock from its IF board in slot 8. Because these two IF boards
interworked with each other over the air interface, the two NEs traced each other's clock.
When clocks interlock in this way, a small frequency deviation is gradually amplified until it
finally falls outside the permitted range.
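The interlocking effect can be sketched numerically. The following is a minimal illustration, not Huawei software: it assumes that each NE, when locking onto its partner, reproduces the partner's frequency error plus a small recovery error of its own, and the 0.05 ppm per-step error is a hypothetical value chosen only to make the trend visible.

```python
# Illustrative sketch (not device firmware): two NEs that trace each
# other's recovered clock form a positive feedback loop, so a small
# deviation grows instead of converging.

def mutual_tracing(initial_dev_ppm, recovery_error_ppm, steps):
    """Model NE A and NE B tracing each other's air-interface clock."""
    dev_a, dev_b = initial_dev_ppm, 0.0
    history = []
    for _ in range(steps):
        # Each NE locks onto the partner's current frequency and adds
        # its own (hypothetical) recovery error in the same direction.
        dev_a, dev_b = dev_b + recovery_error_ppm, dev_a + recovery_error_ppm
        history.append(max(abs(dev_a), abs(dev_b)))
    return history

drift = mutual_tracing(initial_dev_ppm=0.01, recovery_error_ppm=0.05, steps=100)
# The deviation keeps growing; a free-run accuracy limit such as the
# +/-4.6 ppm of a G.813 SEC clock would eventually be exceeded.
print(f"deviation after 100 steps: {drift[-1]:.2f} ppm")
```

Breaking the loop, as in Step 3 below, replaces one of the mutual references with an independent source, so the error no longer feeds back.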
Step 3 Changed the clock configuration of NE A so that NE A traced the link clock from its IF board
in slot 6.
----End
Procedure
Step 1 Analyzed the service signal flow.
The alarmed E1 signal was received from NE2.
Step 2 Checked alarms reported by NE2.
NE2 did not report any hardware alarms or service alarms.
Step 3 Set an inloop at the tributary board (point 1) on NE2, and connected an E1 bit error rate
(BER) tester to point A (third-party SDH equipment).
The service had bit errors.
NOTE
The E1 BER tester was connected to the OptiX OSN equipment and the corresponding E1 cross-
connections were modified, because NE1 had no E1 tributary port.
Step 6 Checked the interconnection configuration data on the OptiX RTN equipment, OptiX OSN
equipment, and third-party SDH equipment.
Each of the preceding devices used its own clock source, and the clocks were not synchronized.
Step 7 Configured the preceding equipment to trace their upstream clocks.
Clock synchronization was achieved across the entire network.
Step 8 Performed the tests again using the E1 BER tester.
No bit errors occurred.
NOTE
All equipment on an SDH network must trace the same reference clock. In the preceding example, the OptiX
RTN equipment, OptiX OSN equipment, and third-party SDH equipment are interconnected through
SDH ports. After E1 services are encapsulated and mapped several times, serious jitter may be generated,
resulting in bit errors. To prevent similar issues, plan and implement a clock synchronization solution
when building SDH or microwave transmission networks.
----End
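As a rough illustration of why unsynchronized clocks degrade E1 services, the interval between buffer slips can be estimated from the relative frequency offset. This is a back-of-the-envelope sketch, not a field measurement; the 9.2 ppm figure is an assumption corresponding to two free-running clocks each at the G.813 accuracy limit of +/-4.6 ppm.

```python
# Back-of-the-envelope sketch: when two E1 endpoints run on
# unsynchronized clocks, the receive buffer fills or empties at the
# relative frequency offset, producing periodic frame slips (and, on
# SDH, pointer adjustments that show up as jitter and bit errors).

E1_BIT_RATE = 2_048_000   # bits per second
E1_FRAME_BITS = 256       # one E1 frame = 32 timeslots x 8 bits

def slip_interval_seconds(offset_ppm):
    """Seconds between controlled frame slips for a given clock offset."""
    drift_bits_per_s = offset_ppm * 1e-6 * E1_BIT_RATE
    return E1_FRAME_BITS / drift_bits_per_s

# Hypothetical worst case: two free-running clocks at +/-4.6 ppm can
# differ by up to 9.2 ppm, giving a slip roughly every 13.6 seconds.
print(f"{slip_interval_seconds(9.2):.1f} s between slips")
```

The smaller the residual offset, the longer the slip interval, which is why a network-wide reference clock is required rather than merely "close" free-running clocks.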
Procedure
Step 1 Checked the cable connections at NodeB 1 because it reported an alarm indicating the loss of
clock signals from an external clock port.
The external clock port on NodeB 1 was a 75-ohm coaxial port and the external clock port on
NE B was a 120-ohm twisted-pair port. To connect the external clock port on NE B to the
external clock port on NodeB 1, an impedance converter box (Balun-box) was installed on the
external clock port of NE B.
According to the wire connection diagram of the converter box, the Tx wire from NE B was
connected to the Rx end of the converter box, and the Rx wire from NE B was connected to
the Tx end of the converter box. Examination of the cable connections showed that the Tx wire
from the converter box was connected to the Rx end of NodeB 1 and that the Rx wire from the
converter box was connected to the Tx end of NodeB 1. As a result, the Tx wire from NE B
was connected to the Tx end of NodeB 1 and the Rx wire from NE B was connected to the Rx
end of NodeB 1, so no signals were received.
Step 2 Corrected the cable connection. NodeB 1 could trace clock signals normally.
----End
Procedure
Step 1 The server could be pinged, but services were unavailable. This symptom is usually caused by
packet loss. Suspected that insufficient radio link capacity had caused congestion and,
consequently, packet loss.
Step 2 Queried the air interface bandwidth utilization of the radio link and found that no congestion
occurred on the link.
Step 3 Queried the RMON performance statistics of the interconnected Ethernet ports on the RTN
and PTN equipment and found counts of oversized packets and corrupted packets.
Step 4 Queried the port configurations. The maximum frame length configured for the Ethernet port
of the RTN equipment was 1522 bytes, and that configured for the Ethernet port of the PTN
equipment was 1620 bytes. The maximum frame lengths configured for the interconnected
ports were inconsistent.
Step 5 Changed the maximum frame length to 1620 bytes for the Ethernet port of the RTN
equipment. Users could access services on the server.
----End
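The mismatch found in Steps 3 and 4 can be illustrated with a minimal sketch. The frame sizes below are hypothetical and `port_accepts` is not a real device API; the point is only that a frame accepted by the port configured for 1620 bytes is counted as oversized and dropped by the port configured for 1522 bytes, which looks like silent packet loss while small frames (such as pings) still get through.

```python
# Minimal sketch: interconnected ports with inconsistent maximum
# frame lengths drop large frames on one side only.

def port_accepts(frame_len, max_frame_len):
    """Return True if the port forwards the frame, False if it drops it."""
    return frame_len <= max_frame_len

RTN_MAX, PTN_MAX = 1522, 1620   # values from the port configurations

for frame_len in (64, 1518, 1600):
    ok = port_accepts(frame_len, PTN_MAX) and port_accepts(frame_len, RTN_MAX)
    print(f"{frame_len}-byte frame end to end: {'forwarded' if ok else 'dropped'}")
# After Step 5 aligns both ports to 1620 bytes, the 1600-byte frame
# is forwarded end to end as well.
```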
Procedure
Step 1 Connected the hub to a PC and used the data service packet sniffer to analyze the OSPF
packets received by NE1.
The designated router (DR) IP addresses in the OSPF packets were xx.xx.xx.1, but the IP
address of the NE that sent the DR packets was xx.xx.xx.2. Therefore, NE1 did not receive
any DD packets sent by the DR elected on the OSPF subnet. As a result, NE1 could not create
an adjacency with the DR and could not learn OSPF routes.
Step 2 Sniffed and analyzed OSPF packets at another OptiX RTN NE that was connected to the
MPLS network and was operating normally.
The OptiX RTN NE received OSPF packets from the DR. Therefore, an OptiX RTN NE fault
was ruled out.
Step 3 Increased the OSPF priority of NE1's gateway (IP address: xx.xx.xx.2) so that the gateway
became the DR on the subnet.
NE1 then learned OSPF routes, and NE2 became reachable from the NMS.
----End
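The fix in Step 3 relies on standard OSPF DR election: the eligible router with the highest priority wins, with the highest router ID as the tie-breaker. The following is a minimal sketch of that rule; the router IDs and priority values are hypothetical, since the case gives only the last octets of the addresses and no priority values.

```python
# Hedged sketch of OSPF DR election on a broadcast subnet
# (highest priority wins; ties broken by highest router ID).

def ip_key(ip):
    """Compare dotted-decimal router IDs numerically, not lexically."""
    return tuple(int(part) for part in ip.split("."))

def elect_dr(routers):
    """routers: list of (router_id, priority); priority 0 is ineligible."""
    eligible = [r for r in routers if r[1] > 0]
    return max(eligible, key=lambda r: (r[1], ip_key(r[0])))[0] if eligible else None

# Hypothetical values: the original DR outranks the gateway at first.
before = [("10.0.0.1", 10), ("10.0.0.2", 1)]
after  = [("10.0.0.1", 10), ("10.0.0.2", 100)]   # gateway priority raised
print(elect_dr(before))  # -> 10.0.0.1
print(elect_dr(after))   # -> 10.0.0.2
```

Once the gateway wins the election, NE1's adjacency forms with a DR that actually exchanges DD packets with it, so route learning succeeds.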
Fault Symptom
During MIMO commissioning, the XPI value cannot be adjusted to within the range of
18 dB to 22 dB. Figure 5-21 shows the MIMO commissioning result.
Applicable Version
The following adjustment method applies only to RTN 900 V100R011C00SPC210 and later
versions and is not applicable to RTN 300 devices.
Before using the following method to adjust the XPI value of MIMO, obtain the customer's
approval. This method adjusts the transmit power at the transmit end, so the actual transmit
power differs from the planned value.
Procedure
As shown in Figure 5-22, the XPI and MMI values at the receive end are related to the
transmit power at the transmit end. Taking Rx1 as an example, Table 5-3 and Table 5-4
describe the impact of the adjusted transmit power on the XPI and MMI values at the receive
end when the adjusted H1 and V1 values are the same or different.
Table 5-3 Impact of the adjusted transmit power on the XPI and MMI values at the receive
end (when the adjusted H1 and V1 values are the same)

If the transmit power of H1 and V1 increases by 2 dB and the transmit power of H2 and V2
remains unchanged, the XPI and MMI values at the receive end (Rx1) change as follows:

Item                                       H1    V1
Same-plane XPI                             0     0
Different-plane XPI                        +2    +2
MMI (same-polarization power difference)   +2    +2
Table 5-4 Impact of the adjusted transmit power on the XPI and MMI values at the receive
end (when the adjusted H1 and V1 values are different)

Item                                       H1    V1
Same-plane XPI                             +1    -1
Different-plane XPI                        +2    +1
MMI (same-polarization power difference)   +2    +1
How to adjust TSL of MIMO.xlsx shows the formulas for adjusting the XPI and MMI values.
Figure 5-23 shows the MIMO commissioning result (after adjustment).
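The entries in Tables 5-3 and 5-4 are consistent with a simple linear model in dB, sketched below. This model is an assumption inferred from the two tables, not the authoritative method (the actual formulas are in How to adjust TSL of MIMO.xlsx); in particular, Table 5-4 matches hypothetical adjustments of +2 dB on H1 and +1 dB on V1.

```python
# Assumed linear model (dB) inferred from Tables 5-3 and 5-4.
# dh1, dv1: transmit-power adjustments (dB) of H1 and V1 at the
# transmit end; H2 and V2 are assumed unchanged.

def rx1_deltas(dh1, dv1):
    """Return the resulting change (dB) of each value at Rx1."""
    return {
        # Same-plane XPI depends only on the H1/V1 power difference.
        "same_plane_xpi":      {"H1": dh1 - dv1, "V1": dv1 - dh1},
        # Different-plane XPI and MMI shift with the adjusted power itself.
        "different_plane_xpi": {"H1": dh1,       "V1": dv1},
        "mmi":                 {"H1": dh1,       "V1": dv1},
    }

print(rx1_deltas(2, 2))  # reproduces Table 5-3 (H1 and V1 both +2 dB)
print(rx1_deltas(2, 1))  # reproduces Table 5-4 (hypothetical +2 dB / +1 dB)
```

Under this reading, equal adjustments leave same-plane XPI unchanged while shifting MMI, and unequal adjustments trade the two off, which is why the adjustment must be chosen to land XPI in the 18 dB to 22 dB window without distorting MMI.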
A Appendix

[Appendix table: only the first-column values (0.1, 0.05, 0.01, 0.003, 0.001) are recoverable
from the source layout; the remaining columns are not reproduced here.]