eSpace EMS Troubleshooting Guide (V200R001C02SPC200 - 04)
V200R001C02SPC200
Troubleshooting Guide
Issue 04
Date 2012-06-08
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute the warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Contents
1 Conventions
2 Overview
2.1 Fault Source
2.2 Precautions for Troubleshooting
2.3 Requirements on Maintenance Personnel
2.4 Troubleshooting Flow
2.4.1 Troubleshooting Flowchart
2.4.2 Collecting Fault Scenario Information
2.4.3 Locating and Rectifying Faults
2.4.4 Checking Fault Rectification
2.4.5 Generating a Fault Rectification Report
2.4.6 Contacting Huawei
2.5 Obtaining Huawei Technical Support
5 Troubleshooting
7 Troubleshooting Cases
7.1 Filesync Exception
7.2 DataGuard Synchronization Exception
7.3 GDR Process Exception
7.4 Modifying Information About the Master Node Corresponding to the Mediation Node After Switching
7.5 The Performance Data of Some Network Devices Cannot Be Collected on the eSpace EMS
7.6 Fault Rectification About IP PBX Performance Data Collection Status
7.7 Fault Rectification in the File System
7.8 eSpace EMS Page Is Leftward Offset in IE 8.0
7.9 File Download Dialog Box Is Displayed After a Click on the Upload Icon
7.10 Failure to Export Data
7.11 Browser Page Cannot Be Properly Displayed or Some Browser Functions Are Unavailable
1 Conventions
2 Overview
Having the basic knowledge of network devices, operating systems (OSs), databases,
understanding the common commands, and being skillful in using them to perform
maintenance.
Understanding the logical structure of the eSpace EMS networking, the mapping between
the eSpace EMS and on-site devices, and the physical connections between on-site
devices.
Being familiar with the system structure of the eSpace EMS and skillful in operating the
eSpace EMS.
Understanding the basic methods of locating and rectifying faults.
For the system status information possibly related to a fault, see 6 Collecting Fault
Information.
For a fault reported by a customer, the customer service personnel collect the fault scenario information.
For a fault that is reported in an alarm or occurs during routine maintenance, the maintenance
personnel collect the fault scenario information.
You need to collect the detailed information about a device only after identifying the faulty device.
Handling faults
After locating the faulty module, take proper measures to rectify the faults.
It is recommended that a fault rectification report contain four topics: fault symptom, fault location,
fault rectification, and preventive suggestions.
Before contacting Huawei technical support engineers, make sure that the following
information is available:
Full name of the site where a fault occurs
Name and phone number (mobile or fixed-line phone number) of a contact
Fault scenario information and fault details
Remote maintenance environment and parameters for remote access
Access the technical support website of Huawei as an equipment user. Only equipment users or
higher-level users have the permission to access Documentation and Community on the technical
support website of Huawei. Before you access the technical support website of Huawei, register on the
website as an equipment user by using the information about Huawei products that you purchased.
How to access Documentation?
Visit http://support.huawei.com, and then click Documentation. On the
Documentation page, you can download and browse Huawei product manuals, technical
guides, technical cases, precaution notices, and Huawei technical publications.
How to access Community?
Procedure
Step 1 Log in to the eSpace EMS client.
Step 2 On the Topology Management tab page, view the alarms generated on LocalNMS, as shown
in Figure 3-1.
Step 4 Click an alarm to view the detailed information, as shown in Figure 3-3.
Step 5 Click View details next to Proposed repair actions: to view the causes and repair
suggestions for the alarm.
----End
Context
The configuration file oms.xml under {install path}/run/config records log levels based on
global configuration. This topic describes how to change a log level online by using
commands. After the change, the configuration takes effect immediately. If you restart the system,
log levels are automatically restored based on global configuration.
Log levels include:
DEBUG
INFO
WARN
ERROR
FATAL
Procedure
The omscli.sh command under {install path}/run/bin is used to change a log level.
Perform the following steps to change a log level:
1. Log in to the eSpace EMS server as user i2kuser.
2. Query the current log level.
# cd {install path}/run/bin
# ./omscli.sh log all
3.2.2 Logs
This topic describes how the system collects logs when faults occur in the system.
{install path} is the installation path of the eSpace EMS server. The default path is /opt/oms.
pmthreshold_*.log      Performance threshold
license/*.log          License management logs
other/*.log            NE detection, NE automatic access, and IP PBX/IAD backup and restoration logs
remotesupport/*.log    Remote maintenance logs
sftpclient/*.log       Log downloading logs
tr69/*.log             IP
4 Fault Analysis
Fault Location
To rectify a fault that occurs when you obtain the performance data of an NE connected over
SNMP, perform the following steps:
1. Check whether the NE is connected properly.
a. On the eSpace EMS client, choose Resource > Resource Management.
b. Click the Service Applications or Physical Devices tab, as shown in Figure 4-1.
b. Click Select Managed Object. In the Select Managed Object dialog box, set
Object type, Subnets, and Managed Objects, and click OK.
c. On the Historical Data tab page, set Time period and click Search.
− Check using logs
In the med_*.log file in {install path}/run/log/oms/med, check whether performance
data has been reported by the UOA for the OIDs of the performance indicators.
If the performance indicators are cumulative ones, also check whether the
pmdata_*.log file in {install path}/run/log/oms/pm contains calculated performance
data for their OIDs.
Implementation Principles
The eSpace EMS compares the performance data with the preset performance index
thresholds in real time. If the performance instant value is greater than the threshold in three
consecutive intervals, the eSpace EMS generates a corresponding performance alarm. The
period of three intervals is a default setting, which can be changed in the relevant
configuration file of the eSpace EMS server.
Through this function, the performance items of the NEs that are monitored by the eSpace
EMS can be monitored in real time.
The performance data of the NEs that are connected to the eSpace EMS through SNMP is
obtained for performance alarms by the pm_snmpdataproc module of the eSpace EMS
server.
Figure 4-2 shows the process of generating a performance alarm.
1. The administrator creates a performance alarm on the eSpace EMS client and sets the
performance indicator thresholds.
2. The eSpace EMS client calls the performance alarm interface of the eSpace EMS server,
and transmits the performance alarm parameter information to the eSpace EMS server.
Then the eSpace EMS server saves the specified performance alarm threshold conditions
to the database.
3. After obtaining performance data based on the statistics period, the eSpace EMS server
performs calculation based on the specified thresholds for performance indicators. If the
value of a performance indicator exceeds the specified threshold, an alarm is generated.
4. After the performance indicator falls, the eSpace EMS server obtains the performance
data again based on the statistics period and then performs calculation based on the
specified thresholds. If the value of the performance indicator is less than the specified
threshold, the alarm is cleared.
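The generation and clearing rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the eSpace EMS implementation: the class name ThresholdAlarm and its parameters are hypothetical, the default of three consecutive intervals is taken from the text, and the sketch assumes that a single sample below the threshold is enough to clear an active alarm.

```python
class ThresholdAlarm:
    """Hypothetical model of one performance alarm rule (illustration only)."""

    def __init__(self, threshold, required_intervals=3):
        self.threshold = threshold
        # Default of three consecutive intervals, as stated in the text.
        self.required_intervals = required_intervals
        self.consecutive = 0   # consecutive samples above the threshold
        self.active = False    # whether the alarm is currently raised

    def feed(self, value):
        """Process one sample; return 'raised', 'cleared', or None."""
        if value > self.threshold:
            self.consecutive += 1
            if not self.active and self.consecutive >= self.required_intervals:
                self.active = True
                return "raised"
        else:
            self.consecutive = 0
            # Assumption: one sample below the threshold clears the alarm.
            if self.active and value < self.threshold:
                self.active = False
                return "cleared"
        return None


alarm = ThresholdAlarm(threshold=80)
events = [alarm.feed(v) for v in [85, 90, 70, 85, 90, 95, 99, 60]]
print(events)
```

In this run the first two high samples do not raise an alarm because the third sample falls back below the threshold. The alarm is raised only after three high samples in a row and is cleared by the final low sample.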
Fault Location
If an error occurs in the performance alarm, do as follows to locate the fault:
1. Check whether the alarm thresholds are successfully created.
a. On the eSpace EMS client, choose Performance > Template Configuration.
b. Select an NE or a module.
c. Click a measurement unit and check whether the alarm thresholds of a performance
indicator are successfully set.
If yes, perform step 2; if no, contact the NE maintenance personnel.
2. Check whether a performance alarm is generated.
a. On the eSpace EMS client, choose Fault > Current Alarms.
b. In the alarm list, check whether a performance alarm is generated.
If no performance alarm is generated and the value of the performance indicator
exceeds the specified alarm threshold, contact Huawei technical support.
Implementation Principle
Figure 4-3 shows the process of executing an installation or upgrade task.
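Figure 4-3 participants: Software Management; target host to be installed or upgraded
1. Connect to the target host over Telnet or SSH.
2. Return the login success information.
3. Execute the command.
4. Return the command output.
5. Determine the execution result based on the command output.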
Fault Location
If a fault occurs when an installation or upgrade task is created, locate and rectify the fault as
prompted:
1. Locate a fault based on the log information on the task execution page of the Software
Management.
2. If the following log information is displayed, contact the plug-in maintenance personnel
to locate and rectify the fault:
a. Log in to the Software Management host as the i2kuser user.
b. Access install path/run/log/oms/swm, for example,
/opt/huawei/I2000/run/log/oms/swm/.
c. Refer to the ideploy_ui_*.log file to locate the fault.
Table 4-1 shows examples of logs for executing an installation or upgrade task.
Log: 2011-11-23 14:58:29,638 DEBUG [T=44973][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [SSHTerminal] (connectToServer:211) Make connection to oamtest2@10.137.97.239 at port 22
Description: The log information indicates that the software management module connects to the host to be installed or upgraded over secure shell protocol (SSH).

Log: 2011-11-23 14:58:30,895 DEBUG [T=44973][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [UnixTerminal] (sendCommand:1811) SSHTerminal : execute command >>> [30000]:cd ; ksh
Description: The log information indicates that the software management module runs the cd;ksh command on the host to be installed or upgraded and the timeout period is 30,000 ms.

Log: 2011-11-23 14:58:31,057 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [ResultProcessor] (setSuccessful:846) Match message[ideploy:cmd:end] with finish word[ideploy:cmd:end]
Description: The log information indicates that the software management module successfully executes instructions.

Log: 2011-11-23 14:58:32,058 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [UnixTerminal] (executeForward:818) read data error for command: /home/see/breeze/ideploy/20110610170618.498/scripts/ideploy_wrap.sh modules/backup.sh com.huawei.breeze.ideploy.task.ExecuteTimeoutException: SSHTerminal : Execute command : /home/see/breeze/ideploy/20110610170618.498/scripts/ideploy_wrap.sh modules/backup.sh timeout.[1500000 ms] on host 10.3.4.33(see)
Description: The log information indicates that the software management module runs modules/backup.sh but there is no return value in the timeout period (such as 1,500,000 ms). To resolve the problem, set Timeout duration for command execution on the Configure System page under software management or contact Huawei technical support.

Log: 2011-11-23 14:58:33,026 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [ResultProcessor] (processRawMsg:397) math result met exception. com.huawei.breeze.ideploy.task.ExecuteErrorException: -Command: "/home/lgjsee/breeze/ideploy/20110711145056.24/scripts/ideploy_wrap.sh ngin_ha/ha_start.sh" -Catched Key: "ideploy:error:" -From Message: "iDeploy:Error:FAILED" at com.huawei.breeze.ideploy.terminal.ResultProcessor.matchiDeployKeyWords(ResultProcessor.java:1085) at com.huawei.breeze.ideploy.terminal.ResultProcessor.matchResult(ResultProcessor.java:573) at com.huawei.breeze.ideploy.terminal.ResultProcessor.processRawMsg(ResultProcessor.java:373) at com.huawei.breeze.ideploy.terminal.UnixTerminal.processResult(UnixTerminal.java:738) at com.h
Description: The log information indicates that the software management module runs the ha_start.sh script in ngin_ha and the return value is not zero. You can locate the fault based on the output information of the script.
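The sample logs suggest that the execution result is judged by looking for key words in the command output: a finish word (ideploy:cmd:end), an error key (ideploy:error:), and a timeout when neither arrives in time. The following Python sketch illustrates that idea only. The function classify_output is hypothetical, and the case-insensitive comparison is an assumption inferred from the sample log, where the message "iDeploy:Error:FAILED" is caught by the key "ideploy:error:".

```python
FINISH_WORD = "ideploy:cmd:end"   # finish word seen in the sample log
ERROR_KEY = "ideploy:error:"      # error key seen in the sample log


def classify_output(output, timed_out=False):
    """Return 'error', 'success', 'timeout', or 'running' for captured output.

    Hypothetical helper for reading logs; not the Software Management code.
    """
    text = output.lower()  # assumption: key words are matched case-insensitively
    if ERROR_KEY in text:
        return "error"
    if FINISH_WORD in text:
        return "success"
    return "timeout" if timed_out else "running"


print(classify_output("backup done\nideploy:cmd:end"))
print(classify_output("iDeploy:Error:FAILED"))
print(classify_output("", timed_out=True))
```

When a task fails, a helper of this kind can be applied to the captured script output to tell a non-zero return (error key present) from a command that never produced the finish word before the configured timeout.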
Implementation Principle
Figure 4-4 shows the process of checking host information.
11. The Software Management executes the command for exiting the root user.
12. The Software Management executes the command for creating a file and the command
for deleting the created file.
13. The target host sends output to the Software Management.
14. The Software Management executes the FTP or SFTP command to obtain files from the
target host.
15. The target host sends output to the Software Management.
16. The Software Management determines the host information checking result based on the
received output.
Locating Guideline
If a fault occurs when host information is checked, refer to Table 4-2 to locate and rectify the
fault.
4. The eSpace EMS server sends the request for creating a trace task to the eSpace EMS
Mediation node.
The Mediation node and the eSpace EMS server can be deployed on different machines. Typically,
they are deployed on the same machine.
5. The Mediation node verifies and records the trace parameters.
6. The Mediation node sends the request for creating a trace task to the UOA.
7. The UOA verifies the request and asynchronously sends the request to the NE.
8. The UOA sends the success or failure information about task creation to the Mediation
node.
9. The Mediation node updates the trace task status.
10. The Mediation node returns the task creation result to the eSpace EMS server.
11. The eSpace EMS server updates the trace task status.
12. The eSpace EMS server returns the task creation result to the eSpace EMS client.
13. In steps 13 to 20, the NE asynchronously returns the task creation result.
The "Exceeded the maximum number of tracing tasks (5)." or "The number of trace tasks
exceeded the maximum 40." Message Is Displayed When You Create a Task
Cause
A maximum of 40 trace tasks can be created, and a maximum of five trace tasks is
allowed for a single client.
Solution
Delete unnecessary trace tasks.
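The two limits can be pictured as a simple quota check. The Python sketch below is a hypothetical illustration: the function name, the dictionary of per-client task counts, and the order in which the two limits are checked are assumptions, while the limit values and the message texts are taken from this topic.

```python
MAX_TOTAL_TASKS = 40       # system-wide limit stated in this topic
MAX_TASKS_PER_CLIENT = 5   # per-client limit stated in this topic


def check_trace_task_quota(tasks_by_client, client_id):
    """Return None if a new trace task is allowed, else the message to expect.

    tasks_by_client maps a client identifier to its number of existing tasks.
    Assumption: the per-client limit is checked before the global limit.
    """
    if tasks_by_client.get(client_id, 0) >= MAX_TASKS_PER_CLIENT:
        return "Exceeded the maximum number of tracing tasks (5)."
    if sum(tasks_by_client.values()) >= MAX_TOTAL_TASKS:
        return "The number of trace tasks exceeded the maximum 40."
    return None


print(check_trace_task_quota({"client-a": 5}, "client-a"))
print(check_trace_task_quota({"client-a": 2}, "client-a"))
```

Counting the existing tasks per client in this way shows which message to expect and how many tasks must be deleted before a new one can be created.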
The "Failed to create the trace task." Message Is Displayed When You
Create a Trace Task
Cause
− No matched module is found.
− The trace agent is not connected successfully.
− The trace task fails to be created because all trace task IDs are used up.
− Exceptions occur in the Master and Mediation services.
Solution
Click View Detail to view related information.
Table 4-3 describes the solutions based on different causes.
A Message Is Reported After You Successfully Create a Trace Task, But the Task
Is Automatically Deleted After It Runs for Some Time
Cause
− All modules of the trace task are disconnected or deregistered.
− The connection between the UOA and Mediation is disconnected.
− The end time of the paused trace task is reached.
Solution
Click View Detail to view related information.
Table 4-4 describes the solutions based on different causes.
An NE reports a tracing message to the UOA. The UOA reports the message to the
Mediation Node. The Mediation Node then sends the message to the eSpace EMS server.
Finally, the eSpace EMS server reports the message to the eSpace EMS client, and the
eSpace EMS client presents the alarm on the GUI.
Solution
1. Ask the NE maintenance personnel to check whether an NE has reported a message.
If no, ask the NE maintenance personnel to locate and rectify faults based on the NE
troubleshooting guide.
2. Check the duoa_trace_agent.log file in UOA installation directory/log for records that
indicate the UOA has received messages from NEs.
If the log file contains the following information, the UOA has received messages from
NEs.
Nov 21 18:44:16 [Debug3] ThreadID:10236 >>> --------------ReportTraceMsg-------------
ModuleCode = 0054040110001
IsSender = 0
TraceCode = 0xff00000e
RcvMsgTimeMs = 739
RcvMsgTimeSec = 1321872256
TraceProtocol = 0
GeneralIDType = 1
GeneralID = 123
If the preceding information is not displayed, the UOA does not receive the message
reported by the NE. In this case, you can view the log file of the UOA to locate and
troubleshoot the fault. If the fault still persists, contact Huawei technical support.
3. Check the duoa_trace_agent.log file in UOA installation directory/log/debug for
records that indicate the UOA has reported messages to the Mediation Node.
If the log file contains the following information, the UOA has reported messages to the
Mediation Node.
The following information indicates that the UOA sends a message to the Mediation
Node whose IP address is 10.138.48.145.
Nov 21 19:20:27 [Debug3] ThreadID:8620 >>> Put message to queue(1) (destination=10.138.48.145:4308), length is 74.
Nov 21 19:20:27 [Debug3] ThreadID:9560 >>> Send message to remote(IP-10.138.48.145:PORT-4308:HANDLE-1188), message stream:
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0
If the UOA does not send the message to the Mediation Node, check the UOA log file to
locate and rectify the fault. If the fault persists, contact Huawei technical support.
4. Check the log file of the Mediation Node for records indicating that the Mediation Node
has received messages from the UOA.
Log file path: {install path}/run/log/oms/trace/trace_node_*.log
If the log file contains the following information, the Mediation Node has received
messages from the UOA.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 117] Receive message from remote ip = 10.138.48.145, port = 6601
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 120] Receive message command id = 3.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 121]
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0
If the preceding information is not displayed, the Mediation does not receive the message
forwarded by the UOA. In this case, you need to check whether the connection between
the UOA and the Mediation is normal and view the log file of the Mediation to locate
and troubleshoot the fault. If the fault still persists, contact Huawei technical support.
5. In the log file of the eSpace EMS server, check whether the eSpace EMS server receives
the message from the Mediation.
Log file path: {install path}/run/log/oms/trace/trace_app_*.log
If the log file contains the following information, the eSpace EMS server receives the
message from the Mediation Node.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 117] Receive message from remote ip = 10.138.48.145, port = 6601
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 120] Receive message command id = 3.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 121]
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0
If the preceding information is not displayed, the eSpace EMS server does not receive
the message forwarded by the Mediation. In this case, you need to check whether the
connection between the eSpace EMS server and the Mediation is normal and view the
log files of the app to locate and troubleshoot the fault. If the fault still persists, contact
Huawei technical support.
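When following a message from hop to hop, it helps to confirm that the message stream dumped in one log is byte-for-byte the one dumped in the next. The Python sketch below rebuilds the raw bytes from a dump in the layout shown in the sample logs (offset, up to 16 hexadecimal bytes, ASCII column). The helper name dump_to_bytes is hypothetical. Treating the first four bytes as a big-endian length field is an assumption, although it is consistent with the sample, where 00 00 00 4A (74) matches the logged "length is 74".

```python
import re
import struct

# Dump copied from the sample log, one row of up to 16 bytes per line.
SAMPLE_DUMP = """
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0
"""

HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")


def dump_to_bytes(dump):
    """Rebuild the raw message from 'offset  hex bytes  ascii' dump rows."""
    data = bytearray()
    for line in dump.strip().splitlines():
        fields = line.replace("-", " ").split()
        # fields[0] is the offset; at most 16 byte tokens follow it.
        for token in fields[1:17]:
            if HEX_BYTE.fullmatch(token) is None:
                break  # reached the ASCII column of a short row
            data.append(int(token, 16))
    return bytes(data)


message = dump_to_bytes(SAMPLE_DUMP)
(declared_length,) = struct.unpack(">I", message[:4])  # assumed length field
print(len(message), declared_length)
print(b"0054040110001" in message)  # module code visible in the ASCII column
```

Applying dump_to_bytes to the dump in the UOA log and to the dump in the Mediation Node log and comparing the two results with == shows whether both logs record the same message. One limitation of this sketch: on a short final row, an ASCII column that happens to look like two hexadecimal digits would be misread as a byte, so the declared length is worth checking against the rebuilt length.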
Unknown Icons Exist in Flowcharts in the Chart Display Area of the iTrace
Client (As Shown in Figure 4-7)
Cause
− An NE registers the GeneralID with the UOA.
− An unidentified module is registered on the UOA, but the module type is not
specified in the resource file.
Principle
After receiving a tracing message, the eSpace EMS client searches for the module type
in the local NE data based on the module code, and draws a tracing flowchart based on
the obtained module type. If the module code does not exist in the local eSpace EMS NE
data, the eSpace EMS client cannot find the module type, and cannot draw an icon that
can be identified. Therefore, the icon is displayed as the module code in the chart display
area. The module code registered with the UOA is a 13-digit number.
Solution
Check whether the unknown module is registered with the UOA by viewing the
$UOA_RUN_ROOT/data/middata/module.dat file on the UOA server.
The first field in every line of this file is a module code. The following are examples:
0054040101001|4040101|SEE_testrptmsg|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101001|||soapadapter_100
0054040101002|4040101|SEE_testrptmsg2|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101002|||soapadapter_100
− If the module code exists, the module has been registered with the UOA.
− If the module code does not exist, the module is not registered with the UOA.
Because the $UOA_RUN_ROOT/data/middata/module.dat file cannot be
modified manually, contact the NE maintenance personnel to register the module
with the UOA.
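The check above can be scripted. The following Python sketch reads lines in the module.dat layout shown in the examples and collects the first field of each line. The function name registered_module_codes is hypothetical, and the sketch only reads the data, because the file must not be modified manually.

```python
SAMPLE_LINES = [
    "0054040101001|4040101|SEE_testrptmsg|10.137.97.244|1|1|V100R001C02B121"
    "|1111|40401|SEE_244|o60585|0054040101001|||soapadapter_100",
    "0054040101002|4040101|SEE_testrptmsg2|10.137.97.244|1|1|V100R001C02B121"
    "|1111|40401|SEE_244|o60585|0054040101002|||soapadapter_100",
]


def registered_module_codes(lines):
    """Collect the first '|'-separated field of every non-empty line."""
    codes = set()
    for line in lines:
        line = line.strip()
        if line:
            codes.add(line.split("|", 1)[0])
    return codes


codes = registered_module_codes(SAMPLE_LINES)
print("0054040101001" in codes)   # registered in the sample data
print("0054040110001" in codes)   # not present in the sample data
```

To run the same check on a UOA server, pass an open file object for $UOA_RUN_ROOT/data/middata/module.dat instead of SAMPLE_LINES, and look up the 13-digit module code that is displayed in place of the icon.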
Unknown Icons Exist in Flowcharts in the Chart Display Area of the iTrace
Client (As Shown in Figure 4-8)
Cause
− An NE fails to register the GeneralID with the UOA.
− The module GeneralID is not reported.
Principle
The GeneralID of a module is identified in either of the following ways:
− An NE registers the GeneralID with the UOA.
The process of identifying the GeneralID is as follows:
Assume that the module code is 0054040110001, and the GeneralID is
DOID://0A4769E7/00000001/00000002/000000020054100100002.
1. The NE notifies the UOA that the GeneralID of the module 0054040110001 is
DOID://0A4769E7/00000001/00000002/000000020054100100002.
2. When the iTrace server sends a request to the UOA for creating a tracing task, the UOA
reports the GeneralID to the iTrace server.
3. The NE reports the tracing message with the other party information being the module
GeneralID DOID://0A4769E7/00000001/00000002/000000020054100100002.
4. The iTrace server changes the other party information from
DOID://0A4769E7/00000001/00000002/000000020054100100002 to 0054040110001.
5. The iTrace server reports the message with the module code to the iTrace client for
display.
− An NE reports the module GeneralID directly in the additional information about a
message. This method does not require preprocessing of the iTrace server. The iTrace
client directly changes the GeneralID to the module code.
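The translation from GeneralID to module code amounts to a lookup. The Python sketch below illustrates it with the values from the example in this topic. The class GeneralIdRegistry is hypothetical and does not represent the iTrace server code. It also shows why an unregistered GeneralID cannot be drawn as a known icon: the lookup finds nothing, so the value is passed on unchanged.

```python
class GeneralIdRegistry:
    """Hypothetical lookup table that maps a GeneralID to a module code."""

    def __init__(self):
        self._module_by_general_id = {}

    def register(self, module_code, general_id):
        """Record the GeneralID that an NE reported for a module."""
        self._module_by_general_id[general_id] = module_code

    def resolve(self, peer):
        """Return the module code for a GeneralID, or the input if unknown."""
        return self._module_by_general_id.get(peer, peer)


registry = GeneralIdRegistry()
registry.register(
    "0054040110001",
    "DOID://0A4769E7/00000001/00000002/000000020054100100002",
)

known = registry.resolve("DOID://0A4769E7/00000001/00000002/000000020054100100002")
unknown = registry.resolve("DOID://0A4769E7/00000001/00000002/000000020054100100009")
print(known)
print(unknown)
```

The second lookup uses a made-up GeneralID that was never registered, so it comes back unchanged instead of as a module code.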
Solution
The fault locating method varies according to the method of identifying the GeneralID.
− An NE registers the GeneralID with the UOA.
The following information indicates that the UOA reports a GeneralID message to the iTrace server. For
example, the message is SynGeneralIDListMsg, the message ID is 0x56, and the GeneralID of the
module 0054030104001 is DOID://0A4769E7/00000001/00000002/000000020054100100001.
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> Put message to queue(0) (destination=10.137.97.248:52241), length is 89.
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> ------------- CSynGeneralIDListMsg -------------
m_nTotal_Length = 89
m_sVersion = 2
m_sCommand_ID = 0x56
m_nSequence_ID = 0
m_ucSynType = 0
m_szModuleCode = 0054030104001
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> m_vModuleGeneralIDList.size ===== 2
ucGeneralIDType : 1 strGeneralID : 0054030104001
ucGeneralIDType : 0 strGeneralID : DOID://0A4769E7/00000001/00000002/000000020054100100001
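To check many such records, the module code and the GeneralID entries can be pulled out of the log text with regular expressions. The Python sketch below works on the fields shown in the sample. The function name parse_syn_general_id_list is hypothetical, and the meaning of the ucGeneralIDType values 0 and 1 is not stated in this topic, so the sketch only extracts them.

```python
import re

SAMPLE = """\
m_nTotal_Length = 89
m_sVersion = 2
m_sCommand_ID = 0x56
m_nSequence_ID = 0
m_ucSynType = 0
m_szModuleCode = 0054030104001
m_vModuleGeneralIDList.size ===== 2
ucGeneralIDType : 1 strGeneralID : 0054030104001
ucGeneralIDType : 0 strGeneralID : DOID://0A4769E7/00000001/00000002/000000020054100100001
"""

MODULE_CODE = re.compile(r"m_szModuleCode\s*=\s*(\S+)")
GENERAL_ID = re.compile(r"ucGeneralIDType\s*:\s*(\d+)\s+strGeneralID\s*:\s*(\S+)")


def parse_syn_general_id_list(text):
    """Return (module_code, [(general_id_type, general_id), ...])."""
    module = MODULE_CODE.search(text)
    pairs = [(int(kind), value) for kind, value in GENERAL_ID.findall(text)]
    return (module.group(1) if module else None, pairs)


module_code, general_ids = parse_syn_general_id_list(SAMPLE)
print(module_code)
for kind, value in general_ids:
    print(kind, value)
```

If the module that is shown as an unknown icon never appears as m_szModuleCode in such records, the UOA has probably not reported its GeneralID to the iTrace server.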
Operation: Synchronize configuration data
Procedure: The administrator triggers the operation of synchronizing configuration data on the eSpace EMS client.
1. The eSpace EMS client sends a data synchronization request to the server.
2. The server obtains the latest configuration data from NEs.
3. The server synchronizes the latest data to the eSpace EMS database and returns a synchronization result to the eSpace EMS client.
4. The eSpace EMS client shows the synchronization result.

Operation: Add, delete, or modify configuration items
Procedure: The administrator adds, deletes, or modifies configuration items on the eSpace EMS client.
1. The eSpace EMS client sends a request for pre-editing configuration data to the server.
2. The server submits the data to NEs.
3. The eSpace EMS client shows the data submission result.
Fault Location
If an error occurs when you perform configuration management operations, you can locate the
fault using either of the following two methods:
1. Locate a fault based on the error message provided by the eSpace EMS client.
2. Locate a fault based on the error log and analysis for the configuration management
implementation process.
Typically, you can locate a fault based on the error message provided by the eSpace EMS
client. If you cannot find the fault cause based on the error message, you can check the error
log and configuration management implementation process. The error log information is as
follows:
Log file name: cm_[TIMESTAMP].log
Log file path: {install path}/run/log/oms/cm
The following is a sample of a piece of complete log information:
2011-11-15 09:50:13,248 DEBUG [T=205][com.huawei.oms.cm.as.support.ExtensionActivator.start() 43] ExtensionActivator is starting.
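A log line in this layout can be split into its parts with a regular expression, which makes it easier to filter a large cm_[TIMESTAMP].log file by level, thread, or class. The Python sketch below is based only on the sample line above. The field names (time, level, thread, location, line, message) are labels chosen for this illustration, not terms defined by the eSpace EMS.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[T=(?P<thread>\d+)\]"
    r"\[(?P<location>\S+\(\))\s+(?P<line>-?\d+)\]\s*"
    r"(?P<message>.*)$"
)


def parse_log_line(text):
    """Split one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(text)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S,%f")
    return record


sample = (
    "2011-11-15 09:50:13,248 DEBUG "
    "[T=205][com.huawei.oms.cm.as.support.ExtensionActivator.start() 43] "
    "ExtensionActivator is starting."
)
record = parse_log_line(sample)
print(record["level"], record["thread"], record["location"], record["line"])
print(record["message"])
```

Lines that do not match the pattern, such as wrapped stack traces, return None and can be attached to the preceding record or skipped.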
Information Description
Typically, the production machine of the eSpace EMS provides services. If the production
machine is faulty, a manual switchover is performed to switch services from the production
machine to the redundancy machine.
The GDR software synchronizes data between the production machine and the redundancy
machine and performs resource management for application services.
DRService: primary process of the GDR software. The process exists on both the
production machine and the redundancy machine. The DRService process of the
production machine monitors the status of the replication link and helps the DRService
process of the redundancy machine to perform a failover. The DRService process of the
redundancy machine monitors the GDR software and prepares the application services
and database for a switchover or failover.
DRAgent: agent of the GDR software. The DRAgent exists only on the redundancy
machine and is responsible for preparing the application programs for a failover.
Disaster recovery command line interface (DRCLi): operation mode of the GDR
software. The DRCLi encapsulates data synchronization commands at the bottom layer
to provide users with simple command interfaces and provide the DRService process
with the unified management interface, command interface, and message interface.
Filesync: file synchronization tool of the GDR software. The tool can run only on the
production machine and is responsible for synchronizing files from the production
machine to the redundancy machine.
DataGuard: component of the Oracle database. The component is responsible for
replicating data of the database.
If the update time of a file changes, the file is considered modified. The file is then synchronized
from the production machine to the redundancy machine even if its contents have not actually
changed.
In the configuration file of the GDR software, you need to configure information such as the files or
directories to be synchronized and the synchronization type.
The files to be synchronized are listed as follows:
− {install path}/run/repository
− {install path}/run/hedex
− {install path}/run/plugins
− {install path}/run/pickup
− {install path}/run/dump
− {install path}/run/data
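The update-time rule described above can be illustrated with a short Python sketch. It is not the Filesync implementation: the function files_to_sync and the dictionary of recorded update times are hypothetical. The example changes only the update time of a file and shows that the file is selected for synchronization again although its contents are the same.

```python
import os
import tempfile


def files_to_sync(paths, last_seen_mtime):
    """Return paths whose update time differs from the recorded one.

    last_seen_mtime maps a path to the update time seen at the last check
    and is updated in place. File contents are never compared.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        if last_seen_mtime.get(path) != mtime:
            changed.append(path)
            last_seen_mtime[path] = mtime
    return changed


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "sample.dat")
    with open(path, "w") as handle:
        handle.write("unchanged contents")

    seen = {}
    first = files_to_sync([path], seen)    # new file: selected
    second = files_to_sync([path], seen)   # nothing changed: not selected

    # Move the update time forward by one second without touching the contents.
    info = os.stat(path)
    os.utime(path, ns=(info.st_atime_ns, info.st_mtime_ns + 1_000_000_000))
    third = files_to_sync([path], seen)    # selected again

print(len(first), len(second), len(third))
```

This behavior explains why running a tool that rewrites or touches files under the synchronized directories can trigger synchronization traffic even when nothing has really changed.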
Oracle database synchronization based on the DataGuard:
Figure 4-11 shows the synchronization process.
Switching
Switching is classified into switchover and failover:
Switchover: You perform a switchover only when the production machine is running
properly. A switchover is triggered for test during installation, debugging, or routine
maintenance.
Failover: You need to perform a failover when the production machine is faulty.
In switchover mode, you need to stop the service of the production machine and then start the service of
the redundancy machine. In failover mode, however, you need only to start the service of the
redundancy machine.
Switchover 1. You run the drcli command on the redundancy machine to trigger
a switchover.
2. After receiving the switchover request, the DRService of the
redundancy machine notifies the DRAgent to prepare for the
switchover. If the production machine is running properly, the
DRService of the redundancy machine also notifies the DRService
of the production machine to prepare for the switchover.
3. The DRAgent starts the eSpace EMS service of the redundancy
machine.
4. The DRService of the production machine stops the eSpace EMS
service of the production machine.
5. The DRService of the production machine stops data replication
from the original production machine to the original redundancy
machine.
6. On the redundancy machine, you run the command for role
switching to start data synchronization from the current production
machine to the redundancy machine.
Failover 1. When the production machine is faulty, you run the drcli
command on the redundancy machine to trigger a failover.
2. After receiving the failover request, the DRService of the
redundancy machine notifies the DRAgent to prepare for the
failover.
3. The DRAgent starts the eSpace EMS service of the redundancy
machine.
4. You repair the original production machine. After repair, the data
is synchronized from the current production machine to the current
redundancy machine.
Fault Location
When a fault occurs in the GDR system, you can locate the fault based on logs.
The GDR software logs the operating information about the GDR system. Common log files
are listed as follows:
drcli.log in /opt/huawei/gdr/log: operating logs of the drcli command
filesync_sh.log in /opt/huawei/gdr/log: operating logs about data replication
filesync.log in /opt/huawei/gdr/log: operating logs of the Filesync process
drservice.log in /opt/huawei/gdr/log: operating logs of the DRService process
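When locating a fault, it can help to look only at the most recent log records. The following Python sketch shows one way to do that. The keywords "error" and "failed", the default of 50 lines, and the sample log lines are assumptions; the real log format is not described in this guide.

```python
from collections import deque

KEYWORDS = ("error", "failed")  # assumed keywords, not an official list


def suspicious_tail_lines(lines, tail=50):
    """Return the lines among the last `tail` lines that contain a keyword."""
    recent = deque(lines, maxlen=tail)
    return [ln.rstrip("\n") for ln in recent
            if any(k in ln.lower() for k in KEYWORDS)]


def scan_log(path, tail=50):
    """Apply suspicious_tail_lines to a log file such as drservice.log."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return suspicious_tail_lines(fh, tail)


# Hypothetical log lines used only to demonstrate the function.
sample = [
    "2012-06-08 10:00:01 filesync started\n",
    "2012-06-08 10:00:05 replication FAILED for run/data\n",
    "2012-06-08 10:00:09 filesync idle\n",
]
print(suspicious_tail_lines(sample))
```

For a real check, scan_log would be called with the path of one of the log files listed above.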
5 Troubleshooting
Command Syntax
./omsd.sh start
Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Start the eSpace EMS.
> ./omsd.sh start
Output Example
Command Syntax
./omscli.sh checkstate process
Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Query the eSpace EMS service status.
> ./omscli.sh checkstate process
Output Example
System process already started.
Command Syntax
./omsd.sh stop
Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Stop the eSpace EMS.
> ./omsd.sh stop
Output Example
Procedure
Step 1 Start the GDR software of the production machine.
> drservice -c
> p
----End
Procedure
Step 1 Log in to the production machine and redundancy machine by using the gdr account that is
also used in installation of the GDR software.
Step 2 Check the process status of the GDR software on the production machine.
> p
If the information contains the drservice and filesync processes, the GDR software is running
properly on the production machine.
Step 3 Check the process status of the GDR software on the redundancy machine.
> p
If the information contains the drservice and dragent processes, the GDR software is running
properly on the redundancy machine.
The filesync process exists only on the production machine, and the dragent process exists only on
the redundancy machine.
----End
Check Result
Check that the GDR software is running properly on both the production machine and
redundancy machine.
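The check can be sketched in Python as follows. The sketch takes the text of a process listing as input, because the exact listing command is not reproduced here. The function name, the substring matching, and the sample listing are assumptions.

```python
# Expected GDR processes per machine role, as described in this guide.
EXPECTED = {
    "production": {"drservice", "filesync"},
    "redundancy": {"drservice", "dragent"},
}


def missing_processes(role, process_listing):
    """Return the expected process names that do not appear in the listing text."""
    text = process_listing.lower()
    return sorted(name for name in EXPECTED[role] if name not in text)


# Hypothetical listing text used only to demonstrate the function.
listing = "gdr 4101 1 0 10:00 ? 00:00:01 drservice\n"
print(missing_processes("production", listing))
```

An empty result means that every expected process name was found. Substring matching is deliberately simple and can give false positives if another command line contains the same word.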
Exception Handling
If a GDR process is abnormal or not running, check the following logs:
− {GDRWORKDIR}/log/drcli.log
− {GDRWORKDIR}/log/drservice.log
− {GDRWORKDIR}/log/filesync.log
− {GDRWORKDIR}/log/filesync_sh.log
If the drservice or dragent process is abnormal, stop the process and then start the
drservice process.
For details, see Stopping the GDR Software and Starting the GDR Software.
Context
Procedure
Step 1 Log in to the redundancy machine as DR user gdr.
Step 2 Check the DR resource states.
> drcli -c drstate -l
RG STATE
Group State DRState
RG.1 PostOnline Normal
RESOURCE STATE
Group Resource ID Type State
RG.1 1001 -- Offline
RG.1 100101 App(i2000) Offline
RG.1 100102 DB(ORACLE) Offline
----End
Check Result
Check that all resources on the redundancy machine are in the offline state.
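A minimal Python sketch of this check is shown below. It assumes that the drcli -c drstate -l output always has the layout of the example in this procedure (a RESOURCE STATE section with four columns per resource); the function names are hypothetical.

```python
SAMPLE = """RG STATE
Group State DRState
RG.1 PostOnline Normal
RESOURCE STATE
Group Resource ID Type State
RG.1 1001 -- Offline
RG.1 100101 App(i2000) Offline
RG.1 100102 DB(ORACLE) Offline
"""


def resource_states(output):
    """Parse the RESOURCE STATE section into {resource_id: state}."""
    states = {}
    in_resources = False
    for line in output.splitlines():
        fields = line.split()
        if line.strip() == "RESOURCE STATE":
            in_resources = True
        elif in_resources and len(fields) == 4 and fields[1].isdigit():
            states[fields[1]] = fields[3]
    return states


def all_offline(output):
    """Return True if at least one resource is listed and all are Offline."""
    states = resource_states(output)
    return bool(states) and all(s == "Offline" for s in states.values())


print(resource_states(SAMPLE), all_offline(SAMPLE))
```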
Procedure
Step 1 Log in to the production machine by using the gdr account that is also used in installation of
the GDR software.
Step 2 Check the database synchronization status.
> drcli -c checkrep ResID100101
RepType: DataGuard
DBName : I2KDB
RlinkName : omsdb[omsdb]_dr_omsdb
Log_Dest_Status : Connected
Time_Computed : None
TransportLag : None
ApplyLag : None
EstimatedOpenTime : None
RealTimeApply : None
MRP0Status : None
OracleDBStatus : READ WRITE
Step 3 Log in to the redundancy machine by using the gdr account that is also used in installation of
the GDR software.
Step 4 Check the database synchronization status.
> drcli -c checkrep ResID100101
Example: > drcli -c checkrep 100102
RepType: DataGuard
DBName : I2KDB
RlinkName : omsdb_dr_omsdb[omsdb]
Log_Dest_Status : Connected
Time_Computed : 26-JAN-2010 13:43:54
TransportLag : +00 00:00:00
ApplyLag : +00 00:00:03
EstimatedOpenTime : 13(S)
RealTimeApply : ON
MRP0Status : APPLYING_LOG
OracleDBStatus : READ ONLY
----End
Check Result
If information similar to the preceding output is displayed (for example, Log_Dest_Status is
Connected), database synchronization is normal.
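The output of drcli -c checkrep is a list of "name : value" lines, so it can be checked with a small script. The following Python sketch is only an illustration. Which fields decide the result is an assumption: it treats Log_Dest_Status = Connected as required on both machines and MRP0Status = APPLYING_LOG as required on the redundancy machine, based on the example output above.

```python
SAMPLE = """RepType: DataGuard
DBName : I2KDB
RlinkName : omsdb_dr_omsdb[omsdb]
Log_Dest_Status : Connected
Time_Computed : 26-JAN-2010 13:43:54
TransportLag : +00 00:00:00
ApplyLag : +00 00:00:03
EstimatedOpenTime : 13(S)
RealTimeApply : ON
MRP0Status : APPLYING_LOG
OracleDBStatus : READ ONLY
"""


def parse_checkrep(output):
    """Split each line at the first colon into a field name and a value."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_problems(fields, standby):
    """Return a list of findings; an empty list means no problem was detected."""
    problems = []
    if fields.get("Log_Dest_Status") != "Connected":
        problems.append("Log_Dest_Status is not Connected")
    if standby and fields.get("MRP0Status") != "APPLYING_LOG":
        problems.append("MRP0Status is not APPLYING_LOG")
    return problems


fields = parse_checkrep(SAMPLE)
print(replication_problems(fields, standby=True))
```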
Procedure
Step 1 Log in to the production machine by using the gdr account that is also used in installation of
the GDR software.
Step 2 Check whether the Filesync process is running properly.
> p
Check Result
The Filesync process is running properly.
The latest records in the log file do not contain error or failed.
Procedure
Step 1 Log in to the production machine as the DR user gdr.
Step 2 Run the drcli -s switchovercheck command to check whether the running status of the DR
environment is normal.
The command is used to check the following information:
Data replication status of a database resource
Status of the file synchronization through the file synchronization tool
Running status of the DR software GDR
Status of the eSpace EMS key information
After the command is run, the system writes the following execution result into the
switchcheck.prt file in /opt/huawei/GDR/log:
*****************************************************************************
Wed Dec 2 10:11:40 CST 2009
*****************************************************************************
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[SUCCESS] All check finished. Status are all Normal.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
----End
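The result file can also be evaluated by a script. The Python sketch below reads the report text and returns the last bracketed status keyword. Only the [SUCCESS] line is shown in this guide; the assumption that other results use the same [KEYWORD] form is not confirmed.

```python
import re

STATUS = re.compile(r"^\[([A-Z]+)\]")

SAMPLE = """*****************************************************************************
Wed Dec 2 10:11:40 CST 2009
*****************************************************************************
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[SUCCESS] All check finished. Status are all Normal.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"""


def last_status(report_text):
    """Return the keyword of the last line that starts with [KEYWORD], or None."""
    status = None
    for line in report_text.splitlines():
        match = STATUS.match(line.strip())
        if match:
            status = match.group(1)
    return status


print(last_status(SAMPLE))
```

Any value other than SUCCESS, including None, should be treated as a reason to read the report manually.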
Context
When you stop the GDR software on the production machine, the drservice and filesync
processes are stopped.
When you stop the GDR software on the redundancy machine, the drservice and dragent
processes are stopped.
Procedure
Step 1 Log in to the production or redundancy machine as user gdr.
Step 2 Run the following command:
> drcli -s stop
----End
6.1 OS Information
This topic describes the common commands used to collect OS information.
Table 6-1 lists the information about the Linux OS that needs to be collected and the command
used for collecting the information.
Figure 6-1 shows the status of the indicators on Huawei Quidway S5600.
No. Description
1 Status indicators of twenty-four 10/100/1000Base-T
auto-negotiation Ethernet ports
2 Indicators of gigabit SFP combo ports
3 Fabric indicator
4 RPS indicator
5 Power indicator
6 Module indicator
7 Indicator of port mode switch
8 Mode switch button for port status indicators
9 Seven-segment LED display
10 Console
6.3 DR Information
This topic describes the common information that is collected when a DR fault occurs.
Table 6-5 describes the information that needs to be collected when a DR fault occurs and the
corresponding commands.
Table 6-5 Information that needs to be collected and the corresponding commands
Table 6-6 Information to be collected from an Oracle database and commands for collecting the
information
No. 4
Information: Initialization files of the database
Command:
> cd $ORACLE_HOME/dbs
Description: Save the files that are located in the path and submit them to Huawei technical
support engineers. The command must be run by the oracle user.
NOTE
$ORACLE_BASE indicates ORACLE_BASE set in the environment variables by the oracle user.

No. 5
Information: Database version
Command:
> sqlplus / as sysdba
SQL> select banner from sys.v_$version;
Description: Submit the query result to Huawei technical support engineers. The command must
be run by the oracle user.

No. 6
Information: Memory usage of the database server
Command:
# top>file2.txt
Description: Run the top command as the root user and submit the result to Huawei technical
support engineers.

No. 7
Information: Database data export
Command:
> exp system/password@i2kdb buffer=8092 full=y inctype=complete file=backup.dmp
Description: Back up the data exported from the database. The command must be run by the
oracle user.
NOTE
The field password indicates the password of the system user. The value varies according to
actual situation.
The field i2kdb indicates the instance name of the eSpace EMS database.

No. 8
Information: Port number, IP address, and host name of the current database
Command:
> more $ORACLE_HOME/network/admin/listener.ora
Description: Obtain the value of PORT in the file, and then submit the result to Huawei
technical support engineers. The command must be run by the oracle user.
Example file content:
i2kdb = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = i2ksvr-1)(PORT = 1521)) ) )

No. 9
Information: Name of the current database instance
Command:
> sqlplus / as sysdba
SQL> select DB_UNIQUE_NAME from v$database;
Description: Submit the query result to Huawei technical support engineers. The command must
be run by the oracle user.

No. 10
Information: Character set used in the current database
Command:
> sqlplus / as sysdba
SQL> show parameter nls_language
Description: Submit the query result to Huawei technical support engineers. The command must
be run by the oracle user.
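The host name and port number can be read from the listener.ora content with a short script instead of by eye. The following Python sketch is an illustration only. It assumes that each ADDRESS entry lists HOST immediately before PORT, as in the example entry in the table; the function name is hypothetical.

```python
import re

ENDPOINT = re.compile(
    r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)\s*\(\s*PORT\s*=\s*(\d+)\s*\)",
    re.IGNORECASE,
)

SAMPLE = (
    "i2kdb = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = "
    "(PROTOCOL = TCP)(HOST = i2ksvr-1)(PORT = 1521)) ) )"
)


def listener_endpoints(text):
    """Return (host, port) pairs found in the text of a listener.ora file."""
    return [(host, int(port)) for host, port in ENDPOINT.findall(text)]


print(listener_endpoints(SAMPLE))
```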
{install path} is the installation path of the eSpace EMS server. The default path is /opt/oms.
pmthreshold_*.log: Performance threshold management logs
license/*.log: License management logs
other/*.log: NE detection, NE automatic access, and IP PBX/IAD backup and restoration logs
remotesupport/*.log: Remote maintenance logs
tr69/*.log: IP Phone/SBC/EGW NE access and service logs
7 Troubleshooting Cases
Symptom
The drcli -s switchovercheck command fails after file synchronization is stopped or when the
Filesync process is synchronizing files.
Solution
The switchover check cannot pass while file synchronization is stopped or while the Filesync
process is still synchronizing files. Perform the step that matches the situation, and then
run the check again:
If file synchronization is stopped, run drcli -f resume on the production machine to start
file synchronization. After file synchronization is complete, run drcli -s switchovercheck
again.
If the Filesync process is synchronizing files, run drcli -f fullrep -l on the production
machine to start lightweight synchronization. After file synchronization is complete, run
drcli -s switchovercheck again.
Symptom
During a switchover or failover, the message DB synchronization has
disconnected is displayed. When users check the database synchronization status on the
production machine or redundancy machine, the value of Log_Dest_Status is Disconnected.
Solution
If the value of Log_Dest_Status is Disconnected on the production machine:
1. Log in to the production machine as user oracle and run the following command:
> sqlplus / as sysdba
2. Check the status of LOG_ARCHIVE_DEST_2.
> select dest_name,status from v$archive_dest_status where dest_id=2;
DEST_NAME STATUS
LOG_ARCHIVE_DEST_2 ERROR
Connecting to
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=float_ip_rep1)(PORT=1522)))
TNS-12541: TNS:no listener TNS-12560: TNS:protocol adapter error
TNS-00511: No listener Linux Error: 111: Connection refused
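When the error text is long, extracting the TNS error codes makes it easier to see the cause. The Python sketch below is a hypothetical helper; it only relies on the TNS-nnnnn code format and on the code TNS-12541 (TNS:no listener) that appears in the output above.

```python
import re

SAMPLE = (
    "TNS-12541: TNS:no listener TNS-12560: TNS:protocol adapter error\n"
    "TNS-00511: No listener Linux Error: 111: Connection refused\n"
)


def tns_codes(message):
    """Return the distinct TNS error codes in order of first appearance."""
    seen = []
    for code in re.findall(r"TNS-\d{5}", message):
        if code not in seen:
            seen.append(code)
    return seen


def listener_unreachable(message):
    """Return True if the message reports TNS-12541 (TNS:no listener)."""
    return "TNS-12541" in tns_codes(message)


print(tns_codes(SAMPLE), listener_unreachable(SAMPLE))
```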
Symptom
The GDR process is being restarted or not running.
Solution
If the GDR process on the production machine is being restarted or not running, collect
the configuration file and log file from {GDRWORKDIR}/config and
{GDRWORKDIR}/log respectively on the production machine. Then contact Huawei
technical support.
If the GDR process on the redundancy machine is being restarted or not running,
collect the files from {GDRWORKDIR}/config, {GDRWORKDIR}/config/i2000, and
{GDRWORKDIR}/log respectively on the redundancy machine. Then contact Huawei
technical support.
Procedure
Step 1 Log in to the Mediation node as user i2kuser.
Step 2 Modify the configuration file.
> vi {install path}/run/config/oms.xml
<config name="med">
  <config name="center">
    <param name="serverPort">31006</param>
    <param name="transportPackets">19998</param>
  </config>
  <config name="node">
    <param name="nodeId">Mediation_Masterself</param>
    <param name="centerIP">10.85.172.90</param>
    <param name="nodeIP">0.0.0.0</param>
    <param name="centerPort">31006</param>
    <param name="localPort">31007</param>
  </config>
Change the value of centerIP to the IP address of the current production machine.
Step 3 Restart the Mediation service.
> cd {install path}/run/bin
> ./omsd.sh restart
----End
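If the value has to be changed on several Mediation nodes, the edit in Step 2 can be scripted. The following Python sketch is an illustration under assumptions: the sample XML adds a closing tag for the outer element so that it is well formed, the real oms.xml may contain more elements, and 192.0.2.10 is a placeholder address. Rewriting a file this way can drop XML comments, so back up the original file first.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<config name="med">
  <config name="center">
    <param name="serverPort">31006</param>
  </config>
  <config name="node">
    <param name="centerIP">10.85.172.90</param>
    <param name="centerPort">31006</param>
  </config>
</config>"""


def set_center_ip(xml_text, new_ip):
    """Return the XML text with every node/centerIP parameter set to new_ip."""
    root = ET.fromstring(xml_text)
    params = root.findall(".//config[@name='node']/param[@name='centerIP']")
    if not params:
        raise ValueError("centerIP parameter not found")
    for param in params:
        param.text = new_ip
    return ET.tostring(root, encoding="unicode")


updated = set_center_ip(SAMPLE, "192.0.2.10")
print(updated)
```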
Symptom
The performance data of some network devices is not displayed in the performance
monitoring view and cannot be found in historical data.
Cause Analysis
The eSpace EMS obtains the performance data of network devices by running the SNMP Get
command. However, the SNMP access is disabled on the network devices for security reasons.
Therefore, the eSpace EMS cannot obtain the performance data by running the SNMP Get
command.
Solution
You need to grant SNMP access rights to the eSpace EMS server. In the disaster recovery
networking, you need to grant SNMP access rights to the production machine and the
redundancy machine.
For details, contact the device maintenance personnel.
Symptom
In the Monitoring Configuration window, the collection status is Abnormal.
In the Monitoring View window, no performance data in the latest several data
collection periods is displayed.
Possible Causes
The connection between the IP PBX and the eSpace EMS is abnormal.
The IP PBX is upgrading or has been upgraded.
The IP PBX is restarting or has been restarted.
Boards on the IP PBX are restarting or have been restarted.
The active/standby board switchover is being performed or has been performed on the IP
PBX.
Procedure
Step 1 Verify that the connection between the IP PBX and eSpace EMS is normal. If the connection
is abnormal, connect the IP PBX to the eSpace EMS correctly.
Step 2 In system operation logs, check whether any user upgraded the IP PBX on the day when
the exception occurred.
1. Choose System > Log Management from the main menu.
2. Choose Query Logs > Operation Logs from the navigation tree on the left.
3. In the operation log list, check whether any user upgraded the IP PBX on the day
when the exception occurred.
If no, go to Step 3.
If yes, go to Step 4.
Step 3 View the IP PBX operation logs and check whether any user restarted the IP PBX, restarted
boards, or performed the active/standby board switchover.
1. Choose Resource > Resource Management from the main menu.
2. In the Operation column of the device list, click .
The XXX Management window is displayed. In the window name, XXX indicates an
NE name.
3. Choose Manage Service > Operation Log from the navigation tree on the left.
4. In the operation log list, check whether any user restarted the IP PBX, restarted boards, or
performed the active/standby board switchover.
Step 4 Restart the performance monitoring task.
1. Choose Performance > Monitoring Configuration from the main menu.
2. Select the performance counter whose collection status is Abnormal, click Stop, and
click Start.
If the collection status is changed to Normal, the fault is rectified.
If the collection status is still Abnormal, contact Huawei technical support engineers.
----End
Procedure
Do not run the fsck command on a file system that is mounted. Otherwise, data may be
lost.
The shared disk cannot be used by other devices.
The value on indicates that the file system is mounted. Run the umount command to unmount the file
system.
Step 2 Run the fsck -y command to check and restore the file system.
# fsck -y /dev/cciss/c0d0p2
fsck 1.38 (30-Jun-2005)
To check and restore the VxFS file system, run the fsck.vxfs command.
# fsck.vxfs -y /dev/sdb1
If the system displays the message "passed", the check and restoration are complete.
After restarting, you can access the file system.
If the restoration fails, the file system is damaged. Go to Step 3.
Step 3 Run the following command as prompted:
# fsck.reiserfs --rebuild-tree -y /dev/vgscp/lvscp
Step 4 If the file system cannot be restored, re-create a file system and use the backup data.
----End
Subsequent Processing
After restoration, check whether the file system status is normal.
Step 1 Run the tune2fs command to check the ext2 or ext3 file system status before mounting the file system.
# tune2fs -l device name |grep state
# tune2fs -l /dev/sdb2 |grep state
Filesystem state: clean
If clean is displayed in the checking result, you do not need to perform further operations. Otherwise, go
to Step 2.
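The state check can also be done on the full tune2fs -l output without grep. The Python sketch below is an illustration; the function name is hypothetical and the sample text contains only the line that matters here.

```python
def filesystem_state(tune2fs_output):
    """Return the value of the 'Filesystem state' field, or None if it is absent."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "filesystem state":
            return value.strip()
    return None


sample = "Filesystem state: clean\n"
print(filesystem_state(sample) == "clean")
```

Any value other than clean, including a missing field, means that the file system needs further checking.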
Problem
When Internet Explorer 8.0 is used to download import templates in batches, the eSpace EMS
page shifts to the left, as shown in Figure 7-1.
Cause
Internet Explorer is running in Internet Explorer 8.0 Compatibility View instead of the
standard Internet Explorer 8.0 mode.
Troubleshooting
1. Choose Tools > Developer Tools from the menu bar of the Internet Explorer.
The Developer Tools window is displayed.
2. Choose Browser Mode > Internet Explorer 8.0 from the menu bar, as shown in Figure
7-2.
After the settings are complete, the eSpace EMS page is displayed normally.
Problem
Step 1 Click next to Resource file to import on the batch import page, and select an Excel file.
Step 2 Click . The File Download dialog box is displayed, as shown in Figure 7-3.
----End
Cause
The selected Excel file does not match the template. For example, this problem occurs if
you select an IAD template on the Import IP PBX page.
Solution
Step 1 Close the File Download dialog box and the NE Management tab page.
Step 2 Click My Computer, and choose Tools > Folder Options from the main menu on the
displayed My Computer page.
Step 3 Click the File Types tab.
Step 4 Select extension ACTION, and click Delete.
----End
Possible Causes
The automatic prompt function for downloading files is disabled.
Procedure
Step 1 Start the Internet Explorer.
Step 2 Choose Tools > Internet Options > Security > Custom Level from the main menu.
Step 3 Click Enable in Automatic prompting for file downloads under Downloads.
Possible Causes
The browsing history is not cleared.
Procedure
Step 1 Clear the browsing history.
Internet Explorer 8.0
1. Choose Tools > Internet Options from the main menu.
2. Click the General tab and click Delete.
3. Click Private Data. In the displayed dialog box, click Clear Now.
----End