eSpace EMS

V200R001C02SPC200
Troubleshooting Guide

Issue 04

Date 2012-06-08

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2012. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute the warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://www.huawei.com
Email: support@huawei.com

Huawei Proprietary and Confidential


Issue 04 (2012-06-08) i
Copyright © Huawei Technologies Co., Ltd.
eSpace EMS
Fault Management Contents

Contents

1 Conventions
2 Overview
2.1 Fault Source
2.2 Precautions for Troubleshooting
2.3 Requirements on Maintenance Personnel
2.4 Troubleshooting Flow
2.4.1 Troubleshooting Flowchart
2.4.2 Collecting Fault Scenario Information
2.4.3 Locating and Rectifying Faults
2.4.4 Checking Fault Rectification
2.4.5 Generating a Fault Rectification Report
2.4.6 Contacting Huawei
2.5 Obtaining Huawei Technical Support

3 Methods of Locating Faults
3.1 Viewing Alarms on the eSpace EMS Client
3.2 Log Analysis
3.2.1 Changing a Log Level
3.2.2 Logs

4 Fault Analysis
4.1 Performance Fault Analysis
4.1.1 Performance Statistics
4.1.2 Performance Alarms
4.2 Software Management Fault Analysis
4.2.1 Executing an Installation or Upgrade Task
4.2.2 Checking Host Information
4.3 iTrace Analysis
4.3.1 Creating a Tracing Task
4.3.2 Displaying Tracing Messages
4.4 iCnfg Analysis
4.5 DR Fault Analysis

5 Troubleshooting


5.1 Checking the Running Status of the eSpace EMS
5.1.1 Starting the eSpace EMS Service
5.1.2 Querying the eSpace EMS Service Status
5.1.3 Stopping the eSpace EMS Service
5.2 Checking the Running Status of the DR System
5.2.1 Starting the GDR Software
5.2.2 Checking the Process Status of the GDR Software
5.2.3 Checking the States of DR Resources
5.2.4 Checking the Database Synchronization Status
5.2.5 Checking the File Synchronization Status
5.2.6 Checking the Statuses of the Switched Roles of the DR System
5.2.7 Stopping the GDR Software

6 Collecting Fault Information
6.1 OS Information
6.2 Network Device Information
6.3 DR Information
6.4 Oracle Database Information
6.5 Collecting Logs
6.6 Version Information

7 Troubleshooting Cases
7.1 Filesync Exception
7.2 DataGuard Synchronization Exception
7.3 GDR Process Exception
7.4 Modifying Information About the Master Node Corresponding to the Mediation Node After Switching
7.5 The Performance Data of Some Network Devices Cannot Be Collected on the eSpace EMS
7.6 Fault Rectification About IP PBX Performance Data Collection Status
7.7 Fault Rectification in the File System
7.8 eSpace EMS Page Is Leftward Offset in IE 8.0
7.9 File Download Dialog Box Is Displayed After a Click on the Upload Icon
7.10 Failure to Export Data
7.11 Browser Page Cannot Be Properly Displayed or Some Browser Functions Are Unavailable


1 Conventions

This topic describes the conventions used in this guide.


• The user name of the eSpace EMS is i2kuser.
• {Install Path} is the installation path of the eSpace EMS. The default path is /opt/oms.
• {GDRWORKDIR} is the GDR installation path. The default path is /opt/oms/gdr.
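
The placeholders above appear in many commands later in this guide. As a minimal sketch, the following shell function expands them in a string. The function name and the environment variables INSTALL_PATH and GDRWORKDIR are assumptions made for this example and are not part of the eSpace EMS. When the variables are unset, the default paths stated above are used.

```shell
#!/bin/sh
# Expand the guide's path placeholders in the string given as $1.
# INSTALL_PATH and GDRWORKDIR are hypothetical variable names used only by
# this sketch; if they are unset, the documented default paths are used.
expand_placeholders() {
    printf '%s\n' "$1" | sed \
        -e "s|{[Ii]nstall [Pp]ath}|${INSTALL_PATH:-/opt/oms}|g" \
        -e "s|{GDRWORKDIR}|${GDRWORKDIR:-/opt/oms/gdr}|g"
}

# Example call: prints the directory behind "{install path}/run/bin".
expand_placeholders '{install path}/run/bin'
```

The sketch accepts both {Install Path} and {install path} because this guide uses both spellings. Paths that contain the characters | or & would need escaping before being passed to sed.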


2 Overview

About This Chapter


This topic helps maintenance personnel to locate and rectify faults.
2.1 Fault Source
This topic describes the fault sources that trigger fault handling activities, and the tasks that
responsible persons perform before they hand faults over to maintenance personnel.
2.2 Precautions for Troubleshooting
Maintenance personnel must take precautions before locating and rectifying faults to ensure
the safety of personnel, services, and devices, particularly when performing significant or
dangerous operations.
2.3 Requirements on Maintenance Personnel
This topic describes the requirements for the qualifications of maintenance personnel.
2.4 Troubleshooting Flow
This topic describes the general process of rectifying faults and the operations in each step.
2.5 Obtaining Huawei Technical Support
This topic describes how to obtain technical support from Huawei.

2.1 Fault Source


This topic describes the fault sources that trigger fault handling activities, and the tasks that
responsible persons perform before they hand faults over to maintenance personnel.
The fault sources are as follows:
• Customer complaints
The customer service department receives customer complaints and starts the fault
rectification process. The department filters out non-defect events, collects fault scenario
information, and transfers the faults to maintenance personnel.
• Routine maintenance
During routine maintenance, maintenance personnel regularly take preventive measures
while devices are running normally, so that hidden faults in the devices are detected and
eliminated in time.
The routine maintenance of the eSpace EMS includes but is not limited to the following:
− Check whether the services on the eSpace EMS server run normally.
− Check whether the database runs normally.
− Check whether the performance indicators of servers and services meet requirements.
For more information about routine maintenance, see Routine Maintenance.

2.2 Precautions for Troubleshooting


Maintenance personnel must take precautions before locating and rectifying faults to ensure
the safety of personnel, services, and devices, particularly when performing significant or
dangerous operations.
Before locating and rectifying faults, maintenance personnel must:
• Strictly comply with the operation and industry safety regulations to ensure the safety of
personnel and devices.
• Take antistatic measures, such as wearing an ESD-preventive wrist strap, when replacing
and maintaining device parts.
• Not directly connect external computers to the eSpace EMS.
• Strictly control the use of network services.
• Record all relevant raw information in detail when any problem arises during
maintenance.
• Record all significant operations, such as restarting processes. Before performing these
operations, check their feasibility, back up data, prepare emergency and safety
measures, and make sure that the operations are performed by qualified operators.
• Be cautious when performing the following dangerous operations:
− Deleting directories and files from the eSpace EMS
− Modifying the configuration files of the database
− Modifying the attributes of the database
− Deleting the log files from the systems and database
− Stopping the systems, processes, and database
− Running the kill command
− Modifying the configurations of the network devices

2.3 Requirements on Maintenance Personnel


This topic describes the qualification requirements for maintenance personnel.
To ensure effective maintenance, maintenance personnel must have basic knowledge of
networks and computers, understand the service processes of the eSpace EMS, be skilled in
locating and rectifying faults, and be familiar with the on-site environment.
Maintenance personnel must therefore meet the following requirements:
• Have basic knowledge of network devices, operating systems (OSs), and databases,
understand the common commands, and be skilled in using them to perform
maintenance.
• Understand the logical structure of the eSpace EMS networking, the mapping between
the eSpace EMS and on-site devices, and the physical connections between on-site
devices.
• Be familiar with the system structure of the eSpace EMS and be skilled in operating the
eSpace EMS.
• Understand the basic methods of locating and rectifying faults.

2.4 Troubleshooting Flow


This topic describes the general process of rectifying faults and the operations in each step.
The eSpace EMS is complicated, and so is troubleshooting it. In addition, the eSpace EMS
involves multiple network elements (NEs). Therefore, before troubleshooting you need to be
familiar with the eSpace EMS networking, the interaction between the eSpace EMS and the
superior eSpace EMS, and the interaction between the eSpace EMS and the NEs.
Statistically, a fault has only one source in most cases rather than multiple sources.
It is therefore important to locate the source of a fault before rectifying the fault.

2.4.1 Troubleshooting Flowchart


This topic describes the general process of handling faults.
Figure 2-1 shows the general process of handling faults in the eSpace EMS system.

Figure 2-1 Process of handling faults

2.4.2 Collecting Fault Scenario Information


Collecting fault scenario information helps locate faults quickly. This topic describes the
important information about fault scenarios that must be collected.
When a fault occurs, collect the scenario information about the fault immediately.
The information includes but is not limited to the following:
• Time and place of the fault
• Detailed description of the fault symptom
• Operations performed before the fault occurred
• Measures taken after the fault occurred and their results
• Affected services and scope of the impact
• System status information possibly related to the fault. For details, see 6 Collecting Fault
Information.

For a fault reported by a customer, the customer service personnel collect the fault scenario information.
For a fault reported by an alarm or found during routine maintenance, the maintenance personnel collect
the fault scenario information.
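
To make sure that none of the items above is forgotten, the list can be turned into a blank record that is filled in as soon as a fault is noticed. The following is a minimal sketch only. The function name and the record layout are assumptions made for this example, not a format defined by the eSpace EMS.

```shell
#!/bin/sh
# Print a blank fault scenario record. The fields mirror the list above;
# the layout and the function name are illustrative only.
new_fault_record() {
    cat <<EOF
Fault scenario record (created $(date '+%Y-%m-%d %H:%M:%S'))
Time and place of the fault:
Fault symptom (detailed description):
Operations performed before the fault:
Measures taken after the fault, and results:
Affected services and scope of impact:
Related system status information:
EOF
}

# Example call: write the template to standard output.
new_fault_record
```

Redirecting the output to a file, for example new_fault_record > fault_record.txt, gives one record per fault that can later feed the fault rectification report.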

2.4.3 Locating and Rectifying Faults


This topic describes the following operations:
• Locating faults
Fault locating involves two levels: component and module.
− Component level: Narrow the fault source down to a device, such as a database.
− Module level: After identifying the faulty device, locate the faulty module, such as the
listening port of a database.
For the common methods of locating faults, see 3 Methods of Locating Faults.
• Collecting fault information
After identifying the faulty device, collect the details about the device, including the
version number, logs, error codes, alarms, and memory information.
For how to collect fault information, see 6 Collecting Fault Information.

Collect the detailed information about a device only after you have identified it as the faulty device.
• Handling faults
After locating the faulty module, take proper measures to rectify the faults.

2.4.4 Checking Fault Rectification


This topic describes how to determine whether the faults have been correctly located and handled.
After taking measures to rectify faults, check whether the faults are rectified.

2.4.5 Generating a Fault Rectification Report


Fault rectification reports record the same types of information for every fault, which helps
future maintenance and fault locating.
After confirming that a fault is rectified, record the fault rectification process and produce a
report.

It is recommended that a fault rectification report contain four topics: fault symptom, fault locating,
fault rectification, and preventive suggestions.

2.4.6 Contacting Huawei


If you fail to rectify a fault after using the methods of locating and rectifying faults described
in this document, contact Huawei technical support engineers for remote or on-site assistance
in rectifying the fault.
For how to obtain technical support from Huawei, see 2.5 Obtaining Huawei Technical
Support.

Before contacting Huawei technical support engineers, make sure that the following
information is available:
• Full name of the site where the fault occurs
• Name and phone number (mobile or fixed-line phone number) of a contact
• Fault scenario information and fault details
• Remote maintenance environment and parameters for remote access

2.5 Obtaining Huawei Technical Support


This topic describes how to obtain technical support from Huawei.
You can obtain technical support from Huawei through the Internet or by phone. See Table
2-1.

Table 2-1 Methods of obtaining technical support from Huawei

Method: Dial a hotline number of Huawei customer service center
Operation instruction: Dial any of the hotline numbers of Huawei customer service center:
• 8008302118
• 4008302118

Method: Dial the phone number of the regional office
Operation instruction: Obtain the phone numbers of regional offices at
http://www.huawei.com/cn/about/officeList.do

Method: Refer to the troubleshooting cases
Operation instruction:
1. Visit http://support.huawei.com, and then click Documentation on the left.
2. Choose Product Line > Product > Family > Product > Troubleshooting Case.
3. View cases or enter keywords for searching.

Method: Consult online
Operation instruction:
1. Visit http://support.huawei.com, and then click Community on the left.
2. Select a forum from the technical forums.
3. Check whether the methods of rectifying similar faults have been provided in the forum.
If not, submit the fault.

Access the technical support website of Huawei as an equipment user. Only equipment users or
higher-level users have the permission to access Documentation and Community on the technical
support website of Huawei. Before you access the website, register on it as an equipment user
by using the information about the Huawei products that you purchased.
• How to access Documentation?
Visit http://support.huawei.com, and then click Documentation. On the
Documentation page, you can download and browse Huawei product manuals, technical
guides, technical cases, precaution notices, and Huawei technical publications.
• How to access Community?
Visit http://support.huawei.com, and then click Community. The Community page
provides technical forums about Huawei products and functions as a platform for
technical consultations and exchanges.
• How to contact regional offices?
Visit http://support.huawei.com/, and then click About Huawei. On the page that appears,
click Contact us to view the contact information of regional offices.


3 Methods of Locating Faults

About This Chapter


This topic describes several methods of locating faults, including analyzing logs, analyzing
alarms, and capturing packets for analysis.
There are multiple methods of locating faults. In practice, these methods are often
used together and complement each other. Efficient fault rectification requires a good
command of these methods and the flexibility to combine them.
3.1 Viewing Alarms on the eSpace EMS Client
This topic describes how to view alarms on the eSpace EMS client.
3.2 Log Analysis
You can locate a fault quickly by checking the logs. This topic describes how to enable the
debug logs and view the logs.

3.1 Viewing Alarms on the eSpace EMS Client


This topic describes how to view alarms on the eSpace EMS client.

Procedure
Step 1 Log in to the eSpace EMS client.
Step 2 On the Topology Management tab page, view the alarms generated on LocalNMS, as shown
in Figure 3-1.


Figure 3-1 Viewing Alarms

Step 3 View the current fault alarms, as shown in Figure 3-2.

Figure 3-2 Filter window

Step 4 Click an alarm to view the detailed information, as shown in Figure 3-3.

Figure 3-3 Current fault alarms

Step 5 Click View details next to Proposed repair actions: to view the causes and repair
suggestions for the alarm.
----End

3.2 Log Analysis


You can locate a fault quickly by checking the logs. This topic describes how to enable the
debug logs and view the logs.
You can locate a fault by checking logs in the following cases:
• No alarm is reported when the fault occurs.
• The fault cannot be located by checking alarms alone.

3.2.1 Changing a Log Level


You can change a log level to obtain the required log information.


Context
The configuration file oms.xml under {install path}/run/config records the log levels of the
global configuration. This topic describes how to change a log level online by using
commands. The change takes effect immediately. If you restart the system, the log levels
are automatically restored to the values in the global configuration.
Log levels include:
• DEBUG
• INFO
• WARN
• ERROR
• FATAL

Procedure
The omscli.sh command under {install path}/run/bin is used to change a log level.
Perform the following steps to change a log level:
1. Log in to the eSpace EMS server as user i2kuser.
2. Query the current log level.
# cd {install path}/run/bin
# ./omscli.sh log all

No  Name       Level  File
1   apache     WARN   /opt/I2000SDV3/run/log/oms/core/apache.log
2   asutil     ERROR  /opt/I2000SDV3/run/log/oms/asutil/asutil.log
3   author     ERROR  /opt/I2000SDV3/run/log/oms/sm/author.log
4   base       ERROR  /opt/I2000SDV3/run/log/bme/bme.log
5   bme        ERROR  /opt/I2000SDV3/run/log/bme/bme.log
6   cache      ERROR  /opt/I2000SDV3/run/log/oms/core/cache.log
7   cm         ERROR  /opt/I2000SDV3/run/log/oms/cm/cm.log
8   configure  ERROR  /opt/I2000SDV3/run/log/oms/core/configure.log
9   dbevtutil  ERROR  /opt/I2000SDV3/run/log/oms/eam/dbevtutil.log
10  dis_frame  ERROR  /opt/I2000SDV3/run/log/oms/autodis/dis_frame.log
11  dis_lldp   ERROR  /opt/I2000SDV3/run/log/oms/autodis/dis_lldp.log
12  dis_snmp   ERROR  /opt/I2000SDV3/run/log/oms/autodis/dis_snmp.log
Name is the log name, Level is the log level, and File is the absolute path of the log file.
3. Change a log level.
# ./omscli.sh log logname level
− logname is a log name from the query result in step 2.
− level is the new log level.
For example, change the log level of cm to DEBUG:
# ./omscli.sh log cm DEBUG
Change log level of cm from ERROR to DEBUG
4. (Optional) Restore the log level to the default level.
# ./omscli.sh log logname default
Example: # ./omscli.sh log cm default
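
After a debugging session it is easy to leave some logs at DEBUG. As a hedged sketch, the following function reads a listing in the four-column layout shown in step 2 (No, Name, Level, File) and prints the names of the logs that are at a given level. The function name is an assumption made for this example, and the sketch relies only on the column layout shown above.

```shell
#!/bin/sh
# Read an "omscli.sh log all" listing on standard input and print the names
# of the logs whose Level column equals the level given as $1.
# Assumes the four-column layout shown in step 2: No, Name, Level, File.
logs_at_level() {
    awk -v lvl="$1" '$1 ~ /^[0-9]+$/ && $3 == lvl { print $2 }'
}

# Possible use from {install path}/run/bin (left commented out so that the
# sketch runs without an eSpace EMS installation): restore every log that is
# currently at DEBUG to its default level.
# ./omscli.sh log all | logs_at_level DEBUG | while read -r name; do
#     ./omscli.sh log "$name" default
# done
```

The header line is skipped because its first field is not a number. The commented loop uses only the two omscli.sh forms described in this procedure.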


3.2.2 Logs
This topic lists the log files that the system generates, which you can check when faults occur.

eSpace EMS Logs


Table 3-1 lists the log files of each module, the paths of the files, and their contents.

{install path} is the installation path of the eSpace EMS server. The default path is /opt/oms.

Table 3-1 Log description

Mod Log File Path Log File Log Description


ule
Secu {install author_*.log Security
rity path}/run/log/oms/s authentication logs
mod m/
ule nePermitGate_*.log NE right gateway
logs

sm_*.log Main program logs


Alar {install fm_*.log Main program logs
m path}/run/log/oms/f
mod m/ fmprobe_*.log Collection layer
ule logs

fmui_*.log Alarm client logs

fmbackup_*.log Alarm dump logs


Perfo {install pm_*.log Logs related to
rman path}/run/log/oms/p performance
ce m/ monitoring
mod templates, NE
ule event processing,
and view
monitoring

pmdata_*.log Logs collected


when performance
data is saved to the
database
pmds_*.log DS layer logs
pmmeastype_*.log Logs related to
performance
indicator instance
management

pmprobe_*.log Performance data


collection logs

pmthreshold_*.log Performance
threshold

Issue 04 (2012-06-08) Huawei Proprietary and Confidential 13


Copyright © Huawei Technologies Co., Ltd.
eSpace EMS
Fault Management 3 Methods of Locating Faults

Mod Log File Path Log File Log Description


ule
management logs

pmui_*.log Client operation


logs in performance
management
NE {install mimcache_*.log Cache logs of the
acces path}/run/log/oms/ea MIM
s m/
mod mim_*.log NE management
ule logs

iconmgr_*.log NE icon processing


logs

eam_*.log Logs related to NE


access operations
such as NE
lifecycle and type
processing

eam_*.log DS logs of the


EAM
eam_*.log Client operation
logs related to NE
access operations
such as tree table
refreshment
Topo {install mapping_*.log Logs related to
logy path}/run/log/oms/to topology object
mod po/ mapping processing
ule
topo_*.log DS layer logs
related to topology
operations such as
right and domain
allocation and
initialization of the
data to be displayed
on the client

topo_*.log uiService logs, such


as flex invocation
Java errors

topomgr_*.log Logs related to


topology object
management and
alarm
synchronization
Soft {install ideploy_ui*.log Running logs
ware path}/run/log/oms/s related to software

Issue 04 (2012-06-08) Huawei Proprietary and Confidential 14


Copyright © Huawei Technologies Co., Ltd.
eSpace EMS
Fault Management 3 Methods of Locating Faults

Module: ... management module (continued)
Log file path: ...wm/
Log files:
  ... management
Log file path: {install path}/run/log/oms/swm/task name
Log files:
  *.log                    Execution logs of installation or upgrade tasks
  NOTE
  The task name is the name of the installation or upgrade task created in software management.

Module: Message tracing module
Log file path: /opt/oms/run/log/omstrace/
Log files:
  trace_node_*.log         Logs related to interaction between the mediation node and the UOA
  trace_app_*.log          Running logs of message tracing applications

Module: MED module
Log file path: {install path}/run/log/oms/med/
Log files:
  med_*.log                MED framework logs and logs related to interaction between the MED and NEs over SNMP or SOAP
  ftp.server_*.log         Logs related to interaction between the MED and NEs over FTP
  ftp.client_*.log         Logs related to interaction between the MED and NEs over FTP
  ftp.med_*.log            Logs related to interaction between the MED and NEs over FTP
  mml.med_*.log            Logs related to interaction between the MED and NEs over MML
  mml.client_*.log         Logs related to interaction between the MED and NEs over MML
  telnet.med_*.log         Logs related to interaction between the MED and NEs over Telnet
  telnet.client_*.log      Logs related to interaction between the MED and NEs over Telnet
  ssh.med_*.log            Logs related to interaction between the MED and NEs over SSH
  ssh.client_*.log         Logs related to interaction between the MED and NEs over SSH

Module: Northbound module
Log file path: {install path}/run/log/oms/nbi/
Log files:
  nbi_*.log                Running logs related to the northbound module, for example, forwarding alarms to the upper NMS and performing tasks delivered by the upper NMS

Module: Basic platform module
Log file path: {install path}/run/log/oms/core/
Log files:
  web.portal_*.log         Portal running logs
  event_*.log              Event running logs
  log.mgmt_*.log           Running logs of the tool used for dynamically changing log severities
  task_*.log               Task running logs
  sbus_*.log               sbus running logs
  sbus.server_*.log        Running logs of the sbus server
  sbus.heartbeat_*.log     sbus heartbeat check logs
  ds.core.adapter_*.log    Running logs of the DS layer
  fsm_*.log                Running logs of the file management module
  persistence_*.log        Running logs of the persistence layer
  sbus.client_*.log        Running logs of the sbus client
  apache_*.log             Running logs of Tomcat
  base_*.log               Running logs of the base module
  cache_*.log              Running logs of the cache module

Module: UC service log
Log file path: {install path}/run/log/uc/
Log files:
  snmptrap_*.log           Logs about sending and receiving trap messages between NEs and the eSpace EMS through SNMP
  cbm/*.log                Common functional module logs (such as cache, rotation, batch importing, and device selection functions)
  gs8/*.log                GS8 access and service logs
  iad/*.log                IAD access and service logs
  ippbx/*.log              IP PBX access and service logs
  license/*.log            License management logs
  other/*.log              NE detection, NE automatic access, and IP PBX/IAD backup and restoration logs
  remotesupport/*.log      Remote maintenance logs
  sftpclient/*.log         Log downloading logs
  tr69/*.log               IP Phone/SBC/EGW NE access and service logs
  ums/*.log                UMS NE access and service logs
  vqm/*.log                NE voice quality monitoring logs
  upgrade/*.log            NE upgrade logs

Module: Startup log
Log file path: {install path}/run/log/virgo/
Log files:
  log.log                  Startup logs
  stop.exception.log       Startup failure logs

Module: Garbage collection log
Log file path: {install path}/run/log/
Log files:
  gc.hprof.txt             Garbage collection logs
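
When you collect the logs of one module, the paths and file name patterns listed in the table can be turned into a small lookup. The following Python sketch is only an illustration: the function name find_logs, the dictionary keys, and the selection of modules are assumptions made for this example, and the install path must be supplied by the caller.

```python
from pathlib import Path

# Sub-directories and file patterns copied from the log table.
# The dictionary keys are informal labels chosen for this sketch.
LOG_PATTERNS = {
    "med": ("run/log/oms/med", ["med_*.log", "ftp.*_*.log", "mml.*_*.log"]),
    "northbound": ("run/log/oms/nbi", ["nbi_*.log"]),
    "startup": ("run/log/virgo", ["log.log", "stop.exception.log"]),
}


def find_logs(install_path, module):
    """Return the sorted log files of one module under install_path."""
    subdir, patterns = LOG_PATTERNS[module]
    base = Path(install_path) / subdir
    found = set()
    for pattern in patterns:
        found.update(base.glob(pattern))
    return sorted(found)
```

For example, find_logs("/opt/huawei/I2000", "med") would list the MED logs if the product is installed in that directory.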

eSpace EMS
Fault Management 4 Fault Analysis

4 Fault Analysis

About This Chapter


This topic describes the principles of faults of different categories and the fault locating
guideline, helping you to locate and rectify faults quickly.
4.1 Performance Fault Analysis
This topic describes the principles of performance statistics, performance monitoring, and
performance alarms, and the fault locating guideline, helping you to locate and rectify a
performance fault quickly.
4.2 Software Management Fault Analysis
This topic describes the principles of the software management functions and the fault
location guideline, helping you to locate faults quickly.
4.3 iTrace Analysis
This topic describes the principles of the iTrace common functions and the fault locating
guideline, helping you to locate faults quickly.
4.4 iCnfg Analysis
This topic describes the principles of the iCnfg common functions and the fault location
guideline, helping you to locate faults quickly.
4.5 DR Fault Analysis
This topic describes the principles of the disaster recovery (DR) system, which helps you to
locate and rectify a DR fault quickly.

4.1 Performance Fault Analysis


This topic describes the principles of performance statistics, performance monitoring, and
performance alarms, and the fault locating guideline, helping you to locate and rectify a
performance fault quickly.


4.1.1 Performance Statistics


This topic describes the principles of the performance statistics and the fault location
guideline, helping you to locate and rectify a fault quickly.

Fault Location
To rectify a fault that occurs when you obtain the performance data of an NE connected over
SNMP, perform the following steps:
1. Check whether the NE is connected properly.
a. On the eSpace EMS client, choose Resource > Resource Management.
b. Click the Service Applications or Physical Devices tab, as shown in Figure 4-1.

Figure 4-1 Resource management

c. Check the connections between NEs and the eSpace EMS.


If the connection status is Online, go to step 2. Otherwise, rectify the fault according to
the troubleshooting suggestions.
2. Check whether the NE reports the performance data to the eSpace EMS.
You can check whether the NE reports the performance data to the eSpace EMS using
any of the following methods:

Checking with monitoring views is preferred over the other two methods.


− Check using monitoring views
a. On the eSpace EMS client, choose Performance > Monitoring View.
b. Click Add Monitoring View.
c. In the Add Monitoring View dialog box, set View name, Managed Object, and
Indicator Instance, and click OK.
− Check using historical performance data
a. On the eSpace EMS client, choose Performance > Historical Data.


b. Click Select Managed Object. In the Select Managed Object dialog box, set
Object type, Subnets, and Managed Objects, and click OK.
c. On the Historical Data tab page, set Time period and click Search.
− Check using logs
In the med_*.log file in {install path}/run/log/oms/med, use the OIDs of the performance
indicators to check whether the file contains performance data reported by the UOA.
If the performance indicators are cumulative, also use their OIDs to check whether the
pmdata_*.log file in {install path}/run/log/oms/pm contains the calculated performance
data.
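
The log check can be scripted. The following Python sketch searches all med_*.log files in a directory for lines that contain a given OID. The function name, the plain substring match, and the OID used in the usage note are assumptions made for illustration; the exact layout of the log lines is not described in this guide.

```python
from pathlib import Path


def lines_with_oid(log_dir, oid, pattern="med_*.log"):
    """Yield (file name, line number, line) for each log line containing oid."""
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Plain substring match: an OID that is a prefix of a longer
                # OID also matches, so review the hits manually.
                if oid in line:
                    yield log_file.name, number, line.rstrip("\n")
```

For example, list(lines_with_oid("{install path}/run/log/oms/med", "1.3.6.1.4.1.99999.1.1")) returns the matching lines, where the directory placeholder must be replaced with the real install path and the OID is a made-up placeholder. Pass pattern="pmdata_*.log" together with the pm directory to check cumulative indicators.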

4.1.2 Performance Alarms


This topic describes the principles of the performance alarms and the fault location guideline,
helping you to locate and rectify a fault quickly.

Implementation Principles
The eSpace EMS compares the performance data with the preset performance indicator
thresholds in real time. If the instantaneous value of a performance indicator is greater than
the threshold in three consecutive intervals, the eSpace EMS generates the corresponding
performance alarm. Three intervals is the default setting, which can be changed in the
relevant configuration file of the eSpace EMS server.
This function allows the performance items of the NEs managed by the eSpace EMS to be
monitored in real time.
For NEs that are connected to the eSpace EMS through SNMP, the performance data used for
performance alarms is obtained by the eSpace EMSpm_snmpdataproc module of the
eSpace EMS server.
Figure 4-2 shows the process of generating a performance alarm.

Figure 4-2 Process of generating a performance alarm

The process is described as follows:


1. The administrator creates a performance alarm on the eSpace EMS client and sets the
performance indicator thresholds.
2. The eSpace EMS client calls the performance alarm interface of the eSpace EMS server,
and transmits the performance alarm parameter information to the eSpace EMS server.
Then the eSpace EMS server saves the specified performance alarm threshold conditions
to the database.
3. After obtaining performance data based on the statistics period, the eSpace EMS server
performs calculation based on the specified thresholds for performance indicators. If the
value of a performance indicator exceeds the specified threshold, an alarm is generated.
4. After the performance indicator falls, the eSpace EMS server obtains the performance
data again based on the statistics period and then performs calculation based on the
specified thresholds. If the value of the performance indicator is less than the specified
threshold, the alarm is cleared.
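
The alarm logic described above can be sketched as a small state machine. This Python example is an illustration only: the class name, the return values, and the handling of a value that exactly equals the threshold are assumptions, not the implementation used by the eSpace EMS server.

```python
class ThresholdAlarm:
    """Raise an alarm after N consecutive intervals above a threshold."""

    def __init__(self, threshold, intervals=3):
        self.threshold = threshold
        self.intervals = intervals  # default of three intervals, as in the text
        self.exceeded = 0           # consecutive intervals above the threshold
        self.active = False

    def update(self, value):
        """Feed one interval's value; return 'raised', 'cleared', or None."""
        if value > self.threshold:
            self.exceeded += 1
            if not self.active and self.exceeded >= self.intervals:
                self.active = True
                return "raised"
        else:
            self.exceeded = 0
            # Assumption: a value equal to the threshold neither raises
            # nor clears the alarm.
            if self.active and value < self.threshold:
                self.active = False
                return "cleared"
        return None
```

Feeding the values 85, 90, 70, 85, 86, 87 into ThresholdAlarm(80) raises the alarm only on the last value, because the value 70 interrupts the first run of intervals above the threshold.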

Fault Location
If a performance alarm does not work as expected, perform the following steps to locate the fault:
1. Check whether the alarm thresholds are successfully created.
a. On the eSpace EMS client, choose Performance > Template Configuration.
b. Select an NE or a module.
c. Click a measurement unit and check whether the alarm thresholds of a performance
indicator are successfully set.
If yes, perform step 2; if no, contact the NE maintenance personnel.
2. Check whether a performance alarm is generated.
a. On the eSpace EMS client, choose Fault > Current Alarms.
b. In the alarm list, check whether a performance alarm is generated.
If no performance alarm is generated and the value of the performance indicator
exceeds the specified alarm threshold, contact Huawei technical support.

4.2 Software Management Fault Analysis


This topic describes the principles of the software management functions and the fault
location guideline, helping you to locate faults quickly.

4.2.1 Executing an Installation or Upgrade Task


This topic describes the process of executing an installation or upgrade task and the fault
location guide, helping you to locate faults quickly.

Implementation Principle
Figure 4-3 shows the process of executing an installation or upgrade task.


Figure 4-3 Process of executing an installation or upgrade task

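Software Management          Target host to be installed or upgraded

1. Connect to the target host using Telnet or SSH

2. Return the login success message

3. Execute the command

4. Return the command output

5. Determine the execution result based on the command output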

The process of executing an installation or upgrade task is as follows:


1. The Software Management connects to a target host using Telnet or SSH.
2. The host to be installed or upgraded sends the connection result to the Software
Management.
3. The Software Management sends an installation or upgrade command to the target host.
4. The target host sends the command execution result to the Software Management.
5. The Software Management checks whether the installation or upgrade task is executed
successfully based on the command execution result.
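
Step 5 can be sketched as follows. The sample logs in Table 4-1 mention a finish word (ideploy:cmd:end), an error keyword (ideploy:error:), and a command timeout, so this Python example classifies command output on that basis. It runs a local command instead of a Telnet or SSH session, and the function names, the return values, and the case-insensitive matching are assumptions made for illustration.

```python
import subprocess

FINISH_WORD = "ideploy:cmd:end"  # finish word seen in the sample logs
ERROR_KEY = "ideploy:error:"     # error keyword seen in the sample logs


def classify_output(output):
    """Return 'failed', 'succeeded', or 'unknown' for a command's output."""
    text = output.lower()  # assumption: keywords are matched case-insensitively
    if ERROR_KEY in text:
        return "failed"
    if FINISH_WORD in text:
        return "succeeded"
    return "unknown"


def run_and_classify(command, timeout_s=30):
    """Run a local command and classify its output; 'timeout' if it hangs."""
    try:
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return classify_output(done.stdout + done.stderr)
```

Checking the error keyword before the finish word is a design choice of this sketch: output that contains both is treated as a failure.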

Fault Location
If a fault occurs when an installation or upgrade task is executed, locate and rectify the fault as
follows:
1. Locate a fault based on the log information on the task execution page of the Software
Management.
2. If the following log information is displayed, contact the plug-in maintenance personnel
to locate and rectify the fault:
a. Log in to the Software Management host as the i2kuser user.
b. Go to the {install path}/run/log/oms/swm directory, for example,
/opt/huawei/I2000/run/log/oms/swm/.
c. Refer to the ideploy_ui_*.log file to locate the fault.
Table 4-1 shows examples of logs for executing an installation or upgrade task.


Table 4-1 Examples of logs for executing an installation or upgrade task

Log information:
2011-11-23 14:58:29,638 DEBUG [T=44973][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [SSHTerminal] (connectToServer:211) Make connection to oamtest2@10.137.97.239 at port 22
Description:
The software management module connects to the host to be installed or upgraded over Secure Shell (SSH).

Log information:
2011-11-23 14:58:30,895 DEBUG [T=44973][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [UnixTerminal] (sendCommand:1811) SSHTerminal : execute command >>> [30000]:cd ; ksh
Description:
The software management module runs the cd;ksh command on the host to be installed or upgraded, and the timeout period is 30,000 ms.

Log information:
2011-11-23 14:58:31,057 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [ResultProcessor] (setSuccessful:846) Match message[ideploy:cmd:end] with finish word[ideploy:cmd:end]
Description:
The software management module successfully executes instructions.

Log information:
2011-11-23 14:58:32,058 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [UnixTerminal] (executeForward:818) read data error for command: /home/see/breeze/ideploy/20110610170618.498/scripts/ideploy_wrap.sh modules/backup.sh com.huawei.breeze.ideploy.task.ExecuteTimeoutException: SSHTerminal : Execute command : /home/see/breeze/ideploy/20110610170618.498/scripts/ideploy_wrap.sh modules/backup.sh timeout.[1500000 ms] on host 10.3.4.33(see)
Description:
The software management module runs modules/backup.sh, but no value is returned within the timeout period (1,500,000 ms in this example).
To resolve the problem, set Timeout duration for command execution on the Configure System page under software management, or contact Huawei technical support.

Log information:
2011-11-23 14:58:33,026 DEBUG [T=44976][sun.reflect.GeneratedMethodAccessor306.invoke() -1] [ResultProcessor] (processRawMsg:397) math result met exception. com.huawei.breeze.ideploy.task.ExecuteErrorException: -Command: "/home/lgjsee/breeze/ideploy/20110711145056.24/scripts/ideploy_wrap.sh ngin_ha/ha_start.sh" -Catched Key: "ideploy:error:" -From Message: "iDeploy:Error:FAILED"
at com.huawei.breeze.ideploy.terminal.ResultProcessor.matchiDeployKeyWords(ResultProcessor.java:1085)
at com.huawei.breeze.ideploy.terminal.ResultProcessor.matchResult(ResultProcessor.java:573)
at com.huawei.breeze.ideploy.terminal.ResultProcessor.processRawMsg(ResultProcessor.java:373)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.processResult(UnixTerminal.java:738)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.readAndProcessResult(UnixTerminal.java:628)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.sendCommand(UnixTerminal.java:1817)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.sendPassword(UnixTerminal.java:1713)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.executeCmdWithSuUser(UnixTerminal.java:1199)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.executeForward(UnixTerminal.java:789)
at com.huawei.breeze.ideploy.terminal.UnixTerminal.executeWithSuUser(UnixTerminal.java:1025)
Description:
The software management module runs the ha_start.sh script in ngin_ha, and the return value is not zero. You can locate the fault based on the output information of the script.
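
When many such entries must be reviewed, it can help to split each entry into fields. The following Python sketch parses the first line of an entry, assuming that the wrapped text in Table 4-1 is joined back into a single line. The regular expression and the field names (time, level, thread, caller, component, method, line, message) are inferred from the samples for illustration and are not a documented log format.

```python
import re
from datetime import datetime

# Pattern inferred from the sample entries in Table 4-1 (an assumption).
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +"
    r"\[T=(?P<thread>\d+)\]\[(?P<caller>[^\]]*)\] "
    r"\[(?P<component>[^\]]+)\] "
    r"\((?P<method>\w+):(?P<line>\d+)\) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%Y-%m-%d %H:%M:%S,%f")
    fields["thread"] = int(fields["thread"])
    fields["line"] = int(fields["line"])
    return fields
```

Lines that do not match, such as the "at ..." lines of a stack trace, return None and can be attached to the preceding entry by the caller.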

4.2.2 Checking Host Information


This topic describes the process of checking host information and the fault locating guideline,
helping you to locate faults quickly.

Implementation Principle
Figure 4-4 shows the process of checking host information.


Figure 4-4 Process of checking host information

The process of checking host information is as follows:


1. The Software Management connects to the target host to be installed or upgraded using
Telnet or Secure Shell (SSH).
2. The target host sends a message to the Software Management indicating that the
connection is successful.
3. The Software Management enters the user name for logging in to the target host.
4. The target host sends output to the Software Management.
5. The Software Management enters a password based on the received output.
6. The target host sends output to the Software Management.
7. The Software Management executes the command for switching to the root user.
8. The target host sends output to the Software Management.
9. The Software Management enters the password of the root user based on the received
output.
10. The target host sends output to the Software Management.


11. The Software Management executes the command for exiting the root user.
12. The Software Management executes the command for creating a file and the command
for deleting the created file.
13. The target host sends output to the Software Management.
14. The Software Management executes the FTP or SFTP command to obtain files from the
target host.
15. The target host sends output to the Software Management.
16. The Software Management determines the host information checking result based on the
received output.

Locating Guideline
If a fault occurs when host information is checked, refer to Table 4-2 to locate and rectify the
fault.

Table 4-2 Solutions to different errors

Error information: The user name or password is incorrect.
Solution:
Log in to the target host manually using Telnet or SSH, and verify the user name and password.

Error information: The prompt character is incorrect.
Solution:
Log in to the target host manually using Telnet or SSH, and check whether the prompt character is one of the following default prompt characters:
# $ > %
If the prompt character is not a default one, change it on the Host Management page. Click Full to expand all host parameters, and set the password prompt character to a correct one.

Error information: The password of the root user is incorrect.
Solution:
1. Log in to the target host manually using Telnet or SSH.
2. Run the ksh command to switch the shell.
3. Run the su - root command.
4. Check whether the password prompt character belongs to the Software Management's password prompt set.
5. If yes, verify that the password of the root user is correct. If no, add the displayed prompt to the Software Management prompt set.

Error information: Other errors
Solution:
Collect the ideploy_ui_*.log and ideploy_ui_*.zip files in {install path}/run/log/oms/swm/, and send them to Huawei technical support engineers to locate and rectify faults.
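
The prompt character check can be illustrated with a few lines of Python. The default prompt characters come from Table 4-2. The function name and the idea of testing only the last visible character of the terminal output are assumptions made for this sketch; the guide does not describe how the Software Management detects prompts internally.

```python
DEFAULT_PROMPT_CHARS = "#$>%"  # default prompt characters from Table 4-2


def ends_with_prompt(output, prompt_chars=DEFAULT_PROMPT_CHARS):
    """Return True if the last visible character of output is a prompt character."""
    stripped = output.rstrip()
    return bool(stripped) and stripped[-1] in prompt_chars
```

For example, ends_with_prompt("user@host:~$ ") is true, whereas a password prompt such as "Password: " is recognized only if ":" is added to prompt_chars, which mirrors adding the displayed prompt to the prompt set.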

4.3 iTrace Analysis


This topic describes the principles of the iTrace common functions and the fault locating
guideline, helping you to locate faults quickly.


4.3.1 Creating a Tracing Task


This topic describes the process of creating a tracing task and the fault locating guideline,
helping you to locate faults quickly.

Process of Creating a Tracing Task


This topic describes how to create a tracing task.
Figure 4-5 shows the process of creating a tracing task.

Figure 4-5 Process of creating a tracing task

The process of creating a tracing task is described as follows:


1. A user creates a trace task.
2. The user sets trace conditions on the eSpace EMS client and sends a request for creating
a trace task to the eSpace EMS server.
3. The eSpace EMS server constructs the trace task data based on the user settings.


4. The eSpace EMS server sends the request for creating a trace task to the eSpace EMS
Mediation node.

The Mediation node and the eSpace EMS server can be deployed on different machines. Typically,
however, they are deployed on the same machine.
5. The Mediation node verifies and records the trace parameters.
6. The Mediation node sends the request for creating a trace task to the UOA.
7. The UOA verifies the request and asynchronously sends the request to the NE.
8. The UOA sends the success or failure information about task creation to the Mediation
node.
9. The Mediation node updates the trace task status.
10. The Mediation node returns the task creation result to the eSpace EMS server.
11. The eSpace EMS server updates the trace task status.
12. The eSpace EMS server returns the task creation result to the eSpace EMS client.
13. Steps 13 to 20 are the process in which the NE asynchronously returns the task creation result.

Fault Location Guideline


This topic describes how to locate and rectify faults when creating a tracing task.
If a fault occurs when you create a tracing task, determine the step where the fault occurs based
on the symptom. Then check the corresponding environment and logs to locate and rectify the fault.
The following describes common faults that occur when you create a tracing task.

The "Failed to obtain a management object." Message Is Displayed When You Select a Device

• Cause
The selected NE or object has been deleted by another logged-in user.
• Description
When a user selects a device from the eSpace EMS client, the eSpace EMS server
obtains the dn of the device from the managed object buffer, and then obtains the
detailed trace information about the device based on the dn. If the device is deleted, the
dn is also deleted, and the preceding message is displayed when you select the device.
• Solution
On the Resource Management tab page, add the device again, and then create a trace task
for the device.

The "Exceeded the maximum number of tracing tasks (5)." or "The number of trace tasks
exceeded the maximum 40." Message Is Displayed When You Create a Task

• Cause
A maximum of 40 trace tasks can be created, and a maximum of five trace tasks is
allowed for a single client.
• Solution
Delete unnecessary trace tasks.
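
The two limits can be pictured with the following Python sketch. The limits of 40 and 5 come from the text above. The function name, the data structure, and the order in which the limits are checked are assumptions made for illustration.

```python
MAX_TASKS_TOTAL = 40      # maximum number of trace tasks
MAX_TASKS_PER_CLIENT = 5  # maximum number of trace tasks for a single client


def check_can_create(tasks_by_client, client_id):
    """Return None if a new task is allowed, else the limit that blocks it.

    tasks_by_client maps a client ID to the list of its trace tasks.
    """
    if len(tasks_by_client.get(client_id, [])) >= MAX_TASKS_PER_CLIENT:
        return "client limit reached"
    if sum(len(tasks) for tasks in tasks_by_client.values()) >= MAX_TASKS_TOTAL:
        return "system limit reached"
    return None
```

In both cases the remedy is the same as in the solution above: delete trace tasks that are no longer needed.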


The "Failed to create the trace task." Message Is Displayed When You Create a Trace Task

• Cause
− No matched module is found.
− The trace agent is not connected successfully.
− The trace task fails to be created because all trace task IDs are used up.
− Exceptions occur in the Master and Mediation services.
• Solution
Click View Detail to view related information.
Table 4-3 describes the solutions based on different causes.

Table 4-3 Solutions

Cause: No matched module is found.
Description: The module is deregistered or an exception occurs in the module.
Solution:
Check whether the module is registered on the UOA. Open the register_info.log file in {UOA installation directory}/log to check whether the module is deregistered. If the following information is displayed, the module is deregistered.
Nov 22 18:52:38:333686 ThreadID:1956 >>> Module UnRegister: ModuleCode=0054040110001
Contact the NE maintenance personnel to find the reason why the module is deregistered, and register the module again.

Cause: The trace agent is not connected successfully.
Description: The connection between the UOA and Mediation node is abnormal or the UOA service is not running properly.
Solution:
1. Check whether the UOA is successfully started.
Log in to the host where the UOA resides as user uoa and run the following command:
> p
If the following information is displayed, the UOA is successfully started. Otherwise, run the uoa_start.sh command to start the UOA.
uoa 28679 1 0 May23 ? 00:00:01 uoa_lma
uoa 28681 28679 0 May23 ? 00:00:01 uoa_server
uoa 28782 28679 0 May23 ? 00:00:00 uoa_log_agent
uoa 28869 28679 0 May23 ? 00:00:00 uoa_trace_agent
uoa 28943 28679 0 May23 ? 00:00:01 uoa_perf_agent
uoa 28992 28679 0 May23 ? 00:00:00 uoa_cli
2. Check information such as the IP address, port number, user name, and password in the uoa_common.ini file on the UOA, and then create NEs again on the eSpace EMS.

Cause: The trace task fails to be created because all trace task IDs are used up.
Description: The eSpace EMS server needs to allocate the trace task ID before sending a request for creating a trace task to the UOA. If trace task IDs are allocated by bit, a maximum of 24 tasks of the same type is allowed on each UOA. If you create more than 24 tasks of the same type on a UOA, the error message is displayed.
Solution:
Close the trace page on the client (the task ID is released after you close the trace page).

Cause: The eSpace EMS or Mediation service does not work properly.
Description: If the eSpace EMS or Mediation service does not work properly, you sometimes cannot create trace tasks.
Solution:
Check whether the communication between the eSpace EMS and the Mediation is normal. Locate and troubleshoot the fault by checking the operating logs of the Mediation and the eSpace EMS server.
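
The limit of 24 tasks that results from allocating trace task IDs by bit can be pictured with a small bitmap allocator. The following Python sketch only illustrates the general technique. The class and method names are hypothetical, and the guide does not describe how the eSpace EMS server actually encodes the IDs.

```python
class TaskIdPool:
    """Bitmap allocator: bit i of 'used' is set while ID i is in use."""

    def __init__(self, size=24):
        self.size = size  # 24 IDs, matching the limit described in Table 4-3
        self.used = 0

    def allocate(self):
        """Return the lowest free ID, or None if all IDs are used up."""
        for bit in range(self.size):
            if not self.used & (1 << bit):
                self.used |= 1 << bit
                return bit
        return None

    def release(self, task_id):
        """Mark an ID as free again, for example when a trace page is closed."""
        self.used &= ~(1 << task_id)
```

Once all 24 bits are set, allocate() returns None, which corresponds to the error message described above. Releasing an ID makes it available to the next task.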

A Message Is Reported After You Successfully Create a Trace Task, But the Task
Is Automatically Deleted After It Runs for Some Time
• Cause
− All modules of the trace task are disconnected or deregistered.
− The connection between the UOA and Mediation is disconnected.
− The end time of the paused trace task is reached.
• Solution
Click View Detail to view related information.
Table 4-4 describes the solutions based on different causes.

Table 4-4 Solutions

Cause: All modules of the trace task are disconnected or deregistered.
Description: The UOA reports a task deletion message when the connection between the module and the UOA is abnormal or the device is deregistered.
Solution:
Check the connection between the module and the UOA.
View the UOA log file to check whether the module is deregistered.

Cause: The connection between the UOA and Mediation is disconnected.
Description: The eSpace EMS automatically deletes the trace task related to the UOA when the connection between the UOA and the Mediation node is disconnected or the UOA service does not work properly.
Solution:
Check whether the connection between the UOA and the Mediation is normal.

Cause: The end time of the paused trace task is reached.
Description: The UOA sends a task deletion message to the device and reports it to the eSpace EMS server when the end time of the paused trace task is reached.
Solution:
Re-create a trace task.

A Tracing Task Is Deleted About 10 to 15 Seconds After Being Created

• Cause
None
• Principle
If an NE does not respond within about 10 seconds to a request for creating a tracing task
sent by the UOA, the UOA considers that the NE is not running properly and deletes the
tracing task.
• Solution
1. Check whether the NE is running properly.
If not, restore the NE. For details, see the NE troubleshooting guide.
2. Check the connection between the NE and the UOA.
If the connection is faulty, restore the connection. For details, see the NE troubleshooting
guide.
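
The timeout principle can be sketched as follows. This Python example is a simplified illustration: the function name, the data structure, and the exact 10-second constant are assumptions, because the guide only states that the UOA waits about 10 seconds.

```python
RESPONSE_TIMEOUT_S = 10  # "about 10 seconds" in the principle above


def expired_tasks(pending, now, timeout_s=RESPONSE_TIMEOUT_S):
    """Return the IDs of tasks whose creation request is still unanswered.

    pending maps a task ID to the time (in seconds) at which the creation
    request was sent to the NE; tasks are removed from it once the NE responds.
    """
    return sorted(
        task_id for task_id, sent in pending.items() if now - sent >= timeout_s
    )
```

A task that appears in the returned list would be deleted, which matches the symptom of a task disappearing shortly after it is created.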

4.3.2 Displaying Tracing Messages


This topic describes the process of displaying tracing messages and the fault locating
guideline, helping you to locate faults quickly.

Process of Displaying a Tracing Message


This topic describes the process of displaying a tracing message reported by an NE on the
eSpace EMS client.
Figure 4-6 shows the process of displaying a tracing message.


Figure 4-6 Process of displaying a tracing message

The process is described as follows:


1. An NE reports a tracing message to the UOA.
2. The UOA verifies the tracing message and reports the message to the eSpace EMS
Mediation node.
3. The eSpace EMS Mediation node reports the message to the eSpace EMS server.
4. The eSpace EMS server parses the message.
5. The eSpace EMS server reports the parsed message to the eSpace EMS client.
6. The eSpace EMS client shows the message tracing result in graphics based on the
parameters such as the trace type.

Fault Location Guideline


This topic describes how to locate and rectify faults when a tracing message is displayed.
Solutions to common faults when a tracing message is displayed are as follows:

No Message Is Reported After a Tracing Task Is Created

• Cause
− An NE does not report a message to the UOA.
− An NE has reported a message, but the message does not reach the UOA for
unknown reasons. The common reason is that the tracing message reported by the NE
is filtered out by the platform.
− The UOA receives the message reported by the NE, but does not report the message
to the Mediation Node.
− The Mediation receives the message, but does not report the message to the eSpace
EMS client.
• Principle
An NE reports a tracing message to the UOA. The UOA reports the message to the
Mediation Node. The Mediation Node then sends the message to the eSpace EMS server.
Finally, the eSpace EMS server reports the message to the eSpace EMS client, and the
eSpace EMS client presents the message on the GUI.
• Solution
1. Ask the NE maintenance personnel to check whether an NE has reported a message.
If no, ask the NE maintenance personnel to locate and rectify faults based on the NE
troubleshooting guide.
2. Check the duoa_trace_agent.log file in UOA installation directory/log for records that
indicate the UOA has received messages from NEs.
If the log file contains the following information, the UOA has received messages from
NEs.
Nov 21 18:44:16 [Debug3] ThreadID:10236 >>> --------------ReportTraceMsg-------------
ModuleCode = 0054040110001 IsSender = 0 TraceCode = 0xff00000e RcvMsgTimeMs = 739
RcvMsgTimeSec = 1321872256 TraceProtocol = 0 GeneralIDType = 1 GeneralID = 123

Nov 21 18:44:16 [Debug3] ThreadID:7216 >>> -------------CReportTraceMsgToOMCMsg ------------
m_nTotal_Length = 74 m_sVersion = 2 m_sCommand_ID = 0x3 m_nSequence_ID = 0
m_uiTraceTaskID = 4278190094 m_usSequenceNum = 1 m_cMsgDirection = 0
m_uiTraceMsgTimeSec = 1321872256 m_uiTraceMsgTimeMilliSec = 739 m_sTraceProtocol = 0
m_szModuleCode = 0054040110001 m_GID.ucGeneralIDType = 1 m_GID.strGeneralID = 123
m_strTraceContent: 000000 46 72 6F 6D 20 53 52 56-4D 61 6E From SRVMan
m_strTraceExtInfo: 000000 7C 6C 65 76 65 6C 3D 30 |level=0

If the preceding information is not displayed, the UOA has not received the message
reported by the NE. In this case, view the log file of the UOA to locate and
troubleshoot the fault. If the fault persists, contact Huawei technical support.
3. Check the duoa_trace_agent.log file in UOA installation directory/log/debug for
records that indicate the UOA has reported messages to the Mediation Node.
If the log file contains the following information, the UOA has reported messages to the
Mediation Node.
The following information indicates that the UOA sends a message to the Mediation
Node whose IP address is 10.138.48.145.
Nov 21 19:20:27 [Debug3] ThreadID:8620 >>> Put message to queue(1) (destination=10.138.48.145:4308), length is 74.
Nov 21 19:20:27 [Debug3] ThreadID:9560 >>> Send message to remote(IP-10.138.48.145:PORT-4308:HANDLE-1188), message stream:
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0


If the UOA does not send the message to the Mediation Node, check the UOA log file to
locate and rectify the fault. If the fault persists, contact Huawei technical support.
4. Check the log file of the Mediation Node for records indicating that the Mediation Node
has received messages from the UOA.
Log file path: {install path}/run/log/oms/trace/trace_node_*.log
If the log file contains the following information, the Mediation Node has received
messages from the UOA.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 117] Receive message from remote ip = 10.138.48.145, port = 6601
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 120] Receive message command id = 3.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 121]
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0

If the preceding information is not displayed, the Mediation Node has not received the
message forwarded by the UOA. In this case, check whether the connection between
the UOA and the Mediation Node is normal, and view the log file of the Mediation Node
to locate and troubleshoot the fault. If the fault persists, contact Huawei technical support.
5. In the log file of the eSpace EMS server, check whether the eSpace EMS server has
received the message from the Mediation Node.
Log file path: {install path}/run/log/oms/trace/trace_app_*.log
If the log file contains the following information, the eSpace EMS server receives the
message from the Mediation Node.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 117] Receive message from remote ip = 10.138.48.145, port = 6601
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 120] Receive message command id = 3.
2011-11-21 19:20:27,411 DEBUG [T=1245][com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 121]
000000 00 00 00 4A 00 02 00 03-00 00 00 00 FF 00 00 0E ...J............
000010 00 01 30 30 35 34 30 34-30 31 31 30 30 30 31 00 ..0054040110001.
000020 01 03 31 32 33 4E CA 33-FB 00 00 01 9B 00 00 00 ..123N.3........
000030 00 00 0B 46 72 6F 6D 20-53 52 56 4D 61 6E 00 00 ...From SRVMan..
000040 00 08 7C 6C 65 76 65 6C-3D 30 ..|level=0

If the preceding information is not displayed, the eSpace EMS server has not received
the message forwarded by the Mediation Node. In this case, check whether the
connection between the eSpace EMS server and the Mediation Node is normal, and view
the log files of the app to locate and troubleshoot the fault. If the fault persists, contact
Huawei technical support.
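The log checks in the preceding steps can be partly automated. The following is a minimal sketch, assuming the record format shown in the samples; the function name received_from is hypothetical, and the sketch works on log text that has already been read from a trace_node_*.log or trace_app_*.log file.

```python
import re

# Pattern taken from the sample records in this procedure.
RECEIVE_RE = re.compile(r"Receive message from remote ip = ([\d.]+), port = (\d+)")


def received_from(log_text, ip):
    """Return the source ports of all 'Receive message' records logged for ip."""
    return [int(port) for host, port in RECEIVE_RE.findall(log_text) if host == ip]


sample = (
    "2011-11-21 19:20:27,411 DEBUG [T=1245]"
    "[com.huawei.oms.net.trace.uoa.agent.AgentDispatcher.dispatch() 117] "
    "Receive message from remote ip = 10.138.48.145, port = 6601\n"
)
print(received_from(sample, "10.138.48.145"))
```

An empty list for the expected IP address corresponds to the case in which the preceding information is not displayed.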

Unknown Icons Exist in Flowcharts in the Chart Display Area of the iTrace
Client (As Shown in Figure 4-7)

Figure 4-7 Unknown icons

 Cause
− An NE registers the GeneralID with the UOA.
− An unidentified module is registered on the UOA, but the module type is not
specified in the resource file.
 Principle
After receiving a tracing message, the eSpace EMS client searches for the module type
in the local NE data based on the module code, and draws a tracing flowchart based on
the obtained module type. If the module code does not exist in the local eSpace EMS NE
data, the eSpace EMS client cannot find the module type, and cannot draw an icon that
can be identified. Therefore, the icon is displayed as the module code in the chart display
area. The module code registered with the UOA is a 13-digit number.
 Solution
Check whether the unknown module is registered with the UOA by viewing the
$UOA_RUN_ROOT/data/middata/module.dat file on the UOA server.
The first field in every line of this file is a module code. The following are examples:
0054040101001|4040101|SEE_testrptmsg|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101001|||soapadapter_100
0054040101002|4040101|SEE_testrptmsg2|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101002|||soapadapter_100

− If the module code exists, the module has been registered with the UOA.
− If the module code does not exist, the module is not registered with the UOA.
Because the $UOA_RUN_ROOT/data/middata/module.dat file cannot be
modified manually, contact the NE maintenance personnel to register the module
with the UOA.
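The lookup described above can be sketched as follows. This is an illustration only: it relies solely on the fact stated above that the first field of every line is the module code, the function name registered_module_codes is hypothetical, and the file is only read, never modified.

```python
def registered_module_codes(lines):
    """Return the module codes found in the first '|'-separated field of each line."""
    codes = set()
    for line in lines:
        line = line.strip()
        if line:
            codes.add(line.split("|", 1)[0])
    return codes


sample_lines = [
    "0054040101001|4040101|SEE_testrptmsg|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101001|||soapadapter_100",
    "0054040101002|4040101|SEE_testrptmsg2|10.137.97.244|1|1|V100R001C02B121|1111|40401|SEE_244|o60585|0054040101002|||soapadapter_100",
]
codes = registered_module_codes(sample_lines)
print("0054040101001" in codes)  # module code shown in the chart display area
```

To check a real file, pass an open file object for module.dat instead of the sample list.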


Unknown Icons Exist in Flowcharts in the Chart Display Area of the iTrace
Client (As Shown in Figure 4-8)

Figure 4-8 Unknown icons

 Cause
− An NE fails to register the GeneralID with the UOA.
− The module GeneralID is not reported.
 Principle
The GeneralID of a module is identified in either of the following ways:
− An NE registers the GeneralID with the UOA.
The process of identifying the GeneralID is as follows:
Assume that the module code is 0054040110001, and the GeneralID is
DOID://0A4769E7/00000001/00000002/000000020054100100002.
1. The NE notifies the UOA that the GeneralID of the module 0054040110001 is
DOID://0A4769E7/00000001/00000002/000000020054100100002.
2. When the iTrace server sends a request to the UOA for creating a tracing task, the UOA
reports the GeneralID to the iTrace server.
3. The NE reports the tracing message with the other party information being the module
GeneralID DOID://0A4769E7/00000001/00000002/000000020054100100002.
4. The iTrace server changes the other party information from
DOID://0A4769E7/00000001/00000002/000000020054100100002 to 0054040110001.
5. The iTrace server reports the message with the module code to the iTrace client for
display.
− An NE reports the module GeneralID directly in the additional information about a
message. This method does not require preprocessing by the iTrace server. The iTrace
client directly changes the GeneralID to the module code.
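The translation in step 4 of the first method amounts to a table lookup. The following conceptual sketch uses the example values from the steps above; the dictionary and the function name to_module_code are hypothetical and do not describe the actual iTrace server implementation.

```python
# Registration reported by the NE to the UOA (example values from the steps above).
general_id_map = {
    "DOID://0A4769E7/00000001/00000002/000000020054100100002": "0054040110001",
}


def to_module_code(other_party, registrations):
    """Replace a GeneralID with its module code when a registration is known.

    An unregistered GeneralID is returned unchanged.
    """
    return registrations.get(other_party, other_party)


print(to_module_code(
    "DOID://0A4769E7/00000001/00000002/000000020054100100002", general_id_map))
```

When no registration exists for a GeneralID, the lookup has nothing to translate, which is why a failed registration leaves the other party unidentified.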
 Solution
The fault locating method varies according to the method of identifying the GeneralID.
− An NE registers the GeneralID with the UOA.


View UOA logs for creating a tracing task.


If the following log information exists, the GeneralID is successfully registered with
the UOA. Otherwise, contact Huawei technical support engineers.

The following information indicates that the UOA reports a GeneralID message to the iTrace server. For
example, the message is SynGeneralIDListMsg, the message ID is 0x56, and the GeneralID of the
module 0054030104001 is DOID://0A4769E7/00000001/00000002/000000020054100100001.
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> Put message to queue(0) (destination=10.137.97.248:52241), length is 89.
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> ------------- CSynGeneralIDListMsg -------------
m_nTotal_Length = 89
m_sVersion = 2
m_sCommand_ID = 0x56
m_nSequence_ID = 0
m_ucSynType = 0
m_szModuleCode = 0054030104001
Jun 04 09:50:30 [Debug3] ThreadID:1479543712 >>> m_vModuleGeneralIDList.size ===== 2
ucGeneralIDType : 1
strGeneralID : 0054030104001
ucGeneralIDType : 0
strGeneralID : DOID://0A4769E7/00000001/00000002/000000020054100100001
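To confirm which GeneralIDs the UOA reported for a module, the relevant fields can be extracted from the log text. The following is a minimal sketch that assumes the field names shown in the sample record; the function name parse_general_ids is hypothetical.

```python
import re


def parse_general_ids(log_text):
    """Return (module_code, [(type, general_id), ...]) from a CSynGeneralIDListMsg record."""
    module = re.search(r"m_szModuleCode = (\S+)", log_text)
    pairs = re.findall(
        r"ucGeneralIDType\s*:\s*(\d+)\s+strGeneralID\s*:\s*(\S+)", log_text)
    return (module.group(1) if module else None,
            [(int(kind), gid) for kind, gid in pairs])


sample = (
    "m_sCommand_ID = 0x56 m_nSequence_ID = 0 m_ucSynType = 0 "
    "m_szModuleCode = 0054030104001\n"
    "m_vModuleGeneralIDList.size ===== 2 "
    "ucGeneralIDType : 1 strGeneralID : 0054030104001 "
    "ucGeneralIDType : 0 strGeneralID : "
    "DOID://0A4769E7/00000001/00000002/000000020054100100001\n"
)
module_code, general_ids = parse_general_ids(sample)
print(module_code, general_ids)
```

Because the patterns tolerate any whitespace between fields, the sketch works whether the fields appear on one line or on separate lines.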

− An NE reports the module GeneralID directly in the additional information about a
message.
View the additional message information in the form display area of the iTrace client
to check whether the module alias similar to |alias=... is displayed.

4.4 iCnfg Analysis


This topic describes the principles of the iCnfg common functions and the fault location
guideline, helping you to locate faults quickly.

Implementation Principles of Configuration Management


Figure 4-9 shows the implementation principles of configuration management.


Figure 4-9 Implementation principles of configuration management

Table 4-5 describes the implementation process of configuration management.

The operations described in Table 4-5 are not in sequence.

Table 4-5 Implementation process of configuration management

Operation: Synchronize configuration data
Procedure: The administrator triggers the operation of synchronizing configuration data on
the eSpace EMS client.
1. The eSpace EMS client sends a data synchronization request to the server.
2. The server obtains the latest configuration data from NEs.
3. The server synchronizes the latest data to the eSpace EMS database and returns a
synchronization result to the eSpace EMS client.
4. The eSpace EMS client shows the synchronization result.

Operation: Add, delete, or modify configuration items
Procedure: The administrator adds, deletes, or modifies configuration items on the eSpace
EMS client.
1. The eSpace EMS client sends a request for pre-editing configuration data to the server.
2. The server submits the data to NEs.
3. The eSpace EMS client shows the data submission result.


Fault Location
If an error occurs when you perform configuration management operations, you can locate the
fault using either of the following two methods:
1. Locate a fault based on the error message provided by the eSpace EMS client.
2. Locate a fault based on the error log and analysis for the configuration management
implementation process.
Typically, you can locate a fault based on the error message provided by the eSpace EMS
client. If you cannot find the fault cause based on the error message, you can check the error
log and configuration management implementation process. The error log information is as
follows:
 Log file name: cm_[TIMESTAMP].log
 Log file path: {install path}/run/log/oms/cm
The following is a sample of a piece of complete log information:
2011-11-15 09:50:13,248 DEBUG [T=205][com.huawei.oms.cm.as.support.ExtensionActivator.start() 43] ExtensionActivator is starting.

Table 4-6 describes each part in the log information.

Table 4-6 Log description

Information                                                    Description
2011-11-15 09:50:13,248                                        Time when a fault occurs, accurate to the millisecond.
DEBUG                                                          Log level.
[T=205]                                                        ID of the current thread.
[com.huawei.oms.cm.as.support.ExtensionActivator.start() 43]   Code information, such as the class name, method name, and code line.
ExtensionActivator is starting.                                Log information. The information may span multiple lines.
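The parts listed in Table 4-6 can be separated with a regular expression when many records need to be filtered. The following is a minimal sketch that assumes every record follows the layout of the sample; the pattern and the function name parse_record are hypothetical, and continuation lines of a multi-line message are not handled.

```python
import re

LOG_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "  # timestamp with milliseconds
    r"(?P<level>\w+)\s+"                                       # log level
    r"\[T=(?P<thread>\d+)\]"                                   # thread ID
    r"\[(?P<location>[^\]]+)\]\s*"                             # class, method, code line
    r"(?P<message>.*)$"                                        # log information
)


def parse_record(line):
    """Split one log record into the parts of Table 4-6, or return None."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


record = parse_record(
    "2011-11-15 09:50:13,248 DEBUG "
    "[T=205][com.huawei.oms.cm.as.support.ExtensionActivator.start() 43] "
    "ExtensionActivator is starting."
)
print(record["level"], record["thread"], record["message"])
```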

4.5 DR Fault Analysis


This topic describes the principles of the disaster recovery (DR) system, which helps you to
locate and rectify a DR fault quickly.

Operation Principles of the GDR System


Figure 4-10 shows the network deployed in geographical disaster recovery (GDR) mode.


Figure 4-10 Network deployed in GDR mode

Typically, the production machine of the eSpace EMS provides services. If the production
machine is faulty, a manual switchover is performed to switch services from the production
machine to the redundancy machine.
The GDR software synchronizes data between the production machine and the redundancy
machine and performs resource management for application services.
 DRService: primary process of the GDR software. The process exists on both the
production machine and the redundancy machine. The DRService process of the
production machine monitors the status of the replication link and helps the DRService
process of the redundancy machine to perform a failover. The DRService process of the
redundancy machine monitors the GDR software and prepares the application services
and database for a switchover or failover.
 DRAgent: agent of the GDR software. The DRAgent exists only on the redundancy
machine and is responsible for preparing the application programs for a failover.
 Disaster recovery command line interface (DRCLi): operation mode of the GDR
software. The DRCLi encapsulates data synchronization commands at the bottom layer
to provide users with simple command interfaces and provide the DRService process
with the unified management interface, command interface, and message interface.
 Filesync: file synchronization tool of the GDR software. The tool can run only on the
production machine and is responsible for synchronizing files from the production
machine to the redundancy machine.
 DataGuard: component of the Oracle database. The component is responsible for
replicating data of the database.


Data Synchronization in GDR Mode


In the eSpace EMS deployed in GDR mode, data synchronization is classified into the
following types:
 File synchronization based on Filesync: implemented by using the GDR software and
managed by the Filesync process of the GDR software.
 Oracle database synchronization based on the DataGuard: implemented by using the
DataGuard component of the Oracle database.
 File synchronization based on Filesync:
The synchronization process is as follows:
1. The GDR software periodically scans the files on the production machine.
2. If certain files are modified, the GDR software synchronizes the modified files from the
production machine to the redundancy machine by running the SCP or RCP command
of the operating system.

 If the update time of a file changes, it is considered that the file has been modified. If the file
contents are not actually modified, the file is also synchronized from the production machine to the
redundancy machine.
 In the configuration file of the GDR software, you need to configure information such as the files or
directories to be synchronized and the synchronization type.
The files to be synchronized are listed as follows:
− {install path}/run/repository
− {install path}/run/hedex
− {install path}/run/plugins
− {install path}/run/pickup
− {install path}/run/dump
− {install path}/run/data
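The modification rule described in the note above (a file counts as modified whenever its update time changes, even if its contents are identical) can be illustrated as follows. This sketch is not the Filesync implementation, which this guide does not describe; the function names snapshot and changed_files are hypothetical.

```python
import os


def snapshot(root):
    """Map each file under root to its modification time in nanoseconds."""
    times = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            times[path] = os.stat(path).st_mtime_ns
    return times


def changed_files(previous, current):
    """Return files that are new or whose modification time differs from the previous scan."""
    return sorted(path for path, mtime in current.items()
                  if previous.get(path) != mtime)
```

Comparing two snapshots taken one scan period apart yields the files that would be selected for synchronization under this rule. A file that is touched without being edited is reported as changed, which matches the behavior described in the note.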
 Oracle database synchronization based on the DataGuard:
Figure 4-11 shows the synchronization process.


Figure 4-11 Network deployed in GDR mode

The following four processes are responsible for synchronization:


− Log network server (LNS): transmits the redo log to the redundancy machine.
− Remote file server (RFS): receives redo data from the production machine and writes
redo data in the redo log file on the redundancy machine.
− Managed recovery process (MRP): applies the received logs to physical disks of the
redundancy machine.
− Archive (ARCH): archives logs on the redundancy machine.
The synchronization process is as follows:
1. The LGWR process writes redo data to the online redo logs of the production database.
2. The LNS process reads the online redo logs from the production database and sends
the logs to the RFS process of the redundancy machine.
3. The RFS process receives the redo logs from the LNS process.
4. The RFS sends an acknowledgment message to the LNS, saying that the redo logs are
received successfully.
5. The RFS process writes the redo logs to the redundancy machine.
6. The MRP obtains the redo logs from the redundancy machine.
7. The MRP applies the redo logs to the database of the redundancy machine.


Switching
Switching is classified into switchover and failover:
 Switchover: You perform a switchover only when the production machine is running
properly. A switchover is triggered for test during installation, debugging, or routine
maintenance.
 Failover: You perform a failover when the production machine is faulty.

In switchover mode, you need to stop the service of the production machine and then start the service of
the redundancy machine. In failover mode, however, you need only to start the service of the
redundancy machine.

Table 4-7 describes the switching processes.

Table 4-7 Switching processes in switchover and failover modes

Switching Mode    Switching Process

Switchover 1. You run the drcli command on the redundancy machine to trigger
a switchover.
2. After receiving the switchover request, the DRService of the
redundancy machine notifies the DRAgent to prepare for the
switchover. If the production machine is running properly, the
DRService of the redundancy machine also notifies the DRService
of the production machine to prepare for the switchover.
3. The DRAgent starts the eSpace EMS service of the redundancy
machine.
4. The DRService of the production machine stops the eSpace EMS
service of the production machine.
5. The DRService of the production machine stops data replication
from the original production machine to the original redundancy
machine.
6. On the redundancy machine, you run the command for role
switching to start data synchronization from the current production
machine to the redundancy machine.
Failover 1. When the production machine is faulty, you run the drcli
command on the redundancy machine to trigger a failover.
2. After receiving the failover request, the DRService of the
redundancy machine notifies the DRAgent to prepare for the
failover.
3. The DRAgent starts the eSpace EMS service of the redundancy
machine.
4. You repair the original production machine. After repair, the data
is synchronized from the current production machine to the current
redundancy machine.

Fault Location
When a fault occurs in the GDR system, you can locate the fault based on logs.


The GDR software logs the operating information about the GDR system. Common log files
are listed as follows:
 drcli.log in /opt/huawei/gdr/log: operating logs of the drcli command
 filesync_sh.log in /opt/huawei/gdr/log: operating logs about data replication
 filesync.log in /opt/huawei/gdr/log: operating logs of the Filesync process
 drservice.log in /opt/huawei/gdr/log: operating logs of the DRService process


5 Troubleshooting

About This Chapter


This topic describes common troubleshooting operations, including checking the statuses
of the eSpace EMS, VCS, and DR resources and starting and stopping the eSpace EMS,
VCS, and DR system.
5.1 Checking the Running Status of the eSpace EMS
This topic describes how to start the eSpace EMS, stop the eSpace EMS, and view the running
status of eSpace EMS services.
5.2 Checking the Running Status of the DR System
This topic describes how to check the running status of the eSpace EMS DR system.

5.1 Checking the Running Status of the eSpace EMS


This topic describes how to start the eSpace EMS, stop the eSpace EMS, and view the running
status of eSpace EMS services.

5.1.1 Starting the eSpace EMS Service


This topic describes how to start the eSpace EMS. You need to start the Oracle database
before starting the eSpace EMS.

Command Syntax
./omsd.sh start

Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Start the eSpace EMS.
> ./omsd.sh start


Output Example

Help System ......................................................... started
Kernel Module ....................................................... started
Base Module ......................................................... started
Net Adapter Module .................................................. started
Mediation Module .................................................... started
Topo Module ......................................................... started
MORE Module ......................................................... started
Audit Module ........................................................ started
Access Module ....................................................... started
Dump Module ......................................................... started
Fault Module ........................................................ started
Security Module ..................................................... started
Core Platform ....................................................... started
Performance Module .................................................. started
License Monitor Module .............................................. started
ConfigManager Module ................................................ started
NBI Module .......................................................... started
UOA Module .......................................................... started
Trace Module ........................................................ started
I2000 PM ............................................................ started
SoftwareManagement ideploy.ui ....................................... started
SoftwareManagement swm.ui ........................................... started
Startup Monitor ..................................................... started
Finished
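When the output is long, a short script can list any module that did not reach the expected state. The following is a minimal sketch that assumes the "name ... state" layout of the output example; the function names module_states and not_started are hypothetical, and the script only parses captured output without running omsd.sh.

```python
import re

# A module name, a run of at least three dots, then the reported state.
LINE_RE = re.compile(r"^(?P<name>.+?)\s*\.{3,}\s*(?P<state>\w+)\s*$")


def module_states(output):
    """Map each module name in the omsd.sh output to its reported state."""
    states = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states


def not_started(output):
    """Return the modules whose reported state is not 'started'."""
    return [name for name, state in module_states(output).items()
            if state != "started"]


sample = (
    "Help System ......................... started\n"
    "Kernel Module ....................... stopped\n"
    "SoftwareManagement ideploy.ui ....... started\n"
    "Finished\n"
)
print(not_started(sample))
```

The same parsing applies to the output of the stop command if the comparison value is changed to stopped.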

5.1.2 Querying the eSpace EMS Service Status


This topic describes how to query the eSpace EMS service status.

Command Syntax
./omscli.sh checkstate process

Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Query the eSpace EMS service status.
> ./omscli.sh checkstate process

Output Example
System process already started.

5.1.3 Stopping the eSpace EMS Service


This topic describes how to stop the eSpace EMS.


Command Syntax
./omsd.sh stop

Procedure
1. Log in to the eSpace EMS server as user i2kuser.
2. Access {install path}/run/bin.
> cd {install path}/run/bin
3. Stop the eSpace EMS.
> ./omsd.sh stop

Output Example

Dump Module ......................................................... stopped
MORE Module ......................................................... stopped
Audit Module ........................................................ stopped
Security Module ..................................................... stopped
ConfigManager Module ................................................ stopped
Trace Module ........................................................ stopped
Core Platform ....................................................... stopped
I2000 PM ............................................................ stopped
Performance Module .................................................. stopped
Access Module ....................................................... stopped
License Monitor Module .............................................. stopped
SoftwareManagement ideploy.ui ....................................... stopped
NBI Module .......................................................... stopped
Net Adapter Module .................................................. stopped
Startup Monitor ..................................................... stopped
Topo Module ......................................................... stopped
Fault Module ........................................................ stopped
UOA Module .......................................................... stopped
Mediation Module .................................................... stopped
SoftwareManagement swm.ui ........................................... stopped
Base Module ......................................................... stopped
Kernel Module ....................................................... stopped
Help System ......................................................... stopped
Finished

5.2 Checking the Running Status of the DR System


This topic describes how to check the running status of the eSpace EMS DR system.

5.2.1 Starting the GDR Software


This topic describes the commands for starting the drservice and filesync processes on the
production machine and the drservice and dragent processes on the redundancy machine.

Procedure
Step 1 Start the GDR software of the production machine.


> drservice -c
> p

UID PID PPID C STIME TTY TIME CMD
root 3306 1 0 Sep15 ? 00:00:02 drservice -c
root 3307 3306 0 Sep15 ? 00:00:55 filesync 3 243353354 11112

Step 2 Start the GDR software of the redundancy machine.


> drservice -m
> p

UID PID PPID C STIME TTY TIME CMD
root 24065 1 0 Sep15 ? 00:00:08 drservice -m
root 24069 24065 0 Sep15 ? 00:00:00 dragent 0 i2000

----End

5.2.2 Checking the Process Status of the GDR Software


If the process of the GDR software is abnormal, data synchronization, failover, or switchover
may fail. This topic describes how to check the process status of the GDR software.

Procedure
Step 1 Log in to the production machine and redundancy machine by using the gdr account that is
also used in installation of the GDR software.
Step 2 Check the process status of the GDR software on the production machine.
> p
If the information contains the drservice and filesync processes, the GDR software is running
properly on the production machine.

UID PID PPID C STIME TTY TIME CMD
root 1632 1 0 14:38 ? 00:00:02 drservice -c
root 1640 1632 0 14:38 ? 00:00:01 dragent 0 db
root 1641 1632 0 14:38 ? 00:00:01 dragent 1 pub
root 1639 1632 0 14:38 ? 00:00:00 filesync 3 1471304970 11112

Step 3 Check the process status of the GDR software on the redundancy machine.
> p
If the information contains the drservice and dragent processes, the GDR software is running
properly on the redundancy machine.

UID PID PPID C STIME TTY TIME CMD
root 20393 1 0 14:40 ? 00:00:02 drservice -m
root 20400 20393 0 14:40 ? 00:00:02 dragent 0 i2000
root 20401 20393 0 14:40 ? 00:00:02 dragent 1 db

The filesync process runs only on the production machine and does not appear on the
redundancy machine. The dragent process does exist on the redundancy machine.

----End

Check Result
Check that the GDR software is running properly on both the production machine and
redundancy machine.
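The process check in this procedure can be sketched as follows. The required process names come from Step 2 and Step 3; the function names, the role labels, and the assumption that the listing has the eight columns shown in the samples are illustrative only.

```python
# Processes that must be present, as stated in Step 2 and Step 3.
REQUIRED = {
    "production": {"drservice", "filesync"},
    "redundancy": {"drservice", "dragent"},
}


def running_commands(ps_output):
    """Return the command names (first word of the CMD column) in the listing."""
    names = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and fields[1].isdigit():  # skips the header row
            names.add(fields[7].split()[0])
    return names


def missing_processes(ps_output, role):
    """Return the required processes that are absent from the listing."""
    return REQUIRED[role] - running_commands(ps_output)


production = (
    "UID PID PPID C STIME TTY TIME CMD\n"
    "root 1632 1 0 14:38 ? 00:00:02 drservice -c\n"
    "root 1639 1632 0 14:38 ? 00:00:00 filesync 3 1471304970 11112\n"
)
print(missing_processes(production, "production"))
```

An empty result means that the processes named in the procedure are present; it does not prove that they are working correctly, so the logs listed under Exception Handling remain the reference when a fault is suspected.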

Exception Handling
 If a GDR process is abnormal or not running, check the following logs:
− {GDRWORKDIR}/log/drcli.log
− {GDRWORKDIR}/log/drservice.log
− {GDRWORKDIR}/log/filesync.log
− {GDRWORKDIR}/log/filesync_sh.log
 If the drservice or dragent process is abnormal, stop the process and then start the
drservice process.
For details, see Stopping the GDR Software and Starting the GDR Software.

5.2.3 Checking the States of DR Resources


This topic describes how to check the states of DR resources.

Context

You have to run the command only on the redundancy machine.

The states of DR resources include:


 PreOnline: Resources or resource groups are available. After DR prestart is performed
successfully, resources or resource groups are in this state.
 PreOnlining: Resources or resource groups are being used for DR prestart. When DR
prestart is being performed, resources or resource groups are in this state.
 Online: Resources or resource groups are available. After a switchover or failover is
performed successfully, resources or resource groups are in this state.
 Onlining: Resources or resource groups are being brought online. When a switchover
or failover is being performed, resources or resource groups are in this state.
 Offline: Resources or resource groups are stopped.
 Offlining: Resources or resource groups are being stopped.
 Preonlinefailed: Resources or resource groups fail to be used for DR prestart.


 OnlineFailed: Resources or resource groups fail to be used for a switchover or failover.
 OfflineFailed: Stopping resources or resource groups fails.
 Unknown: The state of resources or resource groups is unknown.
 PostOnline: After a switchover or failover is performed successfully, resources or
resource groups are in this state.
 PreOnlinePending: At prestart, database resources are suspended at start until the
production machine finishes the checkpoint operation.
 OnlinePending: Resources or resource groups are being started but suspended. This state
occurs in the following scenarios:
− ResourceGroup.n requires a switchover or failover, but ResourceGroup.m that
conflicts with ResourceGroup.n has initiated DR prestart. Therefore,
ResourceGroup.n is in the OnlinePending state until the state of ResourceGroup.m
changes to Offline.
− If the fast switching function is started before start of database resources, database
resources are in the OnlinePending state until the IBC message of the production
machine is executed successfully.

Procedure
Step 1 Log in to the redundancy machine as DR user gdr.
Step 2 Check the DR resource states.
> drcli -c drstate -l
RG STATE
Group State DRState
RG.1 PostOnline Normal

RESOURCE STATE
Group Resource ID Type State
RG.1 1001 -- Offline
RG.1 100101 App(i2000) Offline
RG.1 100102 DB(ORACLE) Offline

----End

Check Result
Check that all resources on the redundancy machine are in the Offline state.
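The check result can be verified from captured command output. The following is a minimal sketch that assumes the RESOURCE STATE layout shown in the example output (group, resource ID, type, state); the function name resource_states is hypothetical.

```python
def resource_states(output):
    """Map each resource ID in the RESOURCE STATE section to its state."""
    states = {}
    in_resources = False
    for line in output.splitlines():
        fields = line.split()
        if fields == ["RESOURCE", "STATE"]:
            in_resources = True
        elif in_resources and len(fields) == 4 and fields[1].isdigit():
            _group, res_id, _type, state = fields
            states[res_id] = state
    return states


sample = """\
RG STATE
Group State DRState
RG.1 PostOnline Normal

RESOURCE STATE
Group Resource ID Type State
RG.1 1001 -- Offline
RG.1 100101 App(i2000) Offline
RG.1 100102 DB(ORACLE) Offline
"""
states = resource_states(sample)
print(all(state == "Offline" for state in states.values()))
```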

5.2.4 Checking the Database Synchronization Status


This topic describes how to check whether database synchronization is normal.

Procedure
Step 1 Log in to the production machine by using the gdr account that is also used in installation of
the GDR software.
Step 2 Check the database synchronization status.
> drcli -c checkrep ResID100101


Example: > drcli -c checkrep 100102

RepType: DataGuard
DBName : I2KDB
RlinkName : omsdb[omsdb]_dr_omsdb
Log_Dest_Status : Connected
Time_Computed : None
TransportLag : None
ApplyLag : None
EstimatedOpenTime : None
RealTimeApply : None
MRP0Status : None
OracleDBStatus : READ WRITE

Step 3 Log in to the redundancy machine by using the gdr account that is also used in installation of
the GDR software.
Step 4 Check the database synchronization status.
> drcli -c checkrep ResID100101
Example: > drcli -c checkrep 100102

RepType: DataGuard
DBName : I2KDB
RlinkName : omsdb_dr_omsdb[omsdb]
Log_Dest_Status : Connected
Time_Computed : 26-JAN-2010 13:43:54
TransportLag : +00 00:00:00
ApplyLag : +00 00:00:03
EstimatedOpenTime : 13(S)
RealTimeApply : ON
MRP0Status : APPLYING_LOG
OracleDBStatus : READ ONLY

----End

Check Result
If the preceding information in bold is displayed, database synchronization is normal.
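To compare the output on both machines, the fields can be loaded into a dictionary. The following is a minimal sketch that assumes the "name : value" layout of the example output; the function name parse_checkrep is hypothetical, and the sketch only parses the fields without deciding which values indicate a normal state.

```python
def parse_checkrep(output):
    """Return the fields of the checkrep output as a dictionary."""
    fields = {}
    for line in output.splitlines():
        # Split at the first colon only, because values such as times contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """\
RepType: DataGuard
DBName : I2KDB
Log_Dest_Status : Connected
Time_Computed : 26-JAN-2010 13:43:54
TransportLag : +00 00:00:00
ApplyLag : +00 00:00:03
RealTimeApply : ON
MRP0Status : APPLYING_LOG
OracleDBStatus : READ ONLY
"""
fields = parse_checkrep(sample)
print(fields["Log_Dest_Status"], fields["MRP0Status"], fields["ApplyLag"])
```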

5.2.5 Checking the File Synchronization Status


This topic describes how to check whether the file synchronization tool Filesync is running
properly.

Procedure
Step 1 Log in to the production machine by using the gdr account that is also used in installation of
the GDR software.
Step 2 Check whether the Filesync process is running properly.
> p


If the following information is displayed, the Filesync process is running properly.

UID PID PPID C STIME TTY TIME CMD
root 1632 1 0 14:38 ? 00:00:02 drservice -c
root 1640 1632 0 14:38 ? 00:00:01 dragent 0 db
root 1641 1632 0 14:38 ? 00:00:01 dragent 1 pub
root 1639 1632 0 14:38 ? 00:00:00 filesync 3 1471304970 11112

Step 3 Run the following commands:


> cd ${GDRWORKDIR}/log/
> more filesync.log
> more filesync_sh.log
> more filesync.prt
Check whether the latest records in the preceding log files contain error or failed. If the latest
records do not contain error or failed, file synchronization is normal.
----End

Check Result
• The Filesync process is running properly.
• The latest records in the log files do not contain error or failed.
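
The log review in Step 3 can be sketched as a small helper. This is only an illustration: log_tail_clean is a hypothetical name, and the default of 50 lines is an arbitrary interpretation of "latest records".

```shell
# Succeeds (exit 0) if none of the last N lines of the file ($1) contains
# "error" or "failed" (case-insensitive). N ($2) defaults to 50, an arbitrary choice.
log_tail_clean() {
    ! tail -n "${2:-50}" "$1" | grep -Eiq 'error|failed'
}
```

Possible usage: `for f in filesync.log filesync_sh.log filesync.prt; do log_tail_clean "${GDRWORKDIR}/log/$f" || echo "check $f"; done`.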

5.2.6 Checking the Statuses of the Switched Roles of the DR System

This topic describes how to check the statuses of the switched roles of the DR system.

Procedure
Step 1 Log in to the production machine as the DR user gdr.
Step 2 Run the drcli -s switchovercheck command to check whether the running status of the DR
environment is normal.
The command is used to check the following information:
• Data replication status of a database resource
• Status of the file synchronization through the file synchronization tool
• Running status of the DR software GDR
• Status of the eSpace EMS key information
After the command is run, the system writes the following execution result into the
switchcheck.prt file in /opt/huawei/GDR/log:
*****************************************************************************
Wed Dec 2 10:11:40 CST 2009
*****************************************************************************

---------------------Check DB replication State BEGIN------------------------


[SUCCESS] Stauts of DB replication is Normal.


---------------------Check DB replication State END--------------------------

---------------------Check Filesync State BEGIN------------------------------


[SUCCESS] Filesync status is Normal and all files have been replicated.
---------------------Check Filesync State END--------------------------------

---------------------Check GDR Process State BEGIN---------------------------


[SUCCESS] All processes of GDR are Normal.
---------------------Check GDR Process State END-----------------------------

---------------------Check Application State BEGIN---------------------------


[PROMPT] /opt/huawei/gdr/tools/check_i2000.sh execute successfully.
[SUCCESS] All App Scripts execute successfully.
---------------------Check Application State END-----------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[SUCCESS] All check finished. Status are all Normal.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

----End
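
The result file can be scanned for problems as follows. This is a minimal sketch: the sample output above shows only the [SUCCESS] and [PROMPT] tags, so the helper assumes that any other bracketed tag indicates a problem. switchcheck_ok is a hypothetical name.

```shell
# Prints every result line of the file ($1) whose leading [TAG] is not
# SUCCESS or PROMPT, and succeeds only if no such line exists.
# Assumption: tags other than SUCCESS and PROMPT indicate a failed check.
switchcheck_ok() {
    ! grep -E '^\[[A-Z]+\]' "$1" | grep -Ev '^\[(SUCCESS|PROMPT)\]'
}
```

Possible usage: `switchcheck_ok /opt/huawei/GDR/log/switchcheck.prt || echo "switchover check reported a problem"`. An empty file also passes, so confirm that the file was written by the latest run.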

5.2.7 Stopping the GDR Software


This topic describes the commands for stopping the drservice, filesync, and dragent
processes.

Context
When you stop the GDR software on the production machine, the drservice and filesync
processes are stopped.
When you stop the GDR software on the redundancy machine, the drservice and dragent
processes are stopped.

Procedure
Step 1 Log in to the production or redundancy machine as user gdr.
Step 2 Run the following command:
> drcli -s stop
----End


6 Collecting Fault Information

About This Chapter


This topic describes the information to be collected for fault analysis and location and the
commands used for collecting the information. The fault information to be collected involves
all devices dedicated to providing services in the product, including the devices related to
networks, storage, security, and servers, and their physical networking.
6.1 OS Information
This topic describes the common commands used for collecting OS information.
6.2 Network Device Information
This topic describes the information about network devices that is to be collected.
6.3 DR Information
This topic describes the common information that is collected when a DR fault occurs.
6.4 Oracle Database Information
This topic describes the common information to be collected when you handle database faults.
6.5 Collecting Logs
This topic describes how the system collects logs when faults occur in the system.
6.6 Version Information
This topic describes the commands for querying the information about the versions of the
eSpace EMS.

6.1 OS Information
This topic describes the common commands used for collecting OS information.
Table 6-1 lists the information about the Linux OS that needs to be collected and the commands
used for collecting the information.


Table 6-1 Information to be collected and required commands (Linux OS)

No.  Information to Be Collected          Command                    Description
1    OS version                           # uname -a                 View the output result.
2    System performance status            # top                      Run the top command as the root user, capture a
                                                                     screenshot, and then save the screenshot.
                                                                     NOTE: Before you run the top command, make sure
                                                                     that the top software is installed.
3    System error log                     # more /var/log/messages   View the current error information of the system.
4    Space information of the hard disk   # df -k                    View the space usage of the hard disk.
5    Space usage of the file system       # du -sh *                 View the space usage of the file system in the
                                                                     current directory.
6    Information about a network adapter  # ifconfig -a              View the status and IP address of the network
                                                                     adapter.
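
Most of the items in Table 6-1 can be gathered in one pass. The following is a minimal sketch under stated assumptions: collect_os_info and the output file names are hypothetical, top is run in batch mode (-b -n 1) instead of taking a screenshot, only the last 200 lines of /var/log/messages are kept, and du -sh * is left out because it depends on the directory of interest.

```shell
# Writes the Table 6-1 command outputs into the directory given as $1.
# Each command is allowed to fail (for example, if it is not installed),
# in which case the error message is stored in the output file instead.
collect_os_info() {
    out=$1
    mkdir -p "$out" || return 1
    uname -a    > "$out/uname.txt"    2>&1 || true
    # Batch mode with a single iteration replaces the manual screenshot.
    top -b -n 1 > "$out/top.txt"      2>&1 || true
    df -k       > "$out/df.txt"       2>&1 || true
    ifconfig -a > "$out/ifconfig.txt" 2>&1 || true
    # Reading /var/log/messages usually requires the root user.
    tail -n 200 /var/log/messages > "$out/messages.txt" 2>&1 || true
    return 0
}
```

Possible usage as the root user: `collect_os_info /tmp/os_info`, after which the directory can be packed and submitted.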

6.2 Network Device Information


This topic describes the information about network devices that is to be collected.
Currently, Huawei Quidway S5600 series Ethernet switches are adopted. Commands may
vary according to switch. For actual commands, see the corresponding manual.
Table 6-2 lists the information about Huawei Quidway S5600 that is to be collected and the
commands used for collecting the information.

Table 6-2 Information to be collected and required commands (switch)

No.  Information to Be Collected   Command                           Description
1    Status of fault indicators    -                                 Check the LED indicators and interface
                                                                     indicators on the front panel of a switch.
2    Status of switch interfaces   • Command for checking all        Record the status of all valid interfaces
                                     interfaces:                     displayed in a window.
                                     display interface
                                   • Command for checking
                                     specified interfaces:
                                     display interface
                                     GigabitEthernet 1/0/20
3    Contents of switch logs       display log                       Query the logs for error information or for
                                                                     an interface that frequently goes up and down.
                                                                     NOTE: The command varies according to switch.
                                                                     For the actual command, see the corresponding
                                                                     manual.

Figure 6-1 shows the status of the indicators on Huawei Quidway S5600.

Figure 6-1 Status of the indicators on Huawei Quidway S5600

Table 6-3 describes the modules shown in Figure 6-1.

Table 6-3 Module description

No.  Description
1    Status indicators of twenty-four 10/100/1000Base-T auto-negotiation Ethernet ports
2    Indicators of gigabit SFP combo ports
3    Fabric indicator
4    RPS indicator
5    Power indicator
6    Module indicator
7    Indicator of port mode switch
8    Mode switch button for port status indicators
9    Seven-segment LED display
10   Console

Table 6-4 describes the status of the indicators.

Table 6-4 Status of the indicators

Indicator             Mark on the Panel  Status               Meaning
(5) Power indicator   PWR                Steady green         A switch is normally started.
                                         Blink green (1 Hz)   The system is performing the power-on self test.
                                         Steady red           The system fails in the power-on self test.
                                                              A fault occurs.
                                         Blink yellow (1 Hz)  Certain ports fail in the power-on self test.
                                                              Their functions fail.
                                         Off                  The switch is powered off.
(4) RPS indicator     RPS                Steady green         The AC part and the DC input are normal.
                                         Steady yellow        The DC input is normal. The AC part fails or
                                                              the AC input power is not connected.
                                         Off                  The DC input power is not connected.
(3) Fabric indicator  STK                Green                The device is in the loop Fabric state.
                                                              When a Fabric port receives or sends data,
                                                              the indicator blinks quickly.
                                         Yellow               A device is in the daisy chain Fabric state.
                                                              When a Fabric port receives or sends data,
                                                              the Fabric indicator blinks quickly.
                                         Blink green (3 Hz)   The device is separated from the Fabric device
                                                              (valid when the device is in the Fabric state).
                                         Off                  Two Fabric ports are not connected.
(6) Module indicator  Module(MOD)        Steady green         The module is in position and runs normally.
                                         Blink yellow         The module is not supported or is faulty.
                                         Off                  The module is not installed.

6.3 DR Information
This topic describes the common information that is collected when a DR fault occurs.
Table 6-5 describes the information that needs to be collected when a DR fault occurs and the
corresponding commands.

Table 6-5 Information that needs to be collected and the corresponding commands

No. 1
Collected information: Running status of the DR system
Command: > p
Description: Run the command on the eSpace EMS production machine and DR machine as the DR
user gdr. The DR system runs normally if the following information is displayed on the
eSpace EMS production machine:

UID PID PPID C STIME TTY TIME CMD
root 1632 1 0 14:38 ? 00:00:02 drservice -c
root 1640 1632 0 14:38 ? 00:00:01 dragent 0 db
root 1641 1632 0 14:38 ? 00:00:01 dragent 1 pub
root 1639 1632 0 14:38 ? 00:00:00 filesync 3 1471304970 11112

and the following information is displayed on the eSpace EMS DR machine:

UID PID PPID C STIME TTY TIME CMD
root 20393 1 0 14:40 ? 00:00:02 drservice -m
root 20400 20393 0 14:40 ? 00:00:02 dragent 0 i2000
root 20401 20393 0 14:40 ? 00:00:02 dragent 1 db

Otherwise, you need to collect the information, and then submit it to Huawei technical
support engineers.

No. 2
Collected information: Data replication status of a database resource
Command: > drcli -c checkrep 100102
Description: Run the command on the eSpace EMS production machine and DR machine as the DR
user gdr. If the following information is displayed on the eSpace EMS production machine,
the data replication status of the database resource is normal on the production machine:

RepType: DataGuard
DBName : I2KDB
RlinkName : i2kdb[i2kdb]_dr_i2kdb
Log_Dest_Status : Connected
Time_Computed : None
TransportLag : None
ApplyLag : None
EstimatedOpenTime : None
RealTimeApply : None
MRP0Status : None
OracleDBStatus : READ WRITE

If the following information is displayed after the preceding command is run on the
eSpace EMS DR machine, the data replication status of the database resource is normal on
the DR machine:

RepType: DataGuard
DBName : I2KDB
RlinkName : dr_i2kdb_i2kdb[i2kdb]
Log_Dest_Status : Connected
Time_Computed : 04-JAN-2010 15:27:18
TransportLag : +00 00:00:00
ApplyLag : +03 15:04:53
EstimatedOpenTime : 10(S)
RealTimeApply : ON
MRP0Status :
OracleDBStatus : READ ONLY

Otherwise, you need to collect the displayed information, and then submit it to Huawei
technical support engineers.

No. 3
Collected information: Status of the file synchronization through the file synchronization tool
Command: > drcli -f check
Description: Run the command on the eSpace EMS production machine as the DR user gdr. The
command output will be written into the filesync.prt file in /opt/huawei/GDR/log. Then
submit the file to Huawei technical support engineers.

No. 4
Collected information: Status of the GDR resources
Command: > drcli -c drstate -l
Description: Run the command on the eSpace EMS DR machine as the DR user gdr. Then collect
the displayed information and submit it to Huawei technical support engineers.

No. 5
Collected information: Running status of the DR environment
Command: > drcli -s switchovercheck
Description: Run the command on the eSpace EMS production machine and DR machine as the DR
user gdr. The command output will be written into the switchcheck.prt file in
/opt/huawei/GDR/log. Then submit the file to Huawei technical support engineers.

No. 6
Collected information: Run logs of the GDR
Command: > cd /opt/huawei/gdr/log
Description: Pack all the files (including the filesync.prt and switchcheck.prt files) in
the preceding directory on the eSpace EMS production machine and DR machine, and then
submit the package to Huawei technical support engineers.

No. 7
Collected information: Configuration information of the GDR
Command: > cd /opt/huawei/gdr/config
Description: Pack all the files in the preceding directory on the eSpace EMS production
machine and DR machine, and then submit the package to Huawei technical support engineers.
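
Packing the GDR log and configuration directories (items 6 and 7 in Table 6-5) can be sketched as follows. The guide does not prescribe a packing tool, so the use of tar with gzip compression is an assumption, and pack_dir is a hypothetical helper name.

```shell
# Packs the directory $1 into the gzip-compressed tar archive $2.
# The archive stores paths relative to the parent of $1, so unpacking
# recreates only the last path component (for example, "log/").
pack_dir() {
    src=$1
    out=$2
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}
```

Possible usage on each machine: `pack_dir /opt/huawei/gdr/log /tmp/gdr_log.tar.gz` and `pack_dir /opt/huawei/gdr/config /tmp/gdr_config.tar.gz`.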

6.4 Oracle Database Information


This topic describes the common information to be collected when you handle database faults.
Table 6-6 lists the information to be collected about an Oracle database and the commands
used for collecting the information.

Table 6-6 Information to be collected from an Oracle database and commands for collecting the
information

No. 1
Information to be collected: Alert logs
Command: > cd $ORACLE_BASE/diag/rdbms/$ORACLE_SID/$ORACLE_SID/alert
Description: Save the files that are located in the path and submit them to Huawei
technical support engineers. The command must be run by the oracle user.
NOTE
$ORACLE_BASE indicates ORACLE_BASE set in the environment variables by the oracle user.
The field $ORACLE_SID indicates the name of the instance in use, such as i2kdb.
Example:
> cd $ORACLE_BASE/diag/rdbms/i2kdb/i2kdb/alert

No. 2
Information to be collected: Connection logs
Command: > cd $ORACLE_BASE/diag/tnslsnr/${HOSTNAME}/listener/trace
Description: Save the files that are located in the path and submit them to Huawei
technical support engineers. The command must be run by the oracle user.
NOTE
$ORACLE_BASE indicates ORACLE_BASE set in the environment variables by the oracle user.
The field ${HOSTNAME} indicates a host name, such as i2ksvr-1.
Example:
> cd $ORACLE_BASE/diag/tnslsnr/i2ksvr-1/listener/trace

No. 3
Information to be collected: Admin configuration data
Command: > cd $ORACLE_HOME/network/admin
Description: Save the files that are located in the path and submit them to Huawei
technical support engineers. The command must be run by the oracle user.
NOTE
$ORACLE_HOME indicates ORACLE_HOME set in the environment variables by the oracle user.

No. 4
Information to be collected: Initialization files of the database
Command: > cd $ORACLE_HOME/dbs
Description: Save the files that are located in the path and submit them to Huawei
technical support engineers. The command must be run by the oracle user.
NOTE
$ORACLE_HOME indicates ORACLE_HOME set in the environment variables by the oracle user.

No. 5
Information to be collected: Database version
Command:
> sqlplus / as sysdba
SQL> select banner from sys.v_$version;
Description: Submit the query result to Huawei technical support engineers. The command
must be run by the oracle user.

No. 6
Information to be collected: Memory usage of the database server
Command: # top>file2.txt
Description: Run the top command as the root user and submit the result to Huawei
technical support engineers.

No. 7
Information to be collected: Database data export
Command: > exp system/password@i2kdb buffer=8092 full=y inctype=complete file=backup.dmp
Description: Back up the data exported from the database. The command must be run by the
oracle user.
NOTE
The field password indicates the password of the system user. The value varies according
to actual situation.
The field i2kdb indicates the instance name of the eSpace EMS database.

No. 8
Information to be collected: Port number, IP address, and host name of the current database
Command: > more $ORACLE_HOME/network/admin/listener.ora
Description: Obtain the value of PORT in the file, and then submit the result to Huawei
technical support engineers. Example of the file content:
i2kdb = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = i2ksvr-1)(PORT = 1521)) ) )
The command must be run by the oracle user.

No. 9
Information to be collected: Name of the current database instance
Command:
> sqlplus / as sysdba
SQL> select DB_UNIQUE_NAME from v$database;
Description: Submit the query result to Huawei technical support engineers. The command
must be run by the oracle user.

No. 10
Information to be collected: Character set used in the current database
Command:
> sqlplus / as sysdba
SQL> show parameter nls_language
Description: Submit the query result to Huawei technical support engineers. The command
must be run by the oracle user.
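
Obtaining the PORT value from listener.ora (item 8 in Table 6-6) can be sketched as follows. This is only an illustration that assumes the PORT = <number> notation shown in the example; listener_ports is a hypothetical helper name.

```shell
# Prints every distinct port number that appears as "PORT = <number>"
# in the listener configuration file given as $1, one per line.
listener_ports() {
    grep -Eo 'PORT[[:space:]]*=[[:space:]]*[0-9]+' "$1" | grep -Eo '[0-9]+' | sort -u
}
```

Possible usage as the oracle user: `listener_ports "$ORACLE_HOME/network/admin/listener.ora"`.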

6.5 Collecting Logs


This topic describes how the system collects logs when faults occur in the system.


eSpace EMS Logs


Table 6-7 describes how the system collects logs when faults occur.

{install path} is the installation path of the eSpace EMS server. The default path is /opt/oms.

Table 6-7 Log description


Module                      Log File Path
    Log File                Log Description

Security module             {install path}/run/log/oms/sm/
    author_*.log            Security authentication logs
    nePermitGate_*.log      NE right gateway logs
    sm_*.log                Main program logs

Alarm module                {install path}/run/log/oms/fm/
    fm_*.log                Main program logs
    fmprobe_*.log           Collection layer logs
    fmui_*.log              Alarm client logs
    fmbackup_*.log          Alarm dump logs

Performance module          {install path}/run/log/oms/pm/
    pm_*.log                Logs related to performance monitoring templates, NE event
                            processing, and view monitoring
    pmdata_*.log            Logs collected when performance data is saved to the database
    pmds_*.log              DS layer logs
    pmmeastype_*.log        Logs related to performance indicator instance management
    pmprobe_*.log           Performance data collection logs
    pmthreshold_*.log       Performance threshold management logs
    pmui_*.log              Client operation logs in performance management

NE access module            {install path}/run/log/oms/eam/
    mimcache_*.log          Cache logs of the MIM
    mim_*.log               NE management logs
    iconmgr_*.log           NE icon processing logs
    eam_*.log               Logs related to NE access operations such as NE lifecycle
                            and type processing
    eam_*.log               DS logs of the EAM
    eam_*.log               Client operation logs related to NE access operations such
                            as tree table refreshment

Topology module             {install path}/run/log/oms/topo/
    mapping_*.log           Logs related to topology object mapping processing
    topo_*.log              DS layer logs related to topology operations such as right
                            and domain allocation and initialization of the data to be
                            displayed on the client
    topo_*.log              uiService logs, such as flex invocation Java errors
    topomgr_*.log           Logs related to topology object management and alarm
                            synchronization

Software management module  {install path}/run/log/oms/swm/
    ideploy_ui*.log         Running logs related to software management

Software management module  {install path}/run/log/oms/swm/task name
    *.log                   Execution logs of installation or upgrade tasks
                            NOTE: The task name is the name of the installation or
                            upgrade task created in software management.

Message tracing module      /opt/oms/run/log/omstrace/
    trace_node_*.log        Logs related to interaction between the mediation node and
                            the UOA
    trace_app_*.log         Running logs of message tracing applications

MED module                  {install path}/run/log/oms/med/
    med_*.log               MED framework logs and logs related to interaction between
                            the MED and NEs over SNMP or SOAP
    ftp.server_*.log        Logs related to interaction between the MED and NEs over FTP
    ftp.client_*.log        Logs related to interaction between the MED and NEs over FTP
    ftp.med_*.log           Logs related to interaction between the MED and NEs over FTP
    mml.med_*.log           Logs related to interaction between the MED and NEs over MML
    mml.client_*.log        Logs related to interaction between the MED and NEs over MML
    telnet.med_*.log        Logs related to interaction between the MED and NEs over Telnet
    telnet.client_*.log     Logs related to interaction between the MED and NEs over Telnet
    ssh.med_*.log           Logs related to interaction between the MED and NEs over SSH
    ssh.client_*.log        Logs related to interaction between the MED and NEs over SSH

Northbound module           {install path}/run/log/oms/nbi/
    nbi_*.log               Running logs related to the northbound module, for example,
                            forwarding alarms to the upper NMS and performing tasks
                            delivered by the upper NMS

Basic platform module       {install path}/run/log/oms/core/
    web.portal_*.log        Portal running logs
    event_*.log             Event running logs
    log.mgmt_*.log          Running logs of the tool used for dynamically changing log
                            severities
    task_*.log              Task running logs
    sbus_*.log              sbus running logs
    sbus.server_*.log       Running logs of the sbus server
    sbus.heartbeat_*.log    sbus heartbeat check logs
    ds.core.adapter_*.log   Running logs of the DS layer
    fsm_*.log               Running logs of the file management module
    persistence_*.log       Running logs of the persistence layer
    sbus.client_*.log       Running logs of the sbus client
    apache_*.log            Running logs of Tomcat
    base_*.log              Running logs of the base module
    cache_*.log             Running logs of the cache module

UC service log              {install path}/run/log/uc/
    snmptrap_*.log          Logs about sending and receiving trap messages between NEs
                            and the eSpace EMS through SNMP
    cbm/*.log               Common functional module logs (such as cache, rotation,
                            batch importing, and device selection functions)
    gs8/*.log               GS8 access and service logs
    iad/*.log               IAD access and service logs
    ippbx/*.log             IP PBX access and service logs
    license/*.log           License management logs
    other/*.log             NE detection, NE automatic access, and IP PBX/IAD backup
                            and restoration logs
    remotesupport/*.log     Remote maintenance logs
    sftpclient/*.log        Log downloading logs
    tr69/*.log              IP Phone/SBC/EGW NE access and service logs
    ums/*.log               UMS NE access and service logs
    vqm/*.log               NE voice quality monitoring logs
    upgrade/*.log           NE upgrade logs

Startup log                 {install path}/run/log/virgo/
    log.log                 Startup logs
    stop.exception.log      Startup failure logs

Garbage collection log      {install path}/run/log/
    gc.hprof.txt            Garbage collection logs
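
When a fault has just occurred, only the recently written log files are usually of interest. The following is a minimal sketch for listing them, assuming that the file modification time is a good indicator; recent_logs is a hypothetical helper name, and the default window of one day is an arbitrary choice. Files that do not end in .log (for example, gc.hprof.txt) are not matched.

```shell
# Lists the *.log files under the directory $1 that were modified within
# the last N days (N is $2, default 1), sorted by path.
recent_logs() {
    find "$1" -type f -name '*.log' -mtime "-${2:-1}" | sort
}
```

Possible usage with the default installation path: `recent_logs /opt/oms/run/log`.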

6.6 Version Information


This topic describes the commands for querying the information about the versions of the
eSpace EMS.
1. Log in to the eSpace EMS server as the i2kuser user.
2. Run the following command to query the information about the versions of the eSpace
EMS:
> cd {install path}/run/config
> more oms.xml |grep "productVersion"
<param name="productVersion">V300R002C04</param>
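
To obtain only the version string, the param element can be filtered further. This is a minimal sketch that assumes the element appears on a single line in the form shown above; product_version is a hypothetical helper name.

```shell
# Prints the text of the productVersion param element found in the file $1.
product_version() {
    sed -n 's/.*<param name="productVersion">\([^<]*\)<\/param>.*/\1/p' "$1"
}
```

Possible usage: `product_version "{install path}/run/config/oms.xml"`, with {install path} replaced by the actual installation path.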


7 Troubleshooting Cases

About This Chapter


7.1 Filesync Exception
This topic describes how to handle an exception of the GDR file synchronization process
Filesync.
7.2 DataGuard Synchronization Exception
This topic describes how to handle a DataGuard synchronization exception.
7.3 GDR Process Exception
This topic describes how to handle a GDR process exception.
7.4 Modifying Information About the Master Node Corresponding to the Mediation Node
After Switching
In distributed deployment mode, you need to associate the Mediation node with the current
production machine after switching.
7.5 The Performance Data of Some Network Devices Cannot Be Collected on the eSpace
EMS.
The performance data of some network devices such as routers or switches cannot be
collected on the eSpace EMS.
7.6 Fault Rectification About IP PBX Performance Data Collection Status
This topic describes the methods to rectify faults about IP PBX performance data collection
status.
7.7 Fault Rectification in the File System
This topic describes the methods to rectify faults in the file system.
7.8 eSpace EMS Page Is Leftward Offset in IE 8.0
This topic provides the method to use when the eSpace EMS page is leftward offset in IE 8.0.
7.9 File Download Dialog Box Is Displayed After a Click on the Upload Icon
This topic describes the method to use when the File Download dialog box is displayed after
a click on the upload icon.


7.10 Failure to Export Data


7.11 Browser Page Cannot Be Properly Displayed or Some Browser Functions Are
Unavailable

7.1 Filesync Exception


This topic describes how to handle an exception of the GDR file synchronization process
Filesync.

Symptom
The drcli -s switchovercheck command fails after file synchronization is stopped or when the
Filesync process is synchronizing files.

Solution
This exception occurs because the switching check fails after file synchronization is stopped
or when the Filesync process is synchronizing files. You can perform the following steps to
resume file synchronization:
• If file synchronization is stopped, run drcli -f resume on the production machine to start
file synchronization. After file synchronization is complete, run drcli -s switchovercheck
again.
• If the Filesync process is synchronizing files, run drcli -f fullrep -l on the production
machine to start lightweight synchronization. After file synchronization is complete, run
drcli -s switchovercheck again.

7.2 DataGuard Synchronization Exception


This topic describes how to handle a DataGuard synchronization exception.

Symptom
During a switchover or failover, the message DB synchronization has
disconnected is displayed. When users check the database synchronization status on the
production machine or redundancy machine, the value of Log_Dest_Status is Disconnected.

Solution
• If the value of Log_Dest_Status is Disconnected on the production machine:
1. Log in to the production machine as user oracle and run the following command:
> sqlplus / as sysdba
2. Check the status of LOG_ARCHIVE_DEST_2.
> select dest_name,status from v$archive_dest_status where dest_id=2;

DEST_NAME STATUS
LOG_ARCHIVE_DEST_2 ERROR

If STATUS of LOG_ARCHIVE_DEST_2 is ERROR, an exception occurs during
synchronization between the production machine and the redundancy machine.
3. Log in to the redundancy machine as user oracle.
4. Check whether the listener on the redundancy machine has stopped.
You can view the listener information in listener.ora under
$ORACLE_HOME/network/admin. According to the plan, the listener name of
the database on the redundancy machine is omsdb.
> lsnrctl status omsdb
If the following information is displayed, the listener has stopped.

Connecting to
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=float_ip_rep1)(PORT=1522)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Linux Error: 111: Connection refused

5. Start the listener on the redundancy machine.
> lsnrctl start omsdb
• If the value of Log_Dest_Status is Disconnected on the redundancy machine:
1. Log in to the redundancy machine as user oracle.
2. Check whether the TNS between the production machine and the redundancy machine is
normal.
You can view the TNS information in tnsnames.ora under
$ORACLE_HOME/network/admin. According to the plan, the TNS of the production
machine is omsdb.
> tnsping omsdb
If the following information is displayed, the TNS between the production machine and
the redundancy machine is normal.
TNS Ping Utility for Linux: Version 11.1.0.7.0 - Production on 24-OCT-2011 18:33:52

Copyright (c) 1997, 2008, Oracle. All rights reserved.

Used parameter files:


/opt/oracle/oradb/home/network/admin/sqlnet.ora

Used TNSNAMES adapter to resolve the alias


Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST =
10.85.178.87)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME =
omsdb)))
If the TNS between the production machine and the redundancy machine is abnormal,
check that the network connection between the two machines is normal.
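
The lsnrctl status check in the first case can be sketched as a filter on the command output. The sketch assumes that the TNS-12541 code shown in the sample output is the relevant indicator; listener_down is a hypothetical helper name.

```shell
# Reads "lsnrctl status <name>" output on stdin.
# Succeeds if the output contains TNS-12541 (TNS:no listener).
listener_down() {
    grep -q 'TNS-12541'
}
```

Possible usage as the oracle user on the redundancy machine: `lsnrctl status omsdb 2>&1 | listener_down && echo "listener omsdb is not running"`. Starting the listener is left as a manual step.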

7.3 GDR Process Exception


This topic describes how to handle a GDR process exception.

Symptom
The GDR process is being restarted or not running.


Solution
• If the GDR process on the production machine is being restarted or is not running, collect
the configuration files and log files from {GDRWORKDIR}/config and
{GDRWORKDIR}/log respectively on the production machine. Then contact Huawei
technical support.
• If the GDR process on the redundancy machine is being restarted or is not running,
collect the files from {GDRWORKDIR}/config, {GDRWORKDIR}/config/i2000, and
{GDRWORKDIR}/log on the redundancy machine. Then contact Huawei
technical support.

7.4 Modifying Information About the Master Node Corresponding to the Mediation Node After Switching

In distributed deployment mode, you need to associate the Mediation node with the current
production machine after switching.

Procedure
Step 1 Log in to the Mediation node as user i2kuser.
Step 2 Modify the configuration file.
> vi {install path}/run/config/oms.xml
<config name="med">
  <config name="center">
    <param name="serverPort">31006</param>
    <param name="transportPackets">19998</param>
  </config>
  <config name="node">
    <param name="nodeId">Mediation_Masterself</param>
    <param name="centerIP">10.85.172.90</param>
    <param name="nodeIP">0.0.0.0</param>
    <param name="centerPort">31006</param>
    <param name="localPort">31007</param>
  </config>

Change the value of centerIP to the IP address of the current production machine.
Step 3 Restart the Mediation service.
> cd {install path}/run/bin
> ./omsd.sh restart
----End
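
The centerIP change in Step 2 can also be made without an interactive editor. The following is a minimal sketch, assuming that the centerIP param element is written on a single line as in the excerpt above; set_center_ip is a hypothetical helper name, and 192.0.2.10 in the usage note is a placeholder address. Back up oms.xml before changing it.

```shell
# Replaces the value of the centerIP param element in the file $1 with $2.
# The result is written to a temporary file first and then copied back,
# so that the ownership and permissions of the original file are kept.
set_center_ip() {
    file=$1
    ip=$2
    tmp=$(mktemp) || return 1
    sed "s|\(<param name=\"centerIP\">\)[^<]*\(</param>\)|\1${ip}\2|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Possible usage as user i2kuser: `set_center_ip "{install path}/run/config/oms.xml" 192.0.2.10`, followed by the restart in Step 3.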

7.5 The Performance Data of Some Network Devices Cannot Be Collected on the eSpace EMS.

The performance data of some network devices such as routers or switches cannot be
collected on the eSpace EMS.

Symptom
The performance data of some network devices is not displayed in the performance
monitoring view and cannot be found in historical data.


Cause Analysis
The eSpace EMS obtains the performance data of network devices by running the SNMP Get
command. However, the SNMP access is disabled on the network devices for security reasons.
Therefore, the eSpace EMS cannot obtain the performance data by running the SNMP Get
command.

Solution
On each affected device, grant SNMP access rights to the eSpace EMS server. In a disaster
recovery network, grant SNMP access rights to both the production machine and the
redundancy machine.
For details, contact the device maintenance personnel.

7.6 Fault Rectification About IP PBX Performance Data Collection Status

This topic describes how to rectify faults in the IP PBX performance data collection status.

Symptom
• In the Monitoring Configuration window, the collection status is Abnormal.
• In the Monitoring View window, no performance data is displayed for the latest several
data collection periods.

Possible Causes
• The connection between the IP PBX and the eSpace EMS is abnormal.
• The IP PBX is upgrading or has been upgraded.
• The IP PBX is restarting or has been restarted.
• Boards on the IP PBX are restarting or have been restarted.
• The active/standby board switchover is being performed or has been performed on the IP PBX.

Procedure
Step 1 Verify that the connection between the IP PBX and the eSpace EMS is normal. If the
connection is abnormal, restore the connection between the IP PBX and the eSpace EMS.
Step 2 In the system operation logs, check whether any user upgraded the IP PBX on the day
when the exception occurred.
1. Choose System > Log Management from the main menu.
2. Choose Query Logs > Operation Logs from the navigation tree on the left.
3. In the operation log list, check whether any user upgraded the IP PBX on the day
when the exception occurred.
• If no, go to Step 3.
• If yes, go to Step 4.

Step 3 View the IP PBX operation logs and check whether any user has restarted the IP PBX,
restarted boards, or performed an active/standby board switchover.
1. Choose Resource > Resource Management from the main menu.
2. In the Operation column of the device list, click .
The XXX Management window is displayed. In the window name, XXX indicates an
NE name.
3. Choose Manage Service > Operation Log from the navigation tree on the left.
4. In the operation log list, check whether any user has restarted the IP PBX, restarted
boards, or performed an active/standby board switchover.
Step 4 Restart the performance monitoring task.
1. Choose Performance > Monitoring Configuration from the main menu.
2. Select the performance counter whose collection status is Abnormal, click Stop, and
then click Start.
• If the collection status changes to Normal, the fault is rectified.
• If the collection status is still Abnormal, contact Huawei technical support engineers.
----End
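
The branching in the procedure above can be summarized as a small decision function. This is
only a sketch of the documented flow for reading convenience. The function name troubleshoot
and its three inputs are hypothetical, and the values are entered by the operator after
checking the windows and logs manually. Nothing here queries the eSpace EMS.

```python
def troubleshoot(connection_ok, upgraded_on_fault_day, status_after_restart):
    """Return the ordered actions that the procedure suggests.

    connection_ok         -- result of the check in Step 1
    upgraded_on_fault_day -- result of the operation log check in Step 2
    status_after_restart  -- collection status seen after Step 4
    """
    actions = []
    if not connection_ok:
        actions.append("Step 1: restore the IP PBX connection to the eSpace EMS")
    if upgraded_on_fault_day:
        # An upgrade explains the interruption, so Step 3 is skipped.
        actions.append("Step 2: upgrade found in the system operation logs")
    else:
        actions.append("Step 3: check the IP PBX operation logs for restarts "
                       "or an active/standby board switchover")
    actions.append("Step 4: stop and start the Abnormal performance counter")
    if status_after_restart == "Normal":
        actions.append("Fault rectified")
    else:
        actions.append("Contact Huawei technical support engineers")
    return actions


for action in troubleshoot(True, False, "Abnormal"):
    print(action)
```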

7.7 Fault Rectification in the File System


This topic describes the methods to rectify faults in the file system.

Procedure

• Do not run the fsck command on a file system that is mounted. Otherwise, data may be
lost.
• The shared disk cannot be used by other devices.

Step 1 Check whether the file system is mounted.


# mount
ucemserver2:~ # mount
/dev/cciss/c0d0p2 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)

The keyword on in an output line indicates that the file system on that device is mounted.
Run the umount command to unmount the file system before you check it.

Step 2 Run the fsck -y command to check and restore the file system.
fsck -y /dev/cciss/c0d0p2
# fsck -y /dev/cciss/c0d0p2
fsck 1.38 (30-Jun-2005)

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.


Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Wed May 27 15:47:08 2009
###########
fsck.reiserfs /dev/vgscp/lvscp failed (status 0x4). Run manually!

To check and restore the VxFS file system, run the fsck.vxfs command.
# fsck.vxfs -y /dev/sdb1
• If the system displays the message "passed", the check and restoration are complete.
After restarting, you can access the file system.
• If the restoration fails, the file system is damaged. Go to Step 3.
Step 3 Run the following command as prompted:
# fsck.reiserfs --rebuild-tree -y /dev/vgscp/lvscp
Step 4 If the file system cannot be restored, re-create the file system and restore the data
from a backup.
----End
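
Two checks in this procedure lend themselves to scripting: confirming that a device is not
mounted before fsck is run, and interpreting the fsck exit status (the sample output above
ends with status 0x4). The following sketch assumes mount output in the format shown in
Step 1 and the exit status bits documented for the Linux fsck wrapper. The helper names
is_mounted and decode_fsck_status are illustrative, and file-system-specific checkers such
as fsck.vxfs may use different codes.

```python
# Exit status bits as documented in the Linux fsck(8) manual page.
FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

# Shortened copy of the mount output shown in Step 1.
SAMPLE_MOUNT_OUTPUT = """/dev/cciss/c0d0p2 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)"""


def is_mounted(mount_output, device):
    """Return True if device appears as a mounted source in mount output."""
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected shape: <device> on <mount point> type <fstype> (<options>)
        if len(fields) >= 3 and fields[0] == device and fields[1] == "on":
            return True
    return False


def decode_fsck_status(status):
    """Translate an fsck exit status into a list of readable conditions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_STATUS_BITS.items())
            if status & bit]


if is_mounted(SAMPLE_MOUNT_OUTPUT, "/dev/cciss/c0d0p2"):
    print("Device is mounted: unmount it before running fsck.")
print(decode_fsck_status(0x4))
```

The exit status is a bit mask, so a single run can report several conditions at once, for
example 5 for "errors corrected" together with "errors left uncorrected".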

Subsequent Processing
After restoration, check whether the file system status is normal.
Step 1 Run the tune2fs command to check the status of an ext2 or ext3 file system before
mounting it.
# tune2fs -l device name |grep state
# tune2fs -l /dev/sdb2 |grep state
Filesystem state: clean

If clean is displayed in the checking result, you do not need to perform further operations. Otherwise, go
to Step 2.

Step 2 Mount the file system and check logs in /var/log/messages.


If the logs contain no error messages, the file system is normal.
----End
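
The check in Step 1 can be automated by reading the "Filesystem state" line from the
tune2fs -l output. The sketch below assumes output in the format shown above. The function
names filesystem_state and needs_further_check and the sample text are illustrative.

```python
# Sample line in the format printed by "tune2fs -l <device>".
SAMPLE_TUNE2FS_OUTPUT = """Filesystem volume name:   <none>
Filesystem state:         clean
Errors behavior:          Continue"""


def filesystem_state(tune2fs_output):
    """Return the value of the 'Filesystem state' field, or None if absent."""
    for line in tune2fs_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Filesystem state":
            return value.strip()
    return None


def needs_further_check(tune2fs_output):
    """Return True unless the state is exactly 'clean' (then go to Step 2)."""
    return filesystem_state(tune2fs_output) != "clean"


print(filesystem_state(SAMPLE_TUNE2FS_OUTPUT))
print(needs_further_check(SAMPLE_TUNE2FS_OUTPUT))
```

A missing state line is treated the same way as a state other than clean, so the script errs
on the side of continuing with Step 2.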

7.8 eSpace EMS Page Is Leftward Offset in IE 8.0


This topic describes how to correct the eSpace EMS page when it is offset to the left in IE 8.0.

Problem
When you use IE 8.0 to download import templates in batches, the eSpace EMS page is
offset to the left, as shown in Figure 7-1.

Figure 7-1 Leftward-offset eSpace EMS page in IE 8.0

Cause
Internet Explorer is running in Internet Explorer 8.0 Compatibility View instead of the
standard Internet Explorer 8.0 mode.

Troubleshooting
1. Choose Tools > Developer Tools from the menu bar of the Internet Explorer.
The Developer Tools window is displayed.
2. Choose Browser Mode > Internet Explorer 8.0 from the menu bar, as shown in Figure
7-2.

Figure 7-2 Developer tool window

After the settings are complete, the eSpace EMS page is displayed normally.

7.9 File Download Dialog Box Is Displayed After a Click on the Upload Icon

This topic describes what to do when the File Download dialog box is displayed after you
click the upload icon.

Problem
Step 1 Click next to Resource file to import on the batch import page, and select an Excel file.

Step 2 Click . The File Download dialog box is displayed, as shown in Figure 7-3.

Figure 7-3 File Download dialog box

----End

Cause
• The selected Excel file does not match the template. For example, this problem occurs if
you select an IAD template on the Import IP PBX page.

• The ACTION extension is associated with an incorrect file type.


----End

Solution
Step 1 Close the File Download dialog box and the NE Management tab page.
Step 2 Click My Computer, and choose Tools > Folder Options from the main menu on the
displayed My Computer page.
Step 3 Click the File Types tab.
Step 4 Select extension ACTION, and click Delete.
----End

7.10 Failure to Export Data


Symptom
• When you attempt to export data, such as current or historical alarm information and
signaling tracing data, the system displays the message shown in Figure 7-4.

Figure 7-4 Interception information

• The file fails to be exported.

Possible Causes
The automatic prompt function for downloading files is disabled.

Procedure
Step 1 Start the Internet Explorer.
Step 2 Choose Tools > Internet Options > Security > Custom Level from the main menu.

Figure 7-5 Internet options

Step 3 Under Downloads, select Enable for Automatic prompting for file downloads.

Figure 7-6 Security settings-Internet zone

Step 4 Click OK.


Step 5 Restart the Internet Explorer and log in to the eSpace EMS client. The fault is rectified.
----End

7.11 Browser Page Cannot Be Properly Displayed or Some Browser Functions Are Unavailable

Symptom
After you log in to the eSpace EMS client through a browser, the browser page is not
displayed properly or some functions on the page are unavailable. For example, slots are
left blank in the IP PBX device panel.

Possible Causes
The browsing history is not cleared.

Procedure
Step 1 Clear the browsing history.
• Internet Explorer 8.0
1. Choose Tools > Internet Options from the main menu.
2. Click the General tab and click Delete.

Figure 7-7 Internet options

3. Click Delete in the Delete Browsing History dialog box.

Figure 7-8 Deleting the browsing history

• Firefox 3.6


1. Choose Tools > Options from the main menu.
2. Click Privacy in the displayed Options dialog box.

Figure 7-9 Options

3. Click Private Data. In the displayed dialog box, click Clear Now.
----End
