Rca 30466853

Root Cause Analysis: After the Upgrade, Some NEs Do Not Have Performance Data and Performance Data Has Breakpoints

Issue 01

Date 2023-02-20

Huawei Technologies Co., Ltd.


Copyright © Huawei Technologies Co., Ltd. 2017. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written
consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be within
the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://www.huawei.com

Email: support@huawei.com

About This Document

Trouble Ticket Number SR 30466853

Prepared by Xu Qifeng 00627421

Approved by

Approval Date

1 Basic Information

Product Name Version

NCE-IP+T+FAN V100R021C10SPC202

2 Problem Description

1. After the upgrade, some NEs at the secondary site of NCE have no performance data.

2. After the upgrade, the performance data of all NEs at the secondary site of NCE has breakpoints.

3. After the switchover, the indicator data of some NEs is...

3 Problem Analysis

1. Analysis shows that these NEs are offline at the secondary site of NCE, so their data cannot be collected or analyzed.

The customer reports that the secondary site has known route problems. As a result, the secondary site of NCE cannot communicate with these NEs, which is why they are offline. This is not an NCE product issue.

2. Breakpoints in the PMS reports of all NEs: according to the analysis, NCE was restarted several times during the period in which the breakpoints occur. Each restart also restarted the collection process, which caused multiple data breakpoints.

3. Issue 3, some PMS instances cannot collect valid NE data after the upgrade:
This is a problem of NCE V100R20C10. During an NE upgrade, there is a low probability that some indicators of the new NE version fail to be updated in the NCE database.
If the NCE services are not restarted, NCE PMS continues to use the cached data, so PMS functions normally.
If the NCE services are restarted, the protocol of the indicators to be collected from the NE is updated incorrectly, and the problem then also occurs in NCE R20C10.
The NCE services are restarted automatically during the upgrade. Therefore, the problem appears after the upgrade.
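
The restart dependence described in item 3 can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyCollector`, the dictionary standing in for the database, and the device and indicator names are all hypothetical and do not reflect NCE internals. It shows how a process that serves requests from an in-memory cache can hide an incomplete database until a restart forces a reload.

```python
class ToyCollector:
    """Hypothetical collector that caches indicator definitions in memory."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.reload()

    def reload(self):
        # Simulates a process (re)start: the cache is rebuilt from the database.
        self.cache = {key: list(values) for key, values in self.database.items()}

    def indicators_for(self, dev_type, version):
        # Collection consults only the cache, never the database directly.
        return self.cache.get((dev_type, version), [])


# Made-up device type and versions, for illustration only.
database = {("router-x", "v1"): ["cpu_usage", "mem_usage"]}
collector = ToyCollector(database)

# Assumption: the NE is upgraded to v2 and the in-memory state is updated,
# but the corresponding database write is lost.
collector.cache[("router-x", "v2")] = ["cpu_usage", "mem_usage"]

before_restart = collector.indicators_for("router-x", "v2")
collector.reload()  # service restart, for example during the NCE upgrade
after_restart = collector.indicators_for("router-x", "v2")

print(before_restart)  # ['cpu_usage', 'mem_usage']
print(after_restart)   # []
```

In this model nothing looks wrong until `reload()` runs, which matches the observation that the problem only becomes visible after the services restart.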

4 Root Cause

Issue 3, some PMS instances cannot collect valid NE data after the upgrade:
This is a problem of NCE V100R20C10. During an NE upgrade, there is a low probability that some indicators of the new NE version fail to be updated in the NCE database.
When the SNMPCollectorService and BulkCollectorService processes restart, this information cannot be loaded from the database.
As a result, performance data collection is abnormal after the upgrade.

5 Corrective Action

5.1 Workaround
Workaround for issue 3:
Run the following commands to add the missing configuration.
1) Use SSH to log in to NCE.
2) Run the following command as the dbuser user:
$ zsql legacycollectordb@127.0.0.1:$(cat /opt/zenith/data/$(find /opt/zenith/data/ -name legacycollectordb | sed 's/\// /g' | awk '{print $4}')/cfg/zengine.ini | grep LSNR_PORT | sed 's/=/ /g' | awk '{print $2}')
Password:

# Back up the table.
SQL> create table WBH_NPMS_DEV_TYPE_INDICATOR_1102 as select * from NPMS_DEV_TYPE_INDICATOR;
# Restore the lost indicator data.
SQL> INSERT IGNORE INTO NPMS_DEV_TYPE_INDICATOR
       (DEV_TYPE_ID, DEV_OS_VERSION, TEMPLATE_ID, INDICATOR_ID,
        PROTOCOL, PRECISION_VALUE, DATA_TYPE, FORMULA_TYPE,
        FORMULA, VALUE_UNIT, INDICATOR_GRP_ID)
     SELECT tmp.DEV_TYPE_ID, mis_data.DEV_SUB_VERSION,
            tmp.TEMPLATE_ID, tmp.INDICATOR_ID,
            tmp.PROTOCOL, tmp.PRECISION_VALUE, tmp.DATA_TYPE, tmp.FORMULA_TYPE,
            tmp.FORMULA, tmp.VALUE_UNIT, tmp.INDICATOR_GRP_ID
     FROM NPMS_DEV_TYPE_INDICATOR tmp,
          (SELECT DEV_TYPE_ID, DEV_SUB_VERSION, NPMS_TEMPLATE_ID
           FROM (SELECT DISTINCT a.DEV_TYPE_ID, a.DEV_SUB_VERSION, b.NPMS_TEMPLATE_ID
                 FROM npms_ne_info a, npms_instance_info b
                 WHERE a.DEV_ID = b.NPMS_DEV_ID) all_data
           WHERE (SELECT count(1) AS num
                  FROM (SELECT DISTINCT DEV_TYPE_ID, DEV_OS_VERSION, TEMPLATE_ID
                        FROM NPMS_DEV_TYPE_INDICATOR) part_data
                  WHERE part_data.DEV_TYPE_ID = all_data.DEV_TYPE_ID
                    AND part_data.DEV_OS_VERSION = all_data.DEV_SUB_VERSION
                    AND part_data.TEMPLATE_ID = all_data.NPMS_TEMPLATE_ID) = 0
          ) mis_data
     WHERE tmp.DEV_TYPE_ID = mis_data.DEV_TYPE_ID
       AND tmp.TEMPLATE_ID = mis_data.NPMS_TEMPLATE_ID;
SQL> COMMIT;

# Count the group-27 indicators whose protocol is not 1.
SQL> select count(1) from npms_dev_type_indicator where indicator_grp_id = 27 and protocol != 1;

# Back up the table.
SQL> CREATE TABLE NPMS_DEV_TYPE_INDICATOR_BAK AS SELECT * FROM NPMS_DEV_TYPE_INDICATOR;

# Update the indicator formula and protocol.
SQL> UPDATE npms_dev_type_indicator, TDT_DEVICE_TYPE_INDICATOR
     SET npms_dev_type_indicator.FORMULA = TDT_DEVICE_TYPE_INDICATOR.FORMULA
     WHERE TDT_DEVICE_TYPE_INDICATOR.PROTOCOL = 1
       AND npms_dev_type_indicator.DEV_TYPE_ID = TDT_DEVICE_TYPE_INDICATOR.DEV_TYPE_ID
       AND npms_dev_type_indicator.INDICATOR_ID = TDT_DEVICE_TYPE_INDICATOR.INDICATOR_ID
       AND npms_dev_type_indicator.INDICATOR_ID IN
           (SELECT INDICATOR_ID FROM TDT_INDICATOR WHERE INDICATOR_GRP_ID = 27)
       AND npms_dev_type_indicator.PROTOCOL != 1;

SQL> UPDATE npms_dev_type_indicator, TDT_DEVICE_TYPE_INDICATOR
     SET npms_dev_type_indicator.PROTOCOL = TDT_DEVICE_TYPE_INDICATOR.PROTOCOL
     WHERE TDT_DEVICE_TYPE_INDICATOR.PROTOCOL = 1
       AND npms_dev_type_indicator.DEV_TYPE_ID = TDT_DEVICE_TYPE_INDICATOR.DEV_TYPE_ID
       AND npms_dev_type_indicator.INDICATOR_ID = TDT_DEVICE_TYPE_INDICATOR.INDICATOR_ID
       AND npms_dev_type_indicator.INDICATOR_ID IN
           (SELECT INDICATOR_ID FROM TDT_INDICATOR WHERE INDICATOR_GRP_ID = 27)
       AND npms_dev_type_indicator.PROTOCOL != 1;

SQL> COMMIT;

3) If no error is reported, the execution is successful.
4) Restart the SNMPCollectorService and BulkCollectorService services.
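
To make the logic of the INSERT statement in step 2 easier to follow, here is a minimal sketch that reproduces it on an in-memory SQLite database through Python's `sqlite3` module. It is not the zsql environment. The sketch makes these simplifications, all of them assumptions: SQLite spells the statement `INSERT OR IGNORE`, the indicator table is reduced to four columns with an assumed primary key, a `NOT EXISTS` test replaces the `count(1) = 0` subquery, and all row values are made up. The idea is the same: find every combination of device type, NE version and template that an instance uses but that has no indicator rows, then copy the indicator rows of the same device type and template under the missing version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE npms_ne_info (dev_id INTEGER, dev_type_id INTEGER, dev_sub_version TEXT);
CREATE TABLE npms_instance_info (npms_dev_id INTEGER, npms_template_id INTEGER);
CREATE TABLE npms_dev_type_indicator (
    dev_type_id INTEGER, dev_os_version TEXT, template_id INTEGER, indicator_id INTEGER,
    PRIMARY KEY (dev_type_id, dev_os_version, template_id, indicator_id)  -- assumed key
);
-- NE 1 still runs v1; NE 2 was upgraded to v2 (made-up data).
INSERT INTO npms_ne_info VALUES (1, 100, 'v1'), (2, 100, 'v2');
INSERT INTO npms_instance_info VALUES (1, 7), (2, 7);
-- Indicator rows exist for v1 only; the v2 rows were never written.
INSERT INTO npms_dev_type_indicator VALUES (100, 'v1', 7, 501), (100, 'v1', 7, 502);
""")

FILL_MISSING = """
INSERT OR IGNORE INTO npms_dev_type_indicator
    (dev_type_id, dev_os_version, template_id, indicator_id)
SELECT tmp.dev_type_id, mis.dev_sub_version, tmp.template_id, tmp.indicator_id
FROM npms_dev_type_indicator AS tmp,
     (SELECT DISTINCT a.dev_type_id, a.dev_sub_version, b.npms_template_id
      FROM npms_ne_info AS a, npms_instance_info AS b
      WHERE a.dev_id = b.npms_dev_id
        AND NOT EXISTS (
            SELECT 1 FROM npms_dev_type_indicator AS p
            WHERE p.dev_type_id = a.dev_type_id
              AND p.dev_os_version = a.dev_sub_version
              AND p.template_id = b.npms_template_id)
     ) AS mis
WHERE tmp.dev_type_id = mis.dev_type_id
  AND tmp.template_id = mis.npms_template_id
"""

conn.execute(FILL_MISSING)
conn.commit()

rows = conn.execute(
    "SELECT dev_os_version, indicator_id FROM npms_dev_type_indicator "
    "ORDER BY dev_os_version, indicator_id"
).fetchall()
print(rows)  # [('v1', 501), ('v1', 502), ('v2', 501), ('v2', 502)]
```

Because duplicate keys are ignored, running the statement a second time in this sketch adds no further rows.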

5.2 Patch
Issue 3 has been resolved in NCE V100R021C10SPC202 and will not recur in subsequent NE version upgrades or NCE version upgrades.

[Question 1] Why does the PMS report lose data during the NCE upgrade maintenance window (MW)?
[Reply]
NCE PMS collects real-time NE data every 5 or 15 minutes and records it in the NCE database. PMS reports are generated from the collected data.
During the NCE upgrade MW, the NCE services are stopped, including the PMS service. NCE therefore cannot collect NE data periodically, and the data generated during this period is missing from the PMS report.
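
As a rough illustration of how such a gap shows up in the collected data, the sketch below scans a list of sample timestamps and reports every interval that is longer than the collection period. The function name `find_breakpoints`, the assumption that samples are aligned to the collection grid, and the sample times are all made up for this example and do not describe NCE behavior.

```python
from datetime import datetime, timedelta


def find_breakpoints(timestamps, period):
    """Return (last_sample_before_gap, first_sample_after_gap, missed_samples).

    Assumes every timestamp lies on the collection grid, so each gap is an
    exact multiple of the period.
    """
    gaps = []
    ordered = sorted(timestamps)
    for previous, current in zip(ordered, ordered[1:]):
        delta = current - previous
        if delta > period:
            # Collection points that should have fallen strictly inside the gap.
            missed = delta // period - 1
            gaps.append((previous, current, missed))
    return gaps


period = timedelta(minutes=15)
start = datetime(2023, 1, 1, 0, 0)  # made-up date
offsets_min = [0, 15, 30, 45, 120, 135]  # no samples between 00:45 and 02:00
samples = [start + timedelta(minutes=m) for m in offsets_min]

gaps = find_breakpoints(samples, period)
for gap_start, gap_end, missed in gaps:
    print(f"{gap_start:%H:%M} -> {gap_end:%H:%M}: {missed} samples missing")
# 00:45 -> 02:00: 4 samples missing
```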

[Question 2] How can lost data be restored in the PMS history report?
[Reply]
The data required by NCE PMS reports is collected in real time. Therefore, lost data cannot be restored.

[Question 3] Which NCE operations cause data loss in NCE PMS reports?
[Reply]
Stopping or restarting PMS collection-related services.
An NCE DR switchover.

[Suggestion]
To reduce the PMS data loss time, it is recommended that the next NCE upgrade be split into two maintenance windows.
MW1 upgrades only the secondary site, which is then used as the main site to manage and monitor NEs.
MW2 upgrades the primary site.
In this case, data loss occurs only during the upgrade of the standby site (< 5 hours) and the active/standby switchover (< 15 minutes).
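
The bounds quoted above translate into a worst-case number of missing collection periods. The following is back-of-the-envelope arithmetic only: the helper `max_lost_periods` is a name chosen for this example, and the inputs are the upper bounds stated in the suggestion (under 5 hours and under 15 minutes) combined with the 5-minute and 15-minute collection periods mentioned in the reply to Question 1.

```python
import math


def max_lost_periods(outage_minutes, period_minutes):
    """Upper bound on the collection points that can fall inside an outage."""
    return math.ceil(outage_minutes / period_minutes)


for period in (5, 15):
    upgrade = max_lost_periods(5 * 60, period)   # standby site upgrade, < 5 hours
    switchover = max_lost_periods(15, period)    # active/standby switchover, < 15 minutes
    print(period, upgrade, switchover)
# 5 60 3
# 15 20 1
```

These are ceilings derived from the stated bounds, not measured values; the actual loss depends on how long each operation takes.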
