
Handling L2 Watchdog Resets on the FAS 25XX platforms

https://kb.netapp.com/onprem/ontap/hardware/Handling_L2_Watchdog_Resets_on_the_FAS_25XX_pl…
Updated: Wed, 03 May 2023 08:05:23 GMT

Applies to
• FAS 25XX systems

Issue
• Node reboots unexpectedly
• Node does not reboot after an unexpected shutdown

Service Processor logs on the impacted node show the following:

Record 801: Sun Mar 06 15:09:20.924775 2021 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 802: Sun Mar 06 15:09:20.984259 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 803: Sun Mar 06 15:09:23.000822 2021 [SP.critical]: Filer Reboot

• If the node reboots, the following error can be seen in the EMS log files:

[cluster-01:mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a watchdog reset.

• If the node is unable to reboot, system sensors from the SP may show sensors as unavailable (na) or
faulted (Fault):

Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
SYSTEM:
System_FW_Status | na | discrete | na | na | na | na | na
System_Watchdog | 0x0 | discrete | | na | na | na | na
Wrench_Port_Up | na | discrete | na | na | na | na | na
CONTROLLER_A:
PCM_Status | 0x0 | discrete | Fault | na | na | na | na
Attn_Sensor1 | 0x0 | discrete | Asserted | na | na | na | na
CPU-1_DTS_Temp | na | degrees C | na | na | na | -10.000 | 0.000
CPU-2_DTS_Temp | na | degrees C | na | na | na | -10.000 | 0.000
CPU0_PVCCP | na | Volts | na | 1.580 | 1.670 | 1.920 | 2.010
CPU1_PVCCP | na | Volts | na | 1.580 | 1.670 | 1.920 | 2.010
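
As a rough aid for scanning long sensor listings, the sketch below flags rows whose Status is Fault, or whose Current and Status are both na. It assumes each sensor is printed on a single pipe-separated line in the column order shown above; the function name `flag_sensors` and the sample text are illustrative only and are not part of any NetApp tool.

```python
def flag_sensors(output: str) -> list[tuple[str, str]]:
    """Return (sensor name, status) for rows that look faulted or unavailable."""
    flagged = []
    for line in output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip the header, separator lines, and section labels such as "SYSTEM:".
        if len(cells) < 4 or cells[0] == "Sensor Name":
            continue
        name, current, status = cells[0], cells[1], cells[3]
        if status == "Fault" or (current == "na" and status == "na"):
            flagged.append((name, status))
    return flagged


# Sample rows copied from the listing above (assumed one row per line).
SAMPLE = """\
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
SYSTEM:
System_FW_Status | na | discrete | na | na | na | na | na
System_Watchdog | 0x0 | discrete | | na | na | na | na
CONTROLLER_A:
PCM_Status | 0x0 | discrete | Fault | na | na | na | na
"""

flagged = flag_sensors(SAMPLE)
for name, status in flagged:
    print(f"{name}: {status}")
```

A flagged row is only a prompt to look closer; the replacement decision still follows the log analysis in the Solution section.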

© 2023 NetApp.No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or mechanical,
including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written permission of the copyright owner. For more
information, see Legal Notices.
Cause
• The node reboots unexpectedly due to a watchdog reset.
• A watchdog is an independent timer that monitors the progress of the main controller running Data ONTAP.
Its function is to restart the system automatically if the system encounters an unrecoverable
system error.
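
The general watchdog idea can be illustrated with a toy model: the monitored system must periodically "kick" the timer, and if it stops doing so for longer than the timeout, the watchdog forces a reset. This is a conceptual sketch only. The class name, the timeout value, and the explicit `now` arguments are assumptions made for illustration, and it does not describe how the L2 watchdog on the FAS 25XX is actually implemented.

```python
class Watchdog:
    """Toy countdown watchdog; illustrative only, not NetApp's implementation."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_kick = now
        self.resets = 0

    def kick(self, now: float) -> None:
        """Called by the monitored system to show it is still making progress."""
        self.last_kick = now

    def poll(self, now: float) -> bool:
        """Return True and count a reset if no kick arrived within the timeout."""
        if now - self.last_kick > self.timeout_s:
            self.resets += 1
            self.last_kick = now  # the restarted system gets a fresh window
            return True
        return False


wd = Watchdog(timeout_s=5.0)
wd.kick(now=2.0)
healthy = wd.poll(now=4.0)   # 2 s since the last kick: no reset
hung = wd.poll(now=10.0)     # 8 s since the last kick: reset fires
print(healthy, hung, wd.resets)
```

Passing the time in explicitly keeps the example deterministic; a real watchdog runs on independent hardware so that it keeps counting even when the main controller is hung.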

Solution
1. Collect the following SP logs:

system log
events all
sp status -d
system sensors

2. Review the logs for abnormalities around the timestamp of the L2 watchdog reset event

Replacement Criteria

Condition | Log Analysis | Guidance
----------+--------------+----------
Node reboots normally | One L2 watchdog reset event found | Giveback and monitor
Node reboots normally | Two or more L2 watchdog resets seen in logs within one year | Replace the motherboard (PCM)
Node reboots normally | One L2 watchdog reset event with a watchdog panic, NMI, or other event | Investigate to determine if the panic, NMI, or other event will require support to replace the PCM or another part
Node unresponsive and able to collect BMC logs | Use the same log analysis as above to determine if the impacted node is safe to boot | Power cycle from the SP and attempt to boot. If the condition continues, reseat the PCM.
Node unresponsive and unable to collect BMC logs | No logs can be collected from an unresponsive BMC | Attempt a PCM reseat; the PCM may need to be replaced.

3. Any further scenarios may require the review of additional logs to determine what course of action to take,
and may require a support case. Contact NetApp Technical Support or log into the NetApp Support Site
to create a case, and reference this article for further assistance.
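
The log review in step 2 and the "node reboots normally" rows of the replacement criteria can be sketched as a small script. This is a hedged illustration, not a NetApp tool: it assumes SP event records follow the one-line `Record N: <timestamp> [<source>]: <message>` layout shown in the Issue section, treats "one year" as 365 days, and uses hypothetical function names (`l2_reset_times`, `guidance`). It does not cover the unresponsive-node rows, which need hands-on checks.

```python
import re
from datetime import datetime, timedelta

# Assumed record layout, based on the sample SP log lines in this article.
RECORD = re.compile(
    r"Record \d+: \w{3} (\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d+ \d{4}) \[.*?\]: (.*)"
)


def l2_reset_times(log_text):
    """Return sorted timestamps of 'L2 watchdog timeout' records."""
    times = []
    for line in log_text.splitlines():
        m = RECORD.match(line.strip())
        if m and "L2 watchdog timeout" in m.group(2):
            times.append(datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f %Y"))
    return sorted(times)


def guidance(times, other_event=False):
    """Map reset timestamps to the table's guidance for a node that reboots normally."""
    year = timedelta(days=365)  # assumption: "within one year" means 365 days
    if any(b - a <= year for a, b in zip(times, times[1:])):
        return "Replace the motherboard (PCM)"
    if len(times) == 1 and other_event:
        return "Investigate the panic, NMI, or other event"
    if len(times) == 1:
        return "Giveback and monitor"
    return "Not covered by the table; review additional logs"


LOG = """\
Record 801: Sun Mar 06 15:09:20.924775 2021 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 802: Sun Mar 06 15:09:20.984259 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 803: Sun Mar 06 15:09:23.000822 2021 [SP.critical]: Filer Reboot
"""

times = l2_reset_times(LOG)
print(len(times), guidance(times))
```

Only the IPMI "L2 watchdog timeout" record is counted, so the accompanying trap record for the same event is not counted as a second reset. The `other_event` flag has to be set by the person reviewing the logs, since identifying a related panic or NMI is a judgment call.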

Additional Information
• Handling watchdog resets (WDR)
