
Net2 Action Plan

For NetApp Authorized Service Engineers

Boot Device Replacement Action Plan


for AFF A150, ASA A150, AFF C190, AFF A220,
ASA AFF A220, FAS2720/50

Rev 024a >> 13 September 2023



Section Overview:

Section 1 : Appliance / USB Boot Device Visual Checks

Section 2 : Node Pre-Checks

Section 3 : Node State Check and Shutdown Procedure

Section 4 : Capture the Current System Configuration

Section 5 : Install the boot image on the USB Flash Drive

Section 6 : Remove the cables, Cable Management Tray and extract the Controller

Section 7 : Replace the Boot Device

Section 8 : Partially Reinsert the Controller and reconnect the cables

Section 9 : Fully Insert the Controller Module and Exit into LOADER

Section 10 : Select the Procedure For Transferring the System Files

Section 11 : Procedure A: Restore the var file system using system node restore-backup command

Section 12 : Procedure B: Restore the var file system using Update flash from backup config option

Section 13 : Verify UTA ports configuration

Section 14 : Perform 'giveback' if applicable and verify ONTAP version

Section 15 : Send Autosupport, Enable Options, Submit logs, Part Return


README FIRST
1) This Action Plan applies to the following models:
• AFF A150 (supported in ONTAP version 9.12.1P1 and higher)
• ASA A150 (supported in ONTAP version 9.13.1 and higher)
• AFF C190 (supported in ONTAP version 9.6RC2 and higher)
• AFF A220 (supported in ONTAP version 9.4RC1 and higher)
• ASA AFF A220 (supported in ONTAP version 9.7RC1 and higher)
• FAS 2700 (supported in ONTAP version 9.4RC1 and higher)
Note: 8.3 and higher is C-Mode only.
1) The login name for C-Mode systems is "admin", not "root".
2) The ONTAP version and mode are listed in your dispatch!
3) C-Mode has two console command shells, clustershell and nodeshell. The default shell is clustershell.
In clustershell, the console prompt includes a double colon ( :: ).
Ex(1): cluster::> Ex(2): cluster::storage>
4) To switch from clustershell to nodeshell, enter 'run local' at the ::> prompt; the double colon (::) is then
removed from the prompt. To exit nodeshell, enter 'exit' or Ctrl-D.
5) From clustershell, nodeshell commands can be entered by prefacing the 7-Mode command with "run local".
Ex: cluster::> run local sysconfig -v
Note: Not all 7-Mode commands are supported in C-Mode.
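The prompt rules above can be sketched in a few lines of Python, for example to annotate a captured console log. This is only an illustration: the function names and the "clustershell" / "other" labels are hypothetical, and only the double-colon rule and the "run local" prefix come from this Action Plan.

```python
def classify_prompt(prompt: str) -> str:
    """Classify a console prompt using the double-colon rule.

    Returns "clustershell" when the prompt contains "::" and ends with ">"
    (for example "cluster::>" or "cluster::storage>"). Anything else
    (nodeshell, LOADER, Maintenance mode) is reported as "other".
    The labels are illustrative, not ONTAP terminology.
    """
    text = prompt.strip()
    if "::" in text and text.endswith(">"):
        return "clustershell"
    return "other"


def as_nodeshell_command(command: str) -> str:
    """Prefix a nodeshell command with "run local" for use from clustershell."""
    return f"run local {command.strip()}"
```

For example, as_nodeshell_command("sysconfig -v") yields the same "run local sysconfig -v" string shown in item 5 above.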

2) Terms Used:

Impaired Node / Target: Term used to identify the controller that requires service.
Repaired Node: Term used to identify the repaired controller.
Healthy Node / Partner: Term used to identify the partner-controller of the Impaired Node / Repaired Node in
an HA config.

3) NEVER attempt to replace the Controller and the Boot Device at the same time!
If the dispatch instructs you to replace BOTH the controller and the boot device as part of a single dispatch, do NOT
proceed - contact NGS.
Controller replacement and Boot Device replacement should be completed separately.

4) If you are not permitted console access, please confirm console output or other details with the console user
prior to performing the following actions:
1) The serial number in your dispatch matches the serial number on the impaired controller.
2) Shutdown of the impaired node.
3) Confirmation that the impaired node is ready for service (LOADER> prompt).

5) A Single Serial Console Cable is required.


Use of Y-cable / Double Dongle / Dual Serial Cable connections is only permissible after the motherboard has
been reinserted in the chassis.

6) You must have a USB flash drive, formatted as FAT32, to hold the ONTAP image (xxx_image.tgz). The
FAT32 partition needs to be 4GB in size; otherwise the procedure may fail. Open the linked doc for guidelines on
creating a 4GB partition.

Net2 AP: >> Boot Device Replacement Action Plan for AFF A150, ASA A150, AFF C190, AFF A220, ASA AFF A220, FAS2720/50 >> Rev 024a

Guidelines for creating 4GB partition

7) Upon receipt of the ticket, before going onsite, you must download / copy the following items from the support site to
your laptop.
(1) The exact version of Data ONTAP. Your dispatch notes should list the version of Data ONTAP to download. If
the version is not listed, contact support to determine the correct Data ONTAP version.
Note:
You must download and keep both the "ONTAP IMAGE WITH NETAPP VOLUME ENCRYPTION" and "ONTAP
IMAGE WITHOUT NETAPP VOLUME ENCRYPTION" on your laptop. The exact version to be used will be
determined by examining the 'version -v' command output that you capture when you reach the customer
location.

Open the linked doc below for detailed instructions.


Download ONTAP images and Service image

8) For the procedure in section 11 ("Restore the var file system using system node restore-backup
command") to work, the PARTNER has to be able to communicate to TARGET. If not, the procedure in section 12
("Restore the var file system using Update flash from backup config option") must be used.

9) The procedure in section 12 ("Restore the var file system using Update flash from backup config
option") will speed the process up a bit but should only be used within a maintenance window as there is an
outage for a few minutes. The customer can inform you if they are in a maintenance window and can sustain an
outage on the TARGET node.
Also, this procedure requires an additional reboot when restoring the var file system.

10) When you are onsite, speak with the customer and ask the following questions:
i) Is this system using NetApp Storage Encryption (NSE) disks?
ii) Is Onboard Key Manager (OKM) enabled in this system? If yes, inform the customer that the passphrase is
needed to set the Onboard Key Manager recovery secrets after replacing the controller later in this Action Plan.
If the passphrase cannot be provided after the motherboard is replaced, a giveback cannot be completed and
the controller will remain in the failover state.

11) For NSE-enabled systems, confirm with the customer that they have not power cycled the controller chassis or any
disk shelves with NSE drives. If power cycled, do NOT proceed - contact NGS.

12) Review applicable videos from the NetApp Hardware Learning Series:
https://netapp.hosted.panopto.com/Panopto/Pages/Sessions/List.aspx#folderID=379d4b9e-3a7f-41db-9b40-adb30108fe02
Available videos include:
1) Confirm Serial number, host name, impaired node status prior to removing hardware.
2) Install and Verify PuTTY Terminal Emulator Settings
3) Gathering and Setting UTA/CNA Port Configuration
and more....

13) Link to Statement of Volatility by Platform is:

http://support.netapp.com/info/web/ECMP1132988.html

Section 1 : Appliance / USB Boot Device Visual Checks


1) Visually verify you are working on the correct system.
See the System Front & Rear views and Figures showing System Config Options.
See the System Boot Device FRU picture.

Section 2 : Node Pre-Checks


1) Verify that the "Order Reference" number (8xxxxxxxxx) on the RMA packing slip is the same as the Part Request (PREQ)
number listed in your dispatch notes.

2) Stop!
Do not remove the Boot Device from its bag until you are ready to use it.

3) Adhere to anti-static precautions. (A paper ESD strap is included inside the RMA box if you don't have your own.)

Section 3 : Node State Check and Shutdown Procedure


1) Always capture the node’s console output to a text file, even if using the end-user's computer.
Note:
For this model, the serial terminal operates at a baud rate of 115200.
Important!
A single serial console cable is required. Use of Y-cable / Double Dongle / Dual Serial Cable connections is only
permissible after the motherboard has been replaced or reinserted in the chassis.
Read Console Attach Job-Aid

2) Locate the IMPAIRED Node. You may need to perform the following (open the linked doc for detailed instructions):

i) Look at controller bezels that have ( ! ) Status LED ON.


ii) Examine the rear of unit - A controller with ( ! ) means the controller has some sort of FAULT.
iii) Perform a console check on each controller.

Warning for HA Config!


If the failure has caused a controller failover, you may have been dispatched on the surviving controller's serial
number, not the impaired one. Perform a console check on each controller in the HA pair to determine the impaired
controller as detailed in the linked doc below. If the impaired node's serial number / hostname does not match
the dispatch note, do NOT proceed - contact NGS. Failure to identify the IMPAIRED Node correctly would
result in a complete outage!
Linked doc: Detailed steps to determine the Impaired controller.

3) Using a sticker or masking tape or some other identifier, mark or label the controller module that requires
service as "IMPAIRED".
Note:
This will help to pinpoint the impaired controller module while physically removing it from the chassis later in
this Action Plan.
See Example here.

4) Connect console cable to the Impaired node and login as admin. Engage end-user for password.
a) IF the Impaired (Target) Node is UP, connect console cable to the Impaired node and login as admin.||

b) IF HA-config and the Impaired (Target) Node is dead or not UP, connect to the Healthy (Partner) node and
login as admin. ||

c) IF Non-HA and the Impaired (Target) Node is not UP or cannot be booted due to an error, skip to step 10.

5) Send NetApp an ASUP Message from the Impaired/Healthy node to prevent unnecessary cases from being
created by ASUP messages during this scheduled maintenance window.

Enter the command:


system node autosupport invoke -node * -type all -message MAINT=<n>h
( <n> is the number of hours required for maintenance. The approximate hours required for this maintenance is 3
hours.)
See Example for sending ASUP
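
As a small illustration of how the <n> placeholder is filled in, the Python sketch below builds the command string for a given number of hours. The helper name is hypothetical; the command text itself is copied from the step above.

```python
def build_maint_asup_command(hours: int) -> str:
    """Return the AutoSupport maintenance-window command for `hours` hours."""
    if hours < 1:
        raise ValueError("hours must be a positive whole number")
    # Command template taken verbatim from step 5; only <n> is substituted.
    return f"system node autosupport invoke -node * -type all -message MAINT={hours}h"


# The approximate duration given for this maintenance is 3 hours.
print(build_maint_asup_command(3))
```

The string produced for 3 hours ends in MAINT=3h, which is what you would type at the clustershell prompt.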

6) Enter: version to display the ONTAP version.

7) a) IF ONTAP version is 9.6 or later, go to step 8. ||

b) IF ONTAP version is 9.5 or lower: Determine if any Key Manager is enabled and restore the management
authentication keys if needed - follow the steps given in the linked doc below. Once done, go to step 9.

For ONTAP ver 9.5 and earlier: Determine env variables set for encryption enabled system / Determine if any Key
Manager is enabled

8) For ONTAP ver 9.6 and later ONLY :


Determine if any Key Manager is enabled and restore the management authentication keys if needed - follow
the steps given in the linked doc below.
For ONTAP ver 9.6 and later: Determine if any Key Manager is enabled and restore the management
authentication keys if needed

9) Enter: version -v from the clustershell to display the ONTAP version running and verify if NVE (NetApp
Volume Encryption) is supported.

Important Note:
The version -v command output helps to determine whether your cluster version supports NVE. You will
need this info when you download the ONTAP version image from the support site.

Check if the text <1no-DARE> (for “no Data At Rest Encryption”) is displayed in the version -v output.

Example:
cluster::> version -v
NetApp Release 9.1.0: Tue May 10 19:30:23 UTC 2016 <1no-DARE>

If the text “1no-DARE” is displayed, it indicates that NVE is not supported on your cluster version.
If the text “1no-DARE” is not displayed, it indicates that NVE is supported on your cluster version.
Once you have determined whether NVE is supported, go to step 11.
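
The check in this step can be sketched as a simple string test on the captured version -v output. The function names are hypothetical, and the mapping from the marker to the image name is an assumption based on the Note in README item 7; confirm the image choice against the linked doc before copying anything to the USB flash drive.

```python
def nve_supported(version_output: str) -> bool:
    """Return True unless the `version -v` output contains the 1no-DARE marker."""
    return "1no-DARE" not in version_output


def image_to_use(version_output: str) -> str:
    """Name the image type to use (assumed mapping, see lead-in)."""
    if nve_supported(version_output):
        return "ONTAP IMAGE WITH NETAPP VOLUME ENCRYPTION"
    return "ONTAP IMAGE WITHOUT NETAPP VOLUME ENCRYPTION"


# Example line taken from the sample output in this step.
sample = "NetApp Release 9.1.0: Tue May 10 19:30:23 UTC 2016 <1no-DARE>"
print(nve_supported(sample), image_to_use(sample))
```

For the sample line above the marker is present, so the sketch reports that NVE is not supported and names the image without NVE.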

10) If Non-HA and the Impaired (Target) Node is not UP or cannot be booted due to an error, engage NGS to
obtain the ONTAP version and verify whether NVE (NetApp Volume Encryption) is supported.
Note:
(NGS uses the SOFTWARE-IMAGE.XML section in the Autosupport data to find the required info.)
Once you have determined whether NVE is supported, go to Section 4.

11) a) IF HA config, move console cable to the Healthy Node if not already connected and login as admin. Once
done, go to next step.||

b) IF non-HA (single node), skip to step 20.

12) Enter: storage failover show and identify the different HA-Pairs in the cluster.
See storage failover show example

13) Enter: set -priv advanced to set the privilege level to "advanced".

Enter: y to the question "Do you want to continue? ".


(This is a required step to see the 'Epsilon' column in the next command output.)

14) Enter: cluster show to display information about the nodes in the cluster.
Note:
A typical cluster with only 1 HA-Pair (2 nodes) uses the 'cluster-HA' configuration, and in that case Epsilon will show
as "false" for both nodes.
See cluster show example

15) Review the "cluster show" output from the previous step. Check the "Health" and "Eligibility" values for all nodes
other than Impaired node in all HA groups.
a) IF all the nodes show “true”, go to next step.||

b) IF any of the nodes shows "false", contact NGS to correct the problem in the node showing false
before moving on to the next step.
See cluster show example

16) Review the "cluster show" output from the previous step. Check the Epsilon value for Impaired node to be
serviced.
Note:
A typical cluster with only 1 HA-Pair (2 nodes) uses the 'cluster-HA' configuration, and in that case Epsilon will show
as "false" for both nodes.
a) IF the Epsilon value for Impaired node to be serviced is "true", go to step 17.||

b) IF the Epsilon value for Impaired node to be serviced is "false", go to step 18.
See cluster show example
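
The decisions in steps 15 and 16 can be summarized as the Python sketch below. It works on values you have already read off the cluster show output by hand; it does not parse console text. The type, function, and node names are hypothetical and used only for illustration.

```python
from typing import Dict, NamedTuple


class NodeState(NamedTuple):
    health: bool
    eligibility: bool
    epsilon: bool


def next_action(nodes: Dict[str, NodeState], impaired: str) -> str:
    """Return the next action after reviewing `cluster show`."""
    # Step 15: every node other than the Impaired node must show true
    # for both Health and Eligibility.
    for name, state in nodes.items():
        if name != impaired and not (state.health and state.eligibility):
            return "contact NGS"
    # Step 16: Epsilon must be moved off the Impaired node before service.
    if nodes[impaired].epsilon:
        return "go to step 17"
    return "go to step 18"
```

In a two-node cluster-HA configuration Epsilon is "false" on both nodes, so the sketch returns "go to step 18" as long as the partner is healthy and eligible.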

17) Follow steps (a-c) to reassign epsilon to a healthy node.


a) Enter: cluster modify -node <Target_node> -epsilon false to remove Epsilon from the target.
b) Enter: cluster modify -node <Different_node_in_different_HA_Pair> -epsilon true to
assign Epsilon to a node in a different HA-Pair.
c) Enter: cluster show to confirm that Epsilon has been re-assigned properly.

See sample log here

18) Enter: set -priv admin (This will return the privilege level to "admin".)

19) Follow the below steps:

a) Enter the below commands from the Healthy Node to check the value of "auto-giveback" and "auto-giveback-
after-panic" options.
sto fa show -node local -fields auto-giveback
sto fa show -node local -fields auto-giveback-after-panic

b) If any option(s) shows "true", set it to "false" using the below commands.
sto fa modify -node local -auto-giveback false
sto fa modify -node local -auto-giveback-after-panic false

See detailed steps here
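
Step 19 amounts to "for each option that shows true, issue the matching modify command". A minimal Python sketch of that rule follows; the function name and the dictionary input are hypothetical, and the command strings are the ones listed in step 19 b).

```python
from typing import Dict, List

# Option names as shown in step 19.
AUTO_GIVEBACK_OPTIONS = ("auto-giveback", "auto-giveback-after-panic")


def commands_to_disable(options: Dict[str, bool]) -> List[str]:
    """Return one modify command for each auto-giveback option that is true."""
    return [
        f"sto fa modify -node local -{name} false"
        for name in AUTO_GIVEBACK_OPTIONS
        if options.get(name, False)
    ]
```

If both options already show "false", the list is empty and nothing needs to be changed.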

20) Move the console cable to the Impaired system if not already connected and verify the console response.
a) IF the console response is the LOADER prompt, skip to next section. ||

b) IF the console response is "Waiting for giveback", skip to step 22. ||

c) IF the console response is "Login" or "ONTAP prompt", go to step 21.

21) a) IF HA-Pair and both nodes are up, engage the end-user to "takeover" the Impaired node from the PARTNER. If
the end-user is not available, proceed with the manual Takeover steps in the linked doc below. Once the
"takeover" is performed, confirm the storage "takeover" status as described in the linked doc below. If the
"takeover" is incomplete, contact NGS. Once the "takeover" is complete without any error, go to the next step. ||

b) IF non-HA and node is up, engage end-user to "halt" the Impaired node. Once the "halt" is complete without
any error, go to next step.

Review steps for performing manual Takeover


Confirm the storage "takeover" status

22) Move console cable to the Impaired node if not already connected.
a) IF the console response is LOADER-A|B>, skip to next Section. ||

b) IF the console response is: "Waiting for giveback…...", follow the below steps:
i) At the "Waiting for giveback ……" prompt, Enter: Ctrl-C
ii) At the message: "Do you wish to halt this node rather than wait [y/n]? " Enter: y
iii) After the system drops to the LOADER-A|B> prompt, go to next Section.
See sample log here

Section 4 : Capture the Current System Configuration


1) Move console cable to the Impaired Node if not already connected.
Note:
In this section, the output of printenv (displays all boot environmental variables) from the LOADER and
ucadmin show (UTA port configuration info) from the Maintenance mode will be captured.
a) IF the controller cannot be booted due to an error, engage NGS for assistance to obtain the list of boot
environmental variables (printenv command output) and UTA port configuration info (ucadmin show
command output) required for configuring the system later in this Action Plan. Once done, go to Section 5. ||

b) In other cases, go to step 2 to capture the printenv and ucadmin show commands output.

2) From the LOADER prompt, enter: printenv (This command displays all boot environmental variables.)

3) Enter: boot_ontap maint to boot into Maintenance mode.

(Enter: y to the question “Continue with boot?”.)


Once booted into Maintenance mode, the console prompt changes to *>.

4) At the *> prompt, enter: ucadmin show and capture the UTA port configuration information.
See sample log here.

5) At the *> prompt enter: halt


(After PROM initialization, the console will display the LOADER prompt.)

Section 5 : Install the boot image on the USB Flash Drive


1) Stop!
You must have a USB flash drive, formatted as FAT32, to hold the ONTAP image (xxx_image.tgz). The
FAT32 partition needs to be 4GB in size; otherwise the procedure may fail.
For guidelines on creating a 4GB partition, click on the linked doc below.
*Skip this step if you already have a USB flash drive with a 4GB FAT32 partition.
Create 4GB Partition

2) a) IF you already have the appropriate ONTAP images on your laptop, copy the exact image version - ONTAP
IMAGE WITH NETAPP VOLUME ENCRYPTION or ONTAP IMAGE WITHOUT NETAPP VOLUME ENCRYPTION - to the
USB flash drive now. Review the 'version -v' command output captured in section 3 to determine whether
your cluster version supports NVE (NetApp Volume Encryption). For more info, open the linked doc below.
Once done, go to Section 6. ||

b) In other cases, go to step 3 and follow the instructions to download the ONTAP version from the Support Site.

Determine the appropriate boot image to the USB Flash Drive

3) Download the exact version of Data ONTAP from support site to the USB flash drive. Selection of ONTAP image
to be downloaded depends on whether your cluster version supports NVE or not. Follow the detailed steps in
the linked doc below to install the boot image on the USB flash drive.
Note:
Engage end-user if required.
Install the boot image on the USB Flash Drive

Section 6 : Remove the cables, Cable Management Tray and extract the Controller
1) Stop!
Do NOT turn off the power supplies because disks are spinning in the chassis.

2) Stop!
Before proceeding to next step, pinpoint the Impaired (Target) controller by verifying the "IMPAIRED"
label/mark on the controller. The console cable should currently be attached to the Impaired controller and its
console should show the 'LOADER' prompt.
Do NOT disturb the Healthy (Partner) node!

3) On the Impaired (Target) Node, squeeze the latch on the cam handle until it releases, and slide the controller
module towards you halfway out of the chassis.

4) Label each cable connector with its port number and then unplug the cabling from the connector.
Warning!
Be sure to label each cable with the name and number of the port to which it was attached so that you can
reconnect them correctly later in this Action Plan.

5) Remove the Cable Management Arm if installed and remove the Impaired controller from the system. For
detailed steps, open the linked doc below.
Note:
If possible keep the cables on the cable management arm to keep them in the correct position for reconnection.

Linked doc: Cable Mgmt Arm and Controller Removal.

Section 7 : Replace the Boot Device


1) Replace the Boot Device - Open the linked doc below for detailed instructions.
Detailed steps for replacing the Boot Device

Section 8 : Partially Reinsert the Controller and reconnect the cables


1) Partially insert the controller into the slot so that the cables can be attached.
DO NOT engage the backplane yet!

2) Re-attach the Cable Management Arm if removed.

3) Insert the USB flash drive that contains the ONTAP image into the USB slot on the controller module.
Caution!
Make sure that you install the USB flash drive in the slot labeled for USB devices, and not in the USB console
(IOIOI) port.
See the USB port location on the controller

4) Cables: Fully insert each cable that was removed to its proper port until it clicks in. Make sure the SFPs are
reinstalled if they were removed from the controller module.
Test the cable connection by pulling on them. Especially the FC and SAS ports!
Warning!
Failure to reinsert all cables into their correct ports may cause network and/or storage outage after the giveback
later in this Action Plan.

Section 9 : Fully Insert the Controller Module and Exit into LOADER
1) Re-attach laptop to the console port if not already connected and capture the display output even if using the
end user's computer.

2) Fully insert the Controller Module into the slot and latch the cam lever closed. Reconnect the power cords
and turn both Power Supplies ON if necessary.
(The module starts the power on boot process.)

3) IMMEDIATELY after the console message "Starting AUTOBOOT press Ctrl-C to abort…" is
displayed, press Ctrl-C (^C) a couple of times to abort the autoboot.
Linked doc - Abort the autoboot and exit into LOADER prompt.

Section 10 : Select the Procedure For Transferring the System Files


1) Stop!
There are two different procedures for transferring the system files:
Procedure A: Restore the var file system using system node restore-backup command
Procedure B: Restore the var file system using Update flash from backup config option

The procedure to be followed is based on system configuration and status.

a) IF this is an "HA" system and the TARGET node was successfully taken over by its PARTNER, follow
Procedure A - go to section 11. ||

b) IF the PARTNER did not take over, or the PARTNER node cannot communicate with the TARGET node, or this
is a maintenance window, follow Procedure B - go to section 12. (A maintenance window and service outage
may be required. Call NGS if you have questions.)
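
The selection rule in this section can be written down as the Python sketch below. The function and parameter names are hypothetical. One assumption is made explicit in the code: when both a) and b) could apply (for example a successful takeover during a maintenance window), the sketch gives b) precedence and returns Procedure B. If in doubt, call NGS rather than rely on this reading.

```python
def select_procedure(
    is_ha: bool,
    taken_over: bool,
    partner_reaches_target: bool,
    maintenance_window: bool,
) -> str:
    """Return "A" or "B" following the rules in Section 10."""
    # b) Any of these conditions selects Procedure B.
    # Assumption: b) takes precedence when a) and b) both apply.
    if not taken_over or not partner_reaches_target or maintenance_window:
        return "B"
    # a) HA system whose TARGET node was successfully taken over.
    if is_ha:
        return "A"
    # Non-HA systems have no PARTNER to restore from.
    return "B"
```

For an HA pair with a successful takeover, a reachable TARGET, and no maintenance window, the sketch returns "A" (section 11).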

Section 11 : Procedure A: Restore the var file system using system node restore-backup command

1) Warning!
This procedure <Procedure A: Restore the var file system using system node restore-backup command> can be
used only if the Impaired node was successfully taken over by its PARTNER.

2) Stop!
Inform the end-user the following:
For HA system, end-user is to provide the e0M port IP address, the netmask for the network and a gateway IP if
one is needed to configure the Impaired controller e0M port.
Configure the local interface "e0M" on the Impaired controller - follow detailed steps in the linked doc below.
Configure the local interface "e0M"

3) At the LOADER prompt, enter: boot_recovery <xxx_image.tgz> to boot the recovery image and follow
the steps in the linked doc below.
Procedure A : Restore the var file system using system node restore-backup command
a) IF Procedure A succeeded, go to section 13. ||

b) IF Procedure A did not succeed, go to section 12.

Section 12 : Procedure B: Restore the var file system using Update flash from backup config option

1) Warning!
This procedure (Procedure B: Restore the var file system using Update flash from backup config option) should
only be used if the PARTNER did not take over the TARGET, the PARTNER node cannot communicate with the
TARGET node, or this is a maintenance window.

2) Validate the boot env variables:


Confirm that all required boot PROM variables are properly set based on your system type and configuration. If
any of the variables are missing or values are not set correctly, set them now one by one. Follow the steps
detailed in the linked doc below.
Linked doc - Validate the boot env variables

3) At the LOADER prompt, enter: boot_recovery <xxx_image.tgz> to boot the recovery image and follow
the steps in the linked doc below.
Procedure B : Restore the var file system using Update flash from backup config option

Section 13 : Verify UTA ports configuration

1) a) IF the console response is LOADER-A|B>, skip to next step. ||

b) IF the console response is: "Waiting for giveback…...", follow the below steps:
i) At the "Waiting for giveback ……" prompt, Enter: Ctrl-C
ii) At the message: "Do you wish to halt this node rather than wait [y/n]? " Enter: y
iii) After the system drops to the LOADER-A|B> prompt, go to next step.
See sample log here

2) At the LOADER prompt, enter: boot_ontap maint to boot into Maintenance mode.
(Enter: y to the question “Continue with boot?”.)
Once booted into Maintenance mode, the console prompt changes to *>.

3) Verify the UTA2(CNA) port configuration: For detailed process steps, open the linked doc below.

a) At the *> prompt, enter: ucadmin show and check how the Host Adapter ports are currently configured
after the boot device replacement.

b) Compare the "Mode" and "Type" of FC Host Adapter ports with the configuration of Host Adapter ports
(found in the console log output) captured in section 4 - they must match exactly. If any mismatch is detected,
enter the command: ucadmin modify -m <fc | cna> -t <initiator | target> -f
<adapter_name> to change the configuration as needed.

c) Enter: ucadmin show again to confirm the changed ports are displaying under Pending Mode and/or
Pending Type.
Verify UTA Ports configuration after boot device replacement
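
The comparison in step 3 b) can be sketched in Python as follows. The input is the Mode and Type per adapter, transcribed by hand from the two ucadmin show outputs; the function name and the adapter names used in the example are hypothetical. The command string follows the template given in step 3 b) exactly.

```python
from typing import Dict, List, Tuple

PortConfig = Tuple[str, str]  # (mode, type), e.g. ("fc", "target")


def ucadmin_fixups(
    captured: Dict[str, PortConfig],
    current: Dict[str, PortConfig],
) -> List[str]:
    """Return a ucadmin modify command for each adapter that no longer
    matches the configuration captured in section 4."""
    commands = []
    for adapter, (mode, port_type) in sorted(captured.items()):
        if current.get(adapter) != (mode, port_type):
            commands.append(f"ucadmin modify -m {mode} -t {port_type} -f {adapter}")
    return commands


# Hypothetical adapters: 0c has changed mode, 0d still matches.
before = {"0c": ("fc", "target"), "0d": ("cna", "target")}
after = {"0c": ("cna", "target"), "0d": ("cna", "target")}
print(ucadmin_fixups(before, after))
```

An empty list means the ports already match and no ucadmin modify is needed before the halt in step 4.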

4) At the *> prompt, enter: halt


(After PROM initialization, the console will display the LOADER prompt.)

Section 14 : Perform 'giveback' if applicable and verify ONTAP version


1) At the LOADER-A|B> prompt, enter: boot_ontap to boot ONTAP.

Stop!
If the system shows either of the error messages below while booting (applicable to ONTAP 9.8 or later
versions, with the Trusted Platform Module (TPM) license installed and the encryption keys for the onboard key
manager (OKM) configured):
....[....:crypto.ssal.failed:ALERT]: SSAL operation failed: SSAL Unseal operation failed.
....[....:crypto.okmrecovery.failed:ALERT]: ERROR: Import of the onboard key hierarchy failed:
failed to import key hierarchy. Additional information: error: ssal unseal failed.
contact NGS and:
(i) reference Onboard Key Manager Trusted Platform Module (OKM TPM).
(ii) provide output of below command, if any,
event log show -message-name gb.sfo.veto.kmgr.keysmissing
(To be run from Healthy_Node/clustershell)
Sample command output:
2/20/2019 03:41:53 wfit-8060-151-23 ERROR gb.sfo.veto.kmgr.keysmissing: Giveback of aggregate <aggr-
name> failed due to unavailability of volume encryption keys for the encrypted volumes of the aggregate on the
partner node <node-name>.

Stop!
If the "Entering FM state:5" and "WARNING: 0 disks found" issue is encountered, it should
be fixed. For details, refer to the linked doc below. Once the workaround is performed, boot the controller by
entering boot_ontap from the LOADER.

"Entering FM state:5" and "WARNING: 0 disks found"

2) After the console stops printing messages, hit <enter>.


a) IF the system booted up to a "login" prompt, log in as admin, enter: version -v and verify that the
version displayed is the same as the ONTAP version captured in section 3. Once done, skip to Section 15.
Issues? Call NGS. ||

b) IF the system booted up to a "Waiting for giveback" prompt (press the <enter> key), the node was
part of an HA configuration and was taken over by its partner. Continue with step 3.
See sample log for 'login' prompt and 'waiting for giveback' prompt.

3) Log in to the Healthy (Partner) Node. Engage the end-user for the password.

4) To confirm the repaired node is ready for a "giveback", enter: storage failover show
(May have to wait a couple minutes for the NVMEMs to synchronize.)
See storage failover show example.

5) a) IF the procedure in section 12 'Procedure B (Restore the var file system using Update flash from
backup config option)' was used, go to step 6. ||

b) IF the procedure in section 11 'Procedure A (Restore the var file system using system node restore-backup
command)' was used, go to step 9.

6) You MUST perform this step for a 'complete giveback':


From the Healthy (Partner) node, enter the controller giveback command:
storage failover giveback -fromnode local
If the giveback fails, refer to this doc.
Stop!
If the system shows either of the error messages below while booting (applicable to ONTAP 9.8 or later
versions, with the Trusted Platform Module (TPM) license installed and the encryption keys for the onboard key
manager (OKM) configured):
....[....:crypto.ssal.failed:ALERT]: SSAL operation failed: SSAL Unseal operation failed.
....[....:crypto.okmrecovery.failed:ALERT]: ERROR: Import of the onboard key hierarchy failed: failed to import key
hierarchy. Additional information: error: ssal unseal failed.
contact NGS and:
(i) reference Onboard Key Manager Trusted Platform Module (OKM TPM).
(ii) provide output of below command, if any,
event log show -message-name gb.sfo.veto.kmgr.keysmissing
(To be run from Healthy_Node/clustershell)
Sample command output:
2/20/2019 03:41:53 wfit-8060-151-23 ERROR gb.sfo.veto.kmgr.keysmissing: Giveback of aggregate <aggr-
name> failed due to unavailability of volume encryption keys for the encrypted volumes of the aggregate on the
partner node <node-name>.

7) Wait 3 minutes after the giveback is reported complete. Then check the controller failover status by entering the
command:
storage failover show
Confirm "Giveback" status of the storage
If the "giveback" is incomplete, wait 2 minutes and re-check. If it is still not complete after 20 minutes, contact
NGS.
Stop!
Do not proceed to next step if 'incomplete or partial giveback'!
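
The timing in step 7 (wait 3 minutes, re-check every 2 minutes, escalate after 20 minutes) can be sketched as a polling loop. This is an illustration only: the function and parameter names are hypothetical, is_complete stands for your own reading of the storage failover show output, and the sketch assumes the 20-minute limit is counted from the first check.

```python
import time
from typing import Callable


def wait_for_giveback(
    is_complete: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    initial_wait_s: int = 180,   # wait 3 minutes after giveback is reported complete
    recheck_s: int = 120,        # re-check every 2 minutes
    limit_s: int = 1200,         # escalate after 20 minutes (assumed: from first check)
) -> bool:
    """Return True once the giveback is complete, False when it is time to contact NGS."""
    sleep(initial_wait_s)
    waited = 0
    while True:
        if is_complete():
            return True
        if waited >= limit_s:
            return False
        sleep(recheck_s)
        waited += recheck_s
```

A False result corresponds to "contact NGS"; in either case, do not continue to step 8 until the giveback is confirmed complete.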

8) Move console cable to the TARGET node if not already connected and observe the console logs. The TARGET
node takes back its storage, completes booting, and then reboots and is again taken over by the PARTNER node.

Wait for the "Waiting for Giveback.." message text to be displayed again.

Example:
******************************************************************************
Note (Additional reboot when restoring the var file system):
The TARGET node takes back its storage, completes booting up into the login prompt, and then reboots and is
again taken over by the PARTNER node.
...........
...........
Terminated

varfs_backup_restore: bootarg.abandon_varfs is set! Skipping /var backup. <===


Uptime: 14m34s
HALT: HA partner has taken over (ic) on Tue Nov 21 21:39:54 GMT 2017 <===

ugen0.2: <vendor 0x8087> at usbus0 (disconnected)


System rebooting... <===
Warning! Do NOT press Ctrl-C during the boot process!
The system will reboot up to the "Waiting for giveback..." prompt if HA, or to the "login" prompt if non-HA.
...........
...........

Waiting for giveback...(Press Ctrl-C to abort wait)


Waiting for giveback...(Press Ctrl-C to abort wait)
******************************************************************************
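
The sequence to look for in the example above is a "System rebooting..." line followed later by "Waiting for giveback...". As a hedged illustration (not part of the NetApp procedure), the Python sketch below checks a captured console log for that order; the function name is hypothetical, the two marker strings are taken from the example, and the non-HA "login" case is deliberately not handled.

```python
def reached_giveback_wait_after_reboot(console_lines):
    """True once 'Waiting for giveback...' appears after 'System rebooting...'."""
    rebooting_seen = False
    for line in console_lines:
        if "System rebooting..." in line:
            rebooting_seen = True
        elif rebooting_seen and "Waiting for giveback..." in line:
            return True
    return False


# Lines abridged from the example console output above.
log = [
    "varfs_backup_restore: bootarg.abandon_varfs is set! Skipping /var backup.",
    "HALT: HA partner has taken over (ic) on Tue Nov 21 21:39:54 GMT 2017",
    "System rebooting...",
    "Waiting for giveback...(Press Ctrl-C to abort wait)",
]
print(reached_giveback_wait_after_reboot(log))
```

A "Waiting for giveback..." line that appears only before the reboot does not count, which is why the order of the two markers is tracked.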
Stop!
If the system shows either of the error messages below while booting (applicable to ONTAP 9.8 or later
versions, with the Trusted Platform Module (TPM) license installed and the encryption keys for the onboard key
manager (OKM) configured):
....[....:crypto.ssal.failed:ALERT]: SSAL operation failed: SSAL Unseal operation failed.
....[....:crypto.okmrecovery.failed:ALERT]: ERROR: Import of the onboard key hierarchy failed: failed to import
key hierarchy. Additional information: error: ssal unseal failed.
contact NGS and
(i) reference Onboard Key Manager Trusted Platform Module (OKM TPM).
(ii) provide the output, if any, of the command below:
event log show -message-name gb.sfo.veto.kmgr.keysmissing
(Run from the Healthy node's clustershell.)
Sample command output:
2/20/2019 03:41:53 wfit-8060-151-23 ERROR gb.sfo.veto.kmgr.keysmissing: Giveback of aggregate <aggr-name>
failed due to unavailability of volume encryption keys for the encrypted volumes of the aggregate on the
partner node <node-name>.

9) a) If encryption is enabled on the system and Onboard Key Management is set, follow the steps in the linked doc below.

b) In other cases, skip to step 10.


Post Boot Device Replacement Steps if Onboard Key Mgmt is set

Note:
1) To verify the type of Key Manager set, refer to the security key-manager key query or security
key-manager query command output captured in Section 3.
2) If External Key Management is set, additional steps will be performed later in this section.

10) From the Healthy (Partner) node, enter the controller giveback command:
storage failover giveback -fromnode local
If the giveback fails, refer to this doc.

11) Wait 3 minutes after the giveback is reported complete, then check the controller failover status by entering
the command:
storage failover show
If the giveback is incomplete, wait 2 minutes and re-check. If it is still not complete after 20 minutes, contact
NGS.
Stop!
Do not proceed to the next step if the giveback is incomplete or partial!

12) At the clustershell prompt, enter: net int show -is-home false to list the logical interfaces that are
not on their home node and port.
If any interfaces are listed as "false" in the output of the above command, engage the customer to revert those
interfaces to their home port using the net int revert command.
See net int show, net int revert command examples

13) Move the console cable to the Repaired (Target) node if it is not already connected, and enter: version -v to
display the ONTAP version running.
Verify that the version displayed is the same as the ONTAP version captured in Section 3.
Issues? Call NGS.
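
Comparing the two versions by eye is error-prone when release strings differ only in a patch suffix. The Python sketch below is one hedged way to compare them and is not part of the NetApp procedure: the function names and the sample input lines are hypothetical, and the pattern only covers release strings of the shapes that appear in this action plan (for example 9.12.1P1, 9.13.1, 9.6RC2). The exact layout of the version -v output is not assumed.

```python
import re

# Release shapes seen in this action plan: 9.12.1P1, 9.13.1, 9.6RC2, 9.4RC1.
RELEASE_RE = re.compile(r"\d+\.\d+(?:\.\d+)?(?:(?:P|RC)\d+)?")


def release_token(text):
    """Return the first release-like token in text, or None if there is none."""
    match = RELEASE_RE.search(text)
    return match.group(0) if match else None


def same_release(captured, current):
    """True only if both texts contain a release token and the tokens are identical."""
    a, b = release_token(captured), release_token(current)
    return a is not None and a == b


# Hypothetical inputs: the value noted in Section 3 and a line pasted from the console.
print(same_release("9.12.1P1", "NetApp Release 9.12.1P1: (hypothetical sample line)"))
print(same_release("9.12.1", "NetApp Release 9.12.1P1: (hypothetical sample line)"))
```

The second call differs only in the patch suffix, which is exactly the kind of mismatch this check is meant to surface.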

14) To check whether or not NVE (NetApp Volume Encryption) is configured for any volumes in the cluster, enter:
volume show -is-encrypted true
If any volumes are listed in the output, NVE is configured.
(You can skip this step if you have already checked if NVE is configured for any volumes in the cluster.)
Sample log here.

15) a) If encryption is enabled on the system and External Key Management is set, follow the steps in the linked doc below.
Select the linked doc based on your ONTAP version.

b) In other cases, go to next section.


For ONTAP ver 9.5 and earlier: Post Boot Device Replacement Steps if External Key Mgmt is set
For ONTAP ver 9.6 and later: Post Boot Device Replacement Steps if External Key Mgmt is set
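
The choice between the two linked docs depends on a numeric comparison of the release, and a plain string comparison gets it wrong (as text, "9.12" sorts before "9.6"). The Python sketch below illustrates the rule under stated assumptions: the function name is hypothetical, and only the major and minor numbers of the release are considered.

```python
import re


def external_key_doc(release):
    """Map an ONTAP release string to the matching linked-doc label."""
    match = re.match(r"\s*(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized ONTAP release: {release!r}")
    # Compare as integers so that 9.12 is treated as later than 9.6.
    major_minor = (int(match.group(1)), int(match.group(2)))
    return "9.6 and later" if major_minor >= (9, 6) else "9.5 and earlier"


for rel in ("9.4RC1", "9.6RC2", "9.12.1P1"):
    print(rel, "->", external_key_doc(rel))
```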

Section 15 : Send Autosupport, Enable Options, Submit logs, Part Return


1) Re-enable "auto-giveback" options if they were disabled on the Healthy (Partner) node.
Detailed steps to re-enable the auto-giveback

2) Ask the customer whether they use Operations Manager or OnCommand System Manager. If so, can they still access the
controllers? If not, open the link below to see the bug details.
Open the link to see the Bug 583160 details

3) Ask the customer whether this is a SAN storage system. If yes, are the mapped LUNs still accessible from the hosts?

4) Request the end user to send NetApp an ASUP message from the Repaired Node so that the configuration setup can be
verified by NGS. If the Repaired Node is not up, send the ASUP from its Partner.
Enter the command:
system node autosupport invoke -node * -type all -message MAINT=END
(The text string "MAINT=END" in the command will end the maintenance window and resume automatic case
creation immediately.)

5) You must complete the Post-Event-Feedback (PEF) form - Include any dispatch, AP issues and incomplete tasks.
(The PEF Form is under "Field Service Surveys" on the Net2 home Screen.)
Also, attach the console log to the PEF form that needs to be filled out.
Note:
PEF is a batch form and has no linkage to our case management tool. You must call NetApp Support, by phone,
for all service delivery issues related to this dispatch. You should call NetApp Support to open a new case
regarding actionable items at the customer site.

6) The defective part is not returnable; dispose of it according to country standards.

7) Verify with the customer that the system is OK and, if working with NGS, ask them whether it is OK to be released.

8) Close dispatch per Rules of Engagement.
