HGST Active Archive Fru Replacement Guide
Copyright
Notice
Publication Information
One MB is equal to one million bytes, one GB is equal to one billion bytes, one TB equals
1,000GB (one trillion bytes) and one PB equals 1,000TB when referring to storage capacity.
Usable capacity will vary from the raw capacity due to object storage methodologies and
other factors.
The following paragraph does not apply to any jurisdiction where such provisions are
inconsistent with local law: THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE.
This publication could include technical inaccuracies or typographical errors. Changes
are periodically made to the information herein; these changes will be incorporated in
new editions of the publication. There may be improvements or changes in any products
or programs described in this publication at any time. It is possible that this publication
may contain reference to, or information about, HGST products (machines and programs),
programming, or services that are not announced in your country. Such references or
information must not be construed to mean that Western Digital Corporation or its affiliates
intends to announce such HGST products, programming, or services in your country.
Technical information about this product is available by contacting your local HGST
product representative or on the Internet at: support.hgst.com.
Western Digital Corporation or its affiliates may have patents or pending patent
applications covering subject matter in this document. The furnishing of this document does
not give you any license to these patents.
Long Live Data, EasiScale, and the HGST logo are registered trademarks or
trademarks of Western Digital Corporation or its affiliates in the U.S. and/
or other countries. Amazon S3, Amazon Simple Storage Services, and Amazon AWS S3 are
trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
Other trademarks are the property of their respective owners. References in this publication
to HGST-branded products, programs, or services do not imply that they will be made
available in all countries. Product specifications provided are sample specifications and
do not constitute a warranty. Actual specifications for unique part numbers may vary.
Please visit the Support section of our website, www.hgst.com/support/systems-support, for
additional information on product specifications. Photographs may show design models.
References in this publication to HGST-branded products, programs or services do not
imply that they are to be available in all countries in which HGST operates.
FRU Replacement Guide Preface
Preface
Notice
Topics:
• Document Conventions
• Related Documents
• Points of Contact
Document Conventions
Typography
Element                                             Sample Notation
Linux commands or user input                        rm -rf /tmp
Linux system output                                 Installation successful!
Commands longer than one line are split with "\"    s3cmd \
                                                        --dump-config
Storage Notations
Convention              Prefix                                 Examples                                      Usage
xB (base 10 notation)   SI prefixes (kilo, mega, giga, tera,   1GB = 1 gigabyte = 1,000,000,000 bytes        Disk sizes
                        peta, exa, zetta, yotta)               1TB = 1 terabyte = 1,000,000,000,000 bytes
xiB (base 2 notation)   Binary prefixes (kibi, mebi, gibi,     1GiB = 1 gibibyte = 1,073,741,824 bytes       Storage space, and sizes of
                        tebi, pebi, exbi, zebi, yobi)          1TiB = 1 tebibyte = 1,099,511,627,776 bytes   partitions or file systems
• This document uses a comma (",") for digit grouping; for example, 1,000 is one thousand.
• This document uses a period (".") as a decimal mark; for example, 12.5 %.
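As an illustration of the two notations, the following Python sketch converts between SI (base 10) and binary (base 2) units. The function names and the unit tables are illustrative assumptions and are not part of any HGST tool.

```python
# Bytes per unit, following the Storage Notations table.
SI_UNITS = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
BINARY_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}
ALL_UNITS = {**SI_UNITS, **BINARY_UNITS}


def to_bytes(value, unit):
    """Return the number of bytes in `value` units of `unit`."""
    return value * ALL_UNITS[unit]


def convert(value, src, dst):
    """Convert `value` from unit `src` to unit `dst`."""
    return to_bytes(value, src) / ALL_UNITS[dst]


print(to_bytes(1, "GiB"))                 # 1073741824
print(round(convert(8, "TB", "TiB"), 2))  # 7.28
```

The second call shows why a disk sold as 8TB is reported as roughly 7.28TiB by tools that use binary prefixes.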
Admonitions
Type    Usage
Note:   Indicates extra information that has no specific hazardous or damaging consequences.
Tip:    Indicates a faster or more efficient way to do something.
Related Documents
Title                        Description
ActiveScale™ CM User Guide   Usage instructions for ActiveScale™ Cloud Management (CM)
Points of Contact
Contact HGST Support with your rack serial number or deployment ID.
Email www.hgst.com/support/systems-support
Phone 1-844-717-7766 or 1-408-717-7766
Website support.hgst.com
FRU Replacement Guide About This Document
This guide provides instructions for replacing hardware components of the HGST Active
Archive System.
Topics:
• Weight
Weight
Rack:
The following table displays the weight of the Active Archive System:
Note: The weight mentioned above is the total unpacked weight after delivery.
FRU Replacement Guide Contents
Contents
List of Figures
List of Tables
FRU Replacement Guide List of Figures
List of Figures
Figure 7: The New Chassis Appears Under the FAILED List in the CMC
Figure 17: Controller Node, Back, with PSU Status LEDs Highlighted
Figure 26: The New Chassis Appears Under the FAILED List in the CMC
Figure 28: The Node with a Rebooted Chassis as Seen on the CMC
Figure 34: Storage Node, Back, with PSU Status LEDs Highlighted
FRU Replacement Guide List of Tables
List of Tables
Table 4: Work Table with Sample MAC Addresses and Serial Bus Paths
Table 5: Work Table with Sample Ethernet Port Names and NIC Array IDs
Table 7: Work Table with Sample MAC Addresses and Serial Bus Paths
Table 8: Work Table with Sample Ethernet Port Names and NIC Array IDs
Table 10: Utilization of Old Disks When New Disks Are Added
Table 13: Work Table for Storage Enclosure Basic Capacity Upgrades
FRU Replacement Guide 1 Controller Node Replaceable Units
This section provides replacement procedures for the following parts in a Controller Node:
• Chassis
• HDD
• SSD
• PSU
• SFP+ DAC Cable
Topics:
• Warnings
• Chassis Replacement Procedure
• Hard Disk Drive Replacement Procedure
• Solid State Disk Replacement Procedure
• Power Supply Unit Replacement Procedure
• SFP+ DAC Cable Replacement Procedure
1.1 Warnings
Caution: Opening or removing the system cover when the system is powered on may expose you to a
risk of electric shock.
When replacing items from the inside of the chassis, ensure that you take precautions to prevent
electrostatic discharge (ESD).
A work table is provided at the end of this section for your convenience, to store all of the information needed for a
chassis replacement.
To replace a Controller Node chassis, proceed as follows:
1. If the failed node is the Management Node, fail over the Management Node to another Controller Node.
a) Open an SSH session to any Controller Node.
You must obtain the IP addresses of the Controller Node ahead of time.
b) Use the following command to determine the virtual IP address of the Management Node.
grep dmachine.amplistor.com /etc/hosts | grep -v 127.0.0.1 | awk '{print $1}'
The output of this command is the virtual IP address of the Management Node. For example,
172.16.63.154
c) Open an SSH session to the Management Node using the virtual IP address obtained in the previous substep.
d) Exit the OSMI menu.
The Linux prompt appears.
e) Copy or write down the hostname in the Linux prompt.
f) Log into the CMC.
g) Navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes, and select the failed
Controller Node.
h) Compare the hostname of the failed node, as displayed in the CMC, to the hostname you saved from substep e.
i) If the failed node is the Management Node, fail over the Management Node to another Controller Node.
For instructions on how to fail over the Management Node, see Managing Hardware in the HGST Active
Archive System Administration Guide.
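The lookup in substep b can also be sketched in Python. This is a minimal illustration that assumes a hosts file in the standard format; the function name and the sample file contents are hypothetical. Unlike the grep pipeline, it matches whole host names rather than substrings.

```python
def find_virtual_ip(hosts_text, hostname="dmachine.amplistor.com"):
    """Return the first non-loopback IP address mapped to `hostname`, or None."""
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if ip == "127.0.0.1":
            continue  # same purpose as `grep -v 127.0.0.1`
        if hostname in names:
            return ip
    return None


# Hypothetical hosts file contents, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1 localhost dmachine.amplistor.com
172.16.63.154 dmachine.amplistor.com
"""

print(find_virtual_ip(SAMPLE_HOSTS))  # 172.16.63.154
```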
2.
Note: Save the node's hostname in your work table under Original Hostname of Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
5.
Important: Pull very gently on the pull tabs of the SFP+ cables, otherwise they might break.
Note: Check that the cables are labeled correctly, so that you can put them back in the same
order.
v. M2
Figure 3: Controller Node, Back
Caution: A Controller Node chassis weighs about 50lbs. Ensure that you have sufficient
manpower to handle it safely.
Warning: Once you pull the chassis past the pull-safety, do not leave it hanging in the rack.
Otherwise, the rack rails may be damaged permanently.
6. Move the two HDDs and the four SSDs from the failed chassis to the exact corresponding slots in the new
chassis.
Tip: Write down the disk serial number and slot location so that you can double-check that each
disk is seated in the correct slot after installation into the new chassis.
a) Remove each disk from its slot in the front bay of the failed chassis.
b) Install the disk into the corresponding slot in the new chassis.
Figure 4: Controller Node, Front
7. Get the IP address and machine name (hostname) of the new chassis.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Unmanaged Devices >
Uninitialized.
The new chassis appears in the list of uninitialized devices. This indicates that it has started successfully.
b) Write the value of Name into your work table, under Temporary Hostname of Node.
c) Write the value of Name without the PM- prefix into your work table, under MAC Address of Node.
d) Write the IP address into your work table, under Temporary IP Address of Node.
Figure 6: Uninitialized Nodes
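The relationship between the two work table entries recorded in this step (the Name value with and without its PM- prefix) can be sketched as follows. The function name is a hypothetical helper for illustration; it returns both values in uppercase, which is the form the later Q-Shell examples use.

```python
def split_temporary_hostname(name):
    """Return (temporary hostname, MAC address) for a name such as 'PM-<MAC>'."""
    prefix = "PM-"
    hostname = name.upper()
    if not hostname.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    # The MAC address is the name without the PM- prefix.
    return hostname, hostname[len(prefix):]


hostname, mac = split_temporary_hostname("PM-90:E2:BA:7E:B8:31")
print(hostname)  # PM-90:E2:BA:7E:B8:31
print(mac)       # 90:E2:BA:7E:B8:31
```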
10.
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/address
00:25:90:fd:e8:7c
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/net/eth2/address
00:25:90:fd:e8:7d
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.2/net/eth3/address
00:25:90:fd:e8:7e
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/net/eth5/address
00:25:90:fd:e8:7f
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/eth1/address
90:e2:ba:7c:5a:fc
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.1/net/eth4/address
90:e2:ba:7c:5a:fd
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0/net/eth6/address
90:e2:ba:7c:5d:a4
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1/net/eth7/address
90:e2:ba:7c:5d:a5
root@nfsROOT:~#
The output of this command shows the serial bus path (for example, 0000:81:00.0) and the new MAC
address on that path (for example, 90:e2:ba:7c:5a:fc).
Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do \
  echo -en "`echo $add | sed 's/\// /g' | awk '{print $5}'`\t"; \
  cat $add | tr 'a-f' 'A-F'; \
done
c) Fill in the serial bus path in ascending order in the Serial Bus Path column of the work table.
d) Fill in the MAC address corresponding to the serial bus path in ascending order in the MAC Address on the
New Chassis column of the work table.
For the sample output from the step above, the work table would look like this:
Table 4: Work Table with Sample MAC Addresses and Serial Bus Paths
Serial Bus Path   MAC Address on the New Chassis   Ethernet Port Name
0000:01:00.0      00:25:90:fd:e8:7c                eth0
0000:01:00.1      00:25:90:fd:e8:7d                eth1
0000:01:00.2      00:25:90:fd:e8:7e                eth2
0000:01:00.3      00:25:90:fd:e8:7f                eth3
0000:81:00.0      90:e2:ba:7c:5a:fc                eth4
0000:81:00.1      90:e2:ba:7c:5a:fd                eth5
0000:82:00.0      90:e2:ba:7c:5d:a4                eth6
0000:82:00.1      90:e2:ba:7c:5d:a5                eth7
e) Close the SSH session to the Controller Node.
You are now back in the SSH session to the Management Node.
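Filling in the first two columns of the work table amounts to extracting the serial bus path from each sysfs path and sorting the rows. The following Python sketch illustrates this under stated assumptions: the function name is hypothetical, and the input is text with alternating lines (a sysfs path, then the MAC address stored in that file).

```python
def work_table_rows(listing):
    """Return (serial bus path, MAC address) pairs in ascending bus-path order."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    rows = []
    for path, mac in zip(lines[0::2], lines[1::2]):
        parts = path.split("/")
        # The serial bus path is the path component just before "net".
        bus_path = parts[parts.index("net") - 1]
        rows.append((bus_path, mac.lower()))
    # Plain string sorting suffices because the bus paths are fixed-width.
    return sorted(rows)


# Three entries from the sample output, deliberately out of order.
SAMPLE = """
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/eth1/address
90:e2:ba:7c:5a:fc
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/net/eth2/address
00:25:90:fd:e8:7d
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/address
00:25:90:fd:e8:7c
"""

for bus_path, mac in work_table_rows(SAMPLE):
    print(bus_path, mac)
```

The port name in the sysfs path (eth1, eth2, and so on) is ignored on purpose: the work table is keyed on the serial bus path.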
11. Get the machine GUID and device GUID of the new chassis.
a) On the Management Node, start the Q-Shell:
/opt/qbase3/qshell
c) Retrieve the machine GUID for the new chassis, using the value of Temporary Hostname of Node in
uppercase, from the work table, for hostname_of_new_node in the command below:
machine_guid = cloudapi.machine.find(name='hostname_of_new_node')['result'][0]
For example,
machine_guid = cloudapi.machine.find(name='PM-90:E2:BA:7E:B8:31')['result'][0]
d) Retrieve the device GUID using the machine GUID you obtained from the previous step.
dg = cloudapi.machine.list(machineguid=machine_guid)['result'][0]['deviceguid']
For example,
dg
'd951f6d9-7104-470d-8c97-ecf52d57c7b5'
12. Mark the new chassis as FAILED in the Active Archive System database, and clean up references to it.
The Active Archive System created a new INSTOCK node in its database for the new chassis. If you do not mark
the new chassis as FAILED in the database, you are in effect adding a new node rather than replacing an existing
node's chassis. Therefore, you must remove the INSTOCK node by following the steps below.
a) Mark the new chassis as FAILED in the Active Archive System database:
Execute this command on the Management Node:
cloudapi.device.updateModelProperties(dg, \
status=str(q.enumerators.devicestatustype.FAILED))
The new chassis now appears under the FAILED list in the CMC, and is removed from the Unmanaged
Devices list.
Figure 7: The New Chassis Appears Under the FAILED List in the CMC
b) From the Management Node, clean up references to the new chassis in the Active Archive System database.
In the command below, replace MAC_ADDRESS with the value you wrote in the work table for MAC Address
of Node.
q.amplistor.cleanupMachine('MAC_ADDRESS')
For example,
In [14]: q.amplistor.cleanupMachine('90:E2:BA:7E:B8:31')
Out[14]: True
Refresh the screen by clicking Refresh in the Commands pane. Check that the new chassis is no longer in the
FAILED list.
13. Update the Active Archive System database with the MAC addresses for the new chassis.
a) From the Management Node, create a cloudAPI connection.
cloudapi = i.config.cloudApiConnection.find('main')
b) From the Management Node, get the machine GUID using your work table value for Original Hostname of
Node.
machine_guid = cloudapi.machine.find(name='HOSTNAME_OF_OLD_NODE')\
['result'][0]
For example,
machine_guid = cloudapi.machine.find(name='HGST-Alpha02-DC01-R02-CN01')\
['result'][0]
d) Display all the Ethernet port names (ethN) that are registered:
For example,
e) Write the index of the above machine.nics[index].name value into the work table in column NIC
Array ID, in the row corresponding to ethN.
For the sample output from the step above, the work table would look like this:
Table 5: Work Table with Sample Ethernet Port Names and NIC Array IDs
Serial Bus Path   MAC Address on the New Chassis   Ethernet Port Name   NIC Array ID
0000:01:00.0      00:25:90:fd:e8:7c                eth0                 0
0000:01:00.1      00:25:90:fd:e8:7d                eth1                 4
0000:01:00.2      00:25:90:fd:e8:7e                eth2                 1
0000:01:00.3      00:25:90:fd:e8:7f                eth3                 2
0000:81:00.0      90:e2:ba:7c:5a:fc                eth4                 5
0000:81:00.1      90:e2:ba:7c:5a:fd                eth5                 3
0000:82:00.0      90:e2:ba:7c:5d:a4                eth6                 6
0000:82:00.1      90:e2:ba:7c:5d:a5                eth7                 7
IPMI              See IPMI MAC Address of Node     BMC                  8
                  in the work table.
f) Update the database entry for machine.nics[N].hwaddr with the corresponding MAC address for ethN
from your work table.
For example,
machine.nics[0].hwaddr = '00:25:90:FD:E8:7C'
machine.nics[1].hwaddr = '00:25:90:FD:E8:7E'
machine.nics[2].hwaddr = '00:25:90:FD:E8:7F'
machine.nics[3].hwaddr = '90:E2:BA:7C:5A:FD'
machine.nics[4].hwaddr = '00:25:90:FD:E8:7D'
machine.nics[5].hwaddr = '90:E2:BA:7C:5A:FC'
machine.nics[6].hwaddr = '90:E2:BA:7C:5D:A4'
machine.nics[7].hwaddr = '90:E2:BA:7C:5D:A5'
g) Update the database entry for machine.nics[8].hwaddr with the corresponding IPMI MAC address
from your work table, under IPMI MAC Address of Node.
For example,
machine.nics[8].hwaddr = '0C:C4:7A:36:8B:12'
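Because the NIC array index does not follow the Ethernet port order, it can help to derive the assignments mechanically from the work table. The sketch below is an illustration only: it uses the sample values from Table 5, the names WORK_TABLE and hwaddr_by_index are hypothetical, and it only prints the statements rather than calling any Q-Shell API.

```python
# (Ethernet port name, MAC address on the new chassis, NIC array ID)
WORK_TABLE = [
    ("eth0", "00:25:90:fd:e8:7c", 0),
    ("eth1", "00:25:90:fd:e8:7d", 4),
    ("eth2", "00:25:90:fd:e8:7e", 1),
    ("eth3", "00:25:90:fd:e8:7f", 2),
    ("eth4", "90:e2:ba:7c:5a:fc", 5),
    ("eth5", "90:e2:ba:7c:5a:fd", 3),
    ("eth6", "90:e2:ba:7c:5d:a4", 6),
    ("eth7", "90:e2:ba:7c:5d:a5", 7),
]


def hwaddr_by_index(rows):
    """Map each NIC array ID to its uppercase MAC address."""
    result = {}
    for name, mac, index in rows:
        if index in result:
            raise ValueError(f"NIC array ID {index} is used twice (at {name})")
        result[index] = mac.upper()
    return dict(sorted(result.items()))


for index, mac in hwaddr_by_index(WORK_TABLE).items():
    print(f"machine.nics[{index}].hwaddr = '{mac}'")
```

The duplicate check catches a common transcription mistake: writing the same NIC array ID in two rows of the work table.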
14. Update the MAC address of the IPMI NIC, and the DHCP leases.
a) Log into the CMC.
b) Navigate to the CMC's view of the node whose chassis you have replaced.
c) Select that node's Summary tab.
d) Write the IPMI IP address, as shown in the General section, in your work table, under IPMI IP Address of
Node.
Figure 8: The Old IPMI IP Address of the Node
e) Leave the current SSH session as is. Open a new SSH session on the Management Node.
f) Open /opt/qbase3/cfg/dhcpd/dhcpd.leases with your text editor.
g) Search for the IPMI IP address (obtained in substep d) in the file.
The section containing the IPMI IP address looks like the following example:
host 457f495a-80b7-4125-862b-5f87d9121cfa {
dynamic;
hardware ethernet 0c:c4:7a:36:8b:12;
fixed-address 172.16.201.16;
group "pmachines";
}
h) Change the hardware ethernet value to the new IPMI MAC address in lowercase from your work table,
under IPMI MAC Address of Node.
For example,
Do a sanity check to verify that you have updated the new MAC addresses correctly.
Compare the output of the command below to your work table.
In [9]: for nic in machine.nics: nic.name; nic.hwaddr
...:
Out[9]: 'eth0'
Out[9]: '00:25:90:FD:E8:7C'
Out[9]: 'eth2'
Out[9]: '00:25:90:FD:E8:7E'
Out[9]: 'eth3'
Out[9]: '00:25:90:FD:E8:7F'
Out[9]: 'eth5'
Out[9]: '90:E2:BA:7C:5A:FD'
Out[9]: 'eth1'
Out[9]: '00:25:90:FD:E8:7D'
Out[9]: 'eth4'
Out[9]: '90:E2:BA:7C:5A:FC'
Out[9]: 'eth6'
Out[9]: '90:E2:BA:7C:5D:A4'
Out[9]: 'eth7'
Out[9]: '90:E2:BA:7C:5D:A5'
Out[9]: 'BMC'
Out[9]: '0C:C4:7A:36:8B:12'
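The comparison against the work table can also be done mechanically. The following Python sketch is an illustration under stated assumptions: the function name is hypothetical, `expected` is typed in from the work table, and `actual` is the list of name and hwaddr pairs copied from the output above.

```python
def mismatches(expected, actual):
    """Return (port name, expected MAC, found MAC) for every disagreement.

    expected: dict of port name -> MAC address from the work table
    actual:   list of (port name, MAC address) pairs from the database
    """
    found = dict(actual)
    problems = []
    for name, mac in expected.items():
        other = found.get(name)
        # MAC addresses are compared case-insensitively.
        if other is None or other.upper() != mac.upper():
            problems.append((name, mac, other))
    return problems


expected = {"eth0": "00:25:90:fd:e8:7c", "eth1": "00:25:90:fd:e8:7d", "BMC": "0c:c4:7a:36:8b:12"}
actual = [("eth0", "00:25:90:FD:E8:7C"), ("eth1", "00:25:90:FD:E8:7D"), ("BMC", "0C:C4:7A:36:8B:12")]

print(mismatches(expected, actual))  # []
```

An empty list means every port in the work table has the same MAC address in the database.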
16.
17. Update and save the Active Archive System database device object.
a) Get the device object.
device = cloudapi.device.getObject(machine.deviceguid)
b) Update the MAC address of the chassis with the value you saved in the work table under MAC Address of
Node.
device.nicports[0].hwaddr = 'NEW_MAC_ADDRESS'
For example,
In [12]: device.nicports[0].hwaddr='90:E2:BA:7E:B8:31'
18. Restart dhcpd.
In [14]: q.manage.dhcpd.restart()
Stopping dhcpd...
dhcpd is halted
Starting dhcpd...
dhcpd is running
19.
20. When the node is restarted, update the main.cfg file and restart the application server.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
A list of Controller Nodes appears in the CMC.
b) Click the Controller Node whose chassis you have just replaced.
Identify the correct Controller Node by its hostname: it now matches the Original Hostname of Node value
you recorded in the work table. This value is typically of the format SystemID-DCnn-Rnn-CNnn.
c) Identify the IP addresses listed in the Private IP field.
d) Open an SSH session to the Controller Node, using any one of the IP addresses you obtained from substep c,
and exit the OSMI menu.
The Linux prompt appears.
e) At the Linux prompt on the Controller Node, open the file /opt/qbase3/cfg/qconfig/main.cfg with
your text editor.
The file has a section that looks like this:
[main]
lastlogcleanup = 1428960577
domain = somewhere.com
nodetype = CPUNODE
nodename = 90E2BA7EB831
logserver_loglevel = 6
logserver_port = 9998
logserver_ip = 127.0.0.1
qshell_firstrun = False
machineguid = fc635662-5247-45b1-ab66-d0abe8e60712
f) Replace the value after nodename = with the new MAC address from your work table, under MAC Address
of Node.
Note: The MAC address must be in uppercase and without colons. For example,
00:25:90:3B:C1:72 must be typed as 0025903BC172.
g) Save and close the configuration file.
h) Start the Q-Shell.
/opt/qbase3/qshell
Applicationserver is still running, waiting for 5 more seconds
Applicationserver is still running, waiting for 4 more seconds
Starting applicationserver
Applicationserver...
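The procedure above uses three spellings of a MAC address: uppercase with colons in the Q-Shell examples, lowercase with colons in dhcpd.leases, and uppercase without colons for nodename in main.cfg. The following Python sketch shows one way to derive all three from a single work table value; the function names are hypothetical helpers, not part of the Active Archive System software.

```python
import string


def _hex_digits(mac):
    """Return the 12 hexadecimal digits of a MAC address, without separators."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12 or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def _with_colons(digits):
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def mac_for_database(mac):
    """Uppercase with colons, as in the Q-Shell examples."""
    return _with_colons(_hex_digits(mac).upper())


def mac_for_dhcpd_leases(mac):
    """Lowercase with colons, as in dhcpd.leases."""
    return _with_colons(_hex_digits(mac).lower())


def mac_for_nodename(mac):
    """Uppercase without colons, as required for nodename in main.cfg."""
    return _hex_digits(mac).upper()


print(mac_for_nodename("00:25:90:3b:c1:72"))      # 0025903BC172
print(mac_for_dhcpd_leases("0C:C4:7A:36:8B:12"))  # 0c:c4:7a:36:8b:12
```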
21. Verify that the bus information of the network interfaces matches the udev rules.
a) Run the following command:
Tip: Check the hardware paths in the command below, as they might be different on the new
chassis.
For example, the output of the above command looks like this:
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/address
00:25:90:fd:e8:7c
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/net/eth2/address
00:25:90:fd:e8:7d
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.2/net/eth3/address
00:25:90:fd:e8:7e
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/net/eth5/address
00:25:90:fd:e8:7f
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/eth1/address
90:e2:ba:7c:5a:fc
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.1/net/eth4/address
90:e2:ba:7c:5a:fd
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0/net/eth6/address
90:e2:ba:7c:5d:a4
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1/net/eth7/address
90:e2:ba:7c:5d:a5
Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do \
  echo -en "`echo $add | sed 's/\// /g' | awk '{print $5}'`\t"; \
  cat $add | tr 'a-f' 'A-F'; \
done
b) Compare the output of the command above to the contents of the file
/etc/udev/rules.d/70-persistent-net.rules.
For example, the contents of this file look like this:
reboot
Warning: Be very careful when recording and updating MAC addresses. A mistake may render the new
chassis unusable.
Item Value
Virtual IP Address of the Management Node:
Get this value as instructed in the Using the Administrator
Interfaces chapter of the HGST Active Archive System
Administration Guide
Serial Bus Path   MAC Address on the New Chassis   Ethernet Port Name   NIC Array ID
0000:01:00.0                                       eth0
0000:01:00.1                                       eth1
0000:01:00.2                                       eth2
0000:01:00.3                                       eth3
0000:81:00.0                                       eth4
0000:81:00.1                                       eth5
0000:82:00.0                                       eth6
0000:82:00.1                                       eth7
IPMI
b) Press the release button on the drive carrier of the decommissioned HDD to extend the drive carrier handle.
Figure 12: Removing a Drive Carrier
c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the HDD to the serial number specified in the decommissioned disk details to
confirm that you have the correct HDD.
e) Unscrew the drive carrier from the decommissioned HDD.
f) Screw the drive carrier onto the replacement HDD.
g) Install the replacement HDD into the same slot that the decommissioned HDD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events.
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.
Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.
The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.
Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.
d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.
• Decommission the faulty SSD in the CMC. For more information, see Managing Hardware in the HGST Active
Archive System Administration Guide.
• Obtain a replacement SSD from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: 40 minutes.
To replace an SSD, proceed as follows:
Warning: The CMC identifies the incorrect slot for failed SSDs on Controller Nodes.
The image in the decommissioned disk details for SSDs is mislabeled: when it highlights slot 9, the
decommissioned SSD is actually located in slot 5; when it highlights slot 10, the decommissioned
SSD is actually located in slot 6.
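The slot correction described in this warning can be written down as a small lookup. The sketch below is illustrative only: the names are hypothetical, and it covers just the two corrections stated above, raising an error for any other slot rather than guessing.

```python
# Slot highlighted in the CMC -> slot that actually holds the decommissioned SSD.
DOCUMENTED_SSD_SLOT_CORRECTIONS = {9: 5, 10: 6}


def physical_ssd_slot(highlighted_slot):
    """Return the physical slot for the slot highlighted in the CMC."""
    if highlighted_slot not in DOCUMENTED_SSD_SLOT_CORRECTIONS:
        raise ValueError(f"no documented correction for slot {highlighted_slot}")
    return DOCUMENTED_SSD_SLOT_CORRECTIONS[highlighted_slot]


print(physical_ssd_slot(9))   # 5
print(physical_ssd_slot(10))  # 6
```

Always confirm the result against the serial number on the SSD before replacing it.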
b) Press the release button on the drive carrier of the decommissioned SSD to extend the drive carrier handle.
Figure 16: Removing a Drive Carrier
c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the SSD to the serial number specified in the decommissioned disk details to
confirm that you have the correct SSD.
e) Unscrew the drive carrier from the decommissioned SSD.
f) Screw the drive carrier onto the replacement SSD.
g) Install the replacement SSD into the same slot that the decommissioned SSD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events.
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.
Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.
The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.
Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.
d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.
2. Replace the faulty SFP+ DAC cable on the Controller Node end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
3. Visually trace the faulty SFP+ DAC cable to the correct port on its Storage Interconnect end.
4. Replace the faulty SFP+ DAC cable on the Storage Interconnect end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
5. Verify that the amber LED on the SFP+ DAC cable is off.
FRU Replacement Guide 2 Storage Node Replaceable Units
This section provides replacement procedures for the following parts in a Storage Node:
• Chassis
• HDD
• PSU
• MiniSAS Cable
Topics:
• Warnings
• Chassis Replacement Procedure
• Hard Disk Drive Replacement Procedure
• Power Supply Unit Replacement Procedure
• MiniSAS Cable Replacement Procedure
2.1 Warnings
Caution:
• Opening or removing the system cover when the system is powered on may expose you to a risk of
electric shock.
• When replacing items from the inside of the chassis, ensure that you take precautions to prevent
electrostatic discharge (ESD).
• A Storage Node weighs about 43lbs. Ensure sufficient manpower to handle it safely.
A work table is provided at the end of this section for your convenience, to store all of the information needed for a
chassis replacement.
To replace a Storage Node chassis, proceed as follows:
1.
Tip: The sample outputs shown for this procedure are from Storage Node 6.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
2.
Note: Save the node's hostname in your work table under Original Hostname of Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 20: A Storage Node Pane in the CMC
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
3.
Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
4.
Note: Check that the cables are labeled correctly, so that you can put them back in the same
order.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
Observe the amber LED on the paired Storage Enclosure Basic indicating loss of connection.
d) At the back of the chassis, disconnect the two power cords.
In the image above, the power cords are connected to the PSUs labeled P1 and P2.
e) At the front of the chassis, slowly slide the chassis out until you reach the pull-safety at the midway point (you
will hear a soft clicking sound, and feel the chassis "catch" on the rails).
f) Disengage the pull-safety on both sides of the chassis and slide it out until the split line of the two top covers.
Push the pull-safety on one side up, and the pull-safety on the other side down.
g) Continue to slowly slide the chassis out until you reach the pull-safety at the end point, and disengage it as you
did the earlier one.
h) Safely unmount the chassis from the rack and place it on a table.
Caution: A Storage Node chassis weighs about 43 lbs. Make sure enough people are
available to handle it safely.
Warning: Once you pull the chassis past the pull-safety, do not leave it hanging in the rack.
Otherwise, the rack rails may be damaged permanently.
5.
Move the two HDDs from the failed chassis to the exact corresponding slots in the new chassis.
Tip: Write down each disk's serial number and slot location so that you can double-check that each
disk is seated in the correct slot after you install it in the new chassis.
a) Remove each disk from its slot in the front bay of the failed chassis.
b) Install the disk into the corresponding slot in the new chassis.
Figure 23: Storage Node, Front
6.
Get the IP address and machine name (hostname) of the new chassis.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Unmanaged Devices >
Uninitialized.
The new chassis appears in the list of uninitialized devices. This indicates that it has started successfully.
b) Write the value of Name into your work table, under Temporary Hostname of Node.
c) Write the value of Name without the PM- prefix into your work table, under MAC Address of Node.
d) Write the IP address into your work table, under Temporary IP Address of Node.
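The Name value shown in the CMC carries two of the work table entries at once. As a minimal sketch (standalone Python 3, not something to run inside the Q-Shell; the function name `split_uninitialized_name` is hypothetical), the Temporary Hostname of Node and MAC Address of Node values could be derived like this:

```python
def split_uninitialized_name(name):
    """Return (temporary_hostname, mac_address) for a CMC Name value.

    Assumes the Name has the form 'PM-<MAC address>', as in the
    examples in this procedure.
    """
    prefix = "PM-"
    if not name.startswith(prefix):
        raise ValueError("expected a name starting with %r, got %r" % (prefix, name))
    # The full Name is the temporary hostname; the part after the
    # prefix is the MAC address of the node.
    return name, name[len(prefix):]


hostname, mac = split_uninitialized_name("PM-90:E2:BA:7E:B8:31")
print("Temporary Hostname of Node:", hostname)
print("MAC Address of Node:", mac)
```

The helper rejects any value without the PM- prefix, so a mistyped name is caught before it reaches the work table.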
Figure 25: Uninitialized Nodes
9.
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/net/eth1/address
90:e2:ba:7e:b8:30
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.1/net/eth3/address
90:e2:ba:7e:b8:31
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0/net/eth0/address
0c:c4:7a:33:38:10
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.1/net/eth2/address
0c:c4:7a:33:38:11
root@nfsROOT:~#
The output of this command shows the serial bus path (for example, 0000:02:00.1) and the new MAC
address (for example, 90:e2:ba:7e:b8:31).
Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en "`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done
c) Fill in the serial bus path in ascending order in the Serial Bus Path column of the work table.
d) Fill in the MAC address corresponding to the serial bus path in ascending order in the MAC Address on the
New Chassis column of the work table.
For the sample output from the step above, the work table would look like this:
Table 7: Work Table with Sample MAC Addresses and Serial Bus Paths
Serial Bus Path   MAC Address on the New Chassis   Ethernet Port Name
0000:02:00.0      90:e2:ba:7e:b8:30                eth0
0000:02:00.1      90:e2:ba:7e:b8:31                eth1
0000:07:00.0      0c:c4:7a:33:38:10                eth2
0000:07:00.1      0c:c4:7a:33:38:11                eth3
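Filling in these columns by hand is error-prone, so the rows can also be derived from the command output. The following is a sketch only, in standalone Python 3 rather than the Q-Shell. It assumes the output alternates between a sysfs path line and a MAC address line, and that Ethernet port names are assigned in ascending serial bus path order, as in Table 7. The names `build_work_table` and `SAMPLE_OUTPUT` are hypothetical.

```python
import re

# Sample output copied from the step above (shell prompt line omitted).
SAMPLE_OUTPUT = """\
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/net/eth1/address
90:e2:ba:7e:b8:30
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.1/net/eth3/address
90:e2:ba:7e:b8:31
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0/net/eth0/address
0c:c4:7a:33:38:10
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.1/net/eth2/address
0c:c4:7a:33:38:11
"""

# Captures the serial bus path, the last directory before 'net'.
PATH_RE = re.compile(r"^/sys/devices/[^/]+/[^/]+/([^/]+)/net/[^/]+/address$")


def build_work_table(output):
    """Return (serial_bus_path, mac, port_name) rows sorted by bus path."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    pairs = []
    # Lines come in pairs: a path line followed by its MAC address line.
    for path, mac in zip(lines[0::2], lines[1::2]):
        match = PATH_RE.match(path)
        if match is None:
            raise ValueError("unexpected path line: %r" % path)
        pairs.append((match.group(1), mac.lower()))
    pairs.sort()
    # Assumption: port names follow ascending bus path order (Table 7).
    return [(bus, mac, "eth%d" % i) for i, (bus, mac) in enumerate(pairs)]


for row in build_work_table(SAMPLE_OUTPUT):
    print("%-14s %-19s %s" % row)
```

The interface names embedded in the sysfs paths are deliberately ignored, because they reflect the temporary naming on the new chassis rather than the work table assignment.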
e) Close the SSH session to the Storage Node.
You are now back in the SSH session to the Management Node.
10.
Get the machine GUID and device GUID of the new chassis.
a) On the Management Node, start the Q-Shell:
/opt/qbase3/qshell
c) Retrieve the machine GUID for the new chassis, using the value of Temporary Hostname of Node in
uppercase, from the work table, for hostname_of_new_node in the command below:
machine_guid = cloudapi.machine.find(name='hostname_of_new_node')['result'][0]
For example,
machine_guid = cloudapi.machine.find(name='PM-90:E2:BA:7E:B8:31')['result'][0]
d) Retrieve the device GUID using the machine GUID you obtained from the previous step.
dg = cloudapi.machine.list(machineguid=machine_guid)['result'][0]['deviceguid']
dg
'd951f6d9-7104-470d-8c97-ecf52d57c7b5'
11.
Mark the new chassis as FAILED in the Active Archive System database, and clean up references to it.
The Active Archive System created a new INSTOCK node in its database for the new chassis. If you do not mark
the new chassis as FAILED in the database, you are in effect adding a new node rather than replacing an existing
node's chassis. Therefore, you must remove the INSTOCK node by following the steps below.
a) Mark the new chassis as FAILED in the Active Archive System database:
Execute this command on the Management Node:
cloudapi.device.updateModelProperties(dg, \
status=str(q.enumerators.devicestatustype.FAILED))
The new chassis now appears under the FAILED list in the CMC, and is removed from the Unmanaged
Devices list.
Figure 26: The New Chassis Appears Under the FAILED List in the CMC
b) From the Management Node, clean up references to the new chassis in the Active Archive System database.
In the command below, replace MAC_ADDRESS with the value you wrote in the work table for MAC Address
of Node.
q.amplistor.cleanupMachine('MAC_ADDRESS')
For example,
In [14]: q.amplistor.cleanupMachine('90:E2:BA:7E:B8:31')
Out[14]: True
12.
Update the Active Archive System database with the MAC addresses for the new chassis.
a) From the Management Node, create a cloudAPI connection.
cloudapi = i.config.cloudApiConnection.find('main')
b) From the Management Node, get the machine GUID using your work table value for Original Hostname of
Node.
machine_guid = cloudapi.machine.find(name='HOSTNAME_OF_OLD_NODE')\
['result'][0]
For example,
machine_guid = cloudapi.machine.find(name='HGST-S3-DC01-R01-SN06')['result'][0]
d) Display all the Ethernet port names (ethN) that are registered:
For example,
In [3]: machine = cloudapi.machine.getObject(machine_guid)
In [4]: print machine.nics[0].name
eth1
In [5]: print machine.nics[1].name
eth3
In [6]: print machine.nics[2].name
eth2
In [7]: print machine.nics[3].name
eth0
In [8]: print machine.nics[4].name
BMC
e) Write the index of the above machine.nics[index].name value into the work table in column NIC
Array ID, in the row corresponding to ethN.
For the sample output from the step above, the work table would look like this:
Table 8: Work Table with Sample Ethernet Port Names and NIC Array IDs
Serial Bus Path   MAC Address on the New Chassis                    Ethernet Port Name   NIC Array ID
0000:02:00.0      90:e2:ba:7e:b8:30                                 eth0                 3
0000:02:00.1      90:e2:ba:7e:b8:31                                 eth1                 0
0000:07:00.0      0c:c4:7a:33:38:10                                 eth2                 2
0000:07:00.1      0c:c4:7a:33:38:11                                 eth3                 1
IPMI              See IPMI MAC Address of Node in the work table.   BMC                  4
f) Update the database entry for machine.nics[N].hwaddr with the corresponding MAC address for ethN
from your work table.
For example,
machine.nics[0].hwaddr = '90:E2:BA:7E:B8:31'
machine.nics[1].hwaddr = '0C:C4:7A:33:38:11'
machine.nics[2].hwaddr = '0C:C4:7A:33:38:10'
machine.nics[3].hwaddr = '90:E2:BA:7E:B8:30'
g) Update the database entry for machine.nics[4].hwaddr with the corresponding IPMI MAC address
from your work table, under MAC Address of Node.
For example,
machine.nics[4].hwaddr = '0C:C4:7A:36:8B:12'
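Because the NIC array order differs from the Ethernet port order, it is easy to assign a MAC address to the wrong index. As a hedged sketch (standalone Python 3; `WORK_TABLE`, `NIC_NAMES`, and `hwaddr_assignments` are hypothetical names, and the values are the sample values from this procedure), the assignment statements could be generated from the work table instead of typed by hand:

```python
# Work table: Ethernet port name -> MAC address on the new chassis.
WORK_TABLE = {
    "eth0": "90:e2:ba:7e:b8:30",
    "eth1": "90:e2:ba:7e:b8:31",
    "eth2": "0c:c4:7a:33:38:10",
    "eth3": "0c:c4:7a:33:38:11",
    "BMC": "0c:c4:7a:36:8b:12",
}

# Names in NIC array order, as printed by machine.nics[index].name.
NIC_NAMES = ["eth1", "eth3", "eth2", "eth0", "BMC"]


def hwaddr_assignments(nic_names, work_table):
    """Return one Q-Shell assignment statement per NIC array index."""
    return [
        "machine.nics[%d].hwaddr = '%s'" % (index, work_table[name].upper())
        for index, name in enumerate(nic_names)
    ]


for statement in hwaddr_assignments(NIC_NAMES, WORK_TABLE):
    print(statement)
```

The printed statements can then be reviewed against the work table before they are entered in the Q-Shell.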
13.
Update the MAC address of the IPMI NIC, and the DHCP leases.
a) Log into the CMC.
b) Navigate to the CMC's view of the node whose chassis you have replaced.
c) Select that node's Summary tab.
d) Write the IPMI IP address, as shown in the General section, in your work table, under IPMI IP Address of
Node.
Figure 27: The Old IPMI IP Address of the Node
e) Leave the current SSH session as is. Open a new SSH session on the Management Node.
f) Open /opt/qbase3/cfg/dhcpd/dhcpd.leases with your text editor.
g) Search for the IPMI IP address (obtained in substep d) in the file.
The section containing the IPMI IP address looks like the following example:
host 457f495a-80b7-4125-862b-5f87d9121cfa {
dynamic;
hardware ethernet 0c:c4:7a:36:8b:12;
fixed-address 172.16.201.16;
group "pmachines";
}
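A host entry of this shape pairs one hardware ethernet value with one fixed-address, so the MAC address change can also be expressed as a text transformation. The sketch below is an illustration under assumptions, not part of the documented procedure: it is standalone Python 3, the function name `update_lease_mac` is hypothetical, and the new MAC address in the usage lines is a made-up placeholder. Back up the leases file before changing it by any method.

```python
import re

SAMPLE_LEASES = """\
host 457f495a-80b7-4125-862b-5f87d9121cfa {
  dynamic;
  hardware ethernet 0c:c4:7a:36:8b:12;
  fixed-address 172.16.201.16;
  group "pmachines";
}
"""


def update_lease_mac(leases_text, ip_address, new_mac):
    """Return leases_text with the MAC replaced in the host block for ip_address."""
    block_re = re.compile(r"host\s+\S+\s*\{[^}]*\}")
    mac_re = re.compile(r"(hardware ethernet\s+)[^;]*;")
    ip_re = re.compile(r"fixed-address\s+%s\s*;" % re.escape(ip_address))

    def replace_block(match):
        block = match.group(0)
        if not ip_re.search(block):
            return block  # a different host entry: leave it untouched
        # The procedure asks for the MAC address in lowercase here.
        return mac_re.sub(lambda m: m.group(1) + new_mac.lower() + ";", block)

    return block_re.sub(replace_block, leases_text)


# Placeholder MAC address, for illustration only.
print(update_lease_mac(SAMPLE_LEASES, "172.16.201.16", "0C:C4:7A:AA:BB:CC"))
```

Matching on the fixed-address line first keeps the change confined to the one host entry that belongs to the node's IPMI IP address.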
h) Change the hardware ethernet value to the new IPMI MAC address in lowercase from your work table,
under IPMI MAC Address of Node.
For example,
In your previous SSH session, do a sanity check to verify that you have updated the new MAC addresses correctly.
Compare the output of the command below to your work table.
In [9]: for nic in machine.nics: nic.name; nic.hwaddr
...:
Out[9]: 'eth1'
Out[9]: '90:E2:BA:7E:B8:31'
Out[9]: 'eth3'
Out[9]: '0C:C4:7A:33:38:11'
Out[9]: 'eth2'
Out[9]: '0C:C4:7A:33:38:10'
Out[9]: 'eth0'
Out[9]: '90:E2:BA:7E:B8:30'
Out[9]: 'BMC'
Out[9]: '0C:C4:7A:36:8B:12'
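The comparison against the work table can be made mechanical. The following is a minimal sketch in standalone Python 3, assuming the name and address pairs have been copied out of the Q-Shell output; `find_mismatches` and the sample data structures are hypothetical names, not part of the product API.

```python
def find_mismatches(registered, work_table):
    """Return (name, registered_mac, expected_mac) for every NIC that differs.

    registered: list of (name, hwaddr) pairs copied from the Q-Shell output.
    work_table: dict mapping port name to the MAC address you recorded.
    """
    mismatches = []
    for name, hwaddr in registered:
        expected = work_table.get(name)
        # Compare case-insensitively: the database uses uppercase,
        # while the sysfs output in the work table is lowercase.
        if expected is None or expected.lower() != hwaddr.lower():
            mismatches.append((name, hwaddr, expected))
    return mismatches


registered = [
    ("eth1", "90:E2:BA:7E:B8:31"),
    ("eth3", "0C:C4:7A:33:38:11"),
    ("eth2", "0C:C4:7A:33:38:10"),
    ("eth0", "90:E2:BA:7E:B8:30"),
    ("BMC", "0C:C4:7A:36:8B:12"),
]
work_table = {
    "eth0": "90:e2:ba:7e:b8:30",
    "eth1": "90:e2:ba:7e:b8:31",
    "eth2": "0c:c4:7a:33:38:10",
    "eth3": "0c:c4:7a:33:38:11",
    "BMC": "0c:c4:7a:36:8b:12",
}
print(find_mismatches(registered, work_table) or "All MAC addresses match the work table.")
```

An empty result means every registered address agrees with the work table; any returned tuple names the port to recheck.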
15.
16.
Update and save the Active Archive System database device object.
a) Get the device object.
device = cloudapi.device.getObject(machine.deviceguid)
b) Update the MAC address of the chassis with the value you saved in the work table under MAC Address of
Node.
device.nicports[0].hwaddr = 'NEW_MAC_ADDRESS'
For example,
In [12]: device.nicports[0].hwaddr='90:E2:BA:7E:B8:31'
17.
Restart dhcpd.
In [14]: q.manage.dhcpd.restart()
Stopping dhcpd...
dhcpd is halted
Starting dhcpd...
dhcpd is running
18.
Once rebooted, if you log into the CMC, you can see that the chassis now has the correct hostname as shown in the
figure below.
Figure 28: The Node with a Rebooted Chassis as Seen on the CMC
19.
When the node is restarted, update the main.cfg file and restart the application server.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
A list of Storage Nodes appears in the CMC.
b) Click the Storage Node whose chassis you have just replaced.
Identify the correct Storage Node by its hostname: it now matches the Original Hostname of Node value you
recorded in the work table. This value typically has the format SystemID-DCnn-Rnn-SNnn.
c) Identify the IP addresses listed in the Private IP field.
d) Open an SSH session to the Management Node, and exit the OSMI menu.
The Linux prompt appears.
e) Open an SSH session to the Storage Node, using any one of the IP addresses you obtained from substep c.
The Linux prompt appears.
f) At the Linux prompt on the Storage Node, open the file /opt/qbase3/cfg/qconfig/main.cfg with
your text editor.
The file has a section that looks like this:
[main]
lastlogcleanup = 1428960577
domain = somewhere.com
nodetype = STORAGENODE
nodename = 90E2BA7EB831
logserver_loglevel = 6
logserver_port = 9998
logserver_ip = 127.0.0.1
qshell_firstrun = False
machineguid = fc635662-5247-45b1-ab66-d0abe8e60712
g) Replace the value after nodename = with the new MAC address from your work table, under MAC Address
of Node.
Note: The MAC address must be in uppercase and without colons. For example,
00:25:90:3B:C1:72 must be typed as 0025903BC172.
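The conversion described in this note is simple enough to script, which avoids typing mistakes in the nodename value. This is a small sketch in standalone Python 3; the function name `mac_to_nodename` is hypothetical.

```python
import re


def mac_to_nodename(mac):
    """Convert a colon-separated MAC address to the nodename format.

    The result is uppercase and contains no colons.
    """
    nodename = mac.replace(":", "").upper()
    # A MAC address is exactly 12 hexadecimal digits.
    if not re.fullmatch(r"[0-9A-F]{12}", nodename):
        raise ValueError("not a valid MAC address: %r" % mac)
    return nodename


print(mac_to_nodename("00:25:90:3B:C1:72"))
print(mac_to_nodename("90:e2:ba:7e:b8:31"))
```

The validation step rejects truncated or mistyped addresses before they are written to main.cfg.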
20.
Verify that the bus information of the network interfaces matches the udev rules.
a) Run the following command:
Tip: Check the hardware paths in the command below, as they might be different on the new
chassis.
For example, the output of the above command looks like this:
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/net/eth0/address
90:e2:ba:7e:b8:30
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.1/net/eth1/address
90:e2:ba:7e:b8:31
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0/net/eth2/address
0c:c4:7a:33:38:10
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.1/net/eth3/address
0c:c4:7a:33:38:11
Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en "`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done
b) Compare the output of the command above to the contents of the file
/etc/udev/rules.d/70-persistent-net.rules.
For example, the contents of this file look like this:
root@HGST-S3-DC01-R01-SN06:~# cat /etc/udev/rules.d/70-persistent-net.rules
reboot
Sanity check: you can observe that the Storage Enclosure Basic LEDs are now solid green. In addition, the CMC shows
the node status as RUNNING.
Figure 29: Storage Node Status in the CMC
Warning: Be very careful when recording and updating MAC addresses. A mistake may render the new
chassis unusable.
Item Value
Virtual IP Address of the Management Node:
Get this value as instructed in the Using the Administrator
Interfaces chapter of the HGST Active Archive System
Administration Guide
Item Value
The CMC displays this value after the new chassis is
installed.
Serial Bus Path   MAC Address on the New Chassis   Ethernet Port Name   NIC Array ID
0000:01:00.0                                       eth0
0000:01:00.1                                       eth1
0000:01:00.2                                       eth2
0000:01:00.3                                       eth3
IPMI
b) Press the release button on the drive carrier of the decommissioned HDD to extend the drive carrier handle.
Figure 33: Removing a Drive Carrier
c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the HDD to the serial number specified in the decommissioned disk details to
confirm that you have the correct HDD.
e) Unscrew the drive carrier from the decommissioned HDD.
f) Screw the drive carrier onto the replacement HDD.
g) Install the replacement HDD into the same slot that the decommissioned HDD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Storage Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events.
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.
Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.
The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.
Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.
d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.
FRU Replacement Guide 3 Storage Interconnect Replaceable Units
This section provides replacement procedures for the following parts in a Storage
Interconnect:
• Chassis
• Fan
• PSU
• SFP+ 1G Module
• SFP+ DAC Cable
Topics:
• Warnings
• Storage Interconnect Replacement Procedure
• Fan Replacement Procedure
• Power Supply Unit Replacement Procedure
• SFP+ 1G Module Replacement Procedure
• SFP+ DAC Cable Replacement Procedure
3.1 Warnings
Caution:
• All data on the Active Archive System is unavailable during repair.
• Always perform a health check after replacing a Storage Interconnect. Consult the chapter
"Monitoring the System" in the HGST Active Archive System Administration Guide.
• When you replace a Storage Interconnect with one that runs firmware version 3.2.0.3 (instead of 2.2.0.5),
first upgrade the Telemetry Collection Software to version 1.0.181. Consult the chapter
"Upgrading the System" in the HGST Active Archive System Administration Guide.
• Once you have installed one Storage Interconnect with the new firmware, upgrade the firmware of all
Storage Interconnects to 3.2.0.3.
Replace switchport_IP with the default IP address of the new Storage Interconnect:
For Storage Interconnect 1 (lower), the default IP address is 192.168.123.123.
For Storage Interconnect 2 (upper), the default IP address is 192.168.123.123.
The login prompt appears.
Log in using the default credentials (username admin, password none or HGSTHGST, depending on
the rack's manufacture date).
The Storage Interconnect command prompt appears.
b) Type enable to enter the Privileged EXEC command mode.
(Routing)>enable
c) Type configure to enter the Privileged CONF command mode.
(Routing)#configure
(Routing) (Config)#
d) Configure the name of the switch:
Replace DC01 with the data center index you use for this data center. Replace SW01 with SW02 if you are
replacing the upper Storage Interconnect.
(Routing) (Config)#DC01-SW01
(DC01-SW01) #
This operation may take a few minutes. Management interfaces are not available during this time.
h) Verify that the settings on the Storage Interconnect match your configuration file.
i) If the output of show startup-config does not show flow control as enabled on the Storage Interconnect,
enable it:
Type the following command:
(DC01-SW01) >enable
(DC01-SW01) #show startup-config | include flowcontrol
If the output of the command above is not flowcontrol symmetric, type the following commands:
(DC01-SW01) #configure
(DC01-SW01) (Config) #flowcontrol symmetric
(DC01-SW01) (Config) #exit
(DC01-SW01) #write memory
Configuration Saved!
(DC01-SW01) #
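The decision in this substep depends only on whether the filtered configuration output contains the line flowcontrol symmetric. As a rough sketch in standalone Python 3, assuming the switch output has been captured as text (the function name `flow_control_enabled` is hypothetical), the check could look like this:

```python
def flow_control_enabled(config_text):
    """Return True if any line of the output is exactly 'flowcontrol symmetric'."""
    return any(
        line.strip() == "flowcontrol symmetric"
        for line in config_text.splitlines()
    )


captured = "flowcontrol symmetric\n"
if flow_control_enabled(captured):
    print("Flow control is already enabled.")
else:
    print("Enable flow control as described above.")
```

Comparing whole lines rather than searching for a substring avoids treating a different flowcontrol setting as a match.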
3. Remove the two blanking plates from the top front of the rack.
4. Identify the faulty Storage Interconnect.
Figure 35: A Single-Rack Active Archive System
x. (For SW1 only) And so on (refer to the Signal Cabling Scheme below).
Figure 36: Storage Interconnect Port Reservations
3. On the front side, identify the faulty PSU (amber colored LED).
4. Unplug the power cable of the faulty PSU.
5. Push the blue release tab towards the power connector and use the grab handle to remove the faulty PSU.
6. Push the new PSU into the Storage Interconnect.
7. Attach the power cable to the new PSU.
8. Reattach the two blanking plates onto the top front of the rack.
2. Replace the faulty SFP+ DAC cable on the Controller Node end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
3. Visually trace the faulty SFP+ DAC cable to the correct port on its Storage Interconnect end.
4. Replace the faulty SFP+ DAC cable on the Storage Interconnect end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
5. Verify that the amber LED on the SFP+ DAC cable is off.
FRU Replacement Guide 4 Power Distribution Unit Replaceable Units
This section provides replacement procedures for a power distribution unit (PDU):
• PDU
Topics:
• Warnings
• Power Distribution Unit Replacement Procedure
4.1 Warnings
Caution: During the replacement of a PDU, the Active Archive System is running on a single power
source.
Note: PDU01 is located on the right side when facing the back of the rack (the side of switch ports).
Warning: Once you pull the PDU past the pull-safety, do not leave the PDU hanging in the rack.
Otherwise, the rack rails may be damaged permanently.
a) Feed the external power cable of the replacement PDU into the rack as you slide the replacement PDU into the
slides of the rack.
iv. In the Java control panel, select Edit Site List and add an exception for the following website:
http://192.168.123.123/
Figure 43: Windows 7 Java Control Panel: Exception Site List
v. Click Add to add the IP Address listed above, and then click OK.
vi. Click Continue to add the exception.
Figure 44: Windows 7 Java Control Panel: Security Warning
viii. Start your web browser (Chrome or Firefox recommended), and type http://192.168.123.123/
in the address bar.
The PDU's login dialog appears, as in the image below.
Figure 46: PDU Login Dialog
Note: The first time you open this page in the browser, Java asks whether you want to
run it on this page. Click Run or OK when prompted.
ix. Log into the PDU with username admin and password admin.
b) Verify the status of each electrical socket.
i. On the main menu of the PDU interface, click Status to view the PDU status.
Figure 47: PDU Main Menu
Tip: To refresh the status, click Refresh. To close the panel, click Close.
ii. A Java window pops up with PDU startup sequence delay parameters.
Tip: The Reset Delay value specifies how long the electrical socket waits in the OFF
position before switching to the ON position. It may be useful to change this value when
you want to quickly reset all ports without pulling out the main power cable connected to
the rack.
iii. For each port listed in the Name columns in each of the three sections (Branch XY, Branch YZ, and
Branch ZX), enter the values for ON Delay and Reset Delay exactly as shown in the image below.
Caution: Refer to the correct image based on whether you replaced the upper PDU
(PDU 2) or the lower PDU (PDU 1).
iv. When you are finished, click Save, wait for 30 seconds, and then click Close.
FRU Replacement Guide 5 Storage Enclosure Basic Field Replaceable Units
This section provides replacement procedures for the following parts in a Storage Enclosure
Basic:
• Sled
• HDD
• Power Cord
• MiniSAS Cable
• Rear Fan
• PSU
• I/O Canister
• Sled Blank (to be replaced with a fully populated sled)
Topics:
• Visual Indicator and Field Replaceable Units Locations
• Sled Replacement Procedure
• Power Cord Replacement Procedure
• MiniSAS Cable Replacement Procedure
• Rear Fan Replacement Procedure
• Power Supply Unit Replacement Procedure
• I/O Canister Replacement Procedure
• Storage Enclosure Basic Capacity Upgrades
The following diagram displays the visual indicators for the I/O canister, sled, and the rear fans in the Storage Enclosure
Basic:
Figure 53: System Enclosure Information
The following diagrams display the physical locations of the various FRUs and visual indicators in the Storage
Enclosure Basic:
Note: Ensure that you store all removed parts in a safe location while replacing the FRU.
Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 56: A Storage Node Pane in the CMC
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
2. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.
Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.
4. Unlock the I/O module from the enclosure by pulling the latch handle out and away from the I/O module.
Figure 60: Unlocking the I/O Module
6. On the front of the sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 62: Sled Release Button
Note:
• Ensure that you remove and replace the sleds and sled blanks in the same order.
• Repeat the two previous steps until all of the sleds that need replacement have been removed
from the chassis.
8. On the sled, slide the drive cover forward and up until the cover has been removed.
Figure 65: Sled Cover
9. To remove the hard disk drives from the failed sled, on the hard drive carrier, depress the two buttons and remove
the drive.
Figure 66: Hard Disk Drive Carrier Buttons
10. To install the hard disk drives into the replacement sled, on the hard drive carrier, depress the two buttons and insert
the drive into the first sled slot.
11. Repeat the previous step until all drive slots within the sled are populated.
12. Install the replacement sled in the reverse order that you removed it.
13. Install the remaining enclosure components in the reverse order that you removed them.
14. Power on the Storage Node.
The power button is located on the chassis front control panel.
Warning: For a single geo system, this procedure involves taking the paired Storage Node offline, which
can result in data unavailability for both the large file and the small file policy. Therefore, you must put
the MetaStores into read-only mode at the start of this procedure, and then reactivate them at the end of
this procedure.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
Figure 68: Decommissioned Disks in the CMC
c) Open an SSH session to the Storage Node that is paired with the Storage Enclosure Basic containing the
decommissioned drive.
d) Start the Q-Shell.
/opt/qbase3/qshell
Welcome to qshell
e) Identify the drive slot number based on the serial number in the decommissioned disk details you obtained in
Step 1.
In [1]: api = i.config.cloudApiConnection.find('main')
In [2]: mguid = api.machine.find(name='hostname_of_storage_node')['result'][0]
In [3]: print(api.disk.list(machineguid=mguid, serial_number='device_serial_number')['result'][0]['bus_location'])
EXP_SLOT_69
Important: Subtract 1 from the drive slot number, because the index starts from 0.
For example, if the drive slot number is 69, you must use 68 in the command below.
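The off-by-one conversion is a common source of mistakes, so it can help to compute the index from the bus_location string directly. This is a minimal sketch in standalone Python 3, assuming the bus location always has the form EXP_SLOT_n as in the sample output; the function name `slot_index` is hypothetical.

```python
import re


def slot_index(bus_location):
    """Convert a bus location such as 'EXP_SLOT_69' to a zero-based index."""
    match = re.fullmatch(r"EXP_SLOT_(\d+)", bus_location)
    if match is None:
        raise ValueError("unexpected bus location: %r" % bus_location)
    slot = int(match.group(1))
    if slot < 1:
        raise ValueError("slot numbers start at 1, got %d" % slot)
    # The command index starts from 0, so subtract 1.
    return slot - 1


print(slot_index("EXP_SLOT_69"))
```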
Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
7. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
8. Locate the enclosure that contains the failed hard disk drive.
Note: The enclosure containing the failed drive will have a flashing blue identification LED.
9. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.
Note:
• The power cords are marked in red.
• The cord retention bale is marked in blue.
Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.
11. Unlock the I/O canister from the enclosure by pulling the latch handle out and away from the I/O canister.
Figure 74: Unlocking the I/O Canister
13. Locate the failed sled: its Sled Fail/Identify indicator is blinking amber.
Figure 76: Sled HDD Order
14. On the front of the first sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 77: Sled Release Button
15. Pull the sled that needs a new hard disk drive out of the chassis.
Figure 79: Removing the Sled
Note:
• Store all removed parts in a safe location.
• Ensure that you remove and replace the sleds and sled blanks in the same order.
• Repeat the two previous steps until all of the disks in need of replacement have been removed
from the sled.
16. On the sled, slide the drive cover forward and up until the cover has been removed.
Figure 80: Sled Cover
17. Identify the drive to be replaced by referring to the drive map you obtained from the CMC and the amber LED.
Tip: The correct drive is the one whose blue arrow is pointing at the illuminated amber LED.
18. To remove the hard disk drive, on the hard drive carrier, depress the two buttons and remove the drive.
Figure 81: Hard Disk Drive Carrier Buttons
19. Install the replacement hard disk drive with carrier in the reverse order that you removed it.
20. Install the remaining enclosure components in the reverse order that you removed them.
21. Re-connect the enclosure to the power cords.
22. On the PSUs, verify that the AC and DC LEDs are green.
23. Power on the Storage Node.
The power button is located on the chassis front control panel.
24. Disable the identification LED on the Storage Enclosure Basic.
a) Open an SSH session to any Controller Node.
The OSMI menu appears.
b) Exit the OSMI menu.
The Linux prompt appears.
c) Open an SSH session to the Storage Node that is paired with this Storage Enclosure Basic.
d) Disable ("clear") the drive LED using the drive slot number you obtained in Step 4e.
Important: Subtract 1 from the drive slot number, because the index starts from 0.
For example, if the drive slot number is 69, you must use 68 in the command below.
Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.
The Initializing new disk job has completed successfully when the number of degraded disks decreases
by 1.
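The zero-based conversion described in step 24d can be sketched as a small helper. This is a minimal illustration only: the function name slot_to_index is hypothetical, it is not part of any HGST tool, and it is not the LED command itself.

```python
def slot_to_index(slot_number: int) -> int:
    """Convert a 1-based drive slot number to the 0-based index used on the command line.

    Hypothetical helper for illustration; it is not part of the HGST tooling.
    """
    if slot_number < 1:
        raise ValueError("drive slot numbers start at 1")
    return slot_number - 1


# The example from the procedure: drive slot 69 becomes index 68.
print(slot_to_index(69))  # 68
```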
Postrequisites:
If you replaced the wrong disk, see Troubleshooting.
On single geo systems only, put the MetaStores back into read/write mode. For instructions, see Marking a MetaStore
as Read/Write on page 131.
Note: The power cord is marked in red and the power cord retention bale is marked in blue.
2. Do the following to remove the failed power cord from the rack:
a) Disconnect the failed power cord from the server.
b) From the Enclosure end, pull the power cord through the rail kit cable guides.
c) Pull the power cord up through the side of the rack rail.
d) Pull the power cord through the top of the rack.
3. Do the following to install the new power cord into the rack:
Note: Ensure the new power cord is installed in the same location as the failed power cord.
a) Run the new power cord through the top of the rack.
b) Pull the power cord down through the side of the rack rail.
c) From the Enclosure end, pull the power cord through the rail kit cable guides.
d) Connect the power cable to the server.
4. Pull the power cord through I/O module cable guides.
5. Plug the power cord into the power supply unit.
6. To secure the power cord, press the power cord retention bale into the I/O module.
Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
3. From the I/O module of the Storage Enclosure Basic, unplug the failed miniSAS cable.
a) Disengage the latch on the faulty miniSAS cable at its Storage Enclosure Basic end by pulling very gently on its
pull tab.
Figure 85: Removing the MiniSAS Cables
Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.
b) Once the latch is disengaged and the cable is loose, grasp its metal connector or cord (not its pull tab) to pull it out
of its port.
4. Do the following to remove the failed miniSAS cable from the rack:
a) From the enclosure end, pull the miniSAS cable through the rail kit cable guides.
b) Pull the miniSAS cable up through the side of the rack rail.
5. Do the following to install the new miniSAS cable into the rack:
Tip: If you are replacing the 6M cable, install it over the top of the rack for ease of replacement.
Note: Ensure the new miniSAS cable is installed in the same location as the failed miniSAS cable.
a) Connect the new miniSAS cable into the same Storage Node port.
The cable is reseated properly (the latch is engaged) when you hear a click.
b) From the Storage Node, run the new miniSAS cable through the cable guides.
c) Pull the miniSAS cable down through the side of the rack rail.
d) From the enclosure end, pull the miniSAS cable through the rail kit cable guides.
e) Plug the miniSAS cable into the I/O module.
6. Verify that the amber LED on the new miniSAS connector is off.
Note:
• Ensure that you store all removed parts in a safe location while replacing the FRU.
• The rear fans are hot-swappable. The enclosure does not need to be powered down in order to replace
them.
1. From the rear of the chassis, remove the failed rear fan by depressing the release button on the top right of the fan.
Figure 86: Fan Release Button
2. Rotate the top of the fan away from the chassis until the fan pins clear the connectors on the chassis.
Note: Repeat the previous step until all of the fans in need of replacement have been removed.
3. Remove the fan from the fan rubber bumpers on the chassis.
4. Install the replacement fan in the reverse order that you removed it.
Note: Ensure that you store all removed parts in a safe location while replacing the FRU.
Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 88: A Storage Node Pane in the CMC
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the Status
field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
2. Lift the cord retention bale and unplug the power cord from the failed power supply unit.
Figure 90: Removing the Power Cord
Note:
• Cord retention bale marked in blue.
• If you are removing power supply A, you do not need to remove the miniSAS cables. To remove
power supply B, it is recommended that you remove the miniSAS cables for ease of replacement.
To remove the miniSAS cables, pull the blue tab and remove the cable from the port. Repeat for
both miniSAS cables as necessary.
3. Unlock the failed power supply unit by pulling the latch handle out and away from the I/O canister.
Note:
• The power supply unit latch handle should be at 45° when removed.
• Repeat this step for the remaining power supply unit if necessary.
4. Pull the power supply unit out until it is free of the I/O canister.
5. Install the replacement power supply unit.
6. Reconnect the miniSAS cables.
7. Plug the power cord back into the replaced power supply.
8. Power on the Storage Node.
The power button is located on the chassis front control panel.
Note:
• Ensure that you store all removed parts in a safe location while replacing the FRU.
• Ensure you are wearing an ESD wrist strap to complete the replacement of the I/O canister.
Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 92: A Storage Node Pane in the CMC
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
2. Identify the Storage Enclosure Basic that contains the failed I/O canister.
Note: To identify the failed I/O Canister, verify that the amber light is blinking.
3. Remove the miniSAS cables by pulling the blue tab on each cable and removing the cable from its port.
4. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.
Figure 94: Removing the MiniSAS Cables
Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.
Note:
• The power cords are marked in red.
• The cord retention bale is marked in blue.
5. Wait approximately 30 seconds after the I/O canister is unplugged to continue with the replacement procedure.
6. With your palms facing up, place your index and middle fingers into the sides of the latch handle.
Figure 96: Latch Handle Identification
Note:
• Latch handle marked in red.
• Rack ears marked in yellow.
7. With your thumbs on the rack ears, pull the latch handle sides and push on the rack ear release.
8. Pull the latch handle until clear of the rack ear latch.
Figure 97: Latch Handle Clear of Rack Ear
Note:
• Latch handle marked in red.
• Rack ears marked in yellow.
9. Completely remove the miniSAS cables and power cords from the I/O canister.
10. With your palms facing up, reposition your hands so that your thumbs are on the outside and your fingers cradle the
bottom and rear of either side of the I/O canister.
11. Slowly pull the I/O canister away from the chassis.
Warning: The I/O canister is very back-heavy. Ensure that you are fully supporting the component
during the removal.
12. Install the replacement I/O canister in the reverse order that you removed it.
Note: Ensure that the I/O canister is centered properly, then press firmly so that the replacement
I/O canister latches.
Table 10: Utilization of Old Disks When New Disks Are Added
Number of Factory Installed     Number of Added Sled Columns
Sled Columns                    1       2       3       4       5       6
1                               100%    100%    100%    100%    100%    100%
2                               50%     100%    100%    100%    100%    100%*
3                               30%     60%     100%    100%*   100%*   100%*
4                               20%     50%     70%     100%*   100%*   100%*
5                               20%     40%     60%*    80%*    100%*   100%*
6                               10%     30%*    50%*    60%*    80%*    100%*
* Requires installation of an additional rack
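Table 10 can also be expressed as a lookup. The following is a minimal sketch, assuming Python; the names UTILIZATION and old_disk_utilization are hypothetical, and the values are copied directly from the table.

```python
# Table 10 as data. UTILIZATION[factory_columns][added_columns - 1] holds a
# (percent, needs_additional_rack) pair; True corresponds to the * footnote.
UTILIZATION = {
    1: [(100, False), (100, False), (100, False), (100, False), (100, False), (100, False)],
    2: [(50, False), (100, False), (100, False), (100, False), (100, False), (100, True)],
    3: [(30, False), (60, False), (100, False), (100, True), (100, True), (100, True)],
    4: [(20, False), (50, False), (70, False), (100, True), (100, True), (100, True)],
    5: [(20, False), (40, False), (60, True), (80, True), (100, True), (100, True)],
    6: [(10, False), (30, True), (50, True), (60, True), (80, True), (100, True)],
}


def old_disk_utilization(factory_columns: int, added_columns: int):
    """Return (percent, needs_additional_rack) for one cell of Table 10."""
    if factory_columns not in UTILIZATION or not 1 <= added_columns <= 6:
        raise ValueError("Table 10 covers 1 to 6 factory and 1 to 6 added sled columns")
    return UTILIZATION[factory_columns][added_columns - 1]


# Example: 2 factory-installed sled columns with 1 added sled column.
print(old_disk_utilization(2, 1))  # (50, False)
```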
5.8.2 Prerequisites
Before upgrading the capacity of the system, determine its current state and health. The system must be
in nominal condition, running the latest software and firmware, and contain no degraded disks. If the system is not in
nominal condition, you must repair it first.
a) Navigate to Dashboard > Administration > HGST Active Archive System Management > Locations >
Datacenter Management.
b) Select the data center which contains the physical rack to be upgraded.
c) In the Racks/Groups portion of the Datacenter Information pane, verify that the name and serial number
of the rack to be upgraded have the same value. If not, follow the steps in Setting Serial Number to Rack Name
on page 134.
• Write down the post-upgrade hardware abstraction layer (HAL) type from Supported Hardware Abstraction Layers
(HALs) on page 117.
• Obtain the correct replacement sleds from HGST Support.
• Bring several new, blank disks and sleds to the data center.
Warning: Bring several new blank disks and sleds to the data center to protect against a potential
failure while initializing the new disks.
Table 13: Work Table for Storage Enclosure Basic Capacity Upgrades
Item Value
Rack serial number
Original HAL type
New HAL type
5.8.3 Overview
This is only an overview of the procedure. Do not interpret this overview as the actual procedure.
1. Run the capacity upgrade tool in prepare mode to verify optimal disk safety, check for degraded and
decommissioned drives, and on single geo systems, to make the MetaStore(s) read-only.
2. For each Storage Node:
a) Power down the Storage Node.
b) Replace the sled blank(s) with the purchased populated sled(s).
c) Power up the Storage Node.
3. Run the capacity upgrade tool in upgrade mode to reassign the master of all namespaces (in other words, to balance
the namespaces on all storage daemons).
4. Run the capacity upgrade tool in finalize mode to add new storage daemons, initialize the blockstores, add new
maintenance agents, and, on single geo systems, to set the system back to read/write mode.
Important: This procedure involves putting single geo systems into read-only mode.
This procedure requires you to power down each Storage Node sequentially, which would result in data
unavailability or a lowered disk safety if data were still being ingested at this time. When you power
down the first Storage Node, the system would write a new object with a safety of 2. When you power
up the first Storage Node and power down the second Storage Node, 3 checkblocks of the object would
become unavailable. This would result in a disk safety of -1, and as such, an unavailable object. The
capacity upgrade tool, hgst_capacity_on_demand.py, prevents this scenario by putting the
system into read-only mode.
Read-only mode does not prevent the system from executing repair tasks. Because all the storage
daemons on a Storage Node have the same location, there is no risk of writing unbalanced spreads
for repair activity happening during the capacity upgrade procedure. Also, since repairs never write an
object with lower safety, if a repair must replace a checkblock in a Storage Node that is currently down,
the repair is redone the next day.
Powering down one Storage Node may reduce performance; however, this is no different from a normal
upgrade or maintenance operation on a Storage Node.
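The disk safety arithmetic in the note above can be worked through as follows. This is a minimal sketch, assuming that an object is available whenever its remaining disk safety is zero or higher; the function names are hypothetical and are not part of the Active Archive software.

```python
def remaining_disk_safety(written_safety: int, unavailable_checkblocks: int) -> int:
    """Disk safety left after a number of checkblocks become unavailable."""
    return written_safety - unavailable_checkblocks


def is_object_available(safety: int) -> bool:
    """Assumption for illustration: an object is available while its safety is not negative."""
    return safety >= 0


# Scenario from the note: an object written with a safety of 2 loses
# 3 checkblocks when the second Storage Node is powered down.
safety = remaining_disk_safety(written_safety=2, unavailable_checkblocks=3)
print(safety, is_object_available(safety))  # -1 False
```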
Warning:
• This procedure is only to be done on one rack at a time.
• Replace sleds from left to right and in sequence.
• Each Storage Enclosure Basic in any given rack must have the same number of populated sleds.
• Ensure that you store all removed parts in a safe location while replacing the sled.
4. Run hgst_capacity_on_demand.py in prepare mode to verify optimal disk safety, check for degraded and
decommissioned drives, and, for single geo systems, to make the MetaStore(s) read-only:
At the Linux prompt on the Management Node, run hgst_capacity_on_demand.py with the --prepare
option and the rack serial number you obtained in Prerequisites on page 118:
Run this with super user access.
Wait for this command to complete (usually less than 5 minutes except when there are a large number of buckets
with suboptimal disk safety).
Sample output on a single geo system:
root@HGST-MINI-S3-DC01-R01-CN08:~# /opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --prepare --rackserial MINIALPS1
2016-03-18 13:35:25,352 INFO Starting capacity on demand tool
2016-03-18 13:35:25,988 INFO Assessing enclosure layout
2016-03-18 13:35:28,113 INFO Starting upgrade preparation
2016-03-18 13:35:28,122 INFO Checking for degraded disks
2016-03-18 13:35:28,140 INFO Checking for decommissioned disks on the storage nodes to upgrade
2016-03-18 13:35:28,549 INFO Checking for namespaces with low disk safeties. This can take quite some time on environments with a large number of namespaces
2016-03-18 13:35:28,669 INFO Switching metadatastores to read-only.
2016-03-18 13:35:29,918 INFO Writing prepare tag
Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). For more help with the tool, include the --help flag on the command line.
Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.
Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.
Problem: The tool output is one of the following:
• System node name: Failed to collect device info. (The Storage Node fails to return the device list to the capacity upgrade tool.)
• System node name: Failed to collect storage enclosure info. (The Storage Node fails to return information on the Storage Enclosure Basic.)
• System node name: Failed to collect mount info. (The Storage Node fails to return the list of mounted devices.)
Action: This indicates a server communication issue. Either the Storage Enclosure Basic is failing or the communication path has issues. Hardware troubleshooting is required.
Problem: The tool output is System node name: Storage Enclosure Firmware requires upgrade!. (The Storage Enclosure Basic firmware does not support capacity expansion.)
Action: Upgrade the firmware on the Storage Enclosure Basic I/O module to version 0115 or greater.
Problem: The tool output is System node name: n disks are not mounted!. (The Storage Node fails to remount all the disks.)
Action: This error indicates that disks are missing. Hardware troubleshooting of the Storage Node is required to repair the issues.
Problem: The tool output is System node name: Storage Enclosure sleds not registered. Service required!. (The Storage Enclosure misregistered sleds.)
Action: This error indicates that the sled upgrade was done improperly. The sleds were mis-installed.
Problem: There are decommissioned disks on existing sleds.
Action: Replace the disks, and start over from step 1. For instructions on replacing disks, see Hard Disk Drive Replacement Procedure on page 90.
Problem: There are degraded disks on existing sleds.
Action: Assess the drive and either reset or decommission it, and allow repairs to complete. Then start over from step 1. For disk troubleshooting workflows, see Managing Hardware in the HGST Active Archive System Administration Guide.
Problem: There are buckets with suboptimal disk safeties.
Action: Invoke repairs on the impacted buckets by following the instructions in Invoking the Repair Process on page 131. Allow repairs to complete before rerunning. Depending upon the size of the environment, this could take several hours. Then start over from step 1.
Problem: You want to confirm that prepare mode completed successfully.
Action: Look at the output from the tool. You should see a line similar to:
2016-03-18 13:35:29,934 INFO Completed preparation
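The check for the Completed preparation line can be automated if you capture the tool output to a string. This is a minimal sketch, assuming Python; the function name prepare_completed is hypothetical, and the marker text is taken from the sample line above.

```python
def prepare_completed(tool_output: str) -> bool:
    """Return True if the captured output contains the completion line.

    Whitespace is normalized first so that a wrapped line still matches.
    """
    normalized = " ".join(tool_output.split())
    return "INFO Completed preparation" in normalized


sample = (
    "2016-03-18 13:35:29,918 INFO Writing prepare tag\n"
    "2016-03-18 13:35:29,934 INFO Completed\npreparation\n"
)
print(prepare_completed(sample))  # True
```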
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 99: A Storage Node Pane in the CMC
Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.
All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
6. From the Storage Enclosure Basic, unplug the power cables by lifting the power cord retention bale and carefully
removing the power cord from the power supply.
Note:
8. Unlock the I/O module from the enclosure by pulling the latch handle out and away from the I/O module.
Figure 103: I/O Canister Handle
10. On the front of the sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 106: Sled Release Button
11. Remove the sled blank from the slot farthest to the left.
Figure 107: Removing the Sled
Note: Ensure that you remove and replace the sleds and sled blanks in the same order.
12. From the front of the enclosure, slide the populated sled until fully seated.
Note: The sleds must be installed from slot A (far left of the enclosure) to slot G (far right of the
enclosure).
13. Install the remaining enclosure components in the reverse order that you removed them.
14. Power on the Storage Node.
The power button is located on the chassis front control panel.
15. Wait 10 minutes or less for the Machine was rebooted event in the CMC.
This event indicates that the Storage Node disks and services are back online. For example:
Event Type: OBS-PMACHINE-0106
Event Message: Machine was rebooted
Severity: INFO
Source: HGST-S3-DC01-R01-SN02 (90:E2:BA:7C:45:B1)
Occurrences: 2
First occurrence: 2016-03-21 10:39:16
Last occurrence: 2016-03-21 11:26:36
Details:
Tags: keep_live:0
machineguid:a50d1a4b-d8ea-4dc9-89a8-12276626bf2b
agentguid:06b311fb-a5e6-48c2-bc32-df24288b9f23
typeid:OBS-PMACHINE-0106
machinename:HGST-S3-DC01-R01-SN02
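If you copy the event text out of the CMC, the fields shown in the example can be checked programmatically. This is a minimal sketch, assuming Python and assuming the event is available as plain "key: value" text in the layout shown above; parse_event and is_reboot_event are hypothetical helpers, not CMC functions.

```python
def parse_event(text: str) -> dict:
    """Split each 'key: value' line of a copied event into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_reboot_event(fields: dict, machine_name: str) -> bool:
    """True if the event is OBS-PMACHINE-0106 and its source is the given machine."""
    source_name = fields.get("Source", "").partition(" ")[0]
    return fields.get("Event Type") == "OBS-PMACHINE-0106" and source_name == machine_name


event_text = (
    "Event Type: OBS-PMACHINE-0106\n"
    "Event Message: Machine was rebooted\n"
    "Source: HGST-S3-DC01-R01-SN02 (90:E2:BA:7C:45:B1)\n"
)
print(is_reboot_event(parse_event(event_text), "HGST-S3-DC01-R01-SN02"))  # True
```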
16. Repeat steps 4-16 for other Storage Nodes in the rack.
17. Run hgst_capacity_on_demand.py in upgrade mode to initialize the new disks as blockstores and add
additional storage daemons and maintenance agents:
Wait for this command to complete (30 minutes per sled column).
d) Verify that the CMC dashboard displays the upgraded capacity.
Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). Check the log file in /opt/qbase3/var/log/. For more help with the tool, include the --help flag on the command line.
Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.
Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.
Problem: The HAL type does not match what you specified on the command line.
Action: This indicates a problem with the physical replacement. Verify that the HAL type you specified is correct and that all drives are present in the correct nodes.
Problem: There is a disk initialization failure.
Action: Triage like a normal bad disk. Reset the HAL type, clear the new disks, and remove them from the environment. Replace the bad drive and rerun the upgrade capacity workflow.
• If the failure was a drive failure, roll back the upgrade by following the steps in Recovering From a Failed Disk Initialization on page 132, then retry the upgrade.
• If any other problem occurs and/or you do not have enough spare parts, you must roll back every Storage Node, put back the old sled blanks, and run the capacity upgrade tool in finalize mode.
Problem: There is an error with the creation of storage daemons, maintenance agents, or blockstores.
Action: The appropriate action ultimately depends upon the error.
Problem: There is an error with updating the configuration of the maintenance agents.
Action: Ensure all nodes are still online and reachable. Bring nodes online and rerun the upgrade capacity workflow.
Problem: The monitoring agent fails to restart or did not restart in time.
Action: Check the monitoring agent log for the node in the error message and assess if the agent was hung or having some other problem.
Problem: Verification of the Storage Node monitoring database failed.
Action: Run an Aggregate Storage Pool Info policy manually through the OSMI to pool monitoring data and then rerun the workflow.
Problem: Reassignment of storage daemons failed.
Action: Ensure that all storage daemons are running and reachable. Then rerun the workflow.
Problem: A Storage Node is not a recognized HAL type.
Action: You may have replaced the sled blank(s) incorrectly or not in left-to-right order. Start over again from step 3. Make sure that you specified the HAL type to which you are upgrading, rather than the existing HAL type, on the command line.
Problem: A Storage Node does not have the correct number of unmanaged disks.
Action: If any replacement sled contains disks that were previously used by EasiScale™ or are not empty, replace these disks with new, empty ones. Check to make sure no disk or sled is incorrectly inserted. When you think the problem is fixed, rerun the capacity upgrade tool in upgrade mode. In other words, go back to step 10.
18. Run hgst_capacity_on_demand.py in finalize mode to verify that there are no unmanaged disks (system-
wide), and, for single geo systems, to make the MetaStore(s) read/write again:
a) Open an SSH session to the Management Node.
b) Type 0 to exit the OSMI menu.
c) At the Linux prompt, run hgst_capacity_on_demand.py with the --finalize option and the rack
serial number you wrote in the Work Table on page 119:
Run this with super user access.
/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --rackserial serial_number --finalize
Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). For more help with the tool, include the --help flag on the command line.
Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.
Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.
Problem: There are remaining unmanaged disks.
Action: If the upgrade was executed, this should never happen. Compare the unmanaged disks section of the CMC with the log file to see if the disk was processed.
19. Start log rotation on the Management Node.
What To Do Next
Run the system health check again. If there are problems, submit logs to HGST Support.
If the CMC Does Not Show Updated Capacity Statistics
Run the Aggregate Storage Pool Info policy an additional time through the OSMI (1 > 3 > 1) to synchronize the new
available capacity statistics.
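The prepare and finalize invocations shown in this procedure follow the same pattern, so they can be assembled by a small wrapper. This is a minimal sketch, assuming Python; build_command is a hypothetical helper, and it only uses the path and the flags that appear in this guide (--prepare, --finalize, --rackserial, --debug). The flag for upgrade mode is not shown in this section, so it is deliberately left out.

```python
TOOL = "/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py"

# Only the modes whose flags appear in this guide are listed here.
MODE_FLAGS = {"prepare": "--prepare", "finalize": "--finalize"}


def build_command(mode: str, rack_serial: str, debug: bool = False) -> list:
    """Build the argument list for one run of the capacity upgrade tool."""
    if mode not in MODE_FLAGS:
        raise ValueError("unsupported mode: %s" % mode)
    if not rack_serial:
        raise ValueError("a rack serial number is required")
    command = [TOOL, "--rackserial", rack_serial, MODE_FLAGS[mode]]
    if debug:
        command.append("--debug")
    return command


# MINIALPS1 is the rack serial number used in the sample output.
print(" ".join(build_command("finalize", "MINIALPS1")))
```

The resulting list can be passed to subprocess.run on the Management Node with super user access, as the procedure requires.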
FRU Replacement Guide A Troubleshooting
Appendix A Troubleshooting
This chapter provides troubleshooting tips.
Topics:
• General
• Marking a MetaStore as Read-Only
• Marking a MetaStore as Read/Write
• Invoking the Repair Process
• Recovering From a Failed Disk Initialization
• Rolling Back a Capacity Upgrade
• Setting Serial Number to Rack Name
A.1 General
Problem: The PostgreSQL partition has failed, or a NIC has failed on the Management Node.
Recommended Action: Fail over the CMC.
Warning: When you are upgrading your setup, do not execute a failover. First complete the upgrade before you start the failover.
To execute a failover, follow the instructions in Managing Hardware in the HGST Active Archive System Administration Guide.
Problem: The wrong disk was replaced.
Recommended Action: If you accidentally replace the wrong disk, it shows up in the CMC as an unmanaged disk. An unmanaged disk is a newly installed disk that the Active Archive System cannot determine a purpose for (in other words, whether it is a replacement disk or really a new disk).
Problem: You shut down a node in order to replace it or something in it, but
Recommended Action: Connect a monitor to the node's VGA port, and a keyboard to its USB port. Restart the node. Observe any error messages that it outputs.
Note: This procedure does not restart maintenance agents. Restarting the maintenance agents would
make all agents start working on repairs immediately, but it may have the undesired side
effect of interrupting an existing repair, which prolongs the time it takes for the disk safety to return to an
optimal level.
1. From the Q-Shell on the Management Node, trigger a repair crawl on all storage daemons:
for sd in q.dss.manage.listStorageDaemons(count=1024).keys():
    # The 'address' field has the form ip:port
    ipaddr = q.dss.manage.showStorageDaemon(sd)['address'].split(':')[0]
    port = int(q.dss.manage.showStorageDaemon(sd)['address'].split(':')[1])
    q.dss.manage.repairStartCrawl(nodeIP=ipaddr, port=port)
The maintenance agent logs on the storage nodes (/opt/qbase3/var/log/dss/maintenanceagents/<id>.log) will indicate
that the daemon is executing repairs. If the maintenance agents do not appear to be actively executing, a monitor
crawl can be run against all storage daemons to pull in the latest data:
for sd in q.dss.manage.listStorageDaemons(count=1024).keys():
    q.dss.manage.monitorMaster(sd, realtime=True)
2. Watch the monitorStoragePool output to see when the objects have returned to a normal disk safety:
q.dss.manage.monitorStoragePool()
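The repair crawl loop in step 1 splits each storage daemon address into an IP address and a port. That step can be sketched on its own, outside the Q-Shell. This is a minimal illustration, assuming Python and an address of the form ip:port; split_address is a hypothetical helper and the sample address is made up.

```python
def split_address(address: str):
    """Split an 'ip:port' string into (ip, port), with the port as an integer."""
    ipaddr, sep, port = address.partition(":")
    if not sep or not ipaddr or not port.isdigit():
        raise ValueError("expected an address of the form ip:port, got %r" % address)
    return ipaddr, int(port)


# Hypothetical address, for illustration only.
print(split_address("10.0.0.5:23510"))  # ('10.0.0.5', 23510)
```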
HDD='device_path'
SLOT="$(sg_ses `lsscsi -g | grep -E 'PIKES|STOR ENCL JBOD' | awk {'print $NF'}` -jj | grep `lsscsi -tg | grep -w $HDD | awk {'print $3'} | cut -b5-` -B18 | grep SLOT | awk '{print $1,$2,$3,$4}')"
echo $HDD "is in " $SLOT
api = i.config.cloudApiConnection.find('main')
mname = 'machine_hostname'
original_hal_type = 'old_HAL_type'
mguid = api.machine.find(name=mname)['result'][0]
For example,
api = i.config.cloudApiConnection.find('main')
mname = 'HGST-S3-DC01-R01-SN05'
original_hal_type = 'HGST_B14_SN'
mguid = api.machine.find(name=mname)['result'][0]
4. Power down the node physically and replace the problematic disk / sled.
5. Rerun the capacity upgrade tool.
2. Run hgst_capacity_on_demand.py in finalize mode to verify that there are no unmanaged disks (system-
wide), and to make the MetaStore(s) read/write again (for single geo systems only):
a) Open an SSH session to the Management Node.
b) Type 0 to exit the OSMI menu.
c) At the Linux prompt, run hgst_capacity_on_demand.py with the --finalize option and the rack
serial number you wrote in the Work Table on page 119:
Run this with super user access.
/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --rackserial serial_number --finalize
Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). For more help with the tool, include the --help flag on the command line.
Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.
Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.
Problem: There are remaining unmanaged disks.
Action: If the upgrade was executed, this should never happen. Compare the unmanaged disks section of the CMC with the log file to see if the disk was processed.
3. Start log rotation on the Management Node.
At the Linux prompt on the Management Node, type:
mv ~/apache /etc/logrotate.d/
FRU Replacement Guide Active Archive System Glossary
H
Hand Tools Any tool that is readily available in the market place and
operated by hand.
HDD Hard Disk Drive
S
SEP SCSI Enclosure Processor
A group of SAS expanders which are located in the same
JBOD/Server enclosure. A SEP operates as a single
customer visible functional unit to provide enclosure
services functionality.
V
VPD Vital Product Data
Field replaceable unit part number, serial number, and so
on, stored in an I2C EEPROM.
FRU Replacement Guide Index
Index
A
acronyms 135
C
capacity upgrade rollback 133
capacity upgrade tool 116, 117, 118, 119, 119
capacity utilization 116
checkblock 116, 117
conventions 3, 3, 3
copyright 2
D
daemon
  storage 119, 131, 132, 133
data unavailability 116, 117
decommissioned disk details 30, 34, 56, 90
decommissioned disks 118, 119, 119
degraded disks 118, 119, 119
disk
  unmanaged 130
  wrong 130
disk safety 90, 116, 117, 118, 119, 119, 131, 132
drive map 30, 34, 56, 90
F
failover 130
field replaceable unit (FRU) 130
flexible capacity 116, 117
H
HAL type 132, 133
hardware abstraction layer (HAL) type 118, 119, 119
hardware abstraction layer (HAL) type (model type) 117
hgst_capacity_on_demand.py 118, 119, 119
hostname 13
hot swappable 30, 34, 56
Hugo 119, 119
I
IOPs 117
item numbers 117
L
location LED 90
M
machine name 13
maintenance agent 119, 131, 132, 133
MetaStore
  read-only 131, 131
  system
    env_metastore 13
    framework 13
miniSAS cable 60, 104
model type 117
monitoring agent 131, 131
N
NIC 130
node
  controller
    chassis 13
    FRU list 13
    HDD 30
    PSU 38
    SSD 34
  storage
    chassis 40
    FRU list 40
    HDD 56
    PSU 59
normal file policy 90
P
part number (P/N) table 118, 119, 119
part numbers 117
PDU
  FRU list 70
populated sled 118, 119, 119
PostgreSQL 34, 130
R
rack
  serial number
    setting 134
    viewing 118, 119, 119
read-only mode 90, 118, 119, 119
related documents 4
repair 118, 119, 119, 131, 132, 133
repair time 117
S
SFP+ 1G module 68
SFP+ DAC cable 39, 69
single geo 118, 119, 119
sled
  blank 118, 119, 119
  populated 118, 119, 119
  replacement procedure 82
sled blank 116, 118, 119, 119
sled column 116, 116, 117
small file policy 90, 118
smartctl 119, 119
storage enclosure basic
  capacity upgrades 116
  FRU list 79
  HDD replacement procedure 90
  I/O canister replacement procedure 110
  miniSAS cable replacement procedure 104
  power cord replacement procedure 103
  PSU replacement procedure 107
  rear fan replacement procedure 106
  sled population procedure 118, 119, 119
  visual indicator FRU location 79
storage interconnect
  fan 66
  FRU list 61
  PSU 67
  whole 61
storage policy 118
T
three geo 119, 119
troubleshooting 130
typography 3, 3
V
virtual safety 116, 117
W
warnings 13, 40, 61, 70
weight 5