XtremIO - V 4 0+4 0 1+4 0 2+4 0 4 - FRU - 302 002 044 - Rev 11
EMC
XtremIO Storage Array
Version 4.0, 4.0.1, 4.0.2 and 4.0.4
Copyright © 2021 EMC Corporation. All rights reserved. Published in the USA.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.
The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect
to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular
purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
XtremIO, EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other
countries. All other trademarks used herein are the property of their respective owners.
For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).
CONTENTS
Preface
Appendix D Priority FA
PREFACE
As part of an effort to improve its product lines, EMC periodically releases revisions of its
software and hardware. Therefore, some functions described in this document might not
be supported by all versions of the software or hardware currently in use. The product
release notes provide the most up-to-date information on product features.
Contact your EMC technical support professional if a product does not function properly or
does not function as described in this document.
Note: This document was accurate at publication time. Go to EMC Online Support
(https://support.emc.com) to ensure that you are using the latest version of this
document.
Purpose
This document provides the required information for replacing EMC XtremIO Storage Array
Field Replaceable Units (FRUs) that have been identified as unserviceable.
Audience
This document is intended for the EMC field support personnel.
Related documentation
The following EMC publications provide additional information:
XtremIO Storage Array Hardware Installation and Upgrade Guide
XtremIO Storage Array Software Installation and Upgrade Guide
XtremIO Storage Array User Guide
XtremIO Storage Array Release Notes
EMC CONFIDENTIAL
Typographical conventions
EMC uses the following type style conventions in this document:
Bold Use for names of interface elements, such as names of windows, dialog
boxes, buttons, fields, tab names, key names, and menu paths (what the
user specifically selects or clicks)
Italic Use for full titles of publications referenced in text
Monospace Use for:
• System output, such as an error message or script
• System code
• Pathnames, filenames, prompts, and syntax
• Commands and options
Monospace italic Use for variables.
Monospace bold Use for user input.
[] Square brackets enclose optional values
| Vertical bar indicates alternate selections — the bar means “or”
{} Braces enclose content that the user must specify, such as x or y or z
... Ellipses indicate nonessential information omitted from the example
Your comments
Your suggestions will help us continue to improve the accuracy, organization, and overall
quality of the user publications. Send your opinions of this document to:
techpubcomments@emc.com
CHAPTER 1
General Information
CHAPTER 2
Replacing Server Components
If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure
and contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).
Note: Before arriving on the site, make sure that you have the updated Storage Controller
rescue image for the cluster’s version. In addition, ensure that the latest version of
Technician Advisor utility is installed on your laptop.
Note: If the customer has a Disk Retention Agreement with Dell EMC, remove the hard
disks and SSDs from the replaced Storage Controller and give them to the customer. For
instructions, refer to “Removing the Old Storage Controller Disks” on page 114.
Tolerance
Failure of a single Storage Controller results in a performance degradation.
Failure of both Storage Controllers in the same X-Brick results in:
• Loss of service in a multiple X-Brick cluster
• Data loss in a single X-Brick cluster
Failure of both InfiniBand links and/or both SAS ports in the same Storage Controller
results in a Storage Controller failure.
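The tolerance rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and its return strings are assumptions made for this example, and the "no impact" result for zero failures is not stated in this document.

```python
def storage_controller_failure_impact(failed_in_brick: int, bricks_in_cluster: int) -> str:
    """Return the impact described in the Tolerance section.

    failed_in_brick   -- failed Storage Controllers in one X-Brick (0, 1 or 2)
    bricks_in_cluster -- number of X-Bricks in the cluster (1 or more)
    """
    if failed_in_brick not in (0, 1, 2):
        raise ValueError("failed_in_brick must be 0, 1 or 2")
    if bricks_in_cluster < 1:
        raise ValueError("bricks_in_cluster must be at least 1")
    if failed_in_brick == 0:
        return "no impact"  # assumption: not stated in the Tolerance section
    if failed_in_brick == 1:
        return "performance degradation"
    # Both Storage Controllers in the same X-Brick have failed.
    return "loss of service" if bricks_in_cluster > 1 else "data loss"


print(storage_controller_failure_impact(2, 1))
```

The last line evaluates the single X-Brick case in which both Storage Controllers have failed.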
Opening and Closing a Tunnel Between a Storage Controller and the XMS
A tunnel must be opened in order to access the XMS via a Storage Controller that is
healthy.
Note: Since the Storage Controller may not be able to open an SSH tunnel due to
security issues, the tunnel is opened from the XMS’s side.
2. Upon completion of the procedure (when access to XMS is no longer required), make
sure to close the tunnel.
To close the tunnel that was opened between the Storage Controller and the XMS:
Run the following CLI command:
modify-technician-port-tunnel cluster-id=<Cluster ID> sc-id=<Storage Controller ID> close
Note: Make sure to access the XMS via a Storage Controller that is healthy.
Note: It is recommended to keep the CLI window in a maximized mode. Minimizing the
window may cause the activation progress bar to be displayed on new lines instead of
the same line.
Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.
Note: The cluster-id parameter is not mandatory for single cluster configurations.
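As an illustration of the two notes above, the following Python sketch assembles XMCLI command strings, quoting the cluster name and omitting cluster-id when no name is given. The helper name build_xmcli_command and its parameters are assumptions made for this example and are not part of the XMS software. The helper only builds text; it does not run any command.

```python
def build_xmcli_command(command, cluster_name=None, flags=(), **params):
    """Build an XMCLI command line as a string (hypothetical helper).

    cluster_name -- cluster name used as the cluster identifier; omit it for
                    single cluster configurations
    flags        -- bare keywords appended at the end, for example "close"
    params       -- key=value pairs; underscores in keys become hyphens
    """
    parts = [command]
    if cluster_name is not None:
        parts.append(f'cluster-id="{cluster_name}"')
    for key, value in params.items():
        parts.append(f"{key.replace('_', '-')}={value}")
    parts.extend(flags)
    return " ".join(parts)


# The cluster name and Storage Controller ID below are placeholders.
print(build_xmcli_command("modify-technician-port-tunnel",
                          cluster_name="xbrick238", flags=("close",), sc_id=1))
```

Building the string in one place makes it easier to review the exact command before typing or pasting it into the XMS CLI.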
5. Use Table 1 to record the configuration data of the defective Storage Controller, and
refer to it when you configure the new Storage Controller.
X-Brick Name
Cluster Name
cluster-id
Note: Make sure to close the tunnel between the Storage Controller and XMS when access
to XMS is no longer required, as described in “Opening and Closing a Tunnel Between a
Storage Controller and the XMS” on page 13.
Network ports are assigned for each Storage Controller, two at a time. For a (partial)
example, refer to Table 2 to determine the range of ports assigned for each Storage
Controller.
Table 2 Required Network Ports for Storage Controller Replacement (Partial Example)
.... ....
Note: The network port 11112 is only required if Storage Controllers are using an IPv6
management IP address.
It is necessary to confirm that each required network port from the XMS is open to its
respective Storage Controller.
Note: For checking the required ports to a defective Storage Controller, use the existing
Storage Controller to verify whether the port is open. However, if the defective Storage
Controller is not responsive, work with the customer to check the required ports for the
peer Storage Controller instead.
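Where a Python interpreter is available on a host that has the same network path as the XMS, a port check of this kind can be sketched as follows. This is a minimal example and not an XtremIO tool: the function names are assumptions, and the port list must be taken from Table 2 for the Storage Controller in question.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def closed_ports(host, ports, timeout=3.0):
    """Return the subset of ports that could not be reached on host."""
    return [port for port in ports if not port_is_open(host, port, timeout)]


# Hypothetical usage; replace the address and ports with the values
# recorded for the Storage Controller being checked:
#   closed_ports("<Storage Controller management IP>", [<port>, <port>])
```

An empty list returned by closed_ports means that every listed port accepted a connection. A port reported as closed may indicate a firewall rule rather than a fault on the Storage Controller.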
Note: Make sure to close the tunnel between the Storage Controller and the XMS upon
completion, as described in “Opening and Closing a Tunnel Between a Storage Controller
and the XMS” on page 13.
Replacing the Defective Storage Controller Using the Technician Advisor Utility
The Storage Controller replacement procedure should be performed using the XtremIO
Technician Advisor utility following a Service Request (SR), determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.
Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under Service
Scripts and Utilities > XtremIO Technician Advisor.
Note: If XtremIO Global Tech Support instructs you to follow the manual configuration
procedures, refer to Appendix E.
To identify the defective Storage Controller power supply, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the Storage Controllers power supply status, using the following command:
show-storage-controllers-psus cluster-id="<cluster name>"
Name         Index  Serial-Number      Location-Index  Power-Feed  State    Input  Location  HW-Revision  Part-Number     Storage-Controller-Name  Index  Brick-Name  Index  Cluster-Name  Index
X1-SC1-PSU1  1      E98791D1251179549  1               port_1      healthy  on     left      02           105-000-244-01  X1-SC1                   1      X1          1      xbrick238     1
X1-SC1-PSU2  2      E98791D1251179559  2               port_1      healthy  on     right     02           105-000-244-01  X1-SC1                   1      X1          1      xbrick238     1
X1-SC2-PSU1  3      E98791D1242139127  1               port_2      failed   on     left      02           105-000-244-01  X1-SC2                   2      X1          1      xbrick238     1
X1-SC2-PSU2  4      E98791D1242139116  2               port_2      healthy  on     right     02           105-000-244-01  X1-SC2                   2      X1          1      xbrick238     1
3. Note the Index of Storage Controller power supplies with a non-healthy state.
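If the command output has been saved as text, step 3 can be cross-checked with a short script. The following Python sketch parses the example output shown above and lists the power supplies whose State is not healthy. It assumes that columns are separated by whitespace and that no field contains a space; the function name is hypothetical.

```python
SAMPLE_OUTPUT = """\
Name Index Serial-Number Location-Index Power-Feed State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index
X1-SC1-PSU1 1 E98791D1251179549 1 port_1 healthy on left 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC1-PSU2 2 E98791D1251179559 2 port_1 healthy on right 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC2-PSU1 3 E98791D1242139127 1 port_2 failed on left 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1
X1-SC2-PSU2 4 E98791D1242139116 2 port_2 healthy on right 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1
"""


def non_healthy_psus(cli_output: str):
    """Return (Name, Index, State) for every PSU whose State is not 'healthy'."""
    lines = [line.split() for line in cli_output.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    name_col = header.index("Name")
    index_col = header.index("Index")  # first "Index" column is the PSU Index
    state_col = header.index("State")
    return [
        (row[name_col], int(row[index_col]), row[state_col])
        for row in rows
        if row[state_col] != "healthy"
    ]


print(non_healthy_psus(SAMPLE_OUTPUT))
```

For the example output, the only entry reported is X1-SC2-PSU1 with Index 3 and State failed.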
To identify the defective Storage Controller power supply, using the GUI:
From the GUI, view the Inventory; the defective Storage Controller power supply
appears in orange.
Note: You can access the Dell EMC SolVe Desktop at:
https://solve.emc.com/desktopbinaries/setup.exe
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py" arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
2. Disconnect the power cable from the defective Storage Controller power supply. To
revoke cable retention, release the power cord latch. The cables should remain
fastened by the cable strap in the cable management bracket.
3. To remove the Storage Controller power supply, push the green lever and then pull on
the handle.
Note: If the defective Storage Controller power supply should be sent to Dell EMC for
Failure Analysis (FA), refer to Appendix D for the procedure details.
2. Connect the power cable to the new Storage Controller power supply. To resume cable
retention, fasten the power cord latch.
3. Lift the cable tray of the cable management bracket, while pulling the latches (on the
left and right sides of the bracket) until the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in position.
Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.
To verify that the new Storage Controller power supply is healthy, using the CLI:
1. Log in to the XMS CLI as tech.
2. Wait several seconds, then run the following command:
show-storage-controllers-psus cluster-id="<cluster name>"
Name         Index  Serial-Number      Location-Index  Power-Feed  State    Input  Location  HW-Revision  Part-Number     Storage-Controller-Name  Index  Brick-Name  Index  Cluster-Name  Index
X1-SC1-PSU1  1      E98791D1251179549  1               port_1      healthy  on     left      02           105-000-244-01  X1-SC1                   1      X1          1      xbrick238     1
X1-SC1-PSU2  2      E98791D1251179559  2               port_1      healthy  on     right     02           105-000-244-01  X1-SC1                   1      X1          1      xbrick238     1
X1-SC2-PSU1  3      E98791D1242139127  1               port_2      healthy  on     left      02           105-000-244-01  X1-SC2                   2      X1          1      xbrick238     1
3. If the State is not healthy, inspect the Storage Controller power supply.
To verify that the new Storage Controller power supply is healthy, using the GUI:
1. Hover the mouse pointer over the new Storage Controller power supply; a ToolTip
appears, showing the power supply status.
2. Verify that the State is Healthy.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
Replacing an SFP+
The SFP+ replacement procedure should be performed following a Service Request (SR)
determined by XtremIO Global Technical Support.
Tolerance
Failure of an SFP+ may result in performance degradation.
Opening and Closing a Tunnel Between a Storage Controller and the XMS
Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). For instructions, refer to “Opening and Closing a
Tunnel Between a Storage Controller and the XMS” on page 13.
Procedure Prerequisite
Make sure to perform the following instruction prior to replacing an SFP+.
Note: XtremIO Global Tech Support should confirm this procedure prerequisite with the
respective Dell EMC network connectivity teams and with the customer.
For suspected SFP+ errors and/or iSCSI/Fibre Channel “connection to XtremIO cluster”
errors, arrange for the Connectivity team to confer with the customer in order to confirm
that iSCSI and Fibre Channel environment(s) to the XtremIO Storage Controller iSCSI or
Fibre Channel ports are validated. This includes confirming the network or Fibre Channel
switches, switch ports, network patch panels, cables and cable reseating (at both ends).
An SFP+ replacement procedure must only be performed after all other network
components and configurations have been verified. If not, replacing an SFP+ will not
resolve the issue.
Note: Identify the Storage-Controller-Name for each target by either the Name
value, or by running the following command:
show-targets prop-list=["Storage-Controller-Name"]
Note: In the example provided, following this step, a subset of SFP+s on the cluster is
detected as potentially defective. However, other SFP+s on the cluster can also be
defective. Complete the remaining steps in this procedure to determine thoroughly
which of the cluster’s SFP+s are defective.
1. Assuming the FC network supports 8GFC and was tested as noted in the prerequisites, prior to
starting this procedure.
2. Assuming the iSCSI network supports 10Gb and was tested as noted in the prerequisites, prior to
starting this procedure.
Note: If an SFP+ loopback tool is not available, skip this section and proceed with the rest
of the SFP+ replacement procedure.
Note: For further details on using LEDs to identify components, refer to Appendix C.
3. Using the noted details of the defective SFP+ (Name, Index, Port-Type, and
Target-Port-HW-Label), physically locate the SFP+ on the Storage Controller located
following step 2 of “Identifying the Defective SFP+”. For details, refer to the
Connecting the Cluster to Host section of the XtremIO Hardware Installation and
Upgrade Guide.
4. From the rear of the Storage Controller, unplug the (iSCSI or Fibre Channel) cable
connected to a defective SFP+.
Note: Use an orderable SFP+ extraction tool to raise the SFP+ bail. If an SFP+ extraction
tool is not available, carefully use a flat-headed screwdriver to lift the SFP+ bail.
6. Grasp the bail and slide the SFP+ out from the Storage Controller.
Note: The defective SFP+ should be sent to Dell EMC for Failure Analysis (FA) if possible.
Refer to Appendix D for the procedure details.
Note: For details on the required replacement SFP+ with XtremIO, refer to the XtremIO
Part Number List on XtremIO SolVe (Solve Desktop > XtremIO Generator > XtremIO X1
(XIOS 2.x, 3.x, 4.x) > FRU Replacement Procedures > XtremIO FRU Part Number List).
2. Make sure that the mating connector of the new SFP+ is free of dirt and/or obstacles.
3. Align the new SFP+ with the guides in the slot, and insert the SFP+ by sliding it into the
slot until slight resistance is felt.
Wait for 15 minutes before verifying that the replacement was successful.
5. Run the following command to verify the SFP+ replacement was successful:
show-targets cluster-id="<cluster name>"
6. In the show-targets output, locate the information for the replaced SFP+(s), using
the Name and Index of the replaced defective SFP+.
7. Verify a successful FC SFP+ replacement, as follows:
a. Run the following command to verify that the Port-Speed is 8GFC and that the
Port-State is up:
show-targets-fc-error-counters cluster-id="<cluster name>"
b. In the show-targets-fc-error-counters output, locate the corresponding
FC target, using the Index of the replaced FC SFP+.
c. Verify that for this FC target, the Sync-Loss and Link-Failure column values
no longer increase.
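Checking that a counter no longer increases requires two samples taken some time apart. The following Python sketch compares two such samples for one FC target and reports any counter that grew. The function name, the dictionary layout, and the sample values are assumptions made for this example; the values must be read from the command output.

```python
def counters_that_increased(earlier: dict, later: dict) -> dict:
    """Return {counter name: increase} for counters that grew between samples."""
    grown = {}
    for name, old_value in earlier.items():
        new_value = later[name]
        if new_value > old_value:
            grown[name] = new_value - old_value
    return grown


# Illustrative values only, noted from two runs of the command.
first_sample = {"Sync-Loss": 4}
second_sample = {"Sync-Loss": 4}
print(counters_that_increased(first_sample, second_sample))
```

An empty result means that none of the recorded counters increased between the two runs. A non-zero absolute value is not a failure in itself, because the counters keep the history from before the replacement.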
If the response shows alerts with the “repeating” text in the prefix, it is necessary to
clear the alert counters.
Note: Clearing alert counters clears all of the system’s alerts. In case of multiple alerts,
make a note of the components with repeated active alerts, prior to clearing alert
counters.
Default Gateway
Note: You can access the Dell EMC SolVe Desktop at:
https://solve.emc.com/desktopbinaries/setup.exe
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py" arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: Make sure that all cables are clearly labeled before disconnecting them from the
XMS.
3. Remove the bezel that covers the front of the server, as follows:
a. If the bezel is locked, unlock the bezel with the provided key.
b. Simultaneously press the tabs on both sides of the bezel to release it from its
latches, then pull the bezel off the component.
4. Remove the stabilizing screw behind the latch bracket on each side.
Note: A JIS screwdriver may be required if the rails are from an older version.
5. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.
Note: If the defective XMS should be sent to Dell EMC for Failure Analysis (FA), refer to
Appendix D for the procedure details.
Note: For more detailed instructions on installing the physical XMS, refer to the XtremIO
Storage Array Hardware Installation and Upgrade Guide.
a. Align the large end of the rail notches on the inner rail with the connection studs on
the side of the server.
b. Push the flat side of the inner rail onto the connection studs.
c. Slide the inner rail backwards along the server, until the studs fit securely into the
small end of the rail notches.
An audible click indicates that the rail is secure.
2. From the front of the cabinet, align the inner rails that are attached to the server with
the channels on the inside of the slide rails.
3. Slide the server into the slide rails and push the server into the cabinet.
An audible click indicates that the slide rails are engaged and locked.
4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the cabinet.
5. To further secure the rail assembly and server in the cabinet, insert and tighten a small
stabilizer screw directly behind each bezel latch.
6. Connect the two power cables to the XMS.
7. Connect the network cable to the MGMT1 Ethernet port (marked "1") on the physical
XMS.
8. If you initially tilted the cable management bracket's tray (up/down), on the Storage
Controller adjacent to the XMS, return it to its original position by pulling the latches
(on the left and right sides of the bracket) until the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in position.
Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.
Note: If the Tech port connection fails, or the OS fails to load, reinstall the physical
XMS with the appropriate XtremIO XMS Rescue Image. Refer to “Re-Installing a
Physical XMS” on page 91 for details.
Note: If the user wants to use IPv6, proceed with Step 3 only after the software
installation process has been completed, as described in XtremIO Storage Array
Software Installation and Upgrade Guide.
Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.
Note: If the package is not on the Support page for XtremIO, contact XtremIO
Global Tech Support.
Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.
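The checksum comparison described in the note above can be performed with any standard tool. As one possible approach, the following Python sketch uses the standard hashlib module to compute the MD5 or SHA-256 digest of the downloaded file and compare it with the value copied from the support page. The function names are assumptions made for this example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with the value published for the package."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Hypothetical usage; the file name and checksum come from the support page:
#   checksum_matches("<downloaded package>", "<published checksum>", "sha256")
```

If checksum_matches returns False, download the package again before uploading it to the XMS.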
5. Upload the software image to /var/lib/xms/images. Use an SFTP client (e.g. FileZilla,
WinSCP) to log in as the xmsupload user and transfer the package downloaded on
your computer to the XMS. When the file transfer is complete, close the SFTP client
and reopen the SSH client (PuTTY) session to the XMS.
Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit
> > 1
6. From the Install Menu, select Perform XMS install only. Enter the image file name that
was used in the previous step as input.
Install menu
-------------------------------------
XtremIO install interface
Checking XMS health
XMS health check passed
Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit
>>5
5
Enter Installation image filename (previous value: ''):
> upgrade-to-4.0.4-23.tar
upgrade-to-4.0.4-23.tar
Input received: 'upgrade-to-4.0.4-23.tar'
Installing XMS
Reformatting XMS
XMS installed successfully
9. Run the recover-xms command, and enter the IP address of a Storage Controller for
each of the clusters that should be managed by the XMS, followed by the force flag
(to override earlier cluster-XMS associations).
Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started
Done!
XMS recovery finished successfully
12. Optional: Following the XMS recovery process, if you want to refresh the SSH key, run
the following command:
refresh-xms-ssh-key
13. After the recovery has successfully completed, log out of XMS CLI.
14. Log in to XMS CLI as admin.
Note: For detailed instructions, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.
Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.
Note: If the user wants to use IPv6, proceed with Step 5 only after completing the software
installation.
Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.
Note: If the package is not on the Support page for XtremIO, contact XtremIO Global
Tech Support.
Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.
Note: Make sure that the software image is of the same version as that used by the
running cluster.
Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit
> > 1
8. From the Install Menu, select Perform XMS install only. Enter the image file name that
was used in the previous step as input.
Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit
>5
Please enter installation image filename:
> upgrade-to-4.0.0-XXX.tar
Running: /xtremapp/utils/first_install.py 0 0 /var/lib/xms/images/upgrade-to-4.0.0-XXX.tar
Installing XMS
Reformatting XMS
Installation ended successfully
Note: Even when working with a single cluster, make sure to add the single IP address.
Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started
Done!
XMS recovery finished successfully
14. After the recovery has successfully completed, log out of XMS CLI.
15. Log in to XMS shell as admin.
16. Review the XMS configuration and verify that SNMP, Email, and event handlers
definitions are correctly set.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
CHAPTER 3
Replacing DAE Components
Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.
Note: The cluster-id parameter is not mandatory for single cluster configurations.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py" arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
If the alert is raised on more than one SSD in the XtremIO cluster, replace the defective
SSDs one at a time. After replacing each SSD, wait for the rebuild and integration of the
new SSD to complete entirely BEFORE replacing the next SSD. Refer to EMC KB 205558 for
further details and up-to-date information on this scenario.
Note: For further details on using LEDs to identify components, refer to Appendix C.
For 10TB clusters that support encryption (PSNT P/N - 900-586-004) or for 10TB Starter
X-Brick (5TB) clusters (900-586-005), ensure that the SSD has one of the following part
numbers before replacing it:
005050673
00505110
Inserting an SSD with a different part number will prevent enabling encryption on this
cluster.
8. Generate and upload a log bundle (refer to “Generating and Uploading a Log Bundle”
on page 93).
No Rebuild in Progress
Note: The defective SSD should be sent to EMC for Failure Analysis (FA) if possible. Refer to
Appendix D for the procedure details.
To remove the defective SSD entry from the cluster database, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSDs status, using the following command:
show-ssds cluster-id="<cluster name>"
3. For SSDs with a failed_in_rg or revoked_from_rg state, note the SSD Index,
Brick-Name and XDP Group.
4. Remove the SSDs entry, using the following command:
remove-ssd ssd-id=<Name or Index> cluster-id="<cluster name>"
5. Verify that the SSD entry has been removed, using the following command:
show-ssds cluster-id="<cluster name>"
To remove the defective SSD entry from the cluster database, using the GUI:
1. Right-click the defective SSD.
2. Click Remove SSD.
To add the new SSD to the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSDs status, using the following command:
show-ssds cluster-id="<cluster-name>"
3. Note the new SSD WWN.
For example:
wwn-0x5000cca013118950
4. Add the new SSD to the relevant XDP Group, using the following command:
add-ssd brick-id=<Brick ID> ssd-uid=<SSD Index or Name>
For example:
add-ssd brick-id=1 ssd-uid="wwn-0x5000cca013118950" cluster-id="Cluster_One"
Note: If the SSD you added is not a new SSD (out of the box) and was used in another
cluster or the same one, use the is-foreign-xtremapp-ssd flag.
For example:
add-ssd brick-id=1 ssd-uid="wwn-0x5000cca013118950" is-foreign-xtremapp-ssd cluster-id="Cluster_One"
5. To add the new SSD to the XDP Group using the GUI, right-click the new SSD and click
Add SSD. This also assigns the SSD to the correct XDP Group.
To assign the new SSD to the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. Assign the SSD to the XDP Group, using the following command:
assign-ssd dpg-id=<X> ssd-id=<Y> cluster-id="<cluster name>"
where X = XDP group Index for the defective SSD and Y = defective SSD Index.
For example:
assign-ssd dpg-id=1 ssd-id="wwn-0x5000cca013118950" cluster-id="Cluster_One"
3. Use the following command to check if the integration process has completed:
show-ssds cluster-id="<cluster name>"
Check if the State changes from assigning_to_rg to in_rg.
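The state check in step 3 is a repeated poll. The following Python sketch shows one way to structure such a poll. It does not call the XMS: read_state is a hypothetical callable that the user supplies to return the current State value of the SSD (for example, by reading it from the show-ssds output). The default interval and poll limit are arbitrary values chosen for the example.

```python
import time


def wait_for_state(read_state, target="in_rg", transitional=("assigning_to_rg",),
                   interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll read_state() until it returns target.

    Returns True when the target state is reached and False when max_polls
    is exhausted. Raises RuntimeError if a state is seen that is neither the
    target nor one of the expected transitional states.
    """
    for _ in range(max_polls):
        state = read_state()
        if state == target:
            return True
        if state not in transitional:
            raise RuntimeError(f"unexpected SSD state: {state}")
        sleep(interval)
    return False


# Simulated sequence of states, used here instead of a real query.
simulated = iter(["assigning_to_rg", "assigning_to_rg", "in_rg"])
print(wait_for_state(lambda: next(simulated), sleep=lambda seconds: None))
```

Raising an error on an unexpected state stops the poll early, so that a state such as failed_in_rg is noticed immediately instead of after the poll limit.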
Rebuild in Progress
Failed to Rebuild
To identify an SSD that has failed in the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSDs status, using the following command:
show-ssds cluster-id="<cluster name>"
3. Note any SSDs in failed_in_rg state.
4. For each revoked SSD, perform the steps in “No Rebuild in Progress” on page 47.
To identify an SSD that has failed in the XDP Group, using the GUI:
1. Hover the mouse pointer over the defective SSD; a ToolTip appears, showing the SSD
status.
2. For each failed SSD, perform the steps in “No Rebuild in Progress” on page 47.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
Tolerance
Failure of a DAE chassis results in loss of service.
Note: For further details on using LEDs to identify components, refer to Appendix C.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py" arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Verify that you specify the correct cluster name.
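Because an incorrect cluster name is called out as a risk, the HCS command line can be assembled by a small helper that rejects an empty or malformed name before anything is typed into the XMS CLI. This is a minimal Python sketch: build_hcs_command is a hypothetical helper, and it assumes that the run-script example above is entered as a single line. The script file name is passed in unchanged, since the version part (vXXX.X.X) depends on the installed release.

```python
def build_hcs_command(script_name: str, cluster_name: str) -> str:
    """Return the run-script command line for the Health-Check Script.

    Raises ValueError if either value is empty or contains a double
    quote, which would break the quoting used in the example.
    """
    for label, value in (("script name", script_name),
                         ("cluster name", cluster_name)):
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
        if '"' in value:
            raise ValueError(f"{label} must not contain a double quote")
    return f'run-script script="{script_name}" arguments="{cluster_name}"'
```

For example, build_hcs_command("system_health-vXXX.X.X-s4.0.0.py", "Cluster_One") returns the command with both values quoted in the same way as the example.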
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
6. If cables are not marked, label them so that you can reconnect them as required to the
new DAE chassis.
7. Disconnect the power cables from the DAE’s PSUs.
8. Disconnect the SAS cables from the DAE Controllers.
9. Remove the DAE Controller (LCC) units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing a DAE Controller
(LCC)”).
10. Remove the DAE power supply units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing a DAE Power Supply”).
11. Remove the DAE bezel.
12. Remove each SSD (one at a time) from the defective DAE chassis and immediately
insert it into the same slot in the new DAE Chassis.
13. If you are replacing the DAE of a 10TB Starter X-Brick (5TB):
a. Remove the 12 plastic air seals from slots 13 through 24 of the defective DAE
chassis.
b. Insert the removed air seals into slots 13 through 24 of the new DAE chassis.
If you are replacing the DAE of a regular X-Brick, ignore this step.
14. Remove the four screws (two per side) that secure the front of the enclosure to the
front vertical channels of the cabinet, and save the screws.
15. With help from another person, slide the enclosure out of the cabinet.
Note: If the defective DAE chassis should be sent to EMC for Failure Analysis (FA), refer to
Appendix D for the procedure details.
Note: Make sure that the latches are engaged and the tray is locked in its position.
Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.
Note: If the state of the DAE chassis is other than healthy, contact XtremIO Global
Tech Support.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
Tolerance
Failure of both DAE Controllers (or all SAS cables) in the same X-Brick results in loss of
service.
Note: For further details on using LEDs to identify components, refer to Appendix C.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Tech Support before taking any further action.
3. If the cluster is a factory-assembled rack, remove the shipping bracket from behind
the DAE to be serviced.
4. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
5. Make sure that the SAS cables are labeled. If not, label them as necessary, so that you
can reconnect them as required to the DAE Controller.
6. Disconnect the SAS cables from the defective DAE Controller.
Note: When disconnecting the cables it is important to note the ports the cables were
disconnected from, so that you can reconnect them to the same ports after installing
the new DAE Controller.
For cabling guidelines refer to XtremIO Storage Array Hardware Installation and
Upgrade Guide.
7. Remove the defective DAE Controller unit from the DAE as follows:
a. Locate the orange handle buttons on the DAE Controller handles.
b. Press the orange handle buttons to release the DAE Controller, pull the latches
outward, and remove the DAE Controller from its slot.
Note: If the defective DAE Controller should be sent to EMC for Failure Analysis (FA), refer
to Appendix D for the procedure details.
Note: Make sure that the latches are engaged and the tray is locked in its position.
Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.
Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Tech Support before taking any further action.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: Access to the disks in your DAE times out two minutes after a DAE power supply unit
is removed. While the system continues operating on a single PSU, the loss of the
removed PSU causes a timeout unless the PSU is replaced within two minutes. When
replacing a DAE PSU, ensure that the green light on the PSU remains permanently on for at
least five seconds before removing power on the second PSU.
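The two timing limits in the note above can be written down as simple checks. The Python sketch below is only a way to make the numbers explicit; the function names are hypothetical, and treating exactly two minutes as already too late is a conservative assumption, since the note does not define the boundary case.

```python
DISK_ACCESS_TIMEOUT_S = 120      # two minutes, from the note above
MIN_GREEN_LED_STEADY_S = 5       # five seconds, from the note above


def within_replacement_window(seconds_since_psu_removed: float) -> bool:
    """True while the removed PSU can still be replaced before the timeout."""
    return seconds_since_psu_removed < DISK_ACCESS_TIMEOUT_S


def safe_to_power_down_second_psu(green_led_steady_seconds: float) -> bool:
    """True once the new PSU's green light has stayed on long enough."""
    return green_led_steady_seconds >= MIN_GREEN_LED_STEADY_S
```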
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
3. Disconnect the power cable from the defective DAE power supply.
Note: Ensure that the new DAE PSU is prepared for insertion.
Note: If the defective DAE power supply should be sent to EMC for Failure Analysis (FA),
refer to Appendix D for the procedure details.
2. Connect the DAE power supply power cable. A green light indicates that the DAE power
supply is successfully connected.
3. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in its position.
Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
CHAPTER 4
Replacing InfiniBand Switch Components
Note: In versions below 2.2.2.10, the InfiniBand Switches’ names, indexes and locations
are reversed in the XMS GUI and CLI. Make sure that you operate on the opposite switch to
the one indicated in the Hardware View. As a best practice, compare the switch’s actual
S/N to the one presented in the GUI. It is always advisable to check the cable connections
and LED activity on the Storage Controllers’ IB NICs to make sure that you are operating on
the correct switch.
Tolerance
Failure of a single InfiniBand Switch compromises redundancy and leaves the cluster
vulnerable to a failure of the second InfiniBand Switch.
Failure of both InfiniBand Switches in the same cluster results in loss of service.
Note: System Status LEDs are located at the front and rear of the InfiniBand Switch. A solid
red LED indicates that a major error has occurred.
Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.
Note: The cluster-id parameter is not mandatory for single cluster configurations.
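The two notes above can be summarized in a short helper that appends cluster-id only when a cluster name is supplied. This Python sketch is an illustration under stated assumptions: with_cluster is a hypothetical name, and the quoted form cluster-id="<cluster name>" is taken from the show-ssds example earlier in this document.

```python
from typing import Optional


def with_cluster(command: str, cluster_name: Optional[str] = None) -> str:
    """Append cluster-id to an XMCLI command when a cluster name is given.

    With no cluster name (single cluster configuration) the command is
    returned unchanged, because the parameter is not mandatory there.
    """
    if cluster_name is None:
        return command
    if not cluster_name.strip() or '"' in cluster_name:
        raise ValueError("cluster name must be non-empty and contain no quotes")
    return f'{command} cluster-id="{cluster_name}"'
```

Passing the cluster name rather than a numeric ID follows the recommendation in the first note.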
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: Make sure that all cables are clearly labeled to enable proper connection to the new
InfiniBand Switch.
6. Carefully remove the InfiniBand Switch from the rack, taking care not to disconnect
any other cables.
7. Note the position of the inner rails on the defective InfiniBand Switch, so as to mount
them at the exact same position, on the new InfiniBand Switch.
8. Remove the inner rails from the InfiniBand Switch.
Note: It is recommended to remove and install one rail (for reference) before removing
the second rail.
[Figure: InfiniBand Switch panel, showing ports 1 through 18 and the PS1, PS2, UID and RST labels]
Note: If the defective InfiniBand Switch should be sent to EMC for Failure Analysis (FA),
refer to Appendix D, page D-101 for the procedure details.
Note: Verify that the correct holes are aligned to ensure that the depth of the
InfiniBand Switch within the rack is adjusted correctly.
2. Secure each inner rail to the InfiniBand Switch, using three screws.
3. Lift the InfiniBand Switch and slide it onto the rails.
4. Align the screw hole of each bezel clip with those on the front side of the inner rails
(one on each side).
[Figure: InfiniBand Switch panel, showing ports 1 through 18 and the PS1, PS2, UID and RST labels]
5. Through each bezel clip, tighten a screw (one on each side) to secure the unit to the rack.
6. Connect the InfiniBand Switch power cables.
7. Connect the InfiniBand Switch interlink cables (labeled IBSW1-P17 and IBSW1-P18).
8. If you removed a shipping bracket directly above or below the InfiniBand Switch,
re-install it.
9. Wait for the interlinks to synchronize, as shown by the green LEDs on the InfiniBand
Switch associated ports.
10. Connect the remaining InfiniBand cables from the Storage Controllers.
11. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in its position.
Note: Make sure you are configuring the InfiniBand Switch that was just replaced, and
not the existing InfiniBand Switch.
3. Wait for several seconds and then run the following command:
show-infiniband-switches cluster-id=<cluster name>
Make sure that for the new InfiniBand Switch, the State column displays healthy.
4. Verify that the PSU is healthy, by running the following command:
show-infiniband-switches-psus cluster-id=<cluster name>
5. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id=<cluster name>
The output for show-modules when all modules are active:
Module-Name Index Cluster-Name  Index XEnv-Name Index Storage-Controller-Name Index Assigned-To-LCC Module-Type State
X1-SC1-R1   1     xbrick700-701 1     X1-SC1-E1 1     X1-SC1                  1                     ROUTER      active
X1-SC1-C1   2     xbrick700-701 1     X1-SC1-E1 1     X1-SC1                  1                     CONTROL     active
X1-SC1-D1   3     xbrick700-701 1     X1-SC1-E1 1     X1-SC1                  1     X1-DAE-LCC-B    DATA        active
X1-SC1-R2   4     xbrick700-701 1     X1-SC1-E2 2     X1-SC1                  1                     ROUTER      active
X1-SC1-C2   5     xbrick700-701 1     X1-SC1-E2 2     X1-SC1                  1                     CONTROL     active
X1-SC1-D2   6     xbrick700-701 1     X1-SC1-E2 2     X1-SC1                  1     X1-DAE-LCC-A    DATA        active
X1-SC2-R1   7     xbrick700-701 1     X1-SC2-E1 3     X1-SC2                  2                     ROUTER      active
X1-SC2-C1   8     xbrick700-701 1     X1-SC2-E1 3     X1-SC2                  2                     CONTROL     active
X1-SC2-D1   9     xbrick700-701 1     X1-SC2-E1 3     X1-SC2                  2     X1-DAE-LCC-B    DATA        active
X1-SC2-R2   10    xbrick700-701 1     X1-SC2-E2 4     X1-SC2                  2                     ROUTER      active
X1-SC2-C2   11    xbrick700-701 1     X1-SC2-E2 4     X1-SC2                  2                     CONTROL     active
X1-SC2-D2   12    xbrick700-701 1     X1-SC2-E2 4     X1-SC2                  2     X1-DAE-LCC-A    DATA        active
X2-SC1-R1   13    xbrick700-701 1     X2-SC1-E1 5     X2-SC1                  3                     ROUTER      active
X2-SC1-C1   14    xbrick700-701 1     X2-SC1-E1 5     X2-SC1                  3                     CONTROL     active
X2-SC1-D1   15    xbrick700-701 1     X2-SC1-E1 5     X2-SC1                  3     X2-DAE-LCC-B    DATA        active
X2-SC1-R2   16    xbrick700-701 1     X2-SC1-E2 6     X2-SC1                  3                     ROUTER      active
X2-SC1-C2   17    xbrick700-701 1     X2-SC1-E2 6     X2-SC1                  3                     CONTROL     active
X2-SC1-D2   18    xbrick700-701 1     X2-SC1-E2 6     X2-SC1                  3     X2-DAE-LCC-A    DATA        active
X2-SC2-R1   19    xbrick700-701 1     X2-SC2-E1 7     X2-SC2                  4                     ROUTER      active
X2-SC2-C1   20    xbrick700-701 1     X2-SC2-E1 7     X2-SC2                  4                     CONTROL     active
X2-SC2-D1   21    xbrick700-701 1     X2-SC2-E1 7     X2-SC2                  4     X2-DAE-LCC-B    DATA        active
X2-SC2-R2   22    xbrick700-701 1     X2-SC2-E2 8     X2-SC2                  4                     ROUTER      active
X2-SC2-C2   23    xbrick700-701 1     X2-SC2-E2 8     X2-SC2                  4                     CONTROL     active
X2-SC2-D2   24    xbrick700-701 1     X2-SC2-E2 8     X2-SC2                  4     X2-DAE-LCC-A    DATA        active
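With 24 module rows, a missed non-active entry is easy to overlook. The Python sketch below scans captured output text for rows whose last column is not active. It is a hedged example: non_active_modules is a hypothetical helper, and it assumes the layout shown in the sample output, in which Module-Name is the first column and State is the last. Other states are not listed in this document, so any value other than active is simply reported.

```python
from typing import Iterable, List, Tuple


def non_active_modules(output_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (module name, state) for every row whose State is not active.

    Rows are split on whitespace, so an empty Assigned-To-LCC column
    does not matter: the name is the first field and the state the last.
    """
    problems = []
    for line in output_lines:
        fields = line.split()
        if not fields or fields[0] == "Module-Name":
            continue  # skip blank lines and the header row
        name, state = fields[0], fields[-1]
        if state != "active":
            problems.append((name, state))
    return problems
```

An empty result corresponds to the healthy output shown above.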
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
InfiniBand Switches are equipped with two replaceable power supply units that work in a
redundant configuration. Either unit may be extracted without bringing down the system.
Note: Make sure that the power supply unit that you are NOT replacing is showing all
green, for both the power supply unit and System Status LEDs.
Tolerance
Failure of a single InfiniBand Switch power supply unit does not affect the InfiniBand
Switch operation.
Failure of both InfiniBand Switch power supply units will lead to an InfiniBand Switch
failure.
To identify the defective InfiniBand Switch power supply unit, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the status of the InfiniBand Switch power supply units, using the following command:
show-infiniband-switches-psus cluster-id=<cluster name>
3. Note the Index of the InfiniBand Switch power supply unit with a non-healthy state.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
Note: Do not attempt to insert a power supply unit with a power cord connected to it.
2. Insert the power supply unit by sliding it into the opening until a slight resistance is
felt.
3. Continue pressing the power supply unit until the latch snaps into place, confirming
proper installation.
4. Insert the power cord into the power supply unit connector, until the power cord
retainer is latched.
Note: The green power supply unit indicator should illuminate. If not, repeat the whole
procedure to extract the power supply unit, and re-insert it.
Note: Make sure that the latches are engaged and the tray is locked in its position.
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
Tolerance
Failure of one or more fan units does not affect the InfiniBand Switch operation, as
long as the ambient temperature is below 45° Celsius.
If one or more fan units fail and the ambient temperature exceeds 45° Celsius, the
InfiniBand Switch fails.
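The tolerance rule above combines two conditions, which the following Python sketch states explicitly. The function name is hypothetical, and the behavior at exactly 45° Celsius is not defined in this document, so the sketch reports an expected failure only when the ambient temperature is strictly above the limit.

```python
MAX_AMBIENT_C = 45.0  # ambient temperature limit from the tolerance statement


def switch_expected_to_fail(failed_fan_units: int, ambient_c: float) -> bool:
    """True when at least one fan unit has failed AND ambient exceeds 45 C."""
    if failed_fan_units < 0:
        raise ValueError("failed_fan_units cannot be negative")
    return failed_fan_units >= 1 and ambient_c > MAX_AMBIENT_C
```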
Note: Operation without a fan unit should not exceed two minutes.
During a fan hot-swap procedure, if the LED indicator is OFF, the fan unit is
disconnected.
Note: Make sure that the fans have the air flow that matches the model number. An air
flow opposite to the system design will cause the system to operate at a higher (less
than optimal) temperature.
To identify the defective InfiniBand Switch fan unit, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the status of the InfiniBand Switches and their fan units, using the following command:
show-infiniband-switches cluster-id=<cluster name>
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"
For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.
Failure to follow the above step may lead to data loss on the affected XtremIO cluster.
The green Fan Status LED should illuminate. If not, extract the fan unit and reinsert it.
After two unsuccessful attempts to install the fan unit, contact XtremIO Global Tech
Support for guidance and directions. No further action should be taken without
explicit direction from XtremIO Global Tech Support.
3. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in its position.
4. Identify the status of the fan unit via the XMS CLI, using the following command:
show-infiniband-switches cluster-id=<cluster name>
Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.
CHAPTER 5
Replacing Battery Backup Units
The Battery Backup Unit replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support. Technician Advisor is initially used to identify defective Battery
Backup Units on the cluster, and is then used to replace each Battery Backup Unit that is
identified as defective.
If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance with pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).
Tolerance
Failure of more than half of the BBUs in the same cluster results in loss of service.
Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.
Note: The cluster-id parameter is not mandatory for single cluster configurations.
Replacing a BBU
Once a defective Battery Backup Unit is identified, the Battery Backup Unit replacement
procedure must be performed using the Technician Advisor utility.
Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under Service
Scripts and Utilities > XtremIO Technician Advisor.
Note: If the XtremIO Technician Advisor Utility User Guide instructs that the Technician
Advisor utility cannot be used to replace Battery Backup Units on your cluster, contact
XtremIO Global Tech Support for directions on how to manually replace the Battery Backup
Units.
Replacing a Battery Backup Unit manually may lead to data loss if not performed
correctly! Therefore, every effort must be made to use Technician Advisor to automatically
replace a cluster’s Battery Backup Unit.
Incorrect replacement of 5P 1550i BBU serial communication cables may result in
damage to connectors and/or component ports.
5P 1550i Battery Backup Units are supplied with DB9-RJ45 serial data cables
accompanied by DB9-RJ50 adapters, or with RJ45-RJ50 serial communication cables with
labeling clearly indicating which devices and ports to plug into, depending on the XtremIO
hardware version in use.
A defective cable and/or cable adapter of this type must be replaced with a new RJ45-RJ50
serial communication cable.
Note: Replacement RJ45-RJ50 serial communication cables may not be labeled to indicate
which devices and ports to plug into.
Tolerance
In single X-Brick clusters, a failure of both communication cables (one for each BBU)
results in loss of service.
In multiple X-Brick clusters, a failure of more than half of the overall communication
cables in the cluster results in loss of service.
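Both tolerance statements above reduce to the same arithmetic: service is lost when more than half of the communication cables have failed. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes that a single X-Brick cluster has two cables (one for each BBU), so that "both cables" and "more than half" describe the same condition.

```python
def cable_failures_cause_loss_of_service(failed_cables: int, total_cables: int) -> bool:
    """True when more than half of the BBU communication cables have failed."""
    if total_cables <= 0 or not 0 <= failed_cables <= total_cables:
        raise ValueError("expected 0 <= failed_cables <= total_cables, total_cables > 0")
    # Integer comparison avoids rounding: failed > total / 2
    return 2 * failed_cables > total_cables
```

With two cables, one failure is tolerated and two are not; with four cables, two failures are tolerated and three are not.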
To verify whether failed serial communication cables exist within the cluster:
1. Log in to the XMS CLI as tech.
2. Run the following command:
show-bbus cluster-id="<cluster name>"
Name   Index Model          Serial-Number Power-Feed State   Connectivity-State Enabled-State Input Battery-Charge BBU-Load Voltage FW-Version Part-Number Brick-Name Index Cluster-Name    Index ...
X1-BBU 1     Evolution 1550 DV0P2308A     PWR-A      healthy connected          enabled       on    100            24       210     9901DC     078-000-114 X1         1     xtremio-svt-003 1     ...
X2-BBU 2     Evolution 1550 DV0P23078     PWR-B      healthy sc_2_disconnected  enabled       on    100            22       211     9901DC     078-000-114 X2         2     xtremio-svt-003 2     ...
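In the sample output above, the failed cable shows up as a Connectivity-State of sc_2_disconnected instead of connected. The Python sketch below flags such rows in captured output text. It is an assumption-laden example: bbus_with_disconnected_cable is a hypothetical helper, and because the Model value contains a space, it does not rely on column positions but looks for any field containing the word disconnected.

```python
from typing import Iterable, List


def bbus_with_disconnected_cable(output_lines: Iterable[str]) -> List[str]:
    """Return the names of BBUs whose row contains a *disconnected* value."""
    flagged = []
    for line in output_lines:
        fields = line.split()
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the header row
        # "connected" does not contain "disconnected", so healthy rows pass
        if any("disconnected" in field for field in fields[1:]):
            flagged.append(fields[0])
    return flagged
```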
[Figure: Storage Controller 10101 port]
3. Disconnect the RJ50 end of the defective communication cable (or cable adapter) from
the COM (R) port of the BBU.
[Figure: BBU COM (R) port]
4. Connect the RJ45 end of the replacement communication cable (as indicated in the
figure below) to the 10101 port of the Storage Controller.
5. Connect the RJ50 end of the replacement communication cable (as indicated in the
figure above) to the COM (R) port of the BBU.
Note: Verify that the RJ50 end of the cable is connected to the BBU COM (R) port, and
that the RJ45 end of the cable is connected to the Storage Controller 10101 port.
6. If you initially tilted the cable management bracket's tray (up/down) of the connecting
Storage Controller, return it to its original position by pulling the latches (on the left
and right sides of the bracket) until the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in position.
Name   Index Model          Serial-Number Power-Feed State   Connectivity-State Enabled-State Input Battery-Charge BBU-Load Voltage FW-Version Part-Number Brick-Name Index Cluster-Name    Index ...
X1-BBU 1     Evolution 1550 DV0P2308A     PWR-A      healthy connected          enabled       on    100            24       210     9901DC     078-000-114 X1         1     xtremio-svt-003 1     ...
X2-BBU 2     Evolution 1550 DV0P23078     PWR-B      healthy connected          enabled       on    100            22       211     9901DC     078-000-114 X2         2     xtremio-svt-003 2     ...
APPENDIX A
Software Re-Installation
This section provides instructions for downloading and re-installing a software image on
the Storage Controller and XMS.
This section includes the following topics:
Writing the XtremIO Rescue Image to a USB Drive
Re-Installing a Storage Controller
Re-Installing a Physical XMS
EMC CONFIDENTIAL
Note: Verify that you have a USB drive that is at least 2GB in capacity.
1. Locate the XtremIO Rescue Image on the XtremIO Global Tech Support page at
support.emc.com.
For details on the XtremIO Storage Controller Rescue Image or XtremIO virtual XMS
Rescue Image to download from the support page, refer to the latest Release Notes for
the XtremIO installed version.
Note: Before proceeding, access the EMC Support page and verify that the MD5
checksum of the package you downloaded matches the MD5 checksum that appears
in the support page for that package.
2. Download the image to the local machine where the USB drive will be created.
Note: Before you proceed, verify that the USB drive is available.
Note: Use Windows Explorer to make sure that the correct drive letter is selected.
6. Click Write to write the image file to the USB Drive; a warning appears to indicate that
existing data on the selected drive will be overwritten.
7. Verify that the correct drive letter is selected and click Yes to confirm.
8. Follow the write operation progress. When the operation is completed, a message
appears, indicating that the write was successful.
9. From the Windows Notification Area, click Safely Remove Hardware and Eject Media.
Note: The menu option includes the USB drive’s brand name (e.g. "Eject Cruzer Blade"
appears when SanDisk Cruzer Blade USB drive is used).
Wait for the "Safe to remove hardware" message to appear in the Notification Area and
remove the USB drive.
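The MD5 verification called for in the note under step 1 can be done with any checksum tool before the image is written to the USB drive. As one possible approach, the Python sketch below computes the MD5 digest of the downloaded file with the standard hashlib module and compares it with the value copied from the support page. The function names and any file path passed in are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, published_md5: str) -> bool:
    """Compare the file's MD5 with the published value, ignoring case and spaces."""
    return md5_of_file(path) == published_md5.strip().lower()
```

If checksum_matches returns False, download the package again rather than writing it to the USB drive.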
An X-Brick Storage Controller image is available for USB flash drives to restore a Storage
Controller to its original state.
Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.
Note: It is important to keep the affected Storage Controller isolated from the rest of
the XtremIO cluster, throughout the re-installation procedure.
2. Power-cycle the Storage Controller by unplugging and re-connecting its two power
cables.
3. As the Storage Controller powers up, press F6 to enter the Boot Device menu.
4. When prompted, type the BIOS password to display the Boot Device menu.
Note: If the Boot Device menu is not displayed, F6 was pressed too late. Go back to
step 1 and repeat the procedure.
Note: The menu option includes the USB drive’s brand name (e.g. "Eject Cruzer Blade"
appears when SanDisk Cruzer Blade USB drive is used).
6. When the Storage Controller has booted up, select Install XtremApp from the GRUB
menu.
7. Wait for the installation to complete and for the Storage Controller to reboot.
8. Remove the USB drive.
9. Reconnect the InfiniBand and SAS cables to the Storage Controller.
For cabling guidelines refer to XtremIO Storage Array Hardware Installation and
Upgrade Guide.
An XMS image is available for USB flash drives to install a physical XMS node.
Extract the image to a USB flash drive (refer to “Writing the XtremIO Rescue Image to a USB
Drive” on page 88) and connect the USB flash drive to the XMS USB port.
Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.
Note: If the Boot Device menu is not displayed, F6 was pressed too late. Go back to
step 1 and repeat the procedure.
Note: The menu option includes the USB drive’s brand name (e.g. "Eject Cruzer Blade"
appears when SanDisk Cruzer Blade USB drive is used).
7. When the server has booted up, select Install XMS from the GRUB menu.
8. Wait for the installation to complete and for the XMS to reboot.
9. Remove the USB drive.
APPENDIX B
Generating and Uploading a Log Bundle
This section provides instructions for generating an XtremIO log bundle and uploading it to FTP.
This section includes the following topics:
Generating and Collecting the Bundle
Uploading the Bundle Collection
Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.
Note: The cluster-id parameter is not mandatory for single cluster configurations.
3. Copy the link into a web browser and download the package.
APPENDIX C
Using LEDs to Identify Hardware Components
This section provides instructions for locating LEDs through CLI commands and using the
GUI.
This section includes the following topics:
Hardware Components’ LEDs
Using the GUI to Activate Identification LEDs
Using the CLI to Activate the Identification LEDs
Note: If the component’s identification LED is already turned on, a check sign appears
next to the Turn On Identification LED option and the message box that follows states
that the LED will be turned off.
3. In the Change All Other Identification LEDs dialog box, select the desired state of the
LEDs (On or Off) and click OK; LEDs of all components, except for the LED of the
component you want to identify, change their state.
control-led
The control-led command beacons the identification LED.
DAE X1-DAE
DAEController X1-DAE-LCC-A
LocalDisk X1-SC1-LocalDisk1
StorageController X1-SC1
SSD wwn-0x5000cca02b0555dc
Note: It is possible to have SC1, SC2 and/or LCC-A, LCC-B, etc. (per X-Brick).
show-leds
The show-leds command displays the values for the identification and status LEDs.
APPENDIX D
Priority FA
This section provides instructions for shipping failed hardware parts to EMC for Failure
Analysis (FA).
When Failure Analysis is required, ship the failed parts to EMC via FedEx.
APPENDIX E
Manually Replacing Storage Controllers
This section provides procedures for manually replacing defective Storage Controllers
(without the use of the Technician Advisor Utility).
This manual installation should only be performed in situations where the Technician
Advisor Utility cannot be used.
If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance with pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).
Note: For further details on using LEDs to identify components, refer to Appendix C.
Before proceeding to replace the defective Storage Controller, contact XtremIO Global Tech
Support for guidance and directions. No further action should be taken without explicit
direction from XtremIO Global Tech Support.
Do not remove the defective Storage Controller until the new Storage Controller is
configured by XtremIO Global Tech Support and is ready to take over.
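The preconditions above can be summarized as a simple gate that lists what still blocks removal of the defective Storage Controller. This is a minimal sketch for illustration only; the ReplacementState record and the blockers function are hypothetical names and do not correspond to any XtremIO tool.

```python
from dataclasses import dataclass


@dataclass
class ReplacementState:
    """Hypothetical record of the conditions described in this section."""
    recoverpoint_connected: bool
    consistency_groups_paused: bool
    support_direction_received: bool
    new_controller_ready: bool


def blockers(state):
    """Return the reasons why the defective controller must not be removed yet."""
    reasons = []
    # Pausing Consistency Groups only applies when RecoverPoint is connected.
    if state.recoverpoint_connected and not state.consistency_groups_paused:
        reasons.append("RecoverPoint Consistency Groups are not paused")
    if not state.support_direction_received:
        reasons.append("no explicit direction from XtremIO Global Tech Support")
    if not state.new_controller_ready:
        reasons.append("new Storage Controller is not yet configured by Support")
    return reasons
```

An empty list means that none of the conditions named in this section blocks the removal; it does not replace the direction given by XtremIO Global Tech Support.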
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
Note: Make sure that all cables are clearly labeled before disconnecting them from the
Storage Controllers. Do not proceed with the replacement procedure until all cables
that are connected to the Storage Controller are labeled.
Note: The disconnected cables can remain fastened to the cable management bracket
during the Storage Controller replacement procedure.
6. If required, release the cables from the cable tray of the cable management bracket
(mounted on the rear side of the Storage Controller) by releasing its cable straps.
7. Pull the tabs on both sides of the cable management bracket to release the bracket
from the Storage Controller’s inner rail.
8. Pull the cable management bracket out and remove it from the Storage Controller.
9. Remove the bezel that covers the front of the server as follows:
a. If the bezel is locked, unlock the bezel with the provided key.
b. Simultaneously press the tabs on both sides of the bezel to release it from its
latches, then pull the bezel off the component.
10. Remove the stabilizing screw behind the latch bracket on each side.
Note: A JIS screwdriver may be required if the rails are from an older version.
11. If a shipping bracket is installed directly above or below the server, remove it to
prevent damage to the foam padding.
12. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.
Note: After the Storage Controller is successfully replaced, send the defective Storage
Controller to EMC for Priority Failure Analysis (Priority FA). Refer to Appendix D for the
procedure details.
Execute the following procedure to install the new Storage Controller only when requested
by XtremIO Global Tech Support.
3. Slide the server into the slide rails and push the server into the cabinet.
An audible click indicates that the slide rails are engaged and locked.
4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the cabinet.
5. If you removed a shipping bracket directly above or below the server, reinstall it.
6. To further secure the rail assembly and server in the cabinet, insert and tighten a small
stabilizer screw directly behind each bezel latch.
7. From the rear side of the Storage Controller, align the rails of the cable management
bracket with the server's inner rails.
8. Insert the rails of the cable management bracket onto the inner rails of the Storage
Controller.
9. Push to slide in the cable management bracket until an audible click is heard. This
indicates that the cable management bracket and the Storage Controller rails are
engaged and locked.
10. Tilt the cable tray down by simultaneously pulling both latches, on the left and right
sides of the cable management bracket, and then pushing the tray downwards.
Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.
11. Connect the MGMT network cable to the Storage Controller’s "1" port (leftmost
port), and connect the InfiniBand, SAS, LAN and COM cables.
Note: Leave the FC/iSCSI cables disconnected until you are instructed to connect
them.
Note: Make sure that the InfiniBand, SAS, LAN and COM cables are properly
connected, before connecting the two power cables to the Storage Controller, and
powering on the Storage Controller.
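The cabling order described in step 11 and its notes can be expressed as a small check run before power is applied. This is a sketch under stated assumptions: the cable labels and the power_on_problems function are hypothetical, and the check only models the ordering rules given in this step.

```python
# Cables that the note requires to be connected before the power cables.
REQUIRED_BEFORE_POWER = {"InfiniBand", "SAS", "LAN", "COM"}
# Cables that stay disconnected until you are instructed to connect them.
HELD_BACK = {"FC", "iSCSI"}


def power_on_problems(connected):
    """Return a list of cabling problems that should be fixed before power-on.

    `connected` is an iterable of cable labels (hypothetical names) that are
    currently plugged into the Storage Controller.
    """
    connected = set(connected)
    problems = []
    missing = REQUIRED_BEFORE_POWER - connected
    if missing:
        problems.append("not connected: " + ", ".join(sorted(missing)))
    early = HELD_BACK & connected
    if early:
        problems.append("connected too early: " + ", ".join(sorted(early)))
    return problems
```

With MGMT, InfiniBand, SAS, LAN and COM connected and the FC/iSCSI cables left out, the function returns an empty list.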
Note: If the cables are properly fastened to the cable management bracket, skip steps 1
and 2 and proceed to step 3.
3. Lift the cable tray, while pulling the latches (on the left and right sides of the bracket)
until the latches click in.
Note: Make sure that the latches are engaged and the tray is locked in position.
The figure below shows an example of the installed cable management bracket, with
the cables strapped to the tray.
[Figure: Storage Controller disk drives (labels: HDDs, SSDs)]
2. Pull the lever open and slide the disk drive assembly (B) from the server.
Note: Once all four disks have been removed, the Storage Controller can be shipped back
to EMC.
Note: It is not always possible to perform Failure Analysis on Storage Controllers that
have been returned to EMC without their disks.