
EMC CONFIDENTIAL

EMC
XtremIO Storage Array
Version 4.0, 4.0.1, 4.0.2 and 4.0.4

FRU Replacement Procedures


P/N 302-002-044
REV 11

Copyright © 2021 EMC Corporation. All rights reserved. Published in the USA.

Published June, 2021

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without
notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect
to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular
purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

XtremIO, EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other
countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).


CONTENTS

Preface

Chapter 1 General Information


Required Tools and Part Numbers ............................................................... 10
Missing, Wrong or Damaged Components................................................... 10
Cable Management Brackets....................................................................... 10

Chapter 2 Replacing Server Components


Replacing a Storage Controller .................................................................... 12
Tolerance .............................................................................................. 12
Opening and Closing a Tunnel Between a Storage Controller
and the XMS ......................................................................................... 13
Accessing the XMS via a Storage Controller ........................................... 13
Identifying the Defective Storage Controller........................................... 14
Confirming the Open Network Ports for Storage Controller Replacement 16
Replacing the Defective Storage Controller Using the
Technician Advisor Utility ...................................................................... 17
Replacing a Storage Controller Power Supply .............................................. 18
Tolerance .............................................................................................. 18
Identifying the Defective Storage Controller Power Supply ..................... 18
Checking the XtremIO Cluster Health ..................................................... 18
Executing the Encryption-Recovery Procedure........................................ 19
Replacing the Defective Storage Controller Power Supply ...................... 19
Configuring the Replaced Storage Controller Power Supply.................... 21
Checking the XtremIO Cluster Health (Post Replacement) ...................... 22
Replacing an SFP+ ...................................................................................... 23
Tolerance .............................................................................................. 23
Opening and Closing a Tunnel Between a Storage Controller
and the XMS ......................................................................................... 23
Procedure Prerequisite .......................................................................... 23
Identifying the Defective SFP+ ............................................................... 24
Checking the Defective SFP+ Using an SFP+ Loopback Tool ................... 26
Checking the XtremIO Cluster Health ..................................................... 26
Replacing a Defective SFP+.................................................................... 27
Installing the Bezel ............................................................................... 31
Post Configuration Procedures .............................................................. 32
Replacing the XMS ...................................................................................... 33
Tolerance .............................................................................................. 33
Identifying the Defective XMS................................................................ 33
Checking the XtremIO Cluster Health ..................................................... 33
Executing the Encryption-Recovery Procedure........................................ 34
Replacing the Defective XMS ................................................................. 34
Configuring a Replaced Physical XMS .................................................... 37
Replacing a Virtual XMS ........................................................................ 39
Checking the XtremIO Cluster Health (Post Replacement) ...................... 42


Chapter 3 Replacing DAE Components


Replacing the SSDs..................................................................................... 44
Tolerance .............................................................................................. 44
Identifying the Defective SSD ................................................................ 44
Checking the XtremIO Cluster Health ..................................................... 45
Executing the Encryption-Recovery Procedure........................................ 45
Handling Defective SSDs, Detected by 5D SMART Error.......................... 45
Physically Locating the Defective SSD (Using LEDs) ............................... 46
Replacing a Defective SSD..................................................................... 46
Checking the XtremIO Cluster Health (Post Replacement) ...................... 50
Replacing a DAE Chassis............................................................................. 51
Tolerance .............................................................................................. 51
Identifying the Defective DAE Chassis.................................................... 51
Physically Locating the Defective DAE Chassis (Using LEDs) .................. 51
Checking the XtremIO Cluster Health ..................................................... 52
Executing the Encryption-Recovery Procedure........................................ 52
Replacing the Defective DAE Chassis..................................................... 53
Configuring the Replaced DAE chassis................................................... 56
Checking the XtremIO Cluster Health (Post Replacement) ...................... 57
Replacing a DAE Controller (LCC)................................................................. 57
Tolerance .............................................................................................. 57
Identifying the Defective DAE Controller ................................................ 57
Physically Locating the Defective DAE Controller (Using LEDs) ............... 58
Checking the XtremIO Cluster Health ..................................................... 58
Executing the Encryption-Recovery Procedure........................................ 58
Replacing the Defective DAE Controller.................................................. 59
Configuring the Replaced DAE Controller ............................................... 61
Checking the XtremIO Cluster Health (Post Replacement) ...................... 61
Replacing a DAE Power Supply .................................................................... 62
Tolerance .............................................................................................. 62
Identifying the Defective DAE Power Supply........................................... 62
Checking the XtremIO Cluster Health ..................................................... 62
Executing the Encryption-Recovery Procedure........................................ 63
Replacing the Defective DAE Power Supply ............................................ 63
Configuring the Replaced DAE Power Supply ......................................... 65
Checking the XtremIO Cluster Health (Post Replacement) ...................... 65


Chapter 4 Replacing InfiniBand Switch Components


Replacing the InfiniBand Switch.................................................................. 68
Tolerance .............................................................................................. 68
Identifying the Defective InfiniBand Switch ........................................... 68
Checking the XtremIO Cluster Health ..................................................... 69
Executing the Encryption-Recovery Procedure........................................ 69
Replacing the Defective InfiniBand Switch............................................. 69
Configuring the Replaced InfiniBand Switch .......................................... 72
Checking the XtremIO Cluster Health (Post Replacement) ...................... 73
Replacing InfiniBand Switch Power Supply Units......................................... 74
Tolerance .............................................................................................. 74
Identifying the Defective InfiniBand Switch Power Supply Unit .............. 74
Checking the XtremIO Cluster Health ..................................................... 75
Executing the Encryption-Recovery Procedure........................................ 75
Replacing the Defective InfiniBand Switch Power Supply Unit................ 75
Checking the XtremIO Cluster Health (Post Replacement) ...................... 76
Replacing InfiniBand Switch Fan Units ........................................................ 77
Tolerance .............................................................................................. 77
Identifying the Defective InfiniBand Switch Fan Unit .............................. 77
Checking the XtremIO Cluster Health ..................................................... 78
Executing the Encryption-Recovery Procedure........................................ 78
Replacing the Defective InfiniBand Switch Fan Unit ............................... 78
Checking the XtremIO Cluster Health (Post Replacement) ...................... 79

Chapter 5 Replacing Battery Backup Units


Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility 82
Battery Backup Unit Types..................................................................... 82
Tolerance .............................................................................................. 82
Identifying the Defective BBU ................................................................ 82
Replacing a BBU.................................................................................... 83
Replacing a Serial Communication Cable for a 5P 1550i BBU ...................... 83
Tolerance .............................................................................................. 84
Verifying Failed Serial Communication Cables ....................................... 84
Disabling All Notifiers............................................................................ 84
Replacing the Defective Cable ............................................................... 84
Verifying Replaced Serial Communication Cables .................................. 85
Restoring All Notifiers............................................................................ 86

Appendix A Software Re-Installation


Writing the XtremIO Rescue Image to a USB Drive........................................ 88
Re-Installing a Storage Controller ................................................................ 90
Re-Installing a Physical XMS ....................................................................... 91

Appendix B Generating and Uploading a Log Bundle


Generating and Collecting the Bundle ......................................................... 94
Uploading the Bundle Collection................................................................. 94


Appendix C Using LEDs to Identify Hardware Components


Hardware Components’ LEDs ...................................................................... 96
Using the GUI to Activate Identification LEDs............................................... 97
Using the CLI to Activate the Identification LEDS ......................................... 98
control-led ............................................................................................ 98
show-leds ............................................................................................. 99

Appendix D Priority FA

Appendix E Manually Replacing Storage Controllers


Replacing a Storage Controller Manually ................................................... 104
Physically Locating the Defective Storage Controller (Using LEDs)........ 104
Replacing the Defective Storage Controller .......................................... 105
Configuring the Replaced Storage Controller ....................................... 111
Fastening the Storage Controller Cables to
the Cable Management Bracket........................................................... 111
Installing the Bezel ............................................................................. 113
Post Configuration Procedures ............................................................ 113
Removing the Old Storage Controller Disks.......................................... 114


PREFACE

As part of an effort to improve its product lines, EMC periodically releases revisions of its
software and hardware. Therefore, some functions described in this document might not
be supported by all versions of the software or hardware currently in use. The product
release notes provide the most up-to-date information on product features.
Contact your EMC technical support professional if a product does not function properly or
does not function as described in this document.

Note: This document was accurate at publication time. Go to EMC Online Support
(https://support.emc.com) to ensure that you are using the latest version of this
document.

Purpose
This document provides the required information for replacing EMC XtremIO Storage Array
Field Replaceable Units (FRUs) that have been identified as unserviceable.

Audience
This document is intended for EMC field support personnel.

Related documentation
The following EMC publications provide additional information:
• XtremIO Storage Array Hardware Installation and Upgrade Guide
• XtremIO Storage Array Software Installation and Upgrade Guide
• XtremIO Storage Array User Guide
• XtremIO Storage Array Release Notes


Conventions used in this document


EMC uses the following conventions for special notices:

Note: A note presents information that is important, but not hazard-related.

Typographical conventions
EMC uses the following type style conventions in this document:
Bold              Use for names of interface elements, such as names of windows, dialog
                  boxes, buttons, fields, tab names, key names, and menu paths (what the
                  user specifically selects or clicks)
Italic            Use for full titles of publications referenced in text
Monospace         Use for:
                  • System output, such as an error message or script
                  • System code
                  • Pathnames, filenames, prompts, and syntax
                  • Commands and options
Monospace italic  Use for variables
Monospace bold    Use for user input
[]                Square brackets enclose optional values
|                 Vertical bar indicates alternate selections; the bar means “or”
{}                Braces enclose content that the user must specify, such as x or y or z
...               Ellipses indicate nonessential information omitted from the example

Where to get help


EMC support, product, and licensing information can be obtained as follows:
Product information — For documentation, release notes, software updates, or
information about EMC products, go to EMC Online Support at:
https://support.emc.com
Technical support — Go to EMC Online Support and click Service Center. You will see
several options for contacting EMC Technical Support. Note that to open a service request,
you must have a valid support agreement. Contact your EMC sales representative for
details about obtaining a valid support agreement or with questions about your account.

Your comments
Your suggestions will help us continue to improve the accuracy, organization, and overall
quality of the user publications. Send your opinions of this document to:
techpubcomments@emc.com


CHAPTER 1
General Information

This chapter includes the following topics:


• Required Tools and Part Numbers ........................................................................... 10
• Missing, Wrong or Damaged Components............................................................... 10
• Cable Management Brackets................................................................................... 10


Required Tools and Part Numbers


• It is recommended to wear an ESD bracelet or grounding heels when handling
  hardware components.
• A #2 Phillips screwdriver is required for removing and tightening the screws of all
  XtremIO hardware components.
• A KVM, or a keyboard and monitor, is required on-site in case a physical XMS
  and/or Storage Controllers need to be re-installed.
• To view the part numbers of the XtremIO cluster components, hover the mouse
  pointer over the desired component in the GUI; a ToolTip appears, showing the
  component’s details, including its part number.
• For SFP+ replacement, the following tools are required:
  • SFP+ extraction tool for raising the SFP+ bail. If an SFP+ extraction tool is not
    available, use a flat-headed screwdriver instead.
  • SFP+ loopback tool to further check the state of the SFP+ considered for
    replacement.

Missing, Wrong or Damaged Components


For detailed information on how to handle missing, wrong or damaged items, access the
Missing, Wrong, or Damaged (MWD) Customer Complaints Capture System via the
following URL:
https://emcmwd.emc.com/default.asp

Cable Management Brackets


Each Storage Controller in clusters installed after Version 4.0 is fitted with a cable
management bracket on the server’s rear side.
Older clusters do not include cable management brackets. For such clusters, ignore
any instructions regarding the cable management bracket in this guide.


CHAPTER 2
Replacing Server Components

This chapter includes the following topics:


• Replacing a Storage Controller ................................................................................ 12
• Replacing a Storage Controller Power Supply .......................................................... 18
• Replacing an SFP+ .................................................................................................. 23
• Replacing the XMS .................................................................................................. 33


Replacing a Storage Controller



The Storage Controller replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure
and contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with Dell EMC KB# 479972
(https://support.emc.com/kb/479972).

Note: Before arriving on the site, make sure that you have the updated Storage Controller
rescue image for the cluster’s version. In addition, ensure that the latest version of
Technician Advisor utility is installed on your laptop.

Note: If the customer has a Disk Retention Agreement with Dell EMC, remove the hard
disks and SSDs from the replaced Storage Controller and give them to the customer. For
instructions, refer to “Removing the Old Storage Controller Disks” on page 114.

Tolerance
• Failure of a single Storage Controller results in performance degradation.
• Failure of both Storage Controllers in the same X-Brick results in:
  • Loss of service in a multiple X-Brick cluster
  • Data loss in a single X-Brick cluster
• Failure of both InfiniBand links and/or both SAS ports in the same Storage Controller
  results in a Storage Controller failure.
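
The tolerance rules above can be restated as a small lookup. The following Python sketch is illustrative only and is not part of any XtremIO tool; the function name and the returned strings are hypothetical labels chosen for this example.

```python
def storage_controller_failure_impact(failed_in_xbrick: int, xbricks_in_cluster: int) -> str:
    """Restate the documented tolerance rules for one X-Brick.

    failed_in_xbrick: number of failed Storage Controllers in the X-Brick (0, 1 or 2).
    xbricks_in_cluster: total number of X-Bricks in the cluster.
    """
    if failed_in_xbrick not in (0, 1, 2):
        raise ValueError("the rules above cover 0, 1 or 2 failed Storage Controllers per X-Brick")
    if xbricks_in_cluster < 1:
        raise ValueError("a cluster has at least one X-Brick")
    if failed_in_xbrick == 0:
        return "no impact"
    if failed_in_xbrick == 1:
        return "performance degradation"
    # Both Storage Controllers of the same X-Brick have failed.
    return "loss of service" if xbricks_in_cluster > 1 else "data loss"


print(storage_controller_failure_impact(2, 1))
```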


Opening and Closing a Tunnel Between a Storage Controller and the XMS
A tunnel must be opened in order to access the XMS via a Storage Controller that is
healthy.

To open a tunnel between the Storage Controller and the XMS:


1. Run the following CLI command:
modify-technician-port-tunnel cluster-id=<Cluster ID> sc-id=<Storage Controller ID> open

Note: Since the Storage Controller may not be able to open an SSH tunnel due to
security issues, the tunnel is opened from the XMS’s side.

2. Upon completion of the procedure (when access to XMS is no longer required), make
sure to close the tunnel.

To close the tunnel that was opened between the Storage Controller and the XMS:
• Run the following CLI command:
modify-technician-port-tunnel cluster-id=<Cluster ID> sc-id=<Storage Controller ID> close
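
Because every opened tunnel must later be closed, it can help to prepare both command lines together before starting. The Python sketch below only formats the two command strings documented above; it does not connect to the XMS, and the helper name and the sample ID values are hypothetical.

```python
from typing import Tuple


def tunnel_commands(cluster_id: str, sc_id: str) -> Tuple[str, str]:
    """Return the (open, close) command pair for one Storage Controller tunnel."""
    base = f"modify-technician-port-tunnel cluster-id={cluster_id} sc-id={sc_id}"
    return f"{base} open", f"{base} close"


# Sample values only; use the real cluster ID and Storage Controller ID.
open_cmd, close_cmd = tunnel_commands("1", "2")
print(open_cmd)   # run at the start of the procedure
print(close_cmd)  # run when access to the XMS is no longer required
```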

Accessing the XMS via a Storage Controller

Note: Make sure to access the XMS via a Storage Controller that is healthy.

To access the XMS via the Storage Controller:


1. Open a tunnel between the Storage Controller and the XMS, as described in “Opening
and Closing a Tunnel Between a Storage Controller and the XMS” on page 13.
2. Connect your laptop to the Tech port (Port 2) of the Storage Controller.
3. Configure your laptop's network interface with the following:
• IP: 169.254.254.3
• Subnet mask: 255.255.240.0
The actual connection is made by SSH to the Tech port’s IP address (static
169.254.254.1) on port 10022, with the Username "xmsadmin" (the password for
xmsadmin is supplied by Dell EMC):
ssh xmsadmin@169.254.254.1 -p 10022
The Storage Controllers can now forward traffic from the Tech port to the SSH tunnel.
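
As a sanity check on the addressing above, the laptop address and the Tech port address must fall in the same subnet for the SSH connection to work without a gateway. The following Python sketch uses only the standard `ipaddress` module and the values given in this section; the helper name is hypothetical.

```python
import ipaddress

# Values taken from the steps above.
LAPTOP = ipaddress.ip_interface("169.254.254.3/255.255.240.0")
TECH_PORT = ipaddress.ip_address("169.254.254.1")


def same_subnet(interface: ipaddress.IPv4Interface, peer: ipaddress.IPv4Address) -> bool:
    """Return True if the peer address lies inside the interface's network."""
    return peer in interface.network


print(LAPTOP.network)                  # 169.254.240.0/20
print(same_subnet(LAPTOP, TECH_PORT))  # True
```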


Identifying the Defective Storage Controller


Access the XMS via the Storage Controller Tech port to identify a defective Storage
Controller, as described in “Accessing the XMS via a Storage Controller” on page 13.

Note: Make sure to access the XMS via a Storage Controller that is healthy.

To identify the defective Storage Controller, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the clusters, using the following command:
show-clusters

xmcli (tech)> show-clusters


Cluster-Name Index State Conn-State Num-of-Vols Num-of-Internal-Volumes Vol-Size ...
xbrick335 1 active connected 18 12 9.712T ...

Note: It is recommended to keep the CLI window in a maximized mode. Minimizing the
window may cause the activation progress bar to be displayed on new lines instead of
the same line.

3. Verify that the cluster is in Active state.


4. List the status of the Storage Controllers, using the following command:
show-storage-controllers cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

xmcli (tech)> show-storage-controllers


Storage-Controller-Name Index Mgr-Addr IB-Addr-1 IB-Addr-2 Brick-Name Index Cluster-Name Index State
X1-SC1 1 10.82.89.38 169.254.0.1 169.254.0.2 X1 1 xbrick335 1 healthy ...
X1-SC2 2 10.82.89.40 169.254.0.17 169.254.0.18 X1 1 xbrick335 1 healthy ...


5. Use Table 1 to record the configuration data of the defective Storage Controller, and
refer to it when you configure the new Storage Controller.

Table 1 Storage Controller Configuration Data

Parameter                         Value Retrieval                              Value

Management Interface IP Address   Run the following command (it returns
X-Brick Name                      all six of these parameters):
Storage Controller Name           show-storage-controllers
Storage Controller Index          cluster-id="<cluster name>"
Cluster Name
cluster-id

Network Subnet Mask               Retrieve the network subnet mask from the
                                  peer Storage Controller in the same X-Brick
                                  as the defective Storage Controller, or
                                  obtain it from the customer.

Note: Make sure to close the tunnel between the Storage Controller and XMS when access
to XMS is no longer required, as described in “Opening and Closing a Tunnel Between a
Storage Controller and the XMS” on page 13.
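
The values for Table 1 can be read off one row of the show-storage-controllers output. The Python sketch below is an illustration, not an XtremIO utility: it assumes the output has been copied into a string, that columns are separated by whitespace in the order shown in the sample above, that names contain no spaces, and that the cluster Index column is the value to record as cluster-id. The function name is hypothetical.

```python
SAMPLE_OUTPUT = """\
Storage-Controller-Name Index Mgr-Addr IB-Addr-1 IB-Addr-2 Brick-Name Index Cluster-Name Index State
X1-SC1 1 10.82.89.38 169.254.0.1 169.254.0.2 X1 1 xbrick335 1 healthy
X1-SC2 2 10.82.89.40 169.254.0.17 169.254.0.18 X1 1 xbrick335 1 healthy
"""


def table1_values(output: str, controller_name: str) -> dict:
    """Extract the Table 1 parameters for one Storage Controller row."""
    for line in output.splitlines():
        fields = line.split()
        # A data row has at least ten columns and starts with the controller name.
        if len(fields) >= 10 and fields[0] == controller_name:
            return {
                "Management Interface IP Address": fields[2],
                "X-Brick Name": fields[5],
                "Storage Controller Name": fields[0],
                "Storage Controller Index": fields[1],
                "Cluster Name": fields[7],
                "cluster-id": fields[8],  # assumption: the cluster Index column
            }
    raise LookupError(f"{controller_name} not found in the output")


for parameter, value in table1_values(SAMPLE_OUTPUT, "X1-SC2").items():
    print(f"{parameter}: {value}")
```

The Network Subnet Mask is not part of this output and still has to be obtained as described in Table 1.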

To identify the defective Storage Controller, using the GUI:


• From the GUI, view the Inventory; the defective Storage Controller appears in orange.


Confirming the Open Network Ports for Storage Controller Replacement


Storage Controller replacement requires specific network ports to be open between the
XMS and Storage Controllers in the XtremIO cluster. For the list of network ports required
for Storage Controller replacement, refer to the XtremIO Site Preparation Guide.

Network ports are assigned for each Storage Controller, two at a time. For a (partial)
example, refer to Table 2 to determine the range of ports assigned for each Storage
Controller.

Table 2 Required Network Ports for Storage Controller Replacement (Partial Example)

Storage Controller Network Port

X1-SC1 11000-11001, 22000-22001, 23000-23001

X1-SC2 11002-11003, 22002-22003, 23002-23003

X2-SC1 11004-11005, 22004-22005, 23004-23005

X2-SC2 11006-11007, 22006-22007, 23006-23007

.... ....

Note: The network port 11112 is only required if Storage Controllers are using an IPv6
management IP address.
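
The rows of Table 2 follow a regular pattern: each Storage Controller receives two consecutive ports from each of three bases (11000, 22000 and 23000). The Python sketch below reproduces the four rows shown; extending the same pattern to further X-Bricks is an assumption based on this partial example, so treat the XtremIO Site Preparation Guide as the authoritative list. The function name is hypothetical.

```python
from typing import List, Tuple

PORT_BASES = (11000, 22000, 23000)  # first port of each range for X1-SC1 in Table 2


def ports_for(xbrick: int, sc: int) -> List[Tuple[int, int]]:
    """Return the (first, last) port pairs for Storage Controller X<xbrick>-SC<sc>."""
    if xbrick < 1 or sc not in (1, 2):
        raise ValueError("expected xbrick >= 1 and sc equal to 1 or 2")
    # Two ports per Storage Controller, two Storage Controllers per X-Brick.
    offset = 2 * ((xbrick - 1) * 2 + (sc - 1))
    return [(base + offset, base + offset + 1) for base in PORT_BASES]


print(ports_for(2, 1))  # [(11004, 11005), (22004, 22005), (23004, 23005)]
```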

It is necessary to confirm that each required network port from the XMS is open to its
respective Storage Controller.

To confirm the required Storage Controller network ports:


1. Access the XMS as tech.
2. Issue the following command for each Storage Controller and each port in the Storage
Controller's corresponding set of network ports:
test-xms-tcp-connectivity port=<port number> server="<IP of Storage Controller>"
For example: To confirm that network port 11000 is open from the XMS to X1-SC1, run
the following command:
test-xms-tcp-connectivity port=11000 server="<IP of X1-SC1>"
Done!
Connectivity checked successfully


If a required network port is detected as blocked, discontinue the Storage Controller
replacement procedure. Troubleshoot with the customer to determine why the required
network port is blocked.
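
Since the connectivity test has to be repeated for every port of every Storage Controller, it can be convenient to list the command lines in advance. The Python sketch below only formats the documented test-xms-tcp-connectivity command for each port in a set of ranges; it does not run anything on the XMS. The function name is hypothetical, and the server value is left as the placeholder used in this guide.

```python
from typing import Iterable, List, Tuple


def connectivity_commands(server_ip: str, port_ranges: Iterable[Tuple[int, int]]) -> List[str]:
    """Build one test-xms-tcp-connectivity command per port in the given ranges."""
    commands = []
    for first, last in port_ranges:
        for port in range(first, last + 1):  # ranges are inclusive
            commands.append(f'test-xms-tcp-connectivity port={port} server="{server_ip}"')
    return commands


# Port ranges for X1-SC1, copied from Table 2.
X1_SC1_RANGES = [(11000, 11001), (22000, 22001), (23000, 23001)]

for command in connectivity_commands("<IP of X1-SC1>", X1_SC1_RANGES):
    print(command)
```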

Note: For checking the required ports to a defective Storage Controller, use the existing
Storage Controller to verify whether the port is open. However, if the defective Storage
Controller is not responsive, work with the customer to check the required ports for the
peer Storage Controller instead.

Note: Make sure to close the tunnel between the Storage Controller and the XMS upon
completion, as described in “Opening and Closing a Tunnel Between a Storage Controller
and the XMS” on page 13.

Replacing the Defective Storage Controller Using the Technician Advisor Utility

The Storage Controller replacement procedure should be performed using the XtremIO
Technician Advisor utility following a Service Request (SR), determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support.

Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under Service
Scripts and Utilities > XtremIO Technician Advisor.

Note: If XtremIO Global Tech Support instructs you to follow the manual configuration
procedures, refer to Appendix E.


Replacing a Storage Controller Power Supply


Tolerance
• Failure of a single Storage Controller power supply does not affect the Storage
  Controller operation.
• Failure of both Storage Controller power supplies in the same Storage Controller
  results in Storage Controller failure.

Identifying the Defective Storage Controller Power Supply

To identify the defective Storage Controller power supply, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the status of the Storage Controller power supplies, using the following command:
show-storage-controllers-psus cluster-id="<cluster name>"

Name Index Serial-Number Location-Index Power-Feed State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index
X1-SC1-PSU1 1 E98791D1251179549 1 port_1 healthy on left 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC1-PSU2 2 E98791D1251179559 2 port_1 healthy on right 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC2-PSU1 3 E98791D1242139127 1 port_2 failed on left 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1
X1-SC2-PSU2 4 E98791D1242139116 2 port_2 healthy on right 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1

3. Note the Index of Storage Controller power supplies with a non-healthy state.
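
On clusters with many X-Bricks the power supply listing is long, and the non-healthy rows are easy to miss. The Python sketch below filters such rows from output that has been copied into a string. It is an illustration under stated assumptions: columns are whitespace-separated in the order shown in the sample above (State is the sixth column), and the function name is hypothetical.

```python
from typing import List, Tuple

SAMPLE_OUTPUT = """\
Name Index Serial-Number Location-Index Power-Feed State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index
X1-SC1-PSU1 1 E98791D1251179549 1 port_1 healthy on left 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC1-PSU2 2 E98791D1251179559 2 port_1 healthy on right 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC2-PSU1 3 E98791D1242139127 1 port_2 failed on left 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1
X1-SC2-PSU2 4 E98791D1242139116 2 port_2 healthy on right 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1
"""


def non_healthy_psus(output: str) -> List[Tuple[str, int, str]]:
    """Return (name, index, state) for every power supply whose state is not healthy."""
    found = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its second column is not a number).
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        name, index, state = fields[0], int(fields[1]), fields[5]
        if state != "healthy":
            found.append((name, index, state))
    return found


print(non_healthy_psus(SAMPLE_OUTPUT))  # [('X1-SC2-PSU1', 3, 'failed')]
```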

To identify the defective Storage Controller power supply, using the GUI:
• From the GUI, view the Inventory; the defective Storage Controller power supply
  appears in orange.

Checking the XtremIO Cluster Health


Before replacing the defective Storage Controller power supply, check the cluster’s health
by using the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.

Note: You can access the Dell EMC SolVe Desktop at:
https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py" arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.


Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to Dell EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective Storage Controller Power Supply


A defective Storage Controller power supply is indicated with an amber LED.

To remove the defective Storage Controller power supply:


1. Tilt the cable tray of the cable management bracket downwards, by simultaneously
pulling the latches on the left and right, and then pushing the tray downwards.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

2. Disconnect the power cable from the defective Storage Controller power supply. To
revoke cable retention, release the power cord latch. The cables should remain
fastened by the cable strap in the cable management bracket.


3. To remove the Storage Controller power supply, push the green lever and then pull on
the handle.

Note: If the defective Storage Controller power supply should be sent to Dell EMC for
Failure Analysis (FA), refer to Appendix D for the procedure details.

To install the new Storage Controller power supply:


1. Insert the new Storage Controller power supply.

2. Connect the power cable to the new Storage Controller power supply. To resume cable
retention, fasten the power cord latch.
3. Lift the cable tray of the cable management bracket, while pulling the latches (on the
left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.


Configuring the Replaced Storage Controller Power Supply

To configure the replaced Storage Controller power supply:


1. Log in to the XMS CLI as tech.
2. Confirm that the new Storage Controller power supply is available, using the following
command:
show-storage-controllers-psus cluster-id="<cluster name>"
3. Run the following command:
replace-storage-controller-psu sc-psu-id=<ID>
cluster-id="<cluster name>"
where <ID> is the Index of the defective Storage Controller power supply.

xmcli (tech)> replace-storage-controller-psu sc-psu-id=2


psu X1-SC1-psu2 [2] replacement initiated

To verify that the new Storage Controller power supply is healthy, using the CLI:
1. Log in to the XMS CLI as tech.
2. Wait several seconds, then run the following command:
show-storage-controllers-psus cluster-id="<cluster name>"

Name Index Serial-Number Location-Index Power-Feed State Input Location HW-Revision Part-Number Storage-Controller-Name Index Brick-Name Index Cluster-Name Index
X1-SC1-PSU1 1 E98791D1251179549 1 port_1 healthy on left 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC1-PSU2 2 E98791D1251179559 2 port_1 healthy on right 02 105-000-244-01 X1-SC1 1 X1 1 xbrick238 1
X1-SC2-PSU1 3 E98791D1242139127 1 port_2 healthy on left 02 105-000-244-01 X1-SC2 2 X1 1 xbrick238 1

3. If the State is not healthy, inspect the Storage Controller power supply.

To verify that the new Storage Controller power supply is healthy, using the GUI:
1. Hover the mouse pointer over the new Storage Controller power supply; a ToolTip
appears, showing the power supply status.
2. Verify that the State is Healthy.

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).


Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


Replacing an SFP+


The SFP+ replacement procedure should be performed following a Service Request (SR)
determined by XtremIO Global Technical Support.

Tolerance
• Failure of an SFP+ may result in performance degradation.

Opening and Closing a Tunnel Between a Storage Controller and the XMS
Before replacing a defective component, a tunnel must be opened in order to access the
XMS via a Storage Controller, and be closed upon the procedure’s completion (when
access to XMS is no longer required). For instructions, refer to “Opening and Closing a
Tunnel Between a Storage Controller and the XMS” on page 13.

Procedure Prerequisite
Make sure to perform the following instruction prior to replacing an SFP+.

Note: XtremIO Global Tech Support should confirm this procedure prerequisite with the
respective Dell EMC network connectivity teams and with the customer.

For suspected SFP+ errors and/or iSCSI/Fibre Channel “connection to XtremIO cluster”
errors, arrange for the Connectivity team to confer with the customer in order to confirm
that iSCSI and Fibre Channel environment(s) to the XtremIO Storage Controller iSCSI or
Fibre Channel ports are validated. This includes confirming the network or Fibre Channel
switches, switch ports, network patch panels, cables and cable reseating (at both ends).


An SFP+ replacement procedure must only be performed after all other network
components and configurations have been verified. If not, replacing an SFP+ will not
resolve the issue.


Identifying the Defective SFP+

To identify the defective SFP+, using the CLI:


1. Log in to the XMS CLI as tech.
2. Run the following command:
show-targets cluster-id="<cluster name>"

xmcli (tech)> show-targets


Name Index Cluster-Name Index Port-Type Port-Address Mac-Addr Port-Speed Port-State .. Storage-Controller-Name Index .. .. Relative-Id Target-Port-HW-Label
X1-SC1-target1 1 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a001 00:90:fa:c4:a1:cb unknown down .. X1-SC1 1 .. .. 5 Port1
X1-SC1-target2 2 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a000 00:90:fa:c4:a1:c9 unknown down .. X1-SC1 1 .. .. 6 Port2
X1-SC1-target3 3 xbrick736 1 fc 51:4f:0c:50:df:07:a0:01 unknown down .. X1-SC1 1 .. .. 1 Port3
X1-SC1-target4 4 xbrick736 1 fc 51:4f:0c:50:df:07:a0:00 8GFC up .. X1-SC1 1 .. .. 2 Port4
X1-SC2-target1 5 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a005 00:90:fa:c4:9b:69 1Gb down .. X1-SC2 2 .. .. 15 Port1
X1-SC2-target2 6 xbrick736 1 iscsi iqn.2008-05.com.xtremio:xio00164507136-514f0c50df07a004 00:90:fa:c4:9b:67 10Gb down .. X1-SC2 2 .. .. 16 Port2
X1-SC2-target3 7 xbrick736 1 fc 51:4f:0c:50:df:07:a0:05 4GFC up .. X1-SC2 2 .. .. 11 Port3
X1-SC2-target4 8 xbrick736 1 fc 51:4f:0c:50:df:07:a0:04 16GFC up .. X1-SC2 2 .. .. 12 Port4

3. Note the Name, Index, Cluster-Name, Port-Type, Target-Port-HW-Label,
and Storage-Controller-Name of defective FC or iSCSI targets, with one of the
following scenarios:
• Port-Speed as unknown and Port-State as down
• For FC target - Port-Speed as 8GFC [1] and Port-State as up
• For iSCSI target - Port-Speed as 1Gb [2] and Port-State as up

Note: Identify the Storage-Controller-Name for each target by either the Name
value, or by running the following command:
show-targets prop-list=["Storage-Controller-Name"]

Note: In the example provided, following this step, a subset of SFP+s on the cluster is
detected as potentially defective. However, other SFP+s on the cluster can also be
defective. Complete the remaining steps in this procedure to determine thoroughly
which of the cluster’s SFP+s are defective.

[1] Assuming the FC network supports 8GFC and was tested as noted in the prerequisites, prior to
starting this procedure.
[2] Assuming the iSCSI network supports 10Gb and was tested as noted in the prerequisites, prior to
starting this procedure.


4. Run the following command to identify the defective FC SFP+s:
show-targets-fc-error-counters cluster-id="<cluster name>"

xmcli (tech)> show-targets-fc-error-counters
Name        Index  Cluster-Name  Index  Dumped-Frames  Sync-Loss  Signal-Loss  Invalid-Crc  Link-Failure  Prim-Seq-Err
X1-SC1-fc1  1      xbrick736     2      0              0          0            0            1             0
X1-SC1-fc2  2      xbrick736     2      0              468        34           78           16            0
X1-SC2-fc1  5      xbrick736     2      0              1          0            0            0             0
X1-SC2-fc2  6      xbrick736     2      0              0          0            0            2             0

Note: If necessary, rerun the show-targets-fc-error-counters command
(once or twice, as needed) to confirm that the defective FC targets’ error counters
actually increase.

5. From the show-targets-fc-error-counters command output, note the
Index values of FC targets for which the Sync-Loss, Signal-Loss, Invalid-Crc
and Link-Failure are far greater, or have increased, compared to the other
(non-defective) FC targets.
The SFP+s connected to FC targets noted in either step 3 or step 5 are potentially
defective, and should be replaced.
6. Run the following command to identify defective iSCSI SFP+s:
show-targets-iscsi-counters cluster-id="<cluster name>"

xmcli (tech)> show-targets-iscsi-counters


Name Index Cluster-Name Index Port-Address Num-PKTS-Rx Total-KB-Rx Num-PKTS-Tx Total-KB-Tx Num-Crc-Err Num-NO-Buff-Err Num-Tx-Err
X1-SC1-iscsi1 3 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e000 4361442 256388 2 0 516 121 56
X1-SC1-iscsi2 4 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e001 4410131 262370 2 0 501 112 28
X1-SC2-iscsi1 7 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e004 12111048 710477 52 4 0 1 0
X1-SC2-iscsi2 8 xbrick736 1 iqn.2008-05.com.xtremio:xio00162306680-514f0c50d0c7e005 12224042 722502 6107 252 1 0 1

Note: If necessary, rerun the show-targets-iscsi-counters command (once
or twice, as needed) to confirm that the defective iSCSI targets’ error counters actually
increase.

7. From the show-targets-iscsi-counters command output, note the Index
values of iSCSI targets for which the Num-Crc-Err, Num-NO-Buff-Err, and
Num-Tx-Err are far greater, or have increased, compared to the other
(non-defective) iSCSI targets.
The SFP+s connected to iSCSI targets noted in either step 3 or step 7 are potentially
defective and should be replaced.
8. For each defective SFP+ to be replaced, note the following details:
• Name
• Index
• Cluster-Name
• Port-Type (FC or iSCSI)
• Target-Port-HW-Label
• Storage-Controller-Name
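
The comparison between command runs described in steps 5 and 7 can be sketched in Python. This is a minimal illustration, assuming the counter values have been transcribed by hand into dictionaries keyed by target Index. The function name increased_targets is hypothetical, and the second run below uses invented values for illustration only. The first run is taken from the FC example output above.

```python
# Counter columns named in steps 5 (FC) and 7 (iSCSI).
FC_COUNTERS = ("Sync-Loss", "Signal-Loss", "Invalid-Crc", "Link-Failure")
ISCSI_COUNTERS = ("Num-Crc-Err", "Num-NO-Buff-Err", "Num-Tx-Err")


def increased_targets(first_run, second_run, counters):
    """Return {index: {counter: delta}} for targets whose counters grew.

    Hypothetical helper: both arguments map a target Index to its counters.
    """
    flagged = {}
    for index, before in first_run.items():
        after = second_run.get(index)
        if after is None:
            continue  # target missing from the second run, nothing to compare
        deltas = {name: after[name] - before[name]
                  for name in counters if after[name] > before[name]}
        if deltas:
            flagged[index] = deltas
    return flagged


# First run: values from the show-targets-fc-error-counters example.
fc_run_1 = {
    1: {"Sync-Loss": 0, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 1},
    2: {"Sync-Loss": 468, "Signal-Loss": 34, "Invalid-Crc": 78, "Link-Failure": 16},
    5: {"Sync-Loss": 1, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 0},
    6: {"Sync-Loss": 0, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 2},
}
# Second run: invented values, only target 2 keeps accumulating errors.
fc_run_2 = {
    1: {"Sync-Loss": 0, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 1},
    2: {"Sync-Loss": 502, "Signal-Loss": 34, "Invalid-Crc": 81, "Link-Failure": 17},
    5: {"Sync-Loss": 1, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 0},
    6: {"Sync-Loss": 0, "Signal-Loss": 0, "Invalid-Crc": 0, "Link-Failure": 2},
}

print(increased_targets(fc_run_1, fc_run_2, FC_COUNTERS))
```

The same function applies to iSCSI targets by passing ISCSI_COUNTERS. A target whose counters are high but static is not flagged here, so the "far greater" criterion still needs a manual look at the absolute values.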


Checking the Defective SFP+ Using an SFP+ Loopback Tool


The following steps should be performed using an SFP+ loopback tool, to further check the
state of each SFP+ identified as defective.

Note: If an SFP+ loopback tool is not available, skip this section and proceed with the rest
of the SFP+ replacement procedure.

To check a defective SFP+ using an SFP+ loopback tool:


1. Disconnect the original SFP+ cable and connect the SFP+ loopback tool to the SFP+.
2. Perform the steps listed in “Identifying the Defective SFP+” on page 24 to check
whether the SFP+ is defective, even when disconnected from the customer network.
3. If the SFP+ still shows as defective, insert the original SFP+ cable, and proceed to
“Replacing a Defective SFP+” on page 27 to replace the defective SFP+.
4. However, if the SFP+ is shown as healthy once it is connected to the SFP+ loopback
tool, connect the original SFP+ cable, and escalate the case to XtremIO Global Tech
Support, as the issue appears to be related to a network connectivity issue. For details
on next steps in this case refer to “Procedure Prerequisite” on page 23.

Checking the XtremIO Cluster Health


Before replacing the defective component, check the cluster’s health by using the XtremIO
Health-Check Script (HCS). For instructions, refer to

Disabling All Notifiers

To disable all Notifiers:


1. Log in to the XMCLI as tech.
2. Disable all Notifiers, using the following command:
disable-notifiers cluster-id="<cluster name>"

xmcli (tech)> disable-notifiers cluster-id="xbrick711-714"


Event notifiers were disabled


Replacing a Defective SFP+

To remove a defective SFP+:


1. Log in to the XMS CLI as tech.
2. To physically locate the Storage Controller with the defective SFP+ (using LEDs), enter
the control-led CLI command using the defective SFP+’s noted Cluster-Name
and Storage-Controller-Name.
For example, you can use the following command:
control-led cluster-id="Cluster_One"
entity="StorageController" led-mode="on"
object-id-list=["X1-SC1"]
This lights the LED on Storage Controller X1-SC1 situated on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix C.

3. Using the noted details of the defective SFP+ (Name, Index, Port-Type, and
Target-Port-HW-Label), physically locate the SFP+ on the Storage Controller located
following step 2 of “Identifying the Defective SFP+”. For details, refer to the
Connecting the Cluster to Host section of the XtremIO Hardware Installation and
Upgrade Guide.
4. From the rear of the Storage Controller, unplug the (iSCSI or Fibre Channel) cable
connected to a defective SFP+.


5. Raise the SFP+ bail.

Note: Use an orderable SFP+ extraction tool to raise the SFP+ bail. If an SFP+ extraction
tool is not available, carefully use a flat-headed screwdriver to lift the SFP+ bail.

6. Grasp the bail and slide the SFP+ out from the Storage Controller.

Note: The defective SFP+ should be sent to Dell EMC for Failure Analysis (FA) if possible.
Refer to Appendix D for the procedure details.


To install the new SFP+:


1. Verify the P/N of the new SFP+.

Note: For details on the required replacement SFP+ with XtremIO, refer to the XtremIO
Part Number List on XtremIO SolVe (Solve Desktop > XtremIO Generator > XtremIO X1
(XIOS 2.x, 3.x, 4.x) > FRU Replacement Procedures > XtremIO FRU Part Number List).

2. Make sure that the mating connector of the new SFP+ is free of dirt and/or obstacles.
3. Align the new SFP+ with the guides in the slot, and insert the SFP+ by sliding it into the
slot until slight resistance is felt.


4. Reconnect the (iSCSI or Fibre Channel) cable that was disconnected.


Wait for 15 minutes before verifying that the replacement was successful.

5. Run the following command to verify the SFP+ replacement was successful:
show-targets cluster-id="<cluster name>"
6. In the show-targets output, locate the information for the replaced SFP+(s), using
the Name and Index of the replaced defective SFP+.
7. Verify a successful FC SFP+ replacement, as follows:
a. Run the following command to verify that the Port-Speed is 8GFC and that the
Port-State is up:
show-targets-fc-error-counters cluster-id="<cluster name>"
b. In the show-targets-fc-error-counters output, locate the corresponding
FC target, using the Index of the replaced FC SFP+.
c. Verify that for this FC target, the Sync-Loss and Link-Failure column values
no longer increase.

Note: If necessary, run the show-targets-fc-error-counters command
again to confirm that the two error counters (Sync-Loss and Link-Failure)
no longer increase.

Note: If performing either step a or step c of this procedure was unsuccessful,
discontinue the SFP+ replacement procedure, and proceed by replacing the
affected Storage Controller.


8. Verify a successful iSCSI SFP+ replacement, as follows:


a. Run the following command to verify that the Port-Speed is 10Gb and that the
Port-State is up:
show-targets-iscsi-counters cluster-id="<cluster name>"
b. In the show-targets-iscsi-counters output, locate the corresponding
iSCSI target, using the Index of the replaced iSCSI SFP+.
c. Verify that for this iSCSI target, the Num-Crc-Err, Num-NO-Buff-Err, and
Num-Tx-Err column values no longer increase.

Note: If necessary, run the show-targets-iscsi-counters command again
to confirm that the three error counters (Num-Crc-Err, Num-NO-Buff-Err,
and Num-Tx-Err) no longer increase.

Note: If performing either step a or step c of this procedure was unsuccessful,
discontinue the SFP+ replacement procedure, and proceed by replacing the
affected Storage Controller.
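
The verification criteria in steps 7 and 8 can be sketched in Python. This is only an illustration, assuming the Port-Speed, Port-State and counter values have been read manually from the command output. All function names are hypothetical, the expected speeds are the ones named in steps 7a and 8a, and the counter samples below are invented values.

```python
# Expected Port-Speed per Port-Type, as named in steps 7a and 8a.
# Assumption: adjust this mapping if the validated network speed differs.
EXPECTED_SPEED = {"fc": "8GFC", "iscsi": "10Gb"}


def port_is_up_at_expected_speed(port_type, port_speed, port_state):
    """Steps 7a / 8a: the port is up and reports the expected speed."""
    return port_state == "up" and port_speed == EXPECTED_SPEED[port_type]


def counters_stable(samples):
    """Steps 7c / 8c: no counter grew between consecutive readings.

    `samples` is a list of {counter name: value} dicts, oldest first.
    A single reading cannot show growth, so it is treated as stable.
    """
    return all(
        later[name] <= earlier[name]
        for earlier, later in zip(samples, samples[1:])
        for name in earlier
    )


def replacement_verified(port_type, port_speed, port_state, samples):
    """Combine both checks for one replaced SFP+ (hypothetical helper)."""
    return (port_is_up_at_expected_speed(port_type, port_speed, port_state)
            and counters_stable(samples))


# Invented readings for an iSCSI target, taken a few minutes apart.
iscsi_samples = [
    {"Num-Crc-Err": 516, "Num-NO-Buff-Err": 121, "Num-Tx-Err": 56},
    {"Num-Crc-Err": 516, "Num-NO-Buff-Err": 121, "Num-Tx-Err": 56},
]

print(replacement_verified("iscsi", "10Gb", "up", iscsi_samples))
print(replacement_verified("iscsi", "1Gb", "up", iscsi_samples))
```

If either check fails, the procedure above calls for discontinuing the SFP+ replacement and replacing the affected Storage Controller.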

Installing the Bezel

To install the front bezel:


1. Pushing on the ends (not the middle) of the bezel, press the bezel onto the latch
brackets until it snaps into place.
2. Lock the bezel with the provided key and store the key in a secure place.


Repeating Alert Counters


You should check for active repeating alerts. If repeated alerts exist, it is necessary to clear
the alerts in order to verify whether the replacement procedure remedied the component
failure.

To check for repeating alerts:


• Run the following command: show-alerts

xmcli (tech)> show-alerts


Index Description Severity Raise-Time ...
34 Repeating: Storage Controller InfiniBand port 2 is down. major Mon Apr 18 11:22:03 2016.....
33 Repeating: InfiniBand port 2: link status is not healthy. The port state is down. major Mon Apr 18 11:22:03 2016.....
xmcli (tech)>

If the response shows alerts with the “repeating” text in the prefix, it is necessary to
clear the alert counters.

Note: Clearing alert counters clears all of the system’s alerts. In case of multiple alerts,
make a note of the components with repeated active alerts, prior to clearing alert
counters.
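
Noting the repeating alerts before clearing the counters can be sketched in Python, assuming the show-alerts output has been captured as plain text. The function name repeating_alerts is a hypothetical helper, not an XMS command. The sample text is the example output shown above.

```python
# Example output copied from the show-alerts transcript in this section.
SAMPLE_ALERTS = """\
Index Description Severity Raise-Time ...
34 Repeating: Storage Controller InfiniBand port 2 is down. major Mon Apr 18 11:22:03 2016.....
33 Repeating: InfiniBand port 2: link status is not healthy. The port state is down. major Mon Apr 18 11:22:03 2016.....
"""


def repeating_alerts(output):
    """Return (index, remainder of line) for alerts prefixed with 'Repeating'.

    Assumes each alert occupies one line that starts with its numeric Index.
    """
    found = []
    for line in output.splitlines():
        index, _, rest = line.strip().partition(" ")
        if index.isdigit() and rest.lower().startswith("repeating"):
            found.append((int(index), rest))
    return found


for index, text in repeating_alerts(SAMPLE_ALERTS):
    print(index, text)
```

Saving this list before running the clear command keeps a record of which components had repeating alerts.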

To clear alert counters:


1. Log in to the XMCLI as tech.
2. Clear all alert counters, using the following command:
clear-alert-table-counters

Post Configuration Procedures


After the SFP+ is successfully replaced and the Storage Controller is configured, generate a
log bundle and re-install the Storage Controller’s front bezel:

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Uploading a Log
Bundle” on page 93).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).


Replacing the XMS


Tolerance
• Failure of an XMS prevents cluster management.

Identifying the Defective XMS


Inability to connect to cluster management (after ruling out network problems) indicates
that the XMS is not healthy.
Use Table 3 to record the configuration data of the defective XMS, and refer to it when you
configure the new XMS.

Table 3 XMS Configuration Data

Parameter                          Value
Management Interface IP Address
XMS Server DNS Name
Network Subnet Mask
Default Gateway

Value Retrieval (all parameters): Log in as xinstall to the XMS and select
Display configuration from the xinstall menu.

Checking the XtremIO Cluster Health


Before replacing the defective XMS, check the cluster’s health by using the XtremIO Health
Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.

Note: You can access the Dell EMC SolVe Desktop at:
https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error
is reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.


Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to Dell EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective XMS

To remove the defective physical XMS:


1. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.
2. Disconnect all cables from the back of the XMS.

Note: Make sure that all cables are clearly labeled before disconnecting them from the
XMS.

3. Remove the bezel that covers the front of the server, as follows:
a. If the bezel is locked, unlock the bezel with the provided key.
b. Simultaneously press the tabs on both sides of the bezel to release it from its
latches, then pull the bezel off the component.

4. Remove the stabilizing screw behind the latch bracket on each side.


Note: A JIS screwdriver may be required if the rails are from an older version.

5. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.

6. Remove each inner rail as follows:


a. On the middle of the inner rail, push in and hold the metal latch.
b. Push the rail forward to release the connection studs from the small end of the rail
notches.
c. When the connection studs are in the large end of the rail notches, release the
metal latch.
d. Pull the inner rails away from the server.

Note: If the defective XMS should be sent to Dell EMC for Failure Analysis (FA), refer to
Appendix D for the procedure details.

Note: For more detailed instructions on installing the physical XMS, refer to the XtremIO
Storage Array Hardware Installation and Upgrade Guide.

To install the new physical XMS:


1. Attach an inner rail to each side of the server, as follows:


a. Align the large end of the rail notches on the inner rail with the connection studs on
the side of the server.
b. Push the flat side of the inner rail onto the connection studs.
c. Slide the inner rail backwards along the server, until the studs fit securely into the
small end of the rail notches.
An audible click indicates that the rail is secure.
2. From the front of the cabinet, align the inner rails that are attached to the server with
the channels on the inside of the slide rails.
3. Slide the server into the slide rails and push the server into the cabinet.
An audible click indicates that the slide rails are engaged and locked.
4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the cabinet.

5. To further secure the rail assembly and server in the cabinet, insert and tighten a small
stabilizer screw directly behind each bezel latch.
6. Connect the two power cables to the XMS.
7. Connect the network cable to the MGMT1 Ethernet port (marked "1") on the physical
XMS.
8. If you initially tilted the cable management bracket's tray (up/down), on the Storage
Controller adjacent to the XMS, return it to its original position by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

9. Press the Power button to power on the XMS.


10. Reinstall the XMS bezel.


Configuring a Replaced Physical XMS

To configure the replaced physical XMS:


1. Connect to the XMS via the Tech port (marked "2") on the console.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Note: If the Tech port connection fails, or the OS fails to load, reinstall the physical
XMS with the appropriate XtremIO XMS Rescue Image. Refer to “Re-Installing a
Physical XMS” on page 91 for details.

2. Log in as xinstall, to display the Install menu.

Note: If the user wants to use IPv6, proceed with Step 3 only after the software
installation process has been completed, as described in XtremIO Storage Array
Software Installation and Upgrade Guide.

3. From the Install Menu, select Configure XMS.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Provide the following parameters:


• XMS Server DNS Name
• Network interface information (IP Address, Netmask, and GW)
4. Access the support page for XtremIO to acquire the XtremIO software package that
matches the highest XtremApp version of all clusters managed by this XMS.

Note: If the package is not on the Support page for XtremIO, contact XtremIO
Global Tech Support.

Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.


5. Upload the software image to /var/lib/xms/images. Use an SFTP client (e.g. FileZilla,
WinSCP) to log in as the xmsupload user and transfer the package downloaded on
your computer to the XMS. When the file transfer is complete, close the SFTP client
and re-open PuTTY (SSH client) to the XMS.

XtremIO install interface


Checking XMS health
XMS health check passed

Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit

> > 1

6. From the Install Menu, select Perform XMS install only. Enter the image file name that
was used in the previous step as input.

Install menu
-------------------------------------
XtremIO install interface
Checking XMS health
XMS health check passed

Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit

>>5
5
Enter Installation image filename (previous value: ''):
> upgrade-to-4.0.4-23.tar
upgrade-to-4.0.4-23.tar
Input received: 'upgrade-to-4.0.4-23.tar'
Installing XMS
Reformatting XMS
XMS installed successfully

7. Exit the install menu.


8. Log in to the XMS CLI as tech.


9. Run the recover-xms command, and enter the IP address of a Storage Controller for
each of the clusters that should be managed by the XMS, followed by the force flag
(to override earlier cluster-XMS associations).

xmcli (tech) > recover-xms sc-mgr-hosts=["10.102.36.220"] force

Note: For multi-cluster environments, list the cluster Storage Controller IP
addresses/host names for all clusters to be managed by the XMS.
For example:
“recover-xms sc-mgr-host-list = ["10.102.36.220",
"10.102.36.221",...]force”.

10. Select to recover the XMS, by typing ’yes’.

Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started

11. Wait for the recovery process to complete.

Done!
XMS recovery finished successfully

12. Optional: Following the XMS recovery process, if you want to refresh the SSH key, run
the following command:
refresh-xms-ssh-key
13. After the recovery has successfully completed, log out of XMS CLI.
14. Log in to XMS CLI as admin.
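
The checksum comparison called for in the note to step 4 can be sketched in Python using the standard hashlib module. The function names and the demo file are illustrative assumptions. In practice the published value is copied from the support page and the path points to the downloaded package.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so that large packages are not read into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published, algorithm="sha256"):
    """Compare the local digest with the value shown on the support page."""
    return file_digest(path, algorithm) == published.strip().lower()


# Demo on a throwaway file containing b"abc" (stands in for the package).
with tempfile.NamedTemporaryFile(delete=False) as demo_file:
    demo_file.write(b"abc")
demo_digest = file_digest(demo_file.name)
demo_md5 = file_digest(demo_file.name, "md5")
os.remove(demo_file.name)

print(demo_digest)
print(demo_md5)
```

Pass algorithm="md5" when the support page lists an MD5 value instead of SHA-256. A mismatch suggests that the package should be downloaded again before it is uploaded to the XMS.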

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).

Replacing a Virtual XMS

To replace a defective virtual XMS:


1. Deploy a new XMS VM.

Note: For detailed instructions, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

2. Power on the XMS VM.


3. Connect to the XMS via the VMware console.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.


4. Log in as xinstall, to display the Install menu.

Note: If the user wants to use IPv6, proceed with Step 5 only after completing the software
installation.

5. Configure the XMS.

Note: For the detailed procedure, refer to XtremIO Storage Array Software Installation
and Upgrade Guide.

Provide the following parameters:


• XMS Server DNS Name
• Network interface information (IP, Mask, GW)
6. Access the Dell EMC Support page for XtremIO to acquire the XtremIO software
package that matches the version currently installed on the server.

Note: If the package is not on the Support page for XtremIO, contact XtremIO Global
Tech support.

Note: When downloading a software package, access the Dell EMC Support page and
verify that the MD5/SHA-256 checksum of the downloaded package matches the MD5
or SHA-256 checksum that appears on the support page for that package.

7. Upload the software image to /var/lib/xms/images. Use an SFTP client (e.g.
FileZilla, WinSCP) to log in as the xmsupload user and transfer the package
downloaded on your computer to the XMS. When the file transfer is complete, close
the SFTP client and re-open PuTTY (SSH client) to the XMS.

Note: Make sure that the software image is of the same version as that used by the
running cluster.

XtremIO install interface


Checking XMS health
XMS health check passed

Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit

> > 1


8. From the Install Menu, select Perform XMS install only. Enter the image file name that
was used in the previous step as input.

XtremIO install interface


Checking XMS health
XMS health check passed

Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display inistalled Xtremapp version
5. Perform XMS install only
6. Perform "fresh" installation(XMS + storage controlers)
7. Set DC Agent configuration
8. Start DC Agent Installation
9. Set Policy Manager configuration
10. Start Policy Manager Installation
11. Run XMS Recovery
12. Reboot
99. Exit

>5
Please enter installation image filename:
> upgrade-to-4.0.0-XXX.tar
Running: /xtremapp/utils/first_install.py 0 0 /var/lib/xms/images/upgrade-to-4.0.0-XXX.tar
Installing XMS
Reformatting XMS
Installation ended successfully

9. Exit the install menu.


10. Log in to the XMS CLI as tech.
11. Run the recover-xms command and enter the IP addresses of all the clusters that
should be managed by the XMS.

Note: Even when working with a single cluster, be sure to enter that cluster's IP address.

xmcli (tech) > recover-xms sc-mgr-hosts=["10.102.36.220", "10.103.224.119"] force

12. Confirm the XMS recovery by typing ’yes’.

Old XMS and all of its data will be lost. Are you sure you want to recover the XMS? (Yes/No): yes
XMS recovery has been started

13. Wait for the recovery process to complete.

Done!
XMS recovery finished successfully

14. After the recovery has successfully completed, log out of XMS CLI.
15. Log in to XMS shell as admin.
16. Review the XMS configuration and verify that the SNMP, email, and event handler
definitions are set correctly.
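
The recover-xms command in step 11 takes a bracketed, quoted list of cluster IP addresses. The following is a minimal sketch that formats that command line from a Python list and rejects malformed addresses before anything is typed into the XMS CLI. The helper name build_recover_xms_command is hypothetical; the sketch only reproduces the syntax of the example shown in step 11 and does not contact the XMS.

```python
import ipaddress


def build_recover_xms_command(cluster_ips):
    """Format the recover-xms command for the given cluster IP addresses."""
    if not cluster_ips:
        raise ValueError("at least one cluster IP address is required")
    # ipaddress.ip_address raises ValueError on a malformed address.
    validated = [str(ipaddress.ip_address(ip)) for ip in cluster_ips]
    quoted = ", ".join(f'"{ip}"' for ip in validated)
    return f"recover-xms sc-mgr-hosts=[{quoted}] force"


print(build_recover_xms_command(["10.102.36.220", "10.103.224.119"]))
```

A single-cluster configuration passes a one-element list, in line with the note under step 11.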


To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the Dell EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to Dell EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


CHAPTER 3
Replacing DAE Components

This chapter includes the following topics:

• Replacing the SSDs................................................................................................. 44
• Replacing a DAE Chassis ......................................................................................... 51
• Replacing a DAE Controller (LCC)............................................................................. 57
• Replacing a DAE Power Supply ................................................................................ 62


Replacing the SSDs


Tolerance
• Failure of up to two SSDs in a single X-Brick results in performance degradation during
rebuild.
• Concurrent failure of three SSDs in the same X-Brick results in a loss of service.
• Failure of six SSDs in the same XDP group results in a degraded state which is called
“degraded (single failure)”, where the data has only a single parity protection. For a
10TB Starter X-Brick (5TB) it is five SSDs.
• Failure of seven SSDs in the same XDP group results in dual-degraded state, which is
called “degraded (dual failure)”, where the data has no parity protection. For a 10TB
Starter X-Brick (5TB) it is six SSDs.
• Failure of eight SSDs in the same XDP group results in loss of service. For a 10TB
Starter X-Brick (5TB) it is seven SSDs.
• Insufficient SSD space may prevent the cluster from rebuilding the XDP group,
resulting in a degraded state where the data does not have double-parity protection.
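
The XDP group thresholds in the Tolerance list above can be expressed as a small lookup. The following is a sketch under stated assumptions: it covers only the per-XDP-group counts (six, seven and eight SSDs, or five, six and seven for a 10TB Starter X-Brick) and does not model the concurrent-failure or insufficient-space cases. The function name and the "below degraded threshold" label are illustrative and do not appear in XMCLI output.

```python
def xdp_group_state(failed_ssds, starter_xbrick=False):
    """Map a count of failed SSDs in one XDP group to the state named above."""
    if failed_ssds < 0:
        raise ValueError("failed_ssds must be non-negative")
    # Thresholds from the Tolerance list; Starter X-Brick values are one lower.
    single, dual, loss = (5, 6, 7) if starter_xbrick else (6, 7, 8)
    if failed_ssds >= loss:
        return "loss of service"
    if failed_ssds >= dual:
        return "degraded (dual failure)"
    if failed_ssds >= single:
        return "degraded (single failure)"
    # Illustrative label: the list does not name a state for lower counts.
    return "below degraded threshold"
```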

Identifying the Defective SSD

To identify the defective SSD, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the SSD status, using the following command:
show-ssds cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

xmcli (admin)> show-ssds


SSD-Name                Index  Brick-Name  Index  Slot #  Product-Model             ...  State            Position-State  ...
wwn-0x5000cca013136cf4  1      X1          1      0       HITACHI HUSML404 CLAR400  ...  in-rg            good            ...
wwn-0x5000cca0131365f0  2      X1          1      1       HITACHI HUSML404 CLAR400  ...  in-rg            good            ...
wwn-0x5000cca01312ca6c  5      X1          1      2       HITACHI HUSML404 CLAR400  ...  revoked_from_rg  good            ...
.                       .      .           .      .       ...
.                       .      .           .      .       ...
.                       .      .           .      .       ...
wwn-0x5000cca013124b24  25     X1          1      24      HITACHI HUSML404 CLAR400  ...  in-rg            good            ...

3. Note the Index of the SSD with a non-healthy state.


Defective SSDs can be identified in the State column with a status of either
revoked_from_rg or eject_pending.
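
When the show-ssds output has been saved to a text file, the defective rows can be picked out programmatically. The following is a minimal sketch, assuming that data rows begin with a "wwn-" SSD name followed by its Index, as in the sample output above. The real output contains additional columns (shown as "..." in the sample), so the sketch searches each row for a defective State value instead of relying on a fixed column position. The function name is illustrative.

```python
DEFECTIVE_STATES = {"revoked_from_rg", "eject_pending"}


def find_defective_ssds(show_ssds_output):
    """Return (name, index, state) for each SSD row whose State is defective."""
    defective = []
    for line in show_ssds_output.splitlines():
        tokens = line.split()
        # Data rows start with the SSD name (a WWN) followed by its Index.
        if len(tokens) < 2 or not tokens[0].startswith("wwn-"):
            continue
        state = next((t for t in tokens[2:] if t in DEFECTIVE_STATES), None)
        if state is not None:
            defective.append((tokens[0], int(tokens[1]), state))
    return defective


SAMPLE = """\
SSD-Name Index Brick-Name Index Slot # Product-Model ... State Position-State ...
wwn-0x5000cca013136cf4 1 X1 1 0 HITACHI HUSML404 CLAR400 ... in-rg good ...
wwn-0x5000cca01312ca6c 5 X1 1 2 HITACHI HUSML404 CLAR400 ... revoked_from_rg good ...
"""

print(find_defective_ssds(SAMPLE))
```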


To identify the defective SSD, using the GUI:

• From the GUI, view the Inventory; the defective SSD appears in red.

Checking the XtremIO Cluster Health


Before replacing the defective SSD, check the cluster’s health by using the XtremIO Health
Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Handling Defective SSDs, Detected by 5D SMART Error


A 5D SMART error is a diagnostic-level device error that is detected on the SSD. If a
defective SSD is detected with a 5D SMART error, the system raises the following error:
alert_def_ssd_diag_level_4_minor.
Replace the defective SSD as soon as possible (refer to “Replacing a Defective SSD” on
this page).


If the alert is raised on more than one SSD in the XtremIO cluster, replace the defective
SSDs one at a time. After each replacement, wait for the rebuild and integration of the
new SSD to complete entirely BEFORE replacing the next SSD. Refer to EMC KB 205558 for
further details and up-to-date information on this scenario.
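
The one-at-a-time rule above amounts to a sequential loop with a wait between replacements. The following sketch illustrates that control flow only. The callables replace_ssd and get_state are hypothetical stand-ins for the manual replacement steps and for reading the State column of show-ssds; the polling interval and timeout are assumed values, and both spellings of the integrated state that appear in this guide (in_rg and in-rg) are accepted.

```python
import time


def replace_one_at_a_time(ssd_ids, replace_ssd, get_state,
                          ready_states=("in_rg", "in-rg"),
                          poll_seconds=30.0, timeout_seconds=3600.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Replace SSDs strictly in sequence, waiting for each one to integrate."""
    completed = []
    for ssd_id in ssd_ids:
        new_id = replace_ssd(ssd_id)  # hypothetical: performs one full swap
        deadline = clock() + timeout_seconds
        # Do not touch the next SSD until this one reports a ready state.
        while get_state(new_id) not in ready_states:
            if clock() >= deadline:
                raise TimeoutError(f"SSD {new_id} did not finish integrating")
            sleep(poll_seconds)
        completed.append(new_id)
    return completed
```

Raising on timeout, instead of moving on, keeps a stalled rebuild from being followed by a second SSD removal.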


Physically Locating the Defective SSD (Using LEDs)


To activate the SSD identification LED, using the CLI:
1. Log in to the XMS CLI as tech.
2. Enter the control-led CLI command to locate the defective SSD.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="SSD"
led-mode="blinking" object-id-list=[3]
This causes the LED to blink on SSD number 3 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix C.
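
The control-led command line can be assembled from its parts to avoid quoting mistakes. The following is a sketch that reproduces the syntax of the example above. The entity values listed are the ones used in this guide (SSD, DAE and DAELCC); the comma-separated form for more than one object ID is an assumption, since the guide shows only single-element lists. The helper name is hypothetical.

```python
KNOWN_ENTITIES = ("SSD", "DAE", "DAELCC")  # entity values that appear in this guide


def build_control_led_command(cluster_name, entity, object_ids, led_mode="blinking"):
    """Format a control-led command line in the style of the example above."""
    if entity not in KNOWN_ENTITIES:
        raise ValueError(f"unexpected entity: {entity!r}")
    if not object_ids:
        raise ValueError("at least one object ID is required")
    # Assumption: multiple IDs are comma-separated inside the brackets.
    ids = ", ".join(str(int(i)) for i in object_ids)
    return (f'control-led cluster-id="{cluster_name}" entity="{entity}" '
            f'led-mode="{led_mode}" object-id-list=[{ids}]')


print(build_control_led_command("Cluster_One", "SSD", [3]))
```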

To activate the SSD identification LED, using the GUI:

• For instructions on using LEDs to identify components, refer to Appendix C.

Replacing a Defective SSD


Note: Make sure to follow each step of the SSD replacement procedure. Specifically, do
not forget to remove the defective SSD from the DAE, and do not reinsert it.

For 10TB clusters that support encryption (PSNT P/N - 900-586-004) or for 10TB Starter
X-Brick (5TB) clusters (900-586-005), ensure that the SSD has one of the following part
numbers before replacing it:
• 005050673
• 00505110
Inserting an SSD with a different part number will prevent enabling encryption on this
cluster.

To check the state of the cluster before replacing a defective SSD:


1. Log in to the XMS CLI as tech.
2. List the cluster status, using the following command:
show-clusters
3. Note if the state of the cluster is healthy.
4. List the XDP Group status, using the following command:
show-data-protection-groups cluster-id="<cluster name>"
5. Note if the state of the XDP Group for the failed SSD is degraded, and to which DAE it
belongs.
6. Proceed to “No Rebuild in Progress” on page 47, “Rebuild in Progress” on page 49 or
“Failed to Rebuild” on page 50 according to the XDP Group status.
7. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"


8. Generate and upload a log bundle (refer to “Generating and Uploading a Log Bundle”
on page 93).

No Rebuild in Progress

To remove the defective SSD:


1. Remove the DAE bezel.
2. Eject the defective SSD from the DAE, as follows:
a. Press on the latch button on the disk to release the latch.
b. Pull the latch and slowly pull the disk from its slot.

Note: The defective SSD should be sent to EMC for Failure Analysis (FA) if possible. Refer to
Appendix D for the procedure details.

To remove the defective SSD entry from the cluster database, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSD status, using the following command:
show-ssds cluster-id="<cluster name>"

xmcli (admin)> show-ssds


SSD-Name                Index  Brick-Name  Index  Slot #  Product-Model             ...  State            Position-State  ...
wwn-0x5000cca013136cf4  1      X1          1      0       HITACHI HUSML404 CLAR400  ...  in-rg            good            ...
wwn-0x5000cca0131365f0  2      X1          1      0       HITACHI HUSML404 CLAR400  ...  in-rg            good            ...
wwn-0x5000cca01312ca6c  2      X1          1      0       HITACHI HUSML404 CLAR400  ...  revoked_from_rg  good            ...
.
.
.

3. For SSDs with a failed_in_rg or revoked_from_rg state, note the SSD Index,
Brick-Name and XDP Group.
4. Remove the SSD entry, using the following command:
remove-ssd ssd-id=<Name or Index> cluster-id="<cluster name>"


5. Verify that the SSD entry has been removed, using the following command:
show-ssds cluster-id="<cluster name>"

To remove the defective SSD entry from the cluster database, using the GUI:
1. Right-click the defective SSD.
2. Click Remove SSD.

To install the new SSD:


1. Insert the new SSD into the DAE, as follows:
a. Align the disk or module with the guides in the slot.
b. With the disk carrier latch fully open, gently push the disk into the slot. The latch
begins to rotate downward when its tabs meet the enclosure.
c. Push the handle down to engage the latch.

2. Reinstall the DAE bezel.

To add the new SSD to the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSD status, using the following command:
show-ssds cluster-id="<cluster name>"
3. Note the new SSD WWN.
For example:
wwn-0x5000cca013118950
4. Add the new SSD to the relevant XDP Group, using the following command:
add-ssd brick-id=<Brick ID> ssd-uid=<SSD Index or Name> cluster-id="<cluster name>"
For example:
add-ssd brick-id=1 ssd-uid="wwn-0x5000cca013118950"
cluster-id="Cluster_One"


Note: If the SSD you added is not a new SSD (out of the box) and was used in another
cluster or the same one, use the is-foreign-xtremapp-ssd flag.
For example:
add-ssd brick-id=1 ssd-uid="wwn-0x5000cca013118950"
is-foreign-xtremapp-ssd cluster-id="Cluster_One"

5. To add the new SSD to the XDP Group using the GUI, right-click the new SSD and click
Add SSD. This also assigns the SSD to the correct XDP Group.

To assign the new SSD to the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. Assign the SSD to the XDP Group, using the following command:
assign-ssd dpg-id=<X> ssd-id=<Y> cluster-id="<cluster name>"
where X = XDP group Index for the defective SSD and Y = defective SSD Index.
For example:
assign-ssd dpg-id=1 ssd-id="wwn-0x5000cca013118950"
cluster-id="Cluster_One"
3. Use the following command to check if the integration process has completed:
show-ssds cluster-id="<cluster name>"
Check if the State changes from assigning_to_rg to in_rg.
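
The CLI side of the SSD swap (remove-ssd, add-ssd, assign-ssd, then show-ssds) can be laid out as an ordered list of command lines. The following sketch only formats the strings, following the examples in this section; it does not execute anything, and the physical removal and insertion of the SSD still take place between the first and second commands. The function and parameter names are hypothetical. Following the assign-ssd example, the new SSD is identified by its WWN.

```python
def ssd_swap_commands(cluster_name, defective_ssd, brick_id, new_ssd_wwn, dpg_id,
                      previously_used=False):
    """Return the XMCLI command lines for an SSD swap, in procedure order."""
    cluster = f'cluster-id="{cluster_name}"'
    add = f'add-ssd brick-id={brick_id} ssd-uid="{new_ssd_wwn}"'
    if previously_used:
        # Flag from the note above, for an SSD that was used in a cluster before.
        add += " is-foreign-xtremapp-ssd"
    return [
        f"remove-ssd ssd-id={defective_ssd} {cluster}",
        f"{add} {cluster}",
        f'assign-ssd dpg-id={dpg_id} ssd-id="{new_ssd_wwn}" {cluster}',
        f"show-ssds {cluster}",
    ]


for command in ssd_swap_commands("Cluster_One", 5, 1, "wwn-0x5000cca013118950", 1):
    print(command)
```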

Rebuild in Progress

Note: If the XDP Group is in the process of rebuilding, allow it to complete.

To identify a revoked SSD, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the SSD status, using the following command:
show-ssds cluster-id="<cluster name>"
3. Note any SSDs in revoked_from_rg state.
4. For each revoked SSD, perform the steps in “No Rebuild in Progress” on page 47.

To identify a revoked SSD, using the GUI:


1. Drag the mouse pointer over the defective SSD.
2. Hover the mouse pointer over the defective SSD; a tooltip appears showing the SSD
status.
3. For each revoked SSD, perform the steps in “No Rebuild in Progress” on page 47.


Failed to Rebuild

To identify an SSD that has failed in the XDP Group, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the SSD status, using the following command:
show-ssds cluster-id="<cluster name>"
3. Note any SSDs in failed_in_rg state.
4. For each failed SSD, perform the steps in “No Rebuild in Progress” on page 47.

To identify an SSD that has failed in the XDP Group, using the GUI:
1. Hover the mouse pointer over the defective SSD; a tooltip appears showing the SSD
status.
2. For each failed SSD, perform the steps in “No Rebuild in Progress” on page 47.

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


Replacing a DAE Chassis



If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).

Tolerance
• Failure of a DAE chassis results in loss of service.

Identifying the Defective DAE Chassis

To identify the defective DAE chassis, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the DAE Chassis status, using the following command:
show-daes cluster-id="<cluster name>"
3. Note the Index of the DAE chassis with a non-healthy state.

To identify the defective DAE chassis, using the GUI:

• From the GUI, view the Inventory; the defective DAE chassis appears in orange.

Physically Locating the Defective DAE Chassis (Using LEDs)

To activate the DAE chassis identification LED, using the CLI:


1. Log in to the XMS CLI as tech.
2. Enter the control-led CLI command to locate the defective DAE chassis.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="DAE"
led-mode="blinking" object-id-list=[1]
This causes the LED to blink on DAE chassis number 1 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix C.

To activate the DAE chassis identification LED, using the GUI:

• For instructions on using LEDs to identify components, refer to Appendix C.


Checking the XtremIO Cluster Health


Before replacing the defective DAE chassis, check the cluster’s health by using the
XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.


Replacing the Defective DAE Chassis

To remove the defective DAE chassis:


1. Log in to the XMS CLI as tech.
2. Run the following command to prevent unnecessary notifications:
disable-notifiers
3. If the cluster is still running, stop the cluster using the following command:
stop-cluster cluster-id="<cluster name>"


Verify that you specify the correct cluster name.

xmcli (tech)> stop-cluster


Warning: You are about to stop the cluster service. All connected initiators will be
denied access to cluster data.
Are you sure you want to stop Cluster Cluster_One [1]? (Yes/No): Yes
The stop process may take several minutes. Please wait for successful completion
prior to powering off the cluster.
[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] 100% Done! (elapsed time
00:03:06) Stopped Cluster Cluster_One [1]. Cluster state: stopped

4. If the cluster is in a factory-assembled rack, remove the shipping bracket.


5. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

6. If cables are not marked, label them so that you can reconnect them as required to the
new DAE chassis.
7. Disconnect the power cables from the DAE’s PSUs.
8. Disconnect the SAS cables from the DAE Controllers.
9. Remove the DAE Controller (LCC) units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing a DAE Controller
(LCC)”).
10. Remove the DAE power supply units from the defective DAE and immediately insert
them into the new DAE Chassis (for details, refer to “Replacing a DAE Power Supply”).
11. Remove the DAE bezel.
12. Remove each SSD (one at a time) from the defective DAE chassis and immediately
insert it into the same slot in the new DAE Chassis.


13. If you are replacing the DAE of a 10TB Starter X-Brick (5TB):
a. Remove the 12 plastic air seals from slots 13 through 24 of the defective DAE
chassis.
b. Insert the removed air seals into slots 13 through 24 of the new DAE chassis.

If you are replacing the DAE of a regular X-Brick, ignore this step.
14. Remove the four screws (two per side) that secure the front of the enclosure to the
front vertical channels of the cabinet, and save the screws.
15. With help from another person, slide the enclosure out of the cabinet.

Note: If the defective DAE chassis should be sent to EMC for Failure Analysis (FA), refer to
Appendix D for the procedure details.
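
Step 12 requires every SSD to go into the same slot in the new DAE chassis. One way to double-check this is to record an SSD-to-slot map before the swap and compare it with a map taken afterwards. The following is a minimal sketch of that comparison; how the two maps are gathered (for example, from the Slot # column of show-ssds, or written down by hand) is left to the engineer, and the function name is illustrative.

```python
def slot_mismatches(before, after):
    """Compare {ssd_name: slot} maps taken before and after the chassis swap."""
    problems = []
    for name, slot in sorted(before.items()):
        new_slot = after.get(name)
        if new_slot is None:
            problems.append(f"{name}: missing after swap (was slot {slot})")
        elif new_slot != slot:
            problems.append(f"{name}: slot {slot} -> {new_slot}")
    # SSDs that were not in the original map should not appear afterwards.
    for name in sorted(set(after) - set(before)):
        problems.append(f"{name}: unexpected SSD in slot {after[name]}")
    return problems
```

An empty list means that every recorded SSD is back in its original slot.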


To install the new DAE chassis:


1. Slide the DAE chassis into the DAE chassis rails in the cabinet. Ensure that the
enclosure is fully inside the cabinet. The rail stops in the back seat into the back of the
enclosure at the correct depth, and the front of the enclosure is aligned with the
cabinet face.
2. When the DAE chassis is in place, insert and tighten all of the screws. It may be easier
to install the screws working in a diagonal pattern, such as bottom left and top right,
bottom right and top left.

3. Reinstall the DAE bezel.


4. Connect the SAS cables.
5. Connect the power cables.
6. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

7. If you removed any shipping brackets, re-install them.


Configuring the Replaced DAE Chassis

To configure the replaced DAE chassis:


1. Log in to the XMS CLI as tech.
2. Display the DAE chassis, using the following command:
show-daes

Note: If the state of the DAE chassis is other than healthy, contact XtremIO Global
Tech Support.

3. Replace the DAE chassis, using the following command:
replace-dae dae-id=<ID> cluster-id="<cluster name>"
4. Wait for several seconds and make sure that the new DAE chassis is in a healthy state,
using the following command:
show-daes cluster-id="<cluster name>"

xmcli (tech)> show-daes


Name    Index  Serial-Number   State    FW-Version  Part-Number  Brick-Name  Index  Cluster-Name  Index
X1-DAE  1      APM00134401968  healthy  149         100-562-964  X1          1      xbrick238     1
X1-DAE  2      APM00134301084  healthy  149         100-562-964  X1          1      xbrick238     1

Note: If the state of the DAE chassis is other than healthy, contact XtremIO Global
Tech Support.
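
The health check on the show-daes output can also be applied to a saved copy of that output. The following is a sketch, assuming the ten whitespace-separated columns shown in the sample above (Name, Index, Serial-Number, State and so on); it returns every DAE row whose State is not healthy. The function name is illustrative, and the guide does not list the other State values that may appear.

```python
def unhealthy_daes(show_daes_output):
    """Return (name, index, state) for each DAE row whose State is not healthy."""
    rows = []
    for line in show_daes_output.splitlines():
        tokens = line.split()
        # Sample rows have ten whitespace-separated columns; skip the header row.
        if len(tokens) != 10 or tokens[0] == "Name":
            continue
        name, index, _serial, state = tokens[:4]
        if state != "healthy":
            rows.append((name, int(index), state))
    return rows
```

An empty result corresponds to the condition in step 4; any other result is a reason to stop and contact support before starting the cluster.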

5. Power up the cluster using the following command:


start-cluster cluster-id="<cluster name>"
6. Wait until the following message appears:
Cluster started
7. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"
8. To resume notifiers, run the following command:
restore-notifiers
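
The procedure brackets the chassis replacement between disable-notifiers and restore-notifiers. If the steps are ever scripted, a context manager is one way to make sure that the restore command is always issued. The following is a sketch only: run_command is a hypothetical callable that sends one command line to the XMS CLI, and restoring notifiers after a failed step is a design choice that may not suit every situation.

```python
from contextlib import contextmanager


@contextmanager
def notifiers_disabled(run_command):
    """Issue disable-notifiers on entry and restore-notifiers on exit."""
    run_command("disable-notifiers")
    try:
        yield
    finally:
        # Runs even if a step inside the block raises an exception.
        run_command("restore-notifiers")
```

For example, with a list standing in for the CLI session:

```python
sent = []
with notifiers_disabled(sent.append):
    sent.append('stop-cluster cluster-id="Cluster_One"')
print(sent)
```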

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).


Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.

Replacing a DAE Controller (LCC)



If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).

Tolerance
• Failure of both DAE Controllers (or all SAS cables) in the same X-Brick results in loss of
service.

Identifying the Defective DAE Controller

To identify the defective DAE Controller, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the DAE Controller status, using the following command:
show-daes-controllers cluster-id="<cluster name>"
3. Note the Index of the DAE Controller with a non-healthy state.

To identify the defective DAE Controller, using the GUI:

• From the GUI, view the Inventory; the defective DAE Controller appears in orange.


Physically Locating the Defective DAE Controller (Using LEDs)

To activate the DAE Controller identification LED, using the CLI:


1. Log in to the XMS CLI as tech.
2. Enter the control-led CLI command to locate the defective DAE Controller.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="DAELCC"
led-mode="blinking" object-id-list=[3]
This causes the LED to blink on DAE Controller number 3 on Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix C.

To activate the DAE Controller identification LED, using the GUI:

• For instructions on using LEDs to identify components, refer to Appendix C.

Checking the XtremIO Cluster Health


Before replacing the defective DAE Controller, check the cluster’s health by using the
XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.


Replacing the Defective DAE Controller

To remove the defective DAE Controller:


1. Log in to the XMS CLI as tech.
2. Run the following command to ensure that all Storage Controllers are healthy:
show-storage-controllers cluster-id="<cluster name>"

Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Tech Support before taking any further action.

3. If the cluster is in a factory-assembled rack, remove the shipping bracket from behind
the DAE to be serviced.
4. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

5. Make sure that the SAS cables are labeled. If not, label them as necessary, so that you
can reconnect them as required to the DAE Controller.
6. Disconnect the SAS cables from the defective DAE Controller.

Note: When disconnecting the cables it is important to note the ports the cables were
disconnected from, so that you can reconnect them to the same ports after installing
the new DAE Controller.
For cabling guidelines refer to XtremIO Storage Array Hardware Installation and
Upgrade Guide.


7. Remove the defective DAE Controller unit from the DAE as follows:
a. Locate the orange handle buttons on the DAE Controller handles.
b. Press the orange handle buttons to release the DAE Controller, pull the latches
outward, and remove the DAE Controller from its slot.

Note: If the defective DAE Controller should be sent to EMC for Failure Analysis (FA), refer
to Appendix D for the procedure details.

To install the new DAE Controller:


1. Connect the SAS cables to the new DAE Controller.
2. Pull out the latches on the DAE Controller and make sure that they stay in the open
position.
3. Align the DAE Controller with the chassis opening and gently push it straight into the
chassis. Make sure that the DAE Controller is completely seated in the chassis.
4. Press the latches to secure the DAE Controller.
5. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

6. If you removed any shipping brackets, re-install them.


Configuring the Replaced DAE Controller


1. Log in to the XMS CLI as tech.
2. Confirm that the new DAE Controller is available, using the following command:
show-daes-controllers cluster-id="<cluster name>"
3. Wait for 10 minutes and then run the following command:
show-storage-controllers cluster-id="<cluster name>"
Check the State column to ensure that all Storage Controllers are healthy.

Note: If one of the Storage Controllers is not operating correctly, contact XtremIO
Global Tech Support before taking any further action.

4. Replace the DAE Controller, using the following command:


replace-dae-controller dae-lcc-id=<ID> cluster-id="<cluster name>"
where ID is the Index of the defective DAE Controller.
5. Wait for several seconds and then run the following command:
show-daes-controllers cluster-id="<cluster name>"
Make sure that for the new DAE Controller, the State column displays healthy.
6. Wait for 10 minutes and then run the following commands:
show-clusters
show-modules cluster-id="<cluster name>"
Make sure that the cluster and modules are active.
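
The configuration steps above alternate between commands and waiting periods. The following sketch lays them out as (wait, command) pairs so that the order and the delays are explicit. The 600-second value corresponds to the 10-minute waits in steps 3 and 6; the 10-second value for "several seconds" in step 5 is an assumption. Both function names and the run_command callable are hypothetical, and in practice the engineer reviews the output of each command before continuing.

```python
import time


def lcc_configuration_steps(cluster_name, lcc_id, short_wait=10.0, long_wait=600.0):
    """Return (seconds_to_wait_first, command) pairs in procedure order."""
    cluster = f'cluster-id="{cluster_name}"'
    return [
        (0.0, f"show-daes-controllers {cluster}"),
        (long_wait, f"show-storage-controllers {cluster}"),
        (0.0, f"replace-dae-controller dae-lcc-id={lcc_id} {cluster}"),
        (short_wait, f"show-daes-controllers {cluster}"),
        (long_wait, "show-clusters"),
        (0.0, f"show-modules {cluster}"),
    ]


def run_steps(steps, run_command, sleep=time.sleep):
    """Wait as required before each command, then pass it to run_command."""
    for wait_seconds, command in steps:
        if wait_seconds:
            sleep(wait_seconds)
        run_command(command)
```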

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


Replacing a DAE Power Supply


Tolerance
• Failure of a single DAE power supply has no impact on service.
• Failure of both DAE power supplies in the same DAE results in loss of service.

Identifying the Defective DAE Power Supply

To identify the defective DAE power supply, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the DAE power supply status, using the following command:
show-daes-psus cluster-id="<cluster name>"
3. Note the Index of DAE power supplies with a non-healthy state.

To identify the defective DAE power supply, using the GUI:


• From the GUI, view the Inventory; the defective DAE power supply appears in orange.

Checking the XtremIO Cluster Health


Before replacing the defective DAE power supply, check the cluster’s health by using the
XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.


Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective DAE Power Supply

Note: Access to the disks in the DAE times out two minutes after a DAE power supply unit
(PSU) is removed. The system continues operating on the remaining PSU, but the removed
PSU must be replaced within two minutes to avoid the timeout. After replacing a DAE PSU,
make sure that the green light on the PSU stays solidly on for at least five seconds
before removing power from the second PSU.

To remove the defective DAE power supply:


1. If the cluster is in a factory-assembled rack, remove the shipping bracket.
2. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.

3. Disconnect the power cable from the defective DAE power supply.

Note: Ensure that the new DAE PSU is prepared for insertion.


4. Remove the defective DAE power supply.

Note: If the defective DAE power supply should be sent to EMC for Failure Analysis (FA),
refer to Appendix D for the procedure details.

To install the new DAE power supply:


1. Insert the new DAE power supply.

2. Connect the DAE power supply power cable. A green light indicates that the DAE power
supply is successfully connected.


3. If you initially tilted the cable management bracket's tray (up/down) on the Storage
Controller adjacent to the DAE, return it to its original position, by pulling the latches
(on the left and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Note: If there are two Storage Controllers adjacent to each other, first return the cable
management bracket's tray nearest to the component being replaced, to its original
position, and then return the second tray.

4. If you removed any shipping brackets, re-install them.

Configuring the Replaced DAE Power Supply


1. Log in to the XMS CLI as tech.
2. Replace the DAE power supply, using the following command:
replace-dae-psu dae-psu-id=<ID> cluster-id="<cluster name>"
where ID is the Index of the defective DAE power supply.
3. Wait for several seconds and make sure that the new DAE power supply is in healthy
state, using the following command:
show-daes-psus cluster-id="<cluster name>"
4. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id="<cluster name>"

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


CHAPTER 4
Replacing InfiniBand Switch Components

This chapter includes the following topics:


• Replacing the InfiniBand Switch
• Replacing InfiniBand Switch Power Supply Units
• Replacing InfiniBand Switch Fan Units


Replacing the InfiniBand Switch



If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).

Note: In versions below 2.2.2.10, the names, indexes, and locations of the InfiniBand
Switches are reversed in the XMS GUI and CLI. Make sure that you operate on the switch
other than the one indicated in the Hardware View. As a best practice, compare the
switch’s actual S/N to the one presented in the GUI. It is always advisable to check the
cable connections and LED activity on the Storage Controllers' IB NICs to make sure that
you are operating on the correct switch.

Tolerance
• Failure of a single InfiniBand Switch compromises redundancy: the cluster becomes
vulnerable to a failure of the second InfiniBand Switch.
• Failure of both InfiniBand Switches in the same cluster results in loss of service.

Identifying the Defective InfiniBand Switch

Note: System Status LEDs are located at the front and rear of the InfiniBand Switch. A solid
red LED indicates that a major error has occurred.

To identify the defective InfiniBand Switch, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the InfiniBand Switch status, using the following command:
show-infiniband-switches cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

3. Note the Index of the defective InfiniBand Switch.

To identify the defective InfiniBand Switch, using the GUI:


• From the GUI, view the Hardware; the defective InfiniBand Switch appears in orange.


Checking the XtremIO Cluster Health


Before replacing the defective InfiniBand Switch, check the cluster’s health by using the
XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective InfiniBand Switch

Note: Make sure that all cables are clearly labeled to enable proper connection to the new
InfiniBand Switch.

To remove the defective InfiniBand Switch:


1. Remove the InfiniBand Switch Bezel.
2. If necessary, from the rear side of the Storage Controller located adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right side of the
cable management bracket, and then push the tray either up or down.
3. From the front of the InfiniBand Switch, disconnect the InfiniBand Switch power
cables.
4. Disconnect any other InfiniBand Switch cables.
5. If a shipping bracket is installed directly above or below the InfiniBand Switch, remove
it to prevent damage to the foam padding.


6. Carefully remove the InfiniBand Switch from the rack, taking care not to disconnect
any other cables.
7. Note the position of the inner rails on the defective InfiniBand Switch, so as to mount
them at the exact same position, on the new InfiniBand Switch.
8. Remove the inner rails from the InfiniBand Switch.

Note: It is recommended to remove and install one rail (for reference) before removing
the second rail.

[Figure: InfiniBand Switch (labels: ports 1 to 18, PS1, PS2, UID, RST)]

Note: If the defective InfiniBand Switch should be sent to EMC for Failure Analysis (FA),
refer to Appendix D, page D-101 for the procedure details.

To install the new InfiniBand Switch:


1. Align the screw holes of each inner rail with those on the side of the InfiniBand Switch,
as noted in step 7 of the removal procedure.

Note: Verify that the correct holes are aligned to ensure that the depth of the
InfiniBand Switch within the rack is adjusted correctly.

2. Secure each inner rail to the InfiniBand Switch, using three screws.
3. Lift the InfiniBand Switch and slide it onto the rails.


4. Align the screw hole of each bezel clip with those on the front side of the inner rails
(one on each side).

[Figure: InfiniBand Switch (labels: ports 1 to 18, PS1, PS2, UID, RST)]

5. Through each bezel clip, tighten a screw (one on each side) to secure the unit to the rack.
6. Connect the InfiniBand Switch power cables.
7. Connect the InfiniBand Switch interlink cables (labeled IBSW1-P17 and IBSW1-P18).
8. If you removed a shipping bracket directly above or below the InfiniBand Switch,
re-install it.
9. Wait for the interlinks to synchronize, as shown by the green LEDs on the InfiniBand
Switch associated ports.
10. Connect the remaining InfiniBand cables from the Storage Controllers.
11. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

12. Install the bezel.


Configuring the Replaced InfiniBand Switch

To configure the InfiniBand Switch:


1. Log in to the XMS CLI as tech.
2. Wait for several seconds and run the following command:
replace-infiniband-switch ibswitch-id=<id> cluster-id=<cluster name>

Note: Make sure you are configuring the InfiniBand Switch that was just replaced, and
not the existing InfiniBand Switch.

3. Wait for several seconds and then run the following command:
show-infiniband-switches cluster-id=<cluster name>
Make sure that for the new InfiniBand Switch, the State column displays healthy.
4. Verify that the PSU is healthy, by running the following command:
show-infiniband-switches-psus cluster-id=<cluster name>
5. Verify that the cluster and modules are active, by running the following commands:
show-clusters
show-modules cluster-id=<cluster name>
The output for show-clusters when the cluster is online:

xmcli (tech)> show-clusters


Cluster-Name  Index  State   Gates-Open  Conn-State  Num-of-Vols  Num-of-Internal-Volumes  Vol-Size  UD-SSD-Space  Logical-Space-In-Use...
xbrick335     1      active  True        connected   18           12                       9.712T    30.489T       449.60G


The output for show-modules when all modules are active:

Module-Name  Index  Cluster-Name   Index  XEnv-Name  Index  Storage-Controller-Name  Index  Assigned-To-LCC  Module-Type  State
X1-SC1-R1    1      xbrick700-701  1      X1-SC1-E1  1      X1-SC1                   1                       ROUTER       active
X1-SC1-C1    2      xbrick700-701  1      X1-SC1-E1  1      X1-SC1                   1                       CONTROL      active
X1-SC1-D1    3      xbrick700-701  1      X1-SC1-E1  1      X1-SC1                   1      X1-DAE-LCC-B     DATA         active
X1-SC1-R2    4      xbrick700-701  1      X1-SC1-E2  2      X1-SC1                   1                       ROUTER       active
X1-SC1-C2    5      xbrick700-701  1      X1-SC1-E2  2      X1-SC1                   1                       CONTROL      active
X1-SC1-D2    6      xbrick700-701  1      X1-SC1-E2  2      X1-SC1                   1      X1-DAE-LCC-A     DATA         active
X1-SC2-R1    7      xbrick700-701  1      X1-SC2-E1  3      X1-SC2                   2                       ROUTER       active
X1-SC2-C1    8      xbrick700-701  1      X1-SC2-E1  3      X1-SC2                   2                       CONTROL      active
X1-SC2-D1    9      xbrick700-701  1      X1-SC2-E1  3      X1-SC2                   2      X1-DAE-LCC-B     DATA         active
X1-SC2-R2    10     xbrick700-701  1      X1-SC2-E2  4      X1-SC2                   2                       ROUTER       active
X1-SC2-C2    11     xbrick700-701  1      X1-SC2-E2  4      X1-SC2                   2                       CONTROL      active
X1-SC2-D2    12     xbrick700-701  1      X1-SC2-E2  4      X1-SC2                   2      X1-DAE-LCC-A     DATA         active
X2-SC1-R1    13     xbrick700-701  1      X2-SC1-E1  5      X2-SC1                   3                       ROUTER       active
X2-SC1-C1    14     xbrick700-701  1      X2-SC1-E1  5      X2-SC1                   3                       CONTROL      active
X2-SC1-D1    15     xbrick700-701  1      X2-SC1-E1  5      X2-SC1                   3      X2-DAE-LCC-B     DATA         active
X2-SC1-R2    16     xbrick700-701  1      X2-SC1-E2  6      X2-SC1                   3                       ROUTER       active
X2-SC1-C2    17     xbrick700-701  1      X2-SC1-E2  6      X2-SC1                   3                       CONTROL      active
X2-SC1-D2    18     xbrick700-701  1      X2-SC1-E2  6      X2-SC1                   3      X2-DAE-LCC-A     DATA         active
X2-SC2-R1    19     xbrick700-701  1      X2-SC2-E1  7      X2-SC2                   4                       ROUTER       active
X2-SC2-C1    20     xbrick700-701  1      X2-SC2-E1  7      X2-SC2                   4                       CONTROL      active
X2-SC2-D1    21     xbrick700-701  1      X2-SC2-E1  7      X2-SC2                   4      X2-DAE-LCC-B     DATA         active
X2-SC2-R2    22     xbrick700-701  1      X2-SC2-E2  8      X2-SC2                   4                       ROUTER       active
X2-SC2-C2    23     xbrick700-701  1      X2-SC2-E2  8      X2-SC2                   4                       CONTROL      active
X2-SC2-D2    24     xbrick700-701  1      X2-SC2-E2  8      X2-SC2                   4      X2-DAE-LCC-A     DATA         active
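With 24 rows or more, checking the State column by eye is error-prone. The following is a minimal Python sketch, not an EMC tool, for checking show-modules output that has been saved or pasted as text. It assumes that the text starts with the header row and that State is the last column on every line; the function name inactive_modules is hypothetical. Because the Assigned-To-LCC column is empty for ROUTER and CONTROL modules, the sketch reads only the first and last fields of each row.

```python
SAMPLE = """\
Module-Name Index Cluster-Name Index XEnv-Name Index Storage-Controller-Name Index Assigned-To-LCC Module-Type State
X1-SC1-R1 1 xbrick700-701 1 X1-SC1-E1 1 X1-SC1 1 ROUTER active
X1-SC1-C1 2 xbrick700-701 1 X1-SC1-E1 1 X1-SC1 1 CONTROL active
X1-SC1-D1 3 xbrick700-701 1 X1-SC1-E1 1 X1-SC1 1 X1-DAE-LCC-B DATA active
"""


def inactive_modules(show_modules_output):
    """Return the names of modules whose State (last field) is not 'active'."""
    rows = [line.split() for line in show_modules_output.splitlines() if line.strip()]
    # rows[0] is the header. Rows have 10 or 11 fields, so index from the ends only.
    return [row[0] for row in rows[1:] if row[-1] != "active"]


print(inactive_modules(SAMPLE))  # prints []
```

An empty list means that every listed module is active; any name that is returned should be investigated before continuing.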

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


Replacing InfiniBand Switch Power Supply Units


Note: Replacement of an InfiniBand Switch power supply unit is only supported for
XtremApp version 4.0.2 (or higher).

InfiniBand Switches are equipped with two replaceable power supply units that work in a
redundant configuration. Either unit may be extracted without bringing down the system.

Note: Make sure that the power supply unit that you are NOT replacing is showing all
green, for both the power supply unit and System Status LEDs.

Tolerance
• Failure of a single InfiniBand Switch power supply unit does not affect the InfiniBand
Switch operation.
• Failure of both InfiniBand Switch power supply units will lead to an InfiniBand Switch
failure.

Identifying the Defective InfiniBand Switch Power Supply Unit

To identify the defective InfiniBand Switch power supply unit, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the InfiniBand Switch power supply unit status, using the following command:
show-infiniband-switches-psus cluster-id=<cluster name>

xmcli (tech)> show-infiniband-switches-psus


Name         Index  Index-In-Cluster  Location  Input-Power  State
IB-SW1-PSU1  1      1                 left      on           healthy
IB-SW1-PSU2  2      2                 right     on           healthy
IB-SW2-PSU1  3      1                 left      on           healthy
IB-SW2-PSU2  4      2                 right     on           healthy

3. Note the Index of the InfiniBand Switch power supply unit with a non-healthy state.
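Step 3 can be illustrated with a minimal Python sketch, not an EMC tool, that extracts the Name and Index of every power supply unit whose State is not healthy. It assumes that the command output has been saved as text beginning with the header row, with one value per column as in the sample above; the function name unhealthy_psus is hypothetical.

```python
SAMPLE = """\
Name Index Index-In-Cluster Location Input-Power State
IB-SW1-PSU1 1 1 left on healthy
IB-SW1-PSU2 2 2 right on healthy
IB-SW2-PSU1 3 1 left on healthy
IB-SW2-PSU2 4 2 right on healthy
"""


def unhealthy_psus(output):
    """Return (Name, Index) for each row whose State column is not 'healthy'."""
    header, *rows = [line.split() for line in output.splitlines() if line.strip()]
    records = [dict(zip(header, row)) for row in rows]
    return [(r["Name"], int(r["Index"])) for r in records if r["State"] != "healthy"]


print(unhealthy_psus(SAMPLE))  # prints []
```

Mapping each row onto the header names works here because none of the six columns is empty or contains spaces.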


Checking the XtremIO Cluster Health


Before replacing the defective InfiniBand Switch Power Supply Unit, check the cluster’s
health by using the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective InfiniBand Switch Power Supply Unit

To remove the defective InfiniBand Switch power supply unit:


1. Unlatch the power cord retainer, and remove the power cord from the power supply
unit.
2. Grasping the handle with your right hand, push the latch release with your thumb
while pulling the handle outward. As the power supply unit unseats, the power supply
unit status LEDs turn off.

[Figure: InfiniBand Switch power supply unit, with callouts "Push Here" and "Release Latch"]


3. Remove the power supply unit.

To insert the new InfiniBand Switch power supply unit:


1. Make sure that the mating connector of the new power supply unit is free of dirt
and/or obstacles.

Note: Do not attempt to insert a power supply unit with a power cord connected to it.

2. Insert the power supply unit by sliding it into the opening until a slight resistance is
felt.
3. Continue pressing the power supply unit until the latch snaps into place, confirming
proper installation.
4. Insert the power cord into the power supply unit connector, until the power cord
retainer is latched.

Note: The green power supply unit indicator should illuminate. If not, repeat the whole
procedure to extract the power supply unit, and re-insert it.

Note: Make sure that the latches are engaged and the tray is locked in its position.

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


Replacing InfiniBand Switch Fan Units


Note: Replacement of an InfiniBand Switch fan unit is only supported for XtremIO XIOS
versions 4.0.2 or later.

Tolerance
• Failure of one or more fan units does not affect the InfiniBand Switch operation, as
long as the ambient temperature is below 45° Celsius.
• If one or more fan units fail and the ambient temperature exceeds 45° Celsius, the
InfiniBand Switch fails.

Note: Operation without a fan unit should not exceed two minutes.
During a fan hot-swap procedure, if the LED indicator is OFF, the fan unit is
disconnected.

Note: Make sure that the fans have the air flow that matches the model number. An air
flow opposite to the system design will cause the system to operate at a higher (less
than optimal) temperature.

Identifying the Defective InfiniBand Switch Fan Unit

To identify the defective InfiniBand Switch fan unit, using the CLI:
1. Log in to the XMS CLI as tech.
2. List the InfiniBand Switch status, using the following command:
show-infiniband-switches cluster-id=<cluster name>

xmcli (tech)> show-infiniband-switches


Name    Index  Index-In-Cluster  Serial-Number       Part-Number                          State    FW-Version  FW-Version-Error  FAN-Drawer-State
IB-SW1  1      1                 0xf45214030046ee90  *** SwitchX - Mellanox Technologies  healthy  09.03.0000  no_error          failed
IB-SW2  2      2                 0xf45214030046ee10  *** SwitchX - Mellanox Technologies  healthy  09.03.0000  no_error          healthy

3. Note the Index of the defective InfiniBand Switch fan unit.
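In the sample output above, the switch State is healthy on both rows, and the fan fault appears only in the FAN-Drawer-State column. The following is a minimal Python sketch, not an EMC tool, that reads that column from saved command output. It assumes that the text begins with the header row and that FAN-Drawer-State is the last field on each line; the function name switches_with_failed_fans is hypothetical. Because the Part-Number value contains spaces, the sketch reads fields from the start and end of each row rather than by column position.

```python
SAMPLE = """\
Name Index Index-In-Cluster Serial-Number Part-Number State FW-Version FW-Version-Error FAN-Drawer-State
IB-SW1 1 1 0xf45214030046ee90 *** SwitchX - Mellanox Technologies healthy 09.03.0000 no_error failed
IB-SW2 2 2 0xf45214030046ee10 *** SwitchX - Mellanox Technologies healthy 09.03.0000 no_error healthy
"""


def switches_with_failed_fans(output):
    """Return (Name, Index) for each switch whose FAN-Drawer-State is not 'healthy'."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # rows[0] is the header; Name and Index are the first two fields of each data row.
    return [(row[0], int(row[1])) for row in rows[1:] if row[-1] != "healthy"]


print(switches_with_failed_fans(SAMPLE))  # prints [('IB-SW1', 1)]
```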


Checking the XtremIO Cluster Health


Before replacing the defective InfiniBand Switch Fan Unit, check the cluster’s health by
using the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.

Note: You can access the EMC SolVe Desktop at:


https://solve.emc.com/desktopbinaries/setup.exe

The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="<cluster name>"


For guidance on running the XtremIO Health-Check Script and on resolving its output, refer
to EMC KB # 206076 (https://support.emc.com/kb/206076). If an unexpected error is
reported by the HCS, submit a standard Service Request to XtremIO Global Technical
Support.

Executing the Encryption-Recovery Procedure


Execute an encryption recovery procedure on the XtremIO cluster before starting this
replacement procedure. Refer to EMC KB# 482666 for details
(https://support.emc.com/kb/482666).


Failure to follow the above step may lead to data loss on the affected XtremIO cluster.

Replacing the Defective InfiniBand Switch Fan Unit

To remove the defective InfiniBand Switch fan unit:


1. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right side of the
cable management bracket, and then push the tray either up or down.
2. Unseat the fan unit by grasping the handle with your right hand and pushing the latch
release with your thumb while pulling the handle outward. As the fan unit unseats, the
fan unit status LEDs turn off.


3. Remove the fan unit.

[Figure: InfiniBand Switch fan unit, with two "Push Here" callouts]

To install the new InfiniBand Switch fan unit:


1. Make sure that the mating connector of the new unit is free of dirt and/or obstacles.
2. Insert the fan unit by sliding it into the opening until slight resistance is felt. Continue
pressing the fan unit until it seats completely.


The green Fan Status LED should illuminate. If not, extract the fan unit and reinsert it.
After two unsuccessful attempts to install the fan unit, contact XtremIO Global Tech
Support for guidance and directions. No further action should be taken without
explicit direction from XtremIO Global Tech Support.

3. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by pulling the latches (on the left and right side of the bracket) until
the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in its position.

4. Identify the status of the fan unit via the XMS CLI, using the following command:
show-infiniband-switches cluster-id=<cluster name>

Checking the XtremIO Cluster Health (Post Replacement)


After completing the replacement procedure, it is necessary to check the cluster’s health
again, by running the XtremIO Health Check Script (HCS).
Download the latest HCS, available on the EMC XtremIO SolVe generator.
The following example shows the script for running an XtremIO HCS on the first cluster that
is connected to the XMS:
run-script script="system_health-vXXX.X.X-s4.0.0.py"
arguments="--cluster-id 1"

Note: For guidance on running the XtremIO Health-Check Script and on resolving its
output, refer to EMC KB # 206076 (https://support.emc.com/kb/206076). If an
unexpected error is reported by the HCS, submit a standard Service Request to XtremIO
Global Technical Support.


CHAPTER 5
Replacing Battery Backup Units

The Battery Backup Unit replacement procedure should be performed, using the XtremIO
Technician Advisor utility, following a Service Request (SR) determined by XtremIO Global
Technical Support. If you have any questions or encounter problems, contact XtremIO
Global Technical Support. Technician Advisor is initially used to identify defective Battery
Backup Units on the cluster, and is then used to replace each Battery Backup Unit that is
identified as defective.

This chapter includes the following topics:


• Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility
• Replacing a Serial Communication Cable for a 5P 1550i BBU


Replacing a Battery Backup Unit (BBU) Using the Technician Advisor Utility

The Battery Backup Unit is heavy and should be removed from and installed into the rack
by two people. To avoid personal injury and/or damage to the equipment, do not attempt
to lift or install the BBU without a mechanical lift and/or help from another person.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance to pause in RecoverPoint, contact RecoverPoint Global
Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).

Battery Backup Unit Types


Battery Backup Units of an XtremIO cluster can be of one of the following types:
• 5P 1550i R
• 1550 Evolution

Tolerance
• Failure of more than half of the BBUs in the same cluster results in loss of service.
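The "more than half" rule is a strict comparison, which the following minimal Python sketch makes explicit. It is an illustration only: the function name bbu_failure_causes_outage and the total of four BBUs in the example are assumptions, not values taken from a specific cluster configuration.

```python
def bbu_failure_causes_outage(failed_bbus, total_bbus):
    """Return True when more than half of the cluster's BBUs have failed."""
    # Strictly more than half: exactly half does not meet the stated condition.
    return 2 * failed_bbus > total_bbus


# Hypothetical cluster with four BBUs.
for failed in range(5):
    print(failed, "of 4 failed ->", bbu_failure_causes_outage(failed, 4))
```

For the hypothetical four-BBU cluster, the condition is first met at three failed units.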

Identifying the Defective BBU

To identify the defective BBU, using the CLI:


1. Log in to the XMS CLI as tech.
2. List the BBUs status, using the following command:
show-bbus cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

3. Note the Index of the BBUs with a non-healthy state.

To identify the defective BBU, using the GUI:


• From the GUI, view the Inventory; the defective BBU appears in orange.


Replacing a BBU
Once a defective Battery Backup Unit is identified, the Battery Backup Unit replacement
procedure must be performed using the Technician Advisor utility.

Note: For details on the XtremIO Technician Advisor utility, refer to the XtremIO Technician
Advisor Utility User Guide, which is posted in the XtremIO SolVe Generator, under Service
Scripts and Utilities > XtremIO Technician Advisor.

Note: If the XtremIO Technician Advisor Utility User Guide instructs that the Technician
Advisor utility cannot be used to replace Battery Backup Units on your cluster, contact
XtremIO Global Tech Support for directions on how to manually replace the Battery Backup
Units.


Replacing a Battery Backup Unit manually may lead to data loss if not performed
correctly. Therefore, make every effort to use Technician Advisor to replace a cluster’s
Battery Backup Unit automatically.

Replacing a Serial Communication Cable for a 5P 1550i BBU


Note: These instructions are not applicable for 1550 Evolution BBU serial communication
cables.


Incorrect replacement of 5P 1550i BBU serial communication cables may result in
damage to connectors and/or component ports.

5P 1550i Battery Backup Units are supplied with DB9-RJ45 serial data cables
accompanied by DB9-RJ50 adapters, or with RJ45-RJ50 serial communication cables with
labeling clearly indicating which devices and ports to plug into, depending on the XtremIO
hardware version in use.
A defective cable and/or cable adapter of this type must be replaced with a new RJ45-RJ50
serial communication cable.

Note: Replacement RJ45-RJ50 serial communication cables may not be labeled to indicate
which devices and ports to plug into.

This section describes recabling a 5P 1550i BBU to a Storage Controller using a
replacement RJ45-RJ50 serial communication cable.


Tolerance
• In single X-Brick clusters, a failure of both communication cables (one for each BBU)
results in loss of service.
• In multiple X-Brick clusters, a failure of more than half of the overall communication
cables in the cluster results in loss of service.
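The tolerance rule above can be expressed as simple arithmetic. The following is a minimal sketch, not part of any XtremIO tooling; the function name is hypothetical, and it assumes one serial communication cable per BBU, as stated above.

```python
def cable_failures_cause_loss_of_service(total_cables, failed_cables):
    """Return True when more than half of the cluster's cables have failed.

    Hypothetical helper for illustration only. For a single X-Brick cluster
    (two cables) this is True only when both cables have failed, which
    matches the first tolerance rule.
    """
    if not 0 <= failed_cables <= total_cables:
        raise ValueError("failed_cables must be between 0 and total_cables")
    # "More than half" is a strict comparison: exactly half is still tolerated.
    return 2 * failed_cables > total_cables


# Single X-Brick cluster: two cables.
print(cable_failures_cause_loss_of_service(2, 1))
print(cable_failures_cause_loss_of_service(2, 2))
```

With two cables, one failure is tolerated and two failures are not; with four cables, two failures are tolerated and three are not.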

Verifying Failed Serial Communication Cables

To verify whether failed serial communication cables exist within the cluster:
1. Log in to the XMS CLI as tech.
2. Run the following command:
show-bbus cluster-id="<cluster name>"

Name    Index  Model           Serial-Number  Power-Feed  State    Connectivity-State  Enabled-State  Input  Battery-Charge  BBU-Load  Voltage  FW-Version  Part-Number  Brick-Name  Index  Cluster-Name     Index  ...
X1-BBU  1      Evolution 1550  DV0P2308A      PWR-A       healthy  connected           enabled        on     100             24        210      9901DC      078-000-114  X1          1      xtremio-svt-003  1      ...
X2-BBU  2      Evolution 1550  DV0P23078      PWR-B       healthy  sc_2_disconnected   enabled        on     100             22        211      9901DC      078-000-114  X2          2      xtremio-svt-003  2      ...
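When the show-bbus output has been saved as text, the rows whose Connectivity-State is not "connected" can be picked out with a short script. The following is a sketch only, under stated assumptions: the function name is hypothetical, Power-Feed values are assumed to look like PWR-A or PWR-B (as in the sample output), and the State and Connectivity-State values are assumed to follow the Power-Feed value directly.

```python
import re

# Assumption: Power-Feed values look like "PWR-A" / "PWR-B", as in the sample.
POWER_FEED = re.compile(r"^PWR-[A-Z]$")


def disconnected_bbus(show_bbus_output):
    """Return (name, connectivity_state) for each BBU row that is not 'connected'."""
    problems = []
    for line in show_bbus_output.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            # Locate the Power-Feed value; State and Connectivity-State follow it.
            # The header row has no such value, so it is skipped automatically.
            if POWER_FEED.match(token) and i + 2 < len(tokens):
                connectivity = tokens[i + 2]
                if connectivity != "connected":
                    problems.append((tokens[0], connectivity))
                break
    return problems


# Shortened copy of the sample output shown in this section.
SAMPLE = """\
Name Index Model Serial-Number Power-Feed State Connectivity-State Enabled-State
X1-BBU 1 Evolution 1550 DV0P2308A PWR-A healthy connected enabled
X2-BBU 2 Evolution 1550 DV0P23078 PWR-B healthy sc_2_disconnected enabled
"""

print(disconnected_bbus(SAMPLE))
```

Anchoring on the Power-Feed value avoids miscounting columns, because the Model value ("Evolution 1550") contains a space.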

Disabling All Notifiers


To disable all Notifiers:
1. Log in to the XMS CLI as tech.
2. Disable all Notifiers, using the following command:
disable-notifiers

xmcli (tech)> disable-notifiers


Event notifiers were disabled

Replacing the Defective Cable


To replace the defective cable:
1. If necessary, from the rear side of the connecting Storage Controller, tilt the cable
management bracket's tray (up/down) to gain better access, by simultaneously
pulling the latches on the left and right, and then pushing the tray downwards, as
described on page 19.
2. Disconnect the RJ45 end of the defective communication cable from the 10101 port of
the Storage Controller.


3. Disconnect the RJ50 end of the defective communication cable (or cable adapter) from
the COM (R) port of the BBU.

Note: Discard the defective communication cable (or cable and adapter).

4. Connect the RJ45 end of the replacement communication cable (as indicated in the
figure below) to the 10101 port of the Storage Controller.

[Figure: replacement communication cable, showing the RJ45 connector and the RJ50 connector (with 'Plug Boot')]

5. Connect the RJ50 end of the replacement communication cable (as indicated in the
figure above) to the COM (R) port of the BBU.

Note: Verify that the RJ50 end of the cable is connected to the BBU COM (R) port, and
that the RJ45 end of the cable is connected to the Storage Controller 10101 port.

6. If you initially tilted the cable management bracket's tray (up/down) of the connecting
Storage Controller, return it to its original position by pulling the latches (on the left
and right sides of the bracket) until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

Verifying Replaced Serial Communication Cables

To verify replaced serial communication cables:


1. Log in to the XMS CLI as tech.
2. Run the following command:
show-bbus cluster-id="<cluster name>"

Name    Index  Model           Serial-Number  Power-Feed  State    Connectivity-State  Enabled-State  Input  Battery-Charge  BBU-Load  Voltage  FW-Version  Part-Number  Brick-Name  Index  Cluster-Name     Index  ...
X1-BBU  1      Evolution 1550  DV0P2308A      PWR-A       healthy  connected           enabled        on     100             24        210      9901DC      078-000-114  X1          1      xtremio-svt-003  1      ...
X2-BBU  2      Evolution 1550  DV0P23078      PWR-B       healthy  connected           enabled        on     100             22        211      9901DC      078-000-114  X2          2      xtremio-svt-003  2      ...


Restoring All Notifiers


To restore all Notifiers:
1. Log in to the XMS CLI as tech.
2. Restore all Notifiers, using the following command:
restore-notifiers

xmcli (tech)> restore-notifiers


Event notifiers were restored
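If any of the CLI steps in this procedure are scripted, it helps to guarantee that the Notifiers are restored even when a step fails. The following is a minimal sketch of that pattern; run_command is a hypothetical callable that sends one XMCLI command string to an open session and is not part of any XtremIO tooling.

```python
from contextlib import contextmanager


@contextmanager
def notifiers_disabled(run_command):
    """Disable event notifiers on entry and always restore them on exit.

    run_command is a hypothetical callable (an assumption for this sketch)
    that sends a single XMCLI command string to an open session.
    """
    run_command("disable-notifiers")
    try:
        yield
    finally:
        # Runs even if the body raises, so notifiers are not left disabled.
        run_command("restore-notifiers")


# Demonstration with a stand-in that only records the commands it is given.
sent = []
with notifiers_disabled(sent.append):
    sent.append('show-bbus cluster-id="<cluster name>"')
print(sent)
```

The physical cable replacement itself still takes place by hand between the two commands; the sketch only covers the ordering of the CLI calls.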


APPENDIX A
Software Re-Installation

This section provides instructions for downloading and re-installing a software image on
the Storage Controller and XMS.
This section includes the following topics:
• Writing the XtremIO Rescue Image to a USB Drive
• Re-Installing a Storage Controller
• Re-Installing a Physical XMS


Writing the XtremIO Rescue Image to a USB Drive


Before writing the XtremIO rescue image to a USB drive, perform the following steps:

Note: Verify that you have a USB drive that is at least 2GB in capacity.

1. Locate the XtremIO Rescue Image on the XtremIO Global Tech Support page at
support.emc.com.
For details on which XtremIO Storage Controller Rescue Image or XtremIO virtual XMS
Rescue Image to download from the support page, refer to the latest Release Notes for
the installed XtremIO version.

Note: Before proceeding, access the EMC Support page and verify that the MD5
checksum of the package you downloaded matches the MD5 checksum that appears
on the support page for that package.

2. Download the image to the local machine where the USB drive will be created.

Note: Before you proceed, verify that the USB drive is available.
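The MD5 comparison described in the note above can be done with any checksum tool. The following is a minimal Python sketch using the standard hashlib module; the function names and the file name in the usage comment are placeholders chosen for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        # Read in chunks so that a large image does not have to fit in memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Compare a file's MD5 with the value published on the support page."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (placeholder file name and checksum):
# checksum_matches("rescue-image.img", "<MD5 from the support page>")
```

If the function returns False, download the package again before writing it to the USB drive.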

To write the XtremIO image to a USB drive (on Windows 7):


1. Download and unpack the Win32 Disk Imager utility
(http://sourceforge.net/projects/win32diskimager/).
2. Launch the Win32 Disk Imager utility on the local machine.
3. Insert the USB drive into the USB port on the Windows machine.
4. Under Image File in the Win32 Disk Imager dialog box, click the folder icon and select
the XtremIO Rescue Image file you downloaded earlier.
5. Under Device, click the drop-down menu and select the device drive letter for the USB
drive.

Note: Use Windows Explorer to make sure that the correct drive letter is selected.


6. Click Write to write the image file to the USB Drive; a warning appears to indicate that
existing data on the selected drive will be overwritten.

7. Verify that the correct drive letter is selected and click Yes to confirm.
8. Follow the write operation progress. When the operation is completed, a message
appears, indicating that the write was successful.

9. From the Windows Notification Area, click Safely Remove Hardware and Eject Media.

10. From the menu, select Eject USB drive.

Note: The menu option includes the USB drive’s brand name (e.g. "Eject Cruzer Blade"
appears when SanDisk Cruzer Blade USB drive is used).

Wait for the "Safe to remove hardware" message to appear in the Notification Area and
remove the USB drive.


Re-Installing a Storage Controller


Note: Unless instructed otherwise in this document, always consult with XtremIO before
reinstalling the Storage Controller.

An X-Brick Storage Controller image is available for USB flash drives to restore a Storage
Controller to its original state.

Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.

To re-install a Storage Controller:


1. Disconnect the InfiniBand and SAS cables from the Storage Controller.

Note: It is important to keep the affected Storage Controller isolated from the rest of
the XtremIO cluster, throughout the re-installation procedure.

2. Power-cycle the Storage Controller by unplugging and re-connecting its two power
cables.
3. As the Storage Controller powers up, press F6 to enter the Boot Device menu.
4. When prompted, type the BIOS password to display the Boot Device menu.

Note: If the Boot Device menu is not displayed, F6 was pressed too late. Go back to
step 1 and repeat the procedure.

5. In the Boot Device menu, select USB device.

Note: The menu option includes the USB drive’s brand name (e.g. "Cruzer Blade"
when a SanDisk Cruzer Blade USB drive is used).

6. When the Storage Controller has booted, select Install XtremApp from the GRUB
menu.


7. Wait for the installation to complete and for the Storage Controller to reboot.
8. Remove the USB drive.
9. Reconnect the InfiniBand and SAS cables to the Storage Controller.
For cabling guidelines, refer to the XtremIO Storage Array Hardware Installation and
Upgrade Guide.

Re-Installing a Physical XMS


Note: Always consult with XtremIO Technical Support before re-installing the physical
XMS.

An XMS image is available for USB flash drives to install a physical XMS node.
Extract the image to a USB flash drive (refer to “Writing the XtremIO Rescue Image to a USB
Drive” on page 88) and connect the USB flash drive to the XMS USB port.

Note: Before starting the procedure, verify that you have a KVM or keyboard and monitor
connected.

To re-install the physical XMS:


1. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access.
2. Power-cycle the XMS by unplugging and reconnecting its two power cables.
3. If you initially tilted the cable management bracket's tray (up/down), return it to its
original position, by moving the cable management tray (up/down) until the latches
are aligned and an audible click is heard.
4. As the XMS powers up, press F6 to enter the Boot Device menu.
5. When prompted, type the BIOS password to display the Boot Device menu.

Note: If the Boot Device menu is not displayed, F6 was pressed too late. Go back to
step 1 and repeat the procedure.

6. In the Boot Device menu, select USB device.

Note: The menu option includes the USB drive’s brand name (e.g. "Cruzer Blade"
when a SanDisk Cruzer Blade USB drive is used).


7. When the server has booted, select Install XMS from the GRUB menu.

8. Wait for the installation to complete and for the XMS to reboot.
9. Remove the USB drive.


APPENDIX B
Generating and Uploading a Log Bundle

This section provides instructions for generating an XtremIO log bundle and uploading it
to FTP.
This section includes the following topics:
• Generating and Collecting the Bundle
• Uploading the Bundle Collection


Generating and Collecting the Bundle


To generate and collect the bundle:
1. Log in to the XMS CLI as admin.
2. Issue a dossier package collection, using the following command:
xmcli (admin)> create-debug-info cluster-id="<cluster name>"

Note: It is recommended to use the cluster name (and not the cluster ID) as the cluster
identifier in cluster-related XMCLI commands.

Note: The cluster-id parameter is not mandatory for single cluster configurations.

The following message appears:

The process may take a while. Please do not interrupt.


Debug info collected and could be accessed via http://...

3. Copy the link into a web browser and download the package.

Uploading the Bundle Collection


To upload the package:
1. Connect to the XtremIO FTP, using either of the following methods:
• With an FTP client - connect to ftp://ftp.xtremio.com/ as the anonymous user, with
your email address as the password.
• With a browser - go to https://ftp.emc.com/. In the list box, select XtremIO, then
type anonymous as the user name and your email address as the password.
2. Create a directory whose name contains the customer name and the SR number (case
number).
For example:
Customer-12345678


APPENDIX C
Using LEDs to Identify Hardware Components

This section provides instructions for locating LEDs through CLI commands and using the
GUI.
This section includes the following topics:
• Hardware Components’ LEDs
• Using the GUI to Activate Identification LEDs
• Using the CLI to Activate the Identification LEDs


Hardware Components’ LEDs


Many of the XtremIO Storage Array hardware components are equipped with two LED types
that enable you to monitor the components’ health:
• Identification LED - Used to identify a component in the cluster.
• Status LED - Used to indicate the status of the component.
In addition to the actual LEDs on the physical hardware components, identical graphical
representations of the LEDs appear in the GUI’s Inventory, Graphical View image.
The possible states of the LEDs are:
• Off
• On (beacon)
Table 4 provides details of the hardware components’ LEDs.

Table 4 Hardware Components’ LEDs

Component                        Identification LED   Status LED
Storage Controller               Yes                  Yes
Storage Controller SSD           Yes                  Yes
Storage Controller HDD           Yes                  Yes
Storage Controller PSU & Fan     No                   Yes
DAE                              Yes                  Yes
DAE SSD                          Yes                  Yes (called "Data LED")
DAE Controller                   Yes                  Yes
DAE PSU & Fan                    No                   Yes
Battery Backup Unit              No                   Yes
InfiniBand Switch                No                   Yes
InfiniBand Switch PSU            No                   Yes
InfiniBand Switch Fan            No                   Yes
Physical XMS                     No                   Yes
Physical XMS PSU & Fan           No                   Yes


Using the GUI to Activate Identification LEDs


You can identify a component in the cluster by turning on its identification LED, or, if the
component has failed and does not respond, by turning on the LEDs of all other
components (all but the selected component).

To turn on a component’s identification LED:


1. In the Inventory, hover the mouse pointer over the relevant hardware component and
right-click to open the drop-down menu.
2. Select Turn On Identification LED for <component’s name>; a message appears, stating
that the component’s LED will be turned On/Off.
3. Click OK.

Note: If the component’s identification LED is already turned on, a check sign appears
next to the Turn On Identification LED option and the message box that follows states
that the LED will be turned off.

To turn all other identification LEDs on or off:


1. In the Inventory, hover the mouse pointer over the relevant hardware component and
right-click to open the drop-down menu.
2. Select Change all other <component type> Identification LEDs.

3. In the Change All Other Identification LEDs dialog box, select the desired state of the
LEDs (On or Off) and click OK; LEDs of all components, except for the LED of the
component you want to identify, change their state.


Using the CLI to Activate the Identification LEDs


You can identify a component in the cluster by turning on its identification LED, or, if the
component has failed and does not respond, by turning on the LEDs of all other
components (all but the selected component).
The control-led and show-leds commands are used for activating the LEDs.

control-led
The control-led command beacons the identification LED.

Input Parameter   Description               Value                                     Mandatory
cluster-id        Cluster ID                id: name or index                         No
entity            FRU                       'SSD', 'DAEController', 'LocalDisk',      Yes
                                            'StorageController', 'DAE'
inverse-mode      Apply on all except for   N/A                                       No
                  the specified one.
led-mode          The desired LED mode      'On' (1), 'Blinking' (2) or 'Off'         Yes
object-id-list    Object ID list            List of IDs: name or index;               Yes
                                            if class=node, format is
                                            ["X1-N1", "X1-N2"]

(1) On applies to 'LocalDisk' and 'StorageController' only.
(2) Blinking applies to 'SSD', 'DAEController' and 'DAE' only.
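The parameter rules above can be checked before a control-led command is typed or pasted into the CLI. The following is a sketch only: the function name is hypothetical, the command layout follows the control-led example given in Appendix E, and the lowercase led-mode value is an assumption based on that same example. The inverse-mode parameter is left out because its command-line form is not shown in this document.

```python
# Entities and LED-mode restrictions as listed in the control-led parameter table.
ENTITIES = {"SSD", "DAEController", "LocalDisk", "StorageController", "DAE"}
ON_ENTITIES = {"LocalDisk", "StorageController"}
BLINKING_ENTITIES = {"SSD", "DAEController", "DAE"}


def build_control_led(entity, led_mode, object_ids, cluster_name=None):
    """Build a control-led command string, rejecting unsupported combinations."""
    mode = led_mode.lower()  # assumption: the CLI accepts lowercase, as in the example
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity: {entity}")
    if mode not in {"on", "blinking", "off"}:
        raise ValueError(f"unknown led-mode: {led_mode}")
    if mode == "on" and entity not in ON_ENTITIES:
        raise ValueError("'On' applies to LocalDisk and StorageController only")
    if mode == "blinking" and entity not in BLINKING_ENTITIES:
        raise ValueError("'Blinking' applies to SSD, DAEController and DAE only")
    # Indexes are written bare, names are quoted.
    ids = ", ".join(str(i) if isinstance(i, int) else f'"{i}"' for i in object_ids)
    parts = ["control-led"]
    if cluster_name is not None:
        parts.append(f'cluster-id="{cluster_name}"')
    parts += [f'entity="{entity}"', f'led-mode="{mode}"', f"object-id-list=[{ids}]"]
    return " ".join(parts)


print(build_control_led("StorageController", "On", [1], cluster_name="Cluster_One"))
```

Rejecting an invalid entity and mode pairing up front avoids sending a command that the table above says cannot apply.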

LEDs Format Names


The format names for the objects that are returned by the control-led and show-leds
commands are shown in the table below.

Entity               Name Format Example
DAE                  X1-DAE
DAEController        X1-DAE-LCC-A
LocalDisk            X1-SC1-LocalDisk1
StorageController    X1-SC1
SSD                  wwn-0x5000cca02b0555dc

Note: ’1’ in X1 represents the X-Brick number.

Note: It is possible to have SC1, SC2 and/or LCC-A, LCC-B, etc. (per X-Brick).


show-leds
The show-leds command displays the values for the identification and status LEDs.

Output Parameter   Description
Entity             The type of the entity represented by the LED
Name               The name of the entity represented by the LED
Index              The index of the entity represented by the LED
Identify-Beacon    The identification LED status ('On', 'Blinking' or 'Off')
Status-Beacon      The status LED status ('On', 'Blinking' or 'Off')


APPENDIX D
Priority FA

This section provides instructions for shipping failed hardware parts to EMC for Failure
Analysis (FA).
When Failure Analysis is required, ship the failed parts to EMC via FedEx.

To send a failed hardware component to EMC for analysis:


1. Use one of the following shipping addresses:
• For returns inside the USA, Canada and Mexico:
EMC Corporation
111 Constitution Blvd
Franklin, MA 02038, USA
Attn: Bob Pontes, Tel: 508-435-1000
• For all other returns:
EMC Information Systems International
C/O WiseTek Solutions Ltd.
IDA Business and Technology Park
Carrigtwohill, Co. Cork, Ireland
Attn: Daniel O’ Leary, Tel: +353-21 4945888
2. Set the “Priority FA” flag in the CSI debrief and add in the debrief “Please route to
SBMT 3rd party, 50 Franklin, SLOC ST22”.
This step is critical to the FA process, and if not done correctly it will hinder the
product group’s ability to root cause failures.
3. Update the Priority FA ticket with the AWB# for tracking purposes and reply all with all
available tracking numbers (FedEx and Priority tag).


APPENDIX E
Manually Replacing Storage Controllers

This section provides procedures for manually replacing defective Storage Controllers
(without the use of the Technician Advisor Utility).


This manual installation should only be performed in situations where the Technician
Advisor Utility cannot be used.

This section includes the following topic:
• Replacing a Storage Controller Manually


Replacing a Storage Controller Manually



The manual Storage Controller replacement procedure should be performed following a
Service Request (SR) determined by XtremIO Global Technical Support.


If RecoverPoint is connected to an XtremIO cluster, notify the customer to pause the
activity of Consistency Groups that are configured to replicate with the cluster, using
RecoverPoint native replication, during this FRU procedure.
If the customer requires assistance with pausing the Consistency Groups in RecoverPoint,
contact RecoverPoint Global Tech Support.
If the customer is unable to perform this operation, do not perform this FRU procedure and
contact XtremIO Global Tech Support before taking any further action.
For further details, provide the customer with EMC KB# 479972
(https://support.emc.com/kb/479972).

Physically Locating the Defective Storage Controller (Using LEDs)

To activate the Storage Controller identification LED, using the CLI:


1. Log in to the XMS CLI as tech.
2. Enter the control-led CLI command to locate the defective Storage Controller.
For example, you can use the following command:
control-led cluster-id="Cluster_One" entity="StorageController" led-mode="on" object-id-list=[1]
This lights the identification LED on Storage Controller 1 in Cluster_One.

Note: For further details on using LEDs to identify components, refer to Appendix C.

To activate the Storage Controller identification LED, using the GUI:


• For instructions on using LEDs to identify components, refer to Appendix C.


Before proceeding to replace the defective Storage Controller, contact XtremIO Global Tech
Support for guidance and directions. No further action should be taken without explicit
direction from XtremIO Global Tech Support.


Replacing the Defective Storage Controller


Specific clusters are compatible with specific Storage Controllers, as listed below.
• On clusters with the following PSNT P/N:
    • PSNT P/N 900-586-002 (10TB X-Brick type)
  You can only install a Storage Controller with the following P/Ns:
    • SC: P/N 100-586-007-xx
    • SC: P/N 100-586-017-xx
• On clusters with one of the following PSNT P/Ns:
    • PSNT P/N 900-586-003 (20TB X-Brick type - Encryption Capable)
    • PSNT P/N 900-586-004 (10TB X-Brick type - Encryption Capable)
    • PSNT P/N 900-586-005 (5TB X-Brick type - Encryption Capable)
  You can only install a Storage Controller with the following P/Ns:
    • SC: P/N 100-586-017-xx
    • SC: P/N 100-586-018-xx
• On clusters with the following PSNT P/N:
    • PSNT P/N 900-586-006 (40TB X-Brick type - Encryption Capable)
  You can only install a Storage Controller with the following P/N:
    • SC: P/N 100-586-025-xx
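The compatibility list above can be treated as a lookup table when checking a replacement part before it is installed. The following is a minimal sketch; the names are hypothetical, the part numbers are copied from the list above, and the trailing "-xx" is assumed to stand for any revision suffix.

```python
# PSNT P/N -> Storage Controller P/N prefixes, copied from the list above.
COMPATIBLE_SC = {
    "900-586-002": ("100-586-007", "100-586-017"),
    "900-586-003": ("100-586-017", "100-586-018"),
    "900-586-004": ("100-586-017", "100-586-018"),
    "900-586-005": ("100-586-017", "100-586-018"),
    "900-586-006": ("100-586-025",),
}


def is_compatible(psnt_pn, sc_pn):
    """Return True if the Storage Controller P/N is listed for the cluster's PSNT P/N.

    Assumption: the "-xx" in the list stands for any revision suffix, so only
    the leading part of the Storage Controller P/N is compared.
    """
    allowed = COMPATIBLE_SC.get(psnt_pn, ())
    return any(sc_pn.startswith(prefix + "-") for prefix in allowed)


print(is_compatible("900-586-002", "100-586-017-01"))
print(is_compatible("900-586-006", "100-586-017-01"))
```

An unknown PSNT P/N is treated as incompatible, so that a part number missing from the list is never accepted silently.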


Do not remove the defective Storage Controller until the new Storage Controller is
configured by XtremIO Global Tech Support and is ready to take over.

To remove the defective Storage Controller:


1. Log in to the XMS CLI as tech.
2. XtremIO Global Tech Support will deactivate the defective Storage Controller.
3. Verify that the deactivation process is complete, by running the following command:
show-storage-controllers cluster-id="<cluster name>"
Verify that the value of the Enabled-State output parameter is user_disabled.
4. If necessary, from the rear side of the Storage Controller that is adjacent to the
component you are replacing, tilt the cable management bracket's tray (up/down) to
gain better access. Simultaneously pull the latches on the left and right sides of the
cable management bracket, and then push the tray either up or down.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.


5. Disconnect all cables from the back of the Storage Controller.

Note: Make sure that all cables are clearly labeled before disconnecting them from the
Storage Controllers. Do not proceed with the replacement procedure until all cables
that are connected to the Storage Controller are labeled.

Note: The disconnected cables can remain fastened to the cable management bracket
during the Storage Controller replacement procedure.

6. If required, release the cables from the cable tray of the cable management bracket
(mounted on the rear side of the Storage Controller) by releasing its cable straps.

7. Pull the tabs on both sides of the cable management bracket to release the bracket
from the Storage Controller’s inner rail.


8. Pull the cable management bracket out and remove it from the Storage Controller.

9. Remove the bezel that covers the front of the server as follows:
a. If the bezel is locked, unlock the bezel with the provided key.
b. Simultaneously press the tabs on both sides of the bezel to release it from its
latches, then pull the bezel off the component.

10. Remove the stabilizing screw behind the latch bracket on each side.

Note: A JIS screwdriver may be required if the rails are from an older version.


11. If a shipping bracket is installed directly above or below the server, remove it to
prevent damage to the foam padding.
12. Pull the server forward until it locks in place, then slide the blue disconnect tabs
forward to release the inner rails from the slide rails.

13. Remove each inner rail as follows:


a. On the middle of the inner rail, push in and hold the metal latch.
b. Push the rail forward to release the connection studs from the small end of the rail
notches.
c. When the connection studs are in the large end of the rail notches, release the
metal latch.
d. Pull the inner rails away from the server.

Note: After the Storage Controller is successfully replaced, send the defective Storage
Controller to EMC for Priority Failure Analysis (Priority FA). Refer to Appendix D for the
procedure details.


Execute the following procedure to install the new Storage Controller only when requested
by XtremIO Global Tech Support.

To install the new Storage Controller:


1. Attach an inner rail to each side of the server as follows:
a. Align the large end of the rail notches on the inner rail with the connection studs on
the side of the server.
b. Push the flat side of the inner rail onto the connection studs.
c. Slide the inner rail backwards along the server until the studs fit securely into the
small end of the rail notches.
An audible click indicates that the rail is secure.
2. From the front of the cabinet, align the inner rails that are attached to the server with
the channels on the inside of the slide rails.


3. Slide the server into the slide rails and push the server into the cabinet.
An audible click indicates that the slide rails are engaged and locked.
4. On the outside of each rail assembly, slide the blue disconnect tab forward to unlock
the server, and push the server completely into the cabinet.

5. If you removed a shipping bracket directly above or below the server, reinstall it.
6. To further secure the rail assembly and server in the cabinet, insert and tighten a small
stabilizer screw directly behind each bezel latch.
7. From the rear side of the Storage Controller, align the rails of the cable management
bracket with the server's inner rails.


8. Insert the rails of the cable management bracket onto the inner rails of the Storage
Controller.

9. Push to slide in the cable management bracket until an audible click is heard. This
indicates that the cable management bracket and the Storage Controller rails are
engaged and locked.
10. Tilt the cable tray down by simultaneously pulling both latches, on the left and right
sides of the cable management bracket, and then pushing the tray downwards.

Note: If there are two Storage Controllers adjacent to each other, first tilt the cable
management bracket's tray furthest from the component being replaced and then tilt
the tray of the other Storage Controller.


11. Connect the MGMT network cable to the Storage Controller’s "1" port (leftmost
port), and connect the InfiniBand, SAS, LAN and COM cables.

Note: Leave the FC/iSCSI cables disconnected until you are instructed to connect
them.


Note: Make sure that the InfiniBand, SAS, LAN and COM cables are properly
connected, before connecting the two power cables to the Storage Controller, and
powering on the Storage Controller.

12. Connect the two power cables to the Storage Controller.


13. Upon receiving the instruction from XtremIO Global Tech Support, press the Power
button to power up the Storage Controller.

Configuring the Replaced Storage Controller



To configure the new Storage Controller, contact XtremIO Global Tech Support.

Fastening the Storage Controller Cables to the Cable Management Bracket

Note: If the cables are already properly fastened to the cable management bracket, skip
steps 1 and 2, and proceed to step 3.

To fasten the Storage Controller cables to the cable management bracket:


1. Place the Storage Controller cables in the tray of the cable management bracket and
route them to the left and right of the tray according to their direction towards the
sides of the rack.


2. Fasten the cable straps.

3. Lift the cable tray, while pulling the latches (on the left and right sides of the bracket)
until the latches click in.

Note: Make sure that the latches are engaged and the tray is locked in position.

The figure below shows an example of the installed cable management bracket, with
the cables strapped to the tray.


Installing the Bezel

To install the front bezel:


1. Pushing on the ends (not the middle) of the bezel, press the bezel onto the latch
brackets until it snaps into place.
2. Lock the bezel with the provided key and store the key in a secure place.

Post Configuration Procedures


After the Storage Controller is successfully replaced and configured, generate a log bundle
and re-install the Storage Controller’s front bezel:

To generate and upload a log bundle:


1. Generate and collect the log bundle (refer to “Generating and Collecting the Bundle”
on page 94).
2. Upload the log bundle to FTP (refer to “Uploading the Bundle Collection” on page 94).


Removing the Old Storage Controller Disks


If the customer has a Disk Retention Agreement with EMC, you should remove the
following disks (as shown below) from the replaced Storage Controller and give them to
the customer:
• 2 x HDDs
• 2 x SSDs

[Figure: locations of the HDDs and SSDs in the Storage Controller]

To remove each of the Storage Controller’s disk assemblies:


1. Press the green button (A) on the left side of the disk drive assembly to unlock the
module’s lever.

2. Pull the lever open and slide the disk drive assembly (B) from the server.

Note: Once all four disks have been removed, the Storage Controller can be shipped back
to EMC.

Note: It is not always possible to perform Failure Analysis on Storage Controllers that
have been returned to EMC without the Storage Controller’s disks.
