VXRAIL APPLIANCE
OPERATIONS GUIDE
A Hyper-Converged Infrastructure Appliance from Dell
EMC® and VMware®
ABSTRACT
This document describes how to perform day-to-day operations on a
VxRail Appliance environment after the system has been installed and
configured. The target audience for this document includes customers,
field personnel, and partners who manage and operate a VxRail
Appliance.
November 2017
PART NUMBER: H16788
Audience
The target audience for this document includes customers, field personnel, and partners who
manage and operate a VxRail Appliance.
vSphere
The VMware vSphere software suite delivers virtualization in a highly available, resilient, on-
demand infrastructure — making it the ideal software foundation for the VxRail Appliance. ESXi
and vCenter Server are core components of vSphere. ESXi is a hypervisor installed on a
physical VxRail server node in the factory and enables a single physical server to host multiple
logical servers or virtual machines (VMs). VMware vCenter Server is the management
application for ESXi hosts and VMs.
Customers can use existing eligible vSphere licenses with their VxRail, or the licenses can be
purchased with a VxRail Appliance. This VxRail vSphere license-independent model (also
called “bring your own” or BYO vSphere License model) allows customers to leverage a wide
variety of vSphere licenses they have already purchased.
Several vSphere license editions are supported with VxRail including Enterprise+, Standard,
and ROBO editions. (vSphere Enterprise is also supported, but is no longer available from
VMware). Also supported are vSphere licenses from Horizon bundles or add-ons when the
appliance is dedicated to VDI.
If vSphere licenses need to be purchased, they can be ordered through Dell EMC, the
customer’s preferred VMware channel partner, or from VMware directly. Licenses acquired
through VMware ELA, VMware partners, or Dell EMC receive single-call support from Dell EMC.
vSAN
VxRail Appliances leverage VMware’s vSAN for enterprise-class software-defined storage.
vSAN aggregates the locally attached disks of hosts in a vSphere cluster to create a pool of
distributed shared storage. Capacity is scaled up by adding additional disks to the cluster and
scaled out by adding additional VxRail nodes. vSAN is fully integrated with vSphere, and it
works seamlessly with other vSphere features.
vSAN is notable for its efficiency and performance. Built directly into the ESXi hypervisor at the
kernel layer, it has very little impact on CPU utilization (less than 10 percent). vSAN is self-
optimizing and balances allocation based on workload, utilization and resource availability.
vSAN delivers a high performance, flash-optimized, resilient hyper-converged infrastructure
suitable for a variety of workloads. Enterprise-class storage features include:
▪ Efficient data-reduction technology, including deduplication and compression as well as
erasure coding
▪ QoS policies to control workload consumption based on user-defined limits
VxRail Manager
VxRail Manager provides monitoring and lifecycle management for physical infrastructure.
VxRail Manager streamlines deployment, configuration, and ongoing management. It also
integrates Dell EMC services and support to help customers get the most value from the VxRail
Appliance.
Procedure
1. Click HEALTH > Logical.
The default view shows cluster-level utilization levels for storage IOPS, CPU usage, and
memory. The view is color-coded to enable you to identify resource utilization:
• Red: More than 85% used
• Yellow: 75 to 85% used
• Green: Less than 75% used
2. Click on a node name to view information about that node.
3. Click the components of a node to view more information about the capacity disk (HDD,
SSD), cache disk (SSD), ESXi disk or NIC.
Procedure
1. In VxRail Manager, click HEALTH to view the overall health of the nodes that make up the
cluster.
2. Click a node name or the picture of a node to view more information about that node.
Front and back views of the appliance are displayed. An example is shown below.
3. If a status icon is displayed next to an appliance, click the appliance or the magnifying glass
icon to see more information.
4. Click any appliance component to view more details.
• Click a disk in the Front View or Back View to see disk status and information.
• Click a node in the Back View to see compute and network information.
• Click a power supply in the Back View to see power supply status and information.
• Click the Back View to see compute information.
• Click a NIC in the Back View (E, S, P, and V models) to see network information.
5. If a status icon is displayed for a component, click it to view event details in the Health
window.
6. Use your browser's back button to return to the appliance view on the Health > Physical
tab.
Starting up VxRail
Verify that the top-of-rack (TOR) switch is powered on and connected to the network.
Power on each node manually by pressing its power button.
Wait several minutes for all service VMs to be powered on automatically. The locator LED of
Node1 is turned off automatically when VxRail Manager starts.
Using vCenter, manually restart all client VMs.
Note: If vCenter and VxRail Manager VMs are not available in 10 minutes, use the vSphere
client to log into host 1 to check the status of these VMs.
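If you need to check those service VMs from the node itself, the ESXi shell offers a quick alternative to the vSphere client. A minimal sketch, assuming SSH or console access to host 1 is enabled; the VM ID shown is illustrative:

# List all VMs registered on this host, with their numeric VM IDs
vim-cmd vmsvc/getallvms
# Check the power state of a specific VM by its VM ID (1 is illustrative)
vim-cmd vmsvc/power.getstate 1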
Procedure
In the VxRail Manager navigation pane, click CONFIG and open the Market tab.
Scroll through the Available Applications for this Cluster list to see all of the applications
available for your VxRail Appliance.
• The list includes a description and version number for each application.
• Filter the application list using the filter selector at the top of the list, if desired.
• Click Learn more if you want to view information about the application. The application
page opens in a separate browser tab.
In VxRail Manager, install the application on your appliance:
• Click Install to install an application directly.
• Click Download to navigate to an external web page where you can download and
install the application.
Multiple instances of an application can be installed. To view and manage instances, do the
following:
Figure 10. Each VxRail Cluster Managed by VxRail Manager deployed vCenter Server
A default storage policy is configured when the system is initialized. You can use the vSphere
web client to configure new policies based on a set of rules. The table below summarizes vSAN
policy rules.
Failures to tolerate (FTT). Defines the number of host, disk, or network failures that a storage
object can tolerate. When the Failure Tolerance Method is set to Mirroring, tolerating "n"
failures requires "n+1" copies of the object and "2n+1" hosts (or fault domains). When the
Failure Tolerance Method is Erasure Coding and Failures to Tolerate=1, RAID-5 (3+1) is used
and 4 hosts (or fault domains) are required. If Failures to Tolerate=2, RAID-6 (4+2) is used and
6 hosts (or fault domains) are required.
Disable object checksum. vSAN uses an end-to-end checksum to validate on read that the
data is the same as what was written. If checksum verification fails, data is read from an
alternate copy, and data integrity is restored by overwriting the incorrect data with the correct
data. Checksum calculation and error correction are performed as background operations and
are transparent to the virtual machine. The default setting for all objects in the cluster is "No",
which means checksum is enabled. As a best practice, keep software checksum enabled.
Software checksum can be disabled if an application already provides a data-integrity
mechanism. Disabling checksum is an immediate operation. Re-enabling checksum requires a
full data copy to apply the checksum, which can be resource- and time-intensive.
Object space reservation. Specifies a percentage of the logical size of a storage object that is
reserved when a VM is provisioned. The default value is 0%, which results in a thin-provisioned
volume. The maximum is 100%, which results in a thick volume. The value should be set to
either 0% or 100% when using RAID-5/6 in combination with deduplication and compression.
Number of disk stripes per object. Establishes the minimum number of capacity devices used
for striping a replica of a storage object. A value higher than 1 may result in better performance,
but may consume more system resources. The default value is 1. The minimum value is 1. The
maximum value is 12. vSAN may decide that an object needs to be striped across multiple disks
without any stripe-width policy requirement. While the reasons for this vary, it typically occurs
when a virtual machine disk (VMDK) is too large to fit on a single physical drive. If a specific
stripe width is required, it should not exceed the number of disks available to the cluster.
Flash read cache reservation. Refers to flash capacity reserved as read cache for a virtual
machine object and applies only to hybrid configurations. By default, vSAN dynamically
allocates read cache to storage objects based on demand, and there is typically no need to
change the default value of 0. However, a small increase in the read cache for a VM can
sometimes significantly improve performance. Use this parameter with caution to avoid wasting
resources or taking resources from other VMs. The maximum value is 100 percent.
Failure tolerance method (FTM). Specifies whether the data protection method is Mirroring or
Erasure Coding. RAID-1 (Mirroring) provides better performance and consumes less memory
and fewer network resources, but uses more disk space. RAID-5/6 (Erasure Coding) provides
more usable capacity, but consumes more CPU and network resources.
IOPS limit for object. Establishes Quality of Service (QoS) for an object by defining an upper
limit on the number of IOPS that a VM/VMDK can perform. The default behavior is that all
workloads are treated the same and IOs are not limited. Use this rule to apply limits to less
important workloads so that they do not adversely impact more important workloads; it is often
used to address the "noisy neighbor" issue. See the section on IOPS limit for object for more
information.
Force provisioning. Allows an object to be provisioned even though there are not enough
resources available to meet the policy. The default of No is appropriate for most production
environments. If set to Yes, an object can be created even if there are not enough resources
available to satisfy other policy rules. In this case, the object appears as Not Compliant.
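To make the FTT arithmetic concrete, here is the rule applied to common settings:
▪ Mirroring with FTT=1: n=1, so n+1 = 2 copies of the object and 2n+1 = 3 hosts (or fault domains) are required.
▪ Mirroring with FTT=2: n=2, so n+1 = 3 copies and 2n+1 = 5 hosts (or fault domains) are required.
▪ Erasure Coding with FTT=1: RAID-5 (3+1), requiring 4 hosts (or fault domains).
▪ Erasure Coding with FTT=2: RAID-6 (4+2), requiring 6 hosts (or fault domains).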
This changes the default for any new VMs that are created on that datastore, but does not affect
the policy of any VMs created previously.
1. From the vSphere web client, navigate to Policies and Profiles > VM Storage Policies,
and click Create a new VM storage policy.
2. Select a vCenter Server from the dropdown.
3. Type a name and a description for the storage policy and click Next.
4. On the Rule-Set 1 window, define the first rule set: select VSAN from the Rules based on
data services drop-down. The page expands to show the capabilities reported by the vSAN
datastore.
5. Add a rule and supply appropriate values. Make sure that the values you provide are within
the range of values advertised by the storage capabilities of the vSAN datastore.
6. Review the storage consumption model to understand how the specified rules impact the
required capacity.
7. (Optional) Add tag-based capabilities or another rule set.
8. Click Next and review the list of datastores that match this policy. To be eligible, a datastore
must satisfy at least one rule set and all rules within that set. Verify that the vSAN datastore
meets the requirements set in the storage policy and that it appears on the list of compatible
datastores. Click Finish when done.
While the new policy is being applied, the compliance status may show as Noncompliant until
the storage object is reconfigured to match the new policy.
Availability is not impacted while the storage object is being reconfigured. For example, if the
original policy used a FTM of Mirroring and the new policy has a FTM of Erasure Coding, a new
replica of the storage object is created using erasure coding before deleting the original copy
that used mirroring. Therefore, the storage capacity consumption may increase while the
storage is being reconfigured to comply with the new policy.
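If you want to confirm the default vSAN policy values from the command line, the ESXi shell can report them. A minimal check, assuming SSH access to a node is enabled:

# Show the default vSAN policy for each object class (vdisk, vmnamespace, and so on)
esxcli vsan policy getdefault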
Renaming a VM folder
When the VxRail cluster is configured onsite, the default VM folder is assigned a default name
in the vCenter Server inventory. Use the procedure below to change the name to conform to
local naming conventions. See the SolVe Desktop procedure for VxRail available on
support.emc.com for more detail.
Mirroring
Mirroring is supported with both hybrid and All-Flash VxRail models. If FTM=mirroring and
FTT=1, two replicas of data are maintained. If FTM=mirroring and FTT=2, there are three
replicas of data. In addition, vSAN uses the concept of a witness. When determining if a
component remains online after a failure, more than 50% of the components that make up a
storage object must be available.
Witnesses are components that contain only metadata. Their purpose is to serve as tiebreakers
when determining whether a quorum of components is online in the cluster. If more than 50% of
the components that make up a virtual machine's storage object are available after a failure, the
object remains online. If 50% or fewer of the components of an object are available across all
the nodes in a vSAN cluster, that object is no longer available. Witnesses prevent "split-brain
syndrome" in a vSAN cluster.
The figure below illustrates a four-node cluster and a virtual machine with a FTM=mirroring and
a FTT=1. Note the two replicas and the witness.
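Walking through the quorum rule for this example: the object has three components, two
replicas and one witness, each holding a vote. If one host fails and takes a replica with it, two of
the three components (more than 50%) remain available, so the object stays online. If a second
host fails and takes the witness as well, only one of three components remains, which is not
more than 50%, and the object becomes unavailable.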
Erasure coding
Erasure coding is an alternative failure tolerance method. It provides up to 50 percent more
usable capacity than RAID-1 mirroring. Erasure coding is supported on All-Flash models only.
Erasure coding breaks up data into chunks and distributes them across the nodes in the vSAN
cluster. It provides redundancy by using parity. Data blocks are grouped in sets of n, and for
each set of n data blocks, a set of p parity blocks exists. Together, these sets of (n + p) blocks
make up a stripe. If a drive containing a data block fails, the surviving blocks in the (n + p) stripe
are sufficient to recover the data in the stripe.
In VxRail clusters, the data and parity blocks for a single stripe are placed on different ESXi
hosts in a cluster, providing failure tolerance for each stripe. Stripes do not follow a one-to-one
distribution model. It is not a situation where the set of n data blocks sits on one host, and the
parity set sits on another. Rather, the algorithm distributes individual blocks from the parity set
among the ESXi hosts in the cluster.
Erasure coding provides single-parity data protection (RAID-5) that can tolerate one failure
(FTT=1) and double-parity data protection (RAID-6) that can tolerate two failures (FTT=2). The
figures below illustrate the implementations. A single-parity stripe uses three data blocks and
one parity block (3+1), and it requires a minimum of four hosts or four fault domains to ensure
availability in case one of the hosts or disks fails. It represents a 30 percent storage savings
over RAID-1 mirroring. RAID-5 (FTT=1) requires a minimum of four nodes.
The figure below compares the usable capacity for the mirroring and erasure-coding fault
tolerance methods. As shown, erasure coding can increase usable capacity by up to 50 percent
compared to mirroring.
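A quick example of the arithmetic behind this comparison, for a 100 GB storage object:
▪ Mirroring, FTT=1 (RAID-1): 2 full copies = 200 GB of raw capacity consumed.
▪ Erasure coding, FTT=1 (RAID-5, 3+1): 100 GB data + ~33 GB parity = ~133 GB consumed.
▪ Mirroring, FTT=2 (RAID-1): 3 full copies = 300 GB consumed.
▪ Erasure coding, FTT=2 (RAID-6, 4+2): 100 GB data + 50 GB parity = 150 GB consumed, half the mirrored footprint.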
With erasure coding, in the event of a drive failure, backend activity increases. During a rebuild
operation, a single read from the VM requires multiple reads from disk and additional network
traffic, since the surviving drives in a stripe must be read to calculate the data of the failed
member. This additional IO is the primary reason why only all-flash VxRail configurations use
erasure coding. The rationale is that the speed of flash disks compensates for the additional
overhead.
Note: With VxRail 4.5, the rebuild rate is configurable, and this activity can be throttled to
minimize the impact to other workloads that run on the cluster. However, throttling
resynchronization increases the time that the data is exposed in the event of another drive failure.
Note the number of nodes required for compliance and the recommended number of nodes that
allows data to be rebuilt while maintaining compliance with the SPBM policy.
When fault domains are configured, vSAN applies the storage policy to the entire domain
instead of the individual hosts. vSAN adjusts the placement of storage objects to make them
compliant with the storage policy.
The deduplication algorithm is applied at the disk-group level and results in a single copy of
each unique 4K block per disk group. While duplicated data blocks may exist across multiple
disk groups, limiting the deduplication domain to a disk group does not require a global lookup
table. This minimizes network overhead and CPU utilization, making VxRail deduplication very
efficient.
Considerations
While the VxRail deduplication method is very efficient, some CPU resources are used to
compute the segment fingerprints or hash keys, and additional IO operations are needed to
perform lookups on the segment index tables.
vSAN computes the fingerprints and looks for duplicated segments only when the data is being
de-staged from the cache to the capacity tier. Under normal operations, VM writes to the write
buffer in the cache SSD should incur no latency impact.
Environments that benefit most from deduplication are read-intensive environments with highly
compressible data. Use the figure below to determine the value of deduplication for an
application environment.
Consult with your Dell EMC or VMware VxRail specialist, who can model your workload against
a specific system configuration to help you decide if the benefit of deduplication offsets the
resource requirements for your application workload.
Procedure
This is an online operation and does not require virtual machine migration or DRS. The time
required for this operation depends on the number of hosts in the cluster and the amount of
data. You can monitor the progress on the Tasks and Events tab. If the system will use
deduplication and compression, the best practice is to enable them when the system is initially
set up.
Navigate to the Virtual SAN host cluster in the vSphere Web Client. Click the Configure tab.
Under vSAN, select General.
In the vSAN is turned ON pane, click the Edit button.
Configure deduplication and compression.
The figure below shows the vSAN setting for enabling compression.
Procedure
Navigate to the vSAN host cluster in the vSphere Web Client and click the Configure tab.
Under vSAN, select General.
In the vSAN is turned ON pane, click the Edit button.
Set the disk claiming mode to Manual.
Set deduplication and compression to Disabled.
Click OK to save your configuration changes.
Considerations
VMware offers two encryption methods for VxRail. VMs can be encrypted using vSphere
encryption, or the entire cluster can be encrypted using vSAN encryption. Only one encryption
method can be used on a cluster. The appropriate method depends on protection concerns and
the impact on deduplication and compression.
vSAN encryption provides protection for data at rest and is effective in addressing concerns with
media theft. Because data is encrypted after deduplication and compression, it gets the full
benefit of these data services. vSphere VM-level encryption protects data while it is in motion
(over the wire) and is designed to provide protection from a rogue administrator. With VM-level
encryption, the data is encrypted before being stored on the vSAN datastore. Encrypted data
typically does not benefit from deduplication and compression, therefore VM-level encryption
benefits little from vSAN deduplication and compression.
The options used depend on the KMS and local security policies. In the example dialog
below, Root CA Certificates are used.
Prerequisites
Domain of trust is set up.
Procedure
Using the vSphere web client, navigate to the vSAN host cluster and click Configure.
Under vSAN, select General.
In the vSAN pane, click Edit.
On the Edit vSAN settings dialog, check Encryption and select a KMS cluster. Click OK.
Physical network
Physical network considerations for VxRail are no different from those of any enterprise IT
infrastructure: availability, performance, and extensibility. Generally, VxRail appliances are
delivered ready to deploy and attach to any 10GbE network infrastructure. A dual-switch 10GbE
top-of-rack (ToR) network configuration is recommended for most environments.
The topology should be designed to eliminate all single points of failure at the connection level,
the uplink level, and within the switch itself.
The figure below shows typical network connectivity using two switches for redundancy. Single-
switch implementations are also supported.
VxRail VLANs
A virtual distributed switch (VDS) connects the physical switch ports to the logical components
in a VxRail Appliance. The VDS is configured as part of the system initialization. Port groups are
created spanning all nodes in the cluster. Network traffic is isolated at the port-group level using
switch-based VLAN technology and vSphere Network IO Control (NetIOC), with a
recommended minimum of four VLANs for the four types of network traffic in a VxRail cluster:
▪ Management. Management traffic is used for connecting to the VMware vCenter web client,
VxRail Manager, and other management interfaces, and for communications between the
management components and the ESXi nodes in the cluster. Either the default VLAN or a
specific management VLAN is used for management traffic.
▪ vSAN. Data access for read and write activity as well as for optimization and data rebuild is
performed over the vSAN network. Low network latency is critical for this traffic and a
specific VLAN is required to isolate this traffic.
▪ vMotion. VMware vMotion allows virtual-machine mobility between nodes. A separate
VLAN is used to isolate this traffic.
▪ Virtual Machine. Users access virtual machines and the services provided over the VM
networks. At least one VM VLAN is configured when the system is initially configured, and
others may be defined as required.
The tables below show how the NetIOC shares are configured for VxRail. Do not modify these
values; they have been set for optimum availability and performance.
Enabling multicast
By default, a switch in a VxRail network floods multicast traffic within the broadcast domain or
VLAN. To mitigate this, IGMP Querying/Snooping is used to prune IPv4 multicast traffic so that
only the nodes that need the traffic receive it, thereby improving performance.
With snooping enabled, ports are registered to receive only specific multicast traffic, and the
switch responds to topology-change notifications. Without IGMP Querying/Snooping, multicast
traffic is treated like a broadcast transmission, which forwards packets to all ports on the
network. IPv6 multicast functions similarly, sending multicast traffic only to multicast group
members, and it needs to be enabled on all ports used by the VxRail nodes. IPv4 multicast is
required for vSAN. For larger environments that span multiple switches, it is important that IPv4
and IPv6 multicast traffic is passed between them. Beginning with VxRail 4.5 and vSAN 6.6,
vSAN uses unicast rather than multicast.
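On releases prior to VxRail 4.5/vSAN 6.6, the multicast settings a node expects can be verified from the ESXi shell. A minimal check, where vmk1 is an assumed vSAN VMkernel interface name; substitute your own:

# Show the VMkernel interface used by vSAN and its agent and master
# multicast group addresses and ports
esxcli vsan network list
# Optionally confirm basic reachability over the vSAN interface
vmkping -I vmk1 <remote-vsan-ip>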
Considerations
When considering migration strategies, data and storage are often the first topic that comes to
mind. There are, however, other considerations as well. If the workload can be taken offline for
a period of time (cold migration), a simple shutdown, backup, and restore strategy works fine. If
minimal downtime is required (warm migration), use tools like RecoverPoint for VM to replicate
the workload and then perform a minimally disruptive switchover. If the workload cannot be
taken offline (hot migration), use vSphere vMotion to keep the workload online during the
migration.
While this document does not contain an exhaustive list of considerations, take note of the
following:
▪ Physical and logical network connectivity. VLANs may need to be configured on the
physical switches connecting the target VxRail, and port groups for the application
environment may need to be configured on the virtual distributed switch (VDS) in vCenter.
▪ Access to network services and related applications. Consider all the services that are used
by the application virtual machines including DNS and NTP, and how these may change in
the new environment. If there are dependencies on other applications or services, how the
migration impacts these should be considered as well.
▪ Snapshots. Snapshots will need to be configured on the target VxRail. If snapshots are used
on the source, you must reconsolidate them and make the snapshots persistent.
▪ Backup processes and procedures. VxRail supports most backup technologies, but some
reconfiguration may be necessary after the migration. As part of migration planning,
reconsider and update current strategies.
Figure 28. Migration using vMotion – Change both compute and storage
Additional dialogs are displayed as the target cluster is selected and verified. vSphere copies
the files – VMX, NVRAM, VMDKs and so on – from the source storage to the VxRail vSAN
datastore. Potentially large amounts of data need to be copied. How long it takes to complete
the migration is determined by the size of the dataset and the speed of the storage. In most
cases, it is best to perform the migration a few VMs at a time. Using storage vMotion does not
require any downtime and the virtual machines stay online as they are being migrated.
Figure 29. Migration for source vSphere environment and recovery on VxRail
Once migration is complete, you can continue to use RecoverPoint for VM as part of your
disaster recovery strategy. RecoverPoint for VM supports flexible configuration options including
local and remote protection, and asynchronous and continuous replication.
Figure 30. iSCSI provides data mobility into and between VxRail environments
iSCSI is a standard part of a vSphere environment. A software adapter using the NIC on an
ESXi host is configured as an initiator, and targets on an external storage system present LUNs
to the initiators. The external LUNs are typically used as VMFS datastores. iSCSI configuration
is performed using the vSphere web client.
Procedure
Create a port group on the distributed virtual switch.
Create a VMkernel network adapter, associate it with the port group, and assign an IP
address.
From the vCenter Manage Storage Adapters view, use the Add iSCSI Software Adapter
dialog to create the software adapter. This step binds the iSCSI software adapter with the
VMkernel adapter.
Once this is complete, iSCSI targets and LUNs can be discovered, used to create new
datastores and mapped to the hosts in the cluster. Refer to VMware documentation for more
details.
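The same steps can also be performed per host from the ESXi shell. A hedged sketch, where vmhba64 and the target portal address are illustrative values:

# Enable the iSCSI software adapter on the host
esxcli iscsi software set --enabled=true
# List adapters to confirm the software adapter name (for example, vmhba64)
esxcli iscsi adapter list
# Add a dynamic (Send Targets) discovery address for the external array
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 192.168.10.50:3260
# Rescan the adapter so that discovered LUNs become visible
esxcli storage core adapter rescan -A vmhba64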
Using vMotion, storage objects can be easily moved between the NFS filesystem and the vSAN
datastore.
Procedure
NFS is a standard vSphere feature and is configured using the vCenter web client.
In the Hosts and Clusters view, under Related Objects, open the New Datastore dialog and
select NFS as the datastore type.
Specify the NFS version, the name of the datastore, the IP address or hostname of the NFS
server that exported the filesystem, and the hosts that will mount it.
The NFS filesystem then appears as a datastore, just like the vSAN datastore.
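The export can also be mounted on an individual host from the ESXi shell. A minimal sketch with illustrative server and export names:

# Mount an NFS v3 export as a datastore on this host
esxcli storage nfs add --host=nfs01.example.com --share=/exports/vmdata --volume-name=nfs-datastore01
# Verify the mount
esxcli storage nfs list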
Conclusion
Different options are available for migrating virtual machines and associated data onto a VxRail
cluster. Options include backup and restore, replication where the VM is replicated and quickly
restarted, and online migration using vSphere vMotion. Keep in mind that there may be
considerations other than the VMs and data that need to be part of the planning process.
Figure 32. VxRail Manager HEALTH- Logical view showing cluster level capacity consumption
In this view, resource consumption is color coded. Alerts are generated when CPU and memory
consumption exceeds 70%. For capacity planning purposes, observe both the cluster totals and
each individual node. If the consumption for individual nodes exceeds 70% but the cluster
average is below 70%, using vMotion and/or DRS may help rebalance the workload across the
nodes in the cluster.
The Logical view displays total storage capacity and used capacity. The first consideration is to
have enough available capacity for near-term application requirements. As a general practice,
maintaining at least 20% free capacity provides optimal performance. When disk capacity
utilization exceeds 80% on any capacity drive, vSAN invokes an automatic rebalancing process.
This increases the backend workload that could potentially impact application performance.
Maintaining at least 20% available capacity eliminates the need for this rebalancing. This
additional capacity is sometimes referred to as slack space.
Note the Deduplication and Compression savings and ratio. Deduplication and compression
savings vary but generally most datasets experience between 1.5:1 and 2:1 capacity savings.
For longer-term capacity planning and analysis, the performance view below shows storage
consumption trends. The dropdown allows you to select a time range of up to one year.
Understanding consumption trends helps predict when more storage capacity will be required.
You can add capacity to a VxRail cluster by either adding drives to existing nodes (if drive slots
are available) or by adding nodes to a cluster.
CPU consumption is reported in MHz. The total MHz available is calculated by multiplying the
number of CPU cores by the speed of the processors.
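For example, with an assumed four-node cluster of dual-socket, 10-core, 2.4 GHz nodes:

Per node: 2 sockets x 10 cores x 2,400 MHz = 48,000 MHz
Cluster total: 4 nodes x 48,000 MHz = 192,000 MHz
70% utilization guideline: 192,000 MHz x 0.70 = 134,400 MHz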
Consumption is often bursty. As a general rule, CPU utilization should be under 70% to handle
workload spikes. When CPU requirements exceed available resources, application performance
may be impacted. Monitor CPU consumption at both the cluster and host levels. Scenarios
where host-level CPU utilization is high but overall cluster level utilization is low may benefit by
using vMotion and/or DRS to rebalance the compute workload.
Memory consumption varies. Advanced vSphere ESXi memory management makes it possible
to assign more memory to VMs than is physically available in the hosts. When a VM is
configured, the memory size is specified. The full memory space, however, is not actually
allocated. The ESXi hypervisor only allocates the amount of memory that is used, and a VM
does not normally need its full allocation of memory at all times. For example, a VM allocated
4GB might need the full 4GB of memory for only 15 minutes a day; otherwise it may use only
0.5GB. The ESXi hypervisor allocates 0.5GB of memory to the VM, increases it to 4GB only
when needed, and then reclaims memory afterward for use elsewhere. In addition to memory
allocation on demand, vSphere uses other memory-management techniques, including page
sharing, memory ballooning, and compression.
Memory capacity management balances the risk of running out of memory and the potential
performance impact with the inefficiency of underutilizing the memory configured. Because of
the bursty nature of memory utilization, memory utilization should average less than 70%.
More details about resource consumption are available in the Advanced level view. In the
navigation tree, select the object to monitor, the chart options, and the view of interest. The
Chart Options allows you to specify the timespan and other options. Note that a shorter
timespan exposes details that are lost when the data is averaged over longer timespans. The
example below shows the CPU utilization for the selected ESXi host in real-time.
Once the service has been enabled, vSAN performance data is collected. Graphs can be
displayed in the Performance > Monitor view when a cluster, host, or VM is selected in the
vCenter Server inventory.
The specific IOPS and throughput values are indicators of the system workload and provide little
insight by themselves. Instead, monitor values over time to understand what is normal for a
system and to identify trends that may indicate a need for additional resources.
IOPS. The measure of input/output operations per second consumed by all vSAN clients.
Read/write activity is maintained and reported separately. The size of IO operations varies from
a few bytes to a few megabytes.
How to interpret: IOPS for a system is a function of the workload characteristics and the system
configuration, including the number of nodes, number of drives, data protection type, and other
data services. High or low values are neither good nor bad, but indicate relative system
utilization. Use this metric to understand normal workload activity and to identify when
workloads deviate from normal.
Congestion. vSAN congestion occurs when the IO rate on the backend (lower layers) cannot
keep up with the IO rate on the front end. For more information on congestion, see this KB
article: https://kb.vmware.com.
How to interpret: Sustained congestion is not usual; in most cases, it should be near zero. It is
possible to see congestion during bursts of workload activity, and this can impact response
time. If the system consistently shows high levels of congestion, further analysis is required,
and it may indicate that the system needs additional resources.
Outstanding IO. When a virtual machine requests a read or write operation, the request is sent
to the storage device. Until the request completes, it is considered an outstanding IO.
Analogous to queuing.
How to interpret: While vSAN is designed to handle some number of outstanding IOs,
outstanding IOs impact response times.
The table below shows key Virtual SAN - Backend metrics and provides guidelines on how to
interpret them. These metrics show IO from vSAN to the disk groups and physical disk. The
backend workload includes the overhead associated with data protection and added workload
for recovery and resynchronization.
When viewed at the cluster level, these metrics can be used similarly to the front-end metrics
for understanding the normal workload. At the host level, more granular information at the
disk-group and drive level is available that can be used for troubleshooting and to identify
workload imbalance.
For a complete list of metrics, see https://kb.vmware.com/.
IOPS. The number of input/output operations per second. Separate read/write metrics are
maintained. Additional metrics include Recovery Write IOPS and Resync Read IOPS (vSAN
6.6). Recovery writes occur during the resync of components that were impacted by a failure.
Resync Read IOPS are the result of recovery operations or maintenance such as a policy
change, maintenance mode/evacuation, rebalancing, and so on.
How to interpret: IOPS for a system is a function of the workload characteristics and the system
configuration, including the number of nodes, number of drives, data protection type, and other
data services. High or low values are neither good nor bad, rather an indicator of the relative
busyness of the system. Recovery writes require further investigation. Resync reads may be the
result of normal maintenance operations and do not necessarily indicate a problem.
Throughput. A measure of the data rate, calculated as block size times IOPS. Separate
read/write throughput metrics are maintained. Additional metrics include Recovery Write and
Resync Read Throughput (vSAN 6.6). Recovery writes occur during the resync of components
impacted by a failure. Resync reads are the result of recovery operations or maintenance such
as a policy change, maintenance mode/evacuation, rebalancing, and so on.
How to interpret: Throughput for a system is a function of IOPS and block size. High or low
values are neither good nor bad. Use this metric, along with IOPS, as an indicator of the relative
busyness of the system. Recovery writes require further investigation. Resync reads may be the
result of normal maintenance operations and do not necessarily indicate a problem.
Congestion. vSAN congestion occurs when the IO rate on the lower layers (backend) cannot
keep up with the IO rate on the upper layers. See KB article https://kb.vmware.com for more
information.
How to interpret: Sustained congestion is not usual and typically should be near zero.
Congestion may occur during bursts of workload activity and can impact response time. If the
system consistently shows high levels of congestion, further analysis is required, and it may
indicate that the system needs additional resources.
Outstanding IO. When a virtual machine requests a read or write operation, the request is sent
to the storage device. Until the request completes, it is considered an outstanding IO.
Analogous to queuing.
How to interpret: While vSAN is designed to handle some number of outstanding IOs,
outstanding IOs impact response times.
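As a worked example of the block size times IOPS relationship, using assumed values:

10,000 IOPS at a 4 KB block size: 10,000 x 4 KB = 40,000 KB/s (about 39 MB/s)
10,000 IOPS at a 64 KB block size: 10,000 x 64 KB = 640,000 KB/s (about 625 MB/s)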
The above tables explain cluster level metrics. Host level metrics are more granular and include
details down to the disk group and physical disk level with more metrics available for analysis.
Start RVC.
▪ For a VxRail Manager deployed vCenter, use the following command:
rvc administrator@vsphere.local@localhost
▪ For customer-deployed vCenter Servers that run on Windows, use the following command:
%PROGRAMFILES%\VMware\vCenter\rvc\rvc.bat
Enter the user password.
Navigate to the vCenter directory using the following command:
cd localhost
Navigate to the vSAN datacenter and cluster using the following commands:
cd <VxRail-Datacenter>
cd <VxRail-vSAN-Cluster-name>
Start the collection using the following command:
vsan.observer . -g /tmp -o -i 60 -m 2
▪ The output bundle is placed in /tmp unless another location is specified.
▪ The default interval is 60 seconds unless otherwise specified.
▪ The maximum runtime is 2 hours unless otherwise specified.
After the collection completes, send the tar.gz file generated by vSAN Observer in the /tmp
directory to Dell EMC Support.
The analysis of the data collected by vSAN Observer is outside the scope of this document. For
further information on using vSAN Observer, see https://kb.vmware.com/2064240 and the
VMware Virtual SAN Diagnostics and Troubleshooting Reference.
VxRail Manager and Dell EMC support are the sole sources for version control and cluster
software updates. Updates are developed by Dell EMC and VMware and tested as a bundle.
The bundle may include updates to one or more components, including VxRail Manager, the
VxRail Manager deployed VCSA and PSC, the vSphere ESXi hypervisor, vSAN, firmware, and
other software. The actual components that make up a bundle vary and could be a single
software component for a bug fix or a collection of components for a minor or major code
upgrade. Testing and applying updates as complete bundles ensures version compatibility and
reduces risk.
Updates to both VxRail and VMware software components are applied across all nodes in a
cluster using VxRail Manager. VxRail Clusters that meet the minimal configuration requirements
use a fully automated and non-disruptive process that is initiated and executed entirely from
VxRail Manager.
The figure below summarizes the overall software upgrade workflow.
The VxRail upgrade bundles can be either downloaded from Dell EMC support or downloaded
from the internet directly from VxRail Manager. The upgrade process first performs a readiness
check to verify the bundle is complete, compatible with the current running versions, and to
ensure the system is in a healthy state before proceeding with the upgrade. When the upgrade
completes, VxRail Manager performs post-checks to validate that the upgrade was performed
successfully.
Some updates may require taking a node offline to complete. These are performed as rolling
updates one node at a time. During these upgrades, the node is put into maintenance mode and
workloads are evacuated to other nodes in the cluster. This is accomplished using vSphere
DRS and vMotion.
Depending on the type of upgrade, the upgrade may be a two-step process where VxRail
Manager is upgraded first followed by the upgrade of other components.
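For context, the per-node mechanics resemble the following ESXi commands. This is an illustration only, since VxRail Manager drives these steps automatically during an upgrade:

# Enter maintenance mode; vSAN data is handled according to the selected
# mode (ensureObjectAccessibility keeps objects available during the update)
esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
# ...the node is updated and rebooted if required...
# Exit maintenance mode so the node rejoins the cluster
esxcli system maintenanceMode set --enable false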
Before performing any software upgrades, consult the VxRail Appliance Software Release
Notes. The SolVe Desktop includes detailed instructions for specific VxRail models and installed
and target software versions.
Procedure
Power on the new appliance.
Once the appliance is powered on, it broadcasts its availability, and VxRail Manager displays
the available node in the Dashboard view. The following figure shows an example of a newly
detected node ready to be added to the cluster.
Figure 40. VxRail Manager Dashboard view showing new node ready to be added to cluster
Figure 42. VxRail Manager Cluster Expansion dialog – Allocate new IP addresses
DNS look-up records must be configured with IP addresses and hostnames before
continuing. Confirm that the DNS records have been configured by clicking the checkbox at
the bottom of the dialog.
Click Validate to continue.
If the validation succeeds, click Expand Cluster to add the new nodes.
Progress is displayed in the VxRail Manager Dashboard view. When complete, the new
nodes can be seen in the Cluster Health view.
Verify that the new nodes were added to the VxRail cluster in the vCenter Hosts and Clusters view.
Note: Any additional network configuration that was not part of the initial VxRail Appliance
configuration must be manually added to this new node.
A VxRail appliance can be scaled up by adding disks to a node if there are available drive slots.
Take note of the restrictions listed below. Consult your Dell EMC systems engineer for specific
details.
▪ Only drives purchased from Dell EMC specifically for VxRail can be added to a VxRail
appliance. Other drive types are not supported and cannot be added to a VxRail Cluster.
▪ Drive placement in the VxRail appliance is important. SSD drives to be used for cache
must be installed in specific slots. These drive slot locations vary for different VxRail models.
▪ Disk group configuration rules must be followed. For example, while there may be available
slots within a node, it may not be possible to add additional drives to the disk group.
▪ A disk group includes one cache drive and a minimum of one capacity drive. The maximum
number of capacity drives per disk group varies for different VxRail models.
▪ The specific SSD drive types used for caching and capacity are different.
▪ While nodes within a VxRail cluster may have different drive configurations, a consistent
drive configuration across all nodes in a VxRail cluster is recommended.
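Before adding drives, the current disk-group layout of a node can be reviewed from the ESXi shell. A minimal check, assuming SSH access to the node:

# List all disks claimed by vSAN on this node, including each disk's
# disk group, tier (cache or capacity), and health state
esxcli vsan storage list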
Procedure
The following procedure lists the high-level steps for adding drives to an existing VxRail node.
For more detailed procedures for specific hardware configurations, see the procedures
generated from the SolVe Desktop available from Dell EMC support https://support.emc.com/.
Within VxRail Manager, open the HEALTH > Physical view for the node where you will be
adding the drive, and select the back view.
In the Node Information box, click Add Disks.
VxRail Manager attempts to identify any drive that has been added to the system but not yet
configured, and reports the drive information. If you have not physically added any drives,
you are asked whether the drive is a capacity or a cache drive, as well as for the slot number.
When adding cache drives, you also create a new disk group; you must add at least one
capacity drive at the same time.
Unpack the new drives.
Carefully open the shipping carton and remove the drives from the foam carrier, one drive at
a time. Open the anti-static bag containing the drives, remove the drive and place the drive
on top of the bag until ready to insert the drive.
Remove the bezel from the server.
If the bezel has a key lock, unlock the bezel with the key provided. Press the two tabs on
either side of the bezel to release it from its latches and pull the bezel off the latches.
Install the new disks in the suggested slots.
In the VxRail Manager Add Disk dialog, click Continue.
The newly added disk is discovered and details about the drive are displayed. This
discovery process may take several minutes.
If the information is consistent with the disk you inserted, click Continue.
VxRail Manager executes pre-checks on the new disks.
The new disk is configured on the node and added to the VxRail vSAN cluster. This process
can take a few minutes to complete. When the disk addition completes successfully, the
message "New disk(s) have been added successfully" is displayed.
Click Close.
In the VxRail Manager HEALTH > Logical tab, select the node and scroll down to the ESXi
node. Verify that the host is now reporting the new disks and that there are no errors.
The figure below is an example of this dialog.
Log into vCenter and perform a vSAN health check. Verify that there are no errors.
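The health check can also be run from RVC. A brief sketch, assuming an RVC session opened and navigated to the cluster's parent directory as described earlier in this guide:

# Report vSAN objects that are inaccessible or out of sync; a clean
# report indicates a healthy cluster state
vsan.check_state <VxRail-vSAN-Cluster-name>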
Procedure
The following procedure lists the high-level steps for replacing a capacity drive.
For more detailed procedures for specific hardware configurations, see the procedures
generated from the SolVe Desktop available from Dell EMC support https://support.emc.com/.
On the VxRail Manager Health page, Physical view, click the failed drive to display the disk
information.
Click HARDWARE REPLACEMENT to initiate the drive replacement.
The automated procedure verifies the system and prepares for drive replacement.
Remove the bezel from the VxRail appliance.
If the bezel has a key lock, unlock the bezel with the key provided. Press the two tabs on
either side of the bezel to release the bezel from its latches and pull the bezel off the
latches.
Identify the drive to be replaced by the blinking locator LED. Remove the failed drive by
pressing the drive release button and pulling the drive release lever completely open, and
then sliding the drive and carrier out of the system.
Remove the failed drive from the drive carrier by removing the four small screws on the side.
Install the replacement drive into the carrier and replace the four screws.
Install the replacement drive and carrier back into the system by carefully inserting it into the
slot in the appliance until it is fully seated. Close the handle to lock it into place.
Follow the prompts in the VxRail Manager drive replacement procedure. The drive is verified
and added back into the vSAN cluster.
Reinstall the bezel by pushing the ends of the bezel onto the latch brackets until it snaps
into place.
If the bezel has a key lock, lock the bezel with the provided key and store the key in a
secure place.
Procedure
The following procedure lists the high-level steps for replacing a cache drive.
For more detailed procedures for specific hardware configurations, refer to the procedures
generated from the SolVe Desktop available from Dell EMC support https://support.emc.com/.
On the VxRail Manager Health, Physical view, identify the failed drive and click on it to
display the disk information.
Click HARDWARE REPLACEMENT to initiate the drive replacement.
The automated procedure performs the necessary steps to verify the system and prepare
for drive replacement.
Remove the bezel from the VxRail appliance.
If the bezel has a key lock, unlock the bezel with the provided key. Press the two tabs on
either side of the bezel to release the bezel from its latches and pull the bezel off the
latches.
Identify the drive to be replaced by the blinking locator LED. Remove the failed drive by
pressing the drive release button and pulling the drive release lever completely open, and
then sliding the drive and carrier out of the system.
Remove the four small screws on the side of the failed drive and remove it from the drive
carrier.
Install the replacement drive into the carrier and replace the four screws.
Install the replacement drive and carrier back into the system by carefully inserting it into the
slot in the appliance until it is fully seated. Close the handle to lock it into place.
Follow the prompts in the VxRail Manager drive replacement procedure.
The drive is verified and added back into the vSAN cluster.
Reinstall the bezel by pushing the ends of the bezel onto the latch brackets until it snaps
into place.
Procedure
The following procedure lists the high-level steps for replacing a power supply unit (PSU).
For more detailed procedures for specific hardware configurations, see the procedure generated
from the SolVe Desktop available from Dell EMC support https://support.emc.com/.
On the VxRail Manager Health, Physical view, identify the failed component.
Disconnect the power cable from the power source and from the PSU.
Press the release latch and slide the power supply unit out of the chassis.
Slide the new power supply unit into the chassis until the power supply unit is fully seated
and the release latch snaps into place.
Connect the power cable to the power supply unit and plug the cable into a power outlet.
The system takes approximately 15 seconds to recognize the power supply unit and
determine its status. The power supply status LED turns green signifying that it is working
properly.
On the VxRail Manager HEALTH > Physical view, verify that the error is resolved and
the PSU shows a Healthy status. Refresh the page if needed.