Life Cycle Manager Guide v2 - 3
1. Overview...................................................................................................................... 3
Life Cycle Manager................................................................................................................................................... 3
Prism Central and Prism Element.......................................................................................................................4
Copyright...................................................................................................................22
License......................................................................................................................................................................... 22
Conventions............................................................................................................................................................... 22
Default Cluster Credentials..................................................................................................................................22
Version..........................................................................................................................................................................23
1. OVERVIEW
Life Cycle Manager
The Life Cycle Manager (LCM) tracks software and firmware versions of all entities in the
cluster.
LCM Structure
LCM consists of a framework and a set of modules for inventory and update.
This document assumes you are using LCM at a site that has Internet access. To use LCM at a
location without internet access, see the Life Cycle Manager Dark Site Guide.
LCM supports software updates for all platforms that use Nutanix software.
LCM supports firmware updates only for the following platforms.
The LCM framework is accessible through the Prism interface. It acts as a download manager
for LCM modules, validating and downloading module content. All communication between the
cluster and LCM modules goes through the LCM framework.
LCM modules are independent of AOS. They contain libraries and images, along with metadata and checksums for security. Currently, Nutanix supplies all modules.
The LCM framework downloads module content from a configurable URL.
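The checksum validation mentioned above can be sketched as follows. The function name and the idea of a checksum field in the module metadata are illustrative assumptions, but the technique (comparing a SHA-256 digest of the downloaded payload against a value shipped with the module) is standard practice:

```python
import hashlib

def verify_module(payload: bytes, expected_sha256: str) -> bool:
    """Illustrative check: compare the payload's SHA-256 digest against
    the checksum shipped in the module metadata (hypothetical field)."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256

# Example with a digest computed locally for demonstration.
digest = hashlib.sha256(b"lcm-module").hexdigest()
print(verify_module(b"lcm-module", digest))  # True
print(verify_module(b"tampered", digest))    # False
```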
LCM Support
LCM supports both Prism Element and Prism Central.
Nutanix recommends that you use LCM to update Nutanix Foundation to the most recent version before performing any other LCM updates.
Nutanix recommends that you perform all updates through LCM, where available. However,
your platform vendor may recommend updates that are not yet available through LCM. The lag
is caused by the time it takes Nutanix to revalidate the LCM payload after incorporating new
updates from vendors.
LCM Operation
LCM performs two functions: taking inventory of the cluster and performing updates on the
cluster. LCM updates are not reversible.
LCM | Overview | 3
Before performing an update, LCM runs a set of pre-checks to verify the state of the cluster. If
any checks fail, LCM stops the update.
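The gate behavior described here (run every pre-check, and stop the update if any check fails) can be sketched as follows. The runner is illustrative, not LCM's actual implementation; the check names echo those listed later in this guide, with always-pass and always-fail stand-ins:

```python
def run_prechecks(checks):
    """Run each (name, check) pair; return (ok, failures).
    LCM-style gate: the update proceeds only if every check passes."""
    failures = [name for name, check in checks if not check()]
    return (not failures, failures)

# Illustrative checks (stand-in lambdas, not real cluster probes).
checks = [
    ("test_cluster_status", lambda: True),
    ("test_ntp_server_reachability", lambda: False),
]
ok, failed = run_prechecks(checks)
print(ok, failed)  # False ['test_ntp_server_reachability']
```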
LCM writes all operations to output logs:
• genesis.out
• lcm_ops.out
• lcm_ops.trace
• lcm_wget.log
The log files record all operations, including successes and failures. If an operation fails, LCM suspends the operation and waits for mitigation. Contact Nutanix Support for assistance with any LCM failure.
The LCM framework can also update itself when necessary. Although connected to AOS, the
framework is not part of the AOS release cycle.
Limitations
• LCM cannot perform firmware upgrades on single-node clusters. Firmware updates usually
require services to stop, and on a single-node cluster there is no other node to take over the
workload. This limitation does not apply to software updates.
• LCM is a separate process from Prism one-click upgrades. CLI commands specific to the
one-click upgrade workflow, such as firmware_upgrade_status, do not apply to LCM. For
update status, use lcm_update_status.
• LCM cannot operate when you have VLAN traffic filtering (such as VLAN trunking or virtual
guest tagging) deployed on the dvPortGroup connected to the Controller VM eth0 interface.
Prism Central and Prism Element
Some entities are local to Prism Central, such as Calm, Epsilon, Karbon, and Objects. To manage
these entities, you must log on to Prism Central.
Other entities, such as component firmware, are local to Prism Element. To manage these
entities, you must log on to Prism Element.
In general, follow these guidelines for using LCM with Prism Element and Prism Central:
• Make sure that Prism Central and Prism Element use the same LCM URL. (At a dark site,
make sure that both Prism Central and Prism Element use the same dark site bundle.)
• Before you access Prism Element from Prism Central, perform an LCM inventory on Prism
Central, using cluster quick access.
• Every time you register a new Prism Element instance to Prism Central, perform an LCM
inventory on Prism Central.
• Nutanix recommends that you enable auto-inventory for LCM on Prism Central, to make sure
that the UI stays up to date on all Prism Element instances.
2. OPENING LCM
Accessing the Life Cycle Manager from Prism Element
Open LCM from Prism Element.
Procedure
Open LCM with the method for your version of AOS.
Procedure
1. Open LCM according to the procedure for the version of AOS you are using, as described in
Accessing the Life Cycle Manager from Prism Element on page 5 or Accessing the Life Cycle
Manager from Prism Central on page 6.
2. Select Inventory.
4. Click OK.
The new inventory appears on the Inventory page.
5. Use the Focus button to switch between a general display and a component-by-component
display.
7. To enable auto-inventory, click Settings and select the Enable LCM Auto Inventory checkbox
in the dialog box that appears.
8. To return to Prism, open the drop-down menu on the upper left and select Home.
Procedure
1. Open LCM according to the procedure for the version of AOS you are using, as described in
Accessing the Life Cycle Manager from Prism Element on page 5 or Accessing the Life Cycle
Manager from Prism Central on page 6.
a. Select the checkbox for the node you want to update, or select All to update the entire
cluster.
b. Select the components you want to update. When you select a node, LCM selects
the checkboxes for all updateable components by default. Clear the checkbox of any
component you do not want to update.
5. In the dialog box that appears, specify which prechecks you want LCM to run before
updating and click Run.
a. ESXi: LCM cannot place ESXi hosts with active pinned VMs into maintenance mode.
Before you can proceed with LCM updates, you must choose one of the following.
Note: Non-migratable VMs include VMs with affinity set to a single host; VMs with GPUs;
and agent VMs.
9. To return to Prism, open the drop-down menu on the upper left and select Home.
LCM Workflow
LCM updates the cluster one node at a time: it brings a node down, performs updates, brings
the node up again, and then moves on to the next node. If LCM encounters a problem during an
update, it waits until you resolve the problem before moving on to the next node.
During an LCM update, there is never more than one node down at the same time.
For hosts running ESXi: LCM restarts the host after it migrates the guest VMs but before it
enters maintenance mode, so a "maintenance" status alert never appears in vCenter. This is
expected behavior.
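The one-node-at-a-time workflow can be sketched as a loop. The Node class and update callable here are hypothetical stand-ins, not LCM's API; the point is that at most one node is ever in maintenance, and a failure halts the loop rather than continuing:

```python
class Node:
    """Hypothetical stand-in for a cluster node."""
    def __init__(self, name):
        self.name = name
        self.in_maintenance = False
    def enter_maintenance(self):
        self.in_maintenance = True   # guest VMs migrate off the node
    def exit_maintenance(self):
        self.in_maintenance = False  # guest VMs return

def rolling_update(nodes, apply_update):
    """Update strictly one node at a time; stop on the first failure,
    so at most one node is ever down."""
    for node in nodes:
        node.enter_maintenance()
        apply_update(node)  # on failure, LCM pauses here for the operator
        node.exit_maintenance()

nodes = [Node("A"), Node("B"), Node("C")]
rolling_update(nodes, lambda n: None)
print(all(not n.in_maintenance for n in nodes))  # True
```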
All LCM updates follow this general procedure:
1. If updates for the LCM framework are available, LCM auto-updates its own framework, then
continues with the operation.
2. After updating itself, if necessary, LCM runs the series of pre-checks described in Life Cycle
Manager Pre-Checks on page 16.
3. When the pre-checks are complete, LCM looks at the available component updates and
batches them according to dependencies. Batching reduces cluster downtime: when updates
are batched, LCM performs the pre-update and post-update actions only once. For example,
on NX platforms, BIOS updates depend on BMC updates, so LCM batches them so that the
BMC always updates before the BIOS on each node.
4. Next, LCM chooses a node and performs any necessary pre-update actions.
5. Next, LCM performs the update. What actually happens during an update varies by
component; see the following sections for lists of update actions for supported components.
Note: Because of Intel microcode requirements, updating the BIOS requires several
restarts, so BIOS updates take longer than updates for other components.
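Step 3's dependency-aware batching can be sketched with a topological ordering. The component names follow the BIOS-depends-on-BMC example above; the function itself is an illustration, not LCM's implementation:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def batch_updates(deps):
    """Group component updates into batches so that every component
    lands in a batch after all of its dependencies (BMC before BIOS)."""
    ts = TopologicalSorter(deps)   # maps component -> set of dependencies
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all components whose deps are done
        batches.append(ready)
        ts.done(*ready)
    return batches

# BIOS depends on BMC, so the BMC batch comes first.
print(batch_updates({"BIOS": {"BMC"}, "BMC": set()}))
# [['BMC'], ['BIOS']]
```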
BMC
What happens during the update:
1. Put the CVM in maintenance mode.
2. The CVM entering maintenance mode automatically triggers the host to migrate all
guest VMs to another node.
3. Restart the node into the Phoenix ISO.
4. Perform the update.
5. Restart out of Phoenix and bring the CVM out of maintenance mode.
Data Drives and HBA Controllers
What happens during the update:
1. Check disk health on all CVMs, to make sure that taking down one drive does not
cause any data loss.
2. Move all storage traffic out of the Controller VM.
3. Put the CVM in maintenance mode.
4. The CVM entering maintenance mode automatically triggers the host to migrate all
guest VMs to another node.
5. Stop services on the CVM.
6. Restart the node into the Phoenix ISO.
7. Perform the firmware update.
8. Restart the node out of Phoenix.
9. Bring the CVM out of maintenance mode, automatically returning guest VMs.
10. Restart services on the CVM.
11. Return storage traffic to the CVM.
12. Recheck all data drives.
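The enter/exit maintenance pattern that brackets every firmware flash in the sequences above can be sketched as a context manager. This is illustrative, not LCM's API, and it simplifies one detail: on a real failure, LCM suspends the operation and waits for mitigation rather than automatically exiting maintenance mode:

```python
from contextlib import contextmanager

@contextmanager
def cvm_maintenance(log):
    """Bracket an update step with maintenance-mode enter and exit.
    Entering maintenance mode triggers guest VM migration off the node."""
    log.append("enter maintenance mode")
    log.append("migrate guest VMs away")
    try:
        yield
    finally:
        log.append("exit maintenance mode")
        log.append("guest VMs return")

log = []
with cvm_maintenance(log):
    log.append("flash firmware")
print(log)
# ['enter maintenance mode', 'migrate guest VMs away', 'flash firmware',
#  'exit maintenance mode', 'guest VMs return']
```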
SATA DOM
What happens during the update:
1. Put the CVM in maintenance mode.
2. The CVM entering maintenance mode automatically triggers the host to migrate all
guest VMs to another node.
• For all LCM operations, disable Common Access Card (CAC) authentication on your cluster.
• For all LCM updates on ESXi, disable admission control in vCenter.
Note: For information on configuring vCenter, see the vSphere Administration Guide for
Acropolis: vCenter Configuration.
LCM Pre-Checks
test_aos_afs_compatibility
Verifies that AOS and Nutanix Files are compatible.
test_cluster_status
Verifies that the cluster status command returns a healthy state.
test_cassandra_status
Checks zeus_config_proto (run zeus_config_printer for an equivalent view) for the following:
• ESXi: checks if there is a VmkNic for all ESXi hosts in the same subnet as eth0/eth2.
Verifies pyVim connectivity to ESXi hypervisor.
• Verifies that the host does not have APD/VMCP enabled. (If yes, disable it.)
• Verifies that no CD-ROMs (ISO image stored in local storage) or other physical
devices are attached to the VM. (If yes, remove the physical device from UVM.)
• AHV: verifies that the host can enter maintenance mode.
• Hyper-V: none.
test_check_revoke_shutdown_token
Checks that the CVM can revoke the shutdown token after an LCM update completes on
a node in a cluster.
test_ntp_server_reachability
Checks that the configured NTP servers are reachable before starting LCM updates.
Note: For inventory scans, LCM skips the aos_afs_compatibility and under_replication tests,
because inventory scans are not disruptive.
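A reachability probe like test_ntp_server_reachability can be sketched with a UDP request to port 123. This is a generic NTP client-mode packet and a hypothetical function, not LCM's actual check:

```python
import socket

def ntp_reachable(host, timeout=1.0):
    """Send a minimal NTP v3 client request and wait for any reply.
    Returns False on timeout or any socket error."""
    packet = b"\x1b" + 47 * b"\0"   # LI=0, VN=3, Mode=3 (client)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(packet, (host, 123))
            sock.recvfrom(48)
        return True
    except OSError:
        return False

print(ntp_reachable("192.0.2.1", timeout=0.5))  # False (TEST-NET never responds)
```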
6. LCM LOG COLLECTION
Collecting Life Cycle Manager Logs
Run the LCM log collector utility.
Note: The collector cannot pull logs from a node booted into Phoenix when the node IP address
is not reachable over the network. In that case, apply the CVM IP address to the Phoenix instance
using IPMI, following the procedure in KB 5346, before running the collector.
Procedure
From any CVM, run the collector command.
$ python /home/nutanix/cluster/bin/lcm/lcm_log_collector.py
LCM creates a log bundle in the /home/nutanix directory on the node where you ran the script.
The log bundle name has the format lcm_logs_<lcm-leader>_<timestamp>.tar.gz.
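The bundle name embeds the LCM leader and a timestamp; a sketch of splitting it back apart follows. The exact timestamp format is not specified here, so the pattern below is a deliberately permissive assumption:

```python
import re

# Permissive pattern: leader, then timestamp, joined by the last underscore.
BUNDLE_RE = re.compile(r"^lcm_logs_(?P<leader>[\w.\-]+)_(?P<ts>[\w\-:.]+)\.tar\.gz$")

def parse_bundle_name(name):
    """Split an LCM log bundle filename into (leader, timestamp);
    return None if the name does not match the expected pattern."""
    m = BUNDLE_RE.match(name)
    return (m.group("leader"), m.group("ts")) if m else None

print(parse_bundle_name("lcm_logs_10.0.0.5_20200901-125505.tar.gz"))
# ('10.0.0.5', '20200901-125505')
```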
Hardware Terms
BMC
baseboard management controller, the microcontroller that manages the motherboard.
BIOS
basic input/output system, the firmware that initializes the motherboard and runtime
services on startup.
HBA
host bus adapter, a device that manages communication between storage media and other
system components.
SATA DOM
SATA disk on a module, the hypervisor boot drive for Nutanix platforms up to and including
G5.
M.2
a compact storage form factor (formerly known as NGFF), used for the hypervisor boot
drive on Nutanix platforms G6 and later.
Dell-Specific Terms
iDRAC
integrated Dell remote access controller, a software tool that lets you administer a server
without needing physical access.
iSM
iDRAC service module, a module that integrates iDRAC with an operating system.
PTAgent
Power Tools agent, the software entity responsible for configuring the iSM.
Firmware Entities
A collective term for all Dell firmware not related to iDRAC.
Lenovo-Specific Terms
IMM2
integrated management module 2, a service processor that manages the motherboard
and allows remote access for server administration. Present on Lenovo generation-5
platforms; replaced by XCC in later generations.
License
The provision of this software to you does not grant any licenses or other rights under any
Microsoft patents with respect to anything other than the file server implementation portion of
the binaries for this software, including no licenses or any other rights in any hardware or any
devices or software that are used to communicate with or in connection with this software.
Conventions
Convention | Description
root@host# command | The command is executed as the root user in the vSphere or Acropolis host shell.
> command | The command is executed in the Hyper-V host shell.
Default Cluster Credentials
Interface | Target | Username | Password
Version
Last modified: September 1, 2020 (2020-09-01T12:55:05-07:00)