PDF pt203 Sos Nutanix Troubleshooting
I. MANAGING NUTANIX ENVIRONMENTS
• Cluster Monitoring
• NCC overview
• Prism Analysis (and Prism Central)
II. TROUBLESHOOTING NUTANIX ENVIRONMENTS
• General Troubleshooting
• Troubleshooting Scenarios
• Engaging support best practices
• Additional Resources
III. Q/A
CONFERENCE
Monitoring
• Pulse
• Email
• SNMP
• Syslog
• Prism Alerts
Prism Alerts and Pulse
Insights
Pulse
• Hourly Cluster Reports
• Deep Analytics and Inventory
• Automatic Case Generation
• Cluster Phone Home
Prism Alerts
• Health Alerts
Auto-case Generation
Example:
Description:
Block Serial Number:
alert_time: Tue Mar 22 2016 18:54:51 GMT-0700 (PDT)
alert_type: PowerSupplyDown
alert_msg: A1046: Bottom power supply is down on block
cluster_id:
alert_body: No Alert Body Available
Resolution: Scheduled Maintenance. As advised by customer
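The description of an auto-generated case is a list of `key: value` fields, so it can be turned into a lookup table for scripting. The following is a minimal sketch, assuming the fields follow the layout of the example above with field names normalized to use underscores. The function name `parse_alert_fields` is hypothetical and is not part of any Nutanix tool.

```python
def parse_alert_fields(description: str) -> dict[str, str]:
    """Split 'key: value' lines of an alert description into a dict."""
    fields: dict[str, str] = {}
    for line in description.splitlines():
        # Split on the first colon only, so timestamps and messages stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        # Keep the first occurrence if a field is repeated.
        fields.setdefault(key.strip(), value.strip())
    return fields


# Sample text copied from the example case above.
example = """\
alert_time: Tue Mar 22 2016 18:54:51 GMT-0700 (PDT)
alert_type: PowerSupplyDown
alert_msg: A1046: Bottom power supply is down on block
cluster_id:
alert_body: No Alert Body Available"""

alert = parse_alert_fields(example)
print(alert["alert_type"])  # prints "PowerSupplyDown"
```

Empty fields such as `cluster_id` come back as empty strings rather than being dropped, which makes missing values easy to spot.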
Working with Prism Central Alerts Dashboard
NCC Health Checks
• CLI (ncc health_checks run_all)
• Prism (AOS 5.x)
[Screenshot: health check results showing Passed and Total counts, check name, and affected CVMs]
NCC is a framework of scripts to automatically diagnose cluster health
• Default health checks are non-disruptive, with no impact to cluster operation
• KB article for each NCC check
• Helps get a baseline
• NCC can be upgraded
• Troubleshooting with relevant information (including KB)
• PASS: The tested aspect of the cluster is healthy and no further action is required
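Once a health check run has finished, the results can be tallied by status to get a baseline and a list of checks that need a KB lookup. The following is a minimal sketch, assuming the results have already been gathered into (check name, status) pairs. The check names are invented for illustration, and the non-PASS status names (INFO, WARN, FAIL, ERR) are an assumption, since the slide only spells out PASS.

```python
from collections import Counter


def summarize(results):
    """Count checks per status and list the checks that need follow-up."""
    counts = Counter(status for _, status in results)
    # Anything other than PASS is treated as worth a look in this sketch.
    follow_up = [name for name, status in results if status != "PASS"]
    return counts, follow_up


# Hypothetical check names and results, for illustration only.
results = [
    ("example_check_a", "PASS"),
    ("example_check_b", "FAIL"),
    ("example_check_c", "PASS"),
    ("example_check_d", "WARN"),
]

counts, follow_up = summarize(results)
print(f"Passed {counts['PASS']} of {len(results)}")
print("Look up the KB article for:", ", ".join(follow_up))
```

Comparing the counts from one run to the next is one simple way to use the checks as a baseline.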
Troubleshooting Nutanix Environments: A Framework
• Problem Isolation
• Product Improvement
Troubleshooting by Layers
APPLICATION
• SQL, VDI, Oracle RAC, etc.
CVM
• Stargate, Curator, Cassandra, etc.
HYPERVISOR
• AHV, ESXi, Hyper-V, XenServer
HARDWARE
• NVMe, SSD, HDD, Memory, NIC, Processor, etc.
NETWORK
• OVS, vSwitch, Physical Switch, etc.
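The layer model above can be kept as a small lookup table, so that a component named in an alert or log line is quickly mapped to the layer where investigation should start. The following is a minimal sketch; the table only contains the examples listed on the slide, and the helper name `layer_of` is hypothetical.

```python
# Layers and example components, in the order listed on the slide.
LAYERS = {
    "APPLICATION": ["SQL", "VDI", "Oracle RAC"],
    "CVM": ["Stargate", "Curator", "Cassandra"],
    "HYPERVISOR": ["AHV", "ESXi", "Hyper-V", "XenServer"],
    "HARDWARE": ["NVMe", "SSD", "HDD", "Memory", "NIC", "Processor"],
    "NETWORK": ["OVS", "vSwitch", "Physical Switch"],
}


def layer_of(component):
    """Return the layer that lists the component, or None if it is unknown."""
    wanted = component.casefold()
    for layer, components in LAYERS.items():
        if any(wanted == c.casefold() for c in components):
            return layer
    return None


print(layer_of("stargate"))  # prints "CVM"
```

The comparison ignores case, so `layer_of("esxi")` and `layer_of("ESXi")` give the same answer.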
Troubleshooting: Problem Isolation
• Rapidly reduce failure domain scope, achieve faster resolution.
• Any recent changes in the environment?
IMPACT
• Is storage available?
• Are there performance issues?
• Can you reach Prism?
• NCC
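The "Can you reach Prism?" question can be answered from a workstation with a plain TCP connection test before opening a browser. The following is a minimal sketch, assuming Prism is listening on its default web port 9440. The function name `can_reach` is hypothetical, and a successful connection only shows that the port is open, not that Prism is healthy.

```python
import socket


def can_reach(host, port=9440, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_reach("prism.example.com")` (a placeholder hostname) returns `False` when the address does not resolve or the port does not answer, which narrows the failure domain to the network path or the cluster itself.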
Problem Isolation - Data Resiliency States
OK
Rebuild capacity available
Root Cause Analysis - Log Collection
Logs will be collected for all the nodes and components. Once the task completes the bundle will be available for download.
[Screenshot: Run Checks and Log Collector actions; checks by status, Passed 39]
Best Practices for Engaging Support
• Update your break/fix contact via My Nutanix Portal
• Upgrade to the latest NCC and start a health check
• Clear problem description
• What steps have you already taken?
• Keep components on the recommended version levels
• Press the Escalate button in the portal for immediate attention
• Provide feedback after case closure. Surveys matter!
Additional Resources
The Nutanix Bible - Architecture details
portal.nutanix.com - Nutanix Support Portal, KBs, Documentation, Software, etc.
portal.nutanix.com/ h/4530 — Additional troubleshooting details for Acropolis File Services