5 ACI Monitoring and Tshoot PDF
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public ATX: Prepare to Implement Cisco ACI
1 10-minute daily routine
2 Navigation shortcuts
First 10 Minutes – Daily Routine
• Alert List
• Health Scores
• System
• Node
• APIC Controller
• Faults
Alert List
Shows critical warnings and errors so the user can take action
Alert List (Continued)
Looks like we had an issue!
Health Score
Health Scores are based on Faults and events
Check APIC Cluster Status
Check Faults
Faults
Faults indicate a misconfiguration or another issue in the ACI Fabric
※ This is a lab setup. In production, try to clear each Fault as soon as it is raised.
First 10 Minutes – Daily Routine (contd)
• Fabric inventory
• Topology Summary
• Physical switches
• Global EndPoints
• Interfaces’ status
• Policies-to-interfaces association
• Duplicate IPs
• Disabled interfaces
• DOM values
Topology Summary: Physical Switches
Global Endpoints
EPG Level
Interface
Interface and Policies
Duplicate IP/Disabled Interfaces
Duplicate IPs
Understand DOM values (inventory > leaf > physical interfaces)
• dBm – logarithmic scale
• value between low and high warnings
Notice:
• Optics used must support DOM
• Must create Fabric Node Controls policy with DOM enabled first, to be associated with switch profiles:
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/troubleshooting/b_APIC_Troubleshooting/m_troubleshooting_tools_and_methodology.html
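Because DOM readings are in dBm, a logarithmic scale, it can help to convert them to milliwatts and to check them against the warning thresholds. The following is a minimal sketch only: the function names and the threshold values in the usage note are hypothetical, and the real thresholds come from the optic itself.

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Convert optical power from milliwatts back to dBm."""
    return 10 * math.log10(mw)


def dom_status(value_dbm: float, low_warn_dbm: float, high_warn_dbm: float) -> str:
    """Classify a DOM reading against its low and high warning thresholds."""
    if value_dbm < low_warn_dbm:
        return "below low warning"
    if value_dbm > high_warn_dbm:
        return "above high warning"
    return "ok"
```

For example, with assumed thresholds of -7.0 dBm (low warning) and 1.0 dBm (high warning), `dom_status(-2.5, -7.0, 1.0)` returns `"ok"`. Every 3 dB drop roughly halves the optical power.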
First 10 Minutes – Daily Routine (contd)
• Tenant statistics
• Flows stats
• Drops (operational tab)
Stats
Operations: Packets
1 Health score overview
4 Evaluation policy
Health Score overview
• Health Score provides a quick overview of the health of the system/module
• It is based on the Faults generated in the Fabric
• Range: 0 to 100 (100 is perfect Health Score)
• Each Fault reduces the Health Score based on the severity of the Fault
• Health Score is propagated to container and related MOs
• Health Score policies can control the penalty values, propagation, healthRecords
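Health Scores can also be read programmatically. The sketch below assumes the scores were fetched with a class query such as `GET /api/node/class/fvTenant.json?rsp-subtree-include=health`, and that each tenant carries a `healthInst` child whose `cur` attribute holds the current score. The sample response, the tenant names, and the threshold of 90 are illustrative assumptions, not output from a real Fabric.

```python
import json

# Hypothetical response body, trimmed to the fields used below.
SAMPLE_RESPONSE = """
{"imdata": [
  {"fvTenant": {"attributes": {"name": "prod"},
                "children": [{"healthInst": {"attributes": {"cur": "100"}}}]}},
  {"fvTenant": {"attributes": {"name": "lab"},
                "children": [{"healthInst": {"attributes": {"cur": "72"}}}]}}
]}
"""


def tenant_health(payload: str) -> dict:
    """Map each tenant name to its current Health Score (0 to 100)."""
    scores = {}
    for item in json.loads(payload)["imdata"]:
        tenant = item["fvTenant"]
        name = tenant["attributes"]["name"]
        for child in tenant.get("children", []):
            if "healthInst" in child:
                scores[name] = int(child["healthInst"]["attributes"]["cur"])
    return scores


def below_threshold(scores: dict, threshold: int = 90) -> list:
    """Return the tenants whose score is under the threshold, sorted by name."""
    return sorted(name for name, score in scores.items() if score < threshold)
```

With the sample above, `below_threshold(tenant_health(SAMPLE_RESPONSE))` returns `["lab"]`, which is the kind of list worth drilling into during the daily routine.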
Health Score Views
• System: Aggregation of system-wide health, including Pod Health Scores, Tenant Health Scores, system Fault counts by domain and type, and the APIC Cluster health state
• Pod: Aggregation of Health Scores for a Pod (a group of Spine and Leaf switches), and Pod-wide Fault counts by domain and type
• Tenant: Aggregation of Health Scores for a Tenant, including performance data for objects such as applications and EPGs that are specific to a Tenant, and Tenant-wide Fault counts by domain and type
• Managed Object: Health Score policies for managed objects (MOs), which include their dependent and related MOs. These policies can be customized by an administrator
Health Score: Impact Example
• In this example, a hardware Fault impacts the
Health Score of an application component
• The Health Score is propagated to the
following MOs:
• Parent MO
• MOs that have a relation pointing to the
current MO
Health Score
Drill Down
Health Score Evaluation Policy
• The policy currently lets you modify the Health Score penalty per Fault severity level, or ignore acknowledged Faults
Statistics
1 Statistics overview
2 GUI counters
3 Capacity dashboard
Statistics
• Help quantify data with respect to application traffic
• Statistics contain counters
• Sampled into various granularities (5min, 15min, 1hr, etc.)
• History retained for each granularity
• Statistics are related to observable overlay objects
• Tenants/VRF/BD/EPG
Counter values in GUI
Capacity Dashboard – Fabric Capacity
Capacity Dashboard – Leaf Capacity
Faults/Events/Audit logs
1 Faults
2 Events
3 Audit logs
Faults
• Faults, events and audit logs are essential tools for monitoring the administrative
and operational state of an ACI Fabric as well as troubleshooting current and past
issues
• They are the first thing to check when something is not behaving as expected
Faults
• When a Fault occurs in an MO, a Fault instance MO (fault:Inst) is created under that MO
• It can be queried by DN and class
• Types:
• Generic, Equipment, Configuration, Connectivity,
Environmental, Management, Network
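Querying Faults by DN and by class can be sketched as two REST URLs. This is an illustration under assumptions: the helper names and the `apic.example.com` hostname are hypothetical, and the class name `faultInst` with its `severity` attribute is assumed to be the REST form of the Fault instance MO.

```python
from typing import Optional


def faults_by_class(apic: str, severity: Optional[str] = None) -> str:
    """Build a class-level query URL for all Fault instances on the Fabric.

    If a severity is given, a query-target-filter narrows the result.
    """
    url = f"https://{apic}/api/node/class/faultInst.json"
    if severity:
        url += f'?query-target-filter=eq(faultInst.severity,"{severity}")'
    return url


def fault_by_dn(apic: str, fault_dn: str) -> str:
    """Build a query URL for one Fault instance, addressed by its DN."""
    return f"https://{apic}/api/node/mo/{fault_dn}.json"
```

A class query suits the daily sweep ("show me every critical Fault"), while a DN query suits following up on one specific Fault that has already been identified.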
Fault Types
Type | Description
Equipment | The system has detected that a physical component is inoperable or has another functional issue
Connectivity | The system has detected a connectivity issue, such as an unreachable adapter
Environmental | The system has detected a power issue, thermal issue, voltage issue, or a loss of CMOS settings
Management | The system has detected a serious management issue, such as critical services that could not be started, or components in the instance that include incompatible firmware versions
Network | The system has detected a network issue, such as a link down
Operational | The system has detected an operational issue, such as a log capacity limit or a failed component discovery
Fault Triggers
• Four types of Fault triggers:
1. Specific conditions described in the model by Fault rules
2. Counters crossing thresholds specified in user-programmable policies
3. Task or FSM failures
4. Object resolution failures
• Faults are raised and managed on the node (switch or controller) where the condition is
detected
Fault Lifecycle and Acknowledgement
• Faults can be “acknowledged”, meaning “mark as
viewed”
• Acknowledging a Fault in “retaining” state causes
the Fault to be deleted immediately (without
waiting for expiration or retention timer)
• Faults in Acknowledged state will not affect Health
Score
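The acknowledgement rules above can be expressed as a toy model. This is a deliberately simplified sketch, not the APIC implementation: the `Fault` class, its field names, and the fault code used in the usage note are hypothetical, and only the two lifecycle states mentioned on this slide are considered.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Fault:
    code: str
    lifecycle: str      # for example "raised" or "retaining"
    acked: bool = False


def acknowledge(fault: Fault) -> Optional[Fault]:
    """Acknowledge a Fault.

    A Fault in the "retaining" state is deleted immediately (None is returned).
    Any other Fault is kept and marked as acknowledged.
    """
    if fault.lifecycle == "retaining":
        return None
    return replace(fault, acked=True)


def affects_health_score(fault: Fault) -> bool:
    """Per the slide, acknowledged Faults do not affect the Health Score."""
    return not fault.acked
```

For instance, `acknowledge(Fault("F-EXAMPLE", "retaining"))` returns `None`, while acknowledging a raised Fault returns a copy for which `affects_health_score` is `False`.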
Fault Policies
Sample fault
Faults
• Documentation
• https://<APIC IP>/doc/html/
• Or go directly to a Fault code:
• https://<APIC IP>/doc/html/FAULT-<Fault Code>.html
Events
• An event is a specific condition that occurs at a certain point in time (for example “link went
from down to up”)
• As they are part of the normal system workflow, they do not necessarily require user
attention
• Useful for monitoring and debugging issues
• Similar to an entry in a log file: once created, they are never modified
• Deleted once the maximum number specified in retention is reached
• Events are triggered by “event rules”
Events
• Configuration or state change creates an event
• Logs:
• Audit log: user-initiated actions
• Health Score log: changes in Health Scores
• Event log: other system-generated events
• Viewing
• Via GUI (“History” tab in-context)
• Via CLI, API, Syslog, SNMP, Cisco Call
Home, subscriptions
Audit Logs
• A mechanism to track user-initiated configuration changes
• When a user creates, modifies, or deletes an MO, the system creates an “audit record” containing the affected MO DN, user name, timestamp and change details
• The system also creates logs for log-in/log-out to controllers and nodes
• Similar to an entry in a log file: once created, aaaModLR records are never modified
• Configuration change logs are MOs of class aaaModLR (modification log record)
• Login/logout logs are MOs of class aaaSessionLR
• Accounting logs are deleted only when the maximum number specified in a retention policy is reached
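Since audit records are MOs of class aaaModLR, they can be pulled with a class query. The sketch below only builds the URL. It assumes the records expose `user` and `created` attributes, and the helper name, hostname, and default page size are hypothetical.

```python
from typing import Optional


def audit_log_query(apic: str, user: Optional[str] = None, page_size: int = 20) -> str:
    """Build a REST URL that lists the newest audit records first.

    An optional user name restricts the result to changes made by that user.
    """
    params = ["order-by=aaaModLR.created|desc", f"page-size={page_size}"]
    if user:
        params.insert(0, f'query-target-filter=eq(aaaModLR.user,"{user}")')
    return f"https://{apic}/api/node/class/aaaModLR.json?" + "&".join(params)
```

Sorting by creation time in descending order answers the usual first question after an incident: what was changed most recently, and by whom.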
Audit Logs
All fields are filterable and searchable
Audit Logs Search - Date
Audit Logs Search - User
Audit Logs Search - Action
Faults with Event Correlation
Shows relevant events before the Fault
Troubleshooting (“operations” tab)
1 SPAN
2 Endpoint Tracker
3 Troubleshooting Wizard
4 CLI commands
ACI SPAN Feature Overview
SPAN S10
Access SPAN Overview
• Access SPAN in ACI is used to configure local SPAN on leaf switches
• Access SPAN is the only SPAN option that supports SPAN destination to physical port. Access
SPAN also supports ERSPAN destination
• SPAN can be ingress, egress, or both directions
• SPAN sources can be physical ports, port-channels, VPC port-channels, or VPC component ports
• Access SPAN currently supports EPG filters. An EPG filter is translated to a VLAN filter when
the SPAN session is programmed on the leaf
SPAN Enhancements
• ACI 4.1 introduces three key SPAN capabilities:
• SPAN on drop
• SPAN based on filter (5 tuple)
• SPAN with destination as Port-channel
Traffic Map (“operations” tab > “visualization”)
EP Tracker
“We had a
problem at
14:21!”
Attach/Detach
events are logged
for each EP
Was IP Moving?
Troubleshooting Wizard - Faults
Shows Faults
in the Path
Troubleshooting Wizard – Drop Stats
Troubleshooting Wizard - Contracts
Troubleshooting Wizard – Atomic Counters
No Drops!
Troubleshooting Wizard – SPAN
Ability to SPAN to APIC or other devices
attached to the Fabric
iPing
• iPing works by sending an ICMP frame to the destination endpoint, from the source leaf on behalf of the
source endpoint
• The reply is relayed back to the source leaf over the infra VRF to ensure the source leaf recognizes the
response without disrupting existing traffic
• Setting the source IP address is recommended when troubleshooting
leaf101 # iping -h
iping: option requires an argument -- 'h'
usage: iping [-V vrf] [-c count] [-i wait] [-p pattern] [-s packetsize] [-t timeout] [-S source ip/interface] host
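The usage line above can be turned into a small command builder, which is handy when the same test is repeated across several leaves. This is a sketch only: the helper name is hypothetical, it covers a subset of the flags from the usage text, and the `tenant:vrf` VRF naming and the addresses in the usage note are assumed example values.

```python
import shlex
from typing import Optional


def build_iping(host: str,
                vrf: Optional[str] = None,
                count: Optional[int] = None,
                timeout: Optional[int] = None,
                source: Optional[str] = None) -> str:
    """Assemble an iping command line using flags from the usage text."""
    args = ["iping"]
    if vrf:
        args += ["-V", vrf]
    if count is not None:
        args += ["-c", str(count)]
    if timeout is not None:
        args += ["-t", str(timeout)]
    if source:
        args += ["-S", source]      # setting the source is recommended
    args.append(host)
    return shlex.join(args)
```

For example, `build_iping("10.1.1.20", vrf="tenant1:vrf1", count=3, source="10.1.1.1")` returns `iping -V tenant1:vrf1 -c 3 -S 10.1.1.1 10.1.1.20`.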
CLI Commands
show version
acidiag avread (show controller)
acidiag fnvread (show switch)
moquery (CLI-based MO browser)
show audit
show event
show faults
moquery
Show all tenants and their child objects:
moquery -c fvTenant -x 'query-target=subtree'
Other Operational Support Products
Troubleshooting Review Questions:
• We notice a slight decrease in System health. How do we find out what happened?
• How do we check whether the Fabric is running out of resources?
• The server team is reporting connectivity issues between two servers. How do I check whether the Fabric is in good shape on the datapath between the two endpoints?
• The server team just connected a new server and gave me only its IP. How do I check whether I can see their server on the network?
• I see a new server, but can I ping it from the leaf?
Infrastructure and Services
1 Backups
2 SNMP
3 Syslog
Infrastructure and Services
Backups - Snapshots
Infrastructure and Services
Backups - Snapshots
Changed From
Changed To
SNMP Support
• SNMP (read-only)
• http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/mib/list/mib-support.html
SNMP Configuration
Open Question: SNMP or REST API?
• SNMP support for any piece of information needs to be explicitly implemented
• Therefore SNMP will never cover 100% of the information available
• The REST API covers 100% of the information available
• It is easy to figure out which REST call to make to poll a specific counter
• Most tools support REST-based polling, mostly indirectly with the help of additional scripts: Cacti, Nagios, Graphite, etc.
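A REST-based poll typically has two steps: log in to obtain a token, then request the object with its statistics. The sketch below prepares both requests without sending them, so it can be adapted to whatever HTTP client the polling tool uses. The helper names, hostname, credentials, and sample login response are hypothetical. The `aaaLogin` endpoint, the `APIC-cookie` cookie name, and the `rsp-subtree-include=stats` option are stated here as assumptions about the REST API.

```python
import json


def login_request(apic: str, user: str, password: str) -> tuple:
    """Return the URL and JSON body for a login POST."""
    url = f"https://{apic}/api/aaaLogin.json"
    body = json.dumps({"aaaUser": {"attributes": {"name": user, "pwd": password}}})
    return url, body


def extract_token(response_text: str) -> str:
    """Pull the session token out of a login response body."""
    return json.loads(response_text)["imdata"][0]["aaaLogin"]["attributes"]["token"]


def stats_request(apic: str, dn: str, token: str) -> tuple:
    """Return the URL and headers for a GET that includes the object's statistics."""
    url = f"https://{apic}/api/node/mo/{dn}.json?rsp-subtree-include=stats"
    headers = {"Cookie": f"APIC-cookie={token}"}
    return url, headers


# Hypothetical login response, trimmed to the field used above.
SAMPLE_LOGIN_RESPONSE = '{"imdata": [{"aaaLogin": {"attributes": {"token": "abc123"}}}]}'
```

A wrapper script for a tool such as Nagios or Cacti would send the first request, pass the response to `extract_token`, and then issue the second request on each polling interval until the token expires.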
Syslog Integration
• Forwards to a syslog server
• Can forward events, audit logs, and faults
Syslog Integration
• Verify SYSLOG configuration
Syslog configuration – Why 3 locations?
Questions
Next Steps
Additional services
Accelerator close
Resources
ACI Troubleshooting Guide
ACI Upgrade/Downgrade Matrix
Continue the conversation in our ACI community:
https://community.cisco.com/t5/application-centric/bd-p/12206936-discussions-aci