Gaining Operational Intelligence in ACI: Day 2 Operations Application Stack


Gaining Operational Intelligence in ACI

Day 2 Operations Application Stack

Joseph Ezerski
DCSBU 2019
Network Operations – Typical Questions

Teams: Architecture and Planning, Network Operations, Network Administration

• “Is my policy and state adhering to the compliance mandates?”
• “Is my network running with PSIRTs/vulnerabilities?”
• “Do I have anomalous hardware table usage?”
• “I am not an ACI policy model expert, how do I map my existing networking functions to the ACI world?”
• “Do I have unusual latency on some flows?”
• “Which devices are operating out of spec?”
• “I’m about to make a set of changes. Any latent misconfigurations?”
• “Do I have CRC drops / buffer drops on interfaces?”
• “I inherited the infrastructure from another team, how do I easily discover what’s in my network?”
• “Ticket was opened 4 days ago. How do I troubleshoot a historical issue?”
• “Which switches should I patch or upgrade to improve availability?”
• “Which patches should I run?”
• “Can I run my DC HVAC at 5C higher?”
• “Can I quickly make sure that my policy has successfully quarantined some EPGs and has made backup EPGs available?”

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Solving the Problem Where It Happens

Different technology requirements at different layers

• Policy and State Assurance analysis needs to be model-based to be predictive

• Proactive operations allow good policy to be posted to infrastructure, thus reducing escaped issues.

• Health Analysis/Anomaly Detection is based on Resource Monitoring and Trend Analysis


• Real-time troubleshooting of low level data path issues
• Analysis of Network Flows
• AI/ML based signature/anomaly detection

Cisco Network Insights & Assurance
Day 2 Operations Stack

Network Insights: Health and Availability (for ACI & NX-OS fabrics)
+
Network Assurance: Moving from Reactive to Proactive (for ACI fabrics)


Part One: Health Analysis/Anomaly Detection
Introducing Network Insight Telemetry Applications
Providing Network Health Visibility & Enabling Proactive Insights

New Apps

Network Availability: Network Insights Advisor (NIA)
• Proactive Software Recommendations/Notifications
• Issue Vulnerability Detection & Remediation

Network Health: Network Insights Resource Analysis (NIR)
• Physical/Logical Network Capacity & Utilization
• Data & Control Plane & Environmental Health

Enhance Availability, Uptime & Network Wide Visibility


ACI Network Insights – Resources (Available Now!)
Understand What’s Running in your Network

Event Analytics Dashboard
• Resource Analytics
• Data Collection
• Anomaly Detection
• Remediation

Event Analytics Dashboard Displays Faults, Events, And Audit Logs In A Time Series Fashion.

ACI Network Insights – Resources (Q1 CY 2019)
Understand What’s Running in your Network

Flow Analytics Dashboard
Flow Anomalies:
• Packet Drops
• Latency
• End Point Move

Flow Analytics Dashboard Displays Key Indicators Of Infrastructure Data Plane Health.

Demo NIR

Resource Analysis - Common Use Cases
Menu Items

Dashboard: “Tell me now if I’ve got a problem!”
• Anomalies

System
• Resource Utilization (Fabric Wide)
• Trend Monitoring (rising/falling)
• Fabric Capacity
• Environmental

Operations
• Statistics
• Flow Analytics
• Event Analytics

Monitor Troubleshoot Predict


Use Cases for Insights Resource Analysis

• Track Flows – Get end-to-end flow details (~10k flows/sec)
• Track packet drops - Watch the flows and root-cause any packet drops. Show the reason for the drop and the point in the fabric where it occurred
• Flow Latency – Show and troubleshoot end-to-end latency
• Reduce time to innocence – Network problem or app problem?
• Baseline trends across resources on every node
• Significant state change in the fabric with respect to operational/config/environmental/interface/protocol counters and utilization, and interface up/down
• Track rate of change – Any sudden changes in the routing or MAC address tables
• State changes after a reload/maintenance window
• Dynamic correlation – IS-IS errors due to CRC errors or LLDP flaps
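
The "track rate of change" use case can be illustrated with a small sketch. This is not how NIR implements anomaly detection; it is a generic trailing-baseline check, and the function name, window size, threshold, and sample counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def sudden_changes(samples, window=5, threshold=3.0, min_sigma=1.0):
    """Return indices of samples that jump away from their trailing baseline."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not flag tiny moves.
        sigma = max(pstdev(baseline), min_sigma)
        if abs(samples[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical MAC-table entry counts polled at a fixed interval.
mac_table_counts = [1000, 1002, 998, 1001, 999, 1003, 1800, 1805, 1798]
print(sudden_changes(mac_table_counts))
```

Only the first sample after the jump is flagged here, because the later samples are compared against a baseline that already contains the jump. A production detector would likely treat sustained shifts differently.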
Network Insights – Resources (ACI)
NIR 1.2 Release (Q1CY19)
• Event Analytics
• Resource Utilization
• Events & Faults
• Audit Logs
• Operational Resources
• Environmental
• Flow Analytics – Limited Availability

NIR 1.3 Release (Q2CY19)
• Flow Analytics Packet Drop diagnosis
• End Point Analytics
• Anomaly notification via Kafka bus
• Protocol & DP stats – Anomaly detection
• State diffs
• Troubleshooting Insights

NIR 2.0 Release (Q3CY19)
• Multi fabric support
• Integrations
• Remote Storage
• Buffer Analytics
• Packet Injection
• On demand Packet capture

NIR 2.1 Release (Futures)
• Resource tagging
• NLP
• vPOD
• SD-WAN
• Predictive Hardware failure
• cAPIC

Network Insight Advisor (NIA)

Network Insights Advisor (Q2 CY 2019)

Capabilities
• Software/Hardware Recommendations
• Workarounds
• EOL/EOS
• Field Notices
• SMUs Recommendations
• Known Issues/PSIRTs
• Anomalies: unknown runtime, config anomalies
• Version Scale Limits/Hardening Check
• Configuration
• Forwarding State Check
• Loops Detection
• Cable Checkers

Benefits
• Avoid multiple TAC calls
• Keep Network up to date
• Adhere to Cisco policies
• Remove Complexity
• Avoid Outages
• Faster Deployment times
• Significant CAPEX And OPEX Savings
• Prevent traffic black holing
• Avoid downtimes

Network Insights Applications

Apps run on two platforms, each with its own App Hosting Framework and App Store:
• DCNM (NX-OS)
• APIC (ACI)

Data collection and ingestion, Data correlation and analysis, Data visualization and action

• Visibility: Learn from your network and recognize anomalies
• Insights: See problems before your end users do
• Proactive Troubleshooting: Find root cause faster with granular details
Network Insights Advisor Targeted Use Cases
Proactive supportability insights
Dashboard: “Give me a summary of issues”

Advisories, Notifications, PSIRTs
• Provide timely updates about your system
• Track Bugs and PSIRTs

Anomalies
• Configuration, Consistency, Unplanned events

Fabric-wide analysis


Use Case – Notify About Anomalies
Known Anomalies

Components: NIA, Insight DB (Weekly Sync), Fabric

1. Monitor
2. Detect
3. Alert / Inform
   Detected: CSCDT2396 SAL1820SDRE
   Recommend: Upgrade S/W to NXOS 7.0(3)I7(3) in SAL1820SDRE
4. Implement

Detect, Alert, Remediate
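
The monitor, detect, and alert steps can be sketched as a lookup of running software versions against a database of known issues. This is an illustrative sketch only, not NIA's implementation: the `KnownIssue` type, the `advisories` function, the affected version `7.0(3)I7(1)`, and the second serial are hypothetical. Only the defect ID, serial number, and recommended release come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    issue_id: str
    affected_versions: frozenset
    recommended_version: str


def advisories(inventory, issues):
    """Return (serial, issue_id, recommended_version) for each affected switch.

    inventory maps a switch serial number to its running software version.
    """
    alerts = []
    for serial, version in sorted(inventory.items()):
        for issue in issues:
            if version in issue.affected_versions:
                alerts.append((serial, issue.issue_id, issue.recommended_version))
    return alerts


# The affected version below is made up for the example.
issues = [KnownIssue("CSCDT2396", frozenset({"7.0(3)I7(1)"}), "7.0(3)I7(3)")]
inventory = {"SAL1820SDRE": "7.0(3)I7(1)", "SERIAL-B": "7.0(3)I7(3)"}

for serial, issue_id, target in advisories(inventory, issues):
    print(f"Detected: {issue_id} {serial}. Recommend: upgrade to {target}")
```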


Use Case – Notify Me About New Releases
Notifications (PSIRT and S/W)

Components: NIA, Insight DB (Push Notification), Fabric

1. Monitor
2. Identify Switches, Notify
3. Alert / Inform
   Detected: PSIRT: SAL1820SDRE
   Recommend: Upgrade S/W to NXOS 7.0(3)I7(3) in SAL1820SDRE
4. Implement

Detect, Alert, Remediate
Use Cases for NIA

• Root cause from fingerprints and signatures – Constantly collects and checks logs, identifies known caveats and the switches they affect, and recommends remediation
• PSIRTs, Field notices, SMUs, EOL/EOS of Software and Hardware
• Config anomalies - Get notifications when your configurations are not within verified scale
• Compliance checks – hardening, control and data plane inconsistencies
• Measure upgrade impact - Whether a software upgrade will be disruptive or non-disruptive, and whether new hardware can support the existing feature set and scale
• Open TAC case with logs readily available, check status of SRs

Part Two: Policy and State Assurance
New App for ACI!

NAE Policy Explorer
NAE Policy Explorer (PE) Introduction
• NAE PE is an ACI app, available in the ACI Appstore: https://aciappcenter.cisco.com/
• It uses NLQ (Natural Language Queries) to explore the ACI policy model, answering questions about connectivity and associations among objects, including VRFs, BDs, Encaps, EPs, Interfaces, and Contracts
• It supports ACI release 3.2+ with a 2G footprint
NAE PE Introduction – Exploration Primitives

“What” Query
• A “What” query answers how different networking assets are related to each other
• Example: What endpoints are associated with BD:X?

“Can” Query
• A “Can” query answers whether two given assets in the fabric can talk to each other
• Example: Can A talk to B? A and B can be arbitrary sets: EPGs, BDs, VRFs, Endpoints, Encaps, Interfaces

“How” Query
• A “How” query answers how, and on which ports, a pair of EPGs talk to each other
• Flow information between the EPG pair (ether_type, src_port, dst_port, protocol, flags, etc.)
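
The three primitives can be illustrated on a toy policy model. The sketch below is not the NAE PE implementation or its query syntax. The `PolicyModel` class, its fields, and the sample data are hypothetical, and the connectivity rule is deliberately simplified: endpoints in the same EPG may talk, and two EPGs may talk only if a contract links them.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyModel:
    ep_to_epg: dict = field(default_factory=dict)   # endpoint -> EPG
    epg_to_bd: dict = field(default_factory=dict)   # EPG -> bridge domain
    contracts: dict = field(default_factory=dict)   # (EPG, EPG) -> filter entries

    def what_endpoints_in_bd(self, bd):
        """'What' query: endpoints associated with a bridge domain."""
        return sorted(ep for ep, epg in self.ep_to_epg.items()
                      if self.epg_to_bd.get(epg) == bd)

    def can_talk(self, epg_a, epg_b):
        """'Can' query: same EPG, or a contract in either direction."""
        return (epg_a == epg_b
                or (epg_a, epg_b) in self.contracts
                or (epg_b, epg_a) in self.contracts)

    def how_talk(self, epg_a, epg_b):
        """'How' query: filter entries permitted between the EPG pair."""
        return (self.contracts.get((epg_a, epg_b), [])
                + self.contracts.get((epg_b, epg_a), []))


model = PolicyModel(
    ep_to_epg={"10.0.0.1": "web", "10.0.0.2": "web", "10.0.1.1": "db"},
    epg_to_bd={"web": "BD-X", "db": "BD-Y"},
    contracts={("web", "db"): [{"protocol": "tcp", "dst_port": 3306}]},
)
print(model.what_endpoints_in_bd("BD-X"))
print(model.can_talk("web", "db"))
print(model.how_talk("web", "db"))
```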

NAE PE Workflow

Take snapshot of network

Analyze and transform network configuration and state

Build graph/formal models of the network

Query the model to reason about the behavior of the network
Open the NAE PE App from ACI AppCenter
Take a Snapshot of the ACI Fabric
On the timeline, click on the camera icon to take an instantaneous snapshot of the ACI fabric.
Initiate a Query (Begin with What or Can)
▪ Select a snapshot on the timeline
▪ Start a What or Can query

“What” Query

“Can” queries

Packaging/Footprint/Limitations
• App on APIC’s app infrastructure
• Initially supports APIC version 3.2
• 2G memory footprint enforced by APIC 3.2
• Supports limited scale in APIC 3.2 releases

• Network Mode (VRF: unenforced)
  • 3/5/7 APICs
  • 40 Leaves
  • 100 EPGs, 100 BDs, 10 VRFs, 10 Tenants
  • 2K Endpoints

• Policy Mode (with contracts)
  • 3/5/7 APICs
  • 20 Leaves
  • 100 Contracts / 100 Filters, 10K unique ActrlRules fabric wide
  • 100 EPGs, 100 BDs, 5 VRFs, 5 Tenants
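
A quick way to see whether a fabric fits inside these limits is to compare an inventory count against each mode's table. The sketch below is an illustration, not a Cisco tool: the dictionary keys, the `over_limit` helper, and the sample inventory are hypothetical, and it assumes 2K means 2000 and 10K means 10000.

```python
# Limits transcribed from the list above (2K and 10K read as 2000 and 10000).
NETWORK_MODE_LIMITS = {
    "leaves": 40, "epgs": 100, "bds": 100,
    "vrfs": 10, "tenants": 10, "endpoints": 2000,
}
POLICY_MODE_LIMITS = {
    "leaves": 20, "contracts": 100, "filters": 100, "actrl_rules": 10000,
    "epgs": 100, "bds": 100, "vrfs": 5, "tenants": 5,
}


def over_limit(inventory, limits):
    """Return {resource: (actual, limit)} for every resource above its limit."""
    return {
        name: (inventory[name], limit)
        for name, limit in limits.items()
        if inventory.get(name, 0) > limit
    }


# Hypothetical fabric inventory.
fabric = {
    "leaves": 24, "epgs": 80, "bds": 60, "vrfs": 4, "tenants": 3,
    "endpoints": 1500, "contracts": 40, "filters": 40, "actrl_rules": 2000,
}
print(over_limit(fabric, NETWORK_MODE_LIMITS))
print(over_limit(fabric, POLICY_MODE_LIMITS))
```

In this sample the fabric fits Network Mode but exceeds the Policy Mode leaf count.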
Demo NAE Policy Explorer

Network Assurance Engine (NAE) Update
NAE Pricing – Breaking News
• The ACI Premier Bundle for Leaf includes the NAE Leaf license. The Premier PID belongs to the Core category (Std. Disc. 42%), with higher discounts available to customers
• But the NAE Appliance and Spine licenses are in the Market category (Std. Disc. 20%), leading to a varying discount structure
• To enable uniform discounting and reduce confusion from the discounting mismatch, all NAE PIDs are moving from the Market to the Core category
• NAE PIDs will reflect a price uplift to enable the higher discounts characteristic of the Core category
• For ease of purchase, and to be consistent with the other ACI software licensing approach, we are no longer charging for NAE Spine licenses
• The pricing changes will be effective in March 2019

Key Points
• There is no change in names of existing SKUs
• Only change is in pricing to enable higher discounts
• No Spine licenses
• Existing quotes given to customers are valid for 30 days.
• BU and PMs will work with any accounts that have:
  • a deal in flight, to make sure the net price is maintained by adjusting discounts
  • a quote submitted but the deal not yet booked, to make sure the net price is kept
Intent Assurance

The confidence that the infrastructure is doing what you intended it to do
Intent Encompasses Data Center Operations
Configs, Changes, Routing, VMs, Security, … Compliance, Audits

How Cisco Network Assurance Engine Works

• Data Collection: Capture DC-wide intent, policy, and control/state across forwarding and security
• Formal Modeling of Network: Precise mathematical models that codify Cisco’s 30+ years of networking and cross-customer domain knowledge
• Continuous Analysis: Models verify that the network operates per intent and accurately tell what is wrong, where, why, the impact, and how to fix it

Reasoning you do after the fact, the Engine does before the fact, continuously, network wide

#CLUS DEVNET-1699 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 36
Verification Results Delivered via Smart Events
What? Who and Where? Why? How to fix?

Reduce Mean Time to Repair with Precise Pin-Point Analysis


New Additions in the 2.1.1 Release (Aug 2018)
The following features are new in NAE Release 2.1.1a:

• Epoch Delta Analysis


• Inband management IP support for device access and collection
• NAT support for device access
• Assurance results table export (CSV/JSON)

Epoch Delta Analysis
Correlated Ad hoc Analysis Workflow
Epochs compared: Before/Baseline and After/Current

4 Qs, correlated answers…
• What changed?
• Who was impacted?
• Was it due to config changes?
• What happened as a result?

Use Cases
• Change Management
• Root-cause analysis
• Migration
• Maintenance Upgrades
• Capacity Management
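
The idea of comparing a baseline epoch with a current epoch can be sketched as a set difference over event identifiers. This is a minimal illustration, not NAE's analysis: the `epoch_delta` function and the event names are hypothetical, and each event is modeled simply as a (name, object) pair.

```python
def epoch_delta(baseline, current):
    """Classify events as new, resolved, or unchanged between two epochs."""
    return {
        "new": sorted(current - baseline),
        "resolved": sorted(baseline - current),
        "unchanged": sorted(baseline & current),
    }


# Hypothetical events, each a (name, affected object) pair.
before = {
    ("EXAMPLE_OVERLAPPING_SUBNET", "tenant-a/bd-1"),
    ("EXAMPLE_UNUSED_CONTRACT", "tenant-a/c-7"),
}
after = {
    ("EXAMPLE_OVERLAPPING_SUBNET", "tenant-a/bd-1"),
    ("EXAMPLE_EPG_NOT_DEPLOYED", "tenant-b/epg-3"),
}

delta = epoch_delta(before, after)
for kind, events in delta.items():
    print(kind, events)
```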
Health Delta - Summary
Change in the health of the Fabric

Epoch Delta Workflow – Policy Delta
Impact, Change, Operator

• What got impacted?
• Who made the changes?
• Details of impact, if any
• What has changed?
NAT Implementation
• DNAT (IP to IP) is supported in Release 2.1.1a
• Use a .csv file (NAT.csv) to provide the mapping of private IP to public IP for all ACI leafs/spines/APICs

[Figure: Typical NAT deployment with NAE. Network Assurance Engine (Public IP Network), reaching the public IPs of the ACI/APIC hosts; NAT Network (Network Address Translation, using NAT.csv with Public, Private IP pairs); APIC/Leaf/Spine (OOB/Inband Private Network)]
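
As an illustration of the mapping file, the sketch below parses rows of public and private addresses into a lookup table. The exact NAT.csv layout is not given in the slides, so this assumes one `public,private` pair per row with no header, following the "(Public, Private IP)" label in the diagram. The function name and the sample addresses (documentation ranges) are hypothetical.

```python
import csv
import io
import ipaddress


def load_nat_map(csv_text):
    """Parse 'public,private' rows into a private -> public address map."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and comment lines (comment support is an assumption).
        if not row or row[0].lstrip().startswith("#"):
            continue
        # ip_address() raises ValueError on malformed entries.
        public, private = (ipaddress.ip_address(cell.strip()) for cell in row[:2])
        mapping[private] = public
    return mapping


sample = """203.0.113.11,10.0.0.11
203.0.113.12,10.0.0.12
"""
nat_map = load_nat_map(sample)
print(nat_map[ipaddress.ip_address("10.0.0.11")])
```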

Tables Export Available
Customers may need to save assurance analysis results externally for different reasons:
• Analysis
• Ticketing
• Change Management
• Backup

NAE Release 2.1.1a supports export of the following assurance results tables:
• All Smart Events
• Tenant Security Smart Events
• Tenant Endpoints Smart Events
• Tenant Forwarding Smart Events
• Real-time Change Analysis Smart Events
• TCAM Smart Events
• Tenant Endpoint Details
• L3 Forwarding Table
Working With Export Options
▪ Two export formats are supported: CSV and JSON

CSV

JSON
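
Once exported, either format can be loaded into the same in-memory shape for ticketing or analysis scripts. The sketch below assumes the JSON export is a list of objects and the CSV export has a header row. The actual column names are not shown in the slides, so `name` and `severity` are hypothetical.

```python
import csv
import io
import json
from collections import Counter


def load_rows(text, fmt):
    """Load exported rows as a list of dicts from CSV or JSON text."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical exports with made-up column names.
csv_text = "name,severity\nEVENT_A,major\nEVENT_B,minor\n"
json_text = (
    '[{"name": "EVENT_A", "severity": "major"},'
    ' {"name": "EVENT_B", "severity": "minor"}]'
)

rows = load_rows(csv_text, "csv")
print(Counter(row["severity"] for row in rows))
```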

New Additions in the 3.0.1 Release (Dec 2018)

Tenant Forwarding
• PC and vPC interface Smart Events
• L2 Path Binding Smart Events enhanced with PC and vPC support

Segmentation Compliance Events


• Compliance and Violation Smart Events

Policy
• Overlapping subnets Smart Event enhancements

Scale Increase
• 200 Leaf support in a single fabric

Forwarding Connectivity Analysis
Health of Forwarding Communication Fabric-wide
Use Cases
• Forward Communication Issues across entire fabric
• Visibility into Route Leakage*
• Visibility into Fabric Communication with External Network
• Policy and Forwarding Inconsistencies

[Figure: internal and external forwarding paths (e.g. Internal to External), shown for InterVRF and IntraVRF]

*Roadmap
Compliance Analysis
Continuous Compliance Verification
NAE Communication Compliance Requirements
• Segmentation Compliance: Entity A must not talk to Entity B
• SLA Compliance*: Entity A must talk to Entity B on Y

*Roadmap
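
A segmentation requirement of the form "A must not talk to B" can be checked against the set of pairs that policy currently permits. The sketch below is a simplified illustration, not NAE's verification logic: the function name, the EPG names, and the representation of allowed communication as unordered pairs are all assumptions.

```python
def check_segmentation(requirements, allowed_pairs):
    """Split 'must not talk' requirements into violated and satisfied lists.

    requirements: iterable of (a, b) pairs that must NOT communicate.
    allowed_pairs: set of frozenset({a, b}) pairs that policy permits.
    """
    violated, satisfied = [], []
    for a, b in requirements:
        if frozenset((a, b)) in allowed_pairs:
            violated.append((a, b))
        else:
            satisfied.append((a, b))
    return violated, satisfied


# Hypothetical EPG names.
allowed = {frozenset(("web", "db")), frozenset(("web", "pci"))}
must_not_talk = [("web", "pci"), ("dev", "prod")]

violated, satisfied = check_segmentation(must_not_talk, allowed)
print("violated:", violated)
print("satisfied:", satisfied)
```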
Smart Events & Compliance Score for Compliance

COMPLIANCE VIOLATED SMART EVENT
• Identify non-compliant policy
• Identify requirements violated
• Identify non-compliant EPGs

COMPLIANCE SATISFIED SMART EVENT
• Identify compliant policy
• Identify requirements satisfied
• Identify compliant EPGs

COMPLIANCE SCORE
Bringing It All Together

Network Insight & Network Assurance
Platforms: ACI, NX-OS
Available via Premier Tier Subscription
Teams: Architecture and Planning; Network Administration and Maintenance; Network Operations

NAE Policy Explorer *
• Network Policy exploration
• Ad-hoc connectivity and segmentation discovery

Network Assurance Engine (slide label: Roadmap)
• Policy/Control/Data plane Assurance
• Incident and Problem Management
• Compliance and Audit

Network Insights Resources **
• Fabric wide resource utilization & trends
• Anomaly detection – environmentals, config & operational resources, interface errors
• End-to-end flow path, latency and drop reason

Network Insights Advisor **
• Notifications of EOS/EOL of H/W & S/W
• Security Advisory Notification Updates (PSIRTs)
• Recommended S/W Release Updates and upgrade impact analysis
• Update and track TAC SRs centrally

* Available as App on APIC
** Available as App on APIC and DCNM
Opstack benefits
Teams: Architecture and Planning; Network Operations; Network Administration and Maintenance

NAE PE
• Verification of design/connectivity mandates
• Ad-hoc policy exploration
• Accelerate ACI on-ramp
• Light weight book-keeping procedures
• Ad-hoc Connectivity and Segmentation Analysis

NAE
• Capacity Planning
• Design and verify compliance mandate and posture
• Design and verify security mandate and posture
• Proactive Assurance and Compliance
• Faster incident and problem management
• Shrink change management windows
• Execute high confidence production maintenance and upgrades
• Accelerate ACI on-ramp

NIR/NIA
• Trend based capacity planning
• Design trend-based environmental site operating procedures
• Fabric wide anomaly detection based on resource monitoring and flow monitoring
• Faster low-level troubleshooting and diagnostics
• Proactively reduce vulnerability exposure
• Improve Site reliability
OPSTACK – Tools at a Glance
NAE PE
• Key use cases: Design verification and policy governance; Visibility into logical fabric inventory; Explore object associations for fabric inventory; Connectivity and segmentation queries
• Data sources: APIC Policy
• Technology: Graph-model based
• Target teams: Architecture; Network Administration
• Packaging: APIC app
• Platforms supported: ACI

NAE
• Key use cases: Proactive assurance of Policy and Dynamic state changes; Incident and Problem Management; Audit and Compliance; Change Management; Network Security policy Management
• Data sources: APIC Policy and Network wide control/data plane state
• Technology: Formal Model based
• Target teams: Network Operations; Network Provisioning; Network Administration of logical network assets
• Packaging: Appliance
• Platforms supported: ACI

NIR
• Key use cases: Resource monitoring and flow anomaly detection; Telemetry data collection based on triggers; Operating environment anomaly detection; Physical hardware health monitoring
• Data sources: Device resources; Flow data
• Technology: Heuristic analysis of Streaming telemetry data; Base line threshold
• Target teams: Network Operations; Network Administration and book-keeping of physical network assets
• Packaging: App hosted on an appliance/APIC-X
• Platforms supported: Nexus/ACI

NIA
• Key use cases: Network security maintenance based on PSIRTs and known vulnerabilities; Fabric health/Site reliability and maintenance based on Cisco global advisories; Suggested resolution based SW release and impact analysis of upgrades
• Data sources: Device level SW and HW information; Network failure signatures
• Technology: Comprehensive global advisory database of SW and HW
• Target teams: Network Maintenance; Network Administration
• Packaging: App hosted on an appliance/APIC-X
• Platforms supported: Nexus/ACI
