ACI Best Practice Configurations


Posted on July 16, 2021 by Jody

The top question all new ACI customers have (or should have) is: which
configurations should be enabled on my fabric from the beginning? With
that in mind, we’re going to use this post as a living document of
configurations that are considered “Best Practice” to have enabled. We
will keep this document updated as new versions come out, so don’t
forget to bookmark this page! Wherever possible, we will include links
to the Cisco documentation, or at the very least, a detailed
explanation of our reasoning.

Global Settings Best Practices:

1. MCP (per VLAN) should be enabled – MisCabling Protocol (MCP)
   detects loops from external sources (e.g., misbehaving servers or
   external networking equipment running STP) and will err-disable
   the interface on which ACI receives its own packet.
   1. This can be enabled by going to Fabric > Access Policies >
      Global Policies > MCP Instance Policy default
   2. Make sure to enable the “Enable MCP PDU per VLAN” option
      (available after 2.0(2)), which makes MCP send packets on a
      per-EPG basis. Otherwise, these packets are only sent on
      untagged EPGs, which basically makes MCP useless from a
      loop-detection perspective.
   3. If you want to read more about MCP, go check out this post!
2. Disable Remote EP Learn – This will disable remote IP learning on
   border leaf switches.
   1. Prior to 3.0, this can be enabled by going to Fabric > Access
      Policies > Global Policies > Fabric Wide Setting Policy
   2. After 3.0, this can be enabled by going to System > System
      Settings > Fabric Wide Setting
   3. This is first available in 2.2(2e) and all later releases.
   4. Be aware of CSCvi11291 (fixed in 3.2(1l) and later). This bug
      allows remote EP learns on border leaf switches, even when
      Disable Remote EP Learn is configured, if the switch receives
      packets with a source or destination TCP port of 179 (BGP).
3. Enforce Subnet Check (will only work on -EX and -FX based leafs)
   1. Prior to 3.0, this can be enabled by going to Fabric > Access
      Policies > Global Policies > Fabric Wide Setting Policy
   2. After 3.0, this can be enabled by going to System > System
      Settings > Fabric Wide Setting
   3. This is first available in 2.2(2q) and all later 2.2(x)
      releases.
      1. Not available in 2.3(x)
      2. First available for 3.0 in 3.0(2k) and later
   4. Enforce Subnet Check is somewhat like “Limit IP Learning to
      Subnet”, but on steroids. You might remember that the “Limit IP
      Learning to Subnet” BD configuration option prevents the
      learning of IP endpoints that are not in a subnet configured on
      the BD. “Limit IP Learning to Subnet” does NOT drop the packet;
      it just stops the IP from being learned on the BD. The IP can
      still be learned on a leaf that does not have the BD configured
      (e.g., a border leaf). This can be problematic, hence the need
      for the Enforce Subnet Check option. When it is enabled, ACI
      also does not learn the IP component at the VRF level.
   5. Be aware of CSCvh17285 (fixed in 3.2(1l) and later). When
      Enforce Subnet Check is enabled, any Bridge Domains which are
      configured as L2-only -AND- have L2 Unknown Unicast set to
      proxy will not learn MAC addresses from ARP/GARP packets. The
      workaround is to configure the L2 BD with L2 Unknown Unicast =
      Flood.
4. EP Loop Detection
   1. While the EP Loop Detection configuration has good intentions
      (i.e., finding a loop and killing it), I have found that it is
      triggered at least as often by false positives, such as
      vMotions of VMs, as by true loops. For this reason, I would
      disable EP Loop Detection.
   2. Prior to 3.0, this can be disabled (or enabled) by going to
      Fabric > Access Policies > Global Policies > EP Loop Detection
      Policy
   3. After 3.0, this can be disabled (or enabled) by going to System
      > System Settings > Endpoint Controls > EP Loop Detection
5. IP Aging should be enabled
   1. When IP Aging is not enabled (which is the default) and
      multiple IPs are learned on a single MAC, all of those IPs stay
      learned on the fabric for as long as the MAC is active.
      Cosmetically, this is undesirable in scenarios where
      DHCP-enabled hosts get a new IP address but both IPs are still
      shown within the EPG operational tab as tied to that MAC. This
      feature ages each IP separately to address that scenario. At
      75% of the endpoint retention timer, a directed ARP is sent to
      the IP component of the endpoint, and if it goes unanswered,
      ACI allows the IP endpoint to age out.
   2. Prior to 3.0, this can be enabled by going to Fabric > Access
      Policies > Global Policies > IP Aging Policy
   3. After 3.0, this can be enabled by going to System > System
      Settings > Endpoint Controls > IP Aging (look to the right for
      this tab)
   4. This is first available in 2.1(1h) and all later releases.
6. Rogue Endpoint Detection should be enabled.
   1. Starting with 3.2, Rogue Endpoint Detection lessens the impact
      of flapping endpoints.
   2. When Rogue Endpoint Detection is enabled, the misbehaving
      endpoint (MAC/IP) is quarantined and a fault is generated to
      allow for easy identification.
   3. After 3.2, this can be enabled by going to System > System
      Settings > Endpoint Controls > Rogue EP Control
   4. Recommended Values:
      1. Rogue EP Detection Interval = 30
      2. Rogue EP Detection Multiplication Factor = 6
   5. NOTE – Rogue Endpoint Detection should be disabled prior to
      upgrading or downgrading the fabric. It can be re-enabled once
      the upgrade/downgrade is complete. This is documented in the
      APIC Basic Configuration Guide on CCO.
7. Enable Strict COOP Group Policy
   1. The APIC provides a managed object (fabric:SecurityToken) that
      includes an attribute used for the MD5 password. This
      attribute, called “token”, is a string that changes every hour.
      COOP obtains the notification from the DME to update the
      password for ZMQ authentication. The token value is not
      displayed. There are 2 choices, Compatible Type and Strict
      Type. Compatible Type accepts both MD5 authenticated and
      non-authenticated ZMQ connections, whereas Strict Type only
      allows MD5 authenticated ZMQ connections.
   2. This can be enabled by going to System > System Settings >
      COOP Group
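
The release notes scattered through the list above can be collected into a small lookup. What follows is only a minimal Python sketch, assuming version strings shaped like 3.2(1l). The function names and feature keys are made up for this illustration, the thresholds are simply the ones quoted in this post, and patch letters are assumed to compare alphabetically.

```python
import re

# Matches strings such as "2.2(2e)", "3.0(2k)" or "2.0(2)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")

# Minimum releases quoted in this post (the feature keys are hypothetical).
MIN_RELEASE = {
    "disable_remote_ep_learn": "2.2(2e)",
    "ip_aging": "2.1(1h)",
}


def parse_version(text):
    """Turn '3.2(1l)' into the sortable tuple (3, 2, 1, 'l')."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, patch = match.groups()
    return (int(major), int(minor), int(maint), patch)


def supports(feature, version):
    """Return True if this post lists `feature` as available in `version`."""
    v = parse_version(version)
    if feature == "enforce_subnet_check":
        # 2.2(2q)+ within 2.2(x), absent from 2.3(x), then 3.0(2k)+.
        if v[:2] == (2, 2):
            return v >= parse_version("2.2(2q)")
        if v[:2] == (2, 3):
            return False
        return v >= parse_version("3.0(2k)")
    return v >= parse_version(MIN_RELEASE[feature])


def has_endpoint_bug_fixes(version):
    """CSCvi11291 and CSCvh17285 are both listed as fixed in 3.2(1l)."""
    return parse_version(version) >= parse_version("3.2(1l)")
```

For example, supports("enforce_subnet_check", "2.3(1f)") is false while supports("enforce_subnet_check", "3.0(2k)") is true, which mirrors the gap in the 2.3(x) train noted above.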

At a high level, options 2 and 3 prevent the mis-learning of IP
endpoints on your fabric. Mis-learning of endpoints leads to things
like black-holed packets, as remote IP endpoints can get stuck on a
border leaf (for example). The process of clearing such events is
cumbersome and causes a lot of heartburn. For detailed examples of use
cases for each of the endpoint configuration knobs, please check out the
ACI Endpoint Learning Whitepaper (below). While I always recommend
that these changes are performed in a maintenance window, the impact
from enabling these options is close to non-existent (the only effect
is a flush of remote IP endpoints in the VRF).

ACI Endpoint Learning WhitePaper

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html

Are you looking for a programmatic way of enabling all of the Global
Setting Best Practices with a shell-script? Take a look at our ACI Best
Practices for curling article!!
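
To give a feel for what a programmatic approach involves, here is a rough Python sketch that only builds the JSON bodies for the Global Settings recommendations and prints them. It is not the script from the linked article. The DNs, class names and attribute names are written from memory of the APIC object model and should be treated as assumptions to confirm against your own release (the API Inspector in the GUI is a good way to do that). The function name and the mcp_key parameter are hypothetical, and nothing here contacts an APIC.

```python
import json


def global_best_practice_payloads(mcp_key):
    """Map each policy DN to the JSON body that would be POSTed for it.

    All DNs, class names and attribute names are assumptions to verify
    against your APIC release before use.
    """
    def mo(cls, **attrs):
        # REST bodies wrap the attributes of one managed object like this.
        return {cls: {"attributes": attrs}}

    return {
        # 1. MCP enabled, with MCP PDUs sent per VLAN (MCP needs a key)
        "uni/infra/mcpInstP-default": mo(
            "mcpInstPol", adminSt="enabled", ctrl="pdu-per-vlan", key=mcp_key),
        # 2 and 3. Disable Remote EP Learn and Enforce Subnet Check
        "uni/infra/settings": mo(
            "infraSetPol", unicastXrEpLearnDisable="yes",
            enforceSubnetCheck="yes"),
        # 4. EP Loop Detection disabled
        "uni/infra/epLoopProtectP-default": mo(
            "epLoopProtectP", adminSt="disabled"),
        # 5. IP Aging enabled
        "uni/infra/ipAgingP-default": mo("epIpAgingP", adminSt="enabled"),
        # 6. Rogue EP Control with the values recommended in this post
        "uni/infra/epCtrlP-default": mo(
            "epControlP", adminSt="enabled",
            rogueEpDetectIntvl="30", rogueEpDetectMult="6"),
        # 7. Strict COOP group policy
        "uni/fabric/pol-default": mo("coopPol", type="strict"),
    }


if __name__ == "__main__":
    for dn, body in global_best_practice_payloads("example-key").items():
        print(f"POST /api/mo/{dn}.json")
        print(json.dumps(body, indent=2))
```

Printing the bodies first makes it easy to review exactly what would change before anything is sent to the fabric in a maintenance window.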

Bridge Domain Best Practices:

For Bridge Domains, there is a wide mixture of use cases, and lots of
perfectly valid use cases for different configurations. So – in general,
best practice is in the eye of the beholder. However, with that being said,
I’ll try a few blanket recommendations, with appropriate caveats.
1. Do not enable Unicast Routing if ACI is not the L3 Gateway for
your Subnet. Why would you ever enable Unicast Routing if ACI is
not the L3 Gateway? Without Unicast Routing enabled, ACI will not
learn the IP address for Endpoints. This leads some customers to
enable Unicast Routing, because (understandably) they want to
learn the IP endpoint and not just the MAC address of connected
devices. The problem is that this can lead to asymmetric routing,
which can result in packets being dropped or mis-routed.
2. Configure a single subnet for each Bridge Domain. ACI will only
forward DHCP requests on the primary subnet for each BD. If you
have a second subnet configured on the same BD, DHCP will not
work for the second subnet and beyond.
3. In Network-Centric Mode (i.e., VLAN=EPG=BD), do not configure
multiple EPGs on a BD. When you map VLANs to EPGs and BDs
in ACI, the external STP and HSRP multicasts are flooded in the
same BD. For example, if you have VLAN 11 (EPG11) and VLAN 12
(EPG12) attached to the same BD, HSRP hellos for both VLANs will
intermingle in the BD and cause problems in your external
(non-ACI) environment.
4. Enable Limit IP Learning to Subnet – This should be enabled
99.999% of the time. It limits the IP learning of endpoints based on
the subnets configured on the respective bridge domains. Note – If
you have -EX or -FX based leafs and have configured “Enforce
Subnet Check” globally, this is turned on whether you have
enabled it or not. 🙂
5. Consider ARP Flooding + GARP-based detection – This is a 50/50
recommendation. I could go either way, but if it is my datacenter,
I’m probably going to enable this configuration option. The pro of
GARP-based detection is that it prevents IP learning issues in
specific situations. The con is that you have to enable ARP Flooding
on the BD before you can configure GARP-based detection. From the
Cisco ACI Fabric Endpoint Learning
Whitepaper – “Although Cisco ACI can detect MAC and IP address
movement between leaf switch ports, leaf switches, bridge
domains, and EPGs, it does not detect the movement of an IP
address to a new MAC address if the new MAC address is from
the same interface and same EPG as the old MAC address. When
the GARP based detection option is enabled, Cisco ACI will trigger
an endpoint move based on GARP packets if the move occurs on
the same interface and same EPG. If a GARP packet comes from
the same interface and same EPG, then endpoint learning is
triggered only when Unicast Routing, ARP Flooding, and “GARP
based detection” are all enabled for the bridge domain. Although
this scenario has not been widely seen across our customer base,
in some cases customers do change their IP to MAC bindings and
need to enable GARP-based detection.”
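
The decision that “Limit IP Learning to Subnet” makes (recommendation 4) can be pictured with a few lines of Python. This is only an illustration of the rule as described in this post, not of how a leaf implements it. The function name is hypothetical, and BD subnets are assumed to be written the way they are usually entered, as a gateway address with a mask.

```python
import ipaddress


def is_learnable(endpoint_ip, bd_subnets):
    """Return True if endpoint_ip falls inside any subnet on the BD."""
    ip = ipaddress.ip_address(endpoint_ip)
    # strict=False accepts gateway-style entries such as "10.1.1.1/24".
    return any(
        ip in ipaddress.ip_network(subnet, strict=False)
        for subnet in bd_subnets
    )


if __name__ == "__main__":
    bd = ["10.1.1.1/24"]
    print(is_learnable("10.1.1.50", bd))    # inside the BD subnet
    print(is_learnable("192.168.5.9", bd))  # outside: the IP is not learned
```

Remember that with this BD option alone the packet itself is still forwarded; only the IP learn on the BD is suppressed.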

Fabric Provisioning Best Practices

Performing an ACI Fabric Setup is one of the best things about ACI.
However, proper planning for your fabric setup values is critical. When
considering the values for your ACI fabric, it is important to
remember that changing either the infrastructure IP address (TEP IP
pool) range or the infra VLAN after the initial provisioning setup
process is not possible without rebuilding the fabric.

When performing your initial Fabric Setup, you are required to input a
“TEP address range”. This range of IP addresses is used primarily to
provide TEP addresses for Leaf and Spine nodes in the fabric. While the
default value for this is 10.0.0.0/16, it is considered best practice to
provide a unique address block for your TEP pool for a couple of reasons:

1. If you want to extend your TEP pool to AVE (ACI Virtual Edge)
switches in the future, you want a unique address that does not
overlap with existing routing in your network.
2. If you want to have communication to external devices from the
APIC (e.g., vCenter for VMM integration), you want addressing on
your infra TEP pool that is unique, to avoid IP address / routing
conflicts for traffic coming back to the APIC from your vCenter
device.
3. Note – Changing the infrastructure IP address range or the VLAN
after initial provisioning is not possible without rebuilding the
fabric.

The Infra Subnet should not overlap with any other routed subnets in
your network. If this subnet does overlap with another subnet, change
this subnet to a different /16 subnet.

Beginning with APIC 2.2 code, the minimum supported subnet for a
3-APIC cluster is a /23.

If you are using APIC 2.0(1) code up until APIC 2.2 code, the
minimum is /22.

The infra TEP pool should be unused and unique. If you do not have
any spare RFC1918 addresses, consider using the RFC6598 range
(100.64.0.0/10, reserved for carrier-grade NAT). This range is not
routed on the public internet, so it will never conflict with an
internet destination.

Every Fabric / POD infra TEP pool should come from a unique IP
subnet range.
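
These TEP pool rules are easy to sanity-check before you ever power on an APIC. Below is a minimal Python sketch using the standard ipaddress module (Python 3.7 or later). The function name is hypothetical, IPv4 input is assumed, and the address-space check reflects this post’s suggestion rather than a hard requirement of the product.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
RFC6598 = ipaddress.ip_network("100.64.0.0/10")


def check_tep_pool(tep_pool, routed_subnets, max_prefixlen=23):
    """Return a list of problems found with an IPv4 TEP pool candidate.

    max_prefixlen=23 follows the /23 minimum quoted for APIC 2.2 and
    later; pass 22 for the older releases mentioned in this post.
    """
    pool = ipaddress.ip_network(tep_pool)
    problems = []
    if pool.prefixlen > max_prefixlen:
        problems.append(f"{pool} is smaller than a /{max_prefixlen}")
    if not any(pool.subnet_of(block) for block in RFC1918 + [RFC6598]):
        problems.append(f"{pool} is outside RFC1918 and RFC6598 space")
    for subnet in routed_subnets:
        other = ipaddress.ip_network(subnet)
        if pool.overlaps(other):
            problems.append(f"{pool} overlaps routed subnet {other}")
    return problems
```

An empty list means the candidate passed all three checks. Feeding it the default 10.0.0.0/16 together with a network that already routes 10.0.0.0/8 shows why the default is rarely a good choice.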

For more information about this, check out the Cisco APIC Getting
Started Guide, Release 3.x, on CCO.

Infra VLAN ID – Set your Infra VLAN to 3967

During fabric setup, ACI requires a VLAN to be used as the
infrastructure VLAN. This VLAN is used for control traffic between the
devices that make up the fabric (i.e., leafs, spines, and APICs).
Because this VLAN can be extended outside of the fabric (OpenStack
integration, AVS/AVE), it is a best practice to make it a unique VLAN
in your environment. In addition, many Cisco devices have reserved
VLAN ranges that are hard to modify (i.e., you have to reboot the
switches for changes to take effect). VLAN 3967 is not reserved on any
Cisco switching platform, which makes it ideal for ACI.

Node ID Settings – Spines should be numbered between 101-199; Leafs
should be numbered 200 and above.
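
If you keep your planned node inventory in a spreadsheet or script, the numbering convention above is simple to check automatically. Here is a small Python sketch; the function name and role labels are hypothetical, and the ranges are just the ones suggested in this post.

```python
INFRA_VLAN = 3967  # infra VLAN value recommended in this post


def node_id_follows_convention(role, node_id):
    """Check a node ID against the numbering scheme suggested above."""
    if role == "spine":
        return 101 <= node_id <= 199
    if role == "leaf":
        return node_id >= 200
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    planned = [("spine", 101), ("spine", 102), ("leaf", 201), ("leaf", 150)]
    for role, node_id in planned:
        ok = node_id_follows_convention(role, node_id)
        print(f"{role} {node_id}: {'ok' if ok else 'outside convention'}")
```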

For more detailed information, check out the Cisco ACI Best Practices
Guide for Fabric Provisioning.

ACI Fabric Naming Best Practices

Need a good primer on ACI Fabric Naming best practices? Check out this
post for suggested tips on naming your objects in both the Tenant and
Fabric Access Section of your fabric!

Also – please check out the Official Cisco ACI Best Practices guide on
CCO!


10 thoughts on “ACI Best Practice Configurations”

Complicated Sister January 20, 2018 at 11:48 am

Hey great article. Maybe you can add the reasons why you would
recommend that … eg Disable Remote EP Learn, why?

Jody January 22, 2018 at 12:17 pm

Thanks! When enabled, the Disable Remote EP Learn and Enforce Subnet
Check config knobs do not allow remote leafs to learn IPs. This removes
the possibility for the remote leafs to black hole traffic due to
mislearned or stuck IP EPs. With remote learning of IPs disabled, the
remote leafs do not look up the IP component of the EP locally, but
punt the traffic to the Spines, which already have knowledge of all
endpoints. This is explained in detail in the ACI Endpoint Learning
Whitepaper –
https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.pdf

Laurent April 13, 2018 at 7:00 am

Hello,

Great articles, too bad I did not find them earlier.

This leads to my question:

– What happens if you activate the 4 global options on a live fabric?
– What kind of disruption can we expect? (None? And I mean in the best
possible scenario, when everything is working and it does not trigger
something unsupported.)

Those are enhancements, but what really happens? Do processes restart,
are caches cleared, leading to an unresponsive fabric or blocked
servers for a few seconds?

Jody April 13, 2018 at 10:23 am

Laurent – Thanks for checking out the blog! To answer your questions,
I always recommend these changes be done in a maintenance window (just
to be safe) – but in actuality, the impact from enabling them should be
minimal. For the Endpoint enhancements (Disable Remote EP Learn /
Enforce Subnet Check) – ACI flushes all local IP endpoints outside
bridge domain subnets and all remote IP endpoints. For IP aging, there
should be no impact. For MCP, no impact (other than stopping a loop).

Jody July 24, 2018 at 4:50 pm

Caveat – All changes should be enabled in a maintenance window. With
that being said, MCP is non-disruptive (nothing is cleared or
bounced). The only exception to this is if you had a loop in place, it
would shut down the loop, but I think we can agree that would be a
good thing. For Disable Remote EP Learn, this is non-disruptive. For
Enforce Subnet Check, this should clear all remote IP learns when it
is enabled. I do not believe there would be any real impact from that
process, as the traffic should revert to using the Spine-Proxy for
L3 routing at that point.

Mohammed September 1, 2018 at 2:09 am

Hi Jody,

Thanks a lot for the great blog,

1. If I am using the GOLF feature, do you prefer to disable “Disable
Remote EP Learn”?
2. Do you prefer to enable endpoint dataplane learning under the BD,
since this is an L3 BD?
3. If I have faced a fault regarding an IP address with multiple MAC
addresses, is there any feature that needs to be enabled to solve this
issue?

Thanks

Jody September 6, 2018 at 12:52 pm

Mohammed – 1. Yes, you can have Disable Remote EP Learn enabled if
you use GOLF. 2. Do not disable dataplane EP learning on the BD
unless you are using PBR. 3. There is a feature coming in 4.0 (disable
dataplane learning for the VRF) that would allow you to work around
the issue you are describing; more on that later!! 😉

Adeboye June 2, 2019 at 3:07 pm

Any thoughts on vzAny? I can’t find any solid information on it.

Antonio November 14, 2019 at 3:23 am

Hi Jody,

What’s your BD setup recommendation when the default gateway is not
the BD SVI, i.e. a firewall? Would the ARP flooding and L2 unknown
unicast features be needed, or would hardware proxy be enough?

Thanks.

Pingback: Application Centric Infrastructure (ACI): Datacenter SDN – tawm i/o
