
The RoadMap to Cisco ACI; Whys and Wherefores

Vahid Nazari
Datacenter Solution Architect | Senior Network Engineer. Cisco ACI, HyperFlex, Cloud Infrastructure
Published Aug 16, 2021

Why Cisco ACI?

Indeed, this is still an open question for many network engineers, managers, and even higher-ranking decision makers who deal with Cisco products in their infrastructure: Why go for Cisco ACI, and what are its advantages? I'm going to address these questions in the rest of this article.

I believe what you really need to understand is the roadmap to Cisco ACI: the evolution of the Data Center fabric technologies we've worked with, the challenges and problems each of them had, and why some of them are now completely obsolete. Let's see what the most important of these challenges are:

Part One: What are the most common challenges we've ever
faced in Data Center fabric design?

• (1) Large Broadcast domains should be avoided.

This is definitely something we want to avoid. A single broadcast domain is a single failure domain: any failure that occurs within it can affect every service running inside it. Furthermore, in a large broadcast domain, traffic is often propagated to hosts that have no need to listen to it. Consequently, network bandwidth and processing capacity are consumed for almost nothing, and the network is always busy. From a security point of view, large broadcast domains are also more vulnerable to traffic sniffing and man-in-the-middle attacks.

• (2) Simple Cabling.

Traditional Data Center Network (DCN) technologies rely on full-mesh cabling to meet the desired level of redundancy. Of course, this makes the cabling more complicated as the network scale grows.

As explained later in this article, a large-scale Data Center fabric is often split into several Points of Delivery (PODs) that communicate through the L3 core. This helps simplify the cabling and also breaks up the large L2 broadcast domains.

• (3) End-to-end VM mobility and distributed workloads among all server racks
within the Data Center.

In contrast to the previous challenges, requirements such as distributed workloads and VM mobility force us to provide Layer 2 connectivity between all server racks within the data center fabric. In modern infrastructures, we shouldn't restrict applications and services that require L2 adjacency in such a way that administrators have to place their hardware only on certain racks. But more L2 connectivity means larger L2 broadcast domains! As you can see, these requirements conflict with each other in traditional technologies, especially in a Multi-POD infrastructure.

Well, it seems we have to find a way to establish Layer 2 connections without relying on a Layer 2 network! This is the key idea behind the newer technologies: a Layer 2 connection as the overlay, running over an L3 network as the underlay.

• (4) Top-of-rack (ToR) switches

This is one of the most common trade-offs in traditional data center networks. Fewer switches at the top of the racks means much more complicated cabling; more switches at the top of the racks means more configuration workload and more devices to monitor and maintain. Either way, the situation gets worse.

• (5) Loop-Prevention mechanisms: No Spanning-Tree protocol anymore!

Anyone dealing with business-critical services such as banking transactions has probably experienced that even the fastest variants of the Spanning Tree Protocol can cause service disruption during topology changes, simply because they no longer converge as fast as expected. The loop-prevention mechanism (blocking state) used by STP is also unacceptable on modern Data Center switches with 10GE, 40GE, 100GE, and 400GE line cards, where blocking a link wastes enormous capacity. This protocol is almost obsolete and no longer has a place in modern infrastructures. So, what are the alternatives?

• (6) Scalable Fabric across multiple sites. How are DCI requirements supposed to be covered?

Data Center Interconnect (DCI) building blocks, path-optimization techniques, high availability, and the considerations related to them are major parts of data center design, and they are covered in detail in the Cisco CCDE course. The challenges here depend entirely on the technology used in the data center fabric. Does the technology itself provide solutions to these challenges, or are network administrators responsible for all of them, running several additional protocols for each part of the requirements? This question is especially important when you are designing an underlay infrastructure for a private cloud. In other words, some traditional DCN technologies are simply not suitable for a Multi-Site infrastructure. The Data Center fabric technology we choose for a Multi-Site design must natively support easy extensibility and workload mobility.

• (7) A Unified Data Center fabric for both physical and virtual networks

In traditional data center design, the physical network and the virtual network are completely separated from each other. What we call the 'virtual network' consists not only of virtual machines but may also include cloud-native infrastructure. In that case we face several different types of networks: physical networking, virtual machine networking, and container networking. There is essentially no integration or visibility between them. Each network has its own fabric, its own administrators, and its own security considerations that need to be addressed separately. For instance, the security team needs one set of solutions for physical networking and another for container networking. And who is actually responsible for enforcing network and security policies within the cloud-native infrastructure?!

Some of these challenges are addressed by establishing a new culture known as DevOps and, following that, a new team structure named Site Reliability Engineering (SRE). These approaches were created to accelerate the development process, close the gap between traditional IT and software developers, help teams collaborate better, and as a result deliver reliable services faster. But changing the team structure is not enough; we definitely need to make the infrastructure more agile in the same way.

Therefore, the ideal Data Center fabric is a unified network platform in which the type of endpoint doesn't matter. Whether our endpoints are containers, virtual machines, or bare-metal servers, we have one common fabric; all endpoints can reach each other through it, and the fabric provides the desired level of visibility, even more than before.

• (8) Security concerns.

Traffic forwarding in traditional DCN technologies is implicitly allowed by default. This means that once a host connects to the network and administrators perform the initial VLAN and IP configuration for it, it can communicate with the others unless security policies have already been enforced. In large-scale networks, this opens the door to security holes and unauthorized access caused by misconfiguration. This is no longer the default behavior in Cisco ACI, even if no firewall or security appliance is deployed: any traffic forwarding between EPGs is denied by default unless a Contract deployed between them explicitly allows the desired traffic.
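To make this whitelist model concrete, here is a minimal Python sketch (not ACI code; the EPG names, filters, and helper function are hypothetical) of how default-deny between EPGs works conceptually:

```python
# Minimal sketch of ACI's whitelist model: traffic between EPGs is dropped
# unless a contract explicitly permits it. EPG and filter names are hypothetical.

contracts = {
    # (consumer EPG, provider EPG) -> allowed (protocol, destination port) filters
    ("Web-EPG", "App-EPG"): {("tcp", 8080)},
    ("App-EPG", "DB-EPG"): {("tcp", 1433)},
}

def is_allowed(src_epg: str, dst_epg: str, proto: str, dport: int) -> bool:
    """Default deny: permit only what a contract between the two EPGs allows."""
    if src_epg == dst_epg:
        return True  # intra-EPG traffic is permitted by default
    filters = contracts.get((src_epg, dst_epg), set())
    return (proto, dport) in filters

print(is_allowed("Web-EPG", "App-EPG", "tcp", 8080))  # True: a contract permits it
print(is_allowed("Web-EPG", "DB-EPG", "tcp", 1433))   # False: no contract, so denied
```

In ACI itself, contract, subject, and filter objects express exactly this relationship, and anything not matched by a contract is simply dropped.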
• (9) Complex deployment of L4-L7 service appliances (Firewall, WAF, IPS, LB, and so on)

As you know, each L4-L7 service appliance, such as a firewall or load balancer, has different deployment models that can be challenging to insert into a traditional network. Let's clarify this by answering a vital question: what should host the servers' default gateway, an L3 firewall or the switching fabric? Some experts tend to choose the firewall, because otherwise they inevitably have to deal with VLAN stitching (which dramatically increases the number of VLANs), VRF sandwiching, MP-BGP route leaking, and other troublesome mechanisms that exist in traditional networks. On the other hand, this is not a good choice for large-scale enterprises where, regardless of the different types of traffic, you usually have multi-tenancy over a shared physical infrastructure, with different operating environments such as Production, Test, and Development, each with its own security benchmarks. Consequently, it is very common that you don't want all traffic to pass through a specific firewall, or you may have multiple firewalls with different purposes; it would be more efficient if the switching fabric had control of traffic forwarding within the network. More than that, firewall resources should only be spent on security-driven operations, not on network operations. Likewise, if the network and security teams are separate, which is often the case in large-scale enterprises, each team must perform its own duties, with a clear separation of responsibilities and decision-making between them.

Back to the beginning of the answer: the major problem to solve is the set of troublesome mechanisms mentioned above that exist in traditional DCN technologies.

• (10) Easy implementation, configuration, and troubleshooting

Last but not least, easy implementation, easy configuration, and especially troubleshooting with minimum time required are influential factors in choosing a solution for a data center fabric. As the fabric scales out, the network becomes more complex, and the concerns about troubleshooting and accidental misconfiguration intensify. This is where automation and codifying the fabric come into play, and we see something called a "programmable fabric" infrastructure.

In simple words, what do we do if a Data Center fabric technology meets most of the challenges I mentioned before but is not easy to configure and troubleshoot? Well, we can move towards automation, Infrastructure as Code (IaC), or Software-Defined Networking (SDN). Since we are talking about the Data Center, the relevant term is Software-Defined Data Center, or SD-DC.

We have examined the most important challenges that come to my mind. If you have more items to add, especially from the security perspective, leave a comment; it's my pleasure :).

Now let's keep the discussion going by looking at the evolution of Data Center fabric technologies and investigating the challenges each of them faces.

Part Two: Data Center 'Fabric' Journey

First Generation: Typical Collapsed Core design; STP-based Architecture.


This is the first generation of Data Center Network (DCN) design, known as the STP-based network architecture. Almost all of the challenges I mentioned earlier can be seen in such an infrastructure! Link redundancy relies completely on Spanning Tree, so some interfaces stay in the blocking state. Since there is no high-availability solution for the L3 core switches, the only option is to use FHRP protocols such as HSRP to protect the default gateway of a subnet by allowing the two L3 switches to back up that address.
First Generation: Typical 3-tier design; STP-based Architecture


As the scale of the fabric grows, there are more concerns about cabling and the L2 broadcast domains! So we have to change the topology to what we call the "3-tier architecture", consisting of the Access, Distribution, and Core layers. The default gateways are still on the L3 core switches, so we have a Layer 2 distribution layer along with large L2 broadcast domains that still need to be dealt with. One noteworthy point: at first glance, the 3-tier architecture used in Data Center networks looks similar to the 3-tier architecture used in Campus LAN networks. But don't forget that a DCN carries far more concerns and considerations than a Campus network. How are service appliances such as firewalls and load balancers going to be deployed in the DCN? Furthermore, we need end-to-end workload mobility and VM mobility within the DC fabric.

First Generation: Typical 3-tier Multi-POD design; STP-based Architecture


To break up the large broadcast domains, it's possible to move the default gateways to the distribution layer; however, this splits the fabric into multiple Points of Delivery (PODs). In this case, the L2 firewall, WAF, and load balancer have to be connected to each distribution cluster separately, which increases the cabling volume and significantly increases the configuration workload for these service appliances as well. More importantly, it may also break the requirement for seamless end-to-end VM mobility and distributed workloads across different PODs. All the architectures discussed so far are based on Spanning Tree and rely on FHRP protocols to build an L3 gateway cluster.

Second Generation: mLAG-based Architecture



In this generation, high-availability technologies such as VSS, switch stacking, and virtual stacking started to be used across different switch series. As a result, we no longer need to configure HSRP or VRRP. Furthermore, these HA technologies give us Multi-Chassis Link Aggregation (mLAG), which largely overcomes the Spanning Tree shortcomings, decreases the number of devices that need to be configured and maintained, and provides end-to-end link redundancy based on LACP. This architecture can be implemented as either a collapsed-core or a 3-tier design, just like the first generation. Apart from STP and the Top-of-Rack (ToR) issue, all the other challenges still exist in this generation.

Third Generation: Introduction to Cisco NX-OS and Nexus switch family.


Beginning with Data Center 3.0, Cisco introduced many innovations in both hardware and technology. The Cisco Nexus devices were introduced as the new Data Center switch family, running a new Linux-based operating system named NX-OS. The most significant advantage of this OS is that it brings SAN and LAN together: a Cisco Nexus 5K switch can simultaneously be a native FC SAN switch and a native classical Ethernet switch. It can also be configured for Fibre Channel over Ethernet (FCoE), a new operational mode Cisco introduced in this generation. FCoE is a storage protocol that enables Fibre Channel (FC) communications to run directly over Ethernet. This is the idea Cisco named "Unified Fabric".

Various series of the Cisco Nexus switch family have been brought to market so far, of which the 2K, 5K, 7K, and 9K series are the most important. Cisco tried to solve the Top-of-Rack (ToR) challenge by introducing the Nexus 2K backed by FEX technology, which was fairly successful, but with the advent of newer technologies like Cisco ACI, it's virtually no longer needed. To find out why, I've given a detailed answer in another article about ACI; I suggest you read it if you are interested.

Cisco Fabric Extender technology, as its name implies, was not only focused on ToR; it was also the beginning of the integration of physical and virtual infrastructure. Cisco Adapter FEX for UCS servers, along with VM-FEX and the Nexus 1000v, was introduced for this purpose. Unfortunately, this effort largely failed, since the main hypervisor vendors in the market, such as VMware and Microsoft Hyper-V, no longer support VM-FEX in their recent versions.

Another change in this generation was the replacement of VSS in the Nexus devices: the Cisco Nexus 5K, 7K, and 9K series do not support VSS. Cisco introduced vPC (Virtual Port-Channel) in this switch family instead. The big difference between vPC and VSS is that vPC is not really a high-availability protocol, while VSS is. With vPC we still have two separate devices, each with its own control plane, just as before. vPC is a technology that enables Multi-Chassis Link Aggregation (mLAG) across two separate Nexus switches. With this explanation you may ask: WHY is vPC the replacement for VSS on Cisco Nexus devices? There are basically two answers: (1) Nexus switches can be used in a unified fabric architecture, which means there may be both SAN and LAN connectivity on the switch at the same time, so both SAN and LAN design basics need to be considered. In SAN design, switches are kept separate from each other and the FSPF protocol handles failures instead. (2) To use VXLAN over the Leaf-and-Spine infrastructure supported by the Cisco Nexus family. The CLOS fabric structure, which we know better as the Leaf-and-Spine model, is one of the most widely used architectures in Data Center fabrics. It's the physical infrastructure for technologies such as VXLAN and, of course, Cisco ACI as well. In this architecture, as you know, the Leaf switches play the role of VXLAN Tunnel Endpoints (VTEPs) in the VXLAN fabric, and all the servers and hosts connect directly to them. From the Leaf-and-Spine point of view, the Leaf switches must be separate and independent from each other, but we also need link aggregation down to the servers and hosts. This is exactly where vPC is useful. That's why we say VSS is used in Campus network environments, while vPC is used in Data Center infrastructures. But this is getting ahead of the third generation of DCN architecture I'm discussing; let's leave it for later.

Data Center 3.0 brought several improvements over its predecessors, such as FEX as a Top-of-Rack solution, an easier command line with less configuration, multi-tenancy support on the Nexus 7K thanks to Virtual Device Contexts (VDCs), and programmable hardware with more powerful resources. But most of the fundamental challenges considered in this article remain unresolved up to this generation. Cisco vPC has one important limitation in this particular generation of the Data Center fabric: it works only between two switches. If you have more pairs of Nexus switches, you have to configure additional, independent vPC domains, and as a result we end up splitting the fabric into multiple PODs as before. The scenario is identical to the second generation discussed earlier.
As a result, technologies such as vPC or VSS alone cannot be considered a complete solution for the Data Center switching fabric. Accordingly, the next generation is a cloud-based fabric, instead of a collection of vPC or VSS pairs.

One more noteworthy point about vPC: since the control planes are still separate, we again need FHRP protocols to keep the default gateways on a pair of vPC switches.

Fourth Generation: Cloud-Based fabric architecture.

Cisco FabricPath

As I mentioned earlier, technologies such as vPC and VSS can only run between two switches, so when scaling out the DC fabric we end up with more than one vPC pair and vPC domain. Ultimately, this splits the fabric into multiple PODs with L3 routing between them. On the flip side, we need end-to-end Layer 2 workload mobility, while at the same time we want to eliminate the large L2 broadcast domains. We reach a contradiction! So what now? These requirements drive us to a new generation of DCN technologies known as cloud-based architectures. In such infrastructures we have the terms "underlay" and "overlay" network. What basically happens behind the scenes is this: a number of switches in the Data Center fabric are connected to each other; this is the underlay network. The servers and hosts connect to this network and want to communicate with each other. The server-to-server traffic, known as the overlay, is encapsulated inside the underlay network instead of simply being routed or bridged. Depending on the type of underlay network, there are different types of encapsulation, as below:

MAC-in-MAC

The most common protocols and technologies that use this type of encapsulation are TRILL, Cisco FabricPath, and SPB (Shortest Path Bridging). In these protocols, the underlay network is neither a typical Layer 3 network nor classical Ethernet; they have their own framing structure instead. In plain English, the original Ethernet frame is encapsulated within a new, special frame and sent into the fabric. All of these protocols leverage IS-IS to perform Layer 2 routing that doesn't rely on IP for carrying frames. IS-IS distributes link-state information and calculates the shortest paths through the network to form the underlay. This calculation is based on Layer 2 multipathing, so all links are available and used, unlike STP, where some interfaces are always blocked. Let's drill down into Cisco FabricPath and see which challenges it overcomes.

• Cisco FabricPath

Cisco FabricPath offers a flexible and scalable design when applied to the Data Center fabric. A typical FabricPath network uses a Leaf-and-Spine architecture: there is no Spanning Tree running and no longer a Layer 2 broadcast domain challenge. It retains easy configuration and, of course, provides end-to-end Layer 2 VM mobility and distributed workloads. Finally, using the Leaf-and-Spine architecture simplifies the cabling in the Data Center network.

In contrast, FabricPath has some serious drawbacks that have caused it to become almost deprecated. First of all, as I mentioned before, FabricPath has its own data plane and frame format, so it rides neither inside standard Ethernet frames nor above IP. This means all FabricPath nodes inevitably have to be connected directly together; as a result, FabricPath is not scalable or extendable across multiple sites unless the transport medium is dark fiber or DWDM.


Another problem is the mechanism FabricPath uses for handling BUM (Broadcast, Unknown unicast, Multicast) traffic. BUM traffic on a FabricPath network is not simply flooded; instead, it follows what's called a Multi-Destination Tree (MDT), which works very much like a traditional multicast tree. FabricPath automatically builds two separate logical trees for handling multi-destination traffic.


When BUM packets enter the fabric, they have to traverse the root switch to reach the whole network. So the placement of the root switch becomes key in a Data Center Interconnect (DCI) scenario: the root can only be at one site, which means the other site(s) need to traverse the DCI link for all BUM traffic!


Finally, Cisco FabricPath has no control plane for the overlay network. End-host information in the overlay is learned through the flood-and-learn mechanism, which is not efficient. These challenges have made Cisco FabricPath practically obsolete.

MAC-in-IP

Seeing 'IP' here means the underlay is a typical Layer 3 routed network, and the server-to-server traffic is encapsulated within it. From this moment on, we can relax: this type of overlay transport is easily scalable across multiple sites, and unlike FabricPath, we are not restricted to dark fiber or DWDM! So the first problem is solved right away.

The primary protocols that use this type of encapsulation are VXLAN, STT, NVGRE, and GENEVE. VXLAN and GENEVE use UDP-based encapsulation, while STT uses TCP-based encapsulation. GENEVE is designed not only to support the capabilities and flexibility of the other protocols but also to improve on them, thanks to small changes in its header: it adds variable-length option fields, so extra headers and metadata can be carried alongside the identifier. This is one of the reasons GENEVE is used in the VMware NSX SDN platform. In contrast, VXLAN and NVGRE have fixed 24-bit identifiers and STT has a fixed 64-bit identifier. If you're interested in finding out more about GENEVE, follow the link below.

Since VXLAN is still widely supported and adopted by Cisco, VMware, Red Hat, Citrix, and other leading vendors, we will focus on it and look at which challenges this technology addresses.

• VXLAN Fabric


VXLAN is actually nothing more than a Layer 2 tunneling protocol over a Layer 3 IP/UDP underlay network. It follows the CLOS fabric model for the physical infrastructure, known as the Leaf-and-Spine architecture. In other words, VXLAN as a Data Center fabric technology relies on the cloud-based fabric architecture. As a result, seamless end-to-end workload distribution is available while the broadcast domains are restricted to each server rack. Furthermore, since the underlay is just a typical routed network, the VXLAN fabric is easily scalable across multiple sites. Amazing, isn't it?

Another significant thing about VXLAN is that it extends the number of network IDs from the 4096 of the VLAN space to roughly 16 million. As I previously mentioned, VXLAN has a fixed 24-bit identifier, the VXLAN Network Identifier (VNI or VNID), instead of the VLAN ID. This is an impressive evolution for multi-tenant infrastructure. The identifier is only used while traffic travels from one leaf, across the spine, to another leaf. VXLAN encapsulation is performed by the leaf switch on the source host's side, and decapsulation is performed by the leaf switch on the destination host's side. So the leaf switches are responsible for encapsulating and decapsulating the VXLAN headers; this is the tunneling process I mentioned a while ago, and the role is called the VXLAN Tunnel Endpoint (VTEP). The original Ethernet frame is encapsulated into a new UDP datagram, inside a new IP packet, inside a new Ethernet frame belonging to the underlay network.
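To make the header layout concrete, here is a minimal Python sketch (illustrative only, not vendor code) that packs the 8-byte VXLAN header defined in RFC 7348 and shows where the 24-bit VNI sits:

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header defined in RFC 7348.

    Layout: 8 flag bits (0x08 = VNI valid), 24 reserved bits,
            a 24-bit VNI, and 8 more reserved bits.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags_and_reserved = 0x08 << 24   # I-flag set, reserved bits zero
    vni_and_reserved = vni << 8       # VNI occupies the upper 24 bits
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

# Full outer encapsulation, outermost first (order only; outer headers not built here):
# outer Ethernet | outer IP | outer UDP (destination port 4789) | VXLAN header | original frame
print(build_vxlan_header(vni=10010).hex())   # 0800000000271a00 -> VNI 10010 = 0x00271a
```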


Unified Fabric for both Physical and Virtual Network.

One of the main goals of VXLAN-based SDN technologies is to extend the VXLAN fabric into the virtualization layer, remove any dependence on traditional VLANs there as well, and ultimately unify the physical and virtual fabrics.

VXLAN Data Plane and Control Plane

The VXLAN encapsulation and decapsulation process briefly explained earlier is the data plane of the overlay network. As I mentioned before, Cisco FabricPath relies on the classic flood-and-learn mechanism to learn end-host information, which means that protocol has no control plane for the overlay network. But what about VXLAN?

VXLAN traditionally uses the same flood-and-learn method when transporting data over the underlay network. This method relies on IP multicast for BUM traffic, so the whole infrastructure needs to support multicast and administrators have to configure it. There is, however, an alternative to IP multicast for handling multi-destination traffic in a VXLAN environment: ingress replication (IR), also called head-end replication. With ingress replication, every VTEP must be aware of the other VTEPs that have membership in a given VNI. The source VTEP generates one copy of every multi-destination frame for each of the other VTEPs in the corresponding VNI. This places an additional burden on the VTEPs, but it has the benefit of simplicity, since there is no need to run multicast in the IP underlay.
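Here is a minimal sketch of that fan-out logic; the VTEP addresses and the VNI membership table are hypothetical:

```python
# Head-end (ingress) replication: the source VTEP unicasts one copy of a BUM
# frame to every other VTEP that has membership in the same VNI.

vni_membership = {
    10010: ["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # VTEP loopback IPs per VNI
    10020: ["10.0.0.1", "10.0.0.3"],
}

def ingress_replicate(src_vtep: str, vni: int, frame: bytes) -> list[tuple[str, bytes]]:
    """Return the (destination VTEP, frame) copies the source VTEP must send."""
    peers = [v for v in vni_membership.get(vni, []) if v != src_vtep]
    return [(peer, frame) for peer in peers]

copies = ingress_replicate("10.0.0.1", 10010, b"\xff" * 64)   # a dummy broadcast frame
print([dst for dst, _ in copies])   # ['10.0.0.2', '10.0.0.3'] -> n-1 unicast copies
```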

On the other hand, the best practice is to use MP-BGP with the EVPN address family as the control plane alongside VXLAN. In simple terms, the EVPN address family allows host MAC, IP, network, VRF, and VTEP information to be carried over MP-BGP. This way, as soon as a VTEP learns about a host behind it, BGP EVPN distributes this information to all other BGP EVPN-speaking VTEPs in the network. The MP-BGP EVPN control plane greatly reduces the need for flooding, but it doesn't remove it completely: some overlay traffic, including ARP, DHCP, and traffic towards 'silent hosts' (clients that haven't sent or received any packets yet), may still incur flooding.
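As a rough illustration of why a control plane reduces flooding, here is a toy model (hypothetical names; not an EVPN implementation) in which a VTEP advertises a locally learned host so its peers can install the entry instead of discovering it by flooding:

```python
# Toy model of an EVPN-like control plane: when a VTEP learns a local host, it
# advertises (MAC, host IP, VNI, VTEP) to its peers, which install the entry
# in their tables instead of relying on flood-and-learn.

class Vtep:
    def __init__(self, ip: str):
        self.ip = ip
        self.host_table: dict[str, tuple[str, int, str]] = {}  # MAC -> (host IP, VNI, VTEP IP)

    def receive_route(self, mac: str, host_ip: str, vni: int, vtep_ip: str) -> None:
        self.host_table[mac] = (host_ip, vni, vtep_ip)

def advertise_local_host(local: Vtep, peers: list[Vtep], mac: str, host_ip: str, vni: int) -> None:
    """Stand-in for a BGP EVPN MAC/IP (type-2) advertisement."""
    local.receive_route(mac, host_ip, vni, local.ip)
    for peer in peers:
        peer.receive_route(mac, host_ip, vni, local.ip)

leaf1, leaf2 = Vtep("10.0.0.1"), Vtep("10.0.0.2")
advertise_local_host(leaf1, [leaf2], "aa:bb:cc:dd:ee:01", "192.168.10.11", 10010)
print(leaf2.host_table)   # leaf2 now knows the host sits behind 10.0.0.1, no flooding needed
```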

Beyond all that, VXLAN has solid solutions and design guidance for easily scaling out the fabric in Multi-Site infrastructures and providing seamless Layer 2 and Layer 3 LAN extensions, which was one of the most important shortcomings of the previous technologies. Cisco provides a practical white paper with more information about this.

Part Two: Conclusion

There is a lot more to discuss about VXLAN, but an in-depth explanation of the technology is out of the scope of this article. Let's conclude this section by answering the question: which of the challenges mentioned in Part One does VXLAN address? As noted earlier, VXLAN with MP-BGP EVPN over a CLOS fabric is a cloud-based fabric technology with a structured cabling model that removes the need for the full-mesh connectivity we had before. It eliminates the large broadcast domains and the STP protocol, and provides seamless end-to-end workload mobility across all server racks by relying on Layer 2 tunneling over an L3 underlay network. More than that, it greatly expands multi-tenancy by replacing the traditional 12-bit VLAN ID with the 24-bit VXLAN identifier (VNI). Furthermore, VXLAN can be efficiently scaled out as a Multi-Site fabric, whether you have more than one data center or a single infrastructure so large that implementing a single Leaf-and-Spine architecture is not the best practice. Although the configuration isn't particularly easy or straightforward, there is no technology-related restriction on either workload mobility or VM mobility in a Multi-Site VXLAN fabric.

In contrast, the complexity of configuration and maintenance is the major problem with this technology. And, as discussed earlier, with the advent of the DevOps culture we need a more agile infrastructure in which service changes or the launch of a new service can be done more quickly, more accurately, and almost automatically. These concerns and requirements lead us to the door of another generation, the fifth generation of Data Center networks. Concepts such as SD-DC (Software-Defined Data Center), NFV (Network Functions Virtualization), and Infrastructure as Code (IaC), or codifying the infrastructure, are the new terms introduced in this generation. These technologies are an important part of implementing DevOps practices and CI/CD. If the target technology is SDN, then Cisco ACI is one of the significant options alongside open-source solutions such as OpenDaylight.

Part Three: The benefits of Cisco ACI; How does it address these
challenges?

Application Centric Infrastructure (ACI) is one of the market-leading, ready-to-use SD-DC solutions, owned by Cisco, and it addresses all ten of the challenges mentioned in the first part of this article.

Easy implementation, configuration, and troubleshooting

ACI involves a lot to learn and a lot to consider, but ultimately it is easy to configure and easy to troubleshoot as long as you have an appropriate plan and know how to implement it. The technology relies on Nexus 9K switches in a Leaf-and-Spine architecture, and during the fabric discovery process it automatically builds the IP underlay using the IS-IS routing protocol as well as the VXLAN MP-BGP EVPN overlay. Furthermore, Cisco ACI constructs follow an object-oriented model, so you can create an object once (an Interface Policy Group, for instance) and reuse it as many times as you like. Configuration can all be done with just a few clicks, but you can still enjoy the benefits of automation, for example with Ansible or the APIC REST API. Finally, ACI event logs and fault reports accurately pinpoint misconfigurations and the details of failures, whatever the cause.
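For a taste of that programmability, here is a hedged sketch that logs in to the APIC REST API with the requests library and lists the configured tenants; the APIC address and credentials are placeholders, and TLS verification is disabled only to keep the lab example short:

```python
import requests

APIC = "https://apic.example.local"   # placeholder APIC address
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}   # lab credentials

session = requests.Session()

# Authenticate; the APIC returns a token that the session cookie carries afterwards.
resp = session.post(f"{APIC}/api/aaaLogin.json", json=AUTH, verify=False)
resp.raise_for_status()

# Query all tenant objects (class fvTenant) and print their names.
tenants = session.get(f"{APIC}/api/class/fvTenant.json", verify=False).json()
for obj in tenants["imdata"]:
    print(obj["fvTenant"]["attributes"]["name"])
```

The same session can then POST JSON objects under /api/mo/ to create or modify policy; this REST API is also what the ACI Ansible modules use under the hood.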
Complex Deployment of L4-7 service appliances

This is one of the major advantages of Cisco ACI over traditional networks. ACI introduced the application-centric approach as a new perspective on network design and, following that, uses Service Graph templates along with Policy-Based Redirect (PBR) to fundamentally change the way service appliances are deployed in the network. As I explained previously, we no longer face the limitations and challenges of VLAN stitching and VRF sandwiching, and there are new features that make policy enforcement more efficient. I have another article that discusses this topic in depth; take a look at it if you're interested.

Security concerns.

I've already discussed the default traffic-forwarding behavior of Cisco ACI in the first part of this article. I'll just add that Cisco ACI significantly increases visibility and the ability to enforce security policy on the Data Center Network (DCN), even with a cloud-native infrastructure, without an extra cost burden.

A Unified Data Center fabric for both physical and virtual networks
In my opinion, this is the top advantage of Cisco ACI. Unlike some other SDN products that rely only on virtual networks and have no awareness of the physical infrastructure, ACI covers both the physical and the virtual fabric and truly delivers the concept we expect from SDN. (Remember that we have another term, NFV, for a different idea.) The ACI fabric can easily be extended to either a VM-based or a container-based virtual network, thanks to the seamless integration it provides with a wide range of market-leading hypervisors, cloud orchestrators, and container orchestration systems, such as VMware vCenter, Kubernetes, OpenShift, OpenStack, and so on. Ultimately, regardless of whether the hosts are bare-metal servers, virtual machines, or containers (Pods), they all connect to one common fabric that provides end-to-end communication, strong security visibility, and high operational speed.

Scalable Fabric across multiple sites

Cisco ACI is an extremely scalable SD-DC solution that you can use in small, medium, or large-scale environments, starting with only 5 or 7 rack units and scaling out as the infrastructure grows. It has a set of solutions that meet almost every need: more than one data center, a single large-scale data center, a combination of on-premises and cloud infrastructure, or even services deployed entirely on multiple cloud platforms (such as AWS and Azure). When you have more than one site, it doesn't matter whether the other DC is a remote site, whether low-latency dark fiber exists or not, or how far apart the sites are. The pictures below show the evolution of the ACI fabric and policy domain across different versions.

Cisco ACI Policy domain evolutions

Cisco Cloud ACI

Loop-Prevention mechanisms
There is no loop problem within the Cisco ACI fabric itself; however, external devices such as an L2 switch connected to the ACI leaf switches can cause a loop. That's basically because ACI doesn't participate in Spanning Tree: there are no Spanning Tree processes running on any ACI switch in the fabric. ACI doesn't generate its own BPDUs; instead, it forwards received STP BPDUs out of the other ports where the same VLAN is allowed. Cisco's best practice for this situation is to use MCP, the MisCabling Protocol. To get more information about how it works, you can read the related white paper.
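For those who script their fabric settings, here is a hedged sketch of enabling the global MCP instance policy through the APIC REST API; the class name mcpInstPol, the DN uni/infra/mcpInstP-default, and the attribute names are my assumptions about the ACI object model, so verify them against your APIC version before use:

```python
import requests

APIC = "https://apic.example.local"   # placeholder APIC address

session = requests.Session()
session.post(
    f"{APIC}/api/aaaLogin.json",
    json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},  # lab credentials
    verify=False,
).raise_for_status()

# Enable the default global MCP instance policy and set the required key.
# Class, DN, and attribute names are assumptions; check your ACI version.
mcp_policy = {
    "mcpInstPol": {
        "attributes": {
            "dn": "uni/infra/mcpInstP-default",
            "adminSt": "enabled",
            "key": "MySecretKey",
        }
    }
}
resp = session.post(f"{APIC}/api/mo/uni/infra/mcpInstP-default.json", json=mcp_policy, verify=False)
print(resp.status_code)   # 200 on success
```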

Since Cisco ACI is built on a VXLAN fabric, the other challenges have practically been addressed already.
