
NSX Compendium

by pandom - http://networkinferno.net/nsx-compendium


VMware NSX for vSphere


By Anthony Burke – VMware NSBU System Engineer.

Disclaimer: This is not an official reference and should not be treated as one. Any mistakes on this page are a reflection of my writing and knowledge, not the product itself. I endeavour for technical accuracy but we are only human! These pages serve as a formalisation of my own notes about NSX for vSphere. Everything discussed on this page is currently shipping within the NSX product.

Introduction
This page serves as a resource for the components and deployment of VMware's NSX for vSphere. I work with the product daily and educate customers and the industry at large on the benefits of Network Function Virtualisation (NFV) and the Software Defined Data Centre (SDDC). This resource aims to provide information both at a high level and in technical depth regarding the components and use cases for VMware NSX. This page will evolve as I add more content to it. It will eventually cover all aspects of VMware NSX and how to use, consume, and run an environment using it. In time there will be a collection of text, video and images that I am binding together into a compendium.

VMware NSX delivers a software-based solution that solves many of the challenges faced in the data centre today. For a long time administrators and organisations have been able to deploy x86 compute at lightning pace. The notion of delivering an application from a template, and the excitement of doing so in the time it takes to boil the kettle, has had its sheen taken off by the three weeks it can take to provision network services.

Network function virtualisation and delivering network services in software has always been a challenge to many. The notion of not only delivering a user-space instance of a service but also programming the end-to-end workflow, from end user right through to storage, has been a dream for a long time. It wasn't until VMware's acquisition of Nicira that this came about and the ability to deliver many functions of the data centre in software took a strong foothold.

With the new ability to deliver data centre features such as a distributed in-kernel firewall and routing function, NSX Edge functionality and L2 switching across L3 boundaries thanks to VXLAN, NSX redefines the architecture of the data centre. Whilst rapidly reducing time to deploy, decreasing administrative overhead and empowering the next generation of DC architectures, NSX provides the flexibility to build and define the next generation data centre.

There are several major components of NSX, each providing a different function. This page is a technical resource for NSX and its deployment on VMware infrastructure.

NSX Core components


Whilst NSX for vSphere is very far-reaching, it is surprisingly lightweight. There are only a handful of components that make up this solution to provide the final piece in VMware's SDDC vision.

NSX Manager

The NSX Manager is one of the touch points for the NSX for vSphere solution. NSX Manager provides a centralised management plane across your data centre. It provides the management UI and API for NSX. Upon installation the NSX Manager injects a plugin into the vSphere Web Client for consumption within the web management platform. Along with providing management APIs and a UI for administrators, the NSX Manager component installs a variety of VIBs to the host when initiating host preparation. These VIBs are VXLAN, Distributed Routing, Distributed Firewall and a user world agent. The benefit of leveraging a VMware solution is that access to the kernel is much easier to obtain. With that, VMware provides the distributed firewall and distributed routing functions in kernel. This delivers extremely fast in-kernel packet processing without the inadequacies of traditional user-space or physical firewall network architectures.
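The same management plane that drives the UI is exposed as a REST API. As a minimal sketch (assuming the NSX-v style endpoint /api/2.0/vdn/controller, basic authentication and a hypothetical manager address and credentials; details vary by version and environment), listing the deployed controllers from Python might look like this:

    # Minimal sketch: query NSX Manager's REST API for the controller inventory.
    # Assumptions: NSX-v style endpoint /api/2.0/vdn/controller, basic auth and a
    # self-signed certificate (verify=False). Adjust host/credentials for your lab.
    import requests

    NSX_MANAGER = "https://nsxmgr.lab.local"   # hypothetical NSX Manager address
    AUTH = ("admin", "VMware1!")               # hypothetical credentials

    resp = requests.get(
        f"{NSX_MANAGER}/api/2.0/vdn/controller",
        auth=AUTH,
        verify=False,        # lab only: NSX Manager often presents a self-signed cert
        headers={"Accept": "application/xml"},
    )
    resp.raise_for_status()
    print(resp.text)         # XML listing of deployed NSX Controllers

The same pattern applies to most NSX Manager operations; only the endpoint and payload change.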

NSX Controller

The NSX Controller is a user space VM that is deployed by the NSX Manager. It is one of the core components of NSX and could be termed the "distributed hive mind" of NSX. It provides a control plane to distribute network information to hosts. To achieve a high level of resiliency the NSX Controller is clustered for scale out and HA.

The NSX Controller holds three primary tables: a MAC address table, an ARP table and a VTEP table. These tables collate VM and host information and replicate it throughout the NSX domain. The benefit of this is multicast-free VXLAN on the underlay. Previous versions of vCNS and other VXLAN-enabled solutions required multicast to be enabled on the top-of-rack switches or across the entire physical fabric. This added significant administrative overhead, and removing it alleviates a lot of complexity.

By maintaining these tables an additional benefit is ARP suppression. ARP suppression allows for a reduction in ARP requests throughout the environment. This is important when layer 2 segments stretch across various L3 domains. If a VM requests the MAC address for an IP that is not on the local segment, the host can answer from the replicated information pushed to its tables by the controller.

Roles and function

The NSX Controller has five roles:

API provider
Persistence server
Logical manager
Switch manager
Directory server

The API provider maintains the web services API which is consumed by NSX Manager. The persistence server assures data preservation across nodes for data that must not be lost, such as network state information. The logical manager deals with the computation of policy and the network topology. The switch manager role manages the hypervisors and pushes the relevant configuration to the hosts. The directory server focuses on VXLAN and the distributed logical routing directory of information.

Whilst each role has its own master, the masters for different roles can be elected onto the same or different nodes. If a node failure occurs and a role is left without a master, a new node is promoted to master after the election process.

Most deployment scenarios see three, five or seven controllers deployed. This is due to the controller running ZooKeeper. A ZooKeeper cluster, known as an ensemble, requires a majority to function, and this is best achieved through an odd number of machines. This majority (tie-breaker) requirement comes into play in many HA conditions during NSX for vSphere operations.
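To see why odd ensemble sizes are preferred, the majority arithmetic can be worked through directly. This is just the standard quorum calculation, not NSX-specific code:

    # Quorum arithmetic for a ZooKeeper-style ensemble: a majority of nodes must
    # agree, so an ensemble of n nodes tolerates n - (n // 2 + 1) failures.
    for nodes in (3, 4, 5, 7):
        quorum = nodes // 2 + 1
        tolerated = nodes - quorum
        print(f"{nodes} nodes: quorum {quorum}, tolerates {tolerated} failure(s)")
    # 3 nodes tolerate 1 failure; 4 nodes still only tolerate 1, so the extra
    # even-numbered node adds cost without adding resiliency.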

Slicing

In a rapidly changing environment that may see multiple changes per second, how do you dynamically distribute workload across available cluster nodes, re-arrange workloads when new cluster members are added and sustain failures without impact, all while this occurs behind the scenes? Slicing.


A role is told to create x number of slices of itself. An application will collate its slices and assign objects to a slice. This ensures that no individual node can cause a failure of that NSX Controller role.

When a failure of a controller node occurs, the slices that the controller is in charge of are replicated and reproduced onto the existing controllers. This ensures consistent network information and continuous state.
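Conceptually, slicing can be thought of as hashing each object to a fixed slice and spreading slice ownership across the surviving controller nodes. The sketch below is illustrative only (slice counts, names and the assignment scheme are assumptions, not the controller's actual implementation):

    # Conceptual sketch of slicing: objects are hashed to a fixed number of slices
    # and slice ownership is spread across controller nodes. On a node failure the
    # slices are simply re-spread across the survivors.
    NUM_SLICES = 8  # illustrative value only

    def slice_for(obj_id):
        """Map an object (e.g. a VNI or logical router instance) to a slice."""
        return hash(obj_id) % NUM_SLICES

    def owners(nodes):
        """Assign each slice to a controller node in round-robin fashion."""
        return {s: nodes[s % len(nodes)] for s in range(NUM_SLICES)}

    nodes = ["controller-1", "controller-2", "controller-3"]
    print(owners(nodes))        # slices spread across three nodes
    print(owners(nodes[:-1]))   # controller-3 fails: slices re-spread across the rest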

VXLAN

VXLAN is a multi-vendor, industry-supported network virtualisation technology. It enables much larger networks to be built at layer 2, without the crippling scale limitations found with traditional layer 2 technologies. Where a VLAN encapsulates a layer 2 frame with a logical ID, VXLAN encapsulates the original layer 2 frame with a VXLAN header, a UDP header and outer IP headers. From a virtual machine perspective, VXLAN enables VMs to be deployed on any server in any location, regardless of the IP subnet or VLAN that the physical server resides in.

VXLAN solves many issues that have arisen in the DC through the implementation of Layer 2 domains.

• Creation of large Layer 2 domains without the associated blast radius.
• Scales beyond 4094 VLANs.
• Enables layer 2 connectivity across traditional DC boundaries.
• Enables smarter traffic management abstracted from the underlay.
• Enables large layer 2 networks to be built without the high consumption of CAM table capacity on ToR switches.
• VXLAN is an industry-standard method of supporting layer 2 overlays across layer 3. There is an alliance of vendors supporting a variety of VXLAN integrations: as a software feature on hypervisor-resident virtual switches, on firewall and load-balancing appliances, and as VXLAN hardware gateways built into L3 switches.


Scaling beyond the 4094 VLAN limitation of traditional switches has been solved thanks to the 24-bit VXLAN Network Identifier (VNI). Similar to the field in the VLAN header where a VLAN ID is stored, the 24-bit identifier allows for over 16 million potential logical networks.
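The difference in scale follows directly from the identifier widths; a quick check of the arithmetic:

    # 12-bit VLAN ID versus 24-bit VXLAN Network Identifier (VNI).
    vlan_ids = 2 ** 12   # 4096 values, ~4094 usable VLANs
    vni_ids = 2 ** 24    # 16,777,216 potential logical networks
    print(vlan_ids, vni_ids)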

VXLAN Enhancements – Data Plane

There are a few VXLAN data plane enhancements in NSX for vSphere. It is possible to support multiple VXLAN vmknics per host, which allows uplink load balancing. QoS support is provided by copying the DSCP and CoS tags from the internal frame to the external VXLAN header. It is possible to provide guest VLAN tagging. Due to the VXLAN format used, there is also the potential to later consume hardware VXLAN offload in network adapters such as those from Mellanox.

VXLAN Enhancements – Control Plane

Control plane enhancements come through adjustments in the VXLAN headers. These allow the removal of multicast and PIM routing from the physical underlay. It is also possible to suppress broadcast traffic in VXLAN networks, thanks to the ARP directory service and the role the NSX Controller plays in the environment.

VXLAN Replication – Control Plane

Unicast mode and Hybrid mode both select a single VTEP in every remote segment from the mapping table. This VTEP is used as a proxy. This is performed on a per-VNI basis and load is balanced across proxy VTEPs. Unicast mode calls this proxy a UTEP (Unicast Tunnel End Point); Hybrid mode calls it an MTEP (Multicast Tunnel End Point). The tables of UTEPs and MTEPs are synchronised to all VTEPs in the cluster.

Optimised replication occurs because VTEPs perform software replication of broadcast, unknown unicast and multicast traffic. This replication is sent to local VTEPs and to one UTEP/MTEP for each remote segment.

This is achieved through an update to how NSX uses VXLAN: a REPLICATE_LOCALLY bit in the VXLAN header, used in the Unicast and Hybrid modes. A UTEP or MTEP receiving a unicast frame with the REPLICATE_LOCALLY bit set is then responsible for injecting the frame into its local network.


The source VTEP replicates an encapsulated frame to each remote UTEP via unicast and also replicates the frame to each active VTEP in the local segment. The UTEP role is responsible for delivering a copy of the de-encapsulated inner frame to its local VMs.

This alleviates dependencies on the physical network, although a slight replication overhead is incurred. The mode is configurable per VNI during the provisioning of the logical switch.

Preparing for VXLAN

NSX Manager deploys the NSX Controllers. A subsequent action after deploying the controllers is preparing the vSphere clusters for VXLAN. Host preparation installs the network VIBs onto the hosts in the cluster. These are the dFW, LDR and VXLAN host kernel components. After this an administrator creates VTEP VMkernel interfaces for each host in the cluster. The individual host VMkernel interfaces can be allocated IPs from a pre-configured pool.

Because the original L2 frame is encapsulated, the Ethernet payload grows by roughly 50 bytes of overhead. An MTU of 1600 is therefore recommended on the physical underlay.
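The roughly 50 bytes come from the extra headers wrapped around the original frame. A rough breakdown follows (an outer 802.1Q tag or other options would add a little more):

    # Approximate VXLAN encapsulation overhead per frame (untagged outer header).
    outer_ethernet = 14   # outer Ethernet header
    outer_ip = 20         # outer IPv4 header
    outer_udp = 8         # outer UDP header
    vxlan = 8             # VXLAN header carrying the 24-bit VNI
    overhead = outer_ethernet + outer_ip + outer_udp + vxlan
    print(overhead)         # 50 bytes
    print(1500 + overhead)  # 1550: why an underlay MTU of 1600 is recommended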


Transport Zone

A transport zone is created to delineate the width of the VXLAN scope. It can span one or more vSphere clusters. An NSX environment can contain one or more transport zones based on user requirements. The transport zone types are interchangeable, and an environment can use unicast, hybrid and multicast control planes.

Transport Zone Control Plane communication

• Multicast mode leverages multicast IP addresses on the physical underlay network for control plane VXLAN replication. It is the recommended transport zone control plane mode when upgrading from older VXLAN deployments. It requires PIM or IGMP on the physical network.
• Unicast mode has its control plane handled by the NSX Controller. This is achieved through the creation and replication of the VTEP, ARP and MAC tables on the controllers, which are subsequently distributed to eligible clusters in the transport zone.
• Hybrid is an optimised unicast mode. Offloading local traffic replication to the physical network requires L2 multicast. IGMP snooping on the first-hop switch is required, but PIM is not. The first-hop switch replicates the traffic for the subnet.

Network Function Virtualization


NSX Logical Switching

The NSX logical switch creates logically abstracted segments to which applications or tenant machines can be
wired. This provides administrators with increased flexibility and speed of deployment whilst providing traditional
switching characteristics. The environment allows traditional switching without the constraints of VLAN sprawl or
spanning-tree issues.

A logical switch is distributed and reaches across compute clusters. This allows connectivity for virtual machines across the data centre. Delivered in a virtual environment, this switching construct is not restricted by historical MAC/FIB table limits, because the broadcast domain is a logical container that resides within software.

With VMware NSX a logical switch is mapped to a unique VXLAN. When mapped to a VXLAN the virtual machine
traffic is encapsulated and is sent out over the physical IP network. The NSX controller is a central control point
for logical switches. Its function is to maintain state information of all virtual machines, hosts, logical switches and
VXLANs on the network.

Segment ID range

The segment ID range pool is configured during setup and preparation of the host clusters. VNI IDs are allocated from this pool, with one ID allocated per logical switch. If you defined an example range of 5000-5999 you could provision 1000 logical switches within that range.

Logical Switch layout

When creating a logical switch, it is wise to first consider what you will connect to it. The creation of a logical switch consumes a VXLAN ID from the segment ID range pool previously defined. Upon creation you select a control plane replication mode aligned with the transport zone the switch is placed in.
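For completeness, a logical switch can also be created programmatically rather than through the Web Client. The sketch below assumes the NSX-v style endpoint POST /api/2.0/vdn/scopes/{scope-id}/virtualwires, a hypothetical transport zone ID of vdnscope-1 and hypothetical credentials; verify the exact payload against the API guide for your version:

    # Sketch: create a logical switch (virtual wire) via the NSX Manager API.
    # Assumptions: NSX-v style endpoint and payload; vdnscope-1 is a hypothetical
    # transport zone ID and UNICAST_MODE matches that zone's control plane mode.
    import requests

    NSX_MANAGER = "https://nsxmgr.lab.local"
    AUTH = ("admin", "VMware1!")

    payload = """
    <virtualWireCreateSpec>
      <name>Web-Tier-LS</name>
      <description>Logical switch for the web tier</description>
      <tenantId>tenant-1</tenantId>
      <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
    </virtualWireCreateSpec>
    """

    resp = requests.post(
        f"{NSX_MANAGER}/api/2.0/vdn/scopes/vdnscope-1/virtualwires",
        data=payload,
        auth=AUTH,
        verify=False,
        headers={"Content-Type": "application/xml"},
    )
    resp.raise_for_status()
    print(resp.text)   # identifier of the new logical switch, backed by one VNI from the pool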


The logical topology of an NSX logical switch looks like this. It highlights the seamless L2 experience the VMs have even though they traverse different L3 boundaries.

What is really happening?

When Web01 communicates with Web02, the traffic travels over the VXLAN transport network. When the VM sends traffic and the switch looks up the MAC address of Web02, the host already knows where that VM resides from the ARP/MAC/VTEP tables pushed to it by the NSX Controller. The frame is forwarded out onto the VXLAN transport network, encapsulated within a VXLAN header and routed to the destination host based on the source host's knowledge. Upon reaching the destination host the VXLAN header is stripped off and the preserved inner frame and IP packet continue on to the destination VM.

Logical Distributed Routing

NSX for vSphere provides L3 routing without leaving the hypervisor. Known as the Logical Distributed Router, this advancement sees routing occur within the kernel of each host, distributing the routing data plane across the NSX-enabled domain. It is now possible to optimise traffic flows between network tiers, avoid breaking out to core or aggregation devices for routing, and support single or multi-tenancy models.

Logical routing provides scalable routing, supporting up to 1000 LIFs per Logical Distributed Router. This, along with support for dynamic routing protocols such as BGP and OSPF, allows for scalable routing topologies. An additional benefit is that traffic is no longer hair-pinned as it is in traditional application and network architectures. The LDR allows for heavy optimisation of east-west traffic flows and improves application and network architectures.


Data Path components

Logical interfaces, known as LIFs, are configured on logical routers. They are analogous to switched virtual interfaces or routed virtual interfaces (SVIs/RVIs) on traditional network infrastructure. IP addresses are assigned to LIFs and there can be multiple LIFs per Logical Distributed Router. When a LIF is configured it is distributed to all other hosts. An ARP table is also built and maintained for every LIF.

The above image highlights how routing between two L3 segments on a single host occurs. The LIF, or gateway, for each tier is in the LIF table. The routing table is populated with the directly connected networks. When a packet destined for the App tier reaches the gateway, it has its MAC rewritten and is placed onto the L3 segment in which the destination resides. This is all done in kernel and does not require the packet to traverse the physical infrastructure for L3 function.
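As a mental model only (not the actual kernel module), the forwarding decision can be sketched as a LIF/route lookup followed by a MAC rewrite; the interface names and addresses below are purely illustrative:

    # Conceptual sketch of a distributed-router forwarding decision on one host.
    # LIF table: one gateway interface per connected segment. Addresses are illustrative.
    import ipaddress

    lif_table = {
        "web-lif": ipaddress.ip_network("10.0.10.0/24"),
        "app-lif": ipaddress.ip_network("10.0.20.0/24"),
    }

    def route(dst_ip):
        """Pick the egress LIF for a destination, as the in-kernel router would."""
        dst = ipaddress.ip_address(dst_ip)
        for lif, network in lif_table.items():
            if dst in network:
                return lif          # MAC is rewritten and the frame is placed on this segment
        return "default-gateway"    # e.g. forwarded towards an NSX Edge

    print(route("10.0.20.15"))      # app-lif: routed in kernel, never leaving the host if local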


The above image demonstrates a similar scenario to the single-host in-kernel routing. The same routing process applies when dealing with multi-host scenarios; where it differs is when the ARP lookup occurs. After being routed locally, closest to the source and in kernel, the packet is placed onto the App segment and sent towards the VXLAN transport network.


The VXLAN vmkernel interface wraps the packet destined for App02 in a VXLAN header. The encapsulated packet is routed to its destination using the tables built by the controllers and pushed to each host. Upon reaching the destination host where App02 resides, the VXLAN header is stripped off and the packet is delivered to its destination on the local segment.

There is a MAC address assigned to each LIF. This is known as the vMAC. It is the same on all hosts and is never seen by the physical network. The physical uplink interface has a pMAC associated with it. This is the interface through which traffic flows to the network. If it is an uplink to a VLAN network the pMAC is seen, whereas a VXLAN uplink will not expose the pMAC.

It is important to remember that the pMAC is not the physical NIC's MAC address. These MAC addresses are generated for the number of uplinks on a VDS enabled for logical routing. The vMAC is replaced by the pMAC on the source host after the routing decision is made but before the packets reach the physical network. Once they arrive at the destination host, traffic is sent directly to the virtual machine.

Control VM and distributed routing operations.

The control VM is a user space virtual machine that is responsible for the LIF configuration, control-plane
management of dynamic routing protocols and works in conjunction with the NSX controller to ensure correct LIF
configuration on all hosts.


When deploying a Logical Distributed Router the following Order of Operations occurs:

1. When deploying a logical distributed router, a logical router control VM is deployed. NSX Manager creates the instance on the controller and hosts. This is a user space VM and should be deployed on the edge and management cluster.
2. The controller pushes the new LIF configuration to the hosts.
3. Routing updates are received from an external router (this can be any device).
4. The LR control VM sends route updates to the controller.
5. The controller sends these route updates to the hosts.
6. The routing kernel module on the hosts handles the data path traffic.

VLAN LIF

Not all networks require or have VXLAN connectivity everywhere. The Logical Distributed Router can have an uplink that connects to a VLAN port group. The first-hop routing is handled in the host and traffic is then routed into a VLAN segment. There must be a VLAN ID associated with the dvPortGroup; VLAN 0 is not supported. VLAN LIFs require a designated instance.

VLAN LIFs generally introduce some design constraints to a network. This uplink type is limited to one port group per virtual distributed switch, and there can only be one VDS. The same VLAN must span all hosts in the VDS. This does not scale well, as network virtualisation seeks to reduce the consumption of VLANs.

Designated Instance

The role of a Designated Instance (DI) is to resolve ARP on a VLAN LIF. The election of a host as the DI is performed by the NSX Controller, and this information is subsequently pushed to all other hosts. Any ARP requests on that particular segment or subnet are handled by that host. If the host fails or is removed, the controller selects a new host as the Designated Instance and this information is then re-advertised to all hosts.


VXLAN LIF

VXLAN LIFs are a more common uplink type. Logical Distributed Routing works with VXLAN logical switch segments. First-hop routing is handled on the host and traffic is routed to the corresponding VXLAN. If needed, it is encapsulated to travel across the transport network to reach a destination on another host.

A designated instance is not required in the case of VXLAN LIFs. The next-hop router is generally a VM within the transport zone, such as an NSX Edge Services Gateway. It is recommended that Distributed Logical Routing leverages VXLAN LIFs as they work best with this feature. A VXLAN LIF can span all VDSs in the transport zone.

LIF Deployment types

There are three use cases in which LIF interfaces can be configured. The internal to uplink (internal to external) LIF combinations are:

VXLAN to VXLAN
VXLAN to VLAN
VLAN to VLAN

NSX Edge Services Gateway

The NSX Edge Services Gateway is a critical component of NSX. The virtual appliance provides a vast array of network functionality. As the evolution of the vCNS Edge gateway, the NSX ESG is a leaner, meaner and more resource-optimised gateway. It can provide nearly 10Gbps of throughput, which is almost double what other virtual appliances can push through. The joys of owning the hypervisor, right? Forming one of the termination points between the physical and virtual worlds, the NSX Edge can provide routing, VPN services, firewall capability, L2 bridging and load balancing.

Virtual appliance

Deployed as an OVA by the NSX Manager, the NSX Edge has a few requirements. Being a virtual appliance, unlike its in-kernel counterparts, each Edge requires vCPU, memory and storage resources. It should be treated like any other virtual machine.

Requirements

The requirements are as follows:

Compact: 1 vCPU, 512MB RAM
Large: 2 vCPU, 1024MB RAM
Quad-Large: 4 vCPU, 4096MB RAM
Extra-Large: 6 vCPU, 8192MB RAM

As a rule of thumb, your mileage will vary based on the applications and workloads that reside behind the Edge. VMware has found that Quad-Large is good for a high-performance firewall whilst Extra-Large is suitable for load balancing with routing and firewalling. A simple interaction allows the re-sizing of the Edge gateway, and this goes both ways, small to large and large to small. This suits environments where load is known to grow for a certain event or period of time.

Routing

A key function of the NSX Edge is its L3 gateway capability. The ability to provide a subnet interface and attach a logical segment allows for optimisation of network traffic. No longer do virtual workloads require an SVI/RVI on a physical top-of-rack or aggregation switch; the gateway can live on an NSX Edge. In most topologies the DLR is used for this function, providing in-kernel LIFs. There are some topologies, such as a micro-segment that resides behind an Edge on a single logical switch, that use an Edge as an L3 gateway plus NAT. With that said, connectivity needs to be provided to these networks. They need to be reachable.

Whilst I won't go through how the routing protocols work, I will call out what the Edge supports. NSX Edges support OSPF, BGP (external and internal), IS-IS and static routing. This provides administrators flexibility in how they choose to peer with the physical infrastructure and advertise subnets into the network. ECMP support is also new in 6.1, allowing multiple routes to a destination. This provides redundancy and resiliency in the IP network.

Redistribution plays a critical part in a scalable and dynamic network. It is possible to redistribute routes from one protocol to another. Prefix-list filtering is also available.

NAT

Network Address Translation is a staple of modern networks. NAT can be performed for traffic that flows through the Edge. Both source and destination NAT are supported.

In customer environments where applications are hosted, such as a cloud platform, NAT plays a critical role in IP address reuse. Where topologies are defined by a catalogue or template, the reuse of IP addresses allows for simple topology deployment. On an NSX Edge, NAT can be used to translate a private range to a public range. Servers can be reached through this NATed IP address. This means public IP addresses are only consumed on the NSX Edge as opposed to on every virtual machine within a topology.
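As a hedged sketch of what this can look like in practice (assuming the NSX-v style Edge NAT endpoint /api/4.0/edges/{edge-id}/nat/config/rules, a hypothetical edge ID of edge-1 and example addresses; check the payload against the API guide for your version), adding an SNAT rule that hides a private range behind one public address might resemble:

    # Sketch: append an SNAT rule to an NSX Edge so a private topology shares one
    # public IP. Endpoint, edge ID and addresses are assumptions for illustration.
    import requests

    NSX_MANAGER = "https://nsxmgr.lab.local"
    AUTH = ("admin", "VMware1!")

    snat_rule = """
    <natRules>
      <natRule>
        <action>snat</action>
        <vnic>0</vnic>
        <originalAddress>192.168.10.0/24</originalAddress>
        <translatedAddress>203.0.113.10</translatedAddress>
        <enabled>true</enabled>
      </natRule>
    </natRules>
    """

    resp = requests.post(
        f"{NSX_MANAGER}/api/4.0/edges/edge-1/nat/config/rules",
        data=snat_rule,
        auth=AUTH,
        verify=False,
        headers={"Content-Type": "application/xml"},
    )
    resp.raise_for_status()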

VPN Services

The Edge gateway provides a number of VPN services. Layer 2 and Layer 3 VPNs give flexibility in deployment. L2 VPN services allow connectivity between separate DCs, giving layer 2 domains the ability to connect to each other. With the decoupling of the NSX Edge gateway in 6.1, it is possible for a non-NSX-enabled environment to build an L2 VPN tunnel to an NSX-enabled cloud or environment. This enables the ability to move and connect workloads with other sites or clouds.

L3 VPNs allow for IPsec site-to-site connections between NSX Edges or other devices. SSL VPN connections also allow users to connect to an application topology with ease if security policies dictate this.

L2 Bridging

The NSX Edge supports the ability to bridge an L2 VLAN into a VXLAN. This allows connectivity to a physical VLAN or a VLAN-backed port group. Connections to physical workloads are still a reality in this day and age. The ability to bridge allows migration from physical to virtual, connection to legacy systems, and a host of other use cases.

Firewall

The NSX Edge provides a stateful firewall which complements the Distributed Firewall that runs in the kernel of all hosts. Whilst the dFW is primarily used to enforce intra-DC communication, the NSX Edge can be used for filtering communication that leaves an application topology.

Load balancing

The NSX Edge provides a load balancing function. For most server farms the features and options provided for server pools suit real-world requirements. Where the NSX load balancer does not meet the requirements, partner integration from F5 (and soon Citrix and Radware) allows administrators flexibility in their application deployments.


DHCP Servers

The NSX Edge can act as a DHCP server for the application topology that resides behind it. This allows automatic IP address management and simplification. Customers using a hosted platform do not need to rely on an infrastructure management solution and can use DHCP from the Edge. This is handy for environments that require dynamic addressing but are volatile in nature (development and test environments).

DHCP Relay

In 6.1, NSX added DHCP relay support. Before this, the NSX Edge could either be a DHCP server itself or the application topology that resided behind it required its own DHCP server. This wasn't always suitable for customers or application topologies. DHCP relay support on the NSX Edge and DLR allows for relaying of the Discover, Offer, Request and Acknowledge messages that make up DHCP.

In this scenario the messages are proxied by the DHCP-relay-enabled Edge to a device running DHCP, in this case our infrastructure server. This means that in certain environments you can have a centralised server cluster that manages IP addresses, which the numerous services residing behind different networks can access. Relay can be configured on the DLR or ESG.

Numerous relay agents can reside in the data path and they support numerous DHCP servers. Any server listed will have requests sent to it. The one thing it cannot do is DHCP option 82, and overlapping address space is not supported.

If you issue the show config dhcp command on the Edge gateway you get the following output:

vShield Edge DHCP Config:

{
  "dhcp" : {
    "relay" : {
      "maxHopCounts" : 10,
      "servers" : [
        "192.168.254.2"
      ],
      "agents" : [
        {
          "interface" : "vNic_1",
          "giaddr" : [
            "192.168.10.1"
          ]
        }
      ]
    },
    "logging" : {
      "enable" : true,
      "logLevel" : "debug"
    },
    "enable" : true,
    "bindings" : {},
    "leaseRotateTime" : 900,
    "leaseRotateThreshold" : 10000
  }
}

The gateway from which DHCP requests are expected is vNic_1 (192.168.10.1). The Edge relays DHCP requests to the server 192.168.254.2, and logging is enabled at debug level.

ECMP

With the advent of NSX for vSphere 6.1, Equal Cost Multi-Path (ECMP) routing has been introduced. Each NSX Edge appliance can push through around 10Gbps of traffic. There may be applications that require more bandwidth, and ECMP helps solve this problem. It also allows increased resiliency. Instead of active/standby scenarios where one link is not used at all, ECMP can enable numerous paths to a destination. Load sharing also means that when a failure occurs only a subset of bandwidth is lost, not feature functionality.

North-south traffic is handled by all active Edges. ECMP can be enabled on both the distributed logical router and the NSX Edge appliance. As long as the paths are of equal cost, there will be multiple usable routes for traffic.

The OSPF adjacencies shown here highlight that there are peerings between the DLR and all of the Edges, and between the Edges and the physical router. Confirm this with show ip route; it will demonstrate that routes to the destination have multiple equal-cost next hops.

Here you can see that traffic takes varying paths inbound and outbound because there are only two hops between assets behind the DLR and the physical infrastructure.


vSphere HA should be enabled for the NSX Edge VMs to help achieve higher levels of availability during failure. It is also recommended that timers are aggressively tuned: hello and hold timers should be 1 and 3 seconds respectively to speed up traffic recovery.

It is important to remember that active/active ECMP has no support for stateful firewall, load balancing or NAT services on Edges running ECMP.

Hashing mechanisms

The importance of hashing cannot be overlooked. The ECMP implementation in NSX uses one of two hashing approaches. On the NSX Edge, load balancing is based on the Linux kernel: a flow-based random round robin is used for next-hop selection, where a flow is determined by the pair of source and destination IPs. On the distributed logical router, the hashing is done simply on the source and destination IP.

The Linux flow-based random round robin algorithm is rather interesting. It uses the concept of budgets. The kernel defines the total budget as the sum of all next-hop weights, and each next hop, when initialised, is assigned a budget of 1. Each round, the kernel generates a random value from 0 to the total round-robin budget and searches the next-hop list until it finds a next hop whose budget is equal to or greater than the generated random value. It then decrements both the round-robin budget and the selected next-hop budget.
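A conceptual re-implementation of that budget idea, simplified from the behaviour described above (this is an illustrative sketch, not the actual Linux kernel code), looks roughly like this:

    # Conceptual sketch of budget-based random round robin next-hop selection.
    import random

    def pick_next_hop(budgets):
        """budgets maps each next hop to its remaining budget (initially its weight)."""
        total = sum(budgets.values())
        if total == 0:                         # all budgets spent: start a new round
            for hop in budgets:
                budgets[hop] = 1
            total = len(budgets)
        target = random.randint(1, total)      # random point within the remaining budget
        running = 0
        for hop, budget in budgets.items():
            running += budget
            if running >= target:              # walk the list until the budget is reached
                budgets[hop] -= 1              # decrement the selected next hop's budget
                return hop

    edges = {"edge-1": 1, "edge-2": 1, "edge-3": 1}
    print([pick_next_hop(edges) for _ in range(6)])   # each edge picked roughly evenly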

When an Edge fails, the corresponding flows are re-hashed across the remaining active Edges. The re-hash is triggered once the adjacencies time out and the routing table entries are removed.

Distributed Firewall

Within VMware NSX for vSphere there is a feature known as the Distributed Firewall (dFW). The distributed firewall provides an in-kernel firewall that enforces policy at the guest level. This is done by matching rule sets, applied centrally at the vCenter level, down at the vNIC level. This ensures consistent enforcement in a distributed fashion irrespective of where a workload is placed. Because traffic is filtered by a vNIC-level firewall, the dFW removes the awkward traffic steering otherwise required to reach a firewall attached to an aggregation or services layer.

Before the dFW was brought to market there was an unmet need for east-west firewalling. For a long time the focus has been on perimeter security, driven by the industry focus on north-south application and network architectures that placed security at the DMZ and internet edge. Firewalls were littered around on a per-application basis, but nothing targeted east-west enforcement. Virtual appliances such as vShield App, vSRX gateways and ASA gateways permeated the market, but being virtual appliances they were limited to poor performance, generally 4-5 Gbps, and a reduced feature set. Each also had licensing issues and a substantial memory and vCPU footprint, which made scaling horizontally quite the issue. They were not at all suited to firewalling Tbps of lateral traffic. Enter the dFW, which scales with host CPU, allowing upwards of 18 Gbps per host.

Components

Upon cluster and host preparation a dFW VIB is installed to every host, and once installed the dFW is enabled. It is installed alongside the VXLAN and Logical Distributed Routing VIBs. To leverage the dFW an administrator can use three touch points: the vCenter Web Client, a REST API client via NSX Manager, or an SSH client to the ESXi host. The REST API and the vCenter Web Client propagate rule changes to all hosts within an NSX-enabled domain and are the recommended methods. SSH access provides level 3-4 troubleshooting and supplementary verification techniques.

Rules configured via the API exposed by NSX Manager or via the vCenter UI are pushed down via the User World Agent on each host. The User World Agent takes the rules learnt from NSX Manager and enforces them at the matching VM vNIC(s). This means any traffic a VM sends must traverse the firewall kernel module at the vNIC before accessing the vSwitch.

The dFW maintains two tables. The first is a rule table: the collection of matching rules based on defined matching criteria. This is a standard numerically indexed table that can match on IP addresses, ports and vCenter objects. The second table is the connection tracker table. Its function is to track current flows through the firewall, which amounts to the traffic permitted by the firewall. The first packet of each flow is inspected against the rule table, and the connection tracker table then acts as the fast path.

Building dFW filters

Tight integration with vCenter and the hypervisor puts the dFW in a unique position. It can use traditional source and destination IPs and ports to create filters and object groups. It can also take advantage of vCenter objects such as clusters, data centres, logical switches, security tags, VM names, guest operating systems and more. This allows administrators to apply meaningful context to rule sets and enables modern security architectures.
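To make that concrete, a rule built from vCenter objects rather than raw addresses might conceptually carry matching criteria like the following (purely illustrative data with hypothetical object names, not an actual API payload):

    # Illustrative only: the kind of matching criteria a dFW rule can carry when
    # vCenter objects are used instead of raw IPs. All names are hypothetical.
    rule = {
        "id": 1001,
        "name": "Allow web to app",
        "source": {"type": "SecurityGroup", "value": "SG-Web-Servers"},
        "destination": {"type": "LogicalSwitch", "value": "App-Tier-LS"},
        "service": {"protocol": "TCP", "port": 8443},
        "applied_to": {"type": "Cluster", "value": "Compute-Cluster-A"},
        "action": "allow",
    }
    print(rule["name"], "->", rule["action"])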

Packet Walk

From the source guest VM a packet is sent out towards the vSwitch. Before egressing from the vNIC onto the vSwitch, the dFW performs its actions. By firewalling at the vNIC with an in-kernel module it is possible to reduce the amount of unauthorised traffic within the network.

First-packet lookup

1. A lookup is performed against the connection tracker table. This checks whether an entry for the flow already exists.
2. If the flow is not in the connection tracker table it is listed as a miss. A rule lookup is subsequently performed against the rule table, attempting to find a matching rule applicable to the flow.
3. Upon finding a matching rule in the table, a new entry is created in the connection tracker table. The packets are then transmitted.

Subsequent Packets

1. A lookup is performed against the connection tracker table. This checks whether an entry for the flow already exists.
2. An entry exists for the flow within the connection tracker table. The packets are transmitted.
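The two lookups above can be summarised in a small sketch. This is conceptual only; the real module runs inside the ESXi kernel at the vNIC, and the rules shown are hypothetical:

    # Conceptual sketch of the dFW fast path / slow path: the connection tracker
    # table short-circuits rule evaluation for established flows.
    import ipaddress

    connection_tracker = set()                      # established flows (the fast path)
    rule_table = [
        # (source, destination, dst_port, action), evaluated top-down; illustrative rules
        ("10.0.10.0/24", "10.0.20.0/24", 8443, "allow"),
        ("any", "any", None, "block"),              # default deny
    ]

    def matches(address, network):
        return network == "any" or ipaddress.ip_address(address) in ipaddress.ip_network(network)

    def filter_packet(src, dst, dst_port):
        flow = (src, dst, dst_port)
        if flow in connection_tracker:              # subsequent packets hit the connection tracker
            return "allow"
        for rule_src, rule_dst, rule_port, action in rule_table:   # first packet: rule table lookup
            if matches(src, rule_src) and matches(dst, rule_dst) and rule_port in (None, dst_port):
                if action == "allow":
                    connection_tracker.add(flow)    # new entry created for the flow
                return action
        return "block"

    print(filter_packet("10.0.10.5", "10.0.20.7", 8443))   # first packet: allowed via rule table
    print(filter_packet("10.0.10.5", "10.0.20.7", 8443))   # subsequent packet: fast path hit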

If communication is between two guests on the same host then traffic does not hit the physical network. If traffic is between two guests on different hosts then it only needs to make its way to the other host. No longer is it necessary to traverse a virtual appliance or a physical firewall at an aggregation layer.

Because policy is enforced at the vNIC on both ends, it is possible to provide granular policy control at both source and destination.

Distributed Firewall Logs

There are three types of logs that the firewall keeps and stores, containing a variety of information. It is important to understand where these are kept if advanced troubleshooting or auditing is required.


The NSX Manager stores two types of logs. Audit logs and system events are stored at /home/secureall/secureall/logs/vsm.log. The audit logs include administration logs and dFW configuration changes; from an auditing perspective this includes pre- and post-rule changes. The system event logs include dFW configuration applied, filters created, deleted or failed, VMs added to security groups and more.

On each host there is a rules message log that is kept at /var/log/vmkernel.log.


This set of logs has PASS or DROP associations for each flow. The combination of these logs provides the information required for audited environments such as PCI-DSS and other compliance frameworks.

Memory, CPU and Storage requirements

There is a certain amount of hardware required to support the virtual appliances that drive VMware NSX. This is measured in RAM, storage and CPU. There is very little overhead as you scale in terms of impact on resources. Reliance on vCPU is important, and these numbers can help when attempting to design an NSX environment.

MEMORY

NSX Manager – 12GB
NSX Edge – 512MB to 8GB (based on instance size)
NSX data security – 512MB

DISK SPACE

NSX Manager – 60GB
NSX Edge – 512MB to 4GB (based on instance size)
NSX data security – 6GB

vCPU

NSX Manager – 4 vCPU
NSX Edge – 1 to 6 vCPU(s)
NSX data security – 1 vCPU

NSX resources and reference documents.


VMware NSX Network Virtualization Design Guide PDF

The VMware NSX design guide looks at common deployment scenarios and explores, from the ground up, the requirements and considerations in a VMware NSX deployment. I did a write-up here about this when it was released, and since then additional content has been added surrounding spine and leaf switch configurations. There are also sections on QoS, DMZ designs and the L3 edge services offered by VMware NSX.

VMware NSX leveraging Nexus 7000 and Cisco UCS infrastructure

This new design guide looks at NSX running over the top of existing Cisco infrastructure. Cisco Nexus and UCS are a mainstay of many data centres, and this design document highlights the ease with which NSX can run over the top. Packed full of UCS and Nexus tips and tricks, this guide is worth a read.

Next Generation Security with VMware NSX and Palo Alto Security VM-series

Our Net-X API provides partner integration into NSX. The network fabric delivered with VMware NSX can be further extended to partners such as Palo Alto Networks. Their VM-Series user-space firewall specialises in integrating into existing PAN deployments and provides Layer 7 advanced application filtering.

VMware and Arista Network Virtualization Reference Design Guide for VMware vSphere Environments

VMware has published an Arista paper. It shows off network topologies that leverage Arista infrastructure integrated with VMware NSX, including VTEP integration with hardware offload on the ToR and more. Worth a read if you are looking at alternatives to the current incumbent, or already have them.

Change Log
v0.1 – Introduction and Core NSX components.
v0.2 – Logical Switching and VXLAN replication.
v0.3 – LDR and Reference design documentation.
v0.4 – Hardware requirements, Distributed Firewall.


