By Anthony Burke – VMware NSBU Senior System Engineer.
Disclaimer: This is not an official reference and should not be treated as one. Any mistakes on this page are a reflection of my writing and knowledge, not the product itself. I endeavour to be technically accurate but we are only human! These pages serve as a formalisation of my own notes about NSX for vSphere. Everything discussed on this page is currently shipping within the NSX product.
This page serves as a resource for the components and deployment of VMware's NSX for vSphere. I work with the product daily and educate customers and the industry at large on the benefits of Network Function Virtualisation (NFV) and the Software Defined Data Centre (SDDC). This resource aims to provide information at both a high level and in technical depth regarding the components and use cases for VMware NSX. This page will evolve as I add more content to it. It will eventually cover all aspects of VMware NSX and how to use, consume, and run an environment with it. In time there will be a collection of text, video, and images that I am binding together into a compendium.
VMware NSX delivers a software-based solution that solves many of the challenges faced in the data centre today. For a long time administrators and organisations have been able to deploy x86 compute at lightning pace. The notion of delivering an application from a template, and the excitement of doing so in the time it takes to boil the kettle, has had its sheen taken off by the three weeks it can take to provision network services.
Network function virtualisation and delivering network services in software have long been a challenge. The notion of not only delivering a user-space instance of a service but also programming the end-to-end workflow, from end user right through to storage, has been a dream for a long time. It wasn't until VMware's acquisition of Nicira that this came about and the ability to deliver many functions of the data centre in software took a strong foothold.
With the ability to deliver data centre features such as a distributed in-kernel firewall and routing function, NSX Edge services, and L2 switching across L3 boundaries thanks to VXLAN, NSX redefines the architecture of the data centre. Whilst rapidly reducing deployment time and decreasing administrative overhead, NSX provides the flexibility to build and define the next generation data centre.
There are several major components of NSX, each providing a different function. This page is a technical resource for NSX and its deployment on VMware infrastructure.
Whilst NSX for vSphere is very far-reaching, it is surprisingly lightweight. There are only a handful of components that make up this solution to provide the final piece in VMware's SDDC vision.
The NSX Manager is one of the touch points for the NSX for vSphere solution. NSX Manager provides a centralised management plane across your data centre. It provides the management UI and API for NSX. Upon installation the NSX Manager injects a plugin into the vSphere Web Client for consumption within the web management platform. Along with providing management APIs and a UI for administrators, the NSX Manager component installs a variety of VIBs to the host when initiating host preparation. These VIBs provide VXLAN, Distributed Routing, Distributed Firewall and a user world agent. The benefit of leveraging a VMware solution is that access to the kernel is much easier to obtain. With that, VMware provides the distributed firewall and distributed routing functions in kernel. This delivers extremely fast in-kernel processing without the inadequacies of traditional user-space or physical firewall network architectures.
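As a quick illustration of the API touch point, here is a minimal sketch of querying NSX Manager over REST with Python. The hostname, credentials and the exact endpoint path are assumptions for illustration only; treat it as a sketch rather than a definitive reference.

```python
# Minimal sketch: querying the NSX Manager REST API with the requests library.
# The hostname, credentials and endpoint path below are illustrative assumptions.
import requests

NSX_MANAGER = "https://nsxmgr-01a.corp.local"   # hypothetical NSX Manager address
AUTH = ("admin", "VMware1!")                     # hypothetical credentials

# Ask NSX Manager for its list of deployed controllers (XML response).
resp = requests.get(
    f"{NSX_MANAGER}/api/2.0/vdn/controller",
    auth=AUTH,
    verify=False,   # lab only: self-signed certificate
)
resp.raise_for_status()
print(resp.text)
```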
The NSX Controller is a user-space VM that is deployed by the NSX Manager. It is one of the core components of NSX and could be termed the "distributed hive mind" of NSX. It provides a control plane to distribute network information to hosts. To achieve a high level of resiliency the NSX Controller is clustered for scale-out and HA.
The NSX Controller holds three primary tables: a MAC address table, an ARP table and a VTEP table. These tables collate VM and host information and replicate it throughout the NSX domain. The benefit of this is multicast-free VXLAN on the underlay. Previous versions of vCNS and other VXLAN-enabled solutions required multicast to be enabled on the top-of-rack switches or across the entire physical fabric. This imposed a significant administrative overhead, and removing the requirement alleviates a lot of complexity.
By maintaining these tables, an additional benefit is ARP suppression. ARP suppression reduces the number of ARP requests flooded throughout the environment. This is important when layer 2 segments stretch across various L3 domains. If a VM sends an ARP request for an address the host has not learned locally, the host can answer from the replicated information pushed to it by the controller.
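To make the idea concrete, here is a conceptual sketch of the three tables and how a host could answer an ARP request from controller-pushed state. The structures, field names and addresses are illustrative assumptions, not the controller's real schema.

```python
# Conceptual sketch of the three controller tables and ARP suppression.
# The structures, keys and addresses are illustrative, not the real schema.

vtep_table = {5001: {"10.20.0.10", "10.20.0.11"}}          # VNI -> VTEP IPs
mac_table  = {5001: {"00:50:56:aa:bb:01": "10.20.0.10"}}   # VNI -> VM MAC -> VTEP
arp_table  = {5001: {"172.16.10.11": "00:50:56:aa:bb:01"}} # VNI -> VM IP -> VM MAC

def suppress_arp(vni: int, requested_ip: str) -> str:
    """If the controller-pushed ARP table knows the answer, the host replies
    locally instead of flooding the request across the transport network."""
    mac = arp_table.get(vni, {}).get(requested_ip)
    if mac is not None:
        return f"local ARP reply: {requested_ip} is-at {mac}"
    return "unknown: flood via the configured replication mode"

print(suppress_arp(5001, "172.16.10.11"))
```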
The NSX Controller has five roles:
The API provider maintains the web-services API consumed by NSX Manager. The persistence server ensures data that must not be lost, such as network state information, is preserved across nodes. The logical manager handles the computation of policy and the network topology. The switch manager role manages the hypervisors and pushes the relevant configuration to the hosts. The directory server focuses on VXLAN and the distributed logical routing directory of information.
Whilst each role requires a master, the masters for different roles can be elected on the same node or on different nodes. If a node failure leaves a role without a master, a new master for that role is promoted through the election process.
Most deployment scenarios see three, five or seven controllers deployed. This is because the controllers run ZooKeeper. A ZooKeeper cluster, known as an ensemble, requires a majority to function, and this is best achieved through an odd number of nodes. This majority-based tie-breaking is relied upon in many failure and HA conditions during NSX for vSphere operations.
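The quorum arithmetic is simple and worth seeing once. This short sketch, which is illustrative rather than anything NSX-specific, shows why even-sized ensembles add cost without adding failure tolerance.

```python
# Why odd-sized ensembles: a ZooKeeper-style cluster needs a strict majority
# (quorum) to keep functioning. Quorum size and tolerated failures per ensemble size:
for nodes in (3, 4, 5, 6, 7):
    quorum = nodes // 2 + 1
    tolerated = nodes - quorum
    print(f"{nodes} controllers: quorum={quorum}, tolerates {tolerated} failure(s)")

# 3 and 4 nodes both tolerate only 1 failure; 5 and 6 both tolerate 2.
# The even sizes add cost without adding resilience, hence 3, 5 or 7.
```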
In a rapidly changing environment that may see multiple changes per second, how do you dynamically distribute workload across the available cluster nodes, re-arrange workloads when new cluster members are added, and sustain failures without impact, all while this occurs behind the scenes? Slicing.
A role is told to create a number of slices of itself. An application will collate its slices and assign each object to a slice. This ensures that no individual node can cause a failure of that NSX Controller role.
When a controller node fails, the slices that node was in charge of are redistributed onto the remaining controllers. This ensures consistent network information and continuous state.
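The following sketch shows the general shape of slicing: a role's objects hash into a fixed number of slices, slices are spread across controller nodes, and a node failure only requires re-homing that node's slices. The slice count, the round-robin assignment and the hashing are assumptions for illustration, not the controller's actual algorithm.

```python
# Illustrative sketch of slicing a controller role across cluster nodes.
import zlib

NUM_SLICES = 8   # assumed slice count for illustration

def assign_slices(nodes):
    """Round-robin the slices of a role across the available controller nodes."""
    return {s: nodes[s % len(nodes)] for s in range(NUM_SLICES)}

def object_to_slice(object_id: str) -> int:
    """Deterministically map an object (e.g. a VNI) to a slice."""
    return zlib.crc32(object_id.encode()) % NUM_SLICES

nodes = ["controller-1", "controller-2", "controller-3"]
ownership = assign_slices(nodes)
print(object_to_slice("vni-5001"), ownership)

# controller-2 fails: its slices are redistributed to the survivors.
survivors = [n for n in nodes if n != "controller-2"]
print(assign_slices(survivors))
```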
VXLAN is a multi-vendor, industry-supported network virtualisation technology. It enables much larger networks to be built at layer 2, without the crippling scale limitations found with traditional layer 2 technologies. Where a VLAN tags a layer 2 frame with a logical ID, VXLAN encapsulates the original layer 2 frame with a VXLAN header, a UDP header and outer IP headers. From a virtual machine perspective, VXLAN enables VMs to be deployed on any server in any location, regardless of the IP subnet or VLAN that the physical server resides in.
VXLAN solves many issues that have arisen in the DC through the implementation of Layer 2 domains.
Scaling beyond the 4094 VLAN limitation of traditional switches has been solved thanks to the 24-bit VXLAN Network Identifier (VNI). Similar to the field in the VLAN header where a VLAN ID is stored, this 24-bit identifier allows for roughly 16 million potential logical networks.
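For the curious, here is a small sketch of the 8-byte VXLAN header (per RFC 7348) showing where the 24-bit VNI sits and what 24 bits of ID space buys you. The VNI value used is an arbitrary example.

```python
# Sketch of the 8-byte VXLAN header: flags (1 byte), reserved (3 bytes),
# VNI (3 bytes), reserved (1 byte). Format per RFC 7348; VNI value is illustrative.
import struct

def vxlan_header(vni: int) -> bytes:
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08 << 24            # 'I' flag set: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)

print(f"Possible logical networks: {2**24:,}")   # 16,777,216
print(vxlan_header(5001).hex())                  # 08000000 001389 00
```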
There are a few VXLAN enhancements in NSX for vSphere. It is possible to support multiple VXLAN vmknics per host, which allows uplink load balancing. QoS is supported by copying the DSCP and CoS tags from the inner frame to the outer VXLAN headers. It is possible to provide guest VLAN tagging. Due to the VXLAN format used, there is also the potential to later consume hardware offload for VXLAN in network adapters such as those from Mellanox.
Control plane enhancements come through adjustments in the VXLAN headers. This allows the removal of multicast or PIM routing on the physical underlay. It is also possible to suppress broadcast traffic in VXLAN networks, thanks to the ARP directory services and the role the NSX Controller plays in the environment.
Unicast mode and Hybrid mode each select a single VTEP in every remote segment, chosen from the mapping table, to act as a proxy. This is performed on a per-VNI basis and load is balanced across the proxy VTEPs.
Unicast mode calls this proxy a UTEP (Unicast Tunnel Endpoint). Hybrid mode calls it an MTEP (Multicast Tunnel Endpoint). The table of UTEPs and MTEPs is synchronised to all VTEPs in the cluster.
Optimised replication occurs because the source VTEP performs software replication of broadcast, unknown unicast and multicast traffic only to the local VTEPs and to one UTEP/MTEP for each remote segment.
This is achieved through an update to how NSX uses VXLAN: a REPLICATE_LOCALLY bit in the VXLAN header, used in the Unicast and Hybrid modes. A UTEP or MTEP receiving a unicast frame with the REPLICATE_LOCALLY bit set becomes responsible for injecting the frame into the local network.
The source VTEP replicates an encapsulated frame to each remote UTEP via unicast and also replicates the frame to each active VTEP in the local segment. The UTEP role is responsible for delivering a copy of the de-encapsulated inner frame to its local VMs.
This alleviates the dependencies on the physical network, although a slight replication overhead is incurred. The replication mode is configurable per VNI during the provisioning of the logical switch.
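To illustrate the behaviour described above, here is a sketch of unicast-mode replication: a copy to every VTEP in the source's own L3 segment, plus one copy to a proxy per remote segment. The subnet grouping, addresses and "first VTEP seen acts as proxy" selection are assumptions for illustration, not NSX's internal logic.

```python
# Illustrative sketch of unicast-mode replication target selection.
import ipaddress

def segment_of(vtep_ip: str) -> str:
    """Group VTEPs by their L3 transport subnet (assumed /24 here)."""
    return str(ipaddress.ip_interface(f"{vtep_ip}/24").network)

def replication_targets(source_vtep: str, all_vteps: list[str]) -> list[str]:
    local_seg = segment_of(source_vtep)
    targets = []
    remote_proxies = {}
    for vtep in all_vteps:
        if vtep == source_vtep:
            continue
        seg = segment_of(vtep)
        if seg == local_seg:
            targets.append(vtep)                   # replicate to every local VTEP
        else:
            remote_proxies.setdefault(seg, vtep)   # one proxy (UTEP) per remote segment
    return targets + list(remote_proxies.values())

vteps = ["10.20.0.10", "10.20.0.11", "10.30.0.10", "10.30.0.11", "10.40.0.10"]
print(replication_targets("10.20.0.10", vteps))
# -> the local peer 10.20.0.11 plus one proxy in each remote segment
```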
NSX Manager deploys the NSX Controllers. A subsequent action after deploying the controllers is preparing the vSphere clusters for VXLAN. Host preparation installs the network VIBs onto the hosts in the cluster: the Distributed Firewall, distributed routing and VXLAN host kernel components. After this an administrator creates VTEP VMkernel interfaces for each host in the cluster. The individual host VMkernel interfaces can be allocated IPs from a pool that can be set up.
Because the original L2 frame is encapsulated, the Ethernet payload grows by roughly 50 bytes of overhead. An MTU of 1600 is recommended on the physical underlay.
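The 50 bytes break down as follows (assuming IPv4 and no outer 802.1Q tag), which is why a 1600-byte underlay MTU comfortably carries a standard 1500-byte guest frame.

```python
# Worked breakdown of the ~50 bytes of VXLAN overhead.
outer_ethernet = 14   # outer destination/source MAC + ethertype
outer_ip       = 20   # outer IPv4 header
outer_udp      = 8    # outer UDP header
vxlan          = 8    # VXLAN header carrying the 24-bit VNI

overhead = outer_ethernet + outer_ip + outer_udp + vxlan
print(overhead)            # 50
print(1500 + overhead)     # 1550 -> a 1600-byte underlay MTU leaves headroom
```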
A transport zone is created to delineate the width of the VXLAN scope. It can span one or more vSphere clusters. An NSX environment can contain one or more transport zones based on user requirements. Transport zone types can be mixed, and an environment can have unicast, hybrid and multicast replication modes.
Transport zones can also be used as a method to further carve up infrastructure that sits under a single NSX administrative domain. For example, an administrator may have a DMZ environment. Allocating the hosts that provide the DMZ functionality to a separate transport zone ensures virtual functions, namely Distributed Logical Routers and Logical Switches, are confined to the scope of that transport zone. The rest of the infrastructure is assigned to another transport zone.
Another example is when a business is seeking to achieve a compliant architecture. Whilst NSX can build a compliant mixed-mode architecture, there are QSAs out there who deem an appropriate gap between elements' control planes necessary. Using a separate transport zone for elements inside an environment requiring a PCI-compliant architecture can provide further assurance for the elements attached.
The NSX logical switch creates logically abstracted segments to which applications or tenant machines can be wired. This provides administrators with increased flexibility and speed of deployment whilst providing traditional switching characteristics. The environment allows traditional switching without the constraints of VLAN sprawl or spanning-tree issues.
A logical switch is distributed and reaches across compute clusters. This allows connectivity in the data centre for virtual machines. Delivered in a virtual environment, this switching construct is not restricted by historical MAC/FIB table limits, because the broadcast domain is a logical container that resides within software.
It is important to remember that the pMAC is not the physical MAC address. These MAC addresses are generated for the number of uplinks on a VDS enabled for logical routing. The vMAC is replaced by the pMAC on the source host after the routing decision is made but before the packets reach the physical network. Once it arrives at the destination host, traffic is sent directly to the virtual machine.
The control VM is a user-space virtual machine that is responsible for LIF configuration and the control-plane management of dynamic routing protocols, and works in conjunction with the NSX Controller to ensure correct LIF configuration on all hosts.
When deploying a Logical Distributed Router the following Order of Operations occurs:
Not all networks require or have VXLAN connectivity everywhere. The Distributed Logical Router can have an uplink that connects to VLAN port groups. First-hop routing is handled in the host and traffic is then routed into a VLAN segment. There must be a VLAN ID associated with the dvPortGroup; VLAN 0 is not supported. VLAN LIFs require a designated instance.
VLAN LIFs generally introduce some design constraints to a network. This uplink type is limited to one port group per virtual distributed switch, and there can only be one VDS carrying it. The same VLAN must span all hosts in that VDS. This does not scale well, given that network virtualisation seeks to reduce the consumption of VLANs.
L3 VPNs allow for IPsec Site to Site connections between NSX edges or other devices. SSL VPN connections also allow users to connect to an application topology with ease if security policies dictate this.
With the advent of NSX for vSphere 6.1, Equal Cost Multi-Path (ECMP) routing has been introduced. Each NSX Edge appliance can push roughly 10Gbps of traffic through it. Some applications require more bandwidth, and ECMP helps solve this problem. It also increases resiliency: instead of active/standby scenarios where one link is not used at all, ECMP can enable numerous paths to a destination. Load-sharing also means that when a failure occurs only a subset of bandwidth is lost, not feature functionality.
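As a rough illustration of how ECMP shares load while keeping a flow on one path, here is a generic flow-hash sketch. The hash function and next-hop addresses are assumptions for illustration, not NSX's internal algorithm.

```python
# Illustrative sketch of ECMP path selection: a hash over the 5-tuple pins each
# flow to one of the equal-cost next hops, so traffic is shared across paths
# while packets within a flow stay in order.
import hashlib

def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[digest % len(next_hops)]

edges = ["192.168.100.1", "192.168.100.2", "192.168.100.3", "192.168.100.4"]
print(pick_next_hop("172.16.10.11", "10.0.0.5", 49152, 443, "tcp", edges))
```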
Before the Distributed Firewall was brought to market there was an unmet need for east-west firewalling. For a long time the focus has been on perimeter security, brought on by the industry's emphasis on north-south application and network architectures that placed security at the DMZ and internet edge. Firewalls were littered around on a per-application basis but nothing targeted east-west enforcement. Virtual appliances such as vShield App, vSRX Gateway and the ASA gateway permeated the market, but being virtual appliances they were limited by poor performance, generally 4-5 Gbps, and a reduced feature set. Each also had licensing issues and a substantial memory and vCPU footprint, which made scaling horizontally quite the issue. They were not at all suited to firewalling Tbps of lateral traffic. Enter the Distributed Firewall, which scales based on CPU, allowing upwards of 18+ Gbps per host.
To use the Distributed Firewall an administrator has two touch points: the vSphere Web Client or the REST API exposed by NSX Manager. Validation can be performed in three places: on the ESXi host itself, on the NSX Manager with the Central CLI, or via the vSphere Web Client. The REST API and the Web Client will propagate all rule changes to all hosts within an NSX-enabled domain. Host and Manager CLI access provides advanced troubleshooting and verification techniques.
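To show what the API touch point looks like, here is a minimal sketch of reading the Distributed Firewall rule set from NSX Manager. The hostname, credentials and endpoint path are assumptions for illustration; the response is XML describing the firewall sections and rules.

```python
# Minimal sketch: reading the Distributed Firewall configuration over REST.
# Hostname, credentials and the endpoint path are illustrative assumptions.
import requests

NSX_MANAGER = "https://nsxmgr-01a.corp.local"   # hypothetical NSX Manager address
AUTH = ("admin", "VMware1!")                     # hypothetical credentials

resp = requests.get(
    f"{NSX_MANAGER}/api/4.0/firewall/globalroot-0/config",
    auth=AUTH,
    verify=False,   # lab only: self-signed certificate
)
resp.raise_for_status()
print(resp.text[:500])   # print the start of the XML rule set
```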
Upon cluster and host preparation a Distributed Firewall vSphere Installation Bundle (VIB) is installed to every host. This package is installed by NSX Manager via the ESX Agent Manager (EAM). The VIB delivers the esx-vsip kernel module, which is instantiated, and the vsfwd daemon is automatically started in the user space of the hypervisor.
vCenter communicates the NSX Manager IP address to the host. NSX Manager then communicates directly with the host through the User World Agent (UWA), speaking to a message bus (RabbitMQ in this case) on tcp/5671.
NSX Manager sends rules to the vsfwd user-world process over the message bus in a format known as protobuf. Protobuf is a serialisation format for structured data, allowing communicating programs to parse a stream of bytes back into the structured data it represents.
The vsfwd process converts the protobuf messages into VMkernel ioctls and configures the Distributed Firewall kernel module (vsip) with the appropriate filters and rules. It is important to note that firewall filters are created on all virtual machines on all hosts unless an 'Applied To' scope or an exclusion has been applied.
Security Groups provide administrators a mechanism to dynamically associate and group workloads. This abstraction allows a membership definition based upon one of many vCenter constructs. An administrator has the ability to create numerous Security Groups.
Security groups can have the following types of memberships:
My definition of an object, abstraction, or expression is one of the following: Security Tag, IP Set, Active Directory Group, VM Name, OS Type, Computer Name, Security Group, and so on. Something that is expressed in vCenter that is not a note, folder, or label.
It is possible to match on one or more of the aforementioned objects. Membership can be evaluated as matching any of the criteria or requiring all of them to match. This granularity and control allows the policy, or logical container, to capture exactly the right workloads.
If a workload is instantiated that matches one or all of the parameters defined by the Security Group's membership rules, it will be associated with that Security Group. At this stage all that has occurred is a manual, dynamic, or inherited membership of workloads.
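A conceptual sketch of that dynamic membership evaluation is shown below: each criterion is a predicate over a workload's attributes, and the group can require any or all of them to match. The attribute names and values are illustrative assumptions.

```python
# Conceptual sketch of dynamic Security Group membership ("match any" vs "match all").
def is_member(workload: dict, criteria: list, match_all: bool = False) -> bool:
    results = (check(workload) for check in criteria)
    return all(results) if match_all else any(results)

web_tier_criteria = [
    lambda vm: "web" in vm["name"].lower(),          # VM Name contains "web"
    lambda vm: "Production" in vm["security_tags"],  # carries a given Security Tag
]

vm = {"name": "web-01a", "security_tags": {"Production"}, "os": "Ubuntu"}
print(is_member(vm, web_tier_criteria))                  # match any -> True
print(is_member(vm, web_tier_criteria, match_all=True))  # match all -> True
```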
A Security Tag is a labelling mechanism that can be used as an abstraction to describe a state. It can be applied to a workload or be the matching criterion for a Security Group. An administrator can create numerous labels to suit how they want to identify specific workloads. Given that the matching criterion of a Security Group can be a Security Tag, a workload that is tagged can be automatically placed into a Security Group. Whilst an administrator can apply a Security Tag to a workload via the Web Client, the API or a 3rd-party integration can also be used to tag a workload.
Something that uses the API directly would be a cloud management platform such as vRealize Automation. When a blueprint is selected by a user or an administrator, it can be configured to tag workloads with one or many Security Tags. As a result the workloads inherit membership of the relevant groups.
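Here is a minimal sketch of what such an API-driven tag operation could look like against NSX Manager. The endpoint path, tag ID and VM managed object reference are assumptions for illustration, the same pattern a CMP could drive programmatically.

```python
# Minimal sketch: attaching a Security Tag to a VM via the NSX Manager REST API.
# Endpoint path, tag ID and VM moref below are illustrative assumptions.
import requests

NSX_MANAGER = "https://nsxmgr-01a.corp.local"   # hypothetical NSX Manager address
AUTH = ("admin", "VMware1!")                     # hypothetical credentials

tag_id = "securitytag-10"   # hypothetical Security Tag object ID
vm_id = "vm-273"            # hypothetical vCenter managed object reference

resp = requests.put(
    f"{NSX_MANAGER}/api/2.0/services/securitytags/tag/{tag_id}/vm/{vm_id}",
    auth=AUTH,
    verify=False,   # lab only: self-signed certificate
)
print(resp.status_code)     # expect a 200-series response on success
```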
A 3rd-party integration that uses Security Tags to change the group membership of a workload is endpoint security. An agentless anti-virus solution could scan the VMDKs associated with a selection of workloads. On detection of a severity-one threat, the anti-virus solution could revoke a particular tag (say, Production Tier) and apply a new Security Tag to the workload.
Security Tags may be a dose of déjà vu to VMware administrators who have used labels for a long time. Security Tags are specific and exclusive to NSX. The story goes that NSX Security Tags were introduced to the product due to the heavy usage of labels and folders. Heavy usage is a good thing; the problem is that they had been used solely with a compute mentality in mind. This meant that where roles and responsibilities were isolated, there was a chance that labels and folders used by compute administrators could adversely alter the security domain.
This would enforce the Security Policy Quarantine - High in its entirety before Security Policy A, in a scenario where a workload is dual-tagged based on an event.
What if I wanted to have two different network introspection services on one flow type? What comes first? What is the order of operations? The Distributed Firewall is key for redirecting traffic into 3rd-party Network Introspection services. The Distributed Firewall has 16 slots, of which VMware reserves 0-3 and 12-15. Slots 4-11 can be used for registered Network Introspection services. This gives the administrator the flexibility to register services and use the correct 3rd-party integration based on the desired outcome.
If an administrator had both Palo Alto Networks and Symantec registered to the NSX for vSphere platform for IDS functionality, they could be deployed on a per-application basis. With the redirection policy enforced by a Security Policy applied to a Security Group, there is a choice, down to the flow level, of what action is taken. Application A could leverage Symantec IDS on a flow, Application B could leverage Palo Alto IDS on a flow, and Application C could use both as part of a dual-vendor strategy. The flexibility of the architecture leaves the choice to the administrator.
There is an amount of hardware required to support the virtual appliances that drive VMware NSX. This is measured in RAM, storage and CPU. There is very little overhead as you scale in terms of impact on resources. Reliance on vCPU is important, and these numbers can help when attempting to design an NSX environment.
The VMware NSX design guide looks at common deployment scenarios and explores, from the ground up, the requirements and considerations in a VMware NSX deployment. I did a write-up here about this when it was released, and since then additional content has been added surrounding spine-and-leaf switch configurations. There are also sections on QoS, DMZ designs and the L3 edge services offered by VMware NSX.
This Solution Guide takes a look at how an administrator can integrate VMware NSX for vSphere into an existing environment or architecture. It covers NSX core infrastructure, design tips, caveats, and migration strategies.
This paper is an introduction to a key use case of NSX: micro-segmentation. At a pseudo-technical level it looks at the place and concepts of micro-segmentation and how it delivers value to existing architectures. By no longer using just IP addresses but consuming context from the virtual infrastructure as a foundation for security policy, administrators can start building the policy they need immediately.
This design guide looks at NSX deployed on Cisco infrastructure that provides network connectivity and compute resources. It talks about Cisco's new data centre switches and how to take full advantage of NSX across a robust switching and compute infrastructure.
This new design guide looks at NSX running over the top of existing Cisco infrastructure. Cisco Nexus and UCS are a mainstay of many data centres, and this design document highlights the ease with which NSX can run over the top. Packed full of UCS and Nexus tips and tricks, this guide is worth a read.
Our Net-X API provides partner integration into NSX. The network fabric delivered with VMware NSX can be further expanded to partners such as Palo Alto Networks. Their VM-Series user-space firewall specialises in integrating into existing PAN deployments and provides advanced Layer 7 application filtering.