Contrail SD-WAN Solution Design Guide
Contents
Contrail SD-WAN Solution Design Guide
Document Version Control
Version History
Introduction
SD-WAN Deployment Considerations
Juniper Recommendation
Contrail SD-WAN Solution Overview
CSO Service Types
Topology Components
Enabling Multi-Tenancy and Segmentation
CSO Overlay (Tunnels) Network
Operations and Maintenance (OAM)
Juniper Recommendation
Mesh Tags
Use Cases
Juniper Recommendation
Controller Based on-demand Mesh Dynamic Virtual Private Network (DVPN)
Juniper Recommendation
Local Breakout Options
Breakout Profile and Breakout Policy Management
Juniper Recommendation
APBR and SLA Management
Service Level Agreement (SLA) Policy and Profiles
Juniper Recommendation
Configuration Templates
Device Types
Hub Devices
Provider Hubs
Enterprise Hubs
Spoke Devices
On-premise Spoke
Cloud Spoke
Device Deployment Overview
Version History
Introduction
Software-defined wide-area network (SD-WAN) is an automated, programmatic approach to managing enterprise network
connectivity and circuit costs. It extends software-defined networking (SDN) into an application that businesses can use to
quickly create a smart “hybrid WAN” – a WAN that comprises business-grade IP VPN, broadband Internet, and wireless
services. Hybrid WAN architectures enable companies to manage their growing number of applications, particularly when
using the cloud. Traffic is automatically and dynamically forwarded across the most appropriate and efficient WAN path
based on network conditions, the security and quality-of-service (QoS) requirements of the application traffic at hand, and
cost of the circuit. The enterprise customer sets the routing policies describing the intent of how they want traffic to be
routed.
Organizations often review their WAN capabilities to add features, increase availability, or reduce costs. Network
Management Systems (NMS) can automate some router configuration tasks, but fully automating branch deployment,
policy, and monitoring reduces staff hours and eliminates avoidable errors. Juniper Networks’ Contrail SD-WAN solution
differs from a legacy NMS in that it actively manages and monitors links to optimize link usage and application
performance.
Juniper Networks’ Contrail SD-WAN solution delivers a simple, automated Multicloud platform for life-cycle support of
your branch location as well as extending into your Virtual Private Clouds. It enables you to create an evolvable
architecture to simplify growth, from secure routers to SD-WAN and SD-Branch. It automates the WAN edge across
Juniper cloud endpoints and on-premise next-generation firewalls or universal CPE platforms. Juniper Networks’ Contrail
SD-WAN solution provisions and enforces multilevel security policy at scale, across multiple clouds and enterprise sites
with diverse topologies. It provides a solution to provision, configure, monitor and report on the environments ensuring
you meet your branch network operational requirements.
Contrail Service Orchestration (CSO) is the application used to design, secure, automate, and manage the SD-WAN
service lifecycle across Juniper Networks’ branch portfolio of devices within the Contrail SD-WAN environment. CSO
is available as an on-premise distribution and also as Software as a Service (SaaS) beginning with the CSO 5.0 release. Not
all features are the same between the two distributions, so this paper notes any differences.
Contrail SD-WAN is the foundation on which you can chart a course to SD-Branch and beyond, seamlessly integrating
provisioning, full-stack security, and monitoring. Deployments can be onboarded and then monitored and configured from
a single pane of glass to reduce operational complexity. Granular Role Based Access Control keeps responsibility where
it belongs within your organization and can be adapted as the organization changes.
Figure 1 depicts the SD-WAN network topology, which allows for greater management control than has been possible in the
past.
This paper provides guidance and examples to assist with design considerations of Juniper Networks’ Contrail SD-WAN
Solution, which expands on the Contrail Service Orchestration (CSO) installation and deployment guides.
Juniper Networks Contrail Service Orchestration provides the tools to create different topologies depending on tenant and
site requirements. Once the tenant / site requirements are understood then we can start working through deployment
considerations and designs.
Please consider the following questions when evaluating branch networking requirements:
increase functionality?
increased availability?
Do you run cloud-based services such as Microsoft Office 365, Salesforce.com, Slack, GitHub, etc.?
Once an adoption decision has been made the next question is whether to use a cloud-based service, SP provided service
or on-premise managed service. We have included the following flow chart to help guide the decision.
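[Flow chart: choosing a delivery model by answering Yes/No questions and by number of branches (few, small, large); outcomes include SP Delivered and DIY on premises.]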
In general, the number of sites, geographic location, and the availability of in-house talent will be the determining factors
in choosing CSO as a service or as an On-Premise implementation. A business with fewer than 200 locations and existing
WAN links should look at a cloud delivered solution. Enterprises that are migrating to new service providers or need
assistance with integration and migration should look at an SP delivered platform. Large deployments with in-house
personnel would probably opt to install an On-Premise solution and customize as needed.
CSO Cloud Delivered or Software as a Service (SaaS) provides standardized templates for CPE and HUBs as it is geared
towards a prescriptive approach to reduce the margin for error. End-point customers will know exactly what ports need to
be used for WAN and LAN links and what configuration is available.
Juniper Recommendation
Customers requiring 200 or fewer branch sites and standard configurations should look at CSO Software as a Service.
Customers with more than 200 sites or customers that require customized configurations should use the On-Premise
solution or an SP/Integrator delivered solution. A customized configuration is one where a customer wants to apply
specialized templates (known as Configuration Templates) to a device. For instance, they may want the option to add a
template for specific routes per device or add RADIUS settings to an endpoint.
The devices that create the SD-WAN overlay environment are certified by Juniper Networks’ System Test team and
are fully supported by JTAC. Other devices may work as well, but they are outside the scope of this Design Guide. Please
read the guidelines for Hub selection to decide which devices to deploy and where. Devices can be deployed with multiple
WAN interfaces of two types: “MPLS”, better described as “private line”, and “Internet”, which is a best-effort
service. Internally, CSO tags traffic types as either MPLS or Internet, but they should be thought of as just
different CSO levels rather than an LDP/RSVP driven MPLS circuit with all the accompanying features and knobs.
Topology Components
The type of site and the functionality required determine which topology components CSO implements per
site. NGFW refers to independent devices with routing, switching and security features that optionally have overlay
tunnels but only connect to OAM hubs for monitoring and management.
SDWAN Spokes have overlay tunnel connections to one or more Provider and/or Enterprise Hubs. In addition, mesh tags
can be used to create additional topologies that direct traffic based on geography or redundancy, or to balance traffic. If you
decide to use an SD-WAN approach in your deployment, Spokes will have a data path provisioned to a Provider
and/or Enterprise Hub as an anchor point, and all non-local flows will be directed to the anchor point via routes learned
from the Virtual Route Reflector within CSO. The anchor points terminate overlay tunnels and provide forwarding to
the underlay.
SDWAN sites default to a Hub and Spoke topology until either a system-defined or user-defined session threshold is met; Spoke
sites may then connect directly to other Spoke sites (depending on NAT type). Each site within a tenant establishes a tunnel to Hubs
for Operations and Maintenance (OAM) and then to an Enterprise Hub (if given). If no Enterprise Hub is specified, only Provider
Hub connections are used. Any site can be configured with a variety of options that determine forwarding by application, site,
segment or department. Provider Hub redundancy is enabled via multi-homing, where Provider Hubs are selected as primary and
secondary respectively. Hub and Spoke is useful when CPE sites are behind NAT or when a governance policy requires all site traffic
to pass through a central point for security scanning. In that case the number of active sessions per site can be set so that no
site-to-site VPNs are established.
Sites that meet session criteria and have routable IP addresses can dynamically establish site-to-site tunnels and mesh as
needed. All Spokes require an OAM Hub and at least one Provider Hub or Enterprise Hub as anchor points and for provisioning
and monitoring. Spokes should be connected to two hubs to ensure a backup data path in the event that one hub is not available.
Dynamic VPN tunnels for site-to-site connectivity are established based on the number of sessions between any pair of sites.
Dynamic mesh allows tunnels to be constructed as needed and torn down when traffic between sites hits a lower threshold defined
in CSO policy. Dynamic VPN tunnels between sites offload the Provider Hubs and create flexible topologies. Spokes with public
IP addresses can establish site-to-site connections directly, and Spokes behind NAT may use hole-punching; otherwise the
Spokes use their configured Hub sites. Use Table 1 to determine the logical topology for sites and tenants.
While configuring the site, if one or more WAN links have mesh tags that match associated Spokes within the same tenant,
then dynamic VPNs may be established with those sites as required. If mesh tags do not match or D-VPN establishment
metrics are not met, then the site is connected only to Hubs, as in a Hub-Spoke topology. Thus, a single tenant can support
both Hub-Spoke and Dynamic mesh within its SD-WAN environment.
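The mesh tag rule above can be sketched in a few lines of Python. This is only an illustration of the matching logic as described in the text, not CSO code: the `WanLink` and `Site` classes, the function names, and the tenant name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WanLink:
    name: str
    mesh_tags: set[str] = field(default_factory=set)


@dataclass
class Site:
    name: str
    tenant: str
    links: list[WanLink]


def matching_link_pairs(a: Site, b: Site) -> list[tuple[str, str]]:
    """Return (link_a, link_b) name pairs that share at least one mesh tag."""
    if a.tenant != b.tenant:
        return []  # mesh tags are only compared within the same tenant
    return [
        (la.name, lb.name)
        for la in a.links
        for lb in b.links
        if la.mesh_tags & lb.mesh_tags
    ]


def topology_for(a: Site, b: Site, dvpn_threshold_met: bool) -> str:
    """Dynamic mesh needs a tag match AND the D-VPN metric; otherwise Hub-Spoke."""
    if matching_link_pairs(a, b) and dvpn_threshold_met:
        return "dynamic-mesh"
    return "hub-spoke"


site1 = Site("Site#1", "tenant-a", [WanLink("WAN_0", {"MPLS-USA"}),
                                    WanLink("WAN_1", {"INET-USA"})])
site2 = Site("Site#2", "tenant-a", [WanLink("WAN_0", {"INET-USA"})])
print(matching_link_pairs(site1, site2))
print(topology_for(site1, site2, dvpn_threshold_met=True))
```

Both conditions are checked per site pair, which is why one tenant can contain Hub-Spoke pairs and dynamically meshed pairs at the same time.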
These concepts can be illustrated using an L3VPN environment as an example. As shown below, each customer is assigned
a unique route target value, and all sites of the customer VPN use that route target value. When a router advertises a
customer’s routing information it attaches the appropriate route target value based on which customer VRF originated the
advertisements. The receiving router uses the attached route target value to identify the customer VRF into which the
received routing information should be placed.
A Hub-and-Spoke environment uses route targets differently, as shown below. For each customer, every Spoke VRF
attaches the same route target value when sending routing information. The receiving Hub accepts routes with that same
route target value and installs them into the Data Hub VRF. By contrast, the Data Hub VRF attaches a different route target
value when sending routing information, and the receiving Spokes accept and install routes with that same route target
value into Spoke VRFs.
With this setup, only the Data Hub VRF accepts routes from the Spoke VRFs, and only the Spoke VRFs accept routes from
the Data Hub VRF. Using this method, the Spoke sites need very little routing information (perhaps just a default route) as
they only need reachability to the Hub site, thereby keeping routing tables small and churn-free.
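The import/export behavior described above can be modeled with a short Python sketch. It is a toy model of route target filtering, not router code: the `Vrf` class, the `distribute` function, the route target values, and the prefixes are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    export_rt: str                 # route target attached to advertised routes
    import_rts: set[str]           # route targets this VRF accepts
    local_prefixes: list[str]
    learned: dict[str, str] = field(default_factory=dict)  # prefix -> origin VRF


def distribute(vrfs: list[Vrf]) -> None:
    """Advertise every VRF's local prefixes; receivers install only routes
    whose attached route target is in their import set."""
    for sender in vrfs:
        for receiver in vrfs:
            if receiver is sender:
                continue
            if sender.export_rt in receiver.import_rts:
                for prefix in sender.local_prefixes:
                    receiver.learned[prefix] = sender.name


# Hypothetical route target values: Spokes export one, the Hub exports another.
SPOKE_RT, HUB_RT = "target:65000:1", "target:65000:2"

hub = Vrf("hub", export_rt=HUB_RT, import_rts={SPOKE_RT}, local_prefixes=["0.0.0.0/0"])
spoke_a = Vrf("spoke-a", SPOKE_RT, {HUB_RT}, ["10.1.0.0/24"])
spoke_b = Vrf("spoke-b", SPOKE_RT, {HUB_RT}, ["10.2.0.0/24"])
distribute([hub, spoke_a, spoke_b])

print(hub.learned)      # both Spoke prefixes
print(spoke_a.learned)  # only the Hub's default route
```

Because the Spokes import only the Hub's route target, neither Spoke installs the other's prefix directly; Spoke-to-Spoke reachability goes through the Hub's default route.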
The Hub and Spoke example above serves as a good foundation, since the Contrail SD-WAN solution implements route
distribution and separation in the same way when forwarding traffic from one site to another, or when breaking out traffic
to the local internet.
The drawing below shows a Spoke site example configured with two overlay tunnels and local breakout, with all traffic
flowing out the same interface. Each traffic path has its own VRF, and route targets are assigned appropriately at the
Spoke and Data Hub sites to ensure proper tenant route separation.
Figure 5 Local Breakout for Cloud Apps and Direct Internet Access
L3VPN constructs foster the creation of overlay logical topologies to group and connect sites within a tenant. Multiple topologies can
be created within a tenant by enabling network segmentation when creating the tenant. Departments can then be created by function
or location and assigned to sites as needed. Mesh tags can be deployed to build geographical or regional mesh topologies. This
gives administrators multiple levels at which to segment and control traffic if they choose.
The diagram above shows an overlay network for a Hub and Spoke environment. Each Spoke has two tunnels to carry
traffic to the Hub: one through the private MPLS cloud and one over the internet. Another option would be to have one
broadband link and a backup LTE link. The OAM traffic is carried over an IPsec tunnel via a designated WAN interface.
CSO Overlay tunnels can be either MPLSoGRE or MPLSoGREoIPSec. CSO automatically provisions and establishes these tunnels as
part of the deployment process, independent of the underlying infrastructure.
[Figure: device onboarding sequence. Step 1: the device connects to redirect.juniper.net.]
Juniper Recommendation
CSOaaS will provide the loopback address for Hubs and Spokes. On-Premise installations have the option of choosing a
loopback address space that will be added to an internal CSO IPAM. Once CSO is provisioned it will automatically allocate
from the internal pool of IP addresses unless an administrator overrides the default addresses by providing a loopback
address during site creation.
Let’s look at some other ways you can optimize topology within a tenant.
Mesh Tags
Mesh tags are arbitrary labels that enable an administrator to create additional sub-topologies within a tenant. The default
tag names are Internet and MPLS, but any tag name may be used. The only requirement is that Spokes and Enterprise Hubs
must share like tags if you want the site to be able to peer with a given gateway site. This creates many ways to
segment traffic and to increase redundancy and scalability.
Use Cases
Use cases for mesh tagging include:
• Connecting different Underlay Links – Mesh tags allow different link types to be connected as long as they have a
common Mesh Tag.
• Site to Site Tunnels based on capacity – Assigning Mesh Tags by capacity will prevent a high-speed link from over-
running a lower speed link.
• Dual CPE may not always have the same number of WAN links - Mapping Mesh Tags to links allows better traffic
distribution on sites with a higher number of WAN links. In the past those links would have been underutilized.
• Geo-Meshing – Establishing regional Mesh Tags allows a Tenant administrator to group sites by geography and
distribute load on Enterprise Hubs.
• Dynamic Mesh Load Balancing – CSO will load balance on sites with multiple links with the same Mesh Tags.
• Redundant Links – Established by assigning the same Mesh Tag to multiple WAN links. This can be a site with two
WAN links and a Mesh Tag in common with a site that has only a single WAN link.
[Figures: mesh tag use-case diagrams for Site#1 through Site#4. Labels include CPE 2 with WAN_1, WAN_2 and WAN_3 tagged CPE2-MPLS, and the regional tags MPLS-IND, MPLS-USA, INET-IND and INET-USA.]
Juniper Recommendation
Use mesh tags to compartmentalize and separate traffic, create sub-topologies, and add redundancy. Use
descriptive names that allow the viewer to understand which components will be meshed together, e.g., NA-West,
Accounting, etc.
DVPN can only be triggered for Spoke-to-Spoke traffic where the Spokes have routable IP addresses or can use NAT
hole-punching. In the case of NAT hole-punching, CSO incorporates STUN server functionality and real-time monitoring to
decide when to push configuration to establish DVPN tunnels.
CSO monitors the sessions and terminates tunnels based on tenant-specified Key Performance Indicators (KPI). The
default threshold value for creating DVPN tunnels is 5 (sessions closed, measured over a two-minute period). The default
threshold for deleting DVPN tunnels is 2 (measured over a 15-minute period). Administrators can modify the default
thresholds at the global, tenant or site level.
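The threshold logic can be sketched as follows. The default values 5 and 2 and the global/tenant/site override order come from the text; everything else is an assumption made for illustration, including the class and function names and, in particular, the exact comparisons (create at or above the create threshold, delete at or below the delete threshold), which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DvpnThresholds:
    create_sessions: int = 5   # default from the text, two-minute window
    delete_sessions: int = 2   # default from the text, 15-minute window


def effective_thresholds(global_t: DvpnThresholds,
                         tenant_t: Optional[DvpnThresholds] = None,
                         site_t: Optional[DvpnThresholds] = None) -> DvpnThresholds:
    """The most specific level that is set wins: site, then tenant, then global."""
    if site_t is not None:
        return site_t
    if tenant_t is not None:
        return tenant_t
    return global_t


def dvpn_action(tunnel_up: bool, sessions_2min: int, sessions_15min: int,
                t: DvpnThresholds = DvpnThresholds()) -> str:
    """Assumed comparison rules; the real CSO evaluation may differ."""
    if not tunnel_up and sessions_2min >= t.create_sessions:
        return "create"
    if tunnel_up and sessions_15min <= t.delete_sessions:
        return "delete"
    return "no-change"


print(dvpn_action(tunnel_up=False, sessions_2min=6, sessions_15min=6))
print(dvpn_action(tunnel_up=True, sessions_2min=0, sessions_15min=1))
```

Using separate windows and thresholds for creation and deletion gives the tunnel state some hysteresis, so a brief lull in traffic does not immediately tear a tunnel down.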
Juniper Recommendation
DVPN tunnels should be used to reduce hair-pinning of traffic through Hubs and to off-load Hubs when possible. For sites
that communicate frequently, the DVPN establishment threshold can be left at its default value while the teardown
threshold should be changed to 5. This enables long-lived DVPN tunnels, which reduces tunnel churn and the frequency
of configuration changes.
Local Breakout is analogous to split tunneling on the same interface: some traffic goes directly out the underlay
network rather than all traffic going out the overlay or VPN network. When enabled, the device’s routing table is updated so that the
default route points at a given interface, and longer prefix matches go through the appropriate overlay tunnels. This allows the site
administrator to create SLA policies that determine when traffic should egress from the overlay network to take alternative routes to
a destination. Enterprise Hubs should have a separate interface for underlay breakout to increase traffic visibility on the
Enterprise Hub.
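The routing behavior described above (a default route toward the underlay, with longer prefixes toward overlay tunnels) is ordinary longest-prefix matching. The following Python sketch shows the idea using the standard `ipaddress` module; the prefixes and next-hop labels are made up for illustration and do not come from a CSO deployment.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop). The default route points at
# the underlay interface; more specific prefixes point at overlay tunnels.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "underlay:WAN_0"),
    (ipaddress.ip_network("10.0.0.0/8"), "overlay:tunnel-to-hub"),
    (ipaddress.ip_network("10.20.0.0/16"), "overlay:tunnel-to-site2"),
]


def lookup(destination: str) -> str:
    """Return the next hop of the longest prefix that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


print(lookup("10.20.1.5"))     # most specific overlay route
print(lookup("10.1.1.1"))      # overlay route toward the hub
print(lookup("198.51.100.7"))  # falls through to the default route (breakout)
```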
Juniper Recommendation
Customer corporate policy will dictate which breakout policy (if any) will be deployed. Some customers will want to
have Internet-bound traffic egress their links as soon as possible, while others may require that all traffic pass through an
inspection point. The breakout options provide flexibility to allow customers to hand off or process traffic at different points
in the network based on corporate policy requirements. In general, Breakout policy allows Internet-bound traffic to egress at
the first possible opportunity, offloading traffic to the Internet while optimizing IPSec capacity.
Real-Time Optimized, also known as Application Quality of Experience (AppQoE), is a data plane-level mechanism that
provides better scalability and faster decision making. Working in conjunction with APBR, AppQoE functions at the
device level; that is, the devices themselves perform SLA measurements across the available WAN links and then
dynamically map the application traffic to the path that best serves the application’s SLA requirement. Unlike bandwidth
optimized mode, this is all done without the need for the CSO controller to distribute SLA-specific routes.
With AppQoE, when an SLA violation occurs, only traffic corresponding to the application that reported the SLA violation
is moved to an alternate link, if one is available; any other traffic using the link is unaffected. Link switching is done at the
application level by the Spoke device. Other applications remain on the link unless they report an SLA violation.
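A minimal Python sketch of the per-application switching described above follows. It only illustrates that one application moves while the others stay put; the function name, the application names, and the "take the first alternate link" choice are assumptions, not the device's actual path selection logic.

```python
def reassign_on_violation(assignments: dict[str, str],
                          violating_app: str,
                          candidate_links: list[str]) -> dict[str, str]:
    """Move only the violating application to an alternate link, if one exists.

    assignments maps application name -> current WAN link. The input is not
    modified; a new mapping is returned.
    """
    current = assignments[violating_app]
    alternates = [link for link in candidate_links if link != current]
    updated = dict(assignments)
    if alternates:
        updated[violating_app] = alternates[0]  # assumed: first alternate wins
    return updated


before = {"voice": "WAN_0", "file-sync": "WAN_0"}
after = reassign_on_violation(before, "voice", ["WAN_0", "WAN_1"])
print(after)  # voice moves to WAN_1, file-sync stays on WAN_0
```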
SLA Profiles are created for applications or groups of applications for all tenants. They consist of configurable constraints
such as path preference and SLA parameters including throughput, latency, jitter and loss. You can also define Class of Service
(CoS) and rate limiters for upstream and downstream traffic. They can be used to map applications to breakout options or
map a given source (Spoke site, department, group) and selected applications to an SLA profile. There are predefined
profiles supplied with CSO for the most common use cases as shown in Table 3. You can also define and create custom
traffic types. The traffic types map QoS to forwarding classes and scheduler queues, and packets are marked with
appropriate DSCP values. The traffic type is also used to select the type of probe for that link / traffic type.
Voice and video tend to have the same loss, jitter and latency considerations. Typically, loss must be <1%, jitter <30ms,
latency <150ms. You can add profiles for individual traffic types based on this and fine tune the rate limiting per
application. Data applications may be able to withstand higher packet loss and jitter depending on the application, but
machine-to-machine application performance will need voice/video type SLA profiles.
Traffic type     Loss    Jitter    Latency    BW with Ethernet Encapsulation
Voice – G711     <1%     30ms      150ms      93Kbps
Interactive                                   <1kbps
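The voice/video limits quoted above (loss <1%, jitter <30ms, latency <150ms) can be expressed as a simple compliance check. The sketch below is illustrative only: the `SlaProfile` and `LinkMetrics` classes and the `violations` function are hypothetical names, and the measurements are invented sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaProfile:
    name: str
    max_loss_pct: float
    max_jitter_ms: float
    max_latency_ms: float


@dataclass(frozen=True)
class LinkMetrics:
    loss_pct: float
    jitter_ms: float
    latency_ms: float


# Limits taken from the typical voice/video figures in the text.
VOICE_VIDEO = SlaProfile("voice-video", max_loss_pct=1.0,
                         max_jitter_ms=30.0, max_latency_ms=150.0)


def violations(profile: SlaProfile, m: LinkMetrics) -> list[str]:
    """List the SLA parameters that the measured link fails (limits are strict)."""
    failed = []
    if not m.loss_pct < profile.max_loss_pct:
        failed.append("loss")
    if not m.jitter_ms < profile.max_jitter_ms:
        failed.append("jitter")
    if not m.latency_ms < profile.max_latency_ms:
        failed.append("latency")
    return failed


print(violations(VOICE_VIDEO, LinkMetrics(loss_pct=0.2, jitter_ms=12, latency_ms=80)))
print(violations(VOICE_VIDEO, LinkMetrics(loss_pct=1.5, jitter_ms=12, latency_ms=200)))
```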
Juniper Recommendation
AppQoE provides fast link failover for traffic, although it does require the Spoke to have more than one WAN
interface. If LTE is used as a failover link, use an SLA profile to choose which applications will fail over to the LTE
link, as it might not have enough bandwidth for all applications.
Default profiles can be used to create SDWAN policies or they can be defined at the global level. When defining a
profile, group like applications, configure parameters and decide the order of failover.
Another option is to use bandwidth optimized in conjunction with Link Cost so that traffic will always use the lowest
cost link that meets SLA requirements.
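The "lowest cost link that meets SLA requirements" rule can be sketched as a filter followed by a minimum. This is an illustration of the selection rule only; the `Link` class, the function name, and the cost values are assumptions, and how CSO actually represents link cost is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    name: str
    cost: int          # hypothetical relative cost; lower is cheaper
    meets_sla: bool    # result of the SLA measurement for this link


def lowest_cost_compliant(links: list[Link]) -> Optional[Link]:
    """Among links that currently meet the SLA, pick the cheapest; None if none do."""
    compliant = [link for link in links if link.meets_sla]
    return min(compliant, key=lambda link: link.cost, default=None)


links = [
    Link("MPLS", cost=10, meets_sla=True),
    Link("INET", cost=2, meets_sla=True),
    Link("LTE", cost=1, meets_sla=False),
]
print(lowest_cost_compliant(links))  # the INET link: cheapest of the compliant ones
```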
Configuration Templates
Customers often have base configurations that are applied to all deployed devices, often referred to as golden configs.
These configurations may contain settings to configure syslog, RADIUS, SNMP or many other settings which remain
constant across all deployed devices within the customer environment. Other settings such as routing protocols or routing
policy will be unique per device. For example, a device’s BGP configuration will often be unique per device. Configuration
templates enable network administrators to push common configurations to all sites within a tenant or settings that are
unique per device.
CSO Configuration templates can be configured at Global, Opco or Tenant levels to allow the re-use of templates, as well
as Tenant isolation, keeping the Configuration Templates private. Many common templates have been created by Juniper
engineering and have been pre-assigned to device templates to facilitate out-of-the-box and tested configurations.
Pre-assigned templates can be removed from device templates within the tenant view, or new templates can be assigned to
tenants by tenant admins. If unique Configuration templates are required, CSO 5.1 includes a template workflow to create
Configuration templates by importing CLI statements and then creating and defining variables to build the final template.
This allows any valid Junos configuration to be templatized.
Configuration Templates inherit initial parameters or default configuration values (if given), or the admin can choose to add
unique values by tenant or by site. Template configurations are unique per level of granularity. For instance, a
Configuration template applied to a site may have values unique to that site that override the configuration template
values at tenant level.
Configuration templates can be deployed at the end of device provisioning or at any time, as long as the device is managed by
CSO. The templates can be applied to individual devices or to groups of devices within a tenant.
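The idea of a CLI statement with variables, default values, and per-tenant or per-site overrides can be sketched with Python's standard `string.Template`. This does not reproduce CSO's template format or workflow; the variable names, the `render` helper, and the addresses are hypothetical, and the Junos syslog statement is used only as a familiar example of a golden-config line.

```python
from string import Template
from typing import Optional

# A single Junos CLI statement with two variables.
SYSLOG_TEMPLATE = Template("set system syslog host $syslog_host any $severity")


def render(template: Template,
           defaults: dict[str, str],
           tenant_values: Optional[dict[str, str]] = None,
           site_values: Optional[dict[str, str]] = None) -> str:
    """Merge values so that site overrides tenant, and tenant overrides defaults."""
    values = {**defaults, **(tenant_values or {}), **(site_values or {})}
    return template.substitute(values)  # raises KeyError if a variable is unset


defaults = {"syslog_host": "192.0.2.10", "severity": "notice"}

print(render(SYSLOG_TEMPLATE, defaults))
print(render(SYSLOG_TEMPLATE, defaults, site_values={"syslog_host": "192.0.2.99"}))
```

Merging the dictionaries from least to most specific mirrors the override order in the text: a value set for a site takes precedence over the tenant-level value.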
Device Types
Devices are defined as Spokes or Hubs; the different functionality is enabled by Device Templates associated with the site.
Spokes are usually located at branch locations and based on the device template may be NGFW or SD-WAN. Hubs are
always SD-WAN as NGFW does not use Hubs. SD-WAN requires an OAM capable Hub for control traffic, and a Provider
or Enterprise Hub as an anchor point to terminate overlay tunnels. Spokes may also establish tunnels to other spokes
based on traffic flows.
Hub Devices
Hubs are used to terminate OAM (control) and data overlay tunnels from Spokes. CSO SaaS will provide the OAM
functionality while On-Premise deployments must configure OAM functionality on at least one Hub. All Hubs must have
public IP addresses or routable IP addresses in a private network. On-Premise HUB deployments may be defined as OAM,
OAM and Data, or Data only.
Multihoming Hubs
Sites can connect to two hubs of the same type in a process known as multihoming. One hub is primary and the other is
secondary. Sites always have a route preference for the primary hub. If the primary hub is not available, traffic
automatically fails over to the secondary hub and then fails back when the primary hub is available again. This behavior allows
the administrator to select which hubs are primary and secondary on a site-by-site basis, enabling manual load balancing of
traffic across hubs. Since a failover can overload the links of the remaining hub, size those links for the failover case and
prioritize traffic accordingly.
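The primary/secondary preference with automatic failover and fail-back reduces to a small selection rule, sketched below in Python. The hub names and the `active_hub` function are hypothetical; on real devices this behavior results from route preference rather than from a function like this one.

```python
from typing import Optional


def active_hub(primary: str, secondary: str, reachable: set[str]) -> Optional[str]:
    """Prefer the primary hub whenever it is reachable, else use the secondary."""
    if primary in reachable:
        return primary
    if secondary in reachable:
        return secondary
    return None  # neither hub is reachable


# Walk through an outage of the primary hub and its recovery.
timeline = [
    {"hub-east", "hub-west"},   # both up: primary carries traffic
    {"hub-west"},               # primary down: fail over to secondary
    {"hub-east", "hub-west"},   # primary back: fail back
]
for reachable in timeline:
    print(active_hub("hub-east", "hub-west", reachable))
```

Assigning different primaries to different sites is what spreads load across hubs manually, as described above.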
Provider Hubs
Provider Hubs may be a physical or virtual SRX configured to provide any combination of OAM or Data services for
SDWAN Spoke devices. Provider Hubs are onboarded at Global or Opco level, depending on whether the deployment is CSOaaS
or On-Prem. Tenants can import the Provider Hub, and when a Spoke is onboarded it will pick the correct devices for
control and data path traffic. Provider Hubs can be shared among multiple Opcos or defined at Opco layer to be shared
among tenants. Each tenant has a virtual slice of the Provider Hub, similar to logical systems, with independent routing,
security and other network characteristics. Provider Hubs serve as the termination point for overlay tunnels from the
Spokes if an Enterprise Hub isn’t configured or in the event that an Enterprise Hub associated with the site is not available.
Provider Hubs can be configured as Primary or Secondary for multihomed Spokes. Provider Hubs are not designed to be used as
Gateways to Datacenters or remote campuses; for that we use Enterprise Hubs.
When onboarding a provider hub:
• Onboard at Opco or Global level (on-prem only)
• No ZTP – the Stage-1 configuration is pasted in manually, which lets you verify network connectivity and fail fast.
• Configure NAT for egress traffic
• Provider Hubs put WAN interfaces in different routing tables, so you may need to add a default route for interfaces
in inet.0.
• Additional configuration is required for L2/L3 VPN connections or dynamic routing with a legacy network.
Enterprise Hubs
Enterprise Hubs are an extension of the SD-WAN CPE template that are created and used by Tenants and may be
considered a self-managed Hub. Enterprise Hubs are not shared among tenants. They serve as the primary
termination point for Spoke overlay tunnels. When a Spoke is associated with an Enterprise Hub, the Enterprise Hub
takes on the role of the Provider Hub with regard to data traffic. In that case the Enterprise Hub advertises a default
route to its associated Spokes. If an Enterprise Hub fails, Spoke traffic fails over to the associated
Provider Hub (On-Premise version only).
Enterprise Hubs do not replace OAM capable Hubs and there must always be at least one OAM-capable Hub. Enterprise Hubs
must have a public IP address or a routable address in a private network. Enterprise Hubs support OSPF and BGP neighboring with
customer routers or switches. They can learn customer prefixes and advertise those prefixes to other Hubs. They will advertise a
default route to attached Spokes and can be used for CIBO if breakout policy is applied.
Both the Provider Hubs and Enterprise Hubs may serve as an anchor point for Spokes to communicate with other Spokes.
Spokes that are behind NAT have to use a Hub-and-Spoke topology where the IPsec termination point is either a Provider or
Enterprise Hub. If Spokes have routable addresses, then DVPN tunnels can be triggered for Spokes to communicate
directly with each other. Both Provider and Enterprise Hubs can provide NAT services for outbound traffic.
You can have multiple Enterprise Hubs per Tenant; in fact, you can add 5 or more. When you have multiple Enterprise
Hubs within a Tenant, they auto-mesh using BGP with a GRE tunnel/IPsec control channel between Enterprise Hubs.
When you create an Enterprise Hub, you specify Mesh Tags to define which Spokes can create DVPN tunnels to the
Enterprise Hubs. In CSO Release 5.2 or later, you can use Site edit to add or change Mesh Tags.
Enterprise Hubs are larger SRX devices in Juniper’s Contrail SD-WAN solution and serve multiple purposes such as:
With the above functionality, one can understand why a larger SRX is required to achieve the proper scale of an SD-WAN
network. As with building physical networks, an SD-WAN network must have a governing set of rules to create
predictability, availability and reliability. These rules are not set in stone; they are guiding principles, backed by
Juniper Networks Solution Validation testing, to give you the correct direction on this SD-WAN journey.
Juniper Recommendation
CSOaaS – OAM hubs are provided by the solution. Provider Hubs or Enterprise Hubs are required for data path
termination.
On-Premise Solution - At least one Provider Hub with OAM capabilities must be deployed. Juniper recommends
deploying at least two so there is OAM redundancy.
In either solution locate the Provider hubs in the natural direction of traffic to optimize traffic flows. Import multiple
instances to tenants so the Spokes have a failover option.
Enterprise Hubs anchor associated Spokes and should be placed in strategic locations within the network, such as
datacenters, headquarters, and network aggregation points. When multiple Enterprise Hubs are deployed within a
tenant, they create a dynamic mesh between each other, and we must consider the N-squared scaling problem
that comes with a full mesh of anything: N Hubs require N(N-1)/2 Hub-to-Hub connections. The Enterprise Hub full
meshing is automatic with the Contrail SD-WAN solution to create ‘All Sites’ connectivity, and there is no restriction on
how many Enterprise Hubs a Tenant may create, so be aware of the dynamic meshing and the number of tunnels created.
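To get a feel for how quickly the Enterprise Hub mesh grows, the pairwise count can be tabulated. The sketch below is illustrative only: it counts one adjacency per pair of Hubs, and the function name is hypothetical. The actual number of tunnels per pair depends on WAN links and Mesh Tags, which are not modeled here.

```python
def full_mesh_pairs(hubs: int) -> int:
    """Number of Hub-to-Hub adjacencies in a full mesh of `hubs` devices."""
    if hubs < 0:
        raise ValueError("hubs must be non-negative")
    return hubs * (hubs - 1) // 2


for n in (2, 5, 10, 20):
    print(f"{n:>2} Enterprise Hubs -> {full_mesh_pairs(n):>3} Hub pairs")
```

Doubling the Hub count from 10 to 20 raises the pair count from 45 to 190, which is why the number of Enterprise Hubs per tenant deserves attention even though CSO does not restrict it.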
Spoke Devices
The CPE device at an Enterprise customer’s branch site acts as a Spoke device in the SD-WAN model. The Spoke is the branch
router providing connectivity from the branch site to other sites in the tenant network, to cloud services, and to the Internet. Spokes
can also provide SD-WAN, UTM and SSL-Proxy security services, when licensed and configured, in addition to standard branch
router services such as DHCP, firewall and NAT. The CPE can be a physical or virtual device, located on premises or in the cloud.
On-premise Spoke
As of CSO Release 5.1.2, the supported Spoke devices and their interface types are listed in Table 3.
NFX devices provide the ability to spin up virtual functions on demand; a key VNF supported on the NFX platform
is the Juniper Networks vSRX. With the vSRX residing on the NFX, all the feature-rich security services found on a
standard SRX Series Services Gateway are available. The NFX Series comes in 150 and 250 models. The NFX150 can
be provisioned over LTE, and both the NFX150 and NFX250 can use LTE as a backup link, with the NFX250 using
a USB dongle for LTE.
Cloud Spoke
Cloud Spokes are devices that serve as gateways to a cloud provider environment, seamlessly connecting the Virtual
Private Cloud into the SD-WAN environment. SD-WAN Spoke devices can be located in an AWS VPC natively, but on
other public clouds it is treated as a White Box implementation. When a vSRX instance in the VPC is provisioned it will
serve as the cloud Spoke device with the appearance of being another Spoke end point on the SD-WAN network. Cloud
Spokes can also be provisioned as Enterprise Hubs.
Figure 10 ZTP Provisioning Process (step 1: the device connects to redirect.juniper.net)
Stage-1 configurations are based on device deployment templates that contain configuration and provisioning settings for a physical
device, such as a Spoke device or a Hub, which you manage through Contrail Service Orchestration (CSO). The CSO installation
includes several default device templates for standalone and clustered devices which can be deployed as presented or customized
as required. Each Device template is specific to the type of device and type of deployment. You must assign a device template to
each Spoke device at the site.
Device templates include two types of information:
Basic information: prepares the device for remote activation with a base configuration to configure interfaces, default
routes, and connectivity to OAM, Enterprise, or Provider Hubs as required by device type and site configuration
parameters. The number and type of links are also specified.
Configuration template information (requires admin access to create and assign to tenant): specifies the additional
settings that can be configured for the device. For example, you can enable configuration of LAN and firewall policies. You
create these configuration templates using Configuration Designer, or import existing templates, and then optionally add
settings per device.
Configuration templates are used to customize settings particular to an organization. Common types of configuration templates
include system settings such as logging, NTP and DNS, or protocol configuration including VRRP, IGMP or LACP. The templates are
provided by Juniper Engineering and included with both On-Premise and SaaS solutions. In addition, tenants can create their own
Configuration templates via CSO Configuration Designer and add them to a device template.
Juniper Recommendation
Standardize which ports will be used for WAN and LAN interfaces. If more than one template is required, clone a template, make the
changes and test. If Spokes require Configuration templates, copy the Junos CLI snippet into Configuration Designer, then create and
save the template. Associate it with a cloned template and test. Use variables if Spokes will have different parameters for the same
CLI stanza, and set auto-deploy to no in the Configuration template. Then, after provisioning is complete, select the device in the GUI,
fill in the fields and deploy.
[Figure: WAN link combinations (LTE, MPLS, BB) as Active/Active or Active/Backup, with Dynamic or Prescriptive policy definition]
During Spoke site configuration, you can designate WAN links as active/active or active/backup, independent of link type, as
shown in Figure 7. How traffic should fail over to the backup link, or load-balance across multiple active links, determines how
the Spoke site is configured.
Prescriptive links are a static configuration. Traffic failover occurs when routing updates are received. The following examples
give the reader an idea of the traffic steering methodology.
• Voice over MPLS and Everything else over broadband
• In the event of a link failure only Point-of-Sale goes over LTE as it is a metered link.
Dynamic links use AppQOE to test the quality of the link and can make local traffic forwarding decisions based on loss, delay and
jitter thresholds.
Single Active Prescriptive – Residential or retail Spokes where Internet access is non-essential for business continuity. Traffic only
fails over to the secondary link when a routing update triggers it. This is the simplest and most common deployment for retail Spokes
where multiple active wired links are not cost effective.
Single Active Dynamic – Retail or business sites with POS terminals that require Internet access and fast failover if a link fails. The
failover decision is based on link quality, and traffic fails back when the CPE determines the primary link is valid again.
Dual Active Prescriptive – Campus-type environments with routing policy to select links. Any change in link quality requires an
upstream change to affect link selection, as is the case when RPM is used and the link degrades. This has been the standard
deployment for years.
Dual Active Dynamic – Banking or other businesses that require Internet access uptime. The SLA is measured on the active links, and
decisions local to the CPE determine which applications use which links. It offers fast failover and link optimization, and can
separate traffic into bundles and control failover and failback.
Juniper Recommendation
Understand your site requirements, available links per site, and tolerance for latency and jitter. LTE should be used as
backup to minimize usage-based charges. Build an SLA to offload internet bound traffic and to prefer broadband links over
cellular access.
Scaling
There are a number of considerations that impact scaling, including the number and types of devices that will be deployed. Whether
the deployment is SD-WAN or NGFW typically affects sizing requirements more than the number of devices deployed, because we need
to consider IPsec, BGP and meshing. In general, about 3500 sites are supported by an HA Production server, but mileage varies based
on the type of sites. Please refer to the CSO Installation and Upgrade Guide for further details.
SRXs support up to 1000 BGP peers per device, so that should not be the gating factor. vSRX instances are qualified at 200 BGP
peers but should scale higher given enough RAM and CPU. Table 6 is derived from the scaling sheets; based on the
number of VRFs or GRE/IPsec tunnels supported, we can determine that they won’t be a gating factor either. It turns out that the
average throughput we want a site to have is the gating factor.
IPsec performance
During site creation, the administrator can specify which links will carry OAM traffic, whether WAN links will establish GRE or
IPsec tunnels to Provider / Enterprise Hubs for data termination, and whether to enable DVPN tunnels to other Spokes. The number of
Spokes that a Hub can support depends on the number of tunnels and the IPsec throughput supported on that particular
platform. The throughput factor can be somewhat mitigated by configuring Local Breakout so that traffic routes directly to the
Internet (the underlay) instead of over the tunnel connections to other Spokes or Hubs. The IPsec scaling numbers can be found at:
https://www.juniper.net/us/en/local/pdf/datasheets/1000265-en.pdf
Spokes with IPSec tunnels on one or both WAN interfaces will have those plus the OAM overlay tunnel terminating on a Provider
Hub or separate OAM Hubs. If a Spoke is multi-homed to two Hubs, then both Hubs may have OAM and DATA tunnels for a given
Spoke, supplying high availability.
It is important to look at the applications required by a tenant and associated sites when configuring the end locations. Software as a
Service (SaaS) applications can be directly accessed and usually don’t require traffic to originate from a centralized location. In that
case, you should specify LBO and offload the traffic from the IPSec tunnels transiting the Hub, which could tremendously increase
the scale of the SD-WAN deployment. If LBO is configured, then only VPN traffic, monitoring logs, and AppQOE (if enabled) will
utilize Provider Hub bandwidth, allowing more Spokes per Provider Hub. The following example table has a list of common
applications with and without IPsec tunnel overhead. Multiplying these figures by the number of users per Spoke gives approximate
bandwidth requirements for local breakout and for Provider or Enterprise Hub IPsec processing. To get a better estimate
of site requirements, copy the numbers into a spreadsheet and add a column for the number of users.
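The per-site arithmetic described above can also be sketched in Python instead of a spreadsheet. Every number below is a placeholder chosen for illustration, not a value from this guide: the per-user application rates and the 10 percent IPsec overhead are assumptions that should be replaced with the figures from the application table or from your own measurements.

```python
# Hypothetical per-user averages in kbps; NOT taken from the guide.
APP_KBPS_PER_USER = {"voice": 100, "video": 1500, "saas": 300}

# Assumed IPsec tunnel overhead in percent, for illustration only.
IPSEC_OVERHEAD_PCT = 10


def site_bandwidth_kbps(users, tunneled_apps, lbo_apps):
    """Split a site's estimated load into Hub IPsec load and local breakout load."""
    tunneled = sum(APP_KBPS_PER_USER[app] for app in tunneled_apps) * users
    breakout = sum(APP_KBPS_PER_USER[app] for app in lbo_apps) * users
    return {
        # Tunneled traffic carries the IPsec overhead and consumes Hub capacity.
        "hub_ipsec_kbps": tunneled * (100 + IPSEC_OVERHEAD_PCT) / 100,
        # Local breakout traffic goes straight to the underlay.
        "local_breakout_kbps": breakout,
    }


# 20 users: voice stays on the overlay, SaaS and video break out locally.
print(site_bandwidth_kbps(20, ["voice"], ["saas", "video"]))
```

Moving an application from `tunneled_apps` to `lbo_apps` shows directly how much Hub IPsec capacity local breakout frees up for additional Spokes.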
Given the qualified numbers from the scaling sheets we recommend the following number of Spokes per Hub based on
average IPsec throughput. If sites require additional throughput you can adjust the number of spokes accordingly.
DVPN
We also need to consider the number of DVPN connections supported. As we can see from the following table, there are
sufficient DVPN tunnels available for most deployments.
Description                                                                                   Scale
Maximum number of events per second that can be processed by the SD-WAN log process          90000
One last factor to consider is that each Spoke will require the following one-time and ongoing bandwidth, independent of
VPN traffic.
CSO Release 4.1 and later recommend a minimum of 2 Mbps per link. After the process is complete there is a minimal amount of
syslog and maintenance traffic on the OAM link, mostly for log files and NETCONF traffic that checks for configuration
synchronization.
Juniper Recommendation
Characterizing the traffic types and application bandwidth requirements will assist with sizing Hubs appropriately. Spokes
can be categorized as small, medium, and large based on expected IPsec throughput and mapped accordingly. It is better to
add 30 percent to expected capacity requirements and specify a larger Hub with higher throughput than discover there is
contention for IPsec throughput.
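The 30 percent headroom rule can be worked through as a short calculation. This is a sketch under stated assumptions: the small/medium/large throughput figures and the Hub capacity are hypothetical placeholders, not Juniper-qualified numbers, and the function names are invented for this example.

```python
def required_hub_throughput_mbps(spokes_by_size, mbps_by_size, headroom_pct=30):
    """Expected aggregate IPsec load of all Spokes plus planning headroom."""
    expected = sum(count * mbps_by_size[size]
                   for size, count in spokes_by_size.items())
    return expected * (100 + headroom_pct) / 100


def max_spokes_per_hub(hub_ipsec_mbps: int, avg_spoke_mbps: int,
                       headroom_pct: int = 30) -> int:
    """Largest Spoke count that still leaves the headroom free on one Hub."""
    return hub_ipsec_mbps * 100 // (avg_spoke_mbps * (100 + headroom_pct))


# Placeholder per-Spoke averages (Mbps) and Spoke counts; adjust to your sites.
mbps = {"small": 5, "medium": 20, "large": 100}
spokes = {"small": 100, "medium": 20, "large": 5}

print(required_hub_throughput_mbps(spokes, mbps))  # 1820.0
print(max_spokes_per_hub(1000, 5))                 # 153
```

Comparing the first result with a platform's qualified IPsec throughput indicates whether a larger Hub is needed; the second function answers the reverse question for a Hub that has already been chosen.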
Deployment Considerations
This guide builds upon the CSO Deployment and Installation Guides to help you deploy a production ready Contrail Service
Orchestration environment. Follow the instructions given in the Installation Guide to provision CSO as you will find the
server requirements and guidelines for installation. The main focus of this document will be SD-WAN design principles to
choose the right topologies and devices to meet the requirements of the branch offices connecting via Juniper Networks’
Contrail SD-WAN solution. Read the section below to understand when to use public vs private IP addressing and how
that choice may require additional underlay configuration.
Internet access is required during On-Premises CSO installation and also to download signature databases and Junos
images. Ensure that the underlay is configured, and routing is working properly. Hubs of both kinds will need default
routes for interfaces in inet.0. A Virtual IP address is required for the User Interface, Virtual Route Reflector, and South-Bound
Load Balancer. The same address can be used for all three. For labs, we put the address on a vSRX and use destination
NAT with port-matching rules to forward to the correct virtual machines.
CSO can be deployed with private or public IP addresses or a combination of the two. If deployed with public IP addresses,
Juniper recommends limiting devices with public IP addresses to the services that the CPE devices require for provisioning and
operations, as listed in the Deployment and Installation Guides.
spokes. This is only used for links defined as type “Internet”; MPLS links are assumed to be on routable addresses and
can create D-VPN tunnels without UDP Hole-Punching.
[Figure: two Hub diagrams. Spoke A with private IP and Spoke B with public or private IP; Spoke A with public IP and Spoke B with public IP.]
• If both Spokes are behind symmetric NAT, the traffic will always traverse a Hub.
• CSO will not attempt to NAT interfaces on links created with the MPLS label, as it is assumed there is reachability
on MPLS links.
• A tool like Pystun can be used from a host behind a Spoke to confirm the NAT type.
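The rules in the list above can be summarized as a small decision function. This is a hedged sketch, not CSO's actual logic: the `Nat` categories, the function name, and the returned labels are hypothetical. The guide states the outcome only for MPLS links and for two Spokes behind symmetric NAT; every other NAT combination is treated here simply as a hole-punching attempt, which is an assumption.

```python
from enum import Enum


class Nat(Enum):
    """Coarse NAT categories used only for this illustration."""
    NONE = "none"            # routable address, no NAT in the path
    CONE = "cone"            # NAT that generally permits UDP hole punching
    SYMMETRIC = "symmetric"  # NAT that generally defeats UDP hole punching


def spoke_to_spoke_path(a: Nat, b: Nat, link_type: str = "Internet") -> str:
    """Return how Spoke-to-Spoke traffic is expected to flow."""
    if link_type == "MPLS":
        return "direct"              # MPLS links are assumed to be routable
    if a is Nat.SYMMETRIC and b is Nat.SYMMETRIC:
        return "via-hub"             # stated in the guide: always through a Hub
    if a is Nat.NONE and b is Nat.NONE:
        return "direct"              # both routable, no traversal needed
    return "attempt-hole-punching"   # assumption: outcome not stated in the guide


print(spoke_to_spoke_path(Nat.SYMMETRIC, Nat.SYMMETRIC))          # via-hub
print(spoke_to_spoke_path(Nat.SYMMETRIC, Nat.SYMMETRIC, "MPLS"))  # direct
print(spoke_to_spoke_path(Nat.CONE, Nat.NONE))                    # attempt-hole-punching
```

The NAT type fed into such a check would come from a STUN test run from a host behind each Spoke.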
IP Services
All devices in the SD-WAN environment require access to common IP services such as NTP, DNS and optionally DHCP. On-Premise
CSO installations will ask for an FQDN, which is used to create the X.509 certificate used for the ZTP process. It is difficult to
change the FQDN after installation, so use the correct FQDN for the organization and configure DNS before installing the
servers. In a lab environment, you can install DNS and NTP services on the CSO host and use them for testing the solution.
Juniper Recommendation
Have DNS and NTP tested and validated before beginning any installation.
• Provider and Enterprise hubs require EBGP for hubs to establish tunnels. The BGP neighbor must have
routes to CSO servers and the gateways of other devices. Use a different ASN than the one used internally
for CSO.
• The on-premise devices’ Internet-facing interfaces can be attached to different service provider networks.
It is essential that there is a functioning underlay with routing reachability and that any firewall or NAT device is provisioned
and validated. The required ports are listed in the CSO Installation and Upgrade Guide.
Juniper Recommendation
All Hubs or Spokes that are going to be provisioned must be able to ping their default router and be able to reach CSO.
Network connectivity must be validated before any attempt at building overlay tunnels (see next section) or provisioning.
Generally speaking, an OAM/Provider Hub should have two WAN ports: a designated OAM port and one additional port
for Internet access. Enterprise Hubs require at least two WAN ports, an Internet port and one LAN port. Beginning in
CSO Release 5.2, additional WAN links can be configured after provisioning if required, although existing Spokes will not
recognize WAN links provisioned on a hub after the Spoke is provisioned.
Some deployments will need to import routes to Provider Hubs. This can be done via manual configuration or through
configuration templates.
CSO requires certain ports to be open in the customer’s firewalls to allow for device provisioning and monitoring.
Logging
All CSO provisioned devices will send log files to CSO to enable monitoring and additional provisioning. Logs are encrypted
and transmitted over ports 3514 and 514. Logs contain session close information, tunnel state and other information
relative to the device. Logs are maintained for 30 days.
Provider Hubs for data tunnel termination are created at the Opco level and can be shared among the tenants within the
Opco. Enterprise Hubs are created at the tenant level and cannot be shared with other tenants. Both Provider and
Enterprise Hubs can be distributed geographically, or you can use a central Provider Hub and then geographically placed
Enterprise Hubs depending on the Spoke distribution. Ideally the Hubs should be placed in the path of traffic towards the
CSO instance so traffic isn’t hair-pinned and has the shortest path to an egress point of the network. Traffic can be steered
via policy and configuration to use the closest Enterprise Hub location, and then traffic can be broken out wherever the site
admin chooses. If a Spoke has both Provider and Enterprise hubs configured, it will always prefer the Enterprise hub for
data termination unless the Enterprise hub is unavailable.
Juniper Recommendation
Use the following generalized rules when sizing and placing Enterprise Hub sites:
• Size Enterprise Hub Sites based on the IPsec bandwidth requirements of Spokes and Hub-to-Hub traffic.
• SRX1500, SRX4100, SRX4200 and SRX4600 devices can be used for Tenant Enterprise Hub Sites
• Use Mesh tags to load balance across WAN Links
• Use Mesh tags to geographically anchor tunnels based on region
• Specify at least one link for LBO with AutoNAT to enable Central Breakout configuration after provisioning
oam-vrf and WAN interfaces in WAN routing instances. In some designs, it makes sense to add another interface / VLAN if you
need an interface that isn't associated with a routing-instance.
During ZTP, the Spoke uses the WAN interface address (underlay) to contact the redirect server and then to contact the CSO
servers. The CSO server creates a unique stage-1 configuration, including the loopback OAM address, and pushes it to the target
device. After the device commits the configuration, all further configuration goes through the OAM tunnel. For this to work
correctly, all the underlay addresses must be reachable via normal routing.
Multihoming
Multihoming is the ability of a Spoke to connect to two different Provider Hub devices to provide redundancy. One Hub
device functions as the primary and the other as the secondary. If there are multiple Spokes in the system, the same Hub
device may act as primary Hub device for one Spoke and secondary Hub device for another Spoke. That is, the selection of
the primary and the secondary Hub devices is only in the context of a Spoke. The Spoke is connected to both the Hub
devices through an underlay network.
Traffic is switched from the primary Hub to the secondary Hub in the following scenarios:
Enterprise Hubs
The Spoke can connect to multiple Enterprise Hubs, which can be used for Centralized Breakout or as the IPsec termination point.
Select the Enterprise Hub in the path or at the location toward which you expect most traffic to flow. Mesh tags are associated
with Enterprise Hubs and not Provider Hubs. Mesh tags have to be created before provisioning Enterprise Hubs, or the default tags
of “MPLS” or “Internet” can be used and associated with matching tags on the target device.
Local Breakout
There is also the option to set up Local Breakout to enable unknown prefixes to egress to the Internet. The Spoke learns SD-WAN
associated prefixes from the VRR, and traffic for known prefixes traverses the overlay tunnels. Local breakout installs a default
route on the selected port, and traffic egresses to the Internet at the selected WAN interfaces. When you create LBO on a Spoke,
you may also need to enable auto-NAT to translate between LAN private IP addresses and the WAN interface address.
If a Spoke is configured without LBO then all traffic will terminate at the Provider Hub or Enterprise Hub and source NAT should be
applied on the Hub using egress default WAN link as the outbound NAT interface.
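The forwarding choice described in this section can be sketched with Python's standard `ipaddress` module. It is an illustration of the decision only, not device behavior: the function name, the returned labels, and the example prefixes are hypothetical, and a real Spoke makes this decision through its routing tables.

```python
import ipaddress


def egress_for(destination, overlay_prefixes, lbo_enabled):
    """Pick the egress for a destination address on a Spoke.

    overlay_prefixes stands in for the SD-WAN prefixes learned from the VRR.
    """
    address = ipaddress.ip_address(destination)
    for prefix in overlay_prefixes:
        if address in ipaddress.ip_network(prefix):
            return "overlay-tunnel"  # known SD-WAN prefix
    # Unknown prefix: follow the default route.
    return "local-breakout" if lbo_enabled else "hub"


learned = ["10.1.0.0/16", "10.2.0.0/16"]  # example prefixes, not from the guide
print(egress_for("10.2.5.9", learned, lbo_enabled=True))       # overlay-tunnel
print(egress_for("203.0.113.10", learned, lbo_enabled=True))   # local-breakout
print(egress_for("203.0.113.10", learned, lbo_enabled=False))  # hub
```

In the last case the traffic reaches the Internet from the Hub, which is where source NAT has to be applied.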
Other Options
Spokes with multiple WAN links have additional topology options available:
There are a multitude of ways to configure a Spoke with regards to expected traffic patterns, redundancy requirements
and services provided by the Spoke to hosts behind the Spoke. The flexibility is a result of proper planning and placement
of Hub resources in logical locations within the network.
Firewall Policies
Firewall policies are defined from a source, which can include addresses, devices, departments, zones or users. The destination
can include devices, addresses, zones, applications, services, Spokes or Spoke groups. Acceptable actions are allow, deny or reject.
You can use a CSO defined UTM policy or clone and modify one as needed. This allows a user to block a certain Spoke or type of
Spoke at either a micro or macro level.
Spoke-to-Spoke tunnels require a Firewall policy to allow communication between Spokes. Firewall Intents are uni-directional but
stateful, so the intent must be specified in both directions. This can be written into a single rule and deployed to target Spokes
simultaneously. Groups or departments can be used to deploy intents to all devices sharing the same object.
NAT
AutoNAT can be enabled during Spoke configuration, in which case all traffic is source NATted when egressing the Spoke, or you
can create and deploy NAT rules after the Spoke is provisioned using NAT Policy. NAT Policy allows an administrator to
specify the type of NAT (Source, Destination or Static), and then specify the source by address, zone, routing instance, protocol or
interface, or any combination of sources. The destination can include addresses, zones, routing instances, services or interfaces.
There are also options for defining NAT Pools and Proxy-ARP.
SLAs
SD-WAN policy optimizes utilization of the WAN links and distributes traffic load efficiently. SD-WAN policy can be
applied to source endpoints (such as sites and departments) and destination endpoints (applications or application groups)
or for breakout traffic (by using breakout profiles). SD-WAN policy can be defined by department, Spoke or group of
Spokes, and associated with applications and service profiles. The profile can be one of the breakout profiles or a link-defined
SLA. SLA policies specify which applications are monitored; if the traffic fails to meet the defined
characteristics, it can be moved to another link, provided one is available and meets the defined characteristics.
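The SLA behavior described above can be sketched as a threshold check followed by a search for a qualifying link. This is an illustration only and not the AppQoE implementation: the class names, the threshold values, and the measured statistics are hypothetical, and the sketch keeps traffic on the current link when no candidate qualifies, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA thresholds for one application class."""
    max_loss_pct: float
    max_delay_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class LinkStats:
    """Hypothetical measured link quality."""
    loss_pct: float
    delay_ms: float
    jitter_ms: float


def meets_sla(stats: LinkStats, sla: Sla) -> bool:
    return (stats.loss_pct <= sla.max_loss_pct
            and stats.delay_ms <= sla.max_delay_ms
            and stats.jitter_ms <= sla.max_jitter_ms)


def choose_link(active: str, links: Dict[str, LinkStats], sla: Sla) -> str:
    """Stay on the active link while it meets the SLA, else move to one that does."""
    if meets_sla(links[active], sla):
        return active
    for name, stats in links.items():
        if name != active and meets_sla(stats, sla):
            return name
    return active  # assumption: no qualifying candidate, so traffic stays put


voice = Sla(max_loss_pct=1.0, max_delay_ms=150.0, max_jitter_ms=30.0)
links = {
    "mpls": LinkStats(loss_pct=3.0, delay_ms=40.0, jitter_ms=5.0),
    "broadband": LinkStats(loss_pct=0.2, delay_ms=60.0, jitter_ms=10.0),
}
print(choose_link("mpls", links, voice))  # broadband
```

Here the MPLS link violates the loss threshold, so the traffic moves to the broadband link, which meets all three characteristics.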
Configuration Templates
On-Premise deployments can use CSO GUI to define and deploy intents or you can build Configuration templates. Configuration
templates are used to deploy additional settings to CSO managed devices. These can be organization-wide settings such as log hosts,
authentication servers, or SNMP configuration, or they can be site specific. CSO allows admins to take any valid Junos CLI
configuration and create templates using Configuration Designer tool included with CSO. The templates can include variables to
allow changes on a per-Spoke basis. So, a template can be created and associated with a device template and then customized per-
Spoke. Configuration templates without variables can be automatically applied during later stages of provisioning or after
provisioning is complete. Configuration templates with variables appear as device configuration options and allow you to specify
parameters for the variables based on requirements of the individual Spoke.
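The idea of one template with per-Spoke variables can be illustrated with Python's standard `string.Template`. This is a stand-in for the concept only: it does not reproduce the Configuration Designer template syntax, and the variable names and addresses are hypothetical. The two `set` statements are ordinary Junos CLI for NTP and DNS servers.

```python
from string import Template

# Junos CLI snippet with two variables; the $names are chosen for this example.
NTP_DNS_TEMPLATE = Template(
    "set system ntp server $ntp_server\n"
    "set system name-server $dns_server\n"
)


def render(template: Template, params: dict) -> str:
    """Fill in per-Spoke values; raises KeyError if a variable is missing."""
    return template.substitute(params)


# Per-Spoke parameters (documentation addresses, not real servers).
spoke_params = {
    "branch-01": {"ntp_server": "192.0.2.1", "dns_server": "192.0.2.53"},
    "branch-02": {"ntp_server": "198.51.100.1", "dns_server": "198.51.100.53"},
}

for spoke, params in spoke_params.items():
    print(f"# {spoke}")
    print(render(NTP_DNS_TEMPLATE, params), end="")
```

Using `substitute` rather than `safe_substitute` makes a missing parameter fail loudly, which mirrors the need to fill in every variable field before deploying a template to a Spoke.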
CSO provides administrators a lot of options and flexibility in deploying Hubs and Spokes and Configuration templates provide a
means for additional device configuration through CSO provided templates or through custom Configuration templates.
Start by determining if this will be a cloud-delivered or an On-Premise installation. In addition to the questions in the
SD-WAN Deployment Considerations section, we also need to ask:
• Do they have a document outlining the services they plan to offer or provide using SD-WAN?
• Will this be an OTT build or all on customer network?
• Total number of Spokes and what type of CPE?
Use the decision chart to determine if the CSO 5.2 SaaS or On-Premise solution will fit their requirements.
In addition to qualifying the deployment, we need to understand whether this will be a clean install or whether there are
integration requirements that will require additional work effort. We need to ask:
CSO requires certain devices for the various functionality. Use the customer’s high-level drawing to determine what
devices are needed and how many of each type. Once the topology drawing is complete then you can determine what
devices are required at each location.
The customer design document should be refined to specify the number of VLANs required on each node, a port to VLAN
mapping, an IP address plan, security/load balancing, WAN access, and other details required for each environment.
On-Premise Installation
CSO servers should have a base operating system (Ubuntu 16.04.5 or ESXi 6.0) and meet the server specifications for the
intended installation type (small or HA Large). Server specifications can be found in the CSO Installation and Upgrade
guides.
Site Readiness
• Racks, power, cable, access, servers with correct code version installed.
• Underlay tested and validated.
• All Hub-type devices connected and have network reachability
• Login credentials for Hub-type devices.
• Personnel available to assist with site and underlay access as required.
Taking the time before the installation to understand the customer’s requirements and environment will reduce the time
needed to operationalize the SD-WAN deployment. We recommend documenting topology,
access methods, IP addresses, accounts, procedures for adding Spokes and Hubs, adding policies, and so on.
Conclusion
Contrail Service Orchestration designs, secures, automates, and runs the entire service life cycle across NFX Series
Network Services Platforms and SRX Series for NGFW or SD-WAN. It’s also a service orchestrator for the vSRX Virtual
Firewall, available in public cloud marketplaces such as Amazon AWS.
SD-WAN as a growth platform—Contrail Service Orchestration allows you to chart a course to SD-WAN and beyond by
seamlessly integrating Zero Touch Provisioning, full-stack security, and monitoring.
Enabling automation helps simplify your operations, providing reliability and agility while extending visibility across your
multicloud network.
Pervasive, always-on security—Managing your WAN edge policy at scale and end to end from the cloud to the branch,
campus, and data center is all controlled through Contrail Service Orchestration. In addition to connecting, scaling, and
securing WAN topologies and services, Contrail Service Orchestration directs application-aware handling and deep security
inspection, enforcement, and analytics across all managed devices.
Terminology
Term Definition
AppQOE Application Quality of Experience provides real-time monitoring of traffic flows using active and passive
probes to measure application traffic for SLA compliance. CSO uses inline passive probes sent in
conjunction with application traffic to monitor the link. CSO monitors other candidate links with active
probes; if traffic on the active link fails the SLA and a candidate link meets the SLA requirements, then
traffic is moved to the candidate link.
Branch A tenant site connected to other sites in either a Dynamic mesh or Hub-and-Spoke topology.
DVPN Dynamic Virtual Private Network. When traffic seen by CSO between Spokes meets administrator-defined
thresholds, CSO provisions an IPsec tunnel between the Spokes.
Hub A tenant site that acts as a Hub for traffic from multiple Spokes in a Hub-and-Spoke topology. In this
topology, all Spoke-to-Spoke traffic flows through the Hub.
IDP / IPS Intrusion Detection / Intrusion Protection Systems – monitor and identify unauthorized access or other
incidents, then notify administrators and stop the incident.
LBO Local Break Out – default route will be local next-hop not across VPN
MP-BGP Multiprotocol BGP; a routing protocol used for large-scale, multi-tenancy deployments.
NGFW Next Generation Firewall – standalone firewalls with advanced features, managed and monitored by CSO.
They are standalone devices because they do not have connections to hubs for data termination. They will have
outbound-SSH for OAM functionality.
Site Any Enterprise customer branch office or headquarters. Commonly referred to as a Hub site or Spoke site.
Tenant Typically, an Enterprise customer with many branches (sites) who subscribes to the SD-WAN offering
provided by the provider.
UTM Unified Threat Management provides additional security services that may include Web Filtering, Antivirus,
and Content Filtering.
VNF Virtualized Network Function – a virtual router or other function that runs on a virtual machine.
ZTP Zero touch provisioning; allows provisioning of devices automatically with minimal intervention.