Managed Secure SD-WAN 7.2.x
Lab Guide
Sep 25, 2023
Revision 1.4
Copyright© 2023 Fortinet, Inc. All rights reserved.
Fortinet®, FortiGate®, and FortiGuard® are registered trademarks of Fortinet, Inc., and
other Fortinet names herein may also be trademarks of Fortinet.
This document contains confidential material proprietary to Fortinet, Inc.
This document and information and ideas herein may not be disclosed, copied,
reproduced or distributed to anyone outside Fortinet, Inc. without prior written consent of
Fortinet, Inc.
Fortinet reserves the right to change, modify, transfer, or otherwise revise this
publication without notice, and the most current version of the publication shall be
applicable.
1. Lab Introduction
This document describes the Managed Secure SD-WAN hands-on lab.
The lab can be done either during organized workshops or in a self-paced manner.
Although we will do our best to guide you through the caveats, this lab guide should NOT be
seen as a “step-by-step guide”. It is assumed that you have prior knowledge of FortiManager,
FortiGate and the Fortinet Secure SD-WAN Solution.
In this lab, you are going to use our publicly available set of Jinja Templates to deploy several SD-
WAN/ADVPN topologies with “BGP on Loopback” routing design. However, neither the Jinja CLI
Templates nor the “BGP on Loopback” routing design are the main focus of this lab. For both these
topics, we would like to refer you to the previous version of the lab - 7.0.x - where you will find a true
deep-dive into them.
The main objective of this lab is to deep-dive into some of the enhancements brought by FortiOS /
FortiManager 7.2.x.
In particular, we are going to cover the following functionality:
• Hub-to-Edge SD-WAN using SLA information embedded into health probes
• Advertising preferred Hub to the outside world
• Segmentation over Single Overlay (VRF-Aware Overlays)
• Providing Internet access in multi-VRF deployments
• Device Blueprints and Variables
All the configuration will be done through FortiManager (FMG), in part - manually and in part - using
REST API calls.
1.1. Lab Environment
The lab environment comes with 6 interconnected sites and a central management suite. All the
devices come partially preconfigured, with basic underlay connectivity and an out-of-band
management network.
Throughout the lab you will interconnect these devices into different topologies, as instructed by
each chapter. But first, here are some important things to know about this environment.
1.1.1. Host Credentials
If you are using one of the lab environments managed by Fortinet (such as FNDN Hands-On Labs),
you will find all the credentials on the control panel of your lab instance.
1.1.2. Licensing
If you are using one of the lab environments managed by Fortinet (such as FNDN Hands-On Labs),
this should not be a concern.
1.1.3. WAN Controller
WAN Controller (“wan_simulator”) is an important tool that you will use in this lab. Connect to it via
HTTP.
• First, it gives you granular control over all the WAN links of all the FGT devices. You can
manipulate link latency, bring links down and more.
• Second, if you scroll all the way down, you will find the buttons to start/stop Internet traffic
generators behind each site.
1.2. Before You Begin
To successfully follow the lab guide, you must complete the following preparation steps.
1.2.1. Prepare Postman
Many configuration tasks in this lab will be completed using the JSON-RPC API exposed by FMG.
Postman is one of the best tools for interacting with HTTP-based APIs. We
have prepared a Postman collection that you will use in this lab.
Keep in mind that any task in this lab related to FMG can be done either interactively or using API! You can
even configure the entire lab simply by running the entire API collection in bulk - as you will probably want
to do, if you use this lab environment for your future demonstrations, where you need to set it up quickly.
At the same time, if you simply run the collection and configure everything with API, this learning
experience will be short and not particularly useful.
In this lab, we will have a compromise. We are going to ask you to do some tasks interactively, to learn
how they work. But on many repetitive and basic tasks you will save your time by using API. And in the
end, the choice is yours!
You have two options to use Postman, both offered here:
1. You can download and install Postman on your local computer (unless you already have it)
2. You can also use the Web version - just click the “Launch Postman” button and login (you can
create a free account or simply login using your Google account)
Both options will work for this lab, and the UI will look similar.
• In Postman, go to its Preferences (Settings) and disable “SSL Certificate Verification”:
• Next step is to import the Postman collection. It is publicly available in this GitHub repository,
under the filename “Managed_SDWAN_7_2_x.postman.json”.
Click on “Import” in Postman GUI, navigate to “Link” tab and paste the direct (raw) link to the file.
Here it is.
• Now you must create a new environment to define some basic variables that will allow Postman to
connect and interact with our FMG.
In Postman, navigate to “Environments” and click on “+” to create a new environment. Give it a
name of your choice and define the following variables (case-sensitive!):
◦ ip: Use the FMG IP and port (simply browse to your FMG and copy the ip:port from the
address bar)
◦ username: admin
◦ password: fortinet
Make sure you save your new environment, then select it in the drop-down list:
Finally, check that everything is fine: open Postman collection, click on Login request and then click
“Send”.
You should receive an answer with “Status: 200 OK”, and in the reply body you should see a JSON
response containing code: 0 and message: OK :
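For readers who would like to see this exchange outside Postman, here is a minimal Python sketch of the Login request and of the check on the reply. It assumes the commonly documented FortiManager JSON-RPC login call (method `exec` on the URL `/sys/login/user`); the body used by the Postman collection may differ, so treat the field layout as an assumption to verify against your FMG. The helper names `build_login_request` and `login_succeeded` are illustrative only, and the sample reply is hand-written rather than captured from a device. Nothing is sent over the network.

```python
import json


def build_login_request(username, password, request_id=1):
    """Build a JSON-RPC login body (assumed FMG format; verify against your FMG)."""
    return {
        "id": request_id,
        "method": "exec",
        "params": [
            {
                "url": "/sys/login/user",
                "data": {"user": username, "passwd": password},
            }
        ],
    }


def login_succeeded(reply):
    """Return True if the reply carries status code 0 and message OK."""
    results = reply.get("result", [])
    if not results:
        return False
    status = results[0].get("status", {})
    return status.get("code") == 0 and status.get("message") == "OK"


# Credentials taken from the Postman environment defined above.
body = build_login_request("admin", "fortinet")
print(json.dumps(body, indent=2))

# Hand-written sample reply for illustration only (not captured from a real FMG).
sample_reply = {
    "id": 1,
    "result": [{"status": {"code": 0, "message": "OK"}, "url": "/sys/login/user"}],
    "session": "<session-token>",
}
print(login_succeeded(sample_reply))
```

In Postman, the same body would be posted to the FMG address stored in the `ip` environment variable.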
In the next chapters you will be asked to run the API calls in one of the two ways:
• Sometimes you will be asked to run a particular API call. You do it exactly as you ran the “Login”
call above - simply by selecting it and clicking “Send”.
• Other times you will be asked to run the whole folder in bulk. You do it by selecting the folder and
clicking “Run”, after which you can run all or some of the API calls in that folder in bulk:
1.2.2. Download Jinja Templates
We provide a generic set of Jinja Templates that can be reused in multiple SD-WAN projects. It is
publicly available in this GitHub repository.
If you are familiar with GIT and have it installed on your workstation, feel free to clone the
release/7.2 branch of this repository:
git clone https://github.com/fortinet-solutions-cse/sdwan-advpn-reference.git -b release/7.2
You can also simply download the files using your Web browser. Browse to the GitHub repository and
select the release/7.2 branch. Then click on Code -> “Download ZIP” and extract the downloaded
archive.
We will guide you further in the next chapters!
1.2.3. Download Project Templates
As you should remember from the previous version of this lab (7.0.x), a Project Template is a crucial
part of our Jinja Templates set. It describes (using declarative language) the particular SD-WAN
project you are deploying.
We have prepared two Project Templates for this lab - for the two customers we are going to
introduce in the next chapters.
In your lab instance, connect to the Toolhost (“z_toolhost”) and find a ZIP archive containing both
Project Templates called /fortipoc/projects.zip :
root@z-toolhost:~# ls /fortipoc/proj*
/fortipoc/projects.zip
Download and extract this archive.
Hint: you can use SCP. The Toolhost is accessible via your lab instance IP and port 11019:
% scp -P 11019 root@<your-instance-ip>:/fortipoc/projects.zip .
We will guide you further in the next chapters!
1.2.4. Generate Inventory Files
In this lab we are going to import devices in bulk from CSV. The devices will be identified by their
Serial Numbers, which will be unique for each lab instance (derived from the FGT-VM licenses). For
this reason, we could not provide the inventory files for you in advance. But we did provide a simple
script to generate them!
Connect to your Toolhost (“z_toolhost”) using SSH and run the following commands:
root@z-toolhost:~# cd /fortipoc/autodeploy/
root@z-toolhost:/fortipoc/autodeploy# ./generate_inventory.py
Fetching S/N for site1-1...
Fetching S/N for site1-2...
Fetching S/N for site1-H1...
Fetching S/N for site1-H2...
inventory.CustomerA.csv
=======================
sn,device blueprint,name,vm_interface_number,hostname,loopback,profile,region,lan_ip
FGVM08TM11111111,Edge-DualISP,site1-1,10,site1-1,10.200.1.1,DualISP,SuperWAN,10.0.1.1/24
FGVM08TM22222222,Edge-DualISP,site1-2,10,site1-2,10.200.1.2,DualISP,SuperWAN,10.0.2.1/24
FGVM08TM33333333,Hubs-DualISP,site1-H1,10,site1-H1,10.200.1.253,DualISP,SuperWAN,10.1.0.1/24
FGVM08TM44444444,Hubs-DualISP,site1-H2,10,site1-H2,10.200.1.254,DualISP,SuperWAN,10.2.0.1/24
inventory.CustomerB.csv
=======================
sn,device blueprint,name,vm_interface_number,hostname,loopback,profile,region,lan_ip_edu,lan_ip_fin
FGVM08TM11111111,Edge-DualISP,site1-1,10,site1-1,10.200.1.1,DualISP,SuperWAN,10.0.1.1/24,10.0.101.1/24
FGVM08TM22222222,Edge-DualISP,site1-2,10,site1-2,10.200.1.2,DualISP,SuperWAN,10.0.2.1/24,10.0.102.1/24
FGVM08TM33333333,Hubs-DualISP,site1-H1,10,site1-H1,10.200.1.253,DualISP,SuperWAN,10.1.0.1/24,10.101.0.1/24
FGVM08TM44444444,Hubs-DualISP,site1-H2,10,site1-H2,10.200.1.254,DualISP,SuperWAN,10.2.0.1/24,10.102.0.1/24
Copy-paste the CSV outputs (only the comma-separated lines) to your workstation. Save them as
“inventory.CustomerA.csv” and “inventory.CustomerB.csv”.
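Because copy-pasting terminal output is error-prone, it may be worth sanity-checking the saved files before importing them. The following is a small Python sketch, not part of the lab tooling: it parses inventory text and confirms that the columns used in this guide are present and non-empty. The function name `load_inventory` is hypothetical, and the sample rows reuse the placeholder serial numbers shown above.

```python
import csv
import io

# Columns described in this guide: three fixed columns plus the per-device
# variables required by the Jinja Templates and the CustomerA Project Template.
REQUIRED_COLUMNS = [
    "sn", "device blueprint", "name",
    "hostname", "loopback", "profile", "region", "lan_ip",
]


def load_inventory(csv_text, required=REQUIRED_COLUMNS):
    """Parse inventory CSV text and return one dict per Model Device."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for row in rows:
        empty = [col for col in required if not (row.get(col) or "").strip()]
        if empty:
            raise ValueError(f"{row.get('name')}: empty values for {empty}")
    return rows


# Two rows copied from the sample output above (placeholder serial numbers).
sample = """
sn,device blueprint,name,vm_interface_number,hostname,loopback,profile,region,lan_ip
FGVM08TM11111111,Edge-DualISP,site1-1,10,site1-1,10.200.1.1,DualISP,SuperWAN,10.0.1.1/24
FGVM08TM33333333,Hubs-DualISP,site1-H1,10,site1-H1,10.200.1.253,DualISP,SuperWAN,10.1.0.1/24
"""

devices = load_inventory(sample)
for dev in devices:
    print(dev["name"], dev["device blueprint"], dev["lan_ip"])
```

For the CustomerB file, the `required` list would need `lan_ip_edu` and `lan_ip_fin` in place of `lan_ip`.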
Now we are ready to start the lab!
2. Hub-to-Edge SD-WAN
2.1. Project Overview
To explore our first topic, we will introduce a CustomerA who has a very straightforward Dual-Hub
SD-WAN/ADVPN topology, with all sites having two Internet links. There are workloads behind both
Edge and Hub sites, and they all need to communicate bi-directionally between each other.
Completing this topology is an external “legacy” site (not part of the SD-WAN solution) which needs
to communicate bi-directionally with the SD-WAN sites.
Here, by “bi-directionally” we mean that sessions can be originated from any side.
The following diagram illustrates this deployment:
We will start by quickly deploying this topology from scratch and then we will define the problem of
Hub-to-Edge SD-WAN and of course deep-dive into the solution.
2.2. Deploying the Project
2.2.1. Creating Foundation
In Postman, navigate to Environments, select your environment and add a new variable called adom
(in lowercase!) with the value “CustomerA”:
Do not forget to Save the environment!
Run the entire Postman folder called Foundation :
This will create a new ADOM called “CustomerA” with the following objects inside it:
• Device Groups called “Edge” and “Hubs”, as well as the more specific nested groups “Edge-
DualISP” and “Hubs-DualISP”
• System Template called “Basic-Settings” defining some basic DNS, logging and admin access
settings that will be shared by all our devices
• Certificate Templates called “Edge” and “Hub” that will be used to issue IPSEC certificates
• SD-WAN Templates for Edge and Hubs defining two SD-WAN Zones “underlay” and “overlay” with
some standard rules
• Static Route Template called “Default-Route” defining the default route towards the SD-WAN
• Firewall Policies for Edge and Hubs
There is, of course, one important element missing: our Jinja Templates!
Navigate to Provisioning Templates -> CLI Templates and perform the following tasks:
• Import (hint: More -> Import) the Project Template for CustomerA. You will find it in the folder
“CustomerA” that you have downloaded in the Introduction chapter. Do not forget to set its type to
“Jinja Script”!
• Click “Import”. You will be asked to create a missing variable lan_ip . Make sure to do it:
If these variables look different to you, compared to the previous FMG releases, that is because they
are different! We will talk about them shortly!
• Click “Import” again. The Project Template should be imported successfully.
• Now import all the Jinja Templates for the “BGP on Loopback” design flavor that you have
downloaded from our GitHub in the Introduction chapter. There is no need to import the templates
from subfolders. In total there will be 7 *.j2 files, as shown in the screenshot below. Also here,
do not forget to set their type to “Jinja Script”!
• Click “Import”. You will be again asked to create the missing variables (this time there will be four
of them). Make sure to do it. Then click “Import” again. All the templates should be imported
successfully.
• Create two CLI Template Groups as follows:
◦ Edge-Template:
▪ 01-Edge-Underlay
▪ 02-Edge-Overlay
▪ 03-Edge-Routing
◦ Hub-Template:
▪ 01-Hub-Underlay
▪ 02-Hub-Overlay
▪ 03-Hub-Routing
▪ 04-Hub-MultiRegion
You can do it manually, but if you search well enough, you will find an API call doing this for you inside
our Postman collection!
• Finally, assign them to the device groups “Edge” and “Hubs” respectively:
As we agreed in the Introduction, deep-diving into the structure of our Jinja Templates is outside the
scope of this lab (we refer you again to the previous version of it - 7.0.x - where this topic has been
covered extensively). Let us just take a quick look at the Project Template and then also examine one
particular detail which has changed in FMG 7.2.x - namely, the Variables.
2.2.2. Project Template
Look at the Project Template that you have imported.

Good news: you can view it directly in the FMG, thanks to the new syntax highlighting for Jinja.
We still recommend using your favorite external plain text editor to modify Project Templates when
necessary. But in this lab it won’t be necessary.
If you are familiar with the structure of our Jinja Templates, you will easily figure out that we describe
a simple project with a single Dual-Hub region (called “SuperWAN”, which is probably the commercial
name of the Managed SD-WAN offering that our CustomerA is selling):
{% set regions = {
  'SuperWAN': {
    'as': '65001',
    'hubs': [ 'site1-H1', 'site1-H2' ]
  }
}
%}
There is a single device profile called “DualISP”:
{% set profiles = {
  'DualISP': {
    'interfaces': [
      {
        'name': 'port1',
        'role': 'wan',
        'ol_type': 'ISP1',
        'ip': 'dhcp',
        'dia': true
      },
      {
        'name': 'port2',
        'role': 'wan',
        'ol_type': 'ISP2',
        'ip': 'dhcp',
        'dia': true
      },
      {
        'name': 'port5',
        'role': 'lan',
        'ip': lan_ip
      }
    ]
  }
}
%}
This profile defines that:
• port1 connects to ISP1
• port2 connects to ISP2
• port5 connects to the LAN
The WAN IPs are acquired from DHCP and the LAN IP is set by a per-device variable lan_ip .
So what per-device variables will we need to set, so that our Jinja Templates render properly? This is
conveniently summarized at the very top of the Project Template, in the comments section:
• hostname , loopback , region and profile are always required for our Jinja Templates
• lan_ip is required, because we use it in the Project Template itself, to configure the LAN IP
2.2.3. Variables
One of the changes introduced in FMG 7.2.x is the implementation of the variables.
Provisioning Templates (including our Jinja Templates) will no longer use the per-device meta fields
familiar to you from the previous FMG releases. To be clear, the meta fields are not gone, but they are
used now for their more natural purpose. For example, to set a device address.
Provisioning Templates, on the other hand, will use the new type of variables that you can find if you
navigate to Policy & Objects -> Object Configurations and look under Advanced -> Metadata
Variables.
You may need to add these objects in the Tools -> Feature Visibility list.
Here you can find all those variables that you have created earlier, while importing the Jinja
Templates.
What can we say about these new variables?
1. They are ADOM-wide, meaning that each ADOM can have its own list of variables. This is useful in
multi-tenant deployments, when different tenants (customers) use different design flavors and/or
Project Templates, which in turn require a different set of variables.
2. They can have a default value. Which means that you do not have to set their value for each
device, if you know that most (or all) of your devices will use the same value.
3. If you double-click any of the variables, you will see the “Per-Device Mapping” section. This is how
we set their per-device values. Or to put it the other way around: whenever we set a per-device
value of a variable, this variable effectively gets a new per-device mapping.
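The relationship between default values and per-device mappings described in points 2 and 3 can be illustrated with a toy Python model. This is only a sketch of the lookup logic as described here, not the FMG data model; the class name `MetadataVariable` and its methods are hypothetical, and giving `region` a default value is an assumption made purely for the example (in this lab every variable is set per device from the CSV file).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetadataVariable:
    """Toy model of an ADOM-level variable (not the FMG data model)."""
    name: str
    default: Optional[str] = None
    per_device: Dict[str, str] = field(default_factory=dict)

    def set_for_device(self, device, value):
        # Setting a per-device value creates (or updates) a per-device mapping.
        self.per_device[device] = value

    def resolve(self, device):
        # A per-device mapping wins; otherwise fall back to the default.
        if device in self.per_device:
            return self.per_device[device]
        if self.default is not None:
            return self.default
        raise KeyError(f"{self.name}: no value for {device} and no default")


# Assumed for illustration: "region" has a default, "lan_ip" does not.
region = MetadataVariable("region", default="SuperWAN")
lan_ip = MetadataVariable("lan_ip")
lan_ip.set_for_device("site1-1", "10.0.1.1/24")

print(region.resolve("site1-1"))   # falls back to the default
print(lan_ip.resolve("site1-1"))   # uses the per-device mapping
```

A variable with neither a default nor a mapping for a given device raises an error in this sketch, which mirrors why all required variables must be set before the templates can render.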
2.2.4. Device Blueprints
Another new object introduced in FMG 7.2.x is the Device Blueprint. It is a “recipe” for creation of
Model Devices, defining what Provisioning Templates to assign to them, to which Device Group(s) to
add them and so on. This allows faster creation of Model Devices and even allows you to create them
in bulk, as we will do shortly.
In our deployment workflow, the devices are categorized using Device Groups. That is, when we
assign a new Model Device to the right Device Group, this automatically brings all the right
Provisioning Templates (not only Jinja Templates, but also SD-WAN Templates and so on). Therefore,
our Device Blueprints do not need to assign individual templates, instead they need simply to select
the right Device Group.
Let’s create a Device Blueprint for the group “Edge-DualISP”:
• Navigate to the Device Blueprint list:
• Create a new Device Blueprint for our Edges, with the following parameters:
Parameter              Value
Name                   Edge-DualISP
Device Model           FortiGate-VM64-KVM
Add to Device Group    Edge-DualISP
Assign Policy Package  Edge
• Leave the rest of the options unset and save the blueprint.
• Create another Device Blueprint for our Hubs, with the following parameters:
Parameter              Value
Name                   Hubs-DualISP
Device Model           FortiGate-VM64-KVM
Add to Device Group    Hubs-DualISP
Assign Policy Package  Hub
It goes without saying that in a real-world large-scale deployment you can (and should) create Device
Blueprints using the REST API. Our Postman collection already includes an API call for that!
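To make the “recipe” idea concrete, here is a toy Python sketch of how a blueprint and one inventory entry combine into a Model Device record. It only models the four parameters from the tables above; the class `DeviceBlueprint`, the function `stage_model_device` and the resulting dictionary layout are hypothetical and do not reflect the FMG API or its internal objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceBlueprint:
    """Toy representation of the blueprint parameters used in this lab."""
    name: str
    device_model: str
    device_group: str
    policy_package: str


BLUEPRINTS = {
    bp.name: bp
    for bp in (
        DeviceBlueprint("Edge-DualISP", "FortiGate-VM64-KVM", "Edge-DualISP", "Edge"),
        DeviceBlueprint("Hubs-DualISP", "FortiGate-VM64-KVM", "Hubs-DualISP", "Hub"),
    )
}


def stage_model_device(name, serial, blueprint_name):
    """Combine an inventory entry with its blueprint into one record."""
    bp = BLUEPRINTS[blueprint_name]
    return {
        "name": name,
        "sn": serial,
        "model": bp.device_model,
        "group": bp.device_group,
        "policy_package": bp.policy_package,
    }


# Placeholder serial number, as in the sample inventory output.
print(stage_model_device("site1-H1", "FGVM08TM33333333", "Hubs-DualISP"))
```

The point of the sketch is that the inventory only has to name a blueprint; everything else about the Model Device follows from it.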
2.2.5. Configuring Model Devices
In our lab, FGT devices come partially pre-configured, and DHCP Option 240 is in place for ZTP. It is likely
that by this moment the devices have already tried to contact the FMG.
IMPORTANT: Before you proceed, switch to the “root” ADOM and delete all the unauthorized devices that
you find there! Without this step, trying to import the devices with the same serial number will result in
different behavior than what is described below.
We are now ready to create Model Devices in bulk.
• Click on Add Device -> Import Model Devices from CSV File:
• Select the CSV file “inventory.CustomerA.csv” that you have generated for this lab. It will be
automatically parsed, making it easier to read its contents.
Each row in the file describes a Model Device. You will find the following columns:
Column            Description
sn                Serial Number of the device
device blueprint  Name of the Device Blueprint to apply
name              Name of the Model Device to create
The rest of the columns correspond to per-device variables. Here we set the values for all our
required variables: hostname , loopback , profile , region and lan_ip .
There is one more variable - vm_interface_number . It is pre-defined in FMG for the Model
Devices of FGT-VMs. Its purpose is to set the number of physical interfaces to create.
You may recall that in FMG 7.0.x we were using a Pre-Run CLI Template for this purpose.
• Click “OK” - and all the Model Devices will be created at once.
Feel free to confirm that each Model Device has been automatically assigned to the right Device
Group, which in turn resulted in the assignment of all the necessary Provisioning Templates. Also the
right Policy Package has been assigned to each Model Device.
If you right-click on a device and select “Edit Variable Mapping”, you will also find the right values of
all the required variables.
As we mentioned earlier, you can also see these values as “Per-Device Mapping” entries of the
variables (navigate to Policy & Objects -> Object Configurations -> Advanced -> Metadata Variables
and double-click on any variable in the list to see them).
We now need only a few more steps to complete the staging of our project:
• Navigate to Provisioning Templates -> Certificate Templates and right-click on the “Edge”
template. Click “Generate” and select the two Edge devices (“site1-1” and “site1-2”). Click “OK” to
issue the certificates.
• Similarly, issue the certificates for the Hubs (“site1-H1” and “site1-H2”), using “Hub” template.
• Go back to the device list, select all the devices, right-click and choose “Quick Install (Device DB)”,
to generate complete Underlay, Overlay, Routing and SD-WAN configuration for all the Model
Devices.
• Finally, select all the devices again, right-click and choose “Re-install Policy”, to install the Firewall
Policy Packages on all the Model Devices.
Now your Model Devices are ready for the real device deployment!
2.2.6. Linking Real Devices
It’s time to link the real FGT devices to their respective Model Devices.
In a real-world deployment, this is the moment when you would simply power on each newly arrived
FGT device. In our lab environment, the FGTs come partially pre-configured, hence we will need to
“factory-reset” them.
We have configured DHCP Option 240 to facilitate the ZTP process. Follow these steps to complete
the deployment:
• SSH to each of the 4 devices (“site1-1”, “site1-2”, “site1-H1” and “site1-H2”) and perform their
factory reset, while preserving their VM licenses (and hence their serial numbers):
execute factoryreset2 keepvmlicense
• After the devices reboot, they will trigger the usual ZTP process, pretty much like a real new FGT
device would do:
• You can follow this process in the System Settings -> Task Monitor. Make sure that all the tasks
complete successfully for all the 4 FGT devices.
Our project is now fully deployed.
2.3. Hub-to-Edge Traffic
2.3.1. Understanding the Problem
If you check the SD-WAN Templates assigned to our devices, you will notice the following:
• The Edges apply typical SD-WAN rules to the corporate traffic (rules “Corporate-H1” and
“Corporate-H2”), preferring ISP1 and applying the SLA target of 100 ms.
• The Hubs do not apply any SD-WAN rules and do not probe the Edges. At the moment, SD-WAN
is enabled on the Hubs solely for configuration consistency (creating the same SD-WAN
Zones as on the Edges) and perhaps also to monitor the health of the local Internet links (for visibility).
In other words, all the traffic on the Hub will be handled by the conventional routing.
Connect to the “client1-H1” (behind the Hub “site1-H1”) using SSH and start pinging the “client1-1”
(behind the Edge “site1-1”):
client1-H1# ping 10.0.1.101
The ping should work just fine. Use the packet sniffer on “site1-1” to determine what overlay this
traffic is using:
site1-1# diagnose sniffer packet any "host 10.1.0.7" 4 | grep H
The result can be either H1_ISP1 or H1_ISP2, because the Hub is doing a simple ECMP for this traffic.
Connect to the “WAN Controller” using HTTP and raise the latency of the corresponding Internet link
on “site1-1” above the threshold of 100 ms. For example, if the traffic is using H1_ISP1, then raise the
latency of S11-ISP1. You will notice that your ping starts suffering the high latency without switching
anywhere.
This is, of course, expected. And this is the first problem we need to solve.
We must ensure that the Hub selects only healthy overlays for the Hub-to-Edge sessions.
Let us not exaggerate the problem though. Connect to the “client1-1” and start pinging the
“client1-H1” (that is, create a session in the opposite direction, Edge-to-Hub):
client1-1# ping 10.1.0.7
You will notice that this ping is not suffering the high latency. A quick look at the sniffer reveals that
this traffic is using a healthy overlay bi-directionally (H1_ISP2 in our example).
The reason is that the Edge selects H1_ISP2 based on its SD-WAN rule and the Hub selects the reply
path symmetrically. This is a typical ECMP behavior on stateful devices, and FortiGate is not an
exception.
Hence, we can conclude that our problem does not apply to the replies to Edge-to-Hub traffic. It only
applies to the sessions originated behind the Hub!
2.3.2. Remote Health Probing
The main cause of our problem is that our Hubs do not actively probe the Edges. Indeed, if the Hub
had a regular SD-WAN Health Check (Performance SLA) probing the Edge, it would be able to select a
healthy overlay, just as the Edge does for the Edge-to-Hub sessions!
Unfortunately, this won’t scale. Each Hub might serve hundreds (if not thousands) of Edge devices.
Actively probing all their overlays would be a huge waste of resources. Especially because the Edge
devices already send bi-directional probes over those same overlays!
FOS 7.2.1 introduces a new solution to this problem, as follows:
1. Edge devices will probe the Hub’s loopback using PING, over each of the overlays, exactly as they
did before - measuring latency, jitter and packet loss, to be used by their local SD-WAN rules. In
other words, no change so far.
2. They will embed their measurements (latency, jitter and packet loss values) into the probe packets
themselves (as we all know, ICMP packets allow us to embed data).
3. On the other side of the tunnel, the Hub will read this data and store it, as if it measured it.
4. Finally, the Hub will apply different route priorities to the LAN prefixes learnt from the Edge
devices, making sure that the routes via healthy overlays get better priorities than those via
unhealthy overlays.
5. This is good enough for the conventional routing to do the trick. Instead of a simple ECMP, the
Hub can now prefer the healthy overlays!
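The five steps above can be sketched as a small Python simulation. It is a conceptual model only and does not reflect how FortiOS encodes the data inside the probes: the `EmbeddedHealth` record and the `route_priority` function are hypothetical names, the threshold and priority values are example inputs, and treating a latency exactly equal to the threshold as "in SLA" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EmbeddedHealth:
    """Measurements an Edge embeds into its probes for one overlay (steps 1-2)."""
    edge: str
    overlay: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def route_priority(health, latency_threshold_ms, priority_in_sla, priority_out_sla):
    """Hub side (steps 3-4): map a received measurement to a route priority."""
    # Assumption: a latency equal to the threshold still counts as in SLA.
    in_sla = health.latency_ms <= latency_threshold_ms
    return priority_in_sla if in_sla else priority_out_sla


# Example: site1-1 reports high latency on one overlay and low latency on the other.
reports = [
    EmbeddedHealth("site1-1", "EDGE_ISP1", latency_ms=150.0, jitter_ms=0.2, loss_pct=0.0),
    EmbeddedHealth("site1-1", "EDGE_ISP2", latency_ms=1.3, jitter_ms=0.2, loss_pct=0.0),
]

# Example values: 100 ms target, priority 5 when in SLA, priority 8 when out of SLA.
priorities = {
    (r.edge, r.overlay): route_priority(r, 100, priority_in_sla=5, priority_out_sla=8)
    for r in reports
}
print(priorities)
```

Note that the result is keyed by Edge and overlay together: the same overlay can be healthy towards one Edge and unhealthy towards another, and the Hub itself sends no probes at all.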
Let’s configure this.
• Navigate to Provisioning Templates -> SD-WAN Templates and edit the “Edge-DualISP” template.
• Inside it, edit the “HUB” Health Check (Performance SLA) and enable the option “Embed Measured
Health”:
This option instructs the Edge to embed the measured values into the ICMP probe packets. Save
the modified SD-WAN Template.
• Edit the “Hub-DualISP” template. Create a new Health Check (Performance SLA) called “EDGE”:
◦ Set its probe mode to “Remote”
◦ Specify the overlays (EDGE_ISP1 and EDGE_ISP2) as participants
◦ Create an SLA Target defining what “healthy” means from the Hub’s perspective (let’s use the
same target of 100 ms latency)
◦ The values “Priority IN-SLA” and “Priority OUT-SLA” define the route priorities that will be
applied, based on the configured SLA target (let’s apply priority 5 to the routes via healthy
overlays and priority 8 to the routes via unhealthy overlays)
◦ Finally, set “Redistribute SLA ID” to 1 (this value is an index that determines what SLA Target to
use for the route priority manipulation - for the cases when you have more than one SLA Target
defined in the Health Check)
Save the modified SD-WAN Template.
• Install the configuration on all the devices (Edges and Hubs).
For reference, the FOS CLI that you have just generated is summarized below.
On the Edge:
config system sdwan

    config health-check
        edit "HUB"
            set server "10.200.99.1"
            set embed-measured-health enable
            set members 3 4 5 6
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 100
                next
            end
        next
    end
end
On the Hubs:
config system sdwan

    config health-check
        edit "EDGE"
            set detect-mode remote
            set sla-id-redistribute 1
            set members 4 3
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 100
                    set priority-in-sla 5
                    set priority-out-sla 8
                next
            end
        next
    end
end
There is one more CLI option necessary in the BGP configuration to make it all work, which was
conveniently generated for you by the Jinja Templates (you’ll find it in the “03-Hub-Routing.j2”):
config router bgp

set recursive-inherit-priority enable
end
You will understand its purpose in the next section.
But before we go there, check the Hub-to-Edge traffic (your ping from “client1-H1” to “client1-1” is
probably still running) and confirm that it is now avoiding the unhealthy overlay!
Our problem is solved!
2.3.3. Verifying the Solution
We have already described it in words, now let’s check some outputs.
Clearly, our main point of interest is the Hub. Connect to “site1-H1” using SSH and look for the route
towards “client1-1” (10.0.1.0/24):
site1-H1 # get router info routing-table all

…
B 10.0.1.0/24 [200/0] via 10.200.1.1 (recursive via EDGE_ISP2 tunnel 10.0.0.4 [5]), 00:40:18
(recursive via EDGE_ISP1 tunnel 10.200.1.1 [8]), 00:40:18, [1/0]
S 10.200.1.1/32 [15/0] via EDGE_ISP2 tunnel 10.0.0.4, [5/0]
[15/0] via EDGE_ISP1 tunnel 10.200.1.1, [8/0]
We can confirm that the routes via EDGE_ISP2 have priority 5, and thus they are preferred over the
routes via EDGE_ISP1 which have priority 8. Exactly as we wanted!
What happened here?
• The Hub has received the health information embedded by the Edge (“site1-1”) into the ICMP
probes. This information can be seen using this command:
site1-H1 # diagnose sys sdwan health-check remote

Remote Health Check: EDGE(4)
Passive remote statistics of EDGE_ISP2(26):
EDGE_ISP2_0(10.200.1.2): timestamp=09-28 02:29:15, latency=1.339, jitter=0.166,
pktloss=0.000%, SLA id=1, pass
Remote Health Check: EDGE(3)
Passive remote statistics of EDGE_ISP1(25):
pktloss=0.000%, SLA id=1, fail
• According to the SLA target of 100 ms (configured on the Hub), the EDGE_ISP1 overlay is
unhealthy for “site1-1”. Note that this status has to be applied per Edge, because there is nothing
wrong with the EDGE_ISP1 overlay itself. For example, it is perfectly healthy for another Edge
“site1-2”!
• The Hub will apply the configured OUT-SLA priority to all the static routes towards the particular
Edge via the particular overlay which was found unhealthy.
In our example, the static route towards 10.200.1.1/32 via EDGE_ISP1 (injected by IKE, as usual for
the “BGP on Loopback” design) has received priority 8.
• Similarly, the Hub will apply the configured IN-SLA priority to all the static routes towards those
Edges and via those overlays which were found healthy.
In our example, the static route towards 10.200.1.1/32 via EDGE_ISP2 has received priority 5.
• As usual for “BGP on Loopback” design, the Hub will recursively resolve the BGP route towards
10.0.1.0/24 (the LAN behind “site1-1”) using the above-mentioned static routes. But this time it will
inherit their priorities, thanks to the set recursive-inherit-priority enable command added
to the BGP configuration.
And this is how our BGP route has applied different priorities to each of the paths it resolved!
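The recursive resolution with inherited priorities can be illustrated with a short Python sketch. It is a simplified model of the behavior described above, not the FortiOS routing implementation: the function names are hypothetical, route lookup is reduced to an exact /32 match, and `DEFAULT_PRIORITY` is a placeholder standing in for "every path gets the same priority" when inheritance is off.

```python
# Placeholder for the case without inheritance: all paths share one priority.
DEFAULT_PRIORITY = 1


def resolve_bgp_route(next_hop, static_routes, inherit_priority=True):
    """Resolve a BGP next hop through static routes, one path per overlay.

    static_routes: list of (prefix, overlay, priority) tuples.
    Returns a list of (overlay, priority) paths for the BGP route.
    """
    paths = []
    for prefix, overlay, priority in static_routes:
        if prefix == f"{next_hop}/32":
            # With inheritance, the BGP path takes the static route's priority.
            paths.append((overlay, priority if inherit_priority else DEFAULT_PRIORITY))
    return paths


def preferred_paths(paths):
    """Lower priority value is better; several equal best values mean ECMP."""
    best = min(priority for _, priority in paths)
    return [overlay for overlay, priority in paths if priority == best]


# Static routes towards the loopback of site1-1, as in the routing table above.
static_routes = [
    ("10.200.1.1/32", "EDGE_ISP2", 5),
    ("10.200.1.1/32", "EDGE_ISP1", 8),
]

with_inherit = resolve_bgp_route("10.200.1.1", static_routes, inherit_priority=True)
without_inherit = resolve_bgp_route("10.200.1.1", static_routes, inherit_priority=False)

print(preferred_paths(with_inherit))     # ['EDGE_ISP2']
print(preferred_paths(without_inherit))  # ['EDGE_ISP2', 'EDGE_ISP1']
```

Without inheritance both paths stay equal and the Hub falls back to ECMP, which is exactly the behavior observed before the change.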
You can now reduce the latency back to 0 and confirm that the traffic switches back.
We would like to highlight that there is no change in the BGP advertisements between the Edge and the
Hub! The Edge is still advertising a single BGP route towards 10.0.1.0/24 using its loopback address
(10.200.1.1) as a BGP NH. The manipulation of priorities happens locally on the Hub, during the recursive
resolution of this route.
2.3.4. Primary/Backup ISP
When both overlays are healthy, all the routes have the same priority (Priority IN-SLA). This means
that the Hub will do ECMP, load-balancing sessions between them.
To illustrate this, let’s add another IP address to the “client1-H1” host and try to ping “client1-1” from it
(simulating another source client behind the Hub):
client1-H1# ip addr add 10.1.0.8/24 dev eth1
client1-H1# ping 10.0.1.101 -I 10.1.0.8
Use the sniffer on “site1-1” to confirm that one of these flows is using H1_ISP1 and another is using
H1_ISP2. Again, there is nothing surprising about it, because both overlays are healthy, and hence
both routes have the same priority 5 on the Hub, resulting in a simple ECMP:
site1-H1 # get router info routing-table bgp

Routing table for VRF=0
If both Internet connections are of the same type and both have unlimited traffic (with no extra
charges), this behavior may be exactly what you need. After all, our goal was simply to ensure that
the Hub selects only healthy overlays for the Hub-to-Edge sessions. And if, while doing that, we also
utilize both available ISPs, this is even better!
However, what if we have a reason to prefer ISP1 over ISP2? For example, what if our ISP2 is a backup
4G/LTE connection paid per traffic? Luckily, there is a very intuitive way to implement this!
Question: How do we solve this same problem for Edge-to-Hub traffic?

Answer: Using an SD-WAN rule on the Edge device.
Now we can do (almost) the same on the Hubs, for the Hub-to-Edge traffic! Edit the SD-WAN
Template for the Hubs (“Hub-DualISP”), adding a new SD-WAN rule as follows:
Parameter             Value
Source Address        CORP_LAN
Destination Address   CORP_LAN
Strategy              Manual
Interface Preference  EDGE_ISP1, EDGE_ISP2
Advanced Options      tie-break: fib-best-match
Do not underestimate the last row! It is important to enable the so-called “Best Route” mode for this
rule, by enabling the tie-break fib-best-match option!
Save your changes and install the configuration on the Hubs. Make sure that the traffic now always
prefers EDGE_ISP1, for both of your flows.
Sanity check: does the switchover still work?

Use the “WAN Controller” to raise the latency of S11-ISP1 above 100 ms. Make sure that both flows
switch over to ISP2.
How does it work?
As the name suggests, the “Best Route” mode selects only among the members having the best
route to the destination. In particular, it considers route priorities: the best route is the one having the
best priority.
Hence:
• When all the overlays are healthy and all the routes have the same priority (Priority IN-SLA), the
SD-WAN rule selects among all the listed members, preferring EDGE_ISP1 (the first one in the list).
• If EDGE_ISP1 goes out of SLA for a particular Edge, the corresponding route will get a worse
priority (Priority OUT-SLA) and so, EDGE_ISP1 will no longer be considered. The best route to the
destination is now only via EDGE_ISP2 (assuming that it is healthy), and hence EDGE_ISP2 will be
selected.
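The selection logic described above can be sketched as follows. This is a simplified model, not the FortiOS implementation: the function name and the dictionary-based inputs are hypothetical, and only route priorities are considered when deciding what the "best route" is.

```python
def select_member(preference, route_priority):
    """Pick the outgoing member for a Manual rule in "Best Route" mode.

    preference     -- members in the order listed in the SD-WAN rule
    route_priority -- priority of the route to the destination via each
                      member (lower is better); members without a route
                      are absent from the mapping
    """
    candidates = [m for m in preference if m in route_priority]
    if not candidates:
        return None
    best = min(route_priority[m] for m in candidates)
    # Among the members holding the best route, the rule order decides.
    for member in candidates:
        if route_priority[member] == best:
            return member


rule = ["EDGE_ISP1", "EDGE_ISP2"]

# Both overlays healthy: equal priorities, so the first listed member wins.
print(select_member(rule, {"EDGE_ISP1": 5, "EDGE_ISP2": 5}))  # EDGE_ISP1

# EDGE_ISP1 out of SLA for this Edge: only EDGE_ISP2 holds the best route.
print(select_member(rule, {"EDGE_ISP1": 8, "EDGE_ISP2": 5}))  # EDGE_ISP2
```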
2.4. Advertising Preferred Hub
Before we can continue, we must complete the deployment for CustomerA, adding the “legacy” site
mentioned earlier (which will be represented by “site2-H1” in our lab). Each of our SD-WAN Hubs
(“site1-H1” and “site1-H2”) will build an IPSEC tunnel towards that legacy site, which in reality could
be serviced by a 3rd-party gateway.
In our lab, it will be served by a FortiGate. But we will deploy it in a separate ADOM, to highlight that it
is external to the SD-WAN solution. The main point is that this device has no SD-WAN capabilities
whatsoever. It only supports industry-standard tunnels and protocols, such as IPSEC and BGP.
Run the following two Postman folders:
CustomerA - Hub-to-Edge SD-WAN / Legacy Site

CustomerA - Hub-to-Edge SD-WAN / Hubs-to-Legacy
• The first folder will create a new ADOM called “OUTSIDE”, onboarding “site2-H1” and applying all
the necessary configuration.
• The second folder will add IPSEC Templates and BGP Templates to our “CustomerA” ADOM, in
order to connect our Hubs (“site1-H1” and “site1-H2”) to the legacy site.
Confirm that each of our Hubs now has an active IPSEC tunnel (called “OUTSIDE”) towards the legacy
site, with an EBGP session running over it.
Connect to the legacy gateway (“site2-H1”) using SSH and confirm that it learns the LAN prefixes of
our SD-WAN sites from both Hubs (ECMP):

B 10.0.1.0/24 [20/0] via 192.168.1.2 (recursive via SDWAN-H1 tunnel 100.64.1.1), 00:31:16, [1/0]
[20/0] via 192.168.2.2 (recursive via SDWAN-H2 tunnel 100.64.2.1), 00:31:16, [1/0]
[20/0] via 192.168.2.2 (recursive via SDWAN-H2 tunnel 100.64.2.1), 00:31:16, [1/0]
Now connect to the client on the legacy site (“client2-H1”) using SSH and start pinging “client1-1”:
client2-H1# ping 10.0.1.101
Use packet sniffer on “site1-1” to confirm the selected path. We expect that the legacy site picks one
of the SD-WAN Hubs by ECMP, using it as an entry point to our SD-WAN domain.
For example, suppose the currently selected overlay is H1_ISP1. Use the “WAN Controller” to perform the
following tasks:
• Raise the latency on S1H1-ISP1 link (ISP1 on the Hub side!). The traffic is expected to escape the
high latency by switching to H1_ISP2, thanks to the Remote Health Probing we have implemented
previously.
• Raise the latency also on S1H1-ISP2 link (ISP2 on the Hub side). This time you will notice that the
traffic keeps suffering the high latency (possibly switching back to H1_ISP1, but this doesn’t really
matter).
What happens is that, while each of our Hubs has the mechanism to avoid its own unhealthy overlays,
the legacy site has no mechanism to select the optimal Hub as an entry point! Indeed, in this case it
would be smarter to use the other Hub (“site1-H2”), because its overlays are healthy. But the legacy
site simply has no knowledge about this!
And this brings us to the second problem that we need to solve:
We must ensure that external sites select only the Hubs with at least one healthy overlay as their entry
points to our SD-WAN domain.
There is one classical solution to this problem: we can prefer the routes advertised by our Primary
Hub (“site1-H1”). For example, it can advertise them to the legacy site with a better BGP MED value.
Then, as long as the Primary Hub is in service, the legacy site will choose it as an entry point (instead
of doing ECMP). This, however, will only help to cover the cases when the Primary Hub becomes
completely out of service (dead!).
In our use case here, on the other hand, the Primary Hub is still operational, but its overlays are
unhealthy. We must extend the classical solution (based on BGP MED) to cover this case!
2.4.2. Signaling SLA Status with BGP
Interaction between our SD-WAN and BGP is not new. Signaling the SLA status to remote (possibly
3rd-party) peers via BGP is possible using the SD-WAN Neighbor feature (configured under the
config system sdwan -> config neighbor stanza).
SD-WAN Neighbor feature specification (prior to FOS 7.2.1).
Given:
• BGP neighbor N
• SD-WAN Member M
• SD-WAN Health Check H with SLA Target T
we define that:
• The neighbor N is in-SLA (“healthy”) iff the member M meets the target T of the health check H
• Otherwise the neighbor N is out-of-SLA (“unhealthy”)
We then apply the following route-map to the outgoing BGP advertisements towards neighbor N:
• route-map-out-preferable iff the neighbor N is healthy

• route-map-out iff the neighbor N is unhealthy
The implication here is that the SLA status of one (and only one!) SD-WAN Member (M) can define
the SLA status of the BGP neighbor (N). This worked quite well with the traditional “BGP per Overlay”
routing design, because we had a separate BGP neighbor over each of the overlays. Hence, we could
have this 1:1 mapping (1 overlay member : 1 BGP neighbor).
This is no longer possible with the “BGP on Loopback” routing design. That is why we had to extend
the SD-WAN Neighbor feature to support a list of members!
SD-WAN Neighbor feature specification (FOS 7.2.1+).
Given:
• BGP neighbor N
• List of SD-WAN Members [ M1, M2, …, Mn ]
• SD-WAN Health Check H with SLA Target T
• Integer number i ( minimum-sla-meet-members )
we define that:
• The neighbor N is in-SLA (“healthy”) iff at least i members out of the list [ M1, M2, …, Mn ] meet the
target T of the health check H
• Otherwise the neighbor N is out-of-SLA (“unhealthy”)
We then apply the following route-map to the outgoing BGP advertisements towards neighbor N:
• route-map-out-preferable iff the neighbor N is healthy

• route-map-out iff the neighbor N is unhealthy
By default, minimum-sla-meet-members is set to 1. So, if an Edge device takes the
Primary Hub BGP neighbor and lists all of its overlays, the neighbor will be considered “healthy”
(thus applying route-map-out-preferable) whenever at least one of the overlays towards the
Primary Hub is healthy. Isn’t this exactly the condition under which we want our legacy sites to
select the Primary Hub as their entry point?
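The FOS 7.2.1+ specification above can be written down as a short Python sketch. The function name and the dictionary input are hypothetical. The returned strings are the names of the two settings from the specification, not route-map names.

```python
def neighbor_state(member_in_sla, minimum_sla_meet_members=1):
    """Evaluate the SD-WAN Neighbor health as specified above.

    member_in_sla -- {member_name: True if it meets SLA target T of
                      health check H, else False}
    Returns (healthy, name of the route-map setting that gets applied).
    """
    meeting = sum(1 for in_sla in member_in_sla.values() if in_sla)
    healthy = meeting >= minimum_sla_meet_members
    applied = "route-map-out-preferable" if healthy else "route-map-out"
    return healthy, applied


# One of two overlays towards the Primary Hub is healthy (default i = 1).
print(neighbor_state({"H1_ISP1": False, "H1_ISP2": True}))
# (True, 'route-map-out-preferable')

# Both overlays towards the Primary Hub are out of SLA.
print(neighbor_state({"H1_ISP1": False, "H1_ISP2": False}))
# (False, 'route-map-out')
```

With a single-member list, the same function reduces to the pre-7.2.1 behaviour, where one member alone defines the neighbor status.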
Let’s configure this:
• Edit the SD-WAN Template applied to the Edge devices (“Edge-DualISP”)
• Create a new SD-WAN Neighbor for the Primary Hub with the following parameters:
Parameter          Value
IP                 10.200.1.253
Interface Member   H1_ISP1, H1_ISP2
Performance SLA    HUB
SLA                1 (100 ms)
• Create another SD-WAN Neighbor for the Secondary Hub with similar parameters:
Parameter          Value
IP                 10.200.1.254
Interface Member   H2_ISP1, H2_ISP2
Performance SLA    HUB
SLA                1 (100 ms)
• Save the SD-WAN Template and install the configuration on both Edge devices.
For reference, the FOS CLI that you have just generated on the Edges is shown below:
config system sdwan

config neighbor
edit "10.200.1.253"
set member 4 3
set health-check "HUB"
set sla-id 1
next
edit "10.200.1.254"
set member 6 5
set health-check "HUB"
set sla-id 1
next
end
end
What about the route-maps on the Edges? Conveniently, that part was already prepared for us by the
Jinja Templates. We apply set route-map-out-preferable "SLA_OK" (see “03-Edge-Routing.j2”),
which applies a custom BGP community to all the routes advertised to the healthy BGP neighbor:
config router route-map

edit "SLA_OK"
config rule
edit 1
set set-community "{{ project.regions[region].as }}:99"
next
end
next
end
The Hubs (thanks to “03-Hub-Routing.j2”) are already prepared to match on that community:
config router community-list

edit "SLA_OK"
config rule
edit 1
set action permit
set match "{{ project.regions[region].as }}:99"
next
end
next
end
To summarize: when the Hub receives the SLA_OK community with a certain route, it means that at
least one of its overlays is healthy towards that particular Edge. Now the Hub can advertise this
valuable information to the legacy site, by applying the right MED value to the routes advertised over
the EBGP peering:
• The Primary Hub (“site1-H1”) will advertise MED 90 whenever SLA_OK is received and MED 100
otherwise
• The Secondary Hub (“site1-H2”) will advertise MED 95 whenever SLA_OK is received and MED 105
otherwise
Think of it for a moment - and you will see that it meets our requirements.
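To check this, we can enumerate all four combinations. The following Python sketch assumes only that the legacy site prefers the lowest MED (standard BGP behaviour when both routes come from the same neighboring AS). The names in it are illustrative.

```python
from itertools import product

# MED advertised by each Hub, keyed by whether SLA_OK was received.
MED = {
    "site1-H1": {True: 90, False: 100},
    "site1-H2": {True: 95, False: 105},
}


def preferred_hub(sla_ok):
    """Return the Hub that the legacy site selects (lowest MED wins)."""
    advertised = {hub: MED[hub][sla_ok[hub]] for hub in MED}
    return min(advertised, key=advertised.get)


for h1_ok, h2_ok in product([True, False], repeat=2):
    hub = preferred_hub({"site1-H1": h1_ok, "site1-H2": h2_ok})
    print(f"H1 SLA_OK={h1_ok!s:5} H2 SLA_OK={h2_ok!s:5} -> {hub}")
```

The Primary Hub wins in every case except one: when it has no healthy overlay towards the Edge while the Secondary Hub does (MED 100 versus 95).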
Let’s configure this:
• Navigate to Provisioning Templates -> BGP Templates and edit the “H1_OUTSIDE” template
• Inside it, edit the only existing neighbor (192.168.1.1). Under the “IPv4 Filtering” section, enable the
“Route Map Out” option and click on the “plus” icon to create a new route-map:
• … in theory, we could continue our step-by-step guidance here, but we have a better idea!
As you have probably already noticed, the BGP Templates closely follow the FOS CLI configuration.
Below we are going to give you the target FOS configuration that we want you to achieve. It will be
then your task to achieve it, by editing the BGP Templates “H1_OUTSIDE” and “H2_OUTSIDE”
respectively.
Simply edit the templates, install the configuration on the Hubs and verify the result. Here is what you
are supposed to end up with.
On Primary Hub (“site1-H1”):

config router route-map
edit "H1_TO_OUTSIDE"
config rule
edit 1
set match-community "SLA_OK"
set set-metric 90
next
edit 2
set set-metric 100
next
end
next
end
config router bgp
config neighbor
edit "192.168.1.1"
set soft-reconfiguration enable
set remote-as 65100
set route-map-out "H1_TO_OUTSIDE"
next
end
end
On Secondary Hub (“site1-H2”):

config router route-map
edit "H2_TO_OUTSIDE"
config rule
edit 1
set match-community "SLA_OK"
set set-metric 95
next
edit 2
set set-metric 105
next
end
next
end
config router bgp
config neighbor
edit "192.168.2.1"
set remote-as 65100
set route-map-out "H2_TO_OUTSIDE"
next
end
end
Good Luck!
2.4.3. Verifying the Solution
Hopefully, your route-maps on both Hubs now look exactly as they should.
Note that the route-maps are not applied immediately to the already established BGP peering. We
must trigger BGP updates by executing the following command on both Hubs:
execute router clear bgp external soft
Once you do that, you will notice how your ping (which is probably still flowing) finally stops suffering
the high latency. The traffic has switched over to the Secondary Hub!
Looking at the route towards 10.0.1.0/24 on “site2-H1”, we can confirm that it now prefers H2 (“site1-
H2”):

...
We can also understand why:
• The Edge (“site1-1”) doesn’t have any healthy overlays towards the Primary Hub (“site1-H1”), hence
the SD-WAN Neighbor is considered unhealthy, and “site1-H1” does not receive the SLA_OK
community:
site1-H1 # get router info bgp community-list SLA_OK

site1-H1 #
• At the same time, the Edge has healthy overlays towards the Secondary Hub (“site1-H2”), the SD-
WAN Neighbor is considered healthy, and “site1-H2” does receive the SLA_OK community:
site1-H2 # get router info bgp community-list SLA_OK

...
Network Next Hop Metric LocPrf Weight RouteTag Path
*>i10.0.1.0/24 10.200.1.1 0 100 0 0 i <-/1>
• As a result, the legacy site “site2-H1” receives MED 100 from the Primary Hub and MED 95 from
the Secondary Hub, hence preferring the latter:
site2-H1 # get router info bgp network

...
*> 10.0.1.0/24 192.168.2.2 95 0 0 65001 i <-/1>
* 192.168.1.2 100 0 0 65001 i <-/->
...
Feel free to perform additional tests on your own!
2.5. Optimal Shortcut Path
So far we have not paid much attention to ADVPN, as we were not focusing on the Edge-to-Edge
traffic. But it turns out that the new functionality can also be useful here.
Connect to one of the Hubs (“site1-H1”) and examine its policy routes. If you are familiar with our
traditional SD-WAN/ADVPN configuration, then you will be surprised to find out that it is… empty!
site1-H1 # show router policy

config router policy
end
What seems to be missing are our standard overlay stickiness policies! Our Jinja Templates could
generate them, but we have chosen not to do so.
This is controlled by an optional parameter set overlay_stickiness = <true|false> in the Project
Template.
As we have seen, the Hubs can manipulate the route priorities to select only the healthy overlay
paths towards the Edge devices. We can make use of it also for the Edge-to-Edge traffic, so that the
Hubs select only the healthy overlay for the 2nd half of the path. And this, in turn, will determine what
ADVPN shortcut will be built (intra-ISP or cross-ISP).
The best part is that we do not need to configure anything in addition to what we have already
configured! Let’s demonstrate this.
• Connect to “WAN Controller” and make sure that all the links are healthy.
• Connect to “client1-1” and start pinging “client1-2”:
root@client1-1:~# ping 10.0.2.101
• Connect to each of the Edge devices (“site1-1” and “site1-2”) and use get ipsec tunnel list
command to find out what ADVPN shortcut has been built.
◦ Since there is no overlay stickiness, the shortcut can be either intra-ISP (H1_ISP1_0 on both
sides) or cross-ISP (H1_ISP1_0 on “site1-1” and H1_ISP2_0 on “site1-2”).
◦ But because in one of the previous sections we have configured an SD-WAN rule on the Hub,
instructing it to prefer EDGE_ISP1, we expect that you find the intra-ISP shortcut (H1_ISP1_0).
This is fine, as long as all our links are healthy.
• Stop your ping and clear the shortcuts on both Edge devices:
diagnose vpn ike gateway flush name H1_ISP1_0
• Now use the “WAN Controller” to raise the latency on the S12-ISP1 link (for “site1-2”). Will our SD-
WAN solution be smart enough to avoid the unhealthy ISP1 link on “site1-2”, by building a cross-ISP
shortcut?
• Restart your ping and check what shortcut is built now. You can guess already by seeing how your
ping is not suffering the bad quality. Indeed, you should see the cross-ISP shortcut: H1_ISP1_0 on
“site1-1” and H1_ISP2_0 on “site1-2”.
What happened?
• When the EDGE_ISP1 overlay towards “site1-2” went out of SLA, our Hub (“site1-H1”) set the
OUT-SLA priority for the route towards “site1-2” via that overlay:

…
• As a result, the best route towards 10.0.2.0/24 is now via EDGE_ISP2. That is why the Hub will now
select it for your ping, and therefore the shortcut will be built accordingly.
Note that it doesn’t matter if you have your SD-WAN rule on the Hub or not:
◦ Your SD-WAN rule instructs the Hub to prefer EDGE_ISP1, but only as long as it has the best
route to the destination. Which is no longer the case (remember: “best route” means also best
priority!)
◦ And if there were no SD-WAN rule at all, the Hub would do ECMP, which again means choosing
among the interfaces with the best route to the destination. Which again eliminates
EDGE_ISP1.
• And this is how the Hub ensures that we build a cross-ISP shortcut in this case.
What behavior would we observe with the overlay stickiness?
• Since “site1-1” would still select ISP1, the Hub would stick to EDGE_ISP1. An intra-ISP shortcut
would be built (H1_ISP1_0).
• Shortly after, the ADVPN shortcut monitoring would detect that this shortcut was a bad choice.
• Then “site1-1” would switch over to ISP2 and the Hub would stick to EDGE_ISP2. Another intra-ISP
shortcut would be built, this time over ISP2 (H1_ISP2_0).
So as you can see, we would be able to avoid the unhealthy ISP1 link on “site1-2” both with and
without the overlay stickiness. But only without it we can build the optimal shortcut right away,
without suffering from a bad choice first.
Notes:
• We believe this may be an optimal design for the topologies with multiple Internet links, at least
when all the Internet links can be considered equal. In such topologies, overlay stickiness does not
make sense anymore, especially now that the Hubs can make an educated choice between intra-
ISP and cross-ISP shortcuts!
• This design is NOT appropriate for the topologies with segregated transports (such as Internet +
MPLS), because cross-overlay shortcuts are physically impossible there. In such topologies,
overlay stickiness is still required for the correct ADVPN operation.
• The intelligence on the Hub is somewhat limited, compared to the Edge. For example, while Edge
devices can apply different SLA targets per application, the Hubs can rely only on the route
priorities. The priority of each route can be either IN-SLA or OUT-SLA, without respect to a
particular application.
If you use different SLA targets for different Edge-to-Edge applications, the best advice for you
will be to set the strictest among your SLA targets on the Hubs. For example, if you use SLA
targets of 100, 200 and 250 ms on the Edge, it makes sense to configure a 100 ms target on the
Hubs. This way, if any overlay exceeds 100 ms latency, the Hub will set OUT-SLA priority for all the
routes via that overlay, thus preferring another overlay(s) for all the applications. For some of them
it will be unnecessarily strict, but - more importantly - none of the applications will suffer bad
quality!
• Finally, to avoid any confusion: you can still keep the overlay stickiness, if you prefer the “old”
behavior. The new route priorities do not break it: if you configure the policy routes, they will take
precedence over the conventional priority-based route lookup. Hence, they will work just as they
did in the previous releases.
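The advice in the notes above about choosing the strictest SLA target on the Hubs can be illustrated with a short worked example. The application names are hypothetical; the 100, 200 and 250 ms targets are the ones quoted in the note.

```python
# Hypothetical application names; the targets come from the note above.
edge_targets_ms = {"voice": 100, "video": 200, "bulk": 250}

# The Hub makes a single IN-SLA/OUT-SLA decision per route, so it uses
# the strictest (lowest) latency target.
hub_target_ms = min(edge_targets_ms.values())


def hub_verdict(latency_ms):
    """Priority class the Hub would assign to routes via this overlay."""
    return "IN-SLA" if latency_ms <= hub_target_ms else "OUT-SLA"


def over_constrained_apps(latency_ms):
    """Apps whose own target is still met although the Hub says OUT-SLA."""
    if hub_verdict(latency_ms) == "IN-SLA":
        return []
    return sorted(app for app, t in edge_targets_ms.items() if latency_ms <= t)


print(hub_target_ms)               # 100
print(hub_verdict(150))            # OUT-SLA
print(over_constrained_apps(150))  # ['bulk', 'video']
```

At 150 ms the Hub steers all applications away from the overlay, which is stricter than necessary for two of them, but no application is left on an overlay that violates its own target.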
Use the “WAN Controller” to reduce the latency and go back to the initial state.
2.6. Conclusion and Cleanup
This concludes our exploration of the new Hub-to-Edge SD-WAN capabilities in FOS/FMG 7.2.x. As
you could see, we split the problem that CustomerA has into two parts:
1. How to select a healthy entry point (Hub) to our SD-WAN domain?
2. On the chosen Hub, how to select a healthy overlay towards the Edge device?
We solve these problems independently, and each solution gives quite a lot of flexibility to define the
behavior that we want to achieve end-to-end.
Note that both solutions have been designed for the “BGP on Loopback” routing design. But the first
part of the solution (“Remote Health Probing”) also works with BGP-less designs.
Finally, we have shown how the “Remote Health Probing” helps us select more optimal shortcut
paths, inclining us to stop using the overlay stickiness in some topologies.
If you opt for the traditional “BGP per Overlay” routing design, you can use the traditional approach to
solve both mentioned problems. Namely, you can use the original version of the SD-WAN Neighbor feature
paired with route-map-out-preferable , marking each overlay/neighbor as “healthy” or “unhealthy”.
This solution keeps working just as it did in the previous releases.
2.6.1. Cleanup
Before proceeding to the next chapter, please delete the devices deployed in the “CustomerA”
ADOM.
We are going to reuse the same devices for our next customer, and hence we must ensure that their
serial numbers are not in use.
3. Segmentation over Single Overlay

3.1. Project Overview
Our next customer is CustomerB who has a very similar Dual-Hub SD-WAN/ADVPN topology, with all
sites having two Internet links, just like CustomerA has. But there is a major difference. CustomerB
uses multiple VRFs in the network, for complete segregation of the routing domains. The requirement
is to preserve this segregation end-to-end, across the entire SD-WAN domain.
The following diagram illustrates CustomerB’s network:
This network has two segments that must remain fully segregated: Education and Finance.
This is our opportunity to demonstrate the new VRF-Aware Overlays introduced in FOS 7.2.x. Or, as
this functionality is often also called, SD-WAN Segmentation over Single Overlay.
3.2. Deploying the Project
3.2.1. Creating the Foundation
In Postman, edit your environment and set the value of the adom variable to “CustomerB”. Do not
forget to save the changes!
We start by running exactly the same Postman folder as we ran for CustomerA - Foundation .
Yes! Almost the entire foundation remains exactly the same! Isn’t it beautiful?
The main difference comes, of course, in Jinja Templates. We must import a different design flavor!
But this simply means importing the files from a different folder, nothing more than that:
• Import the Project Template for CustomerB. You will find it in the folder “CustomerB” that you have
downloaded in the Introduction chapter. Do not forget to set its type to “Jinja Script”!
• You will be asked to create the missing variables ( lan_ip_edu and lan_ip_fin ). Create them
and then finish importing the Project Template.
• Now import all the Jinja Templates for the “BGP on Loopback Multi-VRF” design flavor that you
have downloaded from our GitHub. There is no need to import the templates from subfolders. In
total there will be 7 *.j2 files. Also here, do not forget to set their type to “Jinja Script”!
Are you absolutely sure that you have imported the templates from the right design flavor?
• You will be again asked to create the missing variables (four of them). Create them and then finish
the import.
• Create two CLI Template Groups as follows:
◦ Edge-Template:
▪ 01-Edge-Underlay
▪ 02-Edge-Overlay
▪ 03-Edge-Routing
◦ Hub-Template:
▪ 01-Hub-Underlay
▪ 02-Hub-Overlay
▪ 03-Hub-Routing
▪ 04-Hub-MultiRegion
Do you remember that we have an API call that does it for you?
• Finally, assign them to the device groups “Edge” and “Hubs” respectively.
As promised: there is not much (visible) difference compared to CustomerA!
3.2.2. Project Template
Examine the Project Template that you have imported and note the small differences compared to
the one we used for CustomerA:
1. The region definition now includes the list of the VRFs:
{# Regions #}
{% set regions = {
'SuperWAN': {
'as': '65001',
'hubs': [ 'site1-H1', 'site1-H2' ],
'vrfs': [
{
'id': 11
},
{
'id': 12
}
]
}
}
%}
2. In the device profile we now specify that port5 is in VRF=11 and port6 is in VRF=12:
{# Device Profiles #}
{% set profiles = {
'DualISP': {
'interfaces': [
{
'name': 'port5',
'role': 'lan',
'vrf': 11,
'ip': lan_ip_edu
},
{
'name': 'port6',
'role': 'lan',
'vrf': 12,
'ip': lan_ip_fin
}
]
}
}
%}
3. Since we now need two LAN subnets (one per VRF), we are using two separate variables -
lan_ip_edu and lan_ip_fin . As you can guess, VRF=11 will be used for the Education segment
and VRF=12 - for the Finance segment.
The rest of the Project Template is identical to the one used for CustomerA. As we will discover in the
next chapter, there are quite a few differences in the FOS configuration. But they are all abstracted
away from us by the Jinja Templates.
3.2.3. Configuring Model Devices
We will use Device Blueprints identical to those used for CustomerA. You can use the following
Postman request to create them quickly:
Foundation / Device Blueprints / Create Device Blueprints
And now we are ready to create Model Devices in bulk.
• Click on Add Device -> Import Model Devices from CSV File.
• Select the CSV file “inventory.CustomerB.csv” that you have generated for this lab.
The only noticeable difference compared to CustomerA is that now we have two separate
variables for the LAN prefixes.
• Click “OK” - and all the Model Devices will be created at once.
Not to forget: a few more steps to complete the staging of our project:
• Navigate to Provisioning Templates -> Certificate Templates and right-click on the “Edge”
template. Click “Generate” and select the two Edge devices (“site1-1” and “site1-2”). Click “OK” to
issue the certificates.
• Similarly, issue the certificates for the Hubs (“site1-H1” and “site1-H2”), using “Hub” template.
• Go back to the device list, select all the devices, right-click and choose “Quick Install (Device DB)”,
to generate complete Underlay, Overlay, Routing and SD-WAN configuration for all the Model
Devices.
• Finally, select all the devices again, right-click and choose “Re-install Policy”, to install the Firewall
Policy Packages on all the Model Devices.
Now your Model Devices are ready for the real device deployment!
3.2.4. Linking Real Devices
You know what to do:
• SSH to each of the 4 devices (“site1-1”, “site1-2”, “site1-H1” and “site1-H2”) and perform their
factory reset, while preserving their VM licenses (and hence their serial numbers):
execute factoryreset2 keepvmlicense
• After the devices reboot, they will trigger the usual ZTP process
• You can follow this process in the System Settings -> Task Monitor. Make sure that all the tasks
complete successfully for all the 4 FGT devices.
Our project is now fully deployed.
3.2.5. Does It Work?
Instead of adding multiple clients to our lab, we did a trick that our CustomerB would not appreciate
in the real world: we have connected our clients to both segments:
We will do our tests by keeping either eth1 or eth2 interface up. Both interfaces are connected to a
local FortiGate device (port5 and port6 respectively). And both can receive an IP address and a
default gateway using DHCP.
DHCP Servers for all the local clients have been automatically configured on all the FortiGates by our Jinja
Templates.
Test 1 - Education segment
Connect to the “client1-1” using SSH.

Ensure that it is attached to the Education segment (via eth1), by running this simple shell script:
root@client1-1:~# /fortipoc/connect_edu.sh
You can safely ignore any warnings it may show. Verify the result by checking that you have an IP
address on eth1, as well as the default route via eth1. These are the shell commands you’ll want to
use:
client1-1# ip address
…
1236: eth1@if1237: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 02:09:0f:04:02:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.1.101/24 brd 10.0.1.255 scope global dynamic eth1
valid_lft 604466sec preferred_lft 604466sec
inet6 fe80::9:fff:fe04:202/64 scope link
valid_lft forever preferred_lft forever
…
client1-1# ip route
default via 10.0.1.1 dev eth1
…
In parallel, connect to the “client1-2” and ensure that it is attached to the Education segment too (by
running the same shell script).
Now try pinging between the two clients:

PING 10.0.2.101 (10.0.2.101) 56(84) bytes of data.
64 bytes from 10.0.2.101: icmp_seq=1 ttl=61 time=7.59 ms
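As you can see, the ping works. It even builds an ADVPN shortcut: once the shortcut is up, the replies
arrive with TTL=62, rather than the TTL=61 seen while the traffic still flows via the Hub.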
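The TTL values can be checked with simple arithmetic. The sketch below assumes that the lab clients send packets with an initial TTL of 64 (the usual Linux default) and that each FortiGate on the path decrements the TTL by one. The path lists are illustrative.

```python
INITIAL_TTL = 64  # assumption: default initial TTL on the Linux lab clients


def observed_ttl(routers_on_path):
    """TTL seen by the receiver after each router decrements it by one."""
    return INITIAL_TTL - len(routers_on_path)


# Echo reply from client1-2 back to client1-1.
via_hub = ["site1-2", "site1-H1", "site1-1"]   # before the shortcut exists
via_shortcut = ["site1-2", "site1-1"]          # direct ADVPN shortcut

print(observed_ttl(via_hub))       # 61
print(observed_ttl(via_shortcut))  # 62
```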
Test 2 - Finance segment
Stop the ping. Then attach both clients to the Finance segment, by running this shell script on both
sides:
root@client1-1:~# /fortipoc/connect_fin.sh
Now try pinging again, this time using the Finance subnets (check the actual IP that the target client
gets on its eth2 interface!):

The ping works also this time. Curiously, the ADVPN shortcut is available from the very first packet
(TTL=62!). This is because we reuse the shortcut that our previous ping has created! No wonder:
after all, we are testing the Segmentation over Single Overlay, so we are not building a separate
shortcut per segment!
Feel free to connect to the “site1-1” and see that single shortcut ( get ipsec tunnel list ).
Test 3 - Cross-segment
Stop the ping. Attach the “client1-1” back to the Education segment ( /fortipoc/connect_edu.sh )
and leave the “client1-2” in the Finance segment. Try pinging again:

From 10.0.1.1 icmp_seq=1 Destination Net Unreachable
As expected, communication across the segments is not allowed.
And now that we see it working let’s move to the next chapter to understand how!
3.3. How Does It Work?
On a very high level, we can summarize our solution as follows:
• Network segments are segregated into different VRFs, thereby providing both data-plane and
control-plane segregation.
Since different VRFs effectively mean separate routing tables, overlapping IP addresses are supported
across VRFs, although we are not going to demonstrate this in our lab.
• On the control-plane, we use MP-BGP VPNv4 to advertise VRF information together with each
prefix. The routes are advertised across the entire SD-WAN domain, which preserves the
segregation across different SD-WAN sites.
• On the data-plane, the traffic is “tagged” using a new vpn-id-ipip encapsulation, when it is sent
between the SD-WAN nodes via the IPSEC overlays. In other words, we send the traffic from
multiple segments (VRFs) over the same IPSEC tunnel, while “tagging” it with its VRF information,
so that the receiving node can preserve the segregation. This is why we sometimes call this
feature “VRF-Aware Overlays”.
The “vpn-id-ipip” encapsulation is a variation of IPIP. An extra IP header is added inside the ESP
payload (encrypted), encoding the VRF ID into the IP address field.
For those of you who up until now saw the analogy with BGP/MPLS L3VPN: the “vpn-id-ipip”
encapsulation is a simplified replacement of the VPN label used in that technology. One advantage for
us is that all our current ASIC and SoC generations (NP6, NP7, SoC4…) support the acceleration of IPIP.
In this chapter we are going to focus on the FOS side, deep-diving into the configuration that we
have just deployed.
3.3.1. CE and PE VRFs
We use the following terminology:
• CE VRF is a Customer VRF to which the actual LAN segment is attached. Each LAN-facing
interface is assigned to one of the CE VRFs.
• PE VRF is an Edge VRF in which the VRF-Aware Overlays are located. The tunnel interfaces are
assigned to the PE VRF.
Thus, in a typical multi-VRF deployment we will have one PE VRF and multiple CE VRFs.
The maximum number of VRFs supported by FOS 7.2.4+ is 252 (from 0 to 251). In a typical multi-VRF
deployment this means: 1 PE VRF + 251 CE VRFs.
The following diagram summarizes the above:
So how do we choose the VRF IDs? There are several important guidelines:
• Do not use VRF=0 for PE (or at all). There are several ways in which VRF=0 is “special” on FOS. In
multi-VRF deployments, if you can, we recommend avoiding VRF=0 altogether (it can still be used
for purposes outside of the SD-WAN network - for example, for out-of-band management).
Sometimes, however, you cannot avoid using VRF=0. Not all FortiGate features can work in VRFs
other than 0. One good example is multicast: at the moment of this writing, FOS supports it only in
VRF=0. So what if your customer needs multicast in one of their segments? The only way to
support that is by attaching that segment to VRF=0, which makes it a CE VRF! But even in that
case your PE VRF must be elsewhere!
Therefore, the first rule is NOT to configure PE VRF=0.

Then, only if you cannot avoid it, use VRF=0 as your CE VRF.
When we talk about providing Internet access, we will describe another reason NOT to configure PE
VRF=0!
• Use the same VRF IDs everywhere. The VRF IDs have global significance in our solution. All the
SD-WAN nodes (Hubs and Edges!) must use the same CE/PE VRF IDs!
This means that a particular segment (e.g. Finance) must be attached to the same CE VRF across
the entire SD-WAN domain.
This also means that the PE VRF must be the same across the entire SD-WAN domain.
For those of you who are still thinking of the BGP/MPLS L3VPN analogy: this is where our solution
differs. In BGP/MPLS L3VPN technology, the VRF ID has only local significance, because VRF
information is transmitted as a VPN label. In our solution, on the other hand, the VRF ID itself is
transmitted.
• Use PE VRF for Internet access (and for all WAN underlays). In theory, we could talk about a
separate WAN VRF to which the WAN-facing underlay interfaces are assigned. Apart from
terminating the IPSEC overlays, it would have another very important duty - providing Internet
access to the CE VRFs. But we recommend using the PE VRF for this purpose, unless you have a
very good reason to overcomplicate your design.
In this lab: PE VRF = WAN VRF = Internet VRF.

In other words, our PE VRF includes all the WAN-facing interfaces (underlays and overlays).
With these constraints in mind, we recommend the following VRF allocation:
VRF ID   Role
0        Either CE VRF or not part of SD-WAN network (e.g. OOB management)
1        PE VRF
2-251    CE VRFs
By default, our Jinja Templates configure VRF=1 as PE. This is controlled by an optional pe_vrf
parameter that you can change per-region in your Project Template.
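The guidelines above can be turned into a quick sanity check of a VRF plan. The following Python sketch is only an illustration: the function names and the data layout are our own assumptions and are not part of the Jinja Templates or of FOS.

```python
# Hypothetical helper (not part of the Jinja Templates): checks a VRF plan
# against the guidelines described in this section.
MAX_VRF_ID = 251  # FOS 7.2.4+ supports VRF IDs 0..251 (252 in total)


def validate_vrf_plan(pe_vrf, ce_vrfs):
    """Return a list of problems found in the plan (empty list = looks fine)."""
    problems = []
    for vrf in [pe_vrf, *ce_vrfs]:
        if not 0 <= vrf <= MAX_VRF_ID:
            problems.append(f"VRF {vrf} is outside the range 0..{MAX_VRF_ID}")
    if pe_vrf == 0:
        problems.append("PE VRF must not be 0")
    if pe_vrf in ce_vrfs:
        problems.append(f"VRF {pe_vrf} is used as both PE and CE")
    if len(set(ce_vrfs)) != len(ce_vrfs):
        problems.append("duplicate CE VRF IDs")
    return problems


def same_pe_everywhere(nodes):
    """nodes maps a node name to (pe_vrf, ce_vrfs); all nodes must share one PE VRF."""
    return len({pe for pe, _ in nodes.values()}) == 1


print(validate_vrf_plan(1, [11, 12]))   # the allocation used in this lab
```

Note that VRF=0 is accepted as a CE VRF here, in line with the "only if you cannot avoid it" guideline.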
3.3.2. Creating VRFs
Let’s connect to “site1-1” using SSH, to explore the actual FOS configuration.
VRFs are automatically created whenever we assign interfaces to them. Assigning the interfaces is
quite straightforward.
Here are our LAN-facing interfaces, assigned to the CE VRFs:
config system interface
edit "port5"
set vdom "root"
set vrf 11
set ip 10.0.1.1 255.255.255.0
set allowaccess ping
# …
next
edit "port6"
set vdom "root"
set vrf 12
set ip 10.0.101.1 255.255.255.0
# …
next
end
And here are the WAN-facing interfaces (underlays and overlays alike!), assigned to the PE VRF,
which our Jinja Templates set to VRF=1 by default:
edit "port1"
set vdom "root"
set vrf 1
set mode dhcp
# …
next
edit "H1_ISP1"
set vdom "root"
set vrf 1
set type tunnel
set interface "port1"
# …
next
# …
Tip: use some fancy CLI tricks, such as: show system interface | grep "port\|vrf\|ISP"
There is no distinction (yet) between CE and PE VRFs. We are simply creating separate routing tables,
and the first routes to be found there will be, of course, the connected routes of the corresponding
interfaces:
site1-1 # get router info routing-table connected

...
C 10.200.1.1/32 is directly connected, Lo
C 192.2.0.0/29 is directly connected, port1


3.3.3. Configuring Overlays
As you could already see, the tunnel interfaces (H1_ISP1, H1_ISP2, H2_ISP1 and H2_ISP2) are all
assigned to the PE VRF.
The only remaining configuration is to make them “VRF-Aware”, by enabling the vpn-id-ipip
encapsulation:
config vpn ipsec phase1-interface

edit "H1_ISP1"
set encapsulation vpn-id-ipip
...
next
end
That was quite easy so far, wasn’t it?
3.3.4. Configuring Routing
And now the real fun begins…
The general routing design has not changed: we are still using the “BGP on Loopback” flavor, so that
there is a single IBGP session from each Edge to each Hub, terminated on the “Lo” interface. Except
that now it will be the MP-IBGP session, with a new address family called VPNv4.
Let’s summarize the most important points about it:
• The prefixes are prepended with a route distinguisher (RD), to make them look unique.
Remember that IP overlap is permitted between different VRFs. So if two CE VRFs have a subnet
10.0.1.0/24, how shall we advertise both subnets across the network? The answer is: we will
prepend them with different RDs, unique per CE VRF. The BGP route (NLRI) will become
“RD1.10.0.1.0/24” for the first subnet and “RD2.10.0.1.0/24” for the second subnet. Now they are
seen as two different BGP routes.
• An extended community called route target (RT) is attached to each route, signaling from which
VRF this route is exported (advertised) and into which VRF the receiving peers must import it.
The combination of RD and RT allows us to advertise our CE VRF subnets, while preserving their VRF
information across the entire SD-WAN domain. Just as before, the Hubs will act as BGP Route
Reflectors, re-advertising the VPNv4 routes to all the Edges.
The exact value of the RT/RD doesn’t matter, as long as we configure it consistently on all the SD-WAN
nodes. Our Jinja Templates use the following convention by default: 65000:<vrf_id> .
These values will be used only by the SD-WAN nodes (FortiGates) and will never be visible to any external
peers. Therefore, they will not conflict with any existing MP-BGP deployments in the customer network
(such as existing BGP/MPLS L3VPNs).
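To make the roles of RD and RT more tangible, here is a small Python model. It is a conceptual sketch only: the class and function names are hypothetical, and it merely assumes the default 65000:<vrf_id> convention mentioned above. It does not reflect how FOS stores VPNv4 routes internally.

```python
from dataclasses import dataclass

ASN = 65000  # first half of the default RD/RT value used by the Jinja Templates


@dataclass(frozen=True)
class Vpnv4Route:
    rd: str      # route distinguisher: makes the NLRI unique
    prefix: str  # plain IPv4 prefix, may overlap between VRFs
    rt: str      # route target: tells the receiver where to import the route


def export_route(vrf_id, prefix):
    """Build the VPNv4 route that a node would advertise from a CE VRF."""
    value = f"{ASN}:{vrf_id}"
    return Vpnv4Route(rd=value, prefix=prefix, rt=value)


def import_routes(routes, vrf_id):
    """Return the prefixes that a receiving node imports into the given CE VRF."""
    wanted_rt = f"{ASN}:{vrf_id}"
    return [r.prefix for r in routes if r.rt == wanted_rt]


# Two CE VRFs with the same subnet: the RD keeps the two BGP routes distinct.
table = {export_route(11, "10.0.1.0/24"), export_route(12, "10.0.1.0/24")}
print(len(table))
print(import_routes(table, 11))
```

The set holds two entries although the IPv4 prefix is identical, and each CE VRF imports only the copy carrying its own RT.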
The RDs, RTs and VRF roles are defined under the config vrf stanza of the BGP configuration:
config router bgp

config vrf
edit "1"
set role pe
next
edit "11"
set role ce
set rd "65000:11"
set export-rt "65000:11"
set import-rt "65000:11"
next
edit "12"
set role ce
set rd "65000:12"
set export-rt "65000:12"
set import-rt "65000:12"
next
end
end
Notes:
• RDs and RTs are set only for the CE VRFs, because we need to advertise only their prefixes as
VPNv4 routes
• The export-rt defines the RT that is attached to the routes advertised by this node from a
particular CE VRF.
The import-rt defines the RT(s) that must be imported into a particular CE VRF by this node.
While in theory these values can be different, for our practical use case they will always be
identical to each other.
And for simplicity, they will also be identical to the RD value.
So that: RD = Import RT = Export RT = 65000:<vrf_id> .
Simplicity is the King!
• Remember that the VRF IDs must remain consistent across the entire SD-WAN domain. In practice
this means that we should simply copy-paste the entire config vrf stanza to all our SD-WAN
nodes!
And this is exactly what the Jinja Templates did for us.
Again, simplicity is the King!
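Because the stanza follows such a regular pattern, it is easy to generate. The Python sketch below reproduces the config vrf block shown above from a list of VRF IDs. It is not how our Jinja Templates are implemented; the function name and its parameters are assumptions made for this illustration.

```python
def render_bgp_vrf_stanza(pe_vrf, ce_vrfs, asn=65000):
    """Render the 'config vrf' block of 'config router bgp' as text.

    RD, export-rt and import-rt are all set to <asn>:<vrf_id>, following the
    convention RD = Import RT = Export RT described in the notes above.
    """
    lines = [
        "config router bgp",
        "config vrf",
        f'edit "{pe_vrf}"',
        "set role pe",
        "next",
    ]
    for vrf in ce_vrfs:
        value = f"{asn}:{vrf}"
        lines += [
            f'edit "{vrf}"',
            "set role ce",
            f'set rd "{value}"',
            f'set export-rt "{value}"',
            f'set import-rt "{value}"',
            "next",
        ]
    lines += ["end", "end"]
    return "\n".join(lines)


print(render_bgp_vrf_stanza(1, [11, 12]))
```

Since the output depends only on the VRF IDs, the same text can be pushed unchanged to every Hub and Edge, which is exactly the consistency requirement discussed above.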
The rest of the BGP configuration should look familiar to you. The only difference compared to the
“usual” configuration of the “BGP on Loopback” flavor is that some of the statements have got their
-vpnv4 versions:
config router bgp
config neighbor
edit "10.200.1.253"
set advertisement-interval 1
set soft-reconfiguration-vpnv4 enable
set interface "Lo"
set remote-as 65001
set route-map-in "H1_TAG"
set route-map-in-vpnv4 "H1_TAG"
set route-map-out-vpnv4-preferable "SLA_OK"
set connect-timer 1
set update-source "Lo"
next
end
end
Note how we often use both the “new” and the “old” statements. For example, route-map-in and
route-map-in-vpnv4 . That’s because we are learning from the Hubs both VPNv4 and non-VPNv4
(“usual” IPv4) routes. We will look into it in the next section.
Feel free to examine the configuration on the Hub as well (“site1-H1”). For example, notice how we
enable BGP Route Reflection for the VPNv4 routes:
config router bgp

config neighbor-group
edit "EDGE"
set advertisement-interval 1
set next-hop-self enable
set soft-reconfiguration-vpnv4 enable
set interface "Lo"
set remote-as 65001
set update-source "Lo"
set route-reflector-client-vpnv4 enable
next
end
end
It’s time to see the results of all this BGP configuration!
3.3.5. Verifying BGP Operation
For this section in particular, we expect you to have a good understanding of our “BGP on Loopback”
design. We again refer you to the previous version of this lab - 7.0.x - where we have covered it
extensively.
Advertised Routes
The first thing to remember is that we have only one BGP session towards each Hub, and this session
is terminated in PE VRF, on the loopback interface (“Lo”).
Checking our BGP configuration on “site1-1”, we expect the following two LAN prefixes to be
advertised (one being our local Education subnet in CE VRF=11, another - the Finance subnet in CE
VRF=12):
config router bgp

config network
edit 1
set prefix 10.0.1.0 255.255.255.0
next
edit 2
set prefix 10.0.101.0 255.255.255.0
next
end
end
Do we see them advertised?
site1-1 # get router info bgp neighbors 10.200.1.253 advertised-routes

% No prefix for neighbor 10.200.1.253
Interestingly, no!
The reason is that this command is for the “usual” IPv4 routes. But our LAN prefixes belong to the CE
VRFs, and therefore they will be advertised as VPNv4 routes.
We have a longer command to see this:
site1-1 # get router info bgp neighbors 10.200.1.253 advertised-routes vpnv4

Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

Route Distinguisher: 65000:11 (Default for VRF 11)
*>i10.0.1.0/24 10.200.1.1 100 32768 0 i <-/->
*>i10.0.101.0/24 10.200.1.1 100 32768 0 i <-/->
Total number of prefixes 2
Another new set of commands is under get router info bgp vpnv4 <...> , providing the VPNv4
versions of some known troubleshooting outputs.
For example:
site1-1 # get router info bgp vpnv4 network 10.0.1.0/24

Original VRF 11 local
Local
0.0.0.0 from 0.0.0.0 (10.200.1.1)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, best
Extended Community: RT:65000:11
Last update: Thu Oct 20 10:07:17 2022
This is the VPNv4 route that has been generated for our local LAN prefix 10.0.1.0/24. Note how the
right RD and RT were added to it, because this route is from CE VRF=11.
To conclude:
The routes from CE VRFs are advertised to the peers as VPNv4 routes, with the right RTs attached, over
the BGP session terminated in the PE VRF.
Received Routes
Let’s now take a closer look at the prefix 10.0.2.0/24, which is the Education segment (CE VRF=11)
behind “site1-2”. How do we learn it on “site1-1” and how do we resolve it?
We can already guess the following:
The routes from the remote sites are learnt as VPNv4 routes over the BGP session terminated in the PE
VRF and are imported into the right CE VRFs, based on the RTs attached to them.
Here is the output for 10.0.2.0/24:
site1-1 # get router info routing-table all

...
B 10.200.0.0/14 [200/0] via 10.200.1.253 tag 1 (recursive via H1_ISP1 tunnel 100.64.1.1), 19:25:03
(recursive via H1_ISP2 tunnel 100.64.1.9), 19:25:03, [1/0]
[200/0] via 10.200.1.254 tag 2 (recursive via H2_ISP1 tunnel 100.64.2.1), 19:25:03
S 10.200.1.253/32 [15/0] via H1_ISP1 tunnel 100.64.1.1, [1/0]
[15/0] via H1_ISP2 tunnel 100.64.1.9, [1/0]
S 10.200.1.254/32 [15/0] via H2_ISP1 tunnel 100.64.2.1, [1/0]
[15/0] via H2_ISP2 tunnel 100.64.2.9, [1/0]
...
B V 10.0.2.0/24 [200/0] via 10.200.1.2 tag 1 (recursive via H1_ISP1 tunnel 100.64.1.1), 19:25:10
[200/0] via 10.200.1.2 tag 2 (recursive via H2_ISP1 tunnel 100.64.2.1), 19:25:10
...
• We find the BGP route towards 10.0.2.0/24 in the CE VRF=11. It arrived as a VPNv4 route, as
indicated by the “V” flag.
• We can see two copies of this route, one with tag 1 and another with tag 2. The tags indicate from
which Hub the route has been learnt. They were applied by the route-maps “H1_TAG” and
“H2_TAG” respectively.
To remind you, we had to apply these route-maps with the set route-map-in-vpnv4 command,
because we are handling VPNv4 routes here!
config router bgp

...
config neighbor
edit "10.200.1.253"
set route-map-in-vpnv4 "H1_TAG"
...
next
edit "10.200.1.254"
set route-map-in-vpnv4 "H2_TAG"
...
next
end
end
• The BGP NH = 10.200.1.2, which is the loopback of “site1-2”
• This BGP NH is recursively resolved using the summary route 10.200.0.0/14 which we find in the
PE VRF! Note that there is no “V” flag this time, so this is the “usual” IPv4 route! As always with
“BGP on Loopback” design, this is the loopback summary advertised by the Hubs, allowing the
Edges to resolve each other’s routes.
• This summary also exists in two copies, advertised by the two Hubs, and each copy has its
respective tag. But for this tagging to happen, we had to use the set route-map-in command
(without the -vpnv4 suffix!), because this is not a VPNv4 route! Of course, we reuse the same
route-maps for both commands.
config router bgp

...
config neighbor
edit "10.200.1.253"
set route-map-in "H1_TAG"
...
next
edit "10.200.1.254"
set route-map-in "H2_TAG"
...
next
end
end
• The BGP NH of each copy of the 10.200.0.0/14 summary is the loopback IP of the advertising Hub
(10.200.1.253 for “site1-H1” and 10.200.1.254 for “site1-H2”). As always with “BGP on Loopback”
design, these next-hops are recursively resolved using the static /32 routes towards the
loopbacks, injected by IKE ( exchange-ip-addrv4 feature). You can find those static routes right
there, in the PE VRF, injected over each of the available overlays.
• And this is how, finally, our BGP VPNv4 route towards 10.0.2.0/24 is resolved via all the available
overlays!
Interesting, isn’t it? The route towards 10.0.2.0/24 is in the CE VRF=11, but it is eventually resolved via
the interfaces from the PE VRF=1!
To conclude:
The VPNv4 routes are imported into the CE VRFs, but they are recursively resolved using the routes in the
PE VRF.
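The resolution chain described above can be modeled in a few lines of Python. This is a simplified sketch under our own assumptions: the ROUTES dictionary is a hand-written excerpt of the routing tables of “site1-1”, and the function names are hypothetical. It ignores tags, distances and everything else that a real route lookup considers.

```python
import ipaddress

PE_VRF = 1

# Simplified routing tables of "site1-1", keyed by VRF ID. A route points
# either to BGP next-hops (to be resolved further) or directly to tunnels.
ROUTES = {
    11: {"10.0.2.0/24": {"next_hops": ["10.200.1.2"]}},
    1: {
        "10.200.0.0/14": {"next_hops": ["10.200.1.253", "10.200.1.254"]},
        "10.200.1.253/32": {"tunnels": ["H1_ISP1", "H1_ISP2"]},
        "10.200.1.254/32": {"tunnels": ["H2_ISP1", "H2_ISP2"]},
    },
}


def longest_match(vrf, address):
    """Return the most specific route in the given VRF that covers the address."""
    ip = ipaddress.ip_address(address)
    nets = [ipaddress.ip_network(p) for p in ROUTES[vrf]]
    matches = [net for net in nets if ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[vrf][str(best)]


def resolve(vrf, address):
    """Resolve a destination to tunnel names; next-hops are looked up in the PE VRF."""
    route = longest_match(vrf, address)
    if route is None:
        return []
    if "tunnels" in route:
        return route["tunnels"]
    tunnels = []
    for next_hop in route["next_hops"]:
        for tunnel in resolve(PE_VRF, next_hop):
            if tunnel not in tunnels:
                tunnels.append(tunnel)
    return tunnels


print(resolve(11, "10.0.2.101"))
```

The lookup starts in CE VRF=11, but every further step (the loopback summary and the static /32 routes) happens in the PE VRF, which is the point of the conclusion above.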
Before we finish this investigation, a few more words about the loopback summary 10.200.0.0/14.
We’ve already seen it in the PE VRF=1, and so we understand that it is advertised as a “usual” IPv4
route. But why?
For the answer we must connect to the Hub (“site1-H1”). Its BGP configuration looks pretty standard.
Here is how it advertises our loopback summary:
config router bgp

config network
edit 1
set prefix 10.200.0.0 255.252.0.0
set route-map "LOCAL_REGION"
next
end
end
But where is this summary coming from?

Apparently, our Jinja Templates configure it as a static blackhole route in the PE VRF:
config router static

edit 102
set dst 10.200.0.0 255.252.0.0
set blackhole enable
set vrf 1
next
end
Indeed, here it is, in the PE VRF=1:
site1-H1 # get router info routing-table static

...
S 10.200.0.0/14 [10/0] is a summary, Null, [1/0]
So this route does not come from any CE VRF! And this is exactly why the Hubs advertise it as a
simple IPv4 route, not as a VPNv4 route:
site1-H1 # get router info bgp neighbors 10.200.1.1 advertised-routes

VRF 1 BGP table version is 1, local router ID is 10.200.1.253
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

*>i10.200.0.0/14 10.200.1.253 100 32768 0 i <-/->
Total number of prefixes 1
This concludes our deep-dive into the new MP-BGP routing!
3.3.6. Configuring SD-WAN
How do the SD-WAN Templates reflect multiple VRFs?

Simple answer: they don’t!
Recall that our SD-WAN Templates for CustomerB have been created by the same API calls that
we had used earlier for CustomerA. Examine the SD-WAN Template “Edge-DualISP” - and you will
find the usual SD-WAN configuration, without any mention of the VRFs.
It is the underlying foundation that is largely responsible for the VRF segregation. The SD-WAN
Templates can be seen as VRF-agnostic (or multi-VRF, if you like).
• As usual, the overlay tunnels (H1_*, H2_*) become our SD-WAN Members. The fact that these
tunnels are VRF-aware doesn’t make any difference.
• A health-check called “HUB” is probing the loopback defined on the Hubs (Lo-HC with IP =
10.200.99.1), over all the overlays. There is just a single probe per overlay, no matter how many
VRFs will traverse it. It may be interesting to know that the probe itself is tagged with PE VRF,
when it traverses the tunnel.
This is true also for the ADVPN shortcut monitoring. Whenever a shortcut is built, the Edges will probe
each other over the shortcut, and the probes will be tagged with PE VRF.
• Finally, the SD-WAN rules select between the overlays, just like they did before. But the underlying
foundation ensures that:
1. All the route lookups (to check whether there is a valid route to the destination) happen in the
right CE VRF context
2. Once an overlay is selected by the SD-WAN rule, the packets are tagged with the correct CE
VRF
Let’s revisit the three tests we did in the previous section and try to understand how they worked.
Test 1 - Education segment
Source: a host in the Education segment behind “site1-1” (10.0.1.101)

Destination: a host in the Education segment behind “site1-2” (10.0.2.101)
• The traffic enters “site1-1” via port5.

CE VRF=11 context is assigned to it, based on the VRF configuration on port5.
• The traffic matches the SD-WAN rule “Corporate-H1”.
• The SD-WAN rule prefers the first overlay - H1_ISP1. It does a route lookup to the destination
(10.0.2.101) to determine if there is a route (even the best route, since the rule is using
tie-break fib-best-match ; but this is unimportant for our test)
• The route lookup is done in the context of CE VRF=11, and therefore the route 10.0.2.0/24 is
found.
• The overlay H1_ISP1 is selected.

The vpn-id-ipip encapsulation is applied to the traffic, tagging it with CE VRF=11.
Test 2 - Finance segment
Source: a host in the Finance segment behind “site1-1” (10.0.101.4)

Destination: a host in the Finance segment behind “site1-2” (10.0.102.4)

• The traffic matches the SD-WAN rule “Corporate-H1” - exactly the same rule as before!
• The SD-WAN rule again prefers the first overlay - H1_ISP1. It does a route lookup to the destination
(10.0.102.4), but this time it is automatically done in the context of CE VRF=12, and therefore the
route 10.0.102.0/24 is found.
• The overlay H1_ISP1 is again selected, and this time the traffic is tagged with CE VRF=12.
Test 3 - Cross-segment
Source: a host in the Education segment behind “site1-1” (10.0.1.101)

Destination: a host in the Finance segment behind “site1-2” (10.0.102.4)

• The traffic matches the SD-WAN rule “Corporate-H1” - still the same rule!
• The SD-WAN rule prefers the first overlay - H1_ISP1. It does a route lookup to the destination
(10.0.102.4) in the context of CE VRF=11. And no route is found!
• The SD-WAN rule then tries to select the second overlay - H1_ISP2.
Same outcome: no route is found in the context of CE VRF=11!
• The rule “Corporate-H1” is therefore not usable for this traffic. And it is easy to see that the same
will happen also for the rule “Corporate-H2”.
• In the end, the traffic will be handled by the Implicit Rule, which is effectively just a conventional
route lookup, again in the context of CE VRF=11. Hence, also here no route will be found.
• This traffic will be dropped by “site1-1” due to no route to destination!
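The three tests can be replayed with a toy model of the rule evaluation. The Python sketch below is only an approximation built on our own assumptions: the tables are hand-written, the function names are hypothetical, and SLA checks as well as the fib-best-match logic are deliberately left out.

```python
import ipaddress

ALL_OVERLAYS = ["H1_ISP1", "H1_ISP2", "H2_ISP1", "H2_ISP2"]

# Assumed, simplified state of "site1-1".
INTERFACE_VRF = {"port5": 11, "port6": 12}
VRF_ROUTES = {
    11: {"10.0.2.0/24": ALL_OVERLAYS},    # Education behind "site1-2"
    12: {"10.0.102.0/24": ALL_OVERLAYS},  # Finance behind "site1-2"
}
RULES = {
    "Corporate-H1": ["H1_ISP1", "H1_ISP2"],
    "Corporate-H2": ["H2_ISP1", "H2_ISP2"],
}


def has_route(vrf, dst, overlay):
    """Route lookup via one overlay, done in the context of a single CE VRF."""
    ip = ipaddress.ip_address(dst)
    return any(
        ip in ipaddress.ip_network(prefix) and overlay in overlays
        for prefix, overlays in VRF_ROUTES[vrf].items()
    )


def forward(in_port, dst):
    """Return (overlay, vrf_tag), or None if the traffic is dropped."""
    vrf = INTERFACE_VRF[in_port]  # the VRF context comes from the ingress interface
    for members in RULES.values():
        for overlay in members:
            if has_route(vrf, dst, overlay):
                return overlay, vrf  # traffic is tagged with the CE VRF
    return None  # the implicit rule finds no route in this VRF either


print(forward("port5", "10.0.2.101"))   # Test 1
print(forward("port6", "10.0.102.4"))   # Test 2
print(forward("port5", "10.0.102.4"))   # Test 3
```

The rules themselves never mention a VRF. The only VRF-dependent input is the routing table selected by the ingress interface.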
VRF-Aware SD-WAN rules
You can see now that the SD-WAN rules are, by default, VRF-agnostic. They simply rely on the VRF
context maintained by the underlying FOS subsystems.
But what if we want to configure a VRF-aware SD-WAN rule? In other words, what if we want to
configure an SD-WAN rule that would apply only to some of the segments (one or more), but not to
the others?
To do this, we can match on the input-device . For example, if we create a rule with
input-device = port6 , this rule will apply only to the Finance traffic!
3.3.7. Firewall Policy
The situation with Firewall Policies is very similar to the situation with SD-WAN Templates. The
Firewall Policies are largely VRF-agnostic.
For example, in our “Edge” Policy Package, the first rule (“Corporate”) applies both to the Education
and to the Finance traffic:
• The System Zone lan_zone includes both port5 and port6

• The SD-WAN Zone overlay includes the VRF-Aware overlays H1_* and H2_*.
It is, of course, very likely that in the real world the Firewall Policies will differ per segment, simply due
to the different security requirements. No problem! You can define as many rules as needed, using
the traditional tools - such as different source interfaces. But there is no additional configuration
needed because of the different VRFs.
Unfortunately, this nice picture is true only for the corporate (internal) traffic. When it comes to the
Internet access, the situation changes!
Connect to the “client1-1” and try pinging an Internet destination:

^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 35ms
Unfortunately, our clients do not have Internet access at the moment! And it is not just because of
the wrong Firewall Rules (although also because of that).
We will have to explore the topic of Internet access in the multi-VRF environments in a dedicated
chapter.
3.4. Providing Internet Access
Looking again at the routing tables on “site1-1”, we can easily understand why our clients have no
Internet access:

...
S* 0.0.0.0/0 [1/0] via 192.2.0.2, port1, [1/1]
[1/0] via 192.2.0.10, port2, [1/1]
[1/0] via H1_ISP1 tunnel 100.64.1.1, [10/1]
[1/0] via H1_ISP2 tunnel 100.64.1.9, [10/1]
[1/0] via H2_ISP1 tunnel 100.64.2.1, [10/1]
[1/0] via H2_ISP2 tunnel 100.64.2.9, [10/1]

B V 10.0.2.0/24 [200/0] via 10.200.1.2 tag 1 (recursive via H1_ISP1 tunnel 100.64.1.1), 1d17h47m
(recursive via H1_ISP2 tunnel 100.64.1.9), 1d17h47m, [1/0]
[200/0] via 10.200.1.2 tag 2 (recursive via H2_ISP1 tunnel 100.64.2.1), 1d17h47m

[200/0] via 10.200.1.2 tag 2 (recursive via H2_ISP1 tunnel 100.64.2.1), 1d17h47m
(recursive via H1_ISP2 tunnel 100.64.1.9), 1d17h47m,
[1/0]
(recursive via H2_ISP2 tunnel 100.64.2.9), 1d17h47m,
[1/0]
...
While our PE VRF=1 has a default route, none of the CE VRFs do! They simply have no routes to
Internet destinations!
Since only the PE VRF is physically connected to the Internet, the traffic from the CE VRFs will have
to jump to the PE VRF, in order to get Internet access. In other words, we will need to connect the CE
VRFs to the PE VRF. On FOS, we can do that in the same way as we interconnect different VDOMs:
using VDOM links. We will call them VRF links in this section.
To avoid any concerns right away: it is NOT required to add a multi-VDOM license or to configure
additional VDOMs. We are only reusing the same internal links to interconnect our VRFs.
However, do note that on the hardware appliances we will prefer using npu_link (which is accelerated),
while on the VMs we will use the software-based vdom-link . And the npu_link is currently hidden
in the CLI, until you enable the multi-VDOM mode. As a result, on the hardware appliances you will have to
enable the multi-VDOM mode (which does not bring any costs, but it does somewhat complicate the
interaction with the CLI):
config system global

set vdom-mode multi-vdom
end
The following diagram illustrates this design:
Notes:
• With this design, each flow is processed twice by the FortiGate, creating two independent
sessions: one from the LAN to the VRF link and another from (the other side of) the VRF link to
the WAN. The first session belongs to the CE VRF, the second - to the PE VRF.
• This design is used for both Direct and Remote Internet Access: once the traffic reaches the PE
VRF, it can egress either via an underlay (DIA) or via an overlay (RIA). Note that in the RIA case the
traffic reaches the Hub already in the PE VRF context (not mixing up with the internal CE VRF
traffic on the Hub).
• Notice the SD-WAN icon on the diagram, in the PE VRF. This is where the SD-WAN rules are
applied. We can select the best breakout option in the same way we do it in the “usual” (single-
VRF) deployments. The VRF links do not need to be configured as SD-WAN members, they
become ingress interfaces from the SD-WAN perspective. Only the session in the PE VRF is
subject to the SD-WAN processing.
• It is not the same for the Firewall Policies, because each of the two sessions must be
independently permitted by the Firewall rules. Therefore, we will need to create separate rules for
the CE VRF session and for the PE VRF session.
• Finally, we need to address the routing carefully:
◦ CE VRFs can simply have a static default route towards the VRF link, directing all the Internet
traffic towards the PE VRF.
◦ It is more tricky in the opposite direction, because the replies arriving from the Internet to the
PE VRF must find their way back to the right CE VRF. We could copy the LAN prefixes from the
CE VRFs to the PE VRF, but this would hardly be a good idea: the whole point of the VRFs is to
keep the LAN prefixes segregated inside their respective segments!
A better approach is to NAT the traffic on the VRF links. This way, the PE VRF doesn’t need to
know the LAN prefixes.
Source NAT on the VRF link is supported only when the target VRF is not 0. This is another reason
why you should not use VRF=0 as your PE VRF!
◦ Finally, there will be the “usual” Source NAT on the egress from the PE VRF (that is, before the
traffic leaves the Edge device). In other words, we are going to NAT twice!
It sounds like the configuration is going to be quite complex… Luckily, our Jinja Templates will hide
most of this complexity.
3.4.2. VRF Links
Navigate to Provisioning Templates -> CLI Templates and edit your Project Template. Inside the
profile, find the LAN-facing interface for the Education segment (“port5”) and add the following
parameter to it:
…
{
'vrf': 11,
'allow_dia': true,
'ip': lan_ip_edu
},
…
Save the changes. Then navigate to the device list, select all the devices and reinstall the
configuration (“Quick Install (Device DB)”).
Check the routing table on “site1-1” now - and you will find the new default route in the Education
segment (VRF=11):

...
S* 0.0.0.0/0 [10/0] is directly connected, vrf11_leak1, [1/0]
...
As you can guess, the Jinja Templates have created the necessary elements, such as:
• A VRF link between VRF=11 and VRF=1:
config system vdom-link
edit "vrf11_leak"
next
end
config system interface
edit "vrf11_leak0"
set vdom "root"
set vrf 1
set ip 10.200.255.23 255.255.255.254
set type vdom-link
next
edit "vrf11_leak1"
set vdom "root"
set vrf 11
set ip 10.200.255.22 255.255.255.254
set type vdom-link
next
end
• A static default route for the VRF=11 (that you saw above):
config router static

edit 10011
set gateway 10.200.255.23
set device "vrf11_leak1"
next
end
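Note how the two ends of the VRF link share a /31 subnet, and how the gateway of the static route is simply the other end of that link. A short Python sketch using the standard ipaddress module illustrates this relationship. The helper name is hypothetical, and we do not claim that the Jinja Templates compute the addresses in this way.

```python
import ipaddress


def vrf_link_ends(ce_side):
    """Given the CE-side address with its mask, return (ce_ip, pe_ip) of the /31 link."""
    iface = ipaddress.ip_interface(ce_side)
    if iface.network.prefixlen != 31:
        raise ValueError("this sketch expects a /31 point-to-point link")
    # A /31 contains exactly two addresses: ours and the peer's.
    peers = [ip for ip in iface.network if ip != iface.ip]
    return str(iface.ip), str(peers[0])


# The CE end (vrf11_leak1) as configured above; the PE end is the gateway.
print(vrf_link_ends("10.200.255.22/255.255.255.254"))
```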
However, our Jinja Templates do not handle the Firewall Policies. So we still need to adjust them.
3.4.3. Firewall Policies for DIA
Navigate to Policy & Objects -> Policy Packages and look at the “Edge” package. You will find the
following rules for the Internet access:
Unfortunately, they are not correct, as we have explained in the design notes above. The traffic will
need to match two separate rules on its way, one inside VRF=11 and another inside VRF=1. Namely:
• Inside the VRF=11: from lan_zone to the VRF link.

This rule must also apply Source NAT (see the design notes above for the explanation - that’s the
NAT on the VRF link!). Note that this part is identical for both DIA and RIA, so just one rule will
suffice.
• Inside the VRF=1: from the other end of the VRF link to the Internet.
Here we will probably need separate rules for DIA and RIA, as you’ll see shortly. Let’s focus on the
DIA case first.
To simplify the Firewall Policy configuration, our Jinja Templates create the following two System
Zones by default:
config system zone

edit "pevrf_leak_zone"
set interface "vrf11_leak0"
next
edit "vrfs_leak_zone"
set interface "vrf11_leak1"
next
end
As you can see, the PE VRF end of the VRF link is placed into the pevrf_leak_zone , while the CE
VRF end is in the vrfs_leak_zone . These same zones will be used for all the created VRF links. With
this in mind, let’s configure the Firewall Policies for DIA as follows:
• Create the two Normalized Interfaces for the above zones:
• Adjust the Firewall Policies as follows:
Note that both Firewall Rules have NAT enabled! That’s right, we are applying Source NAT twice: first
on the VRF link and then again on the egress WAN interface. The first NAT is not visible outside of the
FortiGate - remember that we use it only to allow the PE VRF to forward the replies back to the correct CE
VRF.
Reinstall the policy on all the Edge devices. Then connect to the “client1-1” and test the Internet
access:

^C
2 packets transmitted, 2 received, 0% packet loss, time 1ms
rtt min/avg/max/mdev = 1.481/1.699/1.918/0.222 ms
Our Direct Internet Access now finally works for the Education segment! You can also use the sniffer
on “site1-1” to see how each packet is processed twice, including the effect of our double NAT:
site1-1 # diagnose sniffer packet any "host 8.8.4.4" 4

...
1.840767 port5 in 10.0.1.101 -> 8.8.4.4: icmp: echo request <<< from lan_zone (CE VRF=11)
1.840864 vrf11_leak1 out 10.200.255.2 -> 8.8.4.4: icmp: echo request <<< to vrfs_leak_zone (CE VRF=11)
1.840867 vrf11_leak0 in 10.200.255.2 -> 8.8.4.4: icmp: echo request <<< from pevrf_leak_zone (PE VRF=1)
1.841117 port2 out 192.2.0.9 -> 8.8.4.4: icmp: echo request <<< to underlay (PE VRF=1)
...
1.842789 port2 in 8.8.4.4 -> 192.2.0.9: icmp: echo reply <<< reply from underlay (PE VRF=1)
1.842852 vrf11_leak0 out 8.8.4.4 -> 10.200.255.2: icmp: echo reply <<< reply to pevrf_leak_zone (PE VRF=1)
1.842854 vrf11_leak1 in 8.8.4.4 -> 10.200.255.2: icmp: echo reply <<< reply from vrfs_leak_zone (CE
VRF=11)
1.842884 port5 out 8.8.4.4 -> 10.0.1.101: icmp: echo reply <<< reply to lan_zone (CE VRF=11)
Conclusion: DIA works!
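If the double NAT still feels abstract, the following Python sketch replays the sniffer trace above with two independent NAT tables, one per session. It is a toy model under our own assumptions (hypothetical function names, no ports or ICMP IDs) and says nothing about how FOS implements NAT internally.

```python
def snat(packet, new_src, nat_table):
    """Apply Source NAT and remember the original source for the reply."""
    src, dst = packet
    nat_table[(new_src, dst)] = src
    return (new_src, dst)


def unsnat(reply, nat_table):
    """Reverse the translation for a reply (source and destination are swapped)."""
    src, dst = reply
    return (src, nat_table[(dst, src)])


# One session, and therefore one NAT state, per VRF.
ce_session, pe_session = {}, {}

# Addresses are taken from the sniffer output above.
request = ("10.0.1.101", "8.8.4.4")
on_vrf_link = snat(request, "10.200.255.2", ce_session)  # first NAT, in CE VRF=11
on_wan = snat(on_vrf_link, "192.2.0.9", pe_session)      # second NAT, in PE VRF=1

reply = ("8.8.4.4", "192.2.0.9")
back_on_vrf_link = unsnat(reply, pe_session)        # PE VRF only knows the link IP
back_on_lan = unsnat(back_on_vrf_link, ce_session)  # CE VRF restores the LAN host

print(on_wan, back_on_lan)
```

The PE VRF session never sees the LAN address 10.0.1.101, so the PE VRF does not need any route towards the LAN prefixes.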
3.4.4. Firewall Policies for RIA
The same design can also support backhauling the Internet traffic via the Hub (RIA). There is just one
tricky question: which second Source NAT to apply in this case? In other words, what source IP
should the packets have when they leave the Edge device?
If we skip the second Source NAT altogether, the packets will leave with the IP of the VRF link (due to
the first Source NAT). This is not good, because that IP is purely internal, it is not advertised
anywhere, and therefore the replies will never find their way back to the originating Edge!
The most straightforward option would be to NAT the source IP to the egress interface (like we did
for DIA), but unfortunately this is not going to work either - remember, we do not have tunnel IPs
anymore!
Our suggestion is to use the loopback IP! That same loopback that we use for BGP (“Lo”). After all, it
is already reachable via the overlays, so the replies will find their way back to the right Edge device!
• First of all, navigate to Policy & Objects -> Object Configurations -> Advanced -> Metadata
Variables and set the default value for the variable loopback . It doesn’t matter which value you
set, because we override it on a per-device basis anyway. But we need the default value to avoid
validation errors, when we use this variable in the IP Pool configuration on the FortiManager. Let’s
put a value “10.200.0.0”:
• Now we can create an IP Pool for our Source NAT. Under Object Configurations, navigate to
Firewall Objects -> IP Pools and create a new IPv4 Pool called “Lo-Pool” with the IP range set
using our variable:
• Add a new Firewall Rule for RIA to the Policy Package “Edge”:
Install the policy on all the Edge devices.
In order to test the RIA, let us add a new SD-WAN rule steering our test PING traffic to the overlay.
Edit the SD-WAN Template “Edge-DualISP” and add the following rule:
Parameter              Value
Name                   Remote-Internet-Access
Destination Address    “all”
Protocol               “1” (ICMP)
Strategy               Manual
Interface Preference   H1_ISP1
Save the Template and install the configuration on all the Edge devices (“Quick Install (Device DB)”).
From “client1-1”, try pinging an Internet destination, such as 8.8.4.4:

^C
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 2.349/3.408/4.467/1.059 ms
On “site1-1”, use the sniffer to see how the packets flow:
site1-1 # diagnose sniffer packet any "host 8.8.4.4" 4

...
2.620153 port5 in 10.0.1.101 -> 8.8.4.4: icmp: echo request
2.620304 vrf11_leak1 out 10.200.255.2 -> 8.8.4.4: icmp: echo request
2.620311 vrf11_leak0 in 10.200.255.2 -> 8.8.4.4: icmp: echo request
2.621957 H1_ISP1 out 10.200.1.1 -> 8.8.4.4: icmp: echo request
...
2.626074 H1_ISP1 in 8.8.4.4 -> 10.200.1.1: icmp: echo reply
2.626136 vrf11_leak0 out 8.8.4.4 -> 10.200.255.2: icmp: echo reply
2.626137 vrf11_leak1 in 8.8.4.4 -> 10.200.255.2: icmp: echo reply
2.626164 port5 out 8.8.4.4 -> 10.0.1.101: icmp: echo reply
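Conclusion: RIA works! The echo request leaves via H1_ISP1 with its source translated to 10.200.1.1
(the address from “Lo-Pool”), and the echo reply returns over the same overlay.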