
Document change control

The latest version of this document is available on FortiVision:


FortiVision → GCSS → TAC → TAC related trainings → Hardware acceleration for TAC

Date Changelog Authors

15 Sep 2016 Draft from ex Hardware Acceleration doc + eXpert Academy 2016 + other additions. Used for training remote session Cedric Gustave

21 Oct 2016 Additions from feedback, more references, details on platform without ISF, new platform example (2500E, 1500DT), np6 session update new behavior (here) Cedric Gustave

26 Oct 2016 Detailed SoC2 section (link) Cedric Gustave

04 Nov 2016 Fix incorrect info on NP6 SA based on testings from Stephan on FGT1500D (link) + adjusted paging for printing + review from Laurent Cedric Gustave

12 dec 2016 Update np6 session update from Mantis #386626 Cedric Gustave

16 dec 2016 Example of FortiGate-1500D for NP6/N-Turbo/Ipsengine interrupts and core mapping (missing diagram) (link) Cedric Gustave

30 Mar 2017 Example of FortiGate-1500D for npu-vlink XAUIs used for egress first NP6 and ingress second NP6 Cedric Gustave

24 July 2017 Update with notes/documents from FortiVision Hardware training page + details on NP6 lags confirmed by Yi Cedric Gustave

10 Nov 2017 Understanding EHP drops (link) + NP6 shaping protection summary (link) Cedric Gustave

04 Jan 2018 NP6 session update + statistic counters (link + link) + hardware acceleration with 2 asymmetric wan interfaces (link) + EHP drops with lags (link) Cedric Gustave

29 May 2018 LAG enhancement (set lag-sw-out-trunk enable) explanation for both NP module and non NP module devices (link) Cedric Gustave

Table of contents

Hardware Fundamentals
PCI Bus ­ Peripheral Component Interconnect
PCI­E Bus : Peripheral Component Interconnect – Express (also PCIe)
Bus Interrupt ReQuest ­ IRQ
Advanced Programmable Interrupt Controller ­ APIC
Message Signals Interrupts ­ MSI­X
Internal and external data transmission

Hardware accelerators families overview


Families
List hardware accelerators on a unit

Hardware acceleration features


ipv4 unicast session acceleration
Link aggregations
DSCP marking
Per session traffic accounting and traffic distribution
ipv6 unicast session acceleration
IPv4, IPv6 tunneling and translation
IPSec encryption/decryption and hashing
Passthrough ESP session acceleration
inter vdom (npu­vlink) traffic acceleration
IPS traffic fast­path (N­turbo acceleration) IPSA (IPS acceleration)
HA A­A load­balancing
Traffic shaping
Syn proxy
HPE protection
IPv4 multicast session acceleration

IPv6 multicast session acceleration
SCTP traffic hardware acceleration
CAPWAP data (not DTLS) hardware acceleration (259431)
fp­anomaly
Deprecated : IPS anomaly and signature (XH0/XG2)
Features breaking hardware acceleration

NP chips
NP6
From the outside
Form factors
Integration
Integration with a switch fabric
This platform is similar to the classical 1500D but has 4x10G RJ45 copper ports
Integration without a switch fabric
NP6 Performance figures
Improvements from NP4
From the inside
Functional blocks
Traffic flow examples
Configuration options impacting NP6
NP6 monitoring additions for drift sessions (diag and SNMP)
NP6 IPsec out­of­order and sub­engine settings
References :
NP6 limitations and bugs
Understanding EHP drops
NP6 shaping protection summary
N­Turbo NP6 IRQ mapping
Diag commands and counters
diag npu np6 fastpath <enable*|disable> <np6_id>
diag npu np6 dce (dce­all) <np6_id>

List of functional modules referred in NP6 drop counters
DCE TABLE 0 : HRX drops
DCE TABLE 1 : Anomaly drops
diag npu np6 anomaly­drop (anomaly­drop­all) <npu_id>
diag npu np6 hrx­drop (hrx­drop­all) <npu_id>
diag npu np6 session­stats (session­stats­clear) <npu_id>
diag npu np6 sse­stats (sse­stats­clear)
diag npu np6 xgmac­stats (xgmac­stats­clear) <npu_id>
diag npu np6 gmac­stats (gmac­stats­clear)
diag npu np6 gige­port­stats (gige­port­stats­clear) <port_name>
diag npu np6 port­list
diag npu np6 ipsec­stats (ipsec­stats­clear)
diag npu np6 eeprom­read <np6_id>
diag npu np6 npu­feature
diag npu np6 register
diag npu np6 synproxy­stats
Design recommendations
Limitations and workarounds, fixed bugs
SoC3 (NP6 light)
NP4
From the outside
Form factors
Performance figures
Integration
IRQ distribution
From the inside
Configuration options impacting NP4
Diag commands
Limitations and workarounds, fixed bugs
SoC2 (NP4 lite + CP8 lite)

Limitations and workarounds, fixed bugs
FortiGate­3700DX overview
FortiCarrier (Carrier Grade Nat) overview

Reference websites and documents

Document objective
The objective of this document is to provide TAC engineers with an up-to-date reference on FortiGate hardware acceleration.
It is expected to be updated frequently to keep up with new knowledge and new products.

Document’s history
The document is sourced from the document “FortiGate hardware acceleration components and architectures”, started in May 2005. That
document had become too big because of the addition of various side topics, and contained too much information about legacy products. The official
FortiGate “Hardware Acceleration” guide has also been improved and now contains more detail than before. As a result, this document covers
fewer topics and focuses on what is not already covered, or not covered in enough detail, elsewhere.

Disclaimer
Information in this document cannot be guaranteed 100% correct, for several reasons. First, implementations are subject to change, so information can
be correct initially but become obsolete or wrong later. Second, inputs come from different sources (lab test results, Mantis
entries, bits of information or experience shared by Fortinet colleagues…) that may be valid in their context but may turn out to be wrong in a more
general or different context, or simply could not be verified.

Feedback
Feedback such as pointers or shared information is of course welcome, and necessary to keep the content of this document pertinent.
You can provide your feedback through FortiVision bug notes at the location of the document. Thank you in advance for your contribution!

Confidentiality

This document should not be shared externally; it contains internal references and content about Fortinet technology.
For external communication, the official FortiGate Hardware Acceleration documentation at http://doc.fortinet.net should be used instead.

Hardware Fundamentals
PCI Bus - Peripheral Component Interconnect
● The PCI bus provides a communication channel between a PCI peripheral device and the main system
→ ex: delivers data from a network card to the computer operating system
● It is a common standard that facilitates device integration and vendor interoperability
● It is based on a parallel interface (parallel lines of data synchronized by a common clock reference)
● Requires wiring layers on the motherboard
● Half duplex
● Multiple revisions and technologies
● Each device announces itself on the bus (see lspci)
● Each device needs an IRQ (Interrupt Request) to speak with the CPU
● PCI bridge: IRQs are managed at the bus bridge => possible to share an IRQ between PCI devices
● PCI-X - PCI eXtended: extension of PCI (complex wiring and expensive)

Bus Type    Bus Width   Bus Speed   Bandwidth
PCI         32 bits     33 MHz      132 MB/s
PCI         64 bits     33 MHz      264 MB/s
PCI         64 bits     66 MHz      512 MB/s
PCI-X       64 bits     133 MHz     1 GB/s
PCI-X 2.0   64 bits     266 MHz     2.15 GB/s
PCI-X 2.0   64 bits     533 MHz     4.3 GB/s
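The bandwidth column follows directly from bus width × clock rate. A quick sanity check (plain Python, nothing FortiGate-specific; published tables sometimes round the results slightly differently):

# Theoretical PCI/PCI-X throughput: bus width (bits) * clock (MHz) / 8 bits-per-byte -> MB/s
for width, mhz in ((32, 33), (64, 33), (64, 66), (64, 133)):
    print(f"{width}-bit @ {mhz} MHz ~= {width * mhz / 8:.0f} MB/s")
# 32-bit @ 33 MHz ~= 132 MB/s ... 64-bit @ 133 MHz ~= 1064 MB/s (~1 GB/s)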

PCI-E Bus: Peripheral Component Interconnect Express (also PCIe)
● Developed by Intel, faster than PCI
● High speed serial bus (simpler to implement than parallel, fewer pins)
● Point-to-point technology: each device is meshed (+ connection with the host)
○ non-shared media, multiple access
○ up to 32 lanes multiplexed between 2 devices
○ adding end-points does not affect performance
● The capability of using multiple lanes is advertised by each device and negotiated
● The PCI-E standard allows x1, x4, x8, x16, x32 (not common) multiplexed lanes
● Bi-directional (full-duplex) between every end-point
● Packet based data encapsulation
● Supports INTx, MSI, MSI-X interrupts

Bus Type             Date               Bandwidth per lane, per direction
PCI-E 1.0 x1         2003               250 MB/s
PCI-E 2.0 (2.1) x1   2007               500 MB/s
PCI-E 3.0 x1         2010               985 MB/s
PCI-E 4.0            expected in 2017   1.969 GB/s
PCI-E 5.0            far future         3 or 4 GB/s
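The per-lane figures multiply with the negotiated lane count, so an x4 or x8 link is 4 or 8 times faster per direction. A minimal illustration (the lane counts below are chosen arbitrarily for the example):

# Per-direction throughput = per-lane rate (from the table above) * negotiated lane count
per_lane_mb = {"PCI-E 1.0": 250, "PCI-E 2.0": 500, "PCI-E 3.0": 985}
for gen, lanes in (("PCI-E 2.0", 4), ("PCI-E 3.0", 8)):
    print(f"{gen} x{lanes}: ~{per_lane_mb[gen] * lanes} MB/s per direction")
# PCI-E 2.0 x4: ~2000 MB/s ... PCI-E 3.0 x8: ~7880 MB/s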

Bus Interrupt ReQuest - IRQ
Whenever a device needs to send data through a bus, it must first raise an interrupt request (IRQ) to the bus controller.
This IRQ is delivered to the destination (the kernel in our case) to signal that data is available to be pulled.
Data packets wait in the sender's FIFO queue until they are pulled (for instance on the network interface).
If data is not pulled fast enough, the sender queue may become full and start dropping packets.

In the FortiGate implementation, a single IRQ may transfer up to 64 packets. If, after data
has been pulled from the sender queue, packets remain because not all of them could be pulled at once,
there is no need for the sender to trigger another IRQ. This is why the IRQ rate is first linear with the
packet rate, then flattens while the packet rate continues to increase. It is therefore expected that the
interrupt rate does not grow proportionally with the packet rate.
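A rough model of this behavior (illustrative only; the 64-packet batch size is the only figure taken from the text above, the service limit is an arbitrary assumption):

# Toy model of IRQ coalescing: one IRQ may flush up to 64 queued packets.
BATCH = 64

def irq_rate(pkt_rate_pps, service_limit_pps=200_000):
    """Approximate IRQ/s for a given packet rate (purely illustrative numbers)."""
    if pkt_rate_pps <= service_limit_pps:
        return pkt_rate_pps            # linear region: roughly one IRQ per packet
    # Saturated region: packets queue up and each IRQ drains a batch (up to BATCH),
    # so the IRQ rate stays roughly flat while the packet rate keeps growing.
    return max(service_limit_pps, pkt_rate_pps / BATCH)

for pps in (50_000, 200_000, 1_000_000, 5_000_000):
    print(f"{pps:>9} pkt/s -> ~{int(irq_rate(pps)):>7} IRQ/s")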

With multiprocessor architectures like FortiGates, interrupts need to be distributed amongst
the cores using APIC or MSI-X techniques. The per-core interrupt counters can be displayed with the FortiGate command
‘diagnose hardware sysinfo interrupts’ (or ‘fnsysctl cat /proc/interrupts’).

Advanced Programmable Interrupt Controller - APIC

● Intel chipset on SMP motherboards
● CPU and I/O APIC chips connected via the APIC bus
● For inter-core/CPU interrupts
● For core/CPU to external device interrupts
● Interrupts may be shared by multiple devices

When APIC is used, ‘diag hard sys interrupts’ refers to ‘IO-APIC-edge’ or ‘IO-APIC-level’.

Message Signaled Interrupts - MSI-X

● No dedicated chipset; uses messages sent over PCI-E or PCI 3.0
● Interrupts are emulated via special message packets sent on the bus
● Flexible distribution of interrupts across cores/CPUs
● IRQ-to-CPU affinities are defined in a kernel configuration file
● can be redefined by the user via /proc (dangerous! not recommended)
● high bandwidth devices may distribute interrupts to multiple cores using multiple IRQs (like NP4 or NP6)

When MSI-X is used, ‘diag hard sys interrupts’ refers to ‘PCI-MSI-edge’.

IRQ mapping

IRQ mapping, also called affinity, maps for each IRQ id which CPU core should be used. There are commands available to find the affinity map :

● diagnose hardware sysinfo interrupts


○ provides stats on number of IRQ raised per core
○ is not helpful if no traffic is flowing (all counters are 0 if no IRQ were generated)
○ is easy to read

FGT1KB­8 # diagnose hardware sysinfo interrupts


CPU0 CPU1 CPU2 CPU3
0: 58370496 0 0 0 IO­APIC­edge timer
2: 0 0 0 0 XT­PIC cascade
3: 4771181 0 0 0 IO­APIC­edge serial
4: 4802 0 0 0 IO­APIC­edge serial
7: 0 0 0 0 IO­APIC­edge LCD_KEYPAD
8: 0 0 0 0 IO­APIC­edge rtc
16: 7737 0 0 0 IO­APIC­level ehci_hcd, usb­uhci, usb­uhci, ipsec0, port39
17: 15 0 0 0 IO­APIC­level libata, usb­uhci, usb­uhci, bcm56319_0, port40
18: 6 0 0 0 IO­APIC­level ehci_hcd, usb­uhci, usb­uhci, bcm56319_1
64: 30070773 0 0 0 PCI­MSI­edge np4_0
65: 0 0 0 0 PCI­MSI­edge np4_0
66: 0 0 3 0 PCI­MSI­edge np4_0
67: 0 0 0 1 PCI­MSI­edge np4_0
68: 0 0 0 0 PCI­MSI­edge np4_0
69: 0 0 0 0 PCI­MSI­edge np4_0
70: 0 0 3 0 PCI­MSI­edge np4_0
71: 0 0 0 1 PCI­MSI­edge np4_0
72: 1 0 0 0 PCI­MSI­edge np4_1
73: 0 0 0 0 PCI­MSI­edge np4_1
74: 0 0 0 0 PCI­MSI­edge np4_1
75: 0 0 0 0 PCI­MSI­edge np4_1
76: 0 0 0 0 PCI­MSI­edge np4_1
77: 0 0 0 0 PCI­MSI­edge np4_1
78: 0 0 0 0 PCI­MSI­edge np4_1
79: 0 0 0 0 PCI­MSI­edge np4_1
NMI: 58370370 58370466 58370437 58370408
LOC: 58370042 58370010 58370045 58370043
ERR: 0
MIS: 0

Comments:
● It is easy to see which devices are using APIC or MSI-X
● some devices may share the same IRQ id, ex: ipsec0 (aka CP6 on this unit) and port39
● choosing port39 and port40 to send a high packet rate is a poor choice as they are only mapped to 1 core, compared to the NP4 enabled
ports
● devices using APIC seem to have all their IRQs mapped to CPU0
● We see that the first NP4 (np4_0) is using 8 IRQs, from 64: to 71:. However, because no traffic has flowed through the ports since the reboot,
it is hard to verify that all interrupts from np4_0 were balanced on different cores
● We clearly see the 2x NP4 of the FortiGate-1000B (np4_0 and np4_1)

● ‘diagnose system cpuset interrupt <irq_id>’

The output of the command references a CPU mask in hexadecimal format; the mask for CPUn is 2^n, as per the following table:

CPU_id  CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8   CPU9   CPU10  CPU11  CPU12   CPU13   CPU14   CPU15
mask    0x1   0x2   0x4   0x8   0x10  0x20  0x40  0x80  0x100  0x200  0x400  0x800  0x1000  0x2000  0x4000  0x8000

CPU_id  CPU16    CPU17    CPU18    CPU19    CPU20     CPU21     CPU22     CPU23     CPU24      CPU25      CPU26      CPU27      CPU28       CPU29       CPU30       CPU31
mask    0x10000  0x20000  0x40000  0x80000  0x100000  0x200000  0x400000  0x800000  0x1000000  0x2000000  0x4000000  0x8000000  0x10000000  0x20000000  0x40000000  0x80000000

An output like ‘The cpuset of irq-114 is 0xffffffffffffffff’ means that the IRQ can be distributed to any CPU core.
In our example from the FGT-1KB-8, we want to verify the np4_1 IRQ affinity because all of its counters show ‘0’ in the interrupt list.
We therefore want to check interrupts 72: to 79: :
FGT1KB-8 # diagnose sys cpuset interrupt 72
The cpuset of irq-72 is 0x1.
FGT1KB-8 # diagnose sys cpuset interrupt 73
The cpuset of irq-73 is 0x2.
FGT1KB-8 # diagnose sys cpuset interrupt 74
The cpuset of irq-74 is 0x4.
FGT1KB-8 # diagnose sys cpuset interrupt 75
The cpuset of irq-75 is 0x8.
FGT1KB-8 # diagnose sys cpuset interrupt 76
The cpuset of irq-76 is 0x1.
FGT1KB-8 # diagnose sys cpuset interrupt 77
The cpuset of irq-77 is 0x2.
FGT1KB-8 # diagnose sys cpuset interrupt 78
The cpuset of irq-78 is 0x4.
FGT1KB-8 # diagnose sys cpuset interrupt 79
The cpuset of irq-79 is 0x8.
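These hexadecimal cpuset values are plain bitmasks (bit n set means CPUn is allowed). A small helper, runnable in any Python interpreter (the function names are arbitrary, not FortiOS commands), translates between masks and core lists:

def cpus_from_mask(mask: int) -> list:
    """Return the CPU cores allowed by a cpuset bitmask."""
    return [cpu for cpu in range(64) if mask >> cpu & 1]

def mask_from_cpus(cpus) -> int:
    """Build a cpuset bitmask from a list of CPU core ids."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

print(cpus_from_mask(0x1))                      # [0] -> irq-72 above is pinned to CPU0
print(cpus_from_mask(0x8))                      # [3] -> irq-75 is pinned to CPU3
print(hex(mask_from_cpus(range(16))))           # 0xffff -> CPU0-CPU15
print(len(cpus_from_mask(0xffffffffffffffff)))  # 64 -> 'any CPU core'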

● Listing with /proc
Another, less user-friendly, alternative is to get the information directly from /proc:

fnsysctl cat /proc/irq/<irq_id>/smp_affinity

Example for IRQ 50 :


# fnsysctl cat /proc/irq/50/smp_affinity
ffffffffffffffff

References : #137691, #294145

● Remapping: ‘diagnose system cpuset interrupt <irq_id> <cpu_mask>’
Disclaimer! Remapping IRQ affinities is generally dangerous and should not be done on customers’ units!

● A temporary affinity remap can be done through a ‘diag’ command. This change does not survive a reboot. Reference #191000.

diagnose sys cpuset interrupt <irq id> <cpu mask>


Example: ‘diagnose sys cpuset interrupt 65 efff’ remaps IRQ 65: to the CPU range CPU0-CPU11, CPU13-CPU15:

0xEFFF = 1110 1111 1111 1111

bit    1     1     1     0     1     1     1    1    1    1    1    1    1    1    1    1
core   CPU15 CPU14 CPU13 CPU12 CPU11 CPU10 CPU9 CPU8 CPU7 CPU6 CPU5 CPU4 CPU3 CPU2 CPU1 CPU0

● Permanent port-to-CPU-core remapping:

config system npu → config port-cpu-map → edit <port> → set cpu-core <CPU_id>
For now there seems to be no way to permanently remap IRQ affinities. There is however an interesting feature worth mentioning that impacts
IRQ distribution. It is documented in #272428: the feature statically maps a port to a single host RX queue, and
therefore to a single CPU core. This feature clearly breaks the natural CPU core distribution of traffic received on that port.
It is available on NP6 only, and not on all platforms.

config system npu


config port­cpu­map
edit interface "port1"
set cpu­core 1
next
end
end

IRQ impact on CPU

Each IRQ is processed by an assigned CPU. When triggered, it burns CPU resources on the mapped CPU core.

In older releases (e.g. 5.2.x), the cost of IRQs was accounted generically as ‘system’ CPU, just like any other kernel processing. This was problematic because it did
not allow measuring the cost of the IRQs separately from the cost of packet processing in the kernel itself.

In newer releases (5.4.x), dedicated categories were added: ‘irq’ for hardware interrupt handling and ‘softirq’ for software interrupt handling.
After this change, the ‘system’ indicator is no longer polluted by the IRQ cost.

FGT1KB-8 # get sys status
Version: FortiGate-1240B v5.2.8,build0727,160629 (GA)

FGT1KB-8 # get sys performance status
CPU states: 0% user 0% system 0% nice 100% idle
CPU0 states: 0% user 0% system 0% nice 100% idle
CPU1 states: 0% user 0% system 0% nice 100% idle
CPU2 states: 0% user 0% system 0% nice 100% idle
CPU3 states: 0% user 0% system 0% nice 100% idle

FG900D-4 # get sys status
Version: FortiGate-900D v5.4.1,build1064,160608 (GA)

FG900D-4 # get sys performance status
CPU states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU0 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU1 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU2 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU3 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq

ASICs and FPGAs

Both ASICs and FPGAs are hardware acceleration chips used in FortiGate platforms. Each has its own specificities, which make them suitable for different
contexts.

● ASIC - Application-Specific Integrated Circuit

ASICs are very fast (faster than FPGAs), which is why they are used as network processors (NP), but their development is complex, long (a couple
of years) and costly. Once designed, the price per unit is very low for massive quantity orders.
Once burned into the silicon, an ASIC ‘program’ can’t be changed because it is made of electronic logic units wired to each other.
Consequently, some bugs may not be fixable at all, nor can new features be added. Sometimes bugs can be fixed by adjusting the vast number of available
settings controlling the hardware. In other cases, changing the behaviour to address a serious problem requires a new revision of the chip, called a ‘respin’. This is
avoided as much as possible.

● FPGA ­ Field Programmable Gate Array

Unlike ASICs, FPGA logic runs closer to a computer program, which allows reprogramming, including in the field, for instance during a
regular system upgrade of the appliance. FPGAs are not as fast as ASICs; they cost less to design but the price per chip is much higher than for
ASICs, so a high volume of chips costs far more. They are suitable for low volumes, i.e. for very specific features not deployed on a high number
of appliances.
FPGAs are actually used during the design phase of an ASIC to test the logic and fix bugs before the silicon phase.
ASICs and FPGAs may be combined to work together, where the FPGA extends the ASIC features via an external call. This is the case for the
FortiGate-3700DX, where an FPGA extends the NP6 capability with GTP and GRE hardware acceleration.

● Fortinet use of ASICs and FPGAs

Fortinet’s approach is to rely primarily on ASICs and occasionally on FPGAs for specific functions running on specific devices. The addition of an FPGA can
be a temporary solution before the logic is added to the next version of the ASIC.
All NPs and CPs are based on ASICs.
The legacy ‘SP’, FortiDDoS ‘TP2’, FortiController ‘DP’, FortiGate-3700DX TP2 (GRE, GTP) and FortiCore are based on FPGAs.

Internal and external data transmission
Several other components are required to get packets transmitted internally between the different components such as physical port and NPs as well as
outside the unit on Ethernet network.

● The Internal Switch Fabric ­ ISF


The ISF is the central point where packets are exchanged internally between the different components. It connects
physical ports to NP chips and provides the interconnection between NPs. It may also allow the attachment of interface modules.
The ISF is generally made of a single non-blocking integrated 10G switching chip from Broadcom. It may connect 10G and
1G endpoints. The main goal of the ISF is fast, reliable, non-blocking 10G connectivity; however some switch features
such as link aggregation, priority queuing, packet buffering and flow control, and statistics reporting are also used from the chip.
The chip runs its own micro-system and comes with software drivers driven by FortiOS. Like other components, it is attached to
the PCI bus, from where it is controlled. It is initialized at boot time.
An ISF is also required to support 40G or 100G interfaces (FortiGate-3700D): in this case, an internal LAG configured on the ISF
bundles a group of 10G ports. This LAG is not visible to users.
Port-down detection may also be done from the ISF.
Not all FortiGates are built around an ISF, for instance SoC-based FortiGates (everything is integrated in the SoC) as well as mid-range NP6 devices (see NP6
integration). Such platforms therefore do not benefit from the ISF features, such as flow control and priority queueing, and require physical attachments
between NPs if more than 1 NP is used.

● FortiTag
Proprietary internal switch tagging labels are appended to packets by the ISF; they are used to forward packets between interfaces and chips. These
tags are referred to as ‘FortiTags’. When an ISF is used, the NP knows how to add and remove the right FortiTags depending on where the packet is sent.

● LIF ­ Logical InterFace


NPs don’t know about the physical ports of a FortiGate (like mgmt, port1…); instead NPs deal with Logical Interfaces (LIFs). VLANs can be associated with
a LIF to define a virtual interface. For an NP, every possible entry/exit point is associated with a LIF. This can be a host interface (to the kernel), a link to another
NP, or a tunnel interface. Multiple LIFs can be combined into a trunk LIF. When an ISF is used, the NP maintains a mapping between ISF FortiTags and LIFs.

● PHY
The ISF chip does not provide all the low-layer functions needed to send a packet on the wire according to the Ethernet standard.
This is not needed for the communication between the internal components, which use other physical standards.
For a packet to leave the FortiGate through a physical interface, some electronics are required to build an electrical signal compliant with the Ethernet
standard. This is the role of the component called a PHY: it connects the link layer, the MAC (Medium Access Control), to the physical medium such as
optical fiber or a copper port. A PHY chip may encode and decode signals for multiple ports. It is logically located between the ISF and the external port
connectors.

#172299: it is possible to get PHY details for some Broadcom chips.

Use ‘diag hard dev nic <port>’ to identify the sw_port, then use a command like:
fnsysctl cat /proc/net/bcm56820_0/iphy/<sw_port>

rosealnlabfw01p # fnsysctl cat /proc/net/bcm56820_0/iphy/12


Port 12: link Down, speed 1000, duplex Full
ie_ctrl1 (0000)=2040 ie_stat1 (0001)=0002
ie_id1 (0002)=0143 ie_id2 (0003)=bff0
ie_anadv (0004)=0001 ie_pabili (0005)=00a3
ie_an_ext (0006)=0000 ie_ctrl2 (0007)=0000
ie_stat2 (0008)=8000 ie_tx_dis (0009)=0000
ie_rx_sdet (000a)=0000 ie_ext_ab (000b)=0000
ffde (ffde)=0000 800d (800d)=000f
8000 (8000)=2c2f 8372 (8372)=6000
8052 (8052)=04ff FFE4 (ffe4)=00a0
8329 (8329)=0011 832B (832b)=0400

● XAUI
A XAUI (pronounced “zowie”) is a 10G attachment standard, where the X stands for 10 (Roman numeral). It is often used for the NP6 10G chip-to-chip
attachments. The NP6 4x 10G attachments are XAUIs.

● Potentially misleading ‘TX’ and ‘RX’ directions in the chip modules

It may look obvious, but it is actually misleading: some modules on the NP are named from the chip peer's point of view, i.e. as seen from the outside
of the chip. For example, among the host modules handling the PCI host interface communication (towards the kernel), ‘HRX’ handles traffic moving from
the chip to the kernel, whereas the ‘HTX’ module deals with traffic moving from the kernel to the chip.

● Life of a packet in the NP: PBA (Packet Buffer Allocator) & PBUF (Packet Buffer)

The two acronyms PBA and PBUF are directly linked to the life of a packet in an NP.
What happens when a packet enters an NP?
What happens to this packet while it is processed inside the NP?
What happens to a packet leaving the NP?
These are the questions covered here.

When a packet enters the NP, the complete packet is first copied to a central buffer called the Packet Buffer (PBUF). The NP module in charge of
buffering the packets is called the Packet Buffer Allocator (PBA).
In a second step, a packet descriptor is created and stored in memory. The packet descriptor contains the packet L2, L3 and L4 headers only; the payload
is dropped. Because the NP does not work on the packet payload, such a packet descriptor is enough for NP processing.
It is generally this short packet descriptor, and NOT the full packet, that navigates through the NP functional blocks and has its information
modified on the fly (there are exceptions, for instance with IPsec where the payload is needed). Depending on the processing, flags may also be added
to the packet descriptor.
When the packet has to leave the NP, the entire packet is regenerated from the corresponding packet descriptor and the original payload stored
initially.
The lifetime of the packet buffer and of the packet descriptor associated with the packet starts when the packet enters the NP and stops when the packet has left
the NP. If for some reason the packet buffer or PDQ entry is not freed when the packet has left the NP or has been dropped, a PBA leak occurs.

● Packet Descriptor Queue : FIFO queues of packet descriptors with priority

NP and CP chips contain a lot of FIFO (First In, First Out) queues. Chips are made of modular functional blocks where packets transit from one block to the
other. Module blocks have ingress and egress packet descriptor queues (PDQ) to store a small amount of packet descriptors. Packet descriptors are then
pushed from one module queue to the other.

In case of a packet burst, or a busy module, packet descriptors are likely to accumulate in the queues. When a queue gets full, packets are dropped. Such
drops are accounted in the NP ‘drop table’ of the Drop Counter Engine (DCE), available with the command ‘diagnose npu np6 dce x’.

Some queues may provide a mechanism to prioritize some types of packets on ingress (more or less chance to enter the queue when the queue starts to
fill up). NP6 can define up to 8 different priorities (internally defined and not visible to the user). For instance, control plane traffic such as ARP, OSPF, BGP,
IKE… may be given a higher chance to enter the queue than normal data traffic. This has nothing to do with the traffic shaping feature; it is not something a
user can modify.
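A toy model of such an ingress-priority FIFO (queue size, threshold and priority values below are invented for illustration; the real NP6 thresholds are internal and not user-visible):

from collections import deque

class PriorityAdmitFifo:
    """Bounded FIFO that keeps admitting high-priority descriptors after low-priority ones are refused."""

    def __init__(self, size=8, low_prio_threshold=6):
        self.size = size                              # total queue capacity
        self.low_prio_threshold = low_prio_threshold  # low priority admitted only below this depth
        self.q = deque()
        self.drops = 0                                # what the DCE drop counters would account

    def enqueue(self, descriptor, priority):
        # priority 0 = control-plane-like (ARP/OSPF/IKE...), 1 = normal data traffic
        limit = self.size if priority == 0 else self.low_prio_threshold
        if len(self.q) >= limit:
            self.drops += 1                           # queue full for this priority -> drop
            return False
        self.q.append(descriptor)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

fifo = PriorityAdmitFifo()
for i in range(10):
    fifo.enqueue(f"data-{i}", priority=1)             # burst of normal traffic: 6 admitted, 4 dropped
fifo.enqueue("ospf-hello", priority=0)                # still admitted while the queue is below 8
print(len(fifo.q), fifo.drops)                        # -> 7 4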

Hardware accelerators families overview
Families
We can split the hardware accelerators in multiple families :

● Network processors (NP)


○ Directly attached to network interfaces (direct to NP or through ISF)
○ ASIC chip
○ Performance in packets forwarding offload
○ Provide additional simple features

● Security processors (SP)


○ Directly attached to network interfaces (direct to SP or through ISF)
○ Integrated systems on a board (CPU, memory...)
○ Performance in complex features
○ Packet forwarding offload
○ Become legacy : more service capability on the NP

● Content processors (CP)


○ Not bound to interface, closer to applications
○ ASIC chips, can be seen as co-processors
○ Offloads of specific CPU intensive operations

● Integrated processors (SoC, DP, TP, XP...)


○ System on Chip (SoC, SoC2, SoC3) : light CPU + light NP + light CP
○ Distribution processor (DP) : used in FortiController for session load­balancing
○ FortiDDoS (FPGA TP) : Specialized DDoS functions
○ FortiGate­3700DX – GRE, GTP acceleration with TP2 FPGA

List hardware accelerators on a unit
To find the hardware accelerators in a particular platform, several commands can be used :

● get hardware status


This is the most user friendly command for customers.

FG1K5D # get hardware status
Model name: FortiGate-1500D
ASIC version: CP8
ASIC SRAM: 64M
CPU: Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz
Number of CPUs: 12
RAM: 15973 MB
Compact Flash: 30653 MB /dev/sda
Hard disk: 114473 MB /dev/sdb
USB Flash: not available
Network Card chipset: Broadcom 570x Tigon3 Ethernet Adapter (rev.0x5717100)
Network Card chipset: FortiASIC NP6 Adapter (rev.)

FG900D # get hardware status
Model name: FortiGate-900D
ASIC version: CP8
ASIC SRAM: 64M
CPU: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz
Number of CPUs: 4
RAM: 15978 MB
Compact Flash: 1925 MB /dev/sda
Hard disk: 244198 MB /dev/sdb
USB Flash: not available
Network Card chipset: Intel(R) PRO/1000 Network Connection (rev.0000)
Network Card chipset: FortiASIC NP6 Adapter (rev.)

Good things about the command:
● simple, easy to read
● provides more than just PCI components (CPU, RAM…)

Warning:
● No quantity: both the 1500D and the 900D have multiple NP6 but there is only one line “FortiASIC NP6 Adapter”. Same for CP8

● diag hard lspci ­v (or even ­vv or ­vvv)


This command provides the complete list of the devices registered on the PCI bus. This list is built during unit boot-up, when all components register
themselves on the bus.
● To increase the verbosity of the command use -v, or -vv / -vvv for even more verbose output
● Check for potential hardware issues
We have seen in the field that a dead component such as a CP can cause the FortiGate to misbehave, especially with SSL or IPsec. If the
component is no longer listed in the lspci output, this is very likely a hardware issue. Make sure you count/see the full list of components (a
unit may have more than one NP6; count them all if you have a doubt).
From experience, a reboot may temporarily bring the component back in the list, but the failure is likely to happen again.
If you miss a component in the lspci list, gather the output as proof and trigger an RMA.

● I see “Unknown device” in lspci output, should I be worried ?
example: 02:00.0 Network and computing encryption device: Unknown device 1a29:4338
No. It only means that a text definition is missing in the pci components device description field for this PCI Id. This field is not always up­to­date.

● Well known Fortinet PCI identifiers

The following table lists well known Fortinet PCI device IDs and the description provided in the lspci output.

PCI ID      Name            lspci output                                               Comments
1a29:4006   CP6             Network and computing encryption device: Unknown device   -
1a29:4307   CP7             Network and computing encryption device: Unknown device   -
1a29:4338   CP8             Kernel driver in use: cp8                                  -
-           CP9             -                                                          -
1a29:0701   FA2 (rev1)      -                                                          -
1a29:0702   FA2 (rev2)      -                                                          -
1a29:0703   NP2             -                                                          -
1a29:0702   NP4             -                                                          -
1a29:4339   SoC2/NP4light   -                                                          -
1a29:4e36   NP6             -                                                          -
-           SoC3/NP6light   -                                                          -
10b5:8114   SP1 (CE4)       -                                                          -
10e3:8114   SP2 (XG2)       -                                                          -
184e:1004   SP3 (XH0)       -                                                          -

● lspci output sample

FG900D­4 # diagnose hardware lspci FG900D­4 # diagnose hardware lspci ­v


00:00.0 Class 0600: Device 8086:0c08 (rev 06) 00:00.0 Class 0600: Device 8086:0c08 (rev 06)
00:01.0 Class 0604: Device 8086:0c01 (rev 06) Flags: bus master, fast devsel, latency 0
00:01.1 Class 0604: Device 8086:0c05 (rev 06) Capabilities: [e0] Vendor Specific Information: Len=0c <?>
00:01.2 Class 0604: Device 8086:0c09 (rev 06)
00:14.0 Class 0c03: Device 8086:8c31 (rev 05) 00:01.0 Class 0604: Device 8086:0c01 (rev 06)
00:1a.0 Class 0c03: Device 8086:8c2d (rev 05) Flags: bus master, fast devsel, latency 0, IRQ 5
00:1c.0 Class 0604: Device 8086:8c10 (rev d5) Bus: primary=00, secondary=01, subordinate=04, sec­latency=0
00:1c.1 Class 0604: Device 8086:8c12 (rev d5) Memory behind bridge: ef600000­ef7fffff
00:1c.2 Class 0604: Device 8086:8c14 (rev d5) Capabilities: [88] Subsystem: Device 8086:0000
00:1d.0 Class 0c03: Device 8086:8c26 (rev 05) Capabilities: [80] Power Management version 3
00:1f.0 Class 0601: Device 8086:8c56 (rev 05) Capabilities: [90] MSI: Enable­ Count=1/1 Maskable­ 64bit­
00:1f.2 Class 0106: Device 8086:8c02 (rev 05) Capabilities: [a0] Express Root Port (Slot+), MSI 00
00:1f.3 Class 0c05: Device 8086:8c22 (rev 05)
01:00.0 Class 0604: Device 111d:806c (rev 02) 00:01.1 Class 0604: Device 8086:0c05 (rev 06)
02:02.0 Class 0604: Device 111d:806c (rev 02) Flags: bus master, fast devsel, latency 0, IRQ 5
02:03.0 Class 0604: Device 111d:806c (rev 02) Bus: primary=00, secondary=05, subordinate=05, sec­latency=0
03:00.0 Class 1000: Device 1a29:4338 Memory behind bridge: ef800000­ef9fffff
04:00.0 Class 1000: Device 1a29:4338 Capabilities: [88] Subsystem: Device 8086:0000
05:00.0 Class 1000: Device 1a29:4e36 Capabilities: [80] Power Management version 3
06:00.0 Class 1000: Device 1a29:4e36 Capabilities: [90] MSI: Enable­ Count=1/1 Maskable­ 64bit­
07:00.0 Class 0200: Device 8086:10d3 Capabilities: [a0] Express Root Port (Slot+), MSI 00
08:00.0 Class 0200: Device 8086:10d3
09:00.0 Class 0c03: Device 10b5:2380 (rev ab) 00:01.2 Class 0604: Device 8086:0c09 (rev 06)
Flags: bus master, fast devsel, latency 0, IRQ 5
Bus: primary=00, secondary=06, subordinate=06, sec­latency=0
Memory behind bridge: efa00000­efbfffff
Capabilities: [88] Subsystem: Device 8086:0000
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable­ Count=1/1 Maskable­ 64bit­
Capabilities: [a0] Express Root Port (Slot+), MSI 00
../..

● PCI bus address scheme : <bus>:<slot>.<function>


● PCI class codes (Class 0604…) : https://www­s.acm.illinois.edu/sigops/2007/roll_your_own/7.c.1.html
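Since ‘get hardware status’ does not show quantities, one way to double-check is to count the known device IDs (see the table above) in a saved lspci output. A small offline sketch (plain Python run outside FortiOS, not a FortiOS command; the mapping only covers a few IDs from the table):

import collections, re

# A few Fortinet accelerator PCI IDs taken from the table earlier in this section.
KNOWN = {"1a29:4e36": "NP6", "1a29:4338": "CP8", "1a29:0702": "NP4", "1a29:4339": "SoC2/NP4light"}

def count_accelerators(lspci_text):
    """Count Fortinet accelerator chips in a saved 'diagnose hardware lspci' output."""
    counts = collections.Counter()
    for dev_id in re.findall(r"Device ([0-9a-f]{4}:[0-9a-f]{4})", lspci_text):
        if dev_id in KNOWN:
            counts[KNOWN[dev_id]] += 1
    return counts

# Lines taken from the FG900D-4 sample above: 2x CP8 and 2x NP6 are expected.
sample = """03:00.0 Class 1000: Device 1a29:4338
04:00.0 Class 1000: Device 1a29:4338
05:00.0 Class 1000: Device 1a29:4e36
06:00.0 Class 1000: Device 1a29:4e36"""
print(count_accelerators(sample))   # Counter({'CP8': 2, 'NP6': 2})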

Hardware acceleration features
ipv4 unicast session acceleration
This NP feature is probably the most useful one when using hardware acceleration. It may look simple at first, but understanding it in detail
requires some fundamental concepts that are detailed here.

● Relevant hardware components:

Available since the first NP (FA2) and all SPs

● Principle of hardware acceleration:

Session hardware acceleration consists in intercepting packets at an early stage, when they enter the FortiGate, so that they don’t have to be processed
by the main CPUs. The physical packet interception is done by the Network Processor, just after the packet is received from the network interface; however it
is still the kernel that sends the interception order to the NP. The first packets of a new session can’t be accelerated because the kernel needs them for
session creation. The same goes for any packet of the session that would trigger a session state change.

First packets and traffic security profiles:


Just like the first packets of a session, packets of sessions requiring inspection by security profiles can’t be accelerated. Some specific
features enabled on interfaces may also prevent hardware acceleration. Generally, hardware acceleration is disabled whenever the kernel needs to
‘see’ the packets to enforce its filtering. Other types of offload techniques involving the kernel may still be applicable, like N-Turbo acceleration.
Over time, more processing capabilities are added to NPs, so some features applied to traffic may still allow NP hardware acceleration (like SYN proxy
with NP6); the hardware acceleration criteria may therefore differ from one NP to another. Refer to the section of this document detailing each NP.

● Benefits:
○ The goal is to reduce the CPU cost of processing packets for which a session already exists.
○ Significant drop of system CPU usage
○ Reduce the load on PCI buses
○ Shorten the packet latency when traversing the FortiGate

● recognizing an offloaded session

Hardware accelerated sessions are visible in the firewall session list from the line “npu info” at the bottom of the session. Session entry details are
covered in document “FortiGate System” so we only talk about the npu info line here (copy of System Document)

npu info: flag=0x81/0x81, offload=1/1, ips_offload=0/0, epid=2/2, ipid=2/2, vlan=32977/32977

npu info line is optional in a session list entry. It is only displayed when the session is passing through interfaces from the same NP.
epid and ipid are non­zero when offload is taking place.

● flags: The flag field encodes certain attributes of the session as it relates to NPU offload regardless of whether the session is eventually
offloaded. Each bit represents one piece of information
(src: https://askbot.fortinet.com/question/682/what­does­the­flag­field­of­a­session­npu­info­mean )

Bit # Meaning

0 Session has destination MAC

1 If offloaded, perform IPSec

2 If offloaded, perform IPIP encapsulation

3 If offloaded, perform CAPWAP encapsulation

7 Flag information is valid/up­to­date

Example with 0x81:

0x81 = 1000 0001

bit      #7   #6   #5   #4   #3   #2   #1   #0
weight   8    4    2    1    8    4    2    1
value    1    0    0    0    0    0    0    1

Bits #7 (flag information valid) and #0 (session has destination MAC) are therefore set.
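A quick way to decode a flag value is to test each documented bit. The sketch below only uses the bit meanings listed in the table above (plain Python, not a FortiOS tool):

# Bit meanings from the flag table above; other bits are left undecoded.
NPU_FLAG_BITS = {
    0: "session has destination MAC",
    1: "if offloaded, perform IPSec",
    2: "if offloaded, perform IPIP encapsulation",
    3: "if offloaded, perform CAPWAP encapsulation",
    7: "flag information is valid/up-to-date",
}

def decode_npu_flag(flag: int) -> list:
    """Return the meanings of the bits set in an 'npu info' flag value."""
    return [name for bit, name in NPU_FLAG_BITS.items() if flag >> bit & 1]

print(decode_npu_flag(0x81))
# ['session has destination MAC', 'flag information is valid/up-to-date']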

● offload: (forward_direction)/(reverse_direction). 0 when not offloaded

#       Accelerator chip
0       not offloaded
1 (*)   NP1 (FA2), or any chip before #151934
2       NP1A (FA2)
3       NP2
4       NP4
5       SP1/SP2
6       NP4Light/Soc2
7       SP3
8       NP6

(*): See #151934: before 4.3.2, the value ‘1’ in the offload field meant “generically hardware accelerated”, without details on the accelerator chip.

● no­ofld­reason:
Another line was added in the session list to provide more information on the reason why a session passing through NP interfaces is not
accelerated. This line is discussed in Features breaking hardware acceleration

● packets exceptions

Even if session acceleration has been programmed for a session, some packets from the flow may bypass the hardware acceleration and reach the
kernel. Whenever an NP receives a TCP packet with the FIN or RST flag set, it is forwarded to the kernel regardless of whether a forward entry exists. This is
a requirement to allow the kernel to change the session state. The kernel then notifies the NPs to delete the forward entries
associated with the deleted session.
There are other cases where packets are exempted from hardware acceleration:

● packets requiring the sending of an ICMP message (ex: TTL=1)

NPs do not have the capability to generate ICMP error messages, therefore packets requiring error handling are transmitted to the kernel.
(This would need to be confirmed in a lab; the NP may instead drop the packet and only notify the kernel to send the ICMP.)

● Packets arriving fragmented on the inbound interface are pushed to the kernel.

Warning: a packet arriving unfragmented but requiring fragmentation because the outbound MTU is smaller does NOT need to be pushed to
the kernel. This egress fragmentation is performed on the NP itself.

● session revalidation

As documented in the “FortiGate System” document, a routing change or network-related configuration change may cause the removal of hardware
acceleration entries in the NP. This is legitimate: for instance, a routing change may simply route the traffic to another interface, and a config
change may deny traffic which was previously authorized. Session revalidation is a pure firewall concept and also applies to non-hardware-accelerated
sessions. However, it has a consequence on hardware acceleration: whenever the kernel flags a session for revalidation by
applying the “dirty” flag, the corresponding hardware acceleration entries are removed from the NP.

● Impact of hardware acceleration on bandwidth/packet volume and counters

See (Traffic accounting)

● fast­path potential misunderstanding

The term “fast-path” should be avoided when referring to hardware accelerated sessions because it may be interpreted in two ways. There is also a
fast-path in the kernel which has nothing to do with hardware acceleration. When the kernel handles the traffic, it first performs a hash on the packet to
see if a session already exists. If a session matches, the packet takes the (kernel) fast-path because there is no need for a policy lookup. If the packet
does not match a session, it takes the slow-path (routing lookup, policy lookup…). There is also another “fast-path” when using N-Turbo.

● Forward entry:

The forward entries contain the information required by the NP to process hardware accelerated flows. They are stored in fast memory close to the NP
chips. Each forward entry has a key. This key is made by hashing a 5-tuple: src_ip, src_port, dst_ip, dst_port, proto_number, which defines the
flow. Forward entries also carry additional information required when packets need to be recreated for egress, like the original source MAC
address (needed for transparent mode) or the destination MAC address to use. They also contain information necessary to process packets, such as:
● Timestamp (see session keepalive below)
● L3 and L4 information for NAT purposes
● MTU
● IPsec SA reference for encryption/decryption, as well as a reference to the keys for integrity checks
● outbound interface reference: the logical virtual interface (LIF) and the associated vlan telling where to send the packet.
The LIF and vlan distinction is required to find out which vdom the forward entry belongs to, because the forward entry contains no vdom
reference (the vdom is not part of the 5-tuple hash).
● tunneling information (for instance v4/v6 encapsulation in NP6)
● processing action

Overall, a forward entry (for a single direction) is 128 bytes long.

Forward entries are unidirectional, so a hardware accelerated bi-directional flow requires 2 forward entries. If the FortiGate has more than 1 NP,
these two unidirectional forward entries may be programmed on different NPs. Two NPs do not share forward entry tables; the tables are unique to
each NP.

Forward entries are created following a hardware acceleration request from the kernel firewall module when the session is created.
The firewall module does not know how many NPs are available; the request is sent to the ‘NP driver’ module running in the kernel.
The NP driver then takes care of programming the two forward entries on the correct NPs for the session.

● Primary and secondary tables (PHT & OFT):

Forward entry lookup in the NP may be done in 1 or 2 stages and requires 2 tables: the primary table (PHT) and the overflow table (OFT).
The primary table uses a 5-tuple hash that points to an index. The index is one entry in the session table.
The primary table is checked first. The 5-tuple hashing function may return the same key for multiple sessions. In such a situation, a second lookup in
the overflow table is performed to identify which of these sessions corresponds to the processed packet. In this operation, each field of the
forward entries is compared to the packet header until the match is found.
Performance is higher if the primary table is smaller. The maximum number of linked entries in the overflow table is the table “depth”.

The following example shows a table depth of 3, so at least 3 entries share the same session key. A packet hashed to this key may need
to try up to 3 entries until the match is confirmed by comparing the session fields.
● Session search engines (SSE)

The session search engine is at the heart of NP hardware acceleration. It is the functional block inside the NP responsible for computing a hash
for each received packet and finding out whether a forward entry matches it in the tables. When a match is found, the packet is processed according to
the definition of the forward entry. The hash function in the SSE is similar to CRC32 (110 bits). There is generally more than one session search engine
per NP (2 in NP4 and NP6). Different distribution mechanisms exist to balance packets across multiple SSEs; however, all packets belonging to the
same direction of a session must be processed by the same SSE. Before being processed by an SSE, packet descriptors are buffered in a FIFO
queue in front of the SSE. In case of a packet burst, packets may be dropped in the NP if this FIFO queue is full. NPs have ‘diag hard’ commands to report
SSE stats like installed sessions or packets dropped in the queues (refer to the NP chapter).

● Session hardware acceleration setup steps

The setup of hardware acceleration happens in multiple steps. The steps depend on the traffic flow; for instance, a unidirectional flow only programs a
single forward entry, whereas a bidirectional session programs two. The two forward entries are not programmed at the same time. Which packet triggers the
programming of the forward entries also depends on the protocol used:

● UDP: the original direction is programmed after the first packet is seen by the kernel.
The reverse direction is programmed when the first packet from the reverse direction is seen.

● TCP: nothing happens at the reception of the first SYN packet. At the reception of the second packet (SYN/ACK) by the kernel, the reverse
forward entry (Server → Client) is programmed first. Then, when the third packet (ACK, Client → Server) is received, the original
direction (Client → Server) forward entry is programmed.
Some cases were seen where this sequence may create out-of-order packets during the session setup. To avoid it, a CLI parameter has been
added (config system npu → set delay-tcp-npu-session enable|disable*), refer to #365497.

Note: it is a common belief that TCP session acceleration entirely takes place at the 3rd packet of the session setup. This is incorrect.

● The session push

To trigger the creation of a forward entry in the NP, the kernel NP driver sends a special packet to the NP host interface (via the PCI bus): the
‘session push’ packet. This specially crafted packet does not contain user traffic data, only the administrative data needed to program the
entry. Session push packets are intended for the SSE that will handle the remaining traffic. Upon its reception by the SSE, the forward entry is
programmed and the push packet discarded.

The NP4 and NP6 implementations are a bit different. In NP6, the session push packet follows the exact same path as a regular data packet. Because
of this, it benefits from the natural distribution of IRQs across CPU cores, so the session push CPU cost is distributed. The session push packet is sent
just before the corresponding data packets and follows the same path, uses the same queues and raises an IRQ on the same CPU core. The session push
packet is expected to arrive before the data packet. In NP4, a dedicated command channel is used on the host interface and, unlike NP6, this
command channel only raises an IRQ on a single CPU core (the first one associated with the NP4 half). Upon a burst of commands, this architecture can
cause the command queue to grow and delay hardware acceleration programming, resulting in a few additional packets using the slow path. Another
consequence is a less well balanced system load across the 4 cores allocated to an NP4 half.

Related reference : #365497 (possible packet out­of­order with NP6 during TCP session establishment)

● Session update (NP to Kernel session keepalive)

When traffic is hardware accelerated, the kernel has no visibility of the packets ‘shortcut’ in the NP. As a result, the kernel firewall session timer would
decrease even though packets do flow. To keep the firewall session alive as if packets were seen, one update message is required per session. NP4 and
NP6 schedule these updates differently.
NP4 sends a keepalive message to the kernel every 40 seconds for each live session.
With NP6, a new mechanism has been added where sessions are updated based on the session expiration timer (established-state timer for
TCP). The update is triggered when the session lifetime reaches a random value between 1/2 and 4/5 of the expiration timer (for example, between
150 s and 240 s for a 300 s timer). This is now the default behavior on NP6. For NP6, the session update behavior is configured in
‘config system np6’, see “configuration options impacting NP6” (#386626).

Session keepalive messages from the NP are asynchronous, so not all sessions are updated at the same time: all NP entries have a timestamp to avoid
triggering session updates simultaneously (unless the sessions were created at the same time). Session statistics are also updated by the update
message. When the session is deleted, the NP generates a final update message to update the statistics.

● NP6 session stats update

Prior to NP6, NPs couldn’t provide accurate session statistics updates (number of packets and bytes) because they lacked the capability to count traffic.
This capability was added in NP6 (see Per session traffic accounting and traffic distribution). From testing (5.6.3), we see that the kernel firewall session
updates its statistic counters with the session update message; there is no dedicated statistics message, the counters come with the session update message.
message.

# traffic logging is enabled on the policy (NP6 accounting is therefore automatically enabled in this version)
# constant telnet traffic passing through the NP session however the kernel session is not updated in real­time

# session dump 3 seconds before reaching half of the session­ttl (300/2 = 150)

FG1K5D­7 # diagnose sys session list

session info: proto=6 proto_state=01 duration=152 expire=147 timeout=300 flags=00000000 sockflag=00000000 sockport=0
av_idx=0 use=4
origin­shaper=
reply­shaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty npu f00
statistic(bytes/packets/allow_err): org=112/2/1 reply=60/1/1 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin­>sink: org pre­>post, reply pre­>post dev=10­>32/32­>10 gwy=10.5.21.2/10.100.5.2
hook=pre dir=org act=noop 10.100.5.2:2828­>10.5.21.2:23(0.0.0.0:0)
hook=post dir=reply act=noop 10.5.21.2:23­>10.100.5.2:2828(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=0
serial=0007e0ab tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=153/131, ipid=131/153, vlan=0x0000/0x0000
vlifid=131/153, vtag_in=0x0000/0x0000 in_npu=1/2, out_npu=1/2, fwd_en=0/0, qid=4/4
total session 1

# session dump immediately after half of session lifetime

# statistic counters are updated along with the session expiration timer reset

FG1K5D­7 # diagnose sys session list

session info: proto=6 proto_state=01 duration=154 expire=299 timeout=300 flags=00000000 sockflag=00000000 sockport=0
av_idx=0 use=4
origin­shaper=
reply­shaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty npu f00
statistic(bytes/packets/allow_err): org=10172/193/1 reply=7278/103/1 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin­>sink: org pre­>post, reply pre­>post dev=10­>32/32­>10 gwy=10.5.21.2/10.100.5.2
hook=pre dir=org act=noop 10.100.5.2:2828­>10.5.21.2:23(0.0.0.0:0)
hook=post dir=reply act=noop 10.5.21.2:23­>10.100.5.2:2828(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=0
serial=0007e0ab tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=153/131, ipid=131/153, vlan=0x0000/0x0000
vlifid=131/153, vtag_in=0x0000/0x0000 in_npu=1/2, out_npu=1/2, fwd_en=0/0, qid=4/4
total session 1

# Next statistic counter update will also need to wait for the next half lifetime of session­ttl

● Forward entry deletion

When a session is removed by the firewall function in the kernel, the corresponding forward entries are also removed from the NP.
Forward entries are also deleted from the NP upon a routing lookup change or a policy configuration change. After session revalidation, forward entries are
re-installed on the NP.

● Hardware accelerated sessions across 2 NPs

As mentioned before, the 2 unidirectional forward entries resulting from a bidirectional accelerated session may be created on 2 different NPs.
We will review how packets flow in this case. There are two different scenarios. The first one, “cross NP acceleration with ISF”, is when the FortiGate platform has
an Internal Switch Fabric (ISF) between the NPs and the ports. This is generally the case for high end units, and always the case when NP4s are involved. The
second one, possible on “medium-range” units equipped with the NP6 form factor (3x 10G + 16x 1G, see NP6), is “cross NP
acceleration without ISF”. For each scenario, the two cases “hardware accelerated or not” are considered.

● cross NP acceleration with ISF

● Non hardware accelerated traffic

­ The yellow flow on the left has ingress and egress ports attached to the
same NP. Traffic reaches the CPU over the PCI-e bus, first consuming system
CPU for ‘Soft IRQ’ (see MSI-X interrupts), then the packets are processed
by the kernel for delivery.

­ The green flow on the right has ingress and egress ports attached to
different NPs on the same ISF. Each NP sees unidirectional traffic. Traffic
flows to the kernel via the PCI-e bus and also consumes system CPU.

● Hardware accelerated traffic

­ The yellow flow follows the same path across the ISF but is
‘shortcuted’ inside the NP so does not make use of any CPU at all.

­ For the “cross-np” green flow, we need to decompose each direction:
original direction in green (left) and reverse direction in blue (right), and
focus on which NP 10G XAUI the traffic uses to leave the NP. The rule is: the NP
10G XAUI used to egress from the NP is the XAUI in the same position as the one
to which the traffic egress port is attached.

● cross NP acceleration without ISF (NP6 only)

● Non hardware accelerated traffic

Traffic flow is similar to “with ISF” scenario

● Hardware accelerated traffic

Without ISF, the only way to push traffic over to the other NP's ports is
to pass through the 10G inter-NP6 link. The example shown here is a
FortiGate-900D with 10G ports + 6x 1G ports per NP, so there is a
potential to oversubscribe the 10G inter-NP6 link.

● Important : LAGs with interface members attached to 2 different NP6s are not
supported (cannot be configured)
● hardware acceleration with 2 asymmetric wan interfaces

This scenario involves 3 interfaces : a single interface on the client side and 2 interfaces on the server side (wan1 and wan2). Packets of the session egress on
wan1 but reply packets are received from wan2. This scenario is supported by the kernel : a single session is used and the session remains stateful. However it
is problematic for hardware acceleration : the session keeps changing state from non-accelerated to one-way accelerated and is constantly
dirtied. It is not possible to support stable hardware acceleration for this kind of session.

In more detail :
- the session can be offloaded if multiple consecutive packets arrive in the original direction
- as soon as a packet is received in the reply direction on an interface other than the one used for egress, the session is marked 'dirty' and
the NP hardware accelerated session is removed (as expected)
- if the session is bidirectional, it may end up perpetually in the dirty state or go in and out of the dirty state.

references :
#0464329: B1547 : no hardware acceleration when using different ports for egress to server and ingress from server
top3 #464594

● public references

http://docs.fortinet.com/uploaded/files/2855/fortigate­hardware­acceleration­54.pdf
The FortiGate Hardware Acceleration guide version 5.4.1 provides architecture diagram for all platforms.

Link aggregations

Session hardware acceleration is also available in the case of aggregated interfaces. NP allows link aggregation of interfaces attached to a single NP,
but also aggregations made of interfaces distributed over different NPs.
Using link aggregation across multiple NPs provides a way to increase performance by making use of the power of more than one NP.

The command showing interface mapping to NP XAUI ( "diag npu np6 port-list" ) does not consider whether link aggregation is configured or not. With
a redundant interface or a LAG, traffic received from an interface may potentially be sent to a different NP or XAUI than the one the interface is attached to.

In a similar way, relying on the sw_np_port field from 'diag hard dev nic' to see which NP/XAUI the interface is linked to may provide erroneous
information in case of a LAG or redundant interface : in this case, the reference returned is a 'trunk id reference' and not a 'port reference' (#389055).

Link aggregation is supported on NP, however control-plane LACP traffic is handled by the kernel.
The CLI command (in the management vdom) 'diag netlink aggregate name <aggregateName>' line 'npu: y' provides confirmation that
hardware acceleration is performed.

FGT (root) # diag netlink aggregate name agg1


LACP flags: (A|P)(S|F)(A|I)(I|O)(E|D)(E|D)
(A|P) ­ LACP mode is Active or Passive
(S|F) ­ LACP speed is Slow or Fast
(A|I) ­ Aggregatable or Individual
(I|O) ­ Port In sync or Out of sync
(E|D) ­ Frame collection is Enabled or Disabled
(E|D) ­ Frame distribution is Enabled or Disabled
status: up
npu: y
oid: 6
ports: 2
distribution algorithm: L4
LACP mode: active
LACP speed: slow
LACP HA: enable

● Port hashing
On NP4, the choice of which port should be used to send traffic (i.e. the hashing) is not done by the kernel. This choice is programmed with the
session creation in the NP (#126252).

On NP6, a trunk can be defined with a LIF associated to the trunk. The NP6 SSE makes a port-to-trunk lookup in the mapping table to
determine if the port is associated to a trunk; if so, the choice of egress port is done through a lookup in the trunk table. The programming of trunk
tables is done from FortiOS and can be programmed to control traffic distribution. The port resolution is made so that, if there is no NAT, forward and
return packets use the same port.
A packet to be sent to a LAG has a destination LIF corresponding to the LAG → the port-to-trunk resolution changes the destination LIF to the LIF of the
chosen egress port.

● asic helper: y

Refers to the feature on the ISF that distributes packets received on a FortiGate port that is in a LAG to the multiple NP6 XAUIs involved in the LAG.
#218813 defines asic-helper : the NP6 introduces a new mechanism to help link fail-over for bond interfaces. The mechanism is disabled if ASIC
helper is set to disable. Basically, there are two phases during link fail-over : link down detection, and traffic fail-over to a good member of the bond
interface. The new mechanism does not help the first phase (link down detection) but it helps the second phase (fail-over to a good member). From an
application point of view, the fail-over time is reduced and, for a TCP connection, the connection does not need to be established again. asic-helper has
to be disabled before you can add any 40G ports into an aggregated interface.

● Mantis references related to link aggregation:

● #218813 Improve failover on lag interfaces (thanks to asic­helper)


● #246488 : about LAGs of ports connected to multiple NP6s
○ even if one FGT port goes down, the sessions that were originated on the NP6 for the lost port can carry on on the same NP6.
○ Ingress packets (from the ISF) are distributed toward the multiple NP6s following a 5-tuple hash
○ Egress packets : an NP6 trunk is created to distribute egress traffic to the different LAG members
● #290597 Cross-np6 link aggregation of redundant interfaces is not allowed

Diagram representing the processing of an inbound packet received on a FGT port towards the NP6 when the port is member of a LAG.
The LAG is defined in the ISF; the asic-helper makes the decision to forward the packet to one of the 2 NP6 XAUIs attached to either port1 or port2.
The balancing between NP6s is based on a 5-tuple hash. Inside the ISF, a coretag corresponding to the LAG interface is used.
Once arrived on the NP6, the ITP module translates the LAG coretag to the LAG LIF for NP6 processing.
(information confirmed by Yi)

Diagram representing the processing of an outbound packet when the destination FortiGate interface is a LAG : inside the NP6, the packet destination is a LAG LIF.
The SSE converts the LAG LIF to a FortiGate port LIF attached to an NP XAUI (trunk-to-port conversion). In the ETP, the port LIF is translated to a coretag for
transport inside the ISF. The choice of egress port is not influenced by the ISF LAG, it is made only by the NP6 from its trunk-to-port conversion.
(information confirmed by Yi)

Lag enhancement feature

Introduced by top3 #464594 on special branch br_5-4_nokia (5.4.7) for the FortiGate-3800D. Implemented also for non service module units on the FortiGate-3700D in
top3 #469106. Merged in 6.2, merges planned for 5.6.5 and 6.0.2 for FG38xxD, 39xxE, 5001E, 6K, 7K + 1200D → 3700D.
The goal is to reduce NP6 EHP drops caused by egress collisions from multiple sources to one single egress XAUI.
The solution is to force NP6 traffic to egress on the same XAUI from which the traffic was originally received.
This limits the congestion for all traffic received from the ISF. A collision is still possible between egress kernel traffic and NP6 traffic, but this condition
is less likely to happen and can still be handled by NP6 buffers. This new distribution allows a control of egress as a direct consequence of ingress
control on the same XAUI. Ingress XAUI congestion is much better handled thanks to the larger ISF buffers.

- Large ISF RX buffers on ingress directly control XAUI egress
- No collision possible for traffic received from multiple XAUIs
- Collisions are still possible with egress kernel (slow path) traffic

NP service module platforms (FortiGate­3800D) :

no configuration required

Non NP service module platforms (FortiGate­3700D) :

config system npu
    set lag-sw-out-trunk enable
end

→ reboot required
→ also automatically changes the lag interface mode :

config system interface
    edit <LAG_NAME>
        set algorithm npu
    next
end

What is the best choice of ports in a lag ?

Scope :
This section applies to platforms having a fixed binding between interface and NP6 XAUI (FortiGate-1500D, FortiGate-3700D…).
It does not apply to platforms like the FortiGate-3800D that do not have such a binding and where all NP6 XAUIs are within a LAG on the ISF.

Logic for choice of lag ports :

● if not using the lag enhancement, applicable to units with ISF (1200D to 3700D)

- Distribute the LAG ports on each NP6 so that each NP6 can be used and the traffic pressure is distributed on multiple (all) NP6s.
- In a LAG, choose different XAUI ids, even if spread amongst different NP6s.
- Use non-connected interfaces in the LAG definition to distribute on even more NP6s if needed.

● with the lag enhancement : config system npu → set lag-sw-out-trunk enable feature (5.6.5, 6.0.2, 6.2), applicable to units with ISF (1200D to 3700D)
- use all possible XAUIs from all NP6s, however do not mix 10G and 40G lags on the same NP6
- use non-connected interfaces in the LAG definition to distribute on even more NP6s if needed
- In case of multiple 10G LAGs, also use all XAUIs even if already used by another LAG.
Egress congestion is controlled by ingress (what egresses a XAUI has ingressed from that same XAUI, so the throttling is managed at ingress using ISF
buffers)
- Note : not preferred for an ipsec concentrator

Example from Nokia using the set lag-sw-out-trunk enable feature

In this example :

- 2x NP6 are dedicated to the 2x 40G port LAG, not mixed with the 2x 10G LAG XAUIs
- Each 10G LAG has 8 ports configured but only 4 connected, to benefit from 8 XAUIs on 2 NP6s on ingress
- The 2x 10G LAGs are sharing XAUIs, allowing a full distribution on all XAUIs of the 2 NP6s; the collision of the traffic of the 2 LAGs happens on ingress

DSCP marking
DSCP marking consists in setting the DSCP bits on the received packet by the NP itself, allowing hardware acceleration.
It seems to be already supported on NP4 and also available in NP6 hardware (to be confirmed on NP6 with lab testing).
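
For reference, DSCP marking is driven from the firewall policy. A minimal configuration sketch using standard FortiOS policy options (the policy id and the code point are example values); whether the NP or the kernel applies the marking depends on the offload state of the session :

config firewall policy
    edit 1
        set diffserv-forward enable
        set diffservcode-forward 101110    <--- DSCP EF (46), example value
        set diffserv-reverse enable
        set diffservcode-rev 000000
    next
end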

Per session traffic accounting and traffic distribution
● per session traffic accounting may not be accurate with hardware acceleration, same with traffic distribution

NP may not have the capability, or may not be explicitly configured, to report accurate accelerated traffic volumes or packet counts. In this case, as
session accounting is done by the kernel and because accelerated traffic is not visible to the kernel, the reported traffic volume and packet count
provided at the termination of the session may be wrong. The kernel typically reports the few packets seen during session setup and termination,
which is not representative of the traffic that has flown in the session. Before NP6 (FA2, NP2, NP4), NPs did not have the capability to report
the session traffic volume accurately.

NP6 has this capability if configured to do so, with an impact on performance. To update the kernel statistics, the NP generates session updates (by
default every 40 seconds).

● NP2, NP4 : no solutions:

○ reference #144856 : “NP4 does not support per-session accounting. This can’t be fixed”
○ reference #140323 : “NP4 doesn't support packet-distribution counters.”

● NP6 and SoC3 (NP6Light) :

NP6 and SoC3 have been improved to allow, upon a configuration change, accurate per-session accounting. Enabling per-session accounting
may cause a CPS drop of up to ⅓. When the feature is enabled, for each packet received, the NP needs to update its session counters.
There are changes of behavior and default settings done in 5.2 and 5.4 branch (see below)

○ config system np6 → edit <np_id> → set per-session-accounting disable|enable|enable-by-log

In 5.2, default setting is “set per-session-accounting disable” without ‘enable-by-log’ option


In 5.4, default setting is “set per-session-accounting enable-by-log”

○ since 5.4.0 : automatic switch as soon as logging is enabled on the policy (#268426, #273376) :

“Since the session accounting is most useful in the traffic log, we should tie it to the traffic log. Specifically, if in a policy traffic log is not enabled, we
don't enable traffic accounting on NP6; this will help to preserve the NP6 performance in terms of throughput. When in a policy traffic log is
enabled, we automatically enable session accounting for all sessions allowed from that policy. This will help the traffic log to record correct
bytes/packets information. All this should be transparent to the user, so the benefit is that the end user will see correct accounting information in the
traffic log (with or without NP6)” (quoted from #268476)

● upgrade from 5.2 to 5.4.x changes per-session-accounting default setting from ‘disable’ to ‘enable-by-log’ (#273377).

When upgrading from 5.2, the npu setting per-session-accounting default settings changes from ‘disable’ to ‘enable-by-log’
This causes a potential risk of performance drop in CPS after upgrade.

● Impact on CPU : may lower CPS up to ⅓ (#251207) : “ NP6 supports per-session accounting. But it brings extra overhead on packet forwarding
rate, because we need to write into DDR memory where the session is saved. Packet forwarding rate may lower by 1/3 when it is enabled for small
packet flow. For big packets(>1K), the impact is not noticeable. It doesn't have impact on session offloading.”

● Kernel stats update on living sessions with NP6 accurate­session­accounting


Kernel session statistic counters (number of packets and bytes) are updated along with the NP6 session update, so there is no special NP-to-kernel
message for statistic updates. Since session updates may only start at half of the session-ttl, it is expected to see the first statistic counter update in
the kernel only when the session duration reaches half of the session-ttl lifetime.
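
As a quick reference, a minimal configuration sketch of the per-NP6 setting discussed above (np6_0 is used as the NP6 name, as in the other examples of this document) :

config system np6
    edit np6_0
        set per-session-accounting enable-by-log    <--- 5.4 default; 'enable' forces accounting for all sessions
    next
end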

ipv6 unicast session acceleration

to be detailed

IPv4, IPv6 tunneling and translation

to be detailed

IPSec encryption/decryption and hashing
ESP packet encryption, decryption and hashing can be performed by the network processor.
Encryption and hashing algorithm support depends on the NP model and revision (see the specific NP chapters).

This chapter is covered extensively in Stephane Hamelin’s IPSec guide for TAC hosted at :
FortiVision → GCSS → TAC → TAC related Trainings → IPSec VPN Training Material

We will only review a few key points related to hardware acceleration in this chapter

● IPsec offloading without session offloading


It is not necessary to have the session fully hardware accelerated to have IPsec encryption/decryption and hashing done on the NP. Encryption and
decryption can be made independently in the kernel or in the NP. A session can also be fully accelerated on the NP, from packet shortcut to IPsec
encryption/decryption and hashing, with no packet flowing through the kernel.

When non-NP ports are involved, the CP can be used to offload the kernel from the cryptographic functions.

● required configuration
Since FortiOS 5.0, hardware acceleration on NP does not need the local gateway ip to be specified as it used to be in 4.3.
Hardware acceleration is the default choice in the phase1 configuration (set npu-offload enable on phase1).
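
A minimal sketch of the phase1 setting (interface mode shown; 'enable' is the default value) :

config vpn ipsec phase1-interface
    edit <ph1_name>
        set npu-offload enable
    next
end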

● SAs installation
IPSec SAs are installed on the NPU when the first packets flow. The two SAs may be installed at two different moments (one per direction). If a tunnel is up but no packets
have flown, it is expected that the tunnel list reports no hardware acceleration because no SA has been installed yet.

● outer DF­bit value, IPSec post­fragmentation


Refer to IPsec guide for TAC.

● Forwarding session

IPsec acceleration SAs are only programmed on a single NP, even if the tunnel is attached to a link
aggregation or if the phase2 may see traffic from different incoming ports. In this case the IPSec
accelerated outbound packet may not be received on the NP carrying the encryption SA. For this
scenario, the NP6 driver also installs 'forwarding sessions' on the NPs attached to the other ports (of
the lag for instance). The goal of the 'forwarding session' is to forward the packet to the NP handling
the encryption SA. This is done through the ISF and it is transparent to the kernel.
Forwarding sessions are not processed by the NP6 SSE but by the FDB module.
Forwarding sessions are accounted in the NP6 session stats as regular sessions, they can't be distinguished.
The cost in NP resources of a forwarding session is much less than the cost of
encryption/decryption, but NP buffers still have to process the traffic.

● Encryption and decryption SA NP

From Stephane's lab tests with 5.4.1 on a FG1500D : the encrypt SA and decrypt SA are installed on
the NP6 linked to the interface on the public network (NP_2 here). A forward entry is required on
NP6_1 to push clear text traffic received on NP6_1 to NP6_2 for encryption. If lags are used,
the encrypt and decrypt SA may be installed on different NPs.

● 5.4 session with NP6 (does not work for NP4)


Additional info has been added to the session list to tell on which NP the forward entry
AND the SA were installed. Format : in_npu=<SA>/<FWD> (forward direction) and
out_npu=<SA>/<FWD> (reply direction). To know if the SA is for the encrypt or decrypt direction,
the diag vpn tun list output dec_npuid=1 enc_npuid=3 is needed. See the following example.
Subtract 1 from the value to get the NPU id :

FG74E43E16000055 [FPM04] (ipsec_s) # diagnose sys session list

session info: proto=1 proto_state=00 duration=114 expire=30 timeout=0 flags=00000000 sockflag=00000000 sockport=0

av_idx=0 use=3
origin­shaper=
reply­shaper
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/swan_p1 vlan_cos=0/255
state=may_dirty npu synced
statistic(bytes/packets/allow_err): org=168/2/1 reply=168/2/1 tuples=2
tx speed(Bps/kbps): 1/0 rx speed(Bps/kbps): 1/0
orgin­>sink: org pre­>post, reply pre­>post dev=141­>133/133­>141 gwy=10.10.5.40/10.118.0.1
hook=pre dir=org act=noop 10.118.0.1:25730­>10.10.5.40:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.5.40:25730­>10.118.0.1:0(0.0.0.0:0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=4
serial=260fa445 tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x82, offload=8/8, ips_offload=0/0, epid=572/572, ipid=1005/572, vlan=0x80c0/0x83db
vlifid=1005/572, vtag_in=0x0000/0x03db in_npu=1/3, out_npu=1/3, fwd_en=0/0, qid=2/3

# Comment : in_npu = 1 / 3 => SA installed in NPU_0 (need to remove 1 !) and FWD in NPU_2 (remove 1)
# Comment : out_npu = 1 / 3 => same the opposite for the reply direction

diag vpn tunnel list

name=swan_p1 ver=2 serial=1 172.31.193.198:4500­>172.31.203.130:4500


bound_if=132 lgwy=static/1 tun=intf/0 mode=auto/1 encap=none/8 options[0008]=npu
proxyid_num=1 child_num=0 refcnt=19 ilast=7 olast=488 auto­discovery=0
stat: rxp=115539 txp=119826 rxb=17990454 txb=9601690
dpd: mode=on­demand on=1 idle=20000ms retry=3 count=0 seqno=2
natt: mode=silent draft=0 interval=10 remote_port=4500
proxyid=swan_p2 proto=0 sa=1 ref=4 serial=1
src: 0:10.10.0.0/255.255.240.0:0
dst: 0:10.118.0.0/255.255.0.0:0
SA: ref=6 options=2e type=00 soft=0 mtu=1422 expire=41635/0B replaywin=2048 seqno=800 esn=0 replaywin_lastseq=00000640
life: type=01 bytes=0/0 timeout=43147/43200
dec: spi=8004d3c7 esp=aes key=16 8538429a58d39c0f8d6dbeb563e8a44e
ah=sha1 key=20 dde915c625d2cc4f2f58f5d281cc24ab2c3eaffa
enc: spi=c9d7594b esp=aes key=16 a8821e38e70d6047ec1e18754a547da4
ah=sha1 key=20 7c69075c738e5a9ae0427eeeeacdb860f6d54a00
dec:pkts/bytes=1600/16356, enc:pkts/bytes=2048/16366
npu_flag=03 npu_rgwy=172.31.203.130 npu_lgwy=172.31.193.198 npu_selid=0 dec_npuid=1 enc_npuid=3

This command does not work for NP4

● ipsec engines

NPs have multiple ipsec engines and sub-engines. For the same tunnel, load-balancing is done across the engines. NP6 has 2 ipsec engines which
each have 8 sub-engines (detailed in the NP6 chapter).

● diag vpn tunnel list

The genuine 'diag vpn tunnel list' command tells whether ipsec hardware acceleration is performed by the NP6.

An additional line, npu_flag, provides NP acceleration information.

npu_flag value   Meaning

00               Session is not (or not yet) hardware accelerated in NP. No SA pushed to NP yet.

01               Session is only NPU accelerated in the outbound direction (encryption).
                 Only the outbound SA has been pushed to the NP.

02               Session is only NPU accelerated in the inbound direction (decryption).
                 Only the inbound SA has been pushed to the NP.

03               Session is NPU accelerated in both directions.
                 Inbound and outbound SAs have been pushed to the NP.

20               IPsec SA cannot be offloaded to NPU because the cipher or HMAC is not supported by
                 the NPU (as of 5.4.1, 5.2.8).

40               NPU SA sequence number space has been exhausted. The SA should no longer be used.

80               Dirty flag of the NPU SA is set. The SA is expiring and should no longer be used.

Warning : remember that an initial packet in a given direction is required to trigger the SA copy to the NP for that direction. Because of this, it is normal for a
tunnel not to show npu_flag=03 if packets have not yet used the tunnel in both directions. Check carefully the number of dec and enc packets for both
directions : at least 1 packet should have gone through the tunnel in a direction to get a correct hardware acceleration statement for that direction.

FGT # diag vpn tunnel list


list all ipsec tunnel in vd 0
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­
name=toHQ ver=1 serial=1 172.31.224.200:0­>192.168.182.30:0 lgwy=static tun=intf mode=auto
bound_if=8
proxyid_num=1 child_num=0 refcnt=7 ilast=0 olast=0
stat: rxp=6 txp=6 rxb=556 txb=504
dpd: mode=active on=1 idle=1000ms retry=3 count=0 seqno=428
natt: mode=none draft=0 interval=0 remote_port=0
proxyid=toHQ_p2 proto=0 sa=1 ref=2 auto_negotiate=0 serial=2
src: 0:10.174.0.0/255.255.254.0:0
dst: 0:192.168.255.0/255.255.255.0:0
SA: ref=5 options=0000000c type=00 soft=0 mtu=1436 expire=1493 replaywin=0 seqno=7
life: type=01 bytes=0/0 timeout=1752/1800
dec: spi=8e740fc9 esp=3des key=24 f76f0cb1516f269a1fc9f3afe2223595e74c1d33cd3ad29e
ah=sha1 key=20 1d693ddda0b86100902ec3a909fd9aa888fc5fae
enc: spi=bf97be5a esp=3des key=24 7ff6a2983e79fe2398b763642475006791819bba21cb99e4
ah=sha1 key=20 f9ec91ff5549c2f7b22b774d544a444f7346f5cc
npu_flag=03 npu_rgwy=192.168.182.30 npu_lgwy=172.31.224.200 npu_selid=1, dec:pkts/bytes=1/84, enc:pkts/bytes=6/816

Improvement in 5.4 : an NP index has been added to tell on which NP the SA was installed (dec_npuid and enc_npuid), where :
0 means no SA copy, otherwise the value is the NPU_id+1 (subtract 1 from the value to get the NP6 id, so 5 is for NP6_4).
Because of a bug (#375910) the 2 directions are inverted ! so enc is dec and vice versa (fixed in 5.6).
3810D­182 # dia vpn tunnel list name p1­44­v101
list ipsec tunnel by names in vd 0
­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­
name=p1­44­v101 ver=1 serial=1 2.2.0.2:0­>2.2.0.1:0
bound_if=33 lgwy=static/1 tun=intf/0 mode=auto/1 encap=none/8 options[0008]=npu
proxyid_num=1 child_num=0 refcnt=2064 ilast=7 olast=7 auto­discovery=0
stat: rxp=172930240 txp=0 rxb=239155512922 txb=0
dpd: mode=on­demand on=1 idle=20000ms retry=3 count=0 seqno=0
natt: mode=none draft=0 interval=0 remote_port=0
proxyid=p2­44­v101 proto=0 sa=1 ref=2050 serial=1
src: 0:0.0.0.0/0.0.0.0:0
dst: 0:0.0.0.0/0.0.0.0:0
SA: ref=4 options=2e type=00 soft=0 mtu=1280 expire=40253/0B replaywin=2048 seqno=1 esn=0 replaywin_lastseq=0a4ea4c0

life: type=01 bytes=0/0 timeout=43175/43200
dec: spi=3ef55ca4 esp=aes key=16 8d9b784f67dd54f194c8466c0a2237ea
ah=sha1 key=20 a37578d17472ae044efc87969ce0b46f3083b8f9
enc: spi=9d214199 esp=aes key=16 4f74ec0d175d706cd52654c86b9aa1b1
ah=sha1 key=20 19490174793b766cf0692c150ec75bac6d59fa1f
dec:pkts/bytes=172926203/239149873376, enc:pkts/bytes=0/0
npu_flag=02 npu_rgwy=2.2.0.1 npu_lgwy=2.2.0.2 npu_selid=0 dec_npuid=0 enc_npuid=5
● out­of­order limitation (in ipsec context)

The load balancing across sub-engines for packets targeted to the same SA may cause out-of-order packet situations, typically when the traffic flow is made of
long packets followed by short packets : short packets take less time to process and may get out earlier than the preceding long packet.
Some workarounds could be made with special images (ex: special image fg_5-0_Orange_LTE_269247/build_tag_8942 based on 5.0.10, see top3
#269247). This build introduces new CLI commands to control the number of sub-engines used for encryption and decryption :

config system np6
    edit np6_0
        set ipsec-enc-subengine-mask <engine_mask_hex>
        set ipsec-dec-subengine-mask <engine_mask_hex>
    next
end

<engine_mask_hex> : hexadecimal number (0x01 - 0xff, default 0xff).
Each bit represents one of the 8 IPsec sub-engines in each of the IPsec engines (2 per NP6).
Encryption and decryption engine masks may overlap.

Note : if anti-replay is enabled for IPsec, this should automatically configure 1 IPsec engine for decryption, and keep the
configured ones for encryption (anti-replay bug workaround).

A follow-up bug has been opened to merge the special image's new CLI commands:
#370586 Add CLI commands to configure limited IPSEC engines on NP6 to solve the out-of-order issue (not merged as of today)

● anti-replay limitation
Significant packet drops may occur with ipsec hardware acceleration and anti-replay enabled. This is due to a hardware limitation (#275195).
When the number of tunnels is significant (more than 50), it is not recommended to use anti-replay with NP6. If this is a strict requirement, special images
exist to limit the impact, however the performance would still be significantly degraded, to ¼ or ⅕ of the nominal performance.
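
For reference, anti-replay is controlled per phase2. A minimal sketch of disabling it on one tunnel when NP6 offload is kept (<ph2_name> is a placeholder) :

config vpn ipsec phase2-interface
    edit <ph2_name>
        set replay disable
    next
end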

● Disabling hardware acceleration

● on NP (key exchange and enc/dec/hashing)

config vpn ipsec phase1(­interface) → edit <ph1_name> → set npu­offload disable

● on CP

● Disable CP globally

config sys global
    set ipsec-asic-offload {enable*|disable}
end

● Disable HMAC offload with CP6 on decryption (CP6 only!)

CP6 does not hash the full packet; a full inspection would require kernel-based decryption.
If this is a requirement, use the following :

config sys global
    set ipsec-hmac-offload {enable*|disable}
end

● Hardware acceleration table summary

excerpt from Stephane’s document:

Notes :
● NP4 can't do SHA-2 HMAC, watch out for the proposals to keep traffic hardware accelerated

● Legacy NP2/NP4 constraints

Replay detection : with replay detection set, encryption/decryption and hashing may be left to the Content Processor (CP) or processed by the NP2/NP4,
depending on the 'config system npu' settings and on the software version.

FortiGate # config system npu
dec-offload-antireplay: enable
enc-offload-antireplay: disable
offload-ipsec-host : disable

Note : enc-offload-antireplay and offload-ipsec-host must have the same value in the NP2/NP4 context where these settings are used.

For NP4 : these settings apply between 4.3.10 and 5.2.2 (not since 5.2.3). They do not apply to NP4Lite (>= 4.3.10) nor to NP6, where 'config system
npu' is always ignored.

Passthrough ESP session acceleration

The goal of this feature is to have the NP accelerate ipsec ESP passthrough traffic. In this context, the FortiGate is not the ipsec tunnel end-point but
just sits in between 2 ipsec gateways without NAT. The FortiGate sees incoming ESP packets (ip proto 50) and needs to process this traffic and
egress it on the interface towards the destination ipsec gateway. Originally, ESP passthrough traffic was not eligible for hardware acceleration.
This was added as of 5.2.2 and is supported since 5.4 GA.

History:
­ This feature was originally implemented on NP4 through a special image (#229874)
­ It was also implemented in NP6 (#253221)

inter vdom (npu­vlink) traffic acceleration
Inter-vdom links have been available for a long time in FortiOS; they allow traffic to transit from one vdom to another via a logical interface handled by
the kernel, and therefore cannot be hardware accelerated, which is a big problem for performance in MSSP scenarios.
The hardware-accelerated inter-vdom link, called 'npu-vlink', was introduced with NP4 and was enhanced with NP6.
The genuine non-accelerated 'vdom-link' should be avoided when NPs are available on the unit.

● concept

● There is only 1 npu-vlink point-to-point interface per NP.

Each side of the point-to-point is noted vlink0 and vlink1.
○ example for NPU id 0 : the pair is npu0_vlink0 <-> npu0_vlink1

● To interconnect multiple vdoms, you need to create vlan interfaces based on the npu_vlink (virtual npu_vlink). Both ends of the virtual npu_vlink
should be on the same vlan.
○ example : npu0_vlink0_100 (on vlan 100, based on interface npu0_vlink0)
and npu0_vlink1_100 (on vlan 100, based on interface npu0_vlink1)
● associate a vdom to each virtual npu link
● See also “Configuring Inter-VDOM link acceleration with NP6 processors” in fortigate-hardware-acceleration.pdf for more details

config system interface
    edit "npu0_vlink0"
        set vdom "root"
        set status down
        set type physical
        set snmp-index 33
    next
    edit "npu0_vlink1"
        set vdom "root"
        set status down
        set type physical
        set snmp-index 34
    next
    ../..
    edit "npu0_vlink0_100"
        set vdom "vdom_A"
        set ip 1.1.1.1 255.255.255.252
        set snmp-index 35
        set interface "npu0_vlink0"
        set vlanid 100
    next
    edit "npu0_vlink1_100"
        set vdom "vdom_B"
        set ip 1.1.1.2 255.255.255.252
        set snmp-index 36
        set interface "npu0_vlink1"
        set vlanid 100
    next
end

● NP6 implementation

If the external ports of the 2 vdoms connected with an npu-vlink are on the same NP6, there is no packet leaving the
NP6 before the packet egresses from the second vdom. In the example above, if 'port1' and 'port2' are both attached
to the same NP6, once the session is accelerated, a packet entering the NP6 by port1 is processed for both
vdoms 'vdom_A' and 'vdom_B' without leaving the NP6 at all. It only gets out of the NP6 to egress on port2.
Of course, if port1 and port2 were linked to 2 different NP6s, the packet would need to reach the second NP6 through
the NP6-to-NP6 link provided by the ISF. The same would also apply if the traffic ingresses or egresses from LAG
interfaces whose ports are distributed amongst different NP6s.

NP6 load distribution in MSSP context :

In an MSSP context with a lot of vdoms and npu-vlink attachments, it is recommended to create
virtual npu_vlinks (vlan based) on all available NP6s and to distribute the virtual npu_vlink interface
pairs from the different NP6s to the vdoms. Doing this distributes the npu_vlink processing effort on all
the NP6s, instead of one single NP6 being loaded by receiving all the traffic.
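
A minimal sketch of such a distribution, reusing the naming convention of the example above (npu1_vlink0/npu1_vlink1 are the built-in pair of the second NP6; the vdom names, vlan id and addressing are hypothetical) :

config system interface
    edit "npu1_vlink0_200"
        set vdom "vdom_C"
        set ip 1.1.2.1 255.255.255.252
        set interface "npu1_vlink0"
        set vlanid 200
    next
    edit "npu1_vlink1_200"
        set vdom "vdom_D"
        set ip 1.1.2.2 255.255.255.252
        set interface "npu1_vlink1"
        set vlanid 200
    next
end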

NP6 : which XAUI on egress when reaching npu­vlink from a different NP6 ?
In the context of the example just above (using npu-vlinks from different NP6s to distribute vdom connections), the question is :
if the session is passing through 2 NP6s using an npu-vlink and the npu-vlink is delivered by the second NP6, which XAUI will be used on the first NP6 to
reach the second one ? (see diagram).

The flow is :
- the session is accelerated for the first vdom on NP6_0 (because the ingressing port is attached to NP6_0)
- the session is pushed via an npu-vlink delivered by NP6_1. For this, the packet needs to egress on an NP6_0 XAUI to reach the ISF
⇒ (question 1) which XAUI is used to egress on the first NP6 ?
- the packet coming from the ISF enters NP6_1.
⇒ (question 2) which XAUI is used to ingress on the second NP6 ?

Tested with a FortiGate­1500D with 2 pairs of ports [port33, port38] and [port35, port38]

FG1K5D­7 (global) # diagnose npu np6 port­list


Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port1   1G    Yes
       0    port5   1G    Yes
       0    port17  1G    Yes
       0    port21  1G    Yes
       0    port33  10G   Yes
       1    port2   1G    Yes
       1    port6   1G    Yes
       1    port18  1G    Yes
       1    port22  1G    Yes
       1    port34  10G   Yes
       2    port3   1G    Yes
       2    port7   1G    Yes
       2    port19  1G    Yes
       2    port23  1G    Yes
       2    port35  10G   Yes
       3    port4   1G    Yes
       3    port8   1G    Yes
       3    port20  1G    Yes
       3    port24  1G    Yes
       3    port36  10G   Yes
np6_1  0    port9   1G    Yes
       0    port13  1G    Yes
       0    port25  1G    Yes
       0    port29  1G    Yes
       0    port37  10G   Yes
       1    port10  1G    Yes
       1    port14  1G    Yes
       1    port26  1G    Yes
       1    port30  1G    Yes
       1    port38  10G   Yes
       2    port11  1G    Yes
       2    port15  1G    Yes
       2    port27  1G    Yes
       2    port31  1G    Yes
       2    port39  10G   Yes
       3    port12  1G    Yes
       3    port16  1G    Yes
       3    port28  1G    Yes
       3    port32  1G    Yes
       3    port40  10G   Yes

Answers are :

- Question 1 : the XAUI used for egress is the same as the one used for ingress on the first NP6
- Question 2 : the XAUI used for ingress on the second NP6 has the same ID as the XAUI used for egress on the first NP6 :
Example port33/port38 : XAUI 0 is used to egress on NP6_0 so XAUI 0 is also used for ingress on NP6_1
Example port35/port38 : XAUI 2 is used to egress on NP6_0 so XAUI 2 is also used for ingress on NP6_1

● limitations

- #383624 multicast traffic on npu-vlink causes a PBA leak.

Hardware bug, worked around by disabling multicast acceleration on the npu vlink
⇒ fixed in 5.6.0, 5.2.9 covered by #282472
⇒ fixed in 5.4.6

● NP4 implementation

The NP4 implementation requires the packet to be sent to the ISF even if the ingress and egress ports are connected to the same NP4.
In the NP4 implementation, the npu-vlink point-to-point is a logical interface where each end is attached to an NP4 10G port.

● References :
­ Expert Academy 2016

IPS traffic fast­path (N­turbo acceleration) IPSA (IPS acceleration)
Keep in mind that N-Turbo is essentially a software solution based on the kernel.
The role of the NP/SoC is only reduced to pushing the packets of traffic flows eligible for IPS/N-Turbo acceleration to dedicated channels on the
host interface, instead of the "per-default" channel to the kernel. From there, traffic is handled in a different way by the kernel, which allows a fast-path
and a better distribution to ipsengine processes.
The main focus of N-Turbo is to increase the IPS processing performance by distributing the cost of processing to different CPU cores. One of the
ideas is to avoid using the same CPU core for IRQs and for ipsengine processing. For this, a lot of attention is given to the balancing of the IRQs used with
N-Turbo ips acceleration across the CPU cores. Each platform has its own hardware characteristics, such as different types and numbers of processors
with a different number of cores, use of hyperthreading or not, different types and numbers of hardware acceleration chips… The consequence is that
getting the best ips acceleration performance on each platform requires different choices of settings per platform, such as :
more or fewer CPU cores (and therefore host interface channels) used for packet transfer between NP6 and kernel and kernel and NP6, more or fewer
ipsengine processes, each one bound to a dedicated CPU...

This topic has been covered in one of the chapters of the Expert Academy 2016 Support Team section, where the overall solution is explained; please
refer to Expert Academy 2016.
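
For reference, N-Turbo can be globally disabled or re-enabled from the CLI. A hedged sketch (the option lives under config ips global; exact option values may vary per build) :

config ips global
    set np-accel-mode basic    <--- 'basic' enables NP-assisted (N-Turbo) IPS acceleration, 'none' disables it
end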

● NP6

See N­Turbo NP6 IRQ mapping for the NP6 contribution to the N­Turbo based IPS acceleration.

● SoC3

The SoC3 integrated chip has all the requirements to allow ips N­turbo acceleration making it available to SoC3 based ‘E’ platforms such as :
FortiGate­60E, FortiGate­90E, FortiGate­100E, FortiGate­200E,...

● Legacy NP4

The first N-Turbo acceleration started on NP4-enabled platforms such as the FortiGate-3240C, FortiGate-3600C, FortiGate-5001C, however NP4 had a
weakness : it can only be bound to 8 IRQs, and more are required to use dedicated channels on the host interface.
Because of this, it is not possible to perform ips acceleration with the NP4 alone. The solution used is to put an additional chip with large IRQ capability in
charge of pushing packets to the kernel through the dedicated host interface channels. This chip is an Intel 82599 10G interface with its XAUI
connected to the switch fabric. The NP4 only has to transfer packets to it via the ISF.

The following diagram compares the 2 different architectures : the NP4/Intel 82599/ISF solution is on the left, the native NP6 solution is on the right.

HA A­A load­balancing
This feature has been available since the first NPs. It is made to offload the CPU of the master unit of an active-active cluster. In active-active
load-balancing, all traffic received by the cluster first reaches the master unit. The master unit may then retransmit the packet to one of the
active slaves for load-balancing purposes. For the packet retransmission from master to slave, the master changes the source and destination mac
addresses : the source mac is the mac address of the master interface sending the packet, and the destination mac address is the targeted slave's real mac
address (instead of the HA virtual MAC originally used).
When the ingressing packet reaches an NP interface, the kernel creates a session and sends a request to the NP to install the forward entry so that
packets are retransmitted to the designated slave.

- The first packet goes to the kernel
- The kernel creates the session and designates a cluster member to process the traffic
- The kernel retransmits the packet to the designated member using the ingress interface.
- The kernel programs the NP to offload the retransmission of the following packets
- The retransmission of the following packets of the session is handled by the NP until
the session is deleted.
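
For context, a minimal active-active cluster sketch (standard 'config system ha' options; the group name, password and heartbeat interface are placeholders) :

config system ha
    set mode a-a
    set group-name "cluster1"
    set password <psk>
    set hbdev "ha1" 50
    set load-balance-all enable    <--- also load-balance plain TCP sessions, not only proxy-inspected ones
end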

Traffic shaping
● Concept

NPs may have built-in basic traffic shaping engines. Their only purpose is to cap the packet rate.
No real shaping is done, in the sense that packets are not delayed; the feature is much closer to traffic policing, where packets are dropped to adjust the
rate to the threshold.

There are two different independent shapers configured from the same common CLI commands:

● the genuine kernel shaper


● NP shapers

Both do their work independently from the other one. The kernel shaping has more options, such as bandwidth reservation, priority queuing or
interface-based limits, which are not available on the NP shaper. The policing algorithms are also different, so a session may be shaped with a different
pattern when it becomes accelerated.
Note from #373203 :

The difference between the software and NP shaper implementations is that the software has more buffer to smooth out traffic bursts. The NP drops packets when the traffic rate is higher than
the configured value, without any buffering mechanism. Packet drops will trigger TCP congestion control. The host TCP stack will lower the flow to half of the original throughput or lower.
This is a hardware limitation. Please disable npu offloading in the policy for shapers when the bandwidth limit is low.

NP shaping should only be used in simple traffic policing scenarios, if required.
NPs measure traffic rates using shaper objects, which are limited resources on NPs and can be overrun (for instance if a per-ip shaper is used).

When more control is needed on shaping, the only way is often to bypass NP hardware acceleration so that only the kernel shaper is used. This is also the
only way if the flows to shape are not all ingressing/egressing on pairs of interfaces connected to the NP.
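
A minimal sketch of forcing a given policy's traffic to stay in the kernel so that only the kernel shaper applies (per-policy offload toggle; the policy id is an example) :

config firewall policy
    edit 1
        set auto-asic-offload disable    <--- sessions of this policy stay in the slow path (kernel)
    next
end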

● Forwarding sessions

Just like IPSec SAs, an NP6 shaper object can only live on one single NP6. All packets from different ports sharing the same shaper have to be
sent to the same NP6 for accurate accounting. This is done by installing forwarding sessions.

Syn proxy
NP6 is the first NP to implement a SYN proxy feature. This feature was available on the legacy SP2/SP3 chips and is configured the same way.
The configuration is the same as a genuine DDoS profile with the addition of action='proxy'.
A threshold has to be defined; it defines at which packet rate the syn proxy feature should get activated.

● Generic SYN proxy principle

The FortiGate acts as a proxy for the 3-way handshake SYN, SYN/ACK, ACK packets. It provides a better protection
against SYN flood attacks compared to the DoS action 'block'. Legitimate tcp connections with a proper handshake are allowed, even if their connection rate
is higher than the defined threshold, while SYN attack packets are dropped.
● references
● #218425
● #272927 (lifetime of proxy session can be defined in config system np6 → edit np6_x → set garbage­session­collector enable & set
session­collector­interval 8)
● #370592 DOS-Profile using parameter "tcp_syn_flood", option "set action proxy" sets the TCP window size value to 0 and no options :
The FGT sends a SYN/ACK with window size 0 and no options because synProxy is enabled and the FGT acts as a "man" in the
middle.
The FGT/NP6 receives a SYN and sends a SYN/ACK to the client. If the client sends back an ACK, the NP6 does the synflood
check : if it is not an attack, the FGT then initiates a SYN to the real server; if it is an attack, no SYN is
sent to the server.
1. The window size is set to 0 to prevent the client from sending any data packet : before the NP6 has checked the ACK packet,
the real connection to the server has not been established.
2. No option is present in the first SYN/ACK because, before the SYN is sent to the server, the FGT does not know which options the server will
answer with.
3. After the ACK from the client has passed the NP6 check, the NP6 initiates the SYN to the server and the server answers with the real SYN/ACK;
the FGT then knows all the options and the window size and sends the final correct SYN/ACK packet to the client.

● Configuration example
config firewall DoS­policy
edit 1
set interface "port5"
set srcaddr "all"
set dstaddr "all"
set service "ALL"
config anomaly
edit "tcp_syn_flood"
set status enable
set log enable
set action proxy <­­­ new option
set threshold 1 <­­­ unack'd syn threshold
next
end
next
end

● monitoring

Syn proxy rules can be monitored from the GUI

HPE protection
● concept

This feature was introduced by top3 #363398 as a workaround to limit the kernel impact of a DDoS attack on an SLBC cluster.
The concept is to apply a traffic policer on the NP towards the host interface (the path to the kernel, via the PCI bus), to protect the kernel from bursts of packets
that may affect the unit stability. Obviously, the HPE policer is not applied to hardware accelerated traffic.
It can be used to recover working access or to troubleshoot a unit under a DoS attack.
There are several queues on the host interface; all the queues are considered.
The feature was originally planned for 5.4.3 but has only been merged in 5.6.0.
Available via special image in 5.4.2 : #395452 (fg_5-4_HPE/build_tag_9739) and #441731 (fg_5-4_orange_gi/build_tag_3250).
There are no logs generated when the level is reached, only the DCE counter TPE_HPE ('diag npu np6 dce-all <np_id>') would increase.

● configuration

The configuration is CLI based only, applied on the targeted NP6 :

config system np6
    edit "np6_0"
        config hpe
            set type-shaping-tcp-max 15000
            set type-shaping-udp-max 10000
            set type-shaper enable
        end
    next
end

● configurable traffic options

type­shaping­tcpsyn­max NPU HPE shaping based on the maximum number of TCP SYN packets received (10000 ­ 10000000000 pps, default = 5000000).
type­shaping­tcp­max NPU HPE shaping based on the maximum number of TCP packets received (10000 ­ 10000000000 pps, default = 5000000).
type­shaping­udp­max NPU HPE shaping based on the maximum number of UDP packets received (10000 ­ 10000000000 pps, default = 5000000).
type­shaping­icmp­max NPU HPE shaping based on the maximum number of ICMP packets received (10000 ­ 10000000000 pps, default = 1000000).
type­shaping­sctp­max NPU HPE shaping based on the maximum number of SCTP packets received (10000 ­ 10000000000 pps, default = 1000000).
type­shaping­ipsec­esp­max NPU HPE shaping based on the maximum number of IPsec ESP packets received (10000 ­ 10000000000 pps, default = 1000000).
type­shaping­ip­frag­max NPU HPE shaping based on the maximum number of fragmented IP packets received (10000 ­ 10000000000 pps, default = 1000000).
type­shaping­ip­others­max NPU HPE shaping based on the maximum number of other IP packet types received (10000 ­ 10000000000 pps, default = 1000000).
type­shaping­arp­max NPU HPE shaping based on the maximum number of ARP packet types received (10000 ­ 10000000000 pps, default = 1000000).

type­shaping­others­max NPU HPE shaping based on the maximum number of other layer 2 packet types received (10000 ­ 10000000000 pps, default = 1000000).
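
To check whether the HPE policer is dropping packets, the TPE_HPE drop counter mentioned above can be watched per NP6 (sketch; np_id 0 is an example, FortiOS diagnose outputs can be piped to grep) :

diagnose npu np6 dce-all 0 | grep TPE_HPE    <--- increments when HPE drops packets towards the host interface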

● references

● #363398 (top3 feature introduction)

● #379835 (HPE does not consider hardware accelerated traffic, as expected)

● #384692 (HPE traffic policer is not reported in "diagnose npu np6 npu­feature")

● #384699 (shows type of pattern and default values)

● #385200 (HPE and 3810D all­interface­in­a­lag architecture)

● #389845 (HPE shaper does not distinguish base and fabric interface in SLBC)

● #384617 (HPE shaper dropping N­Turbo traffic)

IPv4 multicast session acceleration

The Session Search Engine component of the NP is at the heart of it, by referencing a chain of interfaces where the packet should be sent.
There is a current limitation of 256 destinations for 1 multicast session (a top3 case increased it to 20k, reference to find).

References : top3 #272428 (add NP6 hash algorithm with src/dst ipaddr to ensure the same multicast flow goes to the same cpu core)

● limitations :
○ #383624 multicast traffic on npu-vlink causes a PBA leak
⇒ Fix : disable multicast offloading across the npu inter-vdom link (see NP6 limitations)

IPv6 multicast session acceleration


to be developed

SCTP traffic hardware acceleration
to be developed

CAPWAP data (not DTLS) hardware acceleration (259431)

to be developed

● references

● #259431

fp­anomaly
NPs have a simple built-in packet anomaly engine providing protection from a few well-known attacks based on malformed or unexpected types of
packets. The configuration is CLI only : per NP6 under config system np6, or per interface under config system interface (set fp-anomaly […]), see below.
It is available for both IPv6 (NP7 only) and IPv4 and is disabled per default.

NP6 unit based configuration


config system np6
config fp­anomaly­v4
set icmp­frag {allow | drop | trap­to­host}
set icmp­land {allow | drop | trap­to­host}
set ipv4­land {allow | drop | trap­to­host}
set ipv4­optlsrr {allow | drop | trap­to­host}
set ipv4­optrr {allow | drop | trap­to­host}
set ipv4­optsecurity {allow | drop | trap­to­host}
set ipv4­optssrr {allow | drop | trap­to­host}
set ipv4­optstream {allow | drop | trap­to­host}
set ipv4­opttimestamp {allow | drop | trap­to­host}

set ipv4­proto­err {allow | drop | trap­to­host}
set ipv4­unknopt {allow | drop | trap­to­host}
set tcp­land {allow | drop | trap­to­host}
set tcp­syn­fin {allow | drop | trap­to­host}
set tcp­winnuke {allow | drop | trap­to­host}
set tcp_fin_noack {allow | drop | trap­to­host}
set tcp_fin_only {allow | drop | trap­to­host}
set tcp_no_flag {allow | drop | trap­to­host}
set tcp_syn_data {allow | drop | trap­to­host}
set udp­land {allow | drop | trap­to­host}
end

config fp­anomaly­v6
set ipv6­daddr_err {allow | drop | trap­to­host}
set ipv6­land {allow | drop | trap­to­host}
set ipv6­optendpid {allow | drop | trap­to­host}
set ipv6­opthomeaddr {allow | drop | trap­to­host}
set ipv6­optinvld {allow | drop | trap­to­host}
set ipv6­optjumbo {allow | drop | trap­to­host}
set ipv6­optnsap {allow | drop | trap­to­host}
set ipv6­optralert {allow | drop | trap­to­host}
set ipv6­opttunnel {allow | drop | trap­to­host}
set ipv6­proto­err {allow | drop | trap­to­host}
set ipv6­saddr_err {allow | drop | trap­to­host}
set ipv6­unknopt {allow | drop | trap­to­host}
end

NP4 unit based configuration


Before NP6, fp-anomaly settings were defined in the interface section :

config system interface


edit <port­name>
set fp­anomaly <anomalies>
../..
next
end

● references

● See hardware acceleration guide (docs.fortinet.com)

Deprecated : IPS anomaly and signature (XH0/XG2)


SP2 and SP3 based units/cards are deprecated. They are no longer supported on 5.4 platforms.
Please refer to an older version of the hardware_acceleration document for more details.

Features breaking hardware acceleration

It is well known that proxy-based UTM breaks hardware acceleration; there are however more features, not so obvious, that lead to the same
effect. They may be configured at different levels, like on the interface, on the policy or in a dedicated configuration statement. A feature introduced
in 5.4 adds a new line "no_ofld_reason" to the session list to provide more information on the reason why the session is not offloaded.

session info: proto=6 proto_state=01 duration=891 expire=3593 timeout=3600 flags=00000000 …


origin­shaper=
reply­shaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=may_dirty
statistic(bytes/packets/allow_err): org=72342/1107/1 reply=1440409/968/1 tuples=2
tx speed(Bps/kbps): 81/0 rx speed(Bps/kbps): 1615/12
orgin­>sink: org pre­>post, reply pre­>post dev=66­>69/69­>66 gwy=192.168.0.198/192.168.201.254
hook=pre dir=org act=noop 192.168.1.2:10001­>192.168.0.198:80(0.0.0.0:0)
hook=post dir=reply act=noop 192.168.0.198:80­>192.168.1.2:10001(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=1
serial=00124576 tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x020000
no_ofld_reason: sflow

Note : there were actually 2 steps leading to the "no_ofld_reason" line. The first step, introduced in #245447, added a line to the session list like :
NPU driver internal error: code=7. <-- this line shows why np4 is not offloaded.
But the error code was a bit cryptic and was later changed to the more user-friendly 'no_ofld_reason:' line (mantis reference ?)

This section tries to summarize these cases with more details. This section requires more testing and updates. Inputs are welcome !
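
When investigating a specific flow, it helps to narrow the session list with a filter before looking for the no_ofld_reason line (sketch; the filter values are examples) :

diagnose sys session filter clear
diagnose sys session filter dst 192.168.0.198
diagnose sys session filter dport 80
diagnose sys session list    <--- check the npu_state and no_ofld_reason lines in the output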

Each entry below lists : feature / conditions, the no_ofld_reason value, a configuration sample or diag command, the verification status and the Mantis #.

Feature / conditions : session is dirty
no_ofld_reason : dirty
Configuration sample or diag command : routing and/or configuration change while the session is in established state and no new revalidation packet seen yet.
Status : not verified
Mantis # : #381788

Feature / conditions : session not in established state
no_ofld_reason : not-established
Configuration sample or diag command : the TCP session is not (yet) in its established state (proto_state=01)
Status : not verified
Mantis # : #387310

Feature / conditions : ESP acceleration not supported / a protocol is not suitable for offload
no_ofld_reason : offload-denied
Configuration sample or diag command : NP4 and Soc3 (NP4Light) don't support ESP hardware acceleration by design (NP4 has special build #229874).
Status : confirmed
Mantis # : #310606, #308902

Feature / conditions : access from SSLVPN portal, explicit proxy involved
no_ofld_reason : local
Configuration sample or diag command : the session is accessed through an SSL portal (case 1); explicit proxy is used on the FortiGate (case 2)
Status : not verified
Mantis # : #387629, #377926

Feature / conditions : diag temporary bypass ?
no_ofld_reason : diag

Feature / conditions : hardware acceleration has been disabled on the policy
no_ofld_reason : disable-by-policy
Configuration sample or diag command :
    config firewall policy
        edit <policy_id>
            set auto-asic-offload disable
Status : not verified
Mantis # : #386626

Feature / conditions : softswitch is used, interface based bypass
no_ofld_reason : disabled-by-policy, non-npu-intf
Configuration sample or diag command :
    config system switch-interface
        edit "soft-switch"
            set vdom "vdom1"
            set member "port18" "port20"
Status : confirmed
Mantis # : #387500, #460607

Feature / conditions : session is inspected by ipsengine (signature, app control,...) without n-turbo acceleration possible
no_ofld_reason : redir-to-ips
Configuration sample or diag command : a flow based profile is applied and n-turbo acceleration does not apply to the device
Status : not verified
Mantis # : #377711

Feature / conditions : sflow (1)
no_ofld_reason : sflow
Configuration sample or diag command :
    config system interface
        edit DMZ_FTS
            set sflow-sampler enable
        end
Status : confirmed

Feature / conditions : device authentication (src-visibility)
no_ofld_reason : mac-host-check
Configuration sample or diag command : the session is inspected for source visibility. It may become hardware accelerated once the device type has been identified
Mantis # : #355970

Feature / conditions : offload is disabled because of a session helper
no_ofld_reason : offload-denied
Configuration sample or diag command : a session helper is involved. Seen with GTP-C traffic on a FortiCarrier : GTPu offload is disabled when logging is enabled on GTPc or GTPu
Status : not verified
Mantis # : #378910

Comments:
● (1) sflow : can't be accelerated because sflow requires periodic traffic sampling that can only be done in the kernel

● Some examples of no_ofld_reason seen that need to be added in the table:

no_ofld_reason: redir­to­ips denied­by­nturbo (#389743)


ofld_fail_reason(kernel, drv): not­established/not­established, none(0)/none(0) npu_state_err=04/04 (#389708)
no_ofld_reason: block­by­ips redir­to­ips (#300846)
no_ofld_reason: npu­flag­off (#308902)
no_ofld_reason: block­by­ips disabled­by­policy redir­to­ips denied­by­nturbo (#385338)
no_ofld_reason: redir­to­av disabled­by­policy (#371526)

● Related references #245447

- “If the session flags contain any of the following (ignoring turbo mode IPS) then the session will not be offloaded :
'redir' ­ some kind of proxying,
'auth' ­ (firewall) authentication
'src­vis' ­ device detection
'ndr' ­ IPS
'nb' ­ IPS (block)
'nds' ­ IPS
'ndri' ­ IPS (interface­based)
'os' ­ traffic shaping (but see below)
'rs' ­ traffic shaping (but see below)

- do the ingress and egress devices support offload
- if traffic shaping is enabled, does the NPU support that specific type of traffic shaping (some do and some older ones do not)
- if traffic is locally terminated GRE then no offload
- if IPsec is involved, does the NPU support the chosen cipher suite

There are also more transient reasons for offload failing, which usually only affect the first few packets :

- has the destination MAC address been resolved

- depending on the NPU, the TCP handshake is not offloaded to ensure simultaneous connect is not abused.

NP chips
NP6

From the outside


In this part the NP6 is considered as a black box and the focus is on its connectivity with its surroundings and on the different types of integration of NP6s in FortiGate platforms.

Form factors
NP6 chips come in 2 different form factors depending on their connectivity.
The first form factor provides 4x 10G; it is tailored for high-end units where the 10G ports are attached to an ISF.
The second, providing 3x 10G + 16x 1G, allows a more direct attachment of the FortiGate ports to the NP6 without a switch fabric. It is mostly used on mid-range units.
An NP6 is first of all a NIC : it generates interrupts when packets come in, and both form factors have the same bus connectivity to the CPU cores via a dual PCI-e bus allowing a potential distribution to up to 256 CPU cores.
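A quick way to observe this on a live unit is to list the NP6 host-queue interrupts and check which CPU core each one is pinned to. The two commands below are the same ones used later in this document for the FortiGate-1500D mapping example; the output obviously differs per platform:

diag hard sys interrupt                      <-- lists the np6_x-tx-rx<n> interrupts and their IRQ numbers
diag system cpuset interrupt <irq_id>        <-- shows the CPU core(s) bound to that IRQ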

Integration

The public Fortinet Hardware Acceleration guide v4.1 document contains all FortiGate platform architectures, so we won't repeat them in this document. Instead, we will focus on some representative platforms and describe their specificities. This chapter is split in two parts : integration with and without a switch fabric.

Integration with a switch fabric


Switch fabrics are generally needed on high-end units where high performance is expected with multiple NPs on 10G interfaces. A switch fabric is necessary when the platform has 40G or 100G ports.

● FortiGate­3700D

The FortiGate-3700D is a good example of a multi-NP6 platform that also includes the ‘low-latency’ feature.
It has the following block diagram :

● CPU mapping scheme

- The Broadcom switch fabric chip (ISF) is attached to all FortiGate front ports (PHYs not shown in the diagram)
- The FortiGate-3700D is composed of 4x NP6 in the 4x10G form factor, with each XAUI attached to the ISF
- Each NP6 has 16 host queues, each one attached to a core of CPU block #1. CPU block #1 is reserved for NP interrupt processing
- The second CPU block (#2) is free from any interface interrupts

● Port mapping scheme

- Traffic from each port is bound to a single NP6

- The unit has 4x 40G interfaces; each one requires a quad attachment to the ISF (4x10G = 40G). In this case, the 4 ISF ports are
bundled as a LAG which is made transparent to users.
- Maximum port speed is 40G (bundle of 4x 10G ports on the same NP6)
- Possible oversubscription on some ports linked to the same NP6 XAUIs (see the command example after this list)
- Possible unbalanced configuration with some NP6s heavily used and others barely used
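To check this kind of mapping on a unit, the port-to-NP6/XAUI assignment can be dumped directly from the NP6 driver (assuming the port-list sub-command present on NP6 platforms); the output maps each front port to an NP6 chip and XAUI, which makes potential oversubscription on a given XAUI easier to spot:

diag npu np6 port-list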

● FortiGate­3810D

- FortiGate-3810D is the first platform to implement 100G ports

- All 8x NP6 XAUIs are bundled in a 32-port aggregate
- Traffic from one interface is handled by every NP6
→ Difficult to trace where traffic goes
→ NP6 DoS meters are independent per NP6

(from ‘fortigate-hardware-acceleration-54.pdf’)

● FortiGate­1500DT

This platform is similar to the classical 1500D but has 4x10G RJ45 copper ports

Integration without a switch fabric

This design was originally reserved for mid-range units, however we also see it on the 2500E with 4x NP6. The removal of the switch fabric slightly
decreases the packet delay. Because the switch chip offers buffering and flow-control features, traffic received on NP6 chips from ports that do not
cross an ISF is likely to be more bursty and more susceptible to packet loss because of more pressure on the NP6 queues.
Another consequence is the impossibility to create a LAG of ports attached to different NP6s.
For units with multiple NP6s without ISF, inter-NP6 acceleration may have important limitations : if the NP6s are linked together with a 10G XAUI, there is a
potential oversubscription (FortiGate-900D); if they are not linked at all (FortiGate-2500E), hardware acceleration between them is not possible.

● Design limitations :
○ (no reference) More susceptible to packet loss with bursts (no buffering and flow control from an ISF chip)
○ #290597 Cross-NP6 link aggregation and redundant interfaces are not allowed (see the example after this list)
Note : data sheets of those platforms should have some "small print" at the end about this. This is also mentioned in the official hardware guide
○ (no reference) No hardware acceleration possible if the NP6s are not interconnected (FortiGate-2500E).
No reference, but this is obvious from the design.
○ #300206 Packet loss on 1G ports directly attached to the NP6. Mitigation ECO with dedicated queue and shaping.
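As an illustration of the cross-NP6 LAG restriction mentioned above, an aggregate like the one below would be refused (or at least not hardware accelerated) on an ISF-less platform if the two members are bound to different NP6 chips; the port names are only an example:

config system interface
    edit "lag-example"
        set type aggregate
        set member "port1" "port17"
    next
end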

● FortiGate­900D

FortiGate-900D is a top mid-range unit made of 2x NP6s and offering 10G ports. Its design does not include an ISF.
As mentioned in the chapter “ipv4 unicast session acceleration”, there is a risk of overrunning the inter-NP6 10G XAUI.

● FortiGate­1000D

● FortiGate­2500E

Comments:
­ Combines the 2 NP6 form factors
- Has a bypass module on 2 LC connectors where the fiber is directly connected (no SFP module needed)
­ no inter­np6 interconnections

Mantis references:
● #375609 Merge FGT­2000E/2500E to v5 trunk (schedule 5.6.0)
NPI with 5.4.0 GA, with special

● Other platforms without ISF but with one single NP6.
These platforms don't have restrictions on LAGs because of the single NP6.
They still suffer from pressure on the NP6 1G interfaces (SGMII interface for 1G, XAUI for 10G)

Mantis references:
○ #300206 Packet loss on 1G port directly attached to NP6. Mitigation ECO with dedicated queue and shaping
○ #389858 ICMP ping lost once traffic is offloaded to NP6 in FGT500D (same cause as #300206)

For other platforms, please refer to the Official Hardware Guide section “FortiGate NP6 architecture”.

­ FortiGate­300D
­ FortiGate­400D
­ FortiGate­500D
­ FortiGate­600D

NP6 Performance figures

● 40 Gbps forwarding throughput (~30Gbps at 64 bytes)


● more than 10 million sessions
● 25 Gbps throughput IPSec ESP encryption/decryption (AES256/SHA-1)
● Line rate capability between 2 ports on the same NP
○ At any packet size
○ At any number of sessions (30M pps at 10M sessions is expected per NP)
○ The CPS between two ports uses all available CPU.
● Numbers (source from Tiger Team)

Improvements from NP4

● IPv6 traffic hardware acceleration


● Tunneled traffic hardware acceleration : v4 → v6, v6 → v4, v4 → v4, v6 → v6
● UDP/TCP translation : v4 → v6, v6 → v4, v4 → v4, v6 → v6
● Multicast traffic hardware acceleration
● SCTP traffic hardware acceleration
● CAPWAP data (not dtls) hardware acceleration
● IPSEC SHA2­256 and 512
● SYN proxy host and server protection
● Accurate per session accounting capability (see previous chapter)
● Inter-vdom internal acceleration: unlike NP4, which uses the ISF to handle npu-vlink (1x 10G port from each NP4), NP6 has a built-in
“loopback” interface that can re-inject the packet from one vdom to the other without using a XAUI.

Internal improvements:

● Full IRQ mapping on each NP6 port: no need to choose ports anymore to ensure maximum CPU availability. All NPs have the same IRQ/CPU
mapping, providing a linear CPU usage across the load and the NPs
● No need for an extra Intel chip to provide N-Turbo services : the NP6 is able to interrupt the CPU directly for user space processes like ipsengine.
● Hardware acceleration LAG helper : unlike NP4, NP6 keeps sessions offloaded (to the other NP6) in case of LAG member loss. Traffic is still
redirected to the original NP6 port by the Broadcom switch after failover.
● Session entry purging to accommodate routing changes (any reference here ?)
● Reversible hash in RX/TX queues : both directions of the same session are tied to the same CPU core
● 4-level priority queues inside NP6 (NP4 has only 2), mapped as follows:
○ Priority 3 (highest) : control plane traffic (ARP, OSPF, BGP, IKE, etc)
○ Priority 2 : data plane traffic control packets (ICMP, TCP RST, etc)
○ Priority 1 : high priority data traffic
○ Priority 0 (lowest) : normal data traffic
● Session push packet follows the data packet path:
In NP4, a dedicated session management queue exists, with interrupts mapped to 1 CPU core :
- Session creation and packet data coming from 2 different sides
=> potential loss of sync
- No CPU distribution on the command queue
In NP6 the session creation command precedes the data packet, following the same path:
- improved data / command synchronization
- session creation CPU load distributed on all NP6 cores
● IP fragments with multiple core distribution : all IP fragments received on NP6 are sent to the kernel. Unlike NP4, NP6 uses all RX CPU queues
to distribute IP fragments to the kernel (#401333)

● Improved debug commands:

○ Anomaly, host and forward drops
○ Session statistics and distribution across SSEs
○ Packets received and sent per port and per SSE
○ Port vs NP mapping
○ All counters can be cleared to give more visibility

From the inside

Functional blocks

Like other NPs, the NP6 is internally organized around different functional blocks. A functional block implements a set of functions organized around
a common mission.

There are 5 major groups of functional blocks in NP6 :

● Interface group TX :
Deals with packets entering the NP6 from the outside world, either from the ports or from the kernel.

● Switching group :
Session lookup and dispatching

● Service group :
Packet processing for special services

● Interface group RX :
Processes the required steps when packets leave the NP6, either towards external interfaces or towards the kernel via the Host interface

Reminder about TX/RX : this is seen “from the outside world”

● global view

The following diagram shows a high-level view of the groups and functional blocks

● Detail of Interface group TX

All packets entering the NP6 first go through this block. The block has two parallel paths. One path (on the left side) is dedicated to packets
received from the ‘Host interface’; this corresponds to all packets arriving from the kernel via the PCI-e bus. The other path (on the right side) is
dedicated to packets received from the XAUI interfaces of the NP6. There are several possible sources for the XAUIs : the ISF, a direct external
interface attachment, or another directly attached NP6. Depending on the source, different kinds of processing are required, with common missions : identify
the sender and translate the source into a LIF (Logical InterFace), sanity check the packet format to protect the NP from fuzzing attacks, optionally apply
well-known anomaly checks and finally register the packet inside the NP6 by copying its content to memory and creating a ‘Packet
Descriptor’ based on the L2, L3 and L4 packet headers (see Internal and external data transmission).

When a packet comes from the Host interface, it already contains information about what processing is required in the NP6. This allows the NP6 to forward the
packet applying these requirements without extra processing work. There is 1 ITP and 1 IHP per XAUI (ITP0-ITP3, IHP0-IHP3) and 2 HTX (HTX0, HTX1),
one for each PCI-e bus.
Details of functional blocks : HTX, ITP, IHP

● Detail of Switching group

The switching group is the heart of the NP6; this is where received packets are hashed and the session matching lookup is done against the NP6 primary
and secondary tables (see ipv4 unicast session acceleration). If no session match is found, the packet is routed over to the kernel via the
Host interface. In case of a match, the processing of the packet depends on the ‘action’ field of the session. The packet may be dropped, may be
routed towards an external interface, may be routed towards a special service block for specific processing, or may be sent to specific queues on
the Host interface corresponding to special handling like N-Turbo.

The switching group is composed of 5 main functional blocks : ISW, FDB, SSE1, SSE2, OSW

The ISW (Inbound Switch) is the entry point; it has dedicated queues for each source and can prioritize packets on the queues. Packets from the
kernel containing pre-defined routing information may be sent to the Packet Forward Engine (FDB), acting as a fast-path dispatcher with little
processing required on the packet. The ISW also distributes the packets to the two Session Search Engines based on a hash of the packet 5-tuple.

Each Session Search Engine (SSE1 and SSE2) is independent from the other one; each has access to a dedicated fast DDR3 RAM
where its session tables are stored. It hashes the Packet Descriptor received on its receiving PDQ to form a session key. The key is looked up in
the primary session table. If more than one match exists, the overflow table has to be checked until the session corresponding to the PDQ
details is found. When found, the action and other information and flags for the session are retrieved and the packet is sent to the OSW block.

The OSW (Outbound Switch) goal is to route the packet to the next required processing block, following the instruction received either directly from the
kernel (path via FDB) or from the SSE. The routing is based on the destination LIF. It also has the mission to perform the trunk-to-port
resolution. This is done when the destination LIF references a trunk instead of a port. In this case, a hashing is done using the programmed algorithm.
If no NAT is applied to the session, the algorithm tries to use the same interface for egress and ingress.

The OSW also contains the TPE (Traffic Policy Engine) that optionally gets involved prior to any switching function whenever traffic shaping or
accounting is requested for the session. The TPE manages indexed tables of shapers and counters. One session may have multiple indexes, for
instance a session counter and a per-ip shaper.

A “Loopback” path allows packets to be re-injected from the OSW back to the ISW. This is used when npu-vlink interfaces are the target, to perform the
hardware accelerated vdom link function.

● Detail of Service group

The Service group gathers functional blocks performing packet transform actions like tunneling and translation, or further inspection functions like SYN
proxy. Each individual functional block is tied to its dedicated mission. For some of them, like IPSec encryption/decryption, the packet PDQ is not
enough for the job and access to the packet payload in the memory buffer is required.

Note : in the case of the FortiGate-3700DX unit, the extra functions delivered by the TP2 FPGA, like GTP inspection, can be seen as
Service group functional blocks located outside the NP6.

● Detail of Interface Group RX

This is the final step for all packets leaving the NP6. This block is similar to the TX group but works in the other direction. It also has two parallel
paths. One path (on the left side) is dedicated to packets sent to the ‘Host interface’; this corresponds to all packets going to the kernel
via the PCI-e bus. The other path (on the right side) is dedicated to packets transmitted to the XAUI interfaces of the NP6. Depending on the destination,
different kinds of processing are required, with common missions : identify the destination and translate the destination LIF (Logical InterFace) into an
interface. The packet has to be recreated based on its updated PDQ and its payload in memory. Another mission is to deal with packet fragmentation in
case the outgoing interface MTU is smaller than the packet length. Protocol
translation (for example IPv4 → IPv6) is also done here, as well as checksum
calculation. Finally the packet is prepared to be sent out either on the Host
interface or on a XAUI. When an ISF is involved, packets need to be appended
with the correct ‘Core Tag’ required for switching in the ISF to send the
packet on the wire corresponding to the egress interface of the FortiGate.

Egress Header Processing (EHP) recreates the packet with fragmentation and
checksum recalculation. There is one EHP per XAUI (EHP0-EHP3)

Egress Tag Processing (ETP) is the reverse function of “Ingress Tag
Processing”. It appends the Core Tag when an ISF is involved; it does not do it
when the XAUI is directly attached to a port. It also applies the correct ethernet
vlan if needed. There is one ETP per XAUI (ETP0-ETP3)

The Host Receive (HRX) block corresponds to the path towards the kernel via the
Host interface. It distributes the packet to the required host queue. The choice
of queue determines which CPU core will process the packet;
this is how the CPU load is distributed. Some host queues are specific, like the
N-Turbo queues. There are two HRX (HRX0-HRX1), one for each PCI-e bus.

Traffic flow examples
This section provides some examples of typical packet flows passing through the NP6

● Basic host receive path (left) and Basic host transmit path (right)

The two directions of a flow between an external port and the kernel host interface take 2 different paths inside the NP6. The direction from interface to
kernel goes through a session search engine, while the direction from kernel to external port makes use of the FDB shortcut.

● Basic Firewall path

This is the typical hardware acceleration between 2 ports.

● IPSec inbound path (left) and IPSec outbound path (right)

A shortcut through the ‘FDB’ exists for the IPSec outbound path because the interface to egress is part of the known information.

● Accelerated inter vdom path

For inter­vdom acceleration, the Loopback is used to reinject traffic back for a second round to process the second vdom lookup.
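As a reminder of the configuration side, the accelerated inter-vdom link relies on the NP6 built-in vlink interface pair. A minimal sketch, assuming the usual npu0_vlink0/npu0_vlink1 interface names and two existing vdoms vd1 and vd2 (firewall policies referencing these interfaces are then created in each vdom as for any inter-vdom link):

config system interface
    edit npu0_vlink0
        set vdom vd1
    next
    edit npu0_vlink1
        set vdom vd2
    next
end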

● Multicast with IPSec outbound

The inbound multicast packet (in yellow) reaches one of the session search engines.
Packets for the different destination egress interfaces are duplicated on the SSE. Each packet then lives its own life in the NP6, using the next
block appropriate for the interface it egresses.

Configuration options impacting NP6

● config system np6 ­> edit np6_x­> set fastpath (enable*|disable)

This command globally disables the hardware acceleration fast-path on the whole NP6, so all interfaces bound to that NP6 are affected.
It should not be commonly used; it is possibly needed if the kernel shaping functions are required.
warning : dangerous command, has problems with vlans #372526 / #364448
“We didn't expect user disable fastpath at "config system np6". The CLI is only for debugging purpose. VLAN interface won't work when fast path is disabled.”

● config system np6 ­> edit np6_x­> set low­latency­mode (enable|disable)

This is only applicable to the FortiGate-3700D for now, on 2 of the 4 NP6s available on the unit. This is the command to attach the FortiGate ports directly
to the NP6 without traversing the ISF. When in low-latency mode, no hardware acceleration or LAG is available with ports from other NP6s.
Benefit of low-latency : latency drops from 3.5 microseconds to 1.6 microseconds

● config system np6 ­> edit np6_x­> set garbage­session­collector (enable|disable*)

This feature should not be needed in normal use. It is meant to be activated in case a PBA leak is discovered, to periodically recover the blocked
memory. Under normal conditions, memory de-allocation takes place normally when the packet is dropped by the NP6 or leaves the NP6.
It could be useful when syn-proxy is used, to clear attack sessions (to be confirmed)
When enabled, the lifetime of a session is limited (see the chapter below)

● config system np6 ­> edit np6_x­> set session­collector­interval (default 64 s)

This counter seems to be the maximum TTL for a session in the NP6. It defaults to 64 s. If a session remains in the NP6 without any packet seen, the
entry is deleted after the defined number of seconds. The same timer also seems to be used for syn-proxy sessions (#272927)
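If a PBA leak forces the use of the collector, both options are set per NP6 chip; a sketch using the stated 64 second default interval:

config system np6
    edit np6_0
        set garbage-session-collector enable
        set session-collector-interval 64
    next
end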

● config system np6 ­> edit np6_x­> set session­timeout­fixed enable|disable*

Chooses the NP6-to-kernel session update timing between 2 methods :

- every fixed time (+- a random range), with an adjustable default of 40 seconds (given by set session-timeout-interval / set session-timeout-random-range)
- depending on the session expiration timer, at a point between 1/2 and 4/5 of the expiration timer (#386626). For TCP, the established state timer is used.
Verified with the default session-ttl of 300 s : the first update is after 150 s.

● config system np6 ­> edit np6_x­> set session­timeout­interval (default 40s)

If method ‘set session­timeout­fixed enable’ is used, this defines the base timer for session update (see random­range)

● config system np6 ­> edit np6_x­> set session­timeout­random­range

If method ‘set session­timeout­fixed enable’ is used, this defines the random part added to the base timer
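Combined, the fixed-time update method could look like the sketch below; 40 s is the stated default base timer and the 8 s random range is only an illustrative value:

config system np6
    edit np6_0
        set session-timeout-fixed enable
        set session-timeout-interval 40
        set session-timeout-random-range 8
    next
end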

● config system np6 ­> edit np6_x­> config fp­anomaly­v4

This is the section where all NP6 anomaly checks for IPv4 traffic are configured (see fp-anomaly)

● config system np6 ­> edit np6_x­> config fp­anomaly­v6

This is the section where all NP6 anomaly checks for IPv6 traffic are configured

● config system npu ­> set dedicated­management­cpu enable (201257, 218083, 251776)

Reserves CPU0 for other processes : all interrupts originally scheduled on CPU0 are moved to CPU1 (on top of its own interrupts).
The goal is to avoid management slowdowns or management disconnection from FortiManager.
This was introduced in 5.0.5 for NP4 and applied later to NP6 in 5.0.10 and 5.2.1.
The drawback of this command is a possible excessive CPU load on CPU1.

● config system npu ­> set np6­cps­optimization­mode (#230523)

This option is available on the 3700D. It is made to increase the CPS by distributing the NP6 interrupts to 32 CPU cores, in lab conditions.
It is not recommended on production networks where CPU power should be saved for other critical functions like HA, logging…
● renamed to "np6-cps-optimization-mode" by #262981 instead of np6_cpu_optimization_mode
● removed from platforms having a dual CPU socket (like 3700D), kept on 1200D/1500D as of 5.6.0 by #291819
● mantis to remove it in 5.4 as well (not done so far, may never be done) #399659
● other references #305096, #300975, #301536

● Note : no impact from config system npu ­> set enc­offload/dec­offload/offload­ipsec

As a reminder, NP6 ignores the set enc­offload/dec­offload/offload­ipsec from the ‘config system npu’ group.

NP6 monitoring additions for drift sessions (diag and SNMP)

Following the bug causing the stacking-up of NP6 forward entries (see #422746, #441532), monitoring capabilities were added via diag commands and SNMP
OIDs to check whether sessions are drifting in the NP6. If so (and this would mean a bug exists), a diag command to clear idle sessions has also been added.

● Counter drv-drift : represents the potentially drifting sessions

drv-drift = (instot - deltot) - enttot with

● instot: number of session insertions
● deltot: number of session deletions
● enttot: number of active sessions in the NP6

For example, with instot=1000000, deltot=990000 and enttot=10000, drv-drift = (1000000 - 990000) - 10000 = 0, which is the expected healthy value.

The drv-drift variable can have a negative or a positive value. A negative value indicates that there were more deletions than insertions. The bug we had before
caused drv-drift to go negative because the session was deleted on the wrong NPU. When drv-drift has a negative value, there is a possibility that we failed
to delete a session from another NPU.

After the purge procedure removes idle sessions from the NPU, drv-drift will have a positive value because the value of enttot decreased after purging.
Please be aware that drv-drift can also have a positive value in the multicast case, because a single session deletion removes all the sessions of the same
multicast chain.

● diagnose npu np6 sse-drift-summary : shows a summary of drv-drift for all the NP6 chips in the system and calculates the sum of drv-drift.
Normally, the sum is 0.

NPU drv­drift
­­­­­ ­­­­­­­­­
np6_0 0
np6_1 0
­­­­­ ­­­­­­­­­

Sum 0
­­­­­ ­­­­­­­­­

● diagnose npu np6 sse-purge-drift <npu_id> <time> :

The command purges idle sessions from NP6_<npu_id>. The [time] argument is optional;
the default purging time is 300 seconds. It takes roughly 2-4 seconds for the NP6 to
walk through the whole session table.

Example:
The procedure may take up to 10 Secs.
Please wait until the procedure is finished. Stopping in the middle may cause system malfunctioning.
Starting to clean up idle sessions in NP6_0.
Purging progress sse0/sse1:57470975/57470974, 0 idle sessions were purged.
NP6_0 session clean­up finished in 11.000000 Seconds.
Total session purged: 0

● diagnose npu np6 sse-stats <dev_id> : Addition of a drv-drift counter in the sse-stats

diagnose npu np6 sse­stats 0


Counters SSE0 SSE1 Total
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
active 0 0 0
...
...
PBA 3001
drv­drift 0

● SNMP OIDs were also added for the same purpose :

Add fgNPU counters to system MIBs.
FORTINET­FORTIGATE­MIB::fgNPUNumber.0 = INTEGER: 2
FORTINET­FORTIGATE­MIB::fgNPUName.0 = STRING: NP6
FORTINET­FORTIGATE­MIB::fgNPUDrvDriftSum.0 = INTEGER: 0
FORTINET­FORTIGATE­MIB::fgNPUIndex.0 = INTEGER: 0
FORTINET­FORTIGATE­MIB::fgNPUIndex.1 = INTEGER: 1
FORTINET­FORTIGATE­MIB::fgNPUSessionTblSize.0 = Gauge32: 33554432
FORTINET­FORTIGATE­MIB::fgNPUSessionTblSize.1 = Gauge32: 33554432
FORTINET­FORTIGATE­MIB::fgNPUSessionCount.0 = Gauge32: 0
FORTINET­FORTIGATE­MIB::fgNPUSessionCount.1 = Gauge32: 0
FORTINET­FORTIGATE­MIB::fgNPUDrvDrift.0 = INTEGER: 0
FORTINET­FORTIGATE­MIB::fgNPUDrvDrift.1 = INTEGER: 0
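Once the FORTINET-FORTIGATE-MIB is loaded on the management station, these OIDs can be polled with standard net-snmp tools; the community string and IP below are placeholders:

snmpwalk -v2c -c <community> <fortigate_ip> FORTINET-FORTIGATE-MIB::fgNPUDrvDrift
snmpget -v2c -c <community> <fortigate_ip> FORTINET-FORTIGATE-MIB::fgNPUDrvDriftSum.0

A fgNPUDrvDriftSum that stays non-zero over time is the hint that sessions are drifting and that the diag commands above should be used for further investigation.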

NP6 IPsec out­of­order and sub­engine settings

Problem description : the NP6 is likely to create out-of-order packets when IPsec is done in the NP6. This is caused by the distribution method across the 8 IPsec
sub-engines.

Cause : packets from the same IPsec SA may be processed by different sub-engines in parallel. A big packet followed by a small packet distributed to two
different sub-engines will likely be sent out in the order small then big, because the time needed to process the small packet is lower.
It is not possible to change the distribution algorithm so that packets from the same SA are processed by the same sub-engine.
This problem exists for both encryption (distribution of clear text packets) and decryption (distribution of ESP packets)

Distribution across IPsec engines inside NP6 :

The NP6 IPsec module is made of 2 banks, each bank is made of 8 sub-engines.
SAs are first distributed across the banks. For each packet of an SA, a sub-engine is selected. The selection rule is to first try the first
sub-engine; if that engine is busy, the next engine is tried, and so on. It is therefore expected to see the first engine more busy than the second.

● Engine status

diag npu np6 register <x> | grep engine_status provides engine (aka: Bank) and sub-engine status (idle/busy).

SEGFARGEV01 (global) # diag npu np6 register 0 | grep engine_status


engine_status =000000c0 [16:23]
engine_status =00000080 [16:23]
engine_status =00000000 [16:23]
engine_status =00000000 [16:23]

Comments :
● The 2 last lines are not used; the first line is for Bank 0, the second line for Bank 1.
● The last two hex digits (one byte) represent the sub-engine status :
FF (so 8 bits: 1111 1111) means all sub-engines are IDLE (should be the case when traffic stops)
00 (so 8 bits: 0000 0000) means all sub-engines are busy
In the capture above, c0 (1100 0000) would mean six sub-engines busy and two idle.
● The command is a snapshot, so you may need to run it multiple times to capture sub-engine 1 in an idle state

Another diag command has been added to provide the engine status without using the long register dump :

(fn)sysctl cat /proc/net/np6_0/ipsec-engine


(fn)sysctl cat /proc/net/np6_1/ipsec-engine

# fnsysctl cat /proc/net/np6_0/ipsec­engine


IPSec Idle status of Engine 0:ff
IPSec Idle status of Engine 1:ff

Mitigation :

The only possible mitigation is to limit the number of sub-engines used for encryption and decryption. Doing this guarantees that all packets of a given SA are
serialized, treated one after the other.
Zero out-of-order packets can be achieved in the decryption direction if a single sub-engine is used for decryption, and zero out-of-order packets can be achieved in
encryption if a single sub-engine is used for encryption. The 2 banks can still be used simultaneously as they process different SAs.

The selection of sub-engines has been made configurable. The command applies globally to all NP6s on the unit.

● config system npu → ipsec­enc­subengine­mask <hex> /ipsec­dec­subengine­mask <hex>

The mask to supply is a one-byte bitmask written as two hex digits, one bit per sub-engine. For performance reasons, it is recommended to use different
sub-engines for encryption and decryption if a single sub-engine is targeted.

A recommended configuration to guarantee zero out-of-order packets for both the encryption and decryption directions is :
config system npu
set ipsec­dec­subengine­mask 0x01 # 0x01 = 00000001 ==> sub­engine 1 only enabled for decryption
set ipsec­enc­subengine­mask 0x10 # 0x10 = 00010000 ==> sub­engine 5 only enabled for encryption
end

Performance impact :

CRT lab testing with a FGT-1500D has shown that each NP6 IPsec sub-engine processes roughly 2G of traffic (with anti-replay disabled). Considering the 2 banks,
with multiple SAs involved, a zero-out-of-order configuration would deliver 4G max per direction (tests show between 3.9G and 5.2G depending on packet
size).

SA distribution on the banks for inbound and outbound :

● fnsysctl cat /proc/net/np6_0/ipsec-stats

(global) # fnsysctl cat /proc/net/np6_0/ipsec­stats


Counters IB0 IB1 OB0 OB1
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
active­SA 0 0 127 127
timeout 0 0 0 0
invalid_idx 0 0 0 0
tbl_full 0 0 0 0
nr_flush 0 0 0 0
nr_busy 0 0 0 0
nr_antireplay 0 0 0 0
cache_disabled 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
IPSec0­eng­mask(enc/dec) = 0x01/0x01
IPSec1­eng_mask(enc/dec) = 0x01/0x01
Idle status of IPSec Engine 0:fe
Idle status of IPSec Engine 1:fe

This command output details the distribution of inbound and outbound SAs on the 2 banks
IB0 : Inbound SAs on bank 0
IB1 : Inbound SAs on bank 1
OB0 : Outbound SAs on bank 0 (here 127 SAs installed)
OB1 : Outbound SAs on bank 1 (here 127 SAs installed)

Note : the distribution algorithm of SAs across the 2 banks is so far unknown, but it seems that traffic for one SA only goes to the same bank, therefore to the
same sub-engine if masking is used. This guarantees 0 out-of-order (we could not produce out-of-order packets in the lab with this).

References :
● #403883
○ details on ipsec engine status, 2 banks of 8 sub-engines. Status is represented by 2 bytes and each bit tells if an engine is idle (1) or busy (0)
○ sub-engine selection : first and lowest available engine. That means it chooses the available engine from Engine_0 to Engine_7, so one may
frequently see something like: 0xF8, 0xFC, 0xF2
○ PDQ_OSW_IPTO : the counter from Outbound Switch to IP Tunnel Outbound

NP6 limitations and bugs

There are 2 types of problems. Issues in the NP6 hardware are generally not fixable; sometimes a software-based solution or mitigation may exist but it
generally comes with consequences on performance, and applying the solution may or may not require a specific configuration. Other bugs are at the NP6 driver
level and can be fixed without impact.

Hardware limitations (hardware bugs that can't be fixed by software, or not without an important impact on features or performance) :

● #275195 IPsec anti-replay important performance degradation


● #284547 Packet loss with anti-replay enabled, and IPSEC outage on 1500D

Description : performance of IPsec with anti-replay in NP6 is bad. The greater the number of tunnels, the bigger the impact. It was originally observed on a
Fortigate-1500D with 500 tunnels used for LTE.
Cause : hardware bug : NP6 anti-replay cache corruption.
Mitigations :
○ special image available to improve the performance but still with high degradation (top3 #275195).
○ top3 #373505 came with a big improvement that may be merged with #380600 without CLI option
○ performance impact of fix (24G → 6G , so divided by 4)
○ No specific configuration needed for mitigation.

Comment : reducing the number of ipsec sub-engines to encrypt or decrypt is NOT a workaround for anti-replay issue.
Other references with valuable information : #437462 Add IPSec Anti-Replay workaround based on 5.4.4 3700DX branch

● #370586 ipsec out-of-order caused by distribution of traffic for same SA on multiple engine/sub-engines

Description : packets from the same IPsec SA may be processed by different sub-engines in parallel. A big packet followed by a small packet
distributed to two different sub-engines will likely be sent out in the order small then big, because the time needed to process the small packet is lower.
Cause : hardware bug
Mitigations :
○ special image available (fg_5­0_Orange_LTE_269247/build_tag_8942) to improve the performance by defining/limiting number of encryption and
decryption engines (CLI change)
○ #370486 Add CLI commands to configure limited IPSEC engine on NP6
=> requires user specific configuration (generally one sub-engine for encryption, one sub-engine for decryption)

=> config system npu ; set ipsec­dec­subengine­mask <engine_mask_hex>; set ipsec­enc­subengine­mask <engine_mask_hex> ; end
=> implemented in 5.4.4

● #383624 Sending multicast traffic across NP6 npu­vlink may cause interfaces to stop sending/receiving

Description : multicast traffic on npu-vlink caused a PBA leak. At some point, traffic stops
Cause : hardware bug
Mitigation : the fix in software disables multicast acceleration on the npu-vlink
⇒ fixed in 5.6.0, fixed in 5.4.6, 5.2.9 covered by 282472

● #277747 NP6 maximum frame size is limited to 15360 bytes

Description : Max frame size NP6 can transmit is 15360. Packets will get dropped if the size is bigger than this value.
Cause : Hardware limit

● #416102 Traffic over IPsec VPN getting dropped after 2 pings when it is getting offloaded to NPU

Description : traffic may be dropped when a tunnel is NPU-offloaded after routing revalidation. This is more often the case when the tunnel is bound to a
loopback interface advertised via a dynamic routing protocol.
Cause : seems to be a mishandling of revalidation cases

The new behavior to expect is described in 415155 : no NPU offload for IPsec when the tunnel is bound to a loopback.
Today (170719) there is no plan to fix it; the solution seems to be to avoid acceleration in this case.

● #396027 Single flow exceeds 10GB causes all BGP peers to drop randomly

Description : though it is understood that a 40G interface is made of 4x10G, a single saturated 10G path can cause degradation on the other 10G paths.
Cause : the ISF uses shared buffers for all 10G paths. Buffers would be used up if one path is saturated
Comment : the counter sw_in_drop_pkts in ‘diag hard dev nic’ increases.
Fix : none foreseen

● #392436 Bad throughput using 10G interfaces [1G / 10G port mix, devices without ISF like FGT600D]

Description : Due to limited NP6 internal packet buffer, offloaded packets from a 10G interface to a 1G interface can be dropped

Fix/mitigation : 5.6.1 only, adds a new CLI command to control the 10G/1G flow (for units without ISF)
Comment : units with an ISF don't have the problem thanks to the ISF packet buffers
New CLI commands : config system npu
set host-shortcut-mode bi-directional : offload TCP and IP tunnel sessions in both directions between 10G and 1G interfaces (normal operation)
set host-shortcut-mode host-shortcut : only offload TCP and IP tunnel sessions received by 1G interfaces. Select this if packets are dropped for offloaded
traffic from 10G to 1G interfaces.
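In practice, on an ISF-less unit that drops offloaded 10G-to-1G traffic, the mitigation is a single global setting (reverting to bi-directional restores normal operation):

config system npu
    set host-shortcut-mode host-shortcut
end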

● EHP drops (egress traffic from NP6 to XAUI)


⇒ NP6 egress queues are limited; in case of bursts, the queues may fill up and drop packets.
⇒ see the chapter “Understanding EHP drops” below
● source top3 #412338 (Apple), #437911

Mitigation : the new feature in this top3 throttles the input to the NPU, forcing the ISF buffer to be used for ingress traffic. EHP drops are seen when
traffic has short bursts, because the NP6 has short queues. The idea is to take benefit of the ISF queues, which are bigger than the NP6 ones.

Commands (in config system npu)

● set gtse-quota <0M (ie: disabled)|200M|400M|600M|800M|2G|4G|8G|10G>


⇒ throttles packets received from the kernel (session establishment, closing, session helper, ALG, NOT N-Turbo)
This is a kernel shaper applied when egressing to the NP6 on the PCI-e bus (mantis #436111)

● set sw-np-bandwidth <0G (ie disabled)|2G|4G|5G|6G|7G|8G|9G>


⇒ mantis #424214, merged for 5.6.3 (+ top3 special builds)
⇒ throttles ingress fast-path packets (hardware accelerated packets). Note that it may also impact traffic received from another NP6 via the ISF (this
could be the case either for IPsec forwarded traffic (fwd_entry) or npu-vlink received traffic).
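Both throttles live under the same CLI tree; a combined sketch, with values taken from the allowed enumerations above and chosen only as an illustration (the right values depend on the platform and traffic profile):

config system npu
    set gtse-quota 400M
    set sw-np-bandwidth 6G
end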

Comments :

● N-Turbo throttling : N-Turbo is another source of incoming packets for the NP6. It does not use the same path as the regular ‘slow path’ (except for
session establishment, log report, timer sync, etc), so its ingress traffic on the NP6 is not impacted by the gtse-quota. N-Turbo, which does not
use the kernel path, has its own throttling : a fixed shaper at around 6G has been programmed in the kernel to limit traffic to the NP6 and allow the use of
the kernel buffers, bigger than the NP6 ones, to avoid dropping packets.

● drops on the ISF can be measured by counters : sw_in_drop_pkts, sw_out_drop_pkts, sw_np_in_drop_pkts, sw_np_out_drop_pkts from ‘diag hard
dev nic <port>’

● EHP drops caused by LAG (compared to a single port without LAG)

● Description : much higher EHP drops were observed when a LAG is used (even with a single member interface) compared to the same interface with no LAG.
Observed with 5.4.6, 5.6.3 (and probably with all other versions).
Seen on both service module platforms (38xxD, 39xxE, 5001E, 6000F and 7000E) and non service module platforms (1200D → 3700D)

● Cause : egress congestion due to multiple traffic sources

● Source : #464340 EHP drops when using LAG + #410240 Significant packet drop 2x40g lag
● Fix for non service module platforms (#489956) : a new LAG algorithm (new CLI : config system npu ⇒ set lag-sw-out-trunk enable, which
automatically sets the LAG mode to ‘npu’); see the sketch after this list
○ 5.4 : special build br_5-4_nokia only (see Nokia top3) + 6.2.0 / 6.0.2 / 5.6.5 (planned)
● Fix for service module platforms (#468684) : new algorithm, no CLI change
○ 5.4 : special build br_5-4_nokia only (see Nokia top3) + planned for 5.6.5, 6.0.2, 6.2.0
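On a build carrying the fix for non service module platforms, enabling the new LAG algorithm is a single global setting (as noted above, this implicitly switches the LAG mode to ‘npu’):

config system npu
    set lag-sw-out-trunk enable
end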

● HPE protection ( SLBC clusters collapse under DDoS attacks with fragmented packets)
source #363398
Refer to chapter HPE Protection

Description : the kernel CPU could be impacted in case of a slow-path traffic attack.

Mitigation : use the NP6 shapers to limit DDoS attack traffic sent to the kernel.
Shapers can be defined for tcp, udp, icmp, sctp, esp, ipfrag and arp traffic egressing from the NP6 towards the kernel via the PCI-e host interface.

● N-Turbo throttling

Description : a hard-coded shaper was added for traffic coming from N-Turbo to the NP6, to avoid bursts that would congest the NP6 egress interfaces.
When this shaper drops traffic, the corresponding counter in the n-turbo stats (fnsysctl command) increases

sources :
#257607 ECO for 1500D : apply a 7G gtse shaper and change the np6 tx spmask to 1 to slow down the tx speed
#251104 Two sets of traffic shapers are implemented for FGT 300D/500D and FGT 1500D/3700D [actually all other platforms]

● 300D/500D : 1G shaper
● Others : about 6G shaper

Fix : 5.0.10, 5.2.1

● #412664 (and top3 #413388) DSCP EF (express forwarding) marked traffic is not prioritized in NP

Description : There is no consideration of DSCP flags inside NP6/NP6Light (Soc3)


Mitigations :
● Soc3 : special build (#413388) works with the FG100E but only on the DMZ/MGMT ports (ports not connected to the internal switch)
● NP6 : no solution
● Workaround : disable HW acceleration and use the kernel shaper (works fine)

Fixed bugs (bugs with a fix that has no significant impact on features or performance)

● top3 #422746 NP6 exhibited 90% failure rate for SSE insert-success.
● #284694 High CPU, NP6 counter drops and traffic loss

Description : multi-NP6 platforms with IPsec and/or npu-vlinks may stack forward-entries in the NP6
Symptoms : sessions supposedly accelerated are not (CPU increases); when the limits are reached a traffic outage is visible
Cause : the NP6 driver deletes forward sessions on the wrong NP6. This causes a session deletion failure on the NP6 where the session is not installed and also
causes stacking of sessions on the NP6 where the forward-entry was added (because it is never removed)
Fix : 5.2.5, 5.4.0, 5.6.0
Monitoring additions : #441532 for GA merge : diag commands and SNMP monitoring for specific NPU counters have been added, see section ‘NP6
monitoring additions for drift sessions’

● #401847 locked­up ipsec subengines

Description : when receiving an over-padded packet, either from the clear_text interface to encrypt or from the cipher_text interface to decrypt, an IPsec sub-engine
may enter a locked-up state. Only a reboot can recover it.

Cause : no packet sanity check performed on the ipsec engine
Fix for the clear_text side : force packets to first loop through the NAT module, which has a sanity check
=> 5.6.1 (B1458)
Fix for the cipher_text side : force packets to first loop through the IPT module (IP tunneling)
=> special image top3 #403883 B9612 with CLI command “config system npu → set strip-esp-padding enable”
=> fixed in GA 5.6.1 (with cli change) #416950
Comments :
- more information in top3 #403883
- sub-engine lockup detection : use “fnsysctl cat /proc/net/np6_0/ipsec-stats” -> ipsec engines idle status : during low traffic, we expect to see the
counter showing the ‘FF’ value.
- Though it is a hardware problem, a safe fix exists without a big impact on performance.

● #387675 PBA leak in NP6 causing traffic outage

Caused by a software bug, fixable


5.2 : fixed in 5.2.10 ; confirmed not affected : 5.2.3 ; affected : 5.2.8, 5.2.9
5.4 : fix scheduled for 5.4.2 ;

● #386626 kernel session expire for some hardware accelerated traffic when ‘virtual­wire’ with vlan configuration is used

Description : NP6 fails to update vlan sessions in virtual­wire case.


Fixes : 5.6.0, 5.4.3 (no fix for 5.2)

● top3 #255526 / Mantis #255349 IPSEC multicast acceleration problem : duplicated forwarding and leftover sessions

Description : Current multicast offloading code has issues in multicast­over­ipsec, which cause duplicated multicast forwarding and leftover
sessions inside NP6.
Fix : 5.4.0

Understanding EHP drops

Packets may be dropped when they egress from an NP6 XAUI. This happens at the end of the packet processing chain inside the NP6.
Drops are caused by the merging of packets from different origins that all have to egress on the same XAUI.
In case of packet bursts from one or multiple origins, the NP6 egress queue, which is very limited, fills up and causes drops.

The drop probability increases with :

● An increasing number of origin sources :

○ More CPU cores
○ Use of N-Turbo (dedicated host path)
○ Slow-path and fast-path mix
○ More NPs with cross-NP traffic (like npu-vlink or IPsec on other NP6s)
○ More interfaces

● Bigger packet bursts from each source :

○ Faster CPUs capable of delivering high bursts
○ Faster PCI-E buses
○ Faster interfaces (100G …) allowing a short inter-packet gap

A high bandwidth is not necessarily required to produce bursts; the conjunction of traffic micro-bursts from different sources can be enough.
NP6 buffers are small and don't allow the storage of a lot of packets, resulting in packet drops when full.

The following diagram is a good summary of EHP drops. Bubbles represent packets from the different sources filling up the EHP buffer ‘bucket’ and causing
drops.

NP6 shaping protection summary

N­Turbo NP6 IRQ mapping

N­Turbo is covered in Laurent’s chapter from the Expert Academy 2016

From the NP6 standpoint, packets allowing the ipsengine kernel fast-path are pushed to specific queues on the Host interface.
In the reverse direction, the NP6 also receives packets from the ipsengines on specific host logical interfaces (LIF).
The interrupt cost for those queues is handled by different CPU cores than the ones used for the regular NP6-to-kernel path on the host interface, for a
better load-balancing on the cores. Different platforms may have different CPU allocations.

This is the contribution of the NP6 to the N-Turbo mechanic; the remaining part takes place in the kernel.

N-Turbo has two different types of packets : control packets and data packets.

● Control packets

Control packets correspond to the first packets of new sessions. At this point, the session is also unknown to the ipsengine processes because the
kernel has not yet notified them of a new session to inspect and with what profile. Those packets are sent by the NP6 to the kernel using the ‘regular’ host
path channels. The interrupts raised and the CPU cores used are the usual ones. They don't benefit from any special acceleration compared to other
packets.

(1) NP6 receives the packet from the ISF, no forwarding entry is found

(2) The packet is sent to the OS through the slow path

(3) The session is created in the kernel; the kernel chooses an IPS engine based on N-Turbo scheduler info.

(4) The kernel sends the packet to the NP6 with packed info (IPS view-id, DNAT, MTU)

(5) The packet is sent to the corresponding load-balancer via the specific N-Turbo host interface

(6) The load-balancer stores the packet in the ipsengine RX queue and sends a notification interrupt to
the ipsengine

(7) The ipsengine receives the packet from its N-Turbo RX queue, looks at its N-Turbo table and finds
this is a new session. It adds a new entry with the received out-of-band data, optionally refragments
here and processes with IPS with action=pass/block/shaping; fragments if needed. If application
control is enabled and the app is identified, the ips engine notifies the kernel to update the session

(8) The ipsengine puts the packet into its TX queue and interrupts the load-balancer

(9) The load-balancer forwards the packet to the NP6 HIF TX queue. The NP6 forwards it to the ISF

● Data packets

Data packets correspond to packets from already known sessions. At this point, the session is known to the ipsengine processes. Those packets
are sent by the NP6 to the kernel N-Turbo using dedicated host channels. The interrupts raised and the CPU cores used are different ones.
These packets benefit from the N-Turbo IPS acceleration in the kernel.

(1) NP6 receives the packet from the ISF, a forwarding entry is found, the ipsengine process selection
and profiles are known (see note *)

(2) The NP6 OSW sends the packet to N-Turbo via the N-Turbo specific host interface, passing by its loopback

(3) (4) no change

(5) The ipsengine receives the packet from its N-Turbo RX queue, looks at its N-Turbo table and finds
an existing session. The corresponding out-of-band info is retrieved from the N-Turbo session. It optionally
refragments here and processes with IPS with action=pass/block/shaping; fragments if needed. If
application control is enabled and the app is identified, the ips engine notifies the kernel to update the session

(6)(7) no change

Note : before 5.4, the ips view-id was encoded in the packet VLAN/DMAC. There is no view-id anymore in 5.4

● N­Turbo out­of­band information in packet

Extra required information is stored in the packet destined to N-Turbo

● inside packet VLAN or VLAN+DMAC fields (in NAT mode) from NP6 to N­Turbo :

These fields carry:


­ IPS view ID (has disappeared in 5.4),
­ IPS engine ID (tells N­Turbo which IPS engine should receive the packet)

● piece of data attached to the end of control packets which contains:


­ DNAT information : NATed IP and NATed Port
­ MTU
­ Egress VLANs to use

Note : each ipsengine maintains a local N-Turbo session list that stores this info so that it can be found for fast-path data packets.

This mechanism may have changed in 5.4 since the removal of the ips-view, which was initially a requirement for SP2/SP3 IPS hardware acceleration.
These are not supported anymore in 5.4.

● debug commands related to N­Turbo ips acceleration

The following command is given for reference only as it is not hardware oriented.

The IRQ mapping dump from “diagnose hardware sysinfo interrupts” shows the different types of IRQs and the mapping used in N-Turbo

● Limitations :

● NTurbo doesn't offload the following types of traffic (#244082) :

­ traffic other than TCP and UDP


- traffic intercepted by firewall helpers (e.g. DNS, FTP, etc.)
- traffic passing through interfaces on which interface/DoS policies are configured
- traffic inspected by proxy-based applications
- IPSec encrypted or GRE tunneled traffic
- HA A-A mode
- in TP mode, any session with an IPS view-id > 255 ! (different in 5.4)
- whatever traffic does not have the firewall-only fast-path can't be N-Turbo accelerated

● disabling N­Turbo globally :


N-Turbo acceleration can be disabled by a configuration option. The NP6 standard hardware acceleration is not affected by this command (the name is
a bit confusing) :

config ips global
    set np-accel-mode none
end

Note : an old reference #178521 refers to ‘set hardware-accel-mode’, which seems to have been changed to ‘np-accel-mode’ later.

● disabling N-Turbo per policy (as of 5.6.1)

source : #398960

config firewall policy
    edit 1
        set np-acceleration enable*/disable
    next
end

● example of CPU mapping for a FortiGate­1500D ips N­Turbo

FortiGate 1500D : 2x NP6, 2x N-Turbo, each N-Turbo maps 5 ipsengines (10 ipsengines), 12 CPU cores
This mapping is extracted as follows :
- retrieve the interrupt names and corresponding IDs from “diag hard sys interrupt”
- find for each interrupt the CPU id mapping from “diag system cpuset interrupt <irq>” (see IRQ Mapping)
- find for each N-Turbo the mapped ipsengine PIDs from “fnsysctl cat /proc/nturbo/<N-Turbo-id>/drv”
- for each ipsengine, find the allowed CPU from the “Cpus_allowed_list” entry in “fnsysctl cat /proc/<pid>/status”

Ref Name IRQ CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11

NP6 <­> kernel slow path interrupts

[1] np6_0­tx­rx0 75 X

[2] np6_0­tx­rx1 76 X

[3] np6_0­tx­rx2 77 X

[4] np6_0­tx­rx3 86 X

[5] np6_0­tx­rx4 87 X

[6] np6_0­tx­rx5 88 X

[7] np6_1­tx­rx0 97 X

[8] np6_1­tx­rx1 98 X

[9] np6_1­tx­rx2 99 X

[10] np6_1­tx­rx3 103 X

[11] np6_1­tx­rx4 104 X

[12] np6_1­tx­rx5 105 X

NP6 <­> N­Turbo 0 & 1 interrupts

[13] np6_0­nturbo­tx­rx0 (to NTurbo­0) 79 X

[14] np6_0­nturbo­tx­rx1 (to NTurbo­1) 90 X

[15] np6_1­nturbo­tx­rx0 (to NTurbo­0) 101 X

[16] np6_1­nturbo­tx­rx1 (to NTurbo­1) 107 X

NTurbo 0 & 1 <­> IPsengine interrupts

[17] np6_0­nturbo­ips­0 (N­Turbo 0) 81 X

[18] np6_0­nturbo­ips­1 (N­Turbo 0) 82 X

[19] np6_0­nturbo­ips­2 (N­Turbo 0) 83 X

[20] np6_0­nturbo­ips­3 (N­Turbo 0) 84 X

[21] np6_0­nturbo­ips­4 (N­Turbo 0) 85 X

[22] np6_0­nturbo­ips­5 (N­Turbo 1) 92 X

[23] np6_0­nturbo­ips­6 (N­Turbo 1) 93 X

[24] np6_0­nturbo­ips­7 (N­Turbo 1) 94 X

[25] np6_0­nturbo­ips­8 (N­Turbo 1) 95 X

[26] np6_0­nturbo­ips­9 (N­Turbo 1) 96 X

IPSengines CPU mapping

[27] ipsengine 1 (N­Turbo 0) X

[28] ipsengine 2 (N­Turbo 0) X

[29] ipsengine 3 (N­Turbo 0) X

[30] ipsengine 4 (N­Turbo 0) X

[31] ipsengine 5 (N­Turbo 0) X

[32] ipsengine 1 (N­Turbo 1) X

[33] ipsengine 2 (N­Turbo 1) X

[34] ipsengine 3 (N­Turbo 1) X

[35] ipsengine 4 (N­Turbo 1) X

[36] ipsengine 5 (N­Turbo 1) X

← suggest a diagram like this to show each interrupt with the reference id from the table
above in the lines. Use colors on the interrupt names in the table with the same color in the
diagram.

● IPS/N­Turbo and IPsec improvement (as of 5.6.1)
Source : #398960
Feature : NTurbo is used for the IPsec+IPS case. The IPsec SA info is passed to NTurbo as part of the VTAG of the control packet and is used for the
transmit.
Note: If the packets need to go through an IPsec interface, the traffic is always offloaded to NTurbo. But if the SA has not been installed in the
NP6 (because of a hardware limitation or because SA offload is disabled), the packets are sent out by IPS through a raw socket instead of NTurbo, since
software encryption is needed in this case.

Diag commands and counters
● diag npu np6 fastpath <enable*|disable> <np6_id>

A diag command to disable all hardware acceleration on the given NP6 id. Upon a reboot, this setting goes back to its default ‘enable’ value.
For troubleshooting purposes, when a particular session needs kernel tracing, it is recommended instead to apply a specific, dedicated policy to the traffic to trace
with the option “set auto-asic-offload disable” (see the sketch below). This approach is safer and avoids a potentially huge CPU impact.
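A minimal sketch of such a dedicated tracing policy (policy id, interfaces and address objects are placeholders to adapt; the important line is the auto-asic-offload one):

config firewall policy
    edit 999
        set srcintf "port1"
        set dstintf "port2"
        set srcaddr "TRACE_HOST"
        set dstaddr "all"
        set action accept
        set schedule "always"
        set service "ALL"
        set auto-asic-offload disable
    next
end

The policy has to be placed above the generic policy matching the same traffic, so that only the flow to trace loses hardware acceleration.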

● diag npu np6 dce (dce­all) <np6_id>

Dumps the Drop Counter Engine counters for the requested np6 id.

Note : ‘dce’ vs ‘dce­all’


With “diag npu np6 dce”, only the counters whose values have changed since the previous run of the command are displayed. With ‘dce-all’, all
counters are displayed regardless of the previous dump. The same rule applies to all the commands listed below.
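For example, on the first NP6:

diag npu np6 dce 0        <- only the counters that changed since the previous run
diag npu np6 dce-all 0    <- full dump of all counters (as in the sample further below)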

Different types of counters are printed. The ones with ‘PDQ’ in the name refer to the number of packets dropped because of a full packet descriptor queue. The
name of the queue generally indicates the functional block pushing the packet and the one receiving it.
Each NP6 XAUI is linked to a total of 4 blocks (ITP, IHP, ETP, EHP), so you will see references such as PDQ_OSW_EHP0 and PDQ_OSW_EHP1 that are
similar packet drop counters, both linked to OSW but connected to a different XAUI (EHP0 and EHP1).

Examples:
PDQ_SSE0_SSE1 : packet sent from SSE0 to SSE1 using the loopback (npu­vlink)
PDQ_OSW_HRX0 : PDQ dropped between OSW and HRX0

List of functional modules referred in NP6 drop counters

Module identifier Module function

ISW Input Switch

OSW Output Switch

SSE Session Search Engine

FDB Forwarding Data Block

XHP Header parser for IPsec inbound

IHP Ingress Header Processor

EHP Egress Header Processor

IPSEC0I IPsec Engine 0 inbound

IPSEC0O IPsec Engine 0 outbound

IPSEC1I IPsec Engine 1 inbound

IPSEC1O IPsec Engine 1 outbound

CWI Capwap Engine Inbound

CWO Capwap Engine Outbound

IPTI IP Tunnel Engine Inbound

IPTO IP Tunnel Engine Outbound

SYN SYN/DNS proxy

HRX Host Receive Engine

HTX Host Transmit Engine

DCE TABLE 0 : HRX drops
From diagnose npu np6 hrx­drop­all <chip­id>

Name Index Description

VHIF_TX0_DROP ~ VHIF_TX127_DROP 0x0 ~ 0x7f Per virtual host transmit PDQ (to ISW) drop, total 128 TX queues; generally
indicates that the forwarding path does not have enough processing power

VHIF_RX0_DROP ~ VHIF_RX127_DROP 0x80 ~ 0xff Per virtual host receive PDQ (from OSW) drop, total 128 RX queues; generally
means the host does not have enough processing power to handle all incoming
packets

DCE TABLE 1 : Anomaly drops


From diagnose npu np6 anomaly­drop­all <chip­id>

Name Index Description

Refer to the per-type APS drop counter table for individual 0x0 ~ 0x1f Per type packet anomaly drop in IHP0
counter meanings of each group.
0x20 ~ 0x3f Per type packet anomaly drop in IHP1

0x40 ~ 0x5f Per type packet anomaly drop in IHP0 (same ???)

0x60 ~ 0x7f Per type packet anomaly drop in IHP1 (same???)

0x80 ~ 0x9f Per type packet anomaly drop in XHP0

0xa0 ~ 0xbf Per type packet anomaly drop in XHP1

0xc0 ~ 0xdf Per type packet anomaly drop in HTX0

0xe0 ~ 0xff Per type packet anomaly drop in HTX1

Name Index Description

DCE2_IDX_DROP_MACFIL_BASE0 0x00 Destination MAC mismatch drop for packet from XAUI0
DCE2_IDX_DROP_MACFIL_BASE1 0x01 Destination MAC mismatch drop for packet from XAUI1
DCE2_IDX_DROP_MACFIL_BASE2 0x02 Destination MAC mismatch drop for packet from XAUI2
DCE2_IDX_DROP_MACFIL_BASE3 0x03 Destination MAC mismatch drop for packet from XAUI3
DCE2_IDX_DROP_MACFIL_BASE4 0x04 Destination MAC mismatch drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_MACFIL_BASE5 0x05 Destination MAC mismatch drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_MACFIL_BASE6 0x06 Destination MAC mismatch drop for packet from IP tunnel inbound
DCE2_IDX_DROP_MACFIL_BASE7 0x07 Destination MAC mismatch drop for packet from IP tunnel outbound
DCE2_IDX_DROP_MACFIL_BASE8 0x08 Destination MAC mismatch drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_MACFIL_BASE9 0x09 Destination MAC mismatch drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_MACFIL_BASE10 0x0a Destination MAC mismatch drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_MACFIL_BASE11 0x0b Destination MAC mismatch drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_MACFIL_BASE12 0x0c Destination MAC mismatch drop for packet from host transmit HTX 0
DCE2_IDX_DROP_MACFIL_BASE13 0x0d Destination MAC mismatch drop for packet from host transmit HTX 1
DCE2_IDX_DROP_MACFIL_BASE14 0x0e Destination MAC mismatch drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_MACFIL_BASE15 0x0f Destination MAC mismatch drop for packet from loopback interface

DCE2_IDX_DROP_ISW_L2ACT_TPRT0 0x10 Target interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_TPRT1 0x11 Target interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_TPRT2 0x12 Target interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_TPRT3 0x13 Target interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_TPRT4 0x14 Target interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT5 0x15 Target interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT6 0x16 Target interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT7 0x17 Target interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT8 0x18 Target interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT9 0x19 Target interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT10 0x1a Target interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT11 0x1b Target interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT12 0x1c Target interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_TPRT13 0x1d Target interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_TPRT14 0x1e Target interface action drop for packet from SYN/DNS proxy

DCE2_IDX_DROP_ISW_L2ACT_TPRT15 0x1f Target interface action drop for packet from loopback interface
DCE2_IDX_DROP_ISW_L2ACT_ETHR0 0x20 L2 Ethertype action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_ETHR1 0x21 L2 Ethertype action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_ETHR2 0x22 L2 Ethertype action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_ETHR3 0x23 L2 Ethertype action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_ETHR4 0x24 L2 Ethertype action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR5 0x25 L2 Ethertype action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR6 0x26 L2 Ethertype action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR7 0x27 L2 Ethertype action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR8 0x28 L2 Ethertype action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR9 0x29 L2 Ethertype action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR10 0x2a L2 Ethertype action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR11 0x2b L2 Ethertype action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR12 0x2c L2 Ethertype action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_ETHR13 0x2d L2 Ethertype action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_ETHR14 0x2e L2 Ethertype action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_ETHR15 0x2f L2 Ethertype action drop for packet from loopback interface

DCE2_IDX_DROP_ISW_L2ACT_SVIF0 0x30 Source virtual interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_SVIF1 0x31 Source virtual interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_SVIF2 0x32 Source virtual interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_SVIF3 0x33 Source virtual interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_SVIF4 0x34 Source virtual interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF5 0x35 Source virtual interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF6 0x36 Source virtual interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF7 0x37 Source virtual interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF8 0x38 Source virtual interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF9 0x39 Source virtual interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF10 0x3a Source virtual interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF11 0x3b Source virtual interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF12 0x3c Source virtual interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_SVIF13 0x3d Source virtual interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_SVIF14 0x3e Source virtual interface action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_SVIF15 0x3f Source virtual interface action drop for packet from loopback interface

DCE2_IDX_DROP_ISW_L2ACT_SPRT0 0x40 Source interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_SPRT1 0x41 Source interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_SPRT2 0x42 Source interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_SPRT3 0x43 Source interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_SPRT4 0x44 Source interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT5 0x45 Source interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT6 0x46 Source interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT7 0x47 Source interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT8 0x48 Source interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT9 0x49 Source interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT10 0x4a Source interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT11 0x4b Source interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT12 0x4c Source interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_SPRT13 0x4d Source interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_SPRT14 0x4e Source interface action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_SPRT15 0x4f Source interface action drop for packet from loopback interface

DCE2_IDX_DROP_APS_IHP0 0x50 Packet anomaly check drop for packet from XAUI0
DCE2_IDX_DROP_APS_IHP1 0x51 Packet anomaly check drop for packet from XAUI1
DCE2_IDX_DROP_APS_IHP2 0x52 Packet anomaly check drop for packet from XAUI2
DCE2_IDX_DROP_APS_IHP3 0x53 Packet anomaly check drop for packet from XAUI3
DCE2_IDX_DROP_APS_XHP0 0x54 Packet anomaly check drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_APS_XHP1 0x55 Packet anomaly check drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_APS_CWI 0x56 Packet anomaly check drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_APS_IPTI 0x57 Packet anomaly check drop for packet from IP tunnel inbound
DCE2_IDX_DROP_APS_HTX0 0x58 Packet anomaly check drop for packet from host transmit HTX 0
DCE2_IDX_DROP_APS_HTX1 0x59 Packet anomaly check drop for packet from host transmit HTX 1
DCE2_IDX_DROP_IHP0_PKTCHK 0x5a Packet sanity check drop for packet from XAUI0
DCE2_IDX_DROP_IHP1_PKTCHK 0x5b Packet sanity check drop for packet from XAUI1
DCE2_IDX_DROP_IHP2_PKTCHK 0x5c Packet sanity check drop for packet from XAUI2
DCE2_IDX_DROP_IHP3_PKTCHK 0x5d Packet sanity check drop for packet from XAUI3
DCE2_IDX_DROP_XHP0_PKTCHK 0x5e Packet sanity check drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_XHP1_PKTCHK 0x5f Packet sanity check drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_CWI_PKTCHK 0x60 Packet sanity check drop for packet from CAPWAP tunnel inbound

DCE2_IDX_DROP_IPTI_PKTCHK 0x61 Packet sanity check drop for packet from IP tunnel inbound
DCE2_IDX_DROP_HTX0_PKTCHK 0x62 Packet sanity check drop for packet from host transmit HTX 0
DCE2_IDX_DROP_HTX1_PKTCHK 0x63 Packet sanity check drop for packet from host transmit HTX 1
DCE2_IDX_DROP_SSE0_SHAPER 0x64 SSE engine 0 session shaper packet drop
DCE2_IDX_DROP_SSE0_SESSION 0x65 SSE engine 0 session action dictated drop
DCE2_IDX_DROP_SSE0_TTL 0x66 SSE engine 0 IPv4 TTL or IPv6 Hop Limit check failure drop
DCE2_IDX_DROP_SSE0_MTU 0x67 SSE engine 0 MTU check failure drop
DCE2_IDX_DROP_SSE0_PROXY 0x68 SSE engine 0 SYN proxy temporary session triggered drop due to TCP SEQ mismatch
or the packet not being TCP ACK only
DCE2_IDX_DROP_SSE0_MCAST 0x69 SSE engine 0 forwarded packet count by multicast session
(Note: this is NOT a drop counter and for debugging only)
DCE2_IDX_DROP_SSE1_SHAPER 0x6a SSE engine 1 session shaper packet drop
DCE2_IDX_DROP_SSE1_SESSION 0x6b SSE engine 1 session action dictated drop
DCE2_IDX_DROP_SSE1_TTL 0x6c SSE engine 1 IPv4 TTL or IPv6 Hop Limit check failure drop
DCE2_IDX_DROP_SSE1_MTU 0x6d SSE engine 1 MTU check failure drop
DCE2_IDX_DROP_SSE1_PROXY 0x6e SSE engine 1 SYN proxy temporary session triggered drop due to TCP SEQ mismatch
or the packet not being TCP ACK only
DCE2_IDX_DROP_SSE1_MCAST 0x6f SSE engine 1 forwarded packet count by multicast session
(Note: this is NOT a drop counter and for debugging only)
DCE2_IDX_DROP_CWI_HDRCHK 0x70 CAPWAP inbound engine packet header check failure drop
(CAPWAP header and internal 802.3 header)
DCE2_IDX_DROP_CWI_RMACMIS 0x71 CAPWAP inbound engine packet inner frame radio MAC lookup failure drop
DCE2_IDX_DROP_CWI_SMACMIS 0x72 CAPWAP inbound engine packet inner frame source MAC lookup failure drop
DCE2_IDX_DROP_CWI_DMACMIS 0x73 CAPWAP inbound engine packet inner frame destination MAC lookup failure drop
DCE2_IDX_DROP_CWI_DMAC 0x74 CAPWAP inbound engine packet inner frame unicast destination MAC configured drop
DCE2_IDX_DROP_CWI_BDMIS 0x75 CAPWAP inbound engine packet inner frame broadcast domain lookup failure drop
DCE2_IDX_DROP_CWI_BC 0x76 CAPWAP inbound engine packet inner frame broadcast destination MAC configured drop
(per broadcast domain)
DCE2_IDX_DROP_CWI_MC 0x77 CAPWAP inbound engine packet inner frame multicast destination MAC configured drop
(per broadcast domain)
DCE2_IDX_DROP_CWI_ETHER 0x78 CAPWAP inbound engine packet inner frame Ethertype configured drop
DCE2_IDX_DROP_CWO_DMACMIS 0x79 CAPWAP outbound engine inner frame destination MAC lookup failure drop

DCE2_IDX_DROP_CWO_DMAC 0x7a CAPWAP outbound engine inner frame destination MAC configured drop
DCE2_IDX_DROP_CWO_BDMIS 0x7b CAPWAP outbound engine packet inner frame broadcast domain lookup failure drop

DCE2_IDX_DROP_IPTO_ODF 0x7c IP tunnel outbound engine packet drop due to outer IPv4 header has DF set while
packet length bigger than MTU
DCE2_IDX_DROP_IPTO_IDF 0x7d IP tunnel outbound engine packet drop due to inner IPv4 header has DF set while
packet length bigger than MTU
DCE2_IDX_DROP_IPTO_IV6 0x7e IP tunnel outbound engine packet drop due to inner IPv6 header has DF set while
packet length bigger than MTU
DCE2_IDX_DROP_IPSEC0_IQUEUE 0x7f IPSec engine 0 packet drop due to invalid SA, invalid crypto suite, invalid padding/length,
or insufficient tunnel traffic quota
DCE2_IDX_DROP_IPSEC0_ENGINB 0x80 IPSec engine 0 packet drop due to authentication failure
DCE2_IDX_DROP_IPSEC1_IQUEUE 0x88 IPSec engine 1 packet drop due to invalid SA, invalid crypto suite, invalid padding/length,
or insufficient tunnel traffic quota
DCE2_IDX_DROP_IPSEC1_ENGINB 0x89 IPSec engine 1 packet drop due to authentication failure
DCE2_IDX_DROP_CWO_TUNINV 0x92 CAPWAP outbound packet drop due to invalid tunnel
DCE2_IDX_DROP_IPTO_TUNINV 0x93 IP tunnel outbound packet drop due to invalid tunnel
DCE2_IDX_DROP_TPE_SHAPER 0x94 Traffic policy engine policy based traffic shaping triggered packet drop
DCE2_IDX_DROP_TPE_PRTSHP 0x95 Traffic policy engine port based traffic shaping triggered packet drop
DCE2_IDX_DROP_TPE_HPE 0x96 Host protection engine triggered packet drop (due to host protection policies)

The following lists the packet descriptor queue (PDQ) full drops. PDQs are used to move packets between the different functional blocks inside the NP6.
Each PDQ drop counter includes a source module name and a target module name. A PDQ full drop generally means the target module
cannot process packets fast enough or is stuck due to abnormal conditions.
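As an interpretation example (deduced from the module table above, not from a confirmed case): a steadily increasing PDQ_OSW_EHP0 means the queue from the output switch towards the egress header processor of XAUI0 keeps filling up, i.e. packets cannot be drained towards that XAUI fast enough; “diag npu np6 port-list” then shows which front ports share that XAUI and whether it may be oversubscribed (see also the Understanding EHP drops section).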

DCE2_IDX_DROP_PDQ_ISW_SSE0 0x97
DCE2_IDX_DROP_PDQ_ISW_SSE1 0x98
DCE2_IDX_DROP_PDQ_SSE0_SSE0 0x99
DCE2_IDX_DROP_PDQ_SSE0_SSE1 0x9a
DCE2_IDX_DROP_PDQ_SSE1_SSE0 0x9b
DCE2_IDX_DROP_PDQ_SSE1_SSE1 0x9c
DCE2_IDX_DROP_PDQ_ISW_FDB 0x9d
DCE2_IDX_DROP_PDQ_IPSEC0I_XHP 0x9e
DCE2_IDX_DROP_PDQ_IPSEC1I_XHP 0x9f
DCE2_IDX_DROP_PDQ_OSW_EHP0 0xa1
DCE2_IDX_DROP_PDQ_OSW_EHP1 0xa2

DCE2_IDX_DROP_PDQ_OSW_EHP2 0xa3
DCE2_IDX_DROP_PDQ_OSW_EHP3 0xa4
DCE2_IDX_DROP_PDQ_OSW_IPSEC0I 0xa5
DCE2_IDX_DROP_PDQ_OSW_IPSEC0O 0xa6
DCE2_IDX_DROP_PDQ_OSW_IPSEC1I 0xa7

APSTYPE_CWI0 ~ APSTYPE_CWI31 0xc0 ~ 0xdf Per type packet anomaly drop in CAPWAP inbound engine
APSTYPE_IPTI0 ~ APSTYPE_IPTI31 0xe0 ~ 0xff Per type packet anomaly drop in IP tunnel inbound engine

Comment : For HPE (Host Protection Engine), see also HPE protection

FG1K5D­3 # diagnose npu np6 dce­all 0 IPTO_IV6 :0000000000000000 [7e] IPSEC0_IQUEUE :0000000000000000 [7f]
MACFIL_BASE0 :0000000000000009 [00] MACFIL_BASE1 :0000000000000000 [01] IPSEC0_ENGINB0 :0000000000000000 [80] IPSEC0_ENGINB1 :0000000000000000 [81]
MACFIL_BASE2 :0000000000000000 [02] MACFIL_BASE3 :0000000000000000 [03] IPSEC0_ENGINB2 :0000000000000000 [82] IPSEC0_ENGINB3 :0000000000000000 [83]
MACFIL_BASE4 :0000000000000000 [04] MACFIL_BASE5 :0000000000000000 [05] IPSEC0_ENGINB4 :0000000000000000 [84] IPSEC0_ENGINB5 :0000000000000000 [85]
MACFIL_BASE6 :0000000000000000 [06] MACFIL_BASE7 :0000000000000000 [07] IPSEC0_ENGINB6 :0000000000000000 [86] IPSEC0_ENGINB7 :0000000000000000 [87]
MACFIL_BASE8 :0000000000000000 [08] MACFIL_BASE9 :0000000000000000 [09] IPSEC1_IQUEUE :0000000000000000 [88] IPSEC1_ENGINB0 :0000000000000000 [89]
MACFIL_BASE10 :0000000000000000 [0a] MACFIL_BASE11 :0000000000000000 [0b] IPSEC1_ENGINB1 :0000000000000000 [8a] IPSEC1_ENGINB2 :0000000000000000 [8b]
TBD :0000000000000000 [0c] TBD :0000000000000000 [0d] IPSEC1_ENGINB3 :0000000000000000 [8c] IPSEC1_ENGINB4 :0000000000000000 [8d]
TBD :0000000000000000 [0e] TBD :0000000000000000 [0f] IPSEC1_ENGINB5 :0000000000000000 [8e] IPSEC1_ENGINB6 :0000000000000000 [8f]
ISW_L2ACT_TPRT0 :0000000000000000 [10] ISW_L2ACT_TPRT1 :0000000000000000 [11] IPSEC1_ENGINB7 :0000000000000000 [90] TBD_91 :0000000000000000 [91]
ISW_L2ACT_TPRT2 :0000000000000000 [12] ISW_L2ACT_TPRT3 :0000000000000000 [13] TBD_92 :0000000000000000 [92] TBD_93 :0000000000000000 [93]
ISW_L2ACT_TPRT4 :0000000000000000 [14] ISW_L2ACT_TPRT5 :0000000000000000 [15] TPE_SHAPER :0000000000000000 [94] TPE_PRTSHP :0000000000000000 [95]
ISW_L2ACT_TPRT6 :0000000000000000 [16] ISW_L2ACT_TPRT7 :0000000000000000 [17] TPE_HPE :0000000000000000 [96] PDQ_ISW_SSE0 :0000000000000000 [97]
ISW_L2ACT_TPRT8 :0000000000000000 [18] ISW_L2ACT_TPRT9 :0000000000000000 [19] PDQ_ISW_SSE1 :0000000000000000 [98] PDQ_SSE0_SSE0 :0000000000000000 [99]
ISW_L2ACT_TPRT10:0000000000000000 [1a] ISW_L2ACT_TPRT11:0000000000000000 [1b] PDQ_SSE0_SSE1 :0000000000000000 [9a] PDQ_SSE1_SSE0 :0000000000000000 [9b]
TBD :0000000000000000 [1c] TBD :0000000000000000 [1d] PDQ_SSE1_SSE1 :0000000000000000 [9c] PDQ_ISW_FDB :0000000000000000 [9d]
TBD :0000000000000000 [1e] TBD :0000000000000000 [1f] PDQ_IPSEC0I_XHP :0000000000000000 [9e] PDQ_IPSEC1I_XHP :0000000000000000 [9f]
ISW_L2ACT_ETHR0 :0000000000000000 [20] ISW_L2ACT_ETHR1 :0000000000000000 [21] TBD_A0 :0000000000000000 [a0] PDQ_OSW_EHP0 :0000000000000000 [a1]
ISW_L2ACT_ETHR2 :0000000000000000 [22] ISW_L2ACT_ETHR3 :0000000000000000 [23] PDQ_OSW_EHP1 :0000000000000000 [a2] PDQ_OSW_EHP2 :0000000000000000 [a3]
ISW_L2ACT_ETHR4 :0000000000000000 [24] ISW_L2ACT_ETHR5 :0000000000000000 [25] PDQ_OSW_EHP3 :0000000000000000 [a4] PDQ_OSW_IPSEC0I :0000000000000000 [a5]
ISW_L2ACT_ETHR6 :0000000000000000 [26] ISW_L2ACT_ETHR7 :0000000000000000 [27] PDQ_OSW_IPSEC0O :0000000000000000 [a6] PDQ_OSW_IPSEC1I :0000000000000000 [a7]
ISW_L2ACT_ETHR8 :0000000000000000 [28] ISW_L2ACT_ETHR9 :0000000000000000 [29] PDQ_OSW_IPSEC1O :0000000000000000 [a8] PDQ_OSW_CWI :0000000000000000 [a9]
ISW_L2ACT_ETHR10:0000000000000000 [2a] ISW_L2ACT_ETHR11:0000000000000000 [2b] PDQ_OSW_CWO :0000000000000000 [aa] PDQ_OSW_IPTI :0000000000000000 [ab]
TBD :0000000000000000 [2c] TBD :0000000000000000 [2d] PDQ_OSW_IPTO :0000000000000000 [ac] PDQ_OSW_SYN :0000000000000000 [ad]
TBD :0000000000000000 [2e] TBD :0000000000000000 [2f] PDQ_OSW_HRX0 :0000000000000000 [ae] PDQ_OSW_HRX1 :0000000000000000 [af]
ISW_L2ACT_SVIF0 :0000000000000000 [30] ISW_L2ACT_SVIF1 :0000000000000000 [31] PDQ_IHP0_ISW :0000000000000000 [b0] PDQ_IHP1_ISW :0000000000000000 [b1]
ISW_L2ACT_SVIF2 :0000000000000000 [32] ISW_L2ACT_SVIF3 :0000000000000000 [33] PDQ_IHP2_ISW :0000000000000000 [b2] PDQ_IHP3_ISW :0000000000000000 [b3]
ISW_L2ACT_SVIF4 :0000000000000000 [34] ISW_L2ACT_SVIF5 :0000000000000000 [35] PDQ_XHP0_ISW :0000000000000000 [b4] PDQ_XHP1_ISW :0000000000000000 [b5]
ISW_L2ACT_SVIF6 :0000000000000000 [36] ISW_L2ACT_SVIF7 :0000000000000000 [37] PDQ_IPSEC0O_ISW :0000000000000000 [b6] PDQ_IPSEC1O_ISW :0000000000000000 [b7]
ISW_L2ACT_SVIF8 :0000000000000000 [38] ISW_L2ACT_SVIF9 :0000000000000000 [39] PDQ_CWI_ISW :0000000000000000 [b8] PDQ_CWO_ISW :0000000000000000 [b9]
ISW_L2ACT_SVIF10:0000000000000000 [3a] ISW_L2ACT_SVIF11:0000000000000000 [3b] PDQ_IPTI_ISW :0000000000000000 [ba] PDQ_IPTO_ISW :0000000000000000 [bb]
TBD :0000000000000000 [3c] TBD :0000000000000000 [3d] PDQ_SYN_ISW :0000000000000000 [bc] PDQ_OSW_ISW :0000000000000000 [bd]
TBD :0000000000000000 [3e] TBD :0000000000000000 [3f] PDQ_HTX0_ISW :0000000000000000 [be] PDQ_HTX1_ISW :0000000000000000 [bf]
ISW_L2ACT_SPRT0 :0000000000000000 [40] ISW_L2ACT_SPRT1 :0000000000000000 [41] APSTYPE_CWI0 :0000000000000000 [c0] APSTYPE_CWI1 :0000000000000000 [c1]
ISW_L2ACT_SPRT2 :0000000000000000 [42] ISW_L2ACT_SPRT3 :0000000000000000 [43] APSTYPE_CWI2 :0000000000000000 [c2] APSTYPE_CWI3 :0000000000000000 [c3]
ISW_L2ACT_SPRT4 :0000000000000000 [44] ISW_L2ACT_SPRT5 :0000000000000000 [45] APSTYPE_CWI4 :0000000000000000 [c4] APSTYPE_CWI5 :0000000000000000 [c5]
ISW_L2ACT_SPRT6 :0000000000000000 [46] ISW_L2ACT_SPRT7 :0000000000000000 [47] APSTYPE_CWI6 :0000000000000000 [c6] APSTYPE_CWI7 :0000000000000000 [c7]
ISW_L2ACT_SPRT8 :0000000000000000 [48] ISW_L2ACT_SPRT9 :0000000000000000 [49] APSTYPE_CWI8 :0000000000000000 [c8] APSTYPE_CWI9 :0000000000000000 [c9]
ISW_L2ACT_SPRT10:0000000000000000 [4a] ISW_L2ACT_SPRT11:0000000000000000 [4b] APSTYPE_CWI10 :0000000000000000 [ca] APSTYPE_CWI11 :0000000000000000 [cb]
ISW_L2ACT_SPRT12:0000000000000000 [4c] ISW_L2ACT_SPRT13:0000000000000000 [4d] APSTYPE_CWI12 :0000000000000000 [cc] APSTYPE_CWI13 :0000000000000000 [cd]
ISW_L2ACT_SPRT14:0000000000000000 [4e] ISW_L2ACT_SPRT15:0000000000000000 [4f] APSTYPE_CWI14 :0000000000000000 [ce] APSTYPE_CWI15 :0000000000000000 [cf]
APS_IHP0 :0000000000000000 [50] APS_IHP1 :0000000000000000 [51] APSTYPE_CWI16 :0000000000000000 [d0] APSTYPE_CWI17 :0000000000000000 [d1]
APS_IHP2 :0000000000000000 [52] APS_IHP3 :0000000000000000 [53] APSTYPE_CWI18 :0000000000000000 [d2] APSTYPE_CWI19 :0000000000000000 [d3]
APS_XHP0 :0000000000000000 [54] APS_XHP1 :0000000000000000 [55] APSTYPE_CWI20 :0000000000000000 [d4] APSTYPE_CWI21 :0000000000000000 [d5]
APS_CWI :0000000000000000 [56] APS_IPTI :0000000000000000 [57] APSTYPE_CWI22 :0000000000000000 [d6] APSTYPE_CWI23 :0000000000000000 [d7]
APS_HTX0 :0000000000000000 [58] APS_HTX1 :0000000000000000 [59] APSTYPE_CWI24 :0000000000000000 [d8] APSTYPE_CWI25 :0000000000000000 [d9]
IHP0_PKTCHK :0000000000000000 [5a] IHP1_PKTCHK :0000000000000000 [5b] APSTYPE_CWI26 :0000000000000000 [da] APSTYPE_CWI27 :0000000000000000 [db]
IHP2_PKTCHK :0000000000000000 [5c] IHP3_PKTCHK :0000000000000000 [5d] APSTYPE_CWI28 :0000000000000000 [dc] APSTYPE_CWI29 :0000000000000000 [dd]
XHP0_PKTCHK :0000000000000000 [5e] XHP1_PKTCHK :0000000000000000 [5f] APSTYPE_CWI30 :0000000000000000 [de] APSTYPE_CWI31 :0000000000000000 [df]
CWI_PKTCHK :0000000000000000 [60] IPTI_PKTCHK :0000000000000000 [61] APSTYPE_IPTI0 :0000000000000000 [e0] APSTYPE_IPTI1 :0000000000000000 [e1]
HTX0_PKTCHK :0000000000000000 [62] HTX1_PKTCHK :0000000000000000 [63] APSTYPE_IPTI2 :0000000000000000 [e2] APSTYPE_IPTI3 :0000000000000000 [e3]
SSE0_SHAPER :0000000000000000 [64] SSE0_SESSION :0000000000000000 [65] APSTYPE_IPTI4 :0000000000000000 [e4] APSTYPE_IPTI5 :0000000000000000 [e5]
SSE0_TTLEQ0 :0000000000000000 [66] SSE0_TTLEQ1 :0000000000000000 [67] APSTYPE_IPTI6 :0000000000000000 [e6] APSTYPE_IPTI7 :0000000000000000 [e7]
SSE0_MCAST :0000000000000000 [68] SSE1_SHAPER :0000000000000000 [69] APSTYPE_IPTI8 :0000000000000000 [e8] APSTYPE_IPTI9 :0000000000000000 [e9]
SSE1_SESSION :0000000000000000 [6a] SSE1_TTLEQ0 :0000000000000000 [6b] APSTYPE_IPTI10 :0000000000000000 [ea] APSTYPE_IPTI11 :0000000000000000 [eb]
SSE1_TTLEQ1 :0000000000000000 [6c] SSE1_MCAST :0000000000000000 [6d] APSTYPE_IPTI12 :0000000000000000 [ec] APSTYPE_IPTI13 :0000000000000000 [ed]
TBD :0000000000000000 [6e] TBD :0000000000000000 [6f] APSTYPE_IPTI14 :0000000000000000 [ee] APSTYPE_IPTI15 :0000000000000000 [ef]

CWI_HDRCHK :0000000000000000 [70] CWI_RMACMIS :0000000000000000 [71] APSTYPE_IPTI16 :0000000000000000 [f0] APSTYPE_IPTI17 :0000000000000000 [f1]
CWI_SMACMIS :0000000000000000 [72] CWI_DMACMIS :0000000000000000 [73] APSTYPE_IPTI18 :0000000000000000 [f2] APSTYPE_IPTI19 :0000000000000000 [f3]
CWI_DMAC :0000000000000000 [74] CWI_BDMIS :0000000000000000 [75] APSTYPE_IPTI20 :0000000000000000 [f4] APSTYPE_IPTI21 :0000000000000000 [f5]
CWI_BC :0000000000000000 [76] CWI_MC :0000000000000000 [77] APSTYPE_IPTI22 :0000000000000000 [f6] APSTYPE_IPTI23 :0000000000000000 [f7]
CWI_ETHER :0000000000000000 [78] CWO_DMACMIS :0000000000000000 [79] APSTYPE_IPTI24 :0000000000000000 [f8] APSTYPE_IPTI25 :0000000000000000 [f9]
CWO_DMAC :0000000000000000 [7a] CWO_BDMIS :0000000000000000 [7b] APSTYPE_IPTI26 :0000000000000000 [fa] APSTYPE_IPTI27 :0000000000000000 [fb]
IPTO_ODF :0000000000000000 [7c] IPTO_IDF :0000000000000000 [7d] APSTYPE_IPTI28 :0000000000000000 [fc] APSTYPE_IPTI29 :0000000000000000 [fd]
APSTYPE_IPTI30 :0000000000000000 [fe] APSTYPE_IPTI31 :0000000000000000 [ff]

● diag npu np6 anomaly­drop (anomaly­drop­all) <npu_id>

This command provides counters related to the well-known anomalies (see fp-anomaly) seen on the different inputs of the NP6: IHP0 to IHP3, plus XHP0/XHP1 and HTX0/HTX1 as shown in the sample below.

FG1K5D­3 # diagnose npu np6 anomaly­drop­all 0


IHP0:
IPV4_LAND :0000000000000000 [00] IPV4_PROTO_ERR :0000000000000000 [01]
IPV4_UNKNOPT :0000000000000000 [02] IPV4_OPTRR :0000000000000000 [03]
IPV4_OPTSSRR :0000000000000000 [04] IPV4_OPTLSRR :0000000000000000 [05]
IPV4_OPTSTREAM :0000000000000000 [06] IPV4_OPTSECURITY :0000000000000000 [07]
IPV4_OPTTIMESTAMP :0000000000000000 [08] IPV6_LAND :0000000000000000 [09]
IPV6_PROTO_ERR :0000000000000000 [0a] IPV6_UNKNOPT :0000000000000000 [0b]
IPV6_SADDR_ERR :0000000000000000 [0c] IPV6_DADDR_ERR :0000000000000000 [0d]
IPV6_OPTRALERT :0000000000000000 [0e] IPV6_OPTJUMBO :0000000000000000 [0f]
IPV6_OPTTUNNEL :0000000000000000 [10] IPV6_OPTHOMEADDR :0000000000000000 [11]
IPV6_OPTNSAP :0000000000000000 [12] IPV6_OPTENDPID :0000000000000000 [13]
IPV6_OPTINVLD :0000000000000000 [14] TCP_SYN_FIN :0000000000000000 [15]
TCP_FIN_NOACK :0000000000000000 [16] TCP_FIN_ONLY :0000000000000000 [17]
TCP_NO_FLAG :0000000000000000 [18] TCP_SYN_DATA :0000000000000000 [19]
TCP_WINNUKE :0000000000000000 [1a] TCP_LAND :0000000000000000 [1b]
UDP_LAND :0000000000000000 [1c] ICMP_LAND :0000000000000000 [1d]
ICMP_FRAG :0000000000000000 [1e] APS_CHK :0000000000000000 [1f]
IHP1:
IPV4_LAND :0000000000000000 [20] IPV4_PROTO_ERR :0000000000000000 [21]
IPV4_UNKNOPT :0000000000000000 [22] IPV4_OPTRR :0000000000000000 [23]
IPV4_OPTSSRR :0000000000000000 [24] IPV4_OPTLSRR :0000000000000000 [25]
IPV4_OPTSTREAM :0000000000000000 [26] IPV4_OPTSECURITY :0000000000000000 [27]
../..
IHP2:
IPV4_LAND :0000000000000000 [40] IPV4_PROTO_ERR :0000000000000000 [41]
IPV4_UNKNOPT :0000000000000000 [42] IPV4_OPTRR :0000000000000000 [43]
../..
IHP3:
IPV4_LAND :0000000000000000 [60] IPV4_PROTO_ERR :0000000000000000 [61]
IPV4_UNKNOPT :0000000000000000 [62] IPV4_OPTRR :0000000000000000 [63]
../..

XHP0:
IPV4_LAND :0000000000000000 [80] IPV4_PROTO_ERR :0000000000000000 [81]
IPV4_UNKNOPT :0000000000000000 [82] IPV4_OPTRR :0000000000000000 [83]
../..
XHP1:
IPV4_LAND :0000000000000000 [a0] IPV4_PROTO_ERR :0000000000000000 [a1]
IPV4_UNKNOPT :0000000000000000 [a2] IPV4_OPTRR :0000000000000000 [a3]
../..
HTX0:
IPV4_LAND :0000000000000000 [c0] IPV4_PROTO_ERR :0000000000000000 [c1]
IPV4_UNKNOPT :0000000000000000 [c2] IPV4_OPTRR :0000000000000000 [c3]
../..
HTX1:
IPV4_LAND :0000000000000000 [e0] IPV4_PROTO_ERR :0000000000000000 [e1]
IPV4_UNKNOPT :0000000000000000 [e2] IPV4_OPTRR :0000000000000000 [e3]
../..

● diag npu np6 hrx­drop (hrx­drop­all) <npu_id>

Provides packet drop counters for each host sub-interface, for both RX and TX sides. Each sub-interface has a dedicated purpose (for instance
N-Turbo), but the counters are only identified by an index for each direction. Indexes range from 0 to 127.
Sample :

FG1K5D­3 # diagnose npu np6 hrx­drop­all 0


VHIF_TX0_DROP :0000000000000000
VHIF_TX1_DROP :0000000000000000 VHIF_TX2_DROP :0000000000000000
VHIF_TX3_DROP :0000000000000000 VHIF_TX4_DROP :0000000000000000
VHIF_TX5_DROP :0000000000000000 VHIF_TX6_DROP :0000000000000000
VHIF_TX7_DROP :0000000000000000 VHIF_TX8_DROP :0000000000000000
VHIF_TX9_DROP :0000000000000000 VHIF_TX10_DROP :0000000000000000
VHIF_TX11_DROP :0000000000000000 VHIF_TX12_DROP :0000000000000000
VHIF_TX13_DROP :0000000000000000 VHIF_TX14_DROP :0000000000000000
../.. ../..
VHIF_RX123_DROP :0000000000000000 VHIF_RX124_DROP :0000000000000000
VHIF_RX125_DROP :0000000000000000 VHIF_RX126_DROP :0000000000000000
VHIF_RX127_DROP :0000000000000000

● diag npu np6 session­stats (session­stats­clear) <npu_id>

Provides statistics on the number of sessions installed and deleted by the NP for both IPv4 and IPv6, including v4-to-v6 and v6-to-v4 sessions.
These correspond to the special ‘session_push’ and ‘session_delete’ packets used to program the SSE (see ipv4 unicast session acceleration).
Each insert or delete order may come from a different channel (qid) of the host interface, where each channel is associated with a different
interrupt. The more the numbers are balanced across the ‘qid’ values, the better the CPU core load distribution.
The total for each channel is provided in the last line. The difference between ‘insert’ and ‘delete’ should give the current number of sessions
installed on the NP.
FGT1500D (global) # diagnose npu np6 session­stats 0
qid ins44 ins46 del4 ins64 ins66 del6
ins44_e ins46_e del4_e ins64_e ins66_e del6_e
­­­­­­­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­
0 1164536088 0 1164500209 0 0 0
0 0 0 0 0 0
1 1168139559 0 1168103576 0 0 0
0 0 0 0 0 0
2 1165064519 0 1165028582 0 0 0
0 0 0 0 0 0
3 1168117367 0 1168081430 0 0 0
0 0 0 0 0 0
4 1112353090 0 1112318764 0 0 0
0 0 0 0 0 0
5 1114545834 0 1114511481 0 0 0
0 0 0 0 0 0
6 1112961759 0 1112927472 0 0 0
0 0 0 0 0 0
7 1115203965 0 1115169714 0 0 0
0 0 0 0 0 0
8 1113204955 0 1113170696 0 0 0
0 0 0 0 0 0
9 1115123725 0 1115089425 0 0 0
0 0 0 0 0 0
10 1112352398 0 1112318231 0 0 0
0 0 0 0 0 0
11 1114861876 0 1114827601 0 0 0
0 0 0 0 0 0
­­­­­­­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­
Total 691563247 0 691145293 0 0 0
0 0 0 0 0 0
­­­­­­­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­ ­­­­­­­­­­
Question : What does “_e” mean? Error?
Answer : no clue yet, but I have always seen value ‘0’ so far :­) Guess: ephemeral ? Pointers are welcome.

The counters can be reset with the command ‘diag npu np6 session-stats-clear <npu_id>’.
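For example, using the Total line of the sample above: 691563247 inserts - 691145293 deletes = 417954, i.e. roughly 418k sessions were installed on this NP6 at the time of the dump (the counters keep moving, so this is only an approximation).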

● diag npu np6 sse­stats (sse­stats­clear)

This command provides session details for each SSE plus the total :
- active : number of sessions currently installed
- insert-total / insert-success : should be the same unless some insertions failed. This is the number of insertion orders received
- delete-total / delete-success : same for deleted sessions
- purge-total / purge-success : refers to purge requests that can be enabled on the NP (never tried, not sure of the usage)
- search-total : number of session lookups done following the reception of a packet in the SSE
- search-hit : how many of those packets had a match in the session table (not confirmed)
- pht-size : size of the primary table in memory (see ipv4 unicast session acceleration)
- oft-size : size of the overflow table
- oft-free (shown as ‘oftfree’ in the output) : free memory within the overflow table
- PBA : number of packet slots left in the Packet Buffer Allocator. 3001 is the nominal value; it may drop below temporarily but should
always get back to 3001 when the load drops. If not, there may be a PBA leak taking place.

FGT1500D (global) # diagnose npu np6 sse­stats 0


Counters SSE0 SSE1 Total
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
active 208268 208008 416276
insert­total 2494578573 2495255694 694866971
insert­success 2494578573 2495255694 694866971
delete­total 2494370305 2495047686 694450695
delete­success 2494370305 2495047686 694450695
purge­total 0 0 0
purge­success 0 0 0
search­total 1390629840 1393363176 2783993016
search­hit 1385494584 1384677652 2770172236
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
pht­size 8421376 8421376
oft­size 8355840 8355840
oftfree 8353045 8353068
PBA 3001
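As a quick sanity check on this sample: for SSE0, insert-total - delete-total = 2494578573 - 2494370305 = 208268, which matches the ‘active’ value; for SSE1, 2495255694 - 2495047686 = 208008, which matches as well.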

● diag npu np6 pdq <np6_id>

Provides details on the packet descriptor queues (the ‘wpcnt’/‘rpcnt’ columns are presumably per-queue write/read packet counters).

How to interpret this information is unknown for now. Pointers welcome.

FGT1500D (global) # diagnose npu np6 pdq 0


IPSEC ipqd0 opqd0 ipqd1 opqd1
IHP­>ISW ihp0 ihp1 ihp2 ihp3 wpcnt 00000000 00000000 00000000 00000000
wpcnt 98798bd5 1e7a598a 003f0a9c 0082aaee rpcnt 00000000 00000000 00000000 00000000
rpcnt 98798bd5 1e7a598a 003f0a9c 0082aaee XHP XHP0 XHP1
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ wpcnt 00000000 00000000
ISW pipe0 pipe1 rpcnt 00000000 00000000
wpcnt 6c9b6ea9 6ca807fb IPSEC­>ISW qxhp0 qxhp1 qipsec0o qipsec1o
rpcnt 6c9b6ea9 6ca807fb wpcnt 00000000 00000000 00000000 00000000
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ rpcnt 00000000 00000000 00000000 00000000
SSE0 qisw0 qisw1 qlblc qlbex ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­
wpcnt 29e8524e 29e3d850 00000000 00000000 IPT ipti­ipdq ipto­ipdq ipto­ppdq
rpcnt 29e8524e 29e3d850 00000000 00000000 wpcnt 00000000 00000000 00000000
SSE1 qisw0 qisw1 qlblc qlbex rpcnt 00000000 00000000 00000000
wpcnt 29fbca20 29fa4502 00000000 00000000 IPT­>ISW qipti qipdqo
rpcnt 29fbca20 29fa4502 00000000 00000000 wpcnt 00000000 00000000
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ rpcnt 00000000 00000000
FDB qisw0 qisw1 qppdq ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­
wpcnt 18b75234 18c9eaa8 31813cdc CAPWAP cwi­ipdq cwo­opdq cwo­ppdq
rpcnt 18b75234 18c9eaa8 31813cdc wpcnt 00000000 00000000 00000000
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ rpcnt 00000000 00000000 00000000
OSW qsse0 qsse1 qfdb CAPWAP­>ISW qcwi qcwo
wpcnt 53cc2a9f 53f60f23 31813cdc wpcnt 00000000 00000000
rpcnt 53cc2a9f 53f60f23 31813cdc rpcnt 00000000 00000000
OSW­>EHP ehp0 ehp1 ehp2 ehp3 ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­
wpcnt bfa4524f 611dad55 607c5307 6618a83c SYNP sse­>synp synp­>isw
rpcnt bfa4524f 611dad55 607c5307 6618a83c wpcnt 00000000 00000000
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ rpcnt 00000000 00000000
HRX tnnpdq0 tunpdq1 ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­
wpcnt eb89c402 0662b7b6 LB­>ISW qlpbk
rpcnt eb89c402 0662b7b6 wpcnt 00000000
HTX­>ISW qhtx0 qhtx1 rpcnt 00000000
wpcnt b0e1f78b 70abe430
rpcnt b0e1f78b 70abe430
­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­ ­­­­­­­­­­­

● diag npu np6 xgmac­stats (xgmac­stats­clear) <npu_id>

Provides packet statistics for the 4x NP6 XAUIs (‘x’ for 10G).

In this context RX means packets received by the XAUI and TX means packets sent out of the XAUI.
There are dedicated counters for broadcast, multicast, unicast, pause frames, undersized packets, oversized packets, fragments, etc., as well as
the number of packets per size range. These counters are useful to see whether the use of the XAUIs is balanced. This is not the case in the example
below, where only 2 XAUIs out of 4 are used.
FG1K5D­3 # diagnose npu np6 xgmac­stats 0
Counters XE0 XE1 XE2 XE3
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
RX_BCAST 56085168 47527323 0 0
RX_MCAST 3925336 4552222 0 0
RX_UCAST 93790 946878 0 0
RX_PAUSEFRM 0 0 0 0
RX_UNDERSIZE 0 0 0 0
RX_OVERSIZEP 0 0 0 0
RX_FRAG 0 0 0 0
RX_JAB 0 0 0 0
RX_FCS 0 0 0 0
RX_WFULL 0 0 0 0
RX_GOODOCTET 21888074658 5366623121 0 0
RX_OCTET 21888074658 5366623121 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
TX_BCAST 18 57 0 0
TX_MCAST 0 3 0 0
TX_UCAST 40145 927415 0 0
TX_COL 0 0 0 0
TX_LATECOL 0 0 0 0
TX_EXCESSCOL 0 0 0 0
TX_UNDERRUN 0 0 0 0
TX_XPX_QFULL 0 0 0 0
TX_GOODOCTET 30302246 91469871 0 0
TX_OCTET 30302246 91469871 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
PKT1024TOMAX 12584825 78346 0 0
PKT512TO1023 435998 664892 0 0
PKT256TO511 275194 194426 0 0
PKT128TO255 26585919 1617105 0 0
PKT65TO127 20262521 51399129 0 0
PKT64 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
More details about LAG in Expert Academy 2016

● diag npu np6 gmac­stats (gmac­stats­clear)

Similar command to xgmac-stats but related to the 1G ports when using the mixed 1G/10G NP6 form factor chip, as on a FortiGate-500D for
instance. Ports are numbered from port1 to port16.

FGT (global) # diag npu np6 gmac­stats 0


Counters port1|GIGE14 port2|GIGE15 port3|GIGE12 port4|GIGE13 Counters port9|GIGE1 port10|GIGE0 port11|GIGE3 port12|GIGE2
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
RX_BCAST 358 3168 0 0 RX_BCAST 0 0 0 0
RX_MCAST 789 685 0 0 RX_MCAST 0 0 0 0
RX_UCAST 28161468 28284200 0 0 RX_UCAST 0 0 0 0
RX_PAUSEFRM 0 0 0 0 RX_PAUSEFRM 0 0 0 0
RX_UNDERSIZE 0 0 0 0 RX_UNDERSIZE 0 0 0 0
RX_OVERSIZEP 0 0 0 0 RX_OVERSIZEP 0 0 0 0
RX_FRAG 0 0 0 0 RX_FRAG 0 0 0 0
RX_JAB 0 0 0 0 RX_JAB 0 0 0 0
RX_FCS 0 0 0 0 RX_FCS 0 0 0 0
RX_WFULL 0 0 0 0 RX_WFULL 0 0 0 0
RX_GOODOCTET 24163605975 24355384613 0 0 RX_GOODOCTET 0 0 0 0
RX_OCTET 24163605975 24355386203 0 0 RX_OCTET 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
TX_BCAST 7889 0 0 0 TX_BCAST 0 0 0 0
TX_MCAST 95 663 0 0 TX_MCAST 0 0 0 0
TX_UCAST 28208074 28237590 0 0 TX_UCAST 0 0 0 0
TX_COL 0 0 0 0 TX_COL 0 0 0 0
TX_LATECOL 0 0 0 0 TX_LATECOL 0 0 0 0
TX_EXCESSCOL 0 0 0 0 TX_EXCESSCOL 0 0 0 0
TX_UNDERRUN 0 0 0 0 TX_UNDERRUN 0 0 0 0
TX_XPX_QFULL 0 0 0 0 TX_XPX_QFULL 0 0 0 0
TX_GOODOCTET 24195728086 24240167820 0 0 TX_GOODOCTET 0 0 0 0
TX_OCTET 24195728086 24240167820 0 0 TX_OCTET 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
PKT1024TOMAX 30244767 30400495 0 0 PKT1024TOMAX 0 0 0 0
PKT512TO1023 0 0 0 0 PKT512TO1023 0 0 0 0
PKT256TO511 1776118 1776278 0 0 PKT256TO511 0 0 0 0
PKT128TO255 1776969 1776977 0 0 PKT128TO255 0 0 0 0
PKT65TO127 13063772 13060445 0 0 PKT65TO127 0 0 0 0
PKT64 9517048 9512115 0 0 PKT64 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­

Counters port5|GIGE8 port6|GIGE11 port7|GIGE9 port8|GIGE10 Counters port13|GIGE5 port14|GIGE4 port15|GIGE7 port16|GIGE6
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
RX_BCAST 0 0 0 0 RX_BCAST 0 0 0 0
RX_MCAST 0 0 0 0 RX_MCAST 0 0 0 0
RX_UCAST 0 0 0 0 RX_UCAST 0 0 0 0
RX_PAUSEFRM 0 0 0 0 RX_PAUSEFRM 0 0 0 0
RX_UNDERSIZE 0 0 0 0 RX_UNDERSIZE 0 0 0 0
RX_OVERSIZEP 0 0 0 0 RX_OVERSIZEP 0 0 0 0
RX_FRAG 0 0 0 0 RX_FRAG 0 0 0 0
RX_JAB 0 0 0 0 RX_JAB 0 0 0 0
RX_FCS 0 0 0 0 RX_FCS 0 0 0 0
RX_WFULL 0 0 0 0 RX_WFULL 0 0 0 0
RX_GOODOCTET 0 0 0 0 RX_GOODOCTET 0 0 0 0
RX_OCTET 0 0 0 0 RX_OCTET 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
TX_BCAST 0 0 0 0 TX_BCAST 0 0 0 0
TX_MCAST 0 0 0 0 TX_MCAST 0 0 0 0
TX_UCAST 0 0 0 0 TX_UCAST 0 0 0 0
TX_COL 0 0 0 0 TX_COL 0 0 0 0
TX_LATECOL 0 0 0 0 TX_LATECOL 0 0 0 0

TX_EXCESSCOL 0 0 0 0 TX_EXCESSCOL 0 0 0 0
TX_UNDERRUN 0 0 0 0 TX_UNDERRUN 0 0 0 0
TX_XPX_QFULL 0 0 0 0 TX_XPX_QFULL 0 0 0 0
TX_GOODOCTET 0 0 0 0 TX_GOODOCTET 0 0 0 0
TX_OCTET 0 0 0 0 TX_OCTET 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­
PKT1024TOMAX 0 0 0 0 PKT1024TOMAX 0 0 0 0
PKT512TO1023 0 0 0 0 PKT512TO1023 0 0 0 0
PKT256TO511 0 0 0 0 PKT256TO511 0 0 0 0
PKT128TO255 0 0 0 0 PKT128TO255 0 0 0 0
PKT65TO127 0 0 0 0 PKT65TO127 0 0 0 0
PKT64 0 0 0 0 PKT64 0 0 0 0
­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­

● diag npu np6 gige­port­stats (gige­port­stats­clear) <port_name>

Similar information to gmac-stats, but the port name is used as the argument instead of the NP id.
This table is easier to read and parse for a single port.

qb­hagsko­fwha03­haga (global) # diag npu np6 gige­port­stats port1


port1|GIGE14
­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­
RX_BCAST :0000000000000358 RX_MCAST :0000000000000789
RX_UCAST :0000000028161468 RX_PAUSEFRM :0000000000000000
RX_UNDERSIZE :0000000000000000 RX_OVERSIZEP :0000000000000000
RX_FRAG :0000000000000000 RX_JAB :0000000000000000
RX_FCS :0000000000000000 RX_WFULL :0000000000000000
RX_GOODOCTET :0000024163605975 RX_OCTET :0000024163605975
­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­
TX_BCAST :0000000000007889 TX_MCAST :0000000000000095
TX_UCAST :0000000028208074 TX_COL :0000000000000000
TX_LATECOL :0000000000000000 TX_EXCESSCOL :0000000000000000
TX_UNDERRUN :0000000000000000 TX_XPX_QFULL :0000000000000000
TX_GOODOCTET :0000024195728086 TX_OCTET :0000024195728086
­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­
PKT1024TOMAX :0000000030244767 PKT512TO1023 :0000000000000000
PKT256TO511 :0000000001776118 PKT128TO255 :0000000001776969
PKT65TO127 :0000000013063772 PKT64 :0000000009517048
­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­­­ ­­­­­­­­­­­­­­­­

● diag npu np6 port­list

Provides the mapping table between external ports and their NP6 XAUI attachment. This side-by-side output from a FortiGate-1500D,
FortiGate-3700D and FortiGate-900D shows the different port-to-XAUI associations on each platform.

FGT1500D # diagnose npu np6 port­list FGT3700D # diagnose npu np6 port­list FG900D # diagnose npu np6 port­list
Chip XAUI Ports Max Cross­chip Chip XAUI Ports Max Cross­chip Chip XAUI Ports Max Cross­chip
Speed offloading Speed offloading Speed offloading
­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­
np6_0 0 port1 1G Yes np6_0 0 port26 10G Yes np6_0 0
0 port5 1G Yes 1 port25 10G Yes 1 port17 1G Yes
0 port17 1G Yes 2 port28 10G Yes 1 port18 1G Yes
0 port21 1G Yes 3 port27 10G Yes 1 port19 1G Yes
0 port33 10G Yes 0­3 port1 40G Yes 1 port20 1G Yes
1 port2 1G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ 1 port21 1G Yes
1 port6 1G Yes np6_1 0 port30 10G Yes 1 port22 1G Yes
1 port18 1G Yes 1 port29 10G Yes 1 port23 1G Yes
1 port22 1G Yes 2 port32 10G Yes 1 port24 1G Yes
1 port34 10G Yes 3 port31 10G Yes 1 port27 1G Yes
2 port3 1G Yes 0­3 port3 40G Yes 1 port28 1G Yes
2 port7 1G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ 1 port25 1G Yes
2 port19 1G Yes np6_2 0 port5 10G Yes 1 port26 1G Yes
2 port23 1G Yes 0 port9 10G Yes 1 port31 1G Yes
2 port35 10G Yes 0 port13 10G Yes 1 port32 1G Yes
3 port4 1G Yes 1 port6 10G Yes 1 port29 1G Yes
3 port8 1G Yes 1 port10 10G Yes 1 port30 1G Yes
3 port20 1G Yes 1 port14 10G Yes 2 portB 10G Yes
3 port24 1G Yes 2 port7 10G Yes 3
3 port36 10G Yes 2 port11 10G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­
­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ 3 port8 10G Yes np6_1 0
np6_1 0 port9 1G Yes 3 port12 10G Yes 1 port1 1G Yes
0 port13 1G Yes 0­3 port2 40G Yes 1 port2 1G Yes
0 port25 1G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ 1 port3 1G Yes
0 port29 1G Yes np6_3 0 port15 10G Yes 1 port4 1G Yes
0 port37 10G Yes 0 port19 10G Yes 1 port5 1G Yes
1 port10 1G Yes 0 port23 10G Yes 1 port6 1G Yes
1 port14 1G Yes 1 port16 10G Yes 1 port7 1G Yes
1 port26 1G Yes 1 port20 10G Yes 1 port8 1G Yes
1 port30 1G Yes 1 port24 10G Yes 1 port11 1G Yes
1 port38 10G Yes 2 port17 10G Yes 1 port12 1G Yes
2 port11 1G Yes 2 port21 10G Yes 1 port9 1G Yes
2 port15 1G Yes 3 port18 10G Yes 1 port10 1G Yes
2 port27 1G Yes 3 port22 10G Yes 1 port15 1G Yes
2 port31 1G Yes 0­3 port4 40G Yes 1 port16 1G Yes

2 port39 10G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­ 1 port13 1G Yes
3 port12 1G Yes 1 port14 1G Yes
3 port16 1G Yes 2 portA 10G Yes
3 port28 1G Yes 3
3 port32 1G Yes ­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­
3 port40 10G Yes
­­­­­­ ­­­­ ­­­­­­­ ­­­­­ ­­­­­­­­­­

Comments:
● 40G ports are bundles of 4x10G ports at the ISF
● Many units allow oversubscription of an NP6 XAUI port, as in the examples above (a quick way to check this on a live unit is shown below) :
○ FortiGate-1500D np6_0 XAUI 0 ⇒ 4x1G + 1x10G = 14G > 10G
○ FortiGate-3700D np6_2 XAUI 0 ⇒ 4x10G = 40G > 10G (4x oversubscription!)
○ FortiGate-900D np6_0 XAUI 1 ⇒ 16x1G = 16G > 10G
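In practice, the two commands already described can be combined to spot this kind of oversubscription on a live unit (np6 id 0 is used as an example):

diag npu np6 port-list          <- which front ports share each XAUI
diag npu np6 xgmac-stats 0      <- per-XAUI packet/octet counters, to see whether a shared XAUI concentrates most of the traffic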

● diag npu np6 ipsec­stats (ipsec­stats­clear)

Details related to IPsec for all NPs. There is no precise description of the command output, but many of the counters are quite obvious.

FG1K5D­3 # diagnose npu np6 ipsec­stats np6_1:


vif_start_oid 03ed vif_end_oid 03fc sa_install 00000000000 sa_ins_fail 00000000000
IPsec Virtual interface stats: sa_remove 00000000000 sa_del_fail 00000000000
vif_get 00000000051 vif_get_expired 00000000000 4to6_ses_ins 00000000000 4to6_ses_ins_fail 00000000000
vif_get_fail 00000000000 vif_get_invld 00000000000 4to6_ses_del 00000000000 4to6_ses_del_fail 00000000000
vif_set 00000000005 vif_set_fail 00000000000 spi_ses6_ins 00000000000 spi_ses6_ins_fail 00000000000
vif_clear 00000000005 vif_clear_fail 00000000000 spi_ses6_del 00000000000 spi_ses6_del_fail 00000000000
np6_0: spi_ses4_ins 00000000000 spi_ses4_ins_fail 00000000000
sa_install 00000000012 sa_ins_fail 00000000000 spi_ses4_del 00000000000 spi_ses4_del_fail 00000000000
sa_remove 00000000012 sa_del_fail 00000000000 sa_map_alloc_fail 00000000000 vif_alloc_fail 00000000000
4to6_ses_ins 00000000000 4to6_ses_ins_fail 00000000000 sa_ins_null_adapter 00000000000 sa_del_null_adapter 00000000000
4to6_ses_del 00000000000 4to6_ses_del_fail 00000000000 del_sa_mismatch 00000000000 ib_chk_null_adpt 00000000000
spi_ses6_ins 00000000000 spi_ses6_ins_fail 00000000000 ib_chk_null_sa 00000000000 ob_chk_null_adpt 00000000000
spi_ses6_del 00000000000 spi_ses6_del_fail 00000000000 ob_chk_null_sa 00000000000 rx_vif_miss 00000000000
spi_ses4_ins 00000000012 spi_ses4_ins_fail 00000000000 rx_sa_miss 00000000000 rx_mark_miss 00000000000
spi_ses4_del 00000000012 spi_ses4_del_fail 00000000000 waiting_ib_sa 00000000000 sa_mismatch 00000000000
sa_map_alloc_fail 00000000000 vif_alloc_fail 00000000000 msg_miss 00000000000
sa_ins_null_adapter 00000000000 sa_del_null_adapter 00000000000
del_sa_mismatch 00000000000 ib_chk_null_adpt 00000000000

ib_chk_null_sa 00000000000 ob_chk_null_adpt 00000000000
ob_chk_null_sa 00000000000 rx_vif_miss 00000000000
rx_sa_miss 00000000000 rx_mark_miss 00000000000
waiting_ib_sa 00000000000 sa_mismatch 00000000000
msg_miss 00000000000

● diag npu np6 eeprom­read <np6_id>

Unknown usage so far.

FG1K5D­3 # diagnose npu np6 eeprom­read 0


Major ID :3
Minor ID :0
Chip ID :0
MAC NR :20
MAC_Base :08:5b:0e:71:96:22
SN :FG1K5D3I14800503
mode :0

● diag npu np6 npu­feature

Dumps a simple table with the features enabled on the different NP6s of the unit.
Example output from a FortiGate-900D running 5.2.9 and a FortiGate-3700DX running 5.2.7:
FG900D # diagnose npu np6 npu­feature moi­FG37DX­1­LAB (global) # diagnose npu np6 npu­feature
np_0 np_1 np_0 np_1 np_2 np_3
­­­­­­­­­­­­­­­­­­­ ­­­­­­­­­ ­­­­­­­­­ ­­­­­­­­­­­­­­­­­­­ ­­­­­­­­­ ­­­­­­­­­ ­­­­­­­­­ ­­­­­­­­­
Fastpath Enabled Enabled Fastpath Enabled Enabled Enabled Enabled
Low­latency­mode Disabled Disabled Low­latency­mode Disabled Disabled Disabled Disabled
Low­latency­cap No No Low­latency­cap Yes Yes No No
IPv4 firewall Yes Yes IPv4 firewall Yes Yes Yes Yes
IPv6 firewall Yes Yes IPv6 firewall Yes Yes Yes Yes
IPv4 IPSec Yes Yes IPv4 IPSec Yes Yes Yes Yes
IPv6 IPSec Yes Yes IPv6 IPSec Yes Yes Yes Yes
IPv4 tunnel Yes Yes IPv4 tunnel Yes Yes Yes Yes
IPv6 tunnel Yes Yes IPv6 tunnel Yes Yes Yes Yes
GRE tunnel No No GRE tunnel Yes Yes Yes Yes

IPv4 Multicast Yes Yes IPv4 Multicast Yes Yes Yes Yes
IPv6 Multicast Yes Yes IPv6 Multicast Yes Yes Yes Yes
CAPWAP No No CAPWAP No No No No

● Comments :
○ capwap offload is not available in 5.0. It has only been added since B0961 (5.4 GA)
○ GRE tunnel offload is only available on the FortiGate-3700DX (thanks to the TP2 FPGA)
○ Low latency is only available on 2 of the NP6s of the FortiGate-3700D and FortiGate-3700DX

● References #239441 (capwap support on B0961)

● diag npu np6 register

Very detailed command on internal NP6 register values. Unknown usage for TAC; might be requested by devs eventually.
Only the first lines are dumped below, for illustration purposes and for fans of hex…

FG1K5D­3 # diagnose npu np6 register 0


BANK_0
top
pba_num =00000bb9 (ffffff0000a63000)
pba_num =00000bb9 [0:12]
pba_empty =00000000 [31:31]
strap =00000000 (ffffff0000a63388)
pll_fsel =00000000 [0:1]
serdes_vsel =00000000 [2:3]
pcie0_acdc =00000000 [4:4]
pcie1_acdc =00000000 [5:5]
pcie0_txsw =00000000 [6:6]
pcie1_txsw =00000000 [7:7]
xaui_mode =00000000 [8:8]
gpio_isr =00000000 (ffffff0000a63400)
gpio_imr =00000000 (ffffff0000a63408)
gpio_imrc =00000000 (ffffff0000a63410)
gpio_isel =00000000 (ffffff0000a63418)

gpio_ivcr =00000000 (ffffff0000a63420)
pcs_isr =00000000 (ffffff0000a63440)
pcs_imr =00000000 (ffffff0000a63448)
pcs_imrc =00000000 (ffffff0000a63450)
pcs_isel =00000000 (ffffff0000a63458)
pcs_ivcr =00000000 (ffffff0000a63460)
pe00_isr =00000000 (ffffff0000a63480)
pe00_isrc =00000000 (ffffff0000a63488)
pe00_imr =00000000 (ffffff0000a63490)

● diag npu np6 synproxy­stats


Information related to np6 synproxy

FG1K5D­3 # diagnose npu np6 synproxy­stats


DoS SYN­Proxy:
Number of proxied TCP connections : 0
Number of working proxied TCP connections : 0
Number of retired TCP connections : 0
Number of attacks, no ACK from client : 0

● fnsysctl cat /proc/net/np6_0/msg

Unknown usage so far, asked by Ken Yan in #386626


FG15DT3I15800036 # fnsysctl cat /proc/net/np6_0/msg
sse_query_seq :0
nr_of_msg_skb :0
nr_of_query :0
tasklet_entry :0
nr_of_deliver :0
quota :64
weitght :64
quota_default :64

sse_qry0 :0
sse_qry1 :0
sse0_timeout :0
sse1_timeout :0
sse_tmout_miss :0
sse_qry_miss :0
wrong_msg_len :0
wrong_msg_type :0
sa_exp_by_trf :0
sa_sn_exhausted :0
sa_sn_update :0
sa_throughput_update:0
sa_inb_antireply_update:0
sa_reconnect :0
tce_tmo :0
cwi_tmo :0
cwo_byte :0
cwo_pkt :0
cwo_tmo :0
ipto_update :0
tpe_update :0
ipsec_vif_miss :0
ipt_vif_miss :0
gre_vif_miss :0
ulif_miss :0
mcast_null :0
tpe_mcast :0
tpe_ipt :0
tpe_gre :0
tpe_cwi :0

Design recommendations

● Choose the right ports; remember that oversubscription is possible.
● Try to use as many NP6s as possible, and try to balance the load across their XAUIs.
● Use LAGs to distribute traffic over all NP6s and all XAUIs, as well as for IPsec SA distribution.
● For inter-vdom links, use npu-vlink interfaces and try to balance them across all available NP6s if possible (see the sketch below).
● Avoid breaking hardware acceleration if possible.
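
As an illustration of the LAG and npu-vlink recommendations above, here is a minimal sketch (port, LAG and vdom names are hypothetical and must be adapted to the actual platform and its port-to-NP6/XAUI mapping, which can be checked with "diagnose npu np6 port-list"):

config system interface
    edit "lag-out"
        set vdom "root"
        set type aggregate
        set member "port1" "port17"
    next
    edit "npu0_vlink0"
        set vdom "vdomA"
    next
    edit "npu0_vlink1"
        set vdom "vdomB"
    next
end

Here "port1" and "port17" are assumed to sit on two different NP6s, so that both NP6s and their XAUIs carry a share of the LAG traffic and of the IPsec SAs. "npu0_vlink0/1" is one of the NP6 inter-vdom link pairs; additional pairs (npu1_vlink0/1, …) can be used to spread inter-vdom traffic over the other NP6s.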

Limitations and workarounds, fixed bugs
● Fixed bug #365497: possible packet out-of-order with NP6 during TCP session establishment.
Workaround: per-policy CLI command delay-tcp-npu-session enable|disable* (see the example after this list).
● Fixed bug #309458: passthrough UDP 4500 is not accelerated; fixed in 5.4.1 / 5.2.8.
● Fixed bugs #0263634 / #0270666: multicast is not offloaded in transparent mode (fixed in 5.4, not in 5.2).
● #310482: IPv6 traffic forwarded by the master of an HA A-A cluster is not offloaded.
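
For the #365497 out-of-order case above, the per-policy workaround is applied as in the following sketch (policy ID 10 is only an example):

config firewall policy
    edit 10
        set delay-tcp-npu-session enable
    next
end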

SoC3 (NP6 light)

Needs to be documented. No pointers collected so far (inputs welcome).

NP4

This NP4 chapter is considered legacy

From the outside

Form factors
● 1 single form factor: a dual-core chip, each core with a single XAUI attachment
● PCI-E x8 lane bus

Performance figures
● 20 Gbps maximum firewall throughput
● Sessions: 6 million
● 6 Gbps IPSec ESP encryption/decryption throughput

Integration
● NP4s are always connected to the ISF (legacy exception: the amc-xd4 module)

IRQ distribution
Each NP4 core has 4 IRQs mapped, which makes 8 IRQs overall for the NP4. This is not enough to allow direct N-Turbo IPS acceleration.
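
To see how these 8 IRQs are distributed over the CPU cores on a given unit, the interrupt table can be dumped and, where output filtering is available, narrowed down (the grep pattern below is only an example; the exact interrupt names vary per platform):

diagnose hardware sysinfo interrupts | grep np4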

From the inside

● Made of 2 cores
● MSI-X support: multi-RX and multi-TX queues (since mantis #139358, 4.3 build 423)
● 2 Session Search Engines : one per NP4 core
● Shapers : 2048 maximum (mantis #137405)
● 1 IPSec engine shared between the 2 cores with 8 sub­engines.

Configuration options impacting NP4
● config system npu -> set dedicated-management-cpu enable (#201257, #218083, #251776) (see the sketch below)
● config system npu -> set dedicated-tx-npu enable (FG3600C only, mantis #256367)
● config system npu -> enc-offload-antireplay/dec-offload-antireplay/offload-ipsec-host (see ipsec part and Stephane’s IPSec Guide)
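
As a minimal sketch of the first option above (the other options follow the same 'config system npu' pattern; remember that dedicated-tx-npu only exists on the FG3600C):

config system npu
    set dedicated-management-cpu enable
end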

Diag commands

● diagnose npu np4 list

Provides the NP4 chip-to-port mapping, without details on which core/XAUI is used.

FG3K9B3E10700030 # diagnose npu np4 list


ID Model Slot Interface
0 3950B I/O port1 port2 port3 port4
port5 port6

● diagnose hard de nic <port>

Provides all the details, thanks to the "sw_port" and "sw_np_port" fields:

sw_np_port : BCM switch port where the NP4 core's 10G interface is connected
sw_port : BCM switch port where the front interface is connected

In the outputs below, port1/port3/port5 are attached to the NP4 core behind sw_np_port 13 (half_id 0), while port2/port4/port6 are attached to the core behind sw_np_port 14 (half_id 1).

FG3K9B3E10700030 # diagnose hardware deviceinfo nic port1


sw_port :25
sw_np_port :13
half_id :0

FG3K9B3E12700356 # diagnose hardware deviceinfo nic port2

sw_port :26
sw_np_port :14
half_id :1

FG3K9B3E12700356 # diagnose hardware deviceinfo nic port3


sw_port :27
sw_np_port :13
half_id :0

FG3K9B3E12700356 # diagnose hardware deviceinfo nic port4


sw_port :28
sw_np_port :14
half_id :1

FG3K9B3E12700356 # diagnose hardware deviceinfo nic port5


sw_port :15
sw_np_port :13
half_id :0

FG3K9B3E12700356 # diagnose hardware deviceinfo nic port6


sw_port :16
sw_np_port :14
half_id :1

Limitations and workarounds, fixed bugs

● The hashing function used to choose one of the IRQs does not include the destination IP address in the hash (only src_ip, src_port and dst_port).
● #310606: ESP passthrough is not accelerated by the NP4.
A special image fix (not merged) was made via top3 (#229874).
● For maximum performance, interfaces should be chosen so that both NP4 cores are used.
● Reminder: no IPv6 acceleration and no IPv4 multicast acceleration. Such packets are all sent to the first queue (first IRQ) of the core, which may cause a distribution issue if the traffic is high (#217643, #140153).
● Dedicated command channels exist for the kernel: a session queue (fastpath setup) and a message queue (keepalive + ipsec). Both command channels trigger only the first IRQ (#217643, #263580).
● The antireplay settings in 'config system npu' may or may not be honoured depending on the FortiOS version (see the ipsec chapter).

● Impact from the dedicated management CPU command (#201257, #218083, #251776)
● Potential EHP drops due to congestion on the egress XAUI between "inter-core" accelerated traffic and non-accelerated traffic.
Mitigations:
- Use as many NP4s as possible in the lab
- Spread the ports of a LAG over multiple NP4s
- Use the same LAG hash on the switch and on the FortiGate (L3)
- Make sure the return path goes through a different pair of NP4 cores
- Try to keep accelerated and non-accelerated traffic on different NP4s
- Mantis #256367, FG3600C only: config system npu → set dedicated-tx-npu enable (the 3rd NP4 is dedicated to TX)

Kent Yann (#256367) :


Counter PDQ_SSE_EHP0/1 indicates packet drop at NP4's XAUI interface connected to internal ISF. The reason is traffic from fastpath and slowpath(from CPU) collide at
egress XAUI. There is no remedy to the issue. Also NP4's two XAUI interfaces are shared among all the ports attached to it. Traffic from 10G ports and 1G ports will collide at
XAUI interface.

● Possible loss of IPsec acceleration after an interface outage in a LAG, for outbound traffic only (#189140, #267252)

SoC2 (NP4 lite + CP8 lite)
Used on FortiGate-60D, FortiGate-70D, FortiGate-90D, FortiGate-200D, FortiGate-240D and FortiGate-280D.
SoC2 embeds an NP4Lite.

● NP4Lite architecture
○ 1 single core
○ 4 x RGMII (Reduced GMII; 1 Gbps bandwidth total, shared between TX and RX)
○ 4 different interrupts, distributed over the 2 available CPU cores
○ Another interrupt exists but shows 0 interrupts (unknown usage)

● FortiGate­200D and FortiGate­240D

Comments:
- The RGMII attachments differ between models.
- There is no known command to establish the port mapping; a test with traffic may be required to see which counter increases (see the sketch below).
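
A rough way to establish the mapping by test (a sketch only, based on the DCE counter dump described later in this chapter and on the fact that the counters are reset each time they are read): dump the counters, push traffic through a single port, then dump again and check which channel's counters moved.

fnsysctl cat /proc/fsoc/npl/dce
(generate traffic through the port under test, e.g. a file transfer from a connected host)
fnsysctl cat /proc/fsoc/npl/dce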

FortiGate­60D and FortiGate­90D

Comments:
- These models may have a single or a "dual-connected" switch fabric.
- With a dual switch fabric, each switch has its own RGMII link, so bandwidth may be better between ports sitting on 2 different switch fabrics.

Interrupts
FG200D-1-LAB # diagnose hardware sysinfo interrupts
           CPU0       CPU1
  0:     538988          0   IO-APIC-edge   timer
  2:          0          0   XT-PIC         cascade
  4:          9          0   IO-APIC-edge   serial
  7:          0          0   IO-APIC-edge   LCD_KEYPAD
  8:          0          0   IO-APIC-edge   rtc
 16:          0          0   IO-APIC-level  ehci_hcd, ehci_hcd
 17:       5743          0   IO-APIC-level  libata, usb-uhci, usb-uhci, net2280
 18:          0          0   IO-APIC-level  usb-uhci, usb-uhci
 19:          0          0   IO-APIC-level  usb-uhci
 64:     298362          0   PCI-MSI-edge   mgmt-Q0
 65:          9          0   PCI-MSI-edge   mgmt
 66:    6653177          0   PCI-MSI-edge   np4lite
 67:          0    6990207   PCI-MSI-edge   np4lite
 68:    8188509          0   PCI-MSI-edge   np4lite
 69:          0    5626707   PCI-MSI-edge   np4lite
 70:          0          0   PCI-MSI-edge   np4lite
 71:          0          0   PCI-MSI-edge   cp8
 72:          0          0   PCI-MSI-edge   cp8
 73:          0          0   PCI-MSI-edge   cp8
 74:          0          0   PCI-MSI-edge   cp8
 75:          0          0   PCI-MSI-edge   cp8
NMI:     538926     538962
LOC:     538904     538903
ERR:          0
MIS:          0

FG240D # diagnose hardware sysinfo interrupts
           CPU0       CPU1
  0:   77576284          0   IO-APIC-edge   timer
  2:          0          0   XT-PIC         cascade
  4:       5827          0   IO-APIC-edge   serial
  7:          0          0   IO-APIC-edge   LCD_KEYPAD
  8:          0          0   IO-APIC-edge   rtc
 16:          0          0   IO-APIC-level  ehci_hcd, ehci_hcd
 17:     450364          0   IO-APIC-level  libata, usb-uhci, usb-uhci, net2280
 18:          0          0   IO-APIC-level  usb-uhci, usb-uhci
 19:          0          0   IO-APIC-level  usb-uhci
 64:   24897124          0   PCI-MSI-edge   mgmt-Q0
 65:          8          0   PCI-MSI-edge   mgmt
 66:     817918          0   PCI-MSI-edge   np4lite
 67:          0   11229931   PCI-MSI-edge   np4lite
 68:     893281          0   PCI-MSI-edge   np4lite
 69:          0    1051194   PCI-MSI-edge   np4lite
 70:          0          0   PCI-MSI-edge   np4lite
 71:      10111          0   PCI-MSI-edge   cp8
 72:          0      10567   PCI-MSI-edge   cp8
 73:          0          0   PCI-MSI-edge   cp8
 74:          0          0   PCI-MSI-edge   cp8
 75:          0          0   PCI-MSI-edge   cp8
NMI:   77576216   77576252
LOC:   77576576   77576575
ERR:          0
MIS:          0

FortiGate­60D and FortiGate­90D run on 1 core only


FGT60D (global) # diagnose hardware sysinfo interrupts
0: 310968968 Timer Tick
8: 19654 soc2_vpn
10: 100311 soc2_pkce2
20: 236861775 np4lite
27: 0 ehci_hcd
28: 69829 ehci_hcd
29: 0 fsoc1_udc
32: 2706 serial
Err: 0

Drop counter table
FG200D-1-LAB # fnsysctl cat /proc/fsoc/npl/dce
000c: 00000161 Inbound total valid Packets in Ch0
000d: 00000309 Outbound total valid packets in Ch0
0011: 00000001 SSE to EHP PDQ Full in Ch0
0012: 0000010c SSE to HRX PDQ Full drop
001a: 0000003c Inbound Statistic number for Broadcast Packet in Ch0
001b: 00000125 Inbound Statistic number for Multicast Packet in Ch0
001c: 00062f9e Inbound Statistic number for Unicast Packet in Ch0
001d: 000000ed Outbound statistic number for Broadcast Packet in Ch0
001e: Outbound statistic number for Multicast Packet in Ch0 ?
001f: 00000309 Outbound Statistic number for Unicast Packet in Ch0
0036: 002c1ea7 SSE to HRX PDQ Read Commit Number
0037: 002c1ea7 SSE to HRX PDQ Write Commit Number
003a: 00000309 SSE to EHP PDQ Read Commit Number in Ch0
003b: 00000309 SSE to EHP PDQ Write Commit Number in Ch0
003e: 00000161 IHP to SSE PDQ Read Commit Number in Ch0
003f: 00000161 IHP to SSE PDQ Write Commit Number in Ch0
004c: 00000049 Inbound total valid Packets in Ch1
004d: 00185d5f Outbound total valid packets in Ch1
0051: 0000000f SSE to EHP PDQ Full in Ch1
0052: SSE to HRX PDQ Full drop ?
005a: 00000049 Inbound Statistic number for Broadcast Packet in Ch1
005b: 00000002 Inbound Statistic number for Multicast Packet in Ch1
005c: 00163b83 Inbound Statistic number for Unicast Packet in Ch1
005d: 00000096 Outbound statistic number for Broadcast Packet in Ch1
00ee: Outbound statistic number for Multicast Packet in Ch1 ?
005f: 00185cc9 Outbound Statistic number for Unicast Packet in Ch1
007a: 00185d5f SSE to EHP PDQ Read Commit Number in Ch1
007b: 00185d5f SSE to EHP PDQ Write Commit Number in Ch1
007e: 00000049 IHP to SSE PDQ Read Commit Number in Ch1
007f: 00000049 IHP to SSE PDQ Write Commit Number in Ch1
0083: 00000086 CRC Error drop in Ch2
008c: 027aa6f2 Inbound total valid Packets in Ch2
008d: 02758887 Outbound total valid packets in Ch2
0091: 00055c3c SSE to EHP PDQ Full in Ch2
009a: 00005af5 Inbound Statistic number for Broadcast Packet in Ch2
009b: 00000025 Inbound Statistic number for Multicast Packet in Ch2
009c: 027a4bd8 Inbound Statistic number for Unicast Packet in Ch2
009d: 000078fc Outbound statistic number for Broadcast Packet in Ch2
009e: Outbound statistic number for Multicast Packet in Ch2 ?
009f: 02750f8b Outbound Statistic number for Unicast Packet in Ch2
00ba: 02758887 SSE to EHP PDQ Read Commit Number in Ch2
00bb: 02758887 SSE to EHP PDQ Write Commit Number in Ch2
00be: 027aa6f2 IHP to SSE PDQ Read Commit Number in Ch2
00bf: 027aa6f2 IHP to SSE PDQ Write Commit Number in Ch2
00c3: 0000001b CRC Error drop in Ch3
00cc: 00005ae1 Inbound total valid Packets in Ch3
00da: 00005ae1 Inbound Statistic number for Broadcast Packet in Ch3
00db: 0000006c Inbound Statistic number for Multicast Packet in Ch3
00dc: 00000221 Inbound Statistic number for Unicast Packet in Ch3
00dd: 00000009 Outbound statistic number for Broadcast Packet in Ch3
00de: Outbound statistic number for Multicast Packet in Ch3 ?
00df: 0000040f Outbound Statistic number for Unicast Packet in Ch3
00fa: 0000040f SSE to EHP PDQ Read Commit Number in Ch3
00fb: 0000040f SSE to EHP PDQ Write Commit Number in Ch3
00fe: 00005ae1 IHP to SSE PDQ Read Commit Number in Ch3
00ff: 00005ae1 IHP to SSE PDQ Write Commit Number in Ch3
0114: 0015ea3b HTX0 to SSE PDQ Write Commit Number
0115: 00161707 HTX1 to SSE PDQ Write Commit Number
0138: 00140d09 SSE to HRX0 PDQ Read Commit Number
0139: 0018119e SSE to HRX1 PDQ Read Commit Number
013c: 00140d09 SSE to HRX0 PDQ Write Commit Number
013d: 0018119e SSE to HRX1 PDQ Write Commit Number
../..

Comments:
- Only lines whose counter increased are shown; this output has been built from multiple dumps on multiple units.
- Each time the command is issued, the counters are reset.
- Lines ending with '?' are guessed from similar lines in other groups.
- The DCE counter table has 5 sections: Ch0, Ch1, Ch2, Ch3 (probably one per RGMII port) and the host interfaces 0 and 1.
- Each counter has a reference (000c:, 000d:, …); there are some holes in this dump (missing references).
- There seems to be 1 channel per RGMII (Ch0, Ch1, Ch2, Ch3).

Statistics
FG200D­1­LAB # fnsysctl cat /proc/fsoc/npl/stats
cmd_alloc_fail :0000000000 cmd_resc_flush :0000000000
cmd_resc_flush_fail :0000000000 cmd_issue_fail :0000000000
ses_ins_total :0007947784 fw_ses_ins_orig :0003973485
fw_ses_ins_reply :0003974299 ses_del_total :0007880055
fw_ses_del :0007880055 ses_timeout :0000000000
sa_set_total :0000000000 sa_set_ib :0000000000
sa_set_ib_nomem :0000000000 sa_set_ib_dfail :0000000000
sa_set_ib_ses_fail :0000000000 sa_set_ob :0000000000
sa_set_ob_nomem :0000000000 sa_set_ob_dfail :0000000000
sa_del_total :0000000000 sa_del_ib :0000000000
sa_del_ib_nomem :0000000000 sa_del_ib_dfail :0000000000
sa_del_ib_ses_fail :0000000000 sa_del_ob :0000000000
sa_del_ob_nomem :0000000000 sa_del_ob_dfail :0000000000
check_ipsec_offload :0000000000 check_ipsec_offload_ok :0000000000

Comments:
­ 3 different zones : command, sessions, ipsec

● FortiGate-240D:
- Port1 to port40 share the same uplink, so the total bandwidth is 1 Gbps. The other RGMII link is used for the DMZ ports (#278064).
- The 240D has 4 RGMII links, dedicated to WAN1, WAN2, ALL_LAN_PORTS and ALL_DMZ_PORTS respectively (#278064).
⇒ This contradicts the FG-240D functional diagram, which shows a different distribution; it is not known which statement is correct…

Limitations and workarounds, fixed bugs

● #278064: Throughput drops from 1 Gbps to 500 Mbps with duplex (bidirectional) connections.
The architecture is based on RGMII, which shares 1 Gbps for both directions, hence roughly 500 Mbps per direction under full-duplex load.

● #310606: ESP passthrough is not accelerated by SoC2 (NP4Lite).

● #295622: High packet drops when a port running at 100 Mbps is connected directly to the NP4Lite RGMII. Use a port connected to the internal switch instead, so that the switch buffers the packets.

FortiGate­3700DX overview

In short, a 3700D with the addition of 2x FPGAs called TP2, used as an external extension of the NP6 service-group to accelerate GTP traffic and GRE tunneling.
The platform has also been used to work around the NP6 out-of-order issue, using the TP2 for packet re-ordering.

TP2 diag counters (#393264) from top3 implementation “3700DX OOOfixes ”

FG37DX4614800016 # diagnose tp2 ?


status show TP2 cards status
register View NP2 registers
xgmac­stats Show XGMAC MIBs counters
xgmac­stats­clear Clear XGMAC MIBS counters
sel­cnt Show SEL module counters
sel­cnt­clear Clear SEL module counters
update Update TP2 cards images

Syntax:
diagnose tp2 status <dev_id>
diagnose tp2 register <dev_id>
diagnose tp2 xgmac-stats <dev_id>
diagnose tp2 xgmac-stats-clear <dev_id>
diagnose tp2 sel-cnt <dev_id>
diagnose tp2 sel-cnt-clear <dev_id>
diagnose tp2 update

top3 #437462 "Add IPSec Anti-Replay workaround based on 5.4.4 3700DX branch "

So far (as of 2017-07-18) there has been no deployment of the 3700DX for IPsec out-of-order, but the code exists.
With regard to GTP and GRE acceleration, another solution has been found with a regular 3700D in pure CPU mode (using kernel acceleration based on N-Turbo), with higher performance.

FortiCarrier (Carrier Grade NAT) overview
Work to be done.
In short, a unit based on FortiCore hardware dedicated to high-performance NAT. No NP6 involved.

Reference websites and documents
The following useful references related to Fortinet hardware acceleration are available:

● docs.fortinet.com

○ FortiOS Handbook ­ Hardware Acceleration, version 5.4.1 [public]


http://docs.fortinet.com/uploaded/files/2855/fortigate-hardware-acceleration-54.pdf
The official Fortinet hardware acceleration document. It covers:
What's new in Hardware Acceleration for FortiOS 5.4
Hardware acceleration overview
NP6 Acceleration
FortiGate NP6 architectures
NP4 Acceleration
FortiGate NP4 architectures
Hardware acceleration get and diagnose commands

● Related publications

● Expert Academy 2016: The back of the rack [Fortinet internal / Fortinet official partners]
https://fortivision.fortinet.com/index.php?/topic/11786-emea-expert-academy-2016-the-back-of-the-rack
Hardware acceleration and the NP6 processor
NextGen Firewall - IPS/AppCtrl with N-Turbo
ADVPN - Configuring & troubleshooting

● IPSec VPN guide for TAC [ Fortinet internal ]


https://fortivision.fortinet.com/index.php?/topic/98-ipsec-vpn-training-material
A very detailed reference document on FortiGate IPsec VPN, including hardware acceleration details.

