Fortinet_Hardware_Acceleration
15 Sep 2016 : Draft from ex Hardware Acceleration doc + eXpert Academy 2016 + other additions. Used for a training remote session. (Cedric Gustave)
21 Oct 2016 : Additions from feedback, more references, details on platforms without ISF, new platform examples (2500E, 1500DT), np6 session update new behavior (here). (Cedric Gustave)
04 Nov 2016 : Fix incorrect info on NP6 SA based on testing from Stephan on FGT-1500D (link) + adjusted paging for printing + review from Laurent. (Cedric Gustave)
12 Dec 2016 : Update np6 session update from Mantis #386626. (Cedric Gustave)
16 Dec 2016 : Example of FortiGate-1500D for NP6/N-Turbo/ipsengine interrupts and core mapping (missing diagram) (link). (Cedric Gustave)
30 Mar 2017 : Example of FortiGate-1500D for npu-vlink XAUIs used for egress on the first NP6 and ingress on the second NP6. (Cedric Gustave)
24 Jul 2017 : Update with notes/documents from the FortiVision Hardware training page + details on NP6 lags confirmed by Yi. (Cedric Gustave)
10 Nov 2017 : Understanding EHP drops (link) + NP6 shaping protection summary (link). (Cedric Gustave)
04 Jan 2018 : NP6 session update + statistic counters (link + link) + hardware acceleration with 2 asymmetric wan interfaces (link) + EHP drops with lags (link). (Cedric Gustave)
29 May 2018 : LAG enhancement (set lag-sw-out-trunk enable) explanation for both NP-module and non-NP-module devices (link). (Cedric Gustave)
Table of contents
Hardware Fundamentals
PCI Bus : Peripheral Component Interconnect
PCIE Bus : Peripheral Component Interconnect – Express (also PCIe)
Bus Interrupt ReQuest IRQ
Advanced Programmable Interrupt Controller APIC
Message Signaled Interrupts MSI-X
Internal and external data transmission
IPv6 multicast session acceleration
SCTP traffic hardware acceleration
CAPWAP data (not DTLS) hardware acceleration (259431)
fp-anomaly
Deprecated : IPS anomaly and signature (XH0/XG2)
Features breaking hardware acceleration
NP chips
NP6
From the outside
Form factors
Integration
Integration with a switch fabric
This platform is similar to the classical 1500D but has 4x10G RJ45 copper ports
Integration without a switch fabric
NP6 Performance figures
Improvements from NP4
From the inside
Functional blocks
Traffic flow examples
Configuration options impacting NP6
NP6 monitoring additions for drift sessions (diag and SNMP)
NP6 IPsec out-of-order and sub-engine settings
References :
NP6 limitations and bugs
Understanding EHP drops
NP6 shaping protection summary
N-Turbo NP6 IRQ mapping
Diag commands and counters
diag npu np6 fastpath <enable*|disable> <np6_id>
diag npu np6 dce (dceall) <np6_id>
List of functional modules referred to in NP6 drop counters
DCE TABLE 0 : HRX drops
DCE TABLE 1 : Anomaly drops
diag npu np6 anomalydrop (anomalydropall) <npu_id>
diag npu np6 hrxdrop (hrxdropall) <npu_id>
diag npu np6 sessionstats (sessionstatsclear) <npu_id>
diag npu np6 ssestats (ssestatsclear)
diag npu np6 xgmacstats (xgmacstatsclear) <npu_id>
diag npu np6 gmacstats (gmacstatsclear)
diag npu np6 gigeportstats (gigeportstatsclear) <port_name>
diag npu np6 portlist
diag npu np6 ipsecstats (ipsecstatsclear)
diag npu np6 eepromread <np6_id>
diag npu np6 npufeature
diag npu np6 register
diag npu np6 synproxystats
Design recommendations
Limitations and workarounds, fixed bugs
SoC3 (NP6 light)
NP4
From the outside
Form factors
Performance figures
Integration
IRQ distribution
From the inside
Configuration options impacting NP4
Diag commands
Limitations and workarounds, fixed bugs
SoC2 (NP4 lite + CP8 lite)
Limitations and workarounds, fixed bugs
FortiGate-3700DX overview
FortiCarrier (Carrier Grade NAT) overview
Document objective
The objective of this document is to provide TAC engineers with an up-to-date reference on all information around FortiGate hardware acceleration.
It is expected to be updated frequently to keep up with new knowledge and products.
Document’s history
This document is derived from the document “FortiGate hardware acceleration components and architectures”, which was started in May 2005. That
document was getting too big because of the addition of various “side” topics, and it contained too much information related to legacy products. The official
FortiGate “Hardware Acceleration” guide has also been improved and now holds more details than before. As a result, this document covers
fewer topics and focuses on what is not already covered, or not detailed enough, elsewhere.
Disclaimer
Information in this document cannot be guaranteed 100% correct, for several reasons. First, implementations are subject to change, so information that
was initially correct may become obsolete or wrong. Second, inputs come from different sources, such as lab test results, Mantis
information, and bits of information or experience shared by Fortinet colleagues, which may be valid in their context but may turn out to be wrong in a more
general or different context, or simply could not be verified.
Feedback
Feedback, such as pointers or shared information, is of course welcome and necessary to keep the content of this document pertinent.
You can provide your feedback through FortiVision bug notes at the location of the document. Thank you in advance for your contribution !
Confidentiality
This document must not be shared externally; it contains internal references and content about Fortinet technology.
For external communication, the official FortiGate Hardware Acceleration documentation at http://doc.fortinet.net should be used instead.
Hardware Fundamentals
PCI Bus : Peripheral Component Interconnect
● The PCI bus provides a communication channel between a PCI peripheral device and the main system
→ ex: delivers data from a network card to the computer operating system
● It is a common standard facilitating device integration and vendor interoperability
● It is based on a parallel interface (parallel lines of data synchronized by a common clock reference)
● Requires wiring layers on the motherboard
● Half duplex
● Multiple revisions and technologies
● Each device announces itself on the bus (see lspci)
● Each device needs an IRQ (Interrupt Request) to speak with the CPU
● PCI bridge : IRQ managed at the bus bridge => possible to share an IRQ between PCI devices
● PCI-X (PCI eXtended) : extension of PCI (complex wiring and expensive)
PCIE Bus : Peripheral Component Interconnect – Express (also PCIe)
● Developed by Intel, faster than PCI
● High-speed serial bus (simpler to implement than parallel, fewer pins)
● Point-to-point technology : each device is meshed (+ connection with the host)
○ non-shared media, multiple access
○ up to 32 lanes multiplexed between 2 devices
○ an increase in the number of endpoints does not affect performance
● The capability of using multiple lanes is advertised by each device and negotiated
● The PCIe standard allows x1, x4, x8, x16, x32 (not common) multiplexed lanes
● Bidirectional (full-duplex) between every endpoint
● Packet-based data encapsulation
● Supports INTx, MSI, MSI-X interrupts
Bus Interrupt ReQuest IRQ
Whenever a device needs to send data through a bus, it is first required to raise an interrupt request (IRQ) to the bus controller.
This IRQ is delivered to the destination (the kernel in our case) to notify that data is available to be pulled.
Data packets wait in the sender's FIFO queue to be pulled (for instance on the network interface).
If data is not pulled fast enough, the sender queue may become full and start dropping packets.
In the FortiGate implementation, a single IRQ may transfer up to 64 packets. If, after data has been pulled from the sender queue, packets remain
because not all of them could be pulled at once, there is no need for the sender to trigger another IRQ. This is why the IRQ rate is first linear
with the packet rate, then the IRQ rate becomes flat while the packet rate continues to increase. It is therefore expected that the interrupt rate
does not rise proportionally with the packet rate.
When APIC is used, ‘diag hard sys interrupts’ refers to ‘IO-APIC-edge’ or ‘IO-APIC-level’.
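The shape of this relationship can be pictured with a rough model. The sketch below is illustrative Python only, not FortiOS code; the 64-packet batch size comes from the text above, while MAX_IRQ_PER_SEC is an invented service-capacity figure used purely for illustration:

# Rough model: at low load every packet raises its own IRQ (linear regime);
# once the host is saturated, pending packets ride along on the next IRQ
# (up to 64 per IRQ), so the IRQ rate flattens while the packet rate can
# keep growing up to ~64x the flat IRQ rate before the sender queue drops.
MAX_IRQ_PER_SEC = 10_000   # assumed host service capacity (illustration only)
BATCH = 64                 # packets one IRQ may transfer (from the text above)

def approx_irq_rate(packet_rate_pps):
    return min(packet_rate_pps, MAX_IRQ_PER_SEC)

for pps in (1_000, 10_000, 100_000, MAX_IRQ_PER_SEC * BATCH):
    print(f"{pps} pps -> ~{approx_irq_rate(pps)} IRQ/s")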
Message Signaled Interrupts MSI-X
IRQ mapping
IRQ mapping, also called affinity, defines for each IRQ id which CPU core should be used. Several commands are available to display the affinity map :
Comments :
● It is easy to see which devices are using APIC or MSIX
● Some devices may share the same IRQ id, ex: ipsec0 (aka CP6 on this unit) and port39
● Choosing port39 and port40 to send a high packet rate is a poor choice, as they are only mapped to 1 core, compared to the NP4-enabled ports
● Devices using APIC seem to have all their IRQs mapped to CPU0
● We see that the first NP4 (np4_0) is using 8 IRQs, from 64: to 71:. However, because no traffic has flown through the ports since the reboot,
it is hard to verify that all interrupts from np4_0 were balanced on different cores
● We clearly see the 2x NP4 of the FortiGate-1000B (np4_0 and np4_1)
The output from the command references a CPU MASK in hexadecimal format, where the mask for CPUn is 2^n :
CPU_id  CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
mask    0x1    0x2    0x4    0x8    0x10   0x20   0x40   0x80
CPU_id  CPU8   CPU9   CPU10  CPU11  CPU12  CPU13  CPU14  CPU15
mask    0x100  0x200  0x400  0x800  0x1000 0x2000 0x4000 0x8000
CPU_id  CPU16    CPU17    CPU18    CPU19    CPU20     CPU21     CPU22     CPU23
mask    0x10000  0x20000  0x40000  0x80000  0x100000  0x200000  0x400000  0x800000
CPU_id  CPU24      CPU25      CPU26      CPU27      CPU28       CPU29       CPU30       CPU31
mask    0x1000000  0x2000000  0x4000000  0x8000000  0x10000000  0x20000000  0x40000000  0x80000000
An output like ‘The cpuset of irq114 is 0xffffffffffffffff’ means that the IRQ could be distributed on any CPU core.
In our example from our FGT1KB8, we want to verify the np4_1 IRQ affinity because all counters show ‘0’ in the interrupt list.
We want to verify interrupts from 72: to 79: in this case :
FGT1KB8 # diagnose sys cpuset interrupt 72
The cpuset of irq72 is 0x1.
FGT1KB8 # diagnose sys cpuset interrupt 73
The cpuset of irq73 is 0x2.
FGT1KB8 # diagnose sys cpuset interrupt 74
The cpuset of irq74 is 0x4.
FGT1KB8 # diagnose sys cpuset interrupt 75
The cpuset of irq75 is 0x8.
FGT1KB8 # diagnose sys cpuset interrupt 76
The cpuset of irq76 is 0x1.
FGT1KB8 # diagnose sys cpuset interrupt 77
The cpuset of irq77 is 0x2.
FGT1KB8 # diagnose sys cpuset interrupt 78
The cpuset of irq78 is 0x4.
FGT1KB8 # diagnose sys cpuset interrupt 79
The cpuset of irq79 is 0x8.
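To expand such masks without doing the arithmetic by hand, a small helper can be used (illustrative Python, not a FortiOS tool):

def cpus_from_mask(mask_hex):
    # Bit n set in the cpuset mask -> the IRQ may be handled by CPUn.
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]

print(cpus_from_mask("0x1"))                 # [0] -> pinned to CPU0
print(cpus_from_mask("0x8"))                 # [3] -> pinned to CPU3
print(cpus_from_mask("0xffffffffffffffff"))  # CPUs 0..63 -> any core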
● Listing with /proc
An alternative (but less user-friendly) is to get the information directly from /proc :
● Remapping : ‘diagnose system cpuset interrupt <irq_id> <cpu_mask>’
Disclaimer ! Remapping IRQ affinities is generally dangerous and should not be done on customers’ units !
● A temporary affinity remap can be done through a ‘diag’ command. This change does not survive a reboot. Reference #191000.
(bit order of the cpu_mask, MSB to LSB : CPU15 CPU14 CPU13 CPU12 CPU11 CPU10 CPU9 CPU8 CPU7 CPU6 CPU5 CPU4 CPU3 CPU2 CPU1 CPU0)
config system npu → config port-cpu-map → edit <port> → set cpu-core <CPU_id>
There is, it seems, for now no way to permanently remap IRQ affinities. There is however an interesting feature to mention that impacts
IRQ distribution. Documented in #272428, this feature statically maps a port to a single host RX queue, and
therefore to a single CPU core (the ‘config port-cpu-map’ syntax above). This feature clearly breaks the natural CPU core distribution for traffic received on one port.
It is available on NP6 only, and not on all platforms.
IRQ impact on CPU
Each IRQ is processed by an assigned CPU. When triggered, it burns CPU resources on the mapped CPU core.
Before 5.24.2, the cost of IRQs was accounted generically as ‘system’ CPU, just like any other kernel processing. This was problematic because it did
not allow measuring the cost of the IRQs compared to the cost of packet processing in the kernel itself.
After 5.24.2, dedicated categories were added to count physical IRQs (APIC) : ‘irq’, and software IRQs (MSIX) : ‘softirq’.
After this change, the ‘system’ indicator is no longer polluted with the IRQ cost.
FGT1KB8 # get sys performance status
CPU states: 0% user 0% system 0% nice 100% idle
CPU0 states: 0% user 0% system 0% nice 100% idle
CPU1 states: 0% user 0% system 0% nice 100% idle
CPU2 states: 0% user 0% system 0% nice 100% idle
CPU3 states: 0% user 0% system 0% nice 100% idle

FG900D4 # get sys performance status
CPU states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU0 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU1 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU2 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU3 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
ASICs and FPGAs
Both ASICs and FPGAs are hardware acceleration chips used in FortiGate platforms. Each has specificities that make it suitable for different
contexts.
ASICs are very fast (faster than FPGAs), which is why they are used as network processors (NP), but their development is very complex, long (a couple
of years) and costly. Once designed, the price per unit is very cheap for massive-quantity orders.
Once burned into the silicon, an ASIC ‘program’ can’t be changed, because it is made of electronic logic units connected to each other.
Consequently, some bugs may not be fixable at all, nor can any new feature be added. Sometimes bugs can be fixed by adjusting the vast number of available
settings controlling the hardware. In other cases, changing the behaviour for a serious problem requires a new revision of the chip, called a ‘respin’. This is
avoided as much as possible.
Unlike ASICs, FPGA logic runs closer to a computer program, which allows reprogramming, including field reprogramming, for instance during a
regular system upgrade of the appliance. FPGAs are not as fast as ASICs; they cost less in design, but the price per chip is much more expensive than
for ASICs, so a high volume of chips costs far more. They are suitable for low volumes, i.e. for very specific features not deployed on a high number
of appliances.
FPGAs are actually used during the design phase of an ASIC to test the logic and fix bugs before the silicon phase.
ASICs and FPGAs may be combined to work together, with the FPGA extending the ASIC features via an external call. This is the case for the
FortiGate-3700DX, where the FPGA extends the NP6 capability with GTP and GRE hardware acceleration.
Fortinet’s approach is rather to use ASICs and occasionally rely on FPGAs for specific functions running on specific devices. The addition of an FPGA can
be a temporary solution before the logic is added to the next version of the ASIC.
All NPs and CPs are based on ASICs.
The legacy ‘SP’, FortiDDoS ‘TP2’, FortiController ‘DP’, FortiGate-3700DX TP2 (GRE, GTP) and FortiCore are based on FPGAs.
Internal and external data transmission
Several other components are required to get packets transmitted internally between the different components, such as physical ports and NPs, as well as
outside the unit on the Ethernet network.
● FortiTag
Proprietary ISF (internal switch) tagging labels are appended to packets; they are used to forward packets between interfaces and chips. These
tags are referred to as ‘FortiTag’. When an ISF is used, the NP knows how to add and remove the right FortiTags depending on where the packet is sent.
● PHY
The ISF chip does not provide all the low-layer functions needed to send a packet on the wire according to the Ethernet standard.
These functions are not needed for the communication between the internal components, which use other physical standards.
For a packet to leave the FortiGate through a physical interface, some electronics are required to build an electrical signal compliant with the Ethernet
standard. This is the goal of the component called a PHY : it connects the link layer, called MAC (Medium Access Control), to a physical medium like
an optical fiber or a copper port. A PHY chip may encode and decode signals for multiple ports. It is logically located between the ISF and the external port
connectors.
● XAUI
A XAUI (pronounced “zowie”) is a 10G attachment standard, where X stands for 10 (Roman numeral). It is often used for the NP6 10G chip-to-chip
attachments. The NP6 4x10G attachments are XAUI.
It may look obvious, but it is actually misleading : some modules on the NP are named from the chip peer's point of view, i.e. seen from the outside
of the chip. For example, among the host modules for the PCI host interface communication (towards the kernel), ‘HRX’ handles traffic moving from
the chip to the kernel, whereas the ‘HTX’ module deals with traffic moving from the kernel to the chip.
● Life of a packet in the NP : PBA (Packet Buffer Allocator) & PBUF (Packet Buffer)
The 2 acronyms PBA and PBUF are directly linked to the life of a packet in an NP.
What happens when a packet enters an NP ?
What happens to this packet while it is processed inside the NP ?
What happens to a packet leaving the NP ?
These are the questions covered here.
When a packet enters the NP, the complete packet is first copied to a central buffer called the Packet Buffer (PBUF). The NP module in charge of
buffering the packets is called the Packet Buffer Allocator (PBA).
In a second step, a packet descriptor is created and stored in memory. The packet descriptor compiles the packet L2, L3 & L4 headers only; the payload
is dropped. Because the NP does not work on the packet payload, such a packet descriptor is enough for NP processing.
It is generally this short packet descriptor, and NOT the full packet, that navigates through the NP functional blocks and has its information
modified on the fly (there are exceptions, for instance IPsec, where the payload is needed). Depending on the processing, flags may also be added
to the packet descriptor.
When the packet has to leave the NP, the entire packet is regenerated from the corresponding packet descriptor and the original payload stored
initially.
The lifetime of the packet buffer and packet descriptor associated with the packet starts when the packet enters the NP and stops when the packet has left
the NP. If for some reason the packet buffer or PDQ entry is not freed when the packet has left the NP or has been dropped, a PBA leak occurs.
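This lifecycle can be summarized in a simplified model. The sketch below is illustrative Python only; the structures and the 64-byte header cut are arbitrary simplifications, not the real NP internals:

class PBA:
    """Toy model of the PBUF/PBA packet lifecycle described above."""
    def __init__(self):
        self.pbuf = {}      # PBUF: buffer id -> full packet (headers + payload)
        self.next_id = 0

    def ingress(self, packet):
        buf_id, self.next_id = self.next_id, self.next_id + 1
        self.pbuf[buf_id] = packet              # 1. full packet copied to PBUF
        return {"buf": buf_id,                  # 2. descriptor keeps headers only
                "headers": packet[:64],
                "flags": set()}                 #    flags may be added by modules

    def egress(self, descriptor):
        # 3. full packet regenerated from descriptor + stored payload, buffer freed.
        #    Forgetting to free here (or when a packet is dropped) is a PBA leak.
        return self.pbuf.pop(descriptor["buf"])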
● Packet Descriptor Queue : FIFO queues of packet descriptors, with priority
NP and CP chips contain a lot of FIFO (First In, First Out) queues. Chips are made of modular functional blocks where packets transit from one block to the
other. Module blocks have ingress and egress packet descriptor queues (PDQ) to store a small number of packet descriptors. Packet descriptors are then
pushed from one module queue to the next.
In case of a packet burst, or a busy module, packet descriptors are likely to accumulate in the queues. When a queue gets full, packets are dropped. Such
drops are accounted in the NP ‘drop table’ of the Drop Counter Engine (DCE), available with the command ‘diagnose npu np6 dce x’.
Some queues may provide a mechanism to prioritize some types of packets on ingress (more or less chance to enter the queue when the queue starts to
be loaded). NP6 can define up to 8 different priorities (internally defined and not visible to the user). For instance, control plane traffic such as ARP, OSPF, BGP,
IKE… may be given a higher chance to enter the queue than normal data traffic. This has nothing to do with the traffic shaping feature; it is not something a
user can modify.
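A toy admission model gives the idea (illustrative Python; the real NP6 thresholds and its 8 internal priority levels are not public, so the numbers here are invented):

from collections import deque

class PDQ:
    """Bounded descriptor FIFO with headroom reserved for high priority."""
    def __init__(self, size=32, reserved_for_high=8):
        self.q = deque()
        self.size = size
        self.low_limit = size - reserved_for_high  # low prio refused earlier

    def admit(self, descriptor, high_priority=False):
        limit = self.size if high_priority else self.low_limit
        if len(self.q) >= limit:
            return False    # dropped -> would show up in a DCE drop counter
        self.q.append(descriptor)
        return True

So high-priority descriptors (control-plane traffic in the text above) can still enter while a burst of data descriptors is already being refused.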
Hardware accelerators families overview
Families
We can split the hardware accelerators into multiple families :
List hardware accelerators on a unit
To find the hardware accelerators on a particular platform, several commands can be used :
● I see “Unknown device” in the lspci output, should I be worried ?
example: 02:00.0 Network and computing encryption device: Unknown device 1a29:4338
No. It only means that a text definition is missing in the PCI device description database for this PCI id. This database is not always up to date.
CP9
1a29:0703  NP2
1a29:0702  NP4
1a29:4339  SoC2/NP4 Lite
1a29:4e36  NP6
SoC3/NP6 Lite
● lspci output sample
Hardware acceleration features
IPv4 unicast session acceleration
This NP feature is probably the most useful one in hardware acceleration. It may look simple at first, but understanding it in detail
requires some fundamental concepts that are detailed here.
Session hardware acceleration consists in intercepting packets at an early stage, when they enter the FortiGate, so they don't have to be processed
by the main CPUs. The physical packet interception is done by the network processor, just after reception from the network interface; however, it
is still the kernel that sends the interception order to the NP. The first packets of a new session can't be accelerated because the kernel needs them for
session creation. The same goes for any packet of the session that triggers a session state change.
● Benefits:
○ The goal is to reduce the CPU cost of processing packets for which a session already exists.
○ Significant drop in system CPU usage
○ Reduced load on the PCI buses
○ Shorter packet latency when traversing the FortiGate
● recognizing an offloaded session
Hardware-accelerated sessions are visible in the firewall session list through the “npu info” line at the bottom of the session entry. Session entry details are
covered in the document “FortiGate System”, so we only talk about the npu info line here (copied from the System document).
The npu info line is optional in a session list entry. It is only displayed when the session is passing through interfaces on the same NP.
epid and ipid are non-zero when offload is taking place.
● flags: The flag field encodes certain attributes of the session as they relate to NPU offload, regardless of whether the session is eventually
offloaded. Each bit represents one piece of information.
(src: https://askbot.fortinet.com/question/682/whatdoestheflagfieldofasessionnpuinfomean )
Bit # :         #7 #6 #5 #4 | #3 #2 #1 #0
Nibble weight :  8  4  2  1 |  8  4  2  1
Example 0x81 :   1  0  0  0 |  0  0  0  1   (bits #7 and #0 set)
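Reading the field is a matter of expanding the hex value bit by bit; a minimal Python helper (the per-bit meanings themselves are on the askbot page referenced above):

def npu_flag_bits(flag_hex):
    # Return the set bit positions of an npu info flag field.
    value = int(flag_hex, 16)
    return [bit for bit in range(8) if value >> bit & 1]

print(npu_flag_bits("0x81"))   # [0, 7] -> bits #0 and #7 set, as in the table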
● offload: (forward_direction)/(reverse_direction). 0 when not offloaded.
#   Accelerator chip
0   not offloaded
1   (*)
2   NP1A (FA2)
3   NP2
4   NP4
5   SP1/SP2
6   NP4Light/Soc2
7   SP3
8   NP6
(*): See #151934 : before 4.3.2, the value ‘1’ in the offload field meant “generically hardware accelerated”, without details on the accelerator chip.
● no_ofld_reason:
Another line was added to the session list to provide more information on the reason why a session passing through NP interfaces is not
accelerated. This line is discussed in Features breaking hardware acceleration.
● packet exceptions
Even if session acceleration has been programmed for a session, some packets of the flow may bypass hardware acceleration and reach the
kernel. Whenever an NP receives a TCP packet with the FIN or RST flag set, it is forwarded to the kernel regardless of whether a forward entry exists. This is
a requirement to allow the kernel to change the session state. The kernel then notifies the NPs to delete the corresponding forward entries
associated with the deleted session.
There are other cases of packets needing a hardware acceleration exemption :
● session revalidation
As documented in the “FortiGate System” document, a routing change or a network-related configuration change may cause the removal of hardware
acceleration entries from the NP. This is legitimate : for instance, a routing change may simply route the traffic to another interface, and a configuration
change may deny traffic which was previously authorized. Session revalidation is a pure system firewall concept and also applies to non-
hardware-accelerated sessions. However, it has a consequence on hardware acceleration : whenever the kernel flags a session for revalidation by
applying the “dirty” flag, the corresponding hardware acceleration entries are removed from the NP.
The term “fastpath” should be avoided when referring to hardware-accelerated sessions because it may be interpreted in two ways. There is also a
fastpath in the kernel which has nothing to do with hardware acceleration. When the kernel handles the traffic, it first computes a hash on the packet to
see if a session already exists. If a session matches, the packet takes the (kernel) fastpath because there is no need for a policy lookup. If the packet
does not match a session, it takes the “slowpath” (routing lookup, policy lookup…). There is also another “fastpath” when using N-Turbo.
● Forward entry:
The forward entries contain the information required by the NP to process hardware-accelerated flows. They are stored in fast memory close to the NP
chips. Each forward entry has a key. This key is made by hashing the 5-tuple defining the flow : src_ip, src_port, dst_ip, dst_port, proto_number.
Forward entries also hold additional information required when packets need to be recreated for egress, like the original source MAC
address (needed for transparent mode) or the destination MAC address to use. They also contain information necessary to process packets, such as :
● Timestamp (see session keepalive below)
● L3 and L4 information for NAT purposes
● MTU
● IPsec SA reference for encryption/decryption, as well as a reference to the keys for integrity check
● outbound interface reference : the logical virtual interface (LIF) and the associated VLAN telling where to send the packet to.
The LIF and VLAN distinction is required to find out which VDOM the forward entry belongs to, because the forward entry contains no VDOM
reference (the VDOM is not part of the 5-tuple hash).
● tunneling information (for instance v4/v6 encapsulation in NP6)
● processing action
Forward entries are unidirectional, so a hardware-accelerated bidirectional flow requires 2 forward entries. If the FortiGate has more than 1 NP,
these two unidirectional forward entries may be programmed on different NPs. Two NPs don't share the same forward entry tables; they are unique to
each NP.
Forward entries are created following a hardware acceleration request from the kernel firewall module when the session is created.
The firewall module does not know how many NPs are available; the request is sent to the ‘NP driver’ module running in the kernel.
The NP driver then takes care of programming the two forward entries on the correct NPs for the session.
● Primary and secondary tables (PHT & OFT):
Forward entry lookup in the NP may take 1 or 2 stages and relies on 2 tables : the primary table (PHT) and the overflow table (OFT).
The primary table uses a 5-tuple hash to point to an index. The index is one entry in the session table.
The primary table is checked first. The 5-tuple hashing function may return the same key for multiple sessions. In such a situation, a second lookup in
the overflow table is performed to identify which of these sessions corresponds to the processed packet. In this operation, each field of the
forward entries is compared to the packet header until the match is found.
Performance is higher if the primary table is smaller. The maximum number of linked entries in the overflow table is the table “depth”.
The following example shows a table depth of 3, meaning at least 3 entries share the same session key. A packet hashed to this key may need
to try up to 3 entries until the match is confirmed by comparing the session fields.
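The two-stage lookup can be sketched as follows. This is an illustrative Python model; the hash width, table sizes and entry layout are simplifications, not the real NP6 structures:

import zlib

def tuple_key(pkt, table_size=1 << 16):
    # Stand-in for the SSE hash (described below as CRC32-like).
    src_ip, src_port, dst_ip, dst_port, proto = pkt
    data = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(data) % table_size

def lookup(pht, oft, pkt):
    entry = pht.get(tuple_key(pkt))         # stage 1: primary table (PHT)
    while entry is not None:                # stage 2: walk the overflow chain (OFT)
        if entry["tuple"] == pkt:           # compare the full 5-tuple fields
            return entry                    # forward entry found -> offloaded path
        entry = oft.get(entry.get("next"))  # chain length bounded by the "depth"
    return None                             # no match -> packet goes to the kernel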
● Session search engines (SSE)
The session search engine is at the heart of NP hardware acceleration. It is the functional block inside the NP responsible for computing a hash
for each received packet and finding out whether a forward entry in the tables matches it. When a match is found, the packet is processed according to
the definition of the forward entry. The hash function in the SSE is similar to CRC32 (110 bits). There is generally more than one session search engine
per NP (2 in NP4 and NP6). Different distribution mechanisms exist to balance packets across multiple SSEs; however, all packets belonging to the
same direction of a session must be processed by the same SSE. Before being processed by the SSE, packet descriptors are buffered in a FIFO
queue in front of the SSE. In case of a packet burst, packets may be dropped in the NP if this FIFO queue is full. NPs have a ‘diag hard’ command to report
SSE stats like installed sessions or packets dropped in the queues (refer to the NP chapter).
The setup of hardware acceleration has multiple steps. The steps depend on the traffic flow : for instance, a unidirectional flow only programs a
single forward entry, but a bidirectional session needs two. The two forward entries are not programmed at the same time. Which packet triggers the
programming of the forward entries also depends on the protocol used (summarized in the sketch below) :
● UDP : The original direction is programmed after the first packet is seen by the kernel.
The reverse direction is programmed when the first packet from the reverse direction is seen.
● TCP : Nothing happens at the reception of the first SYN packet. At the reception of the second packet (SYN/ACK) by the kernel, the reverse
forward entry (server → client) is programmed first. Then, when the third packet (ACK, client → server) is received, the original
direction (client → server) forward entry is programmed.
Some cases were seen where this sequence may create out-of-order packets during the session setup. To avoid it, a CLI parameter was
added (config system npu → set delay-tcp-npu-session enable|disable*), refer to #365497.
Note : It is a common belief that TCP session acceleration all takes place at the 3rd packet of the session setup. This is incorrect.
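As a memory aid, the TCP programming order described above in a few lines of illustrative Python (not FortiOS code):

# Which TCP handshake packet programs which NP forward entry (per the text):
TCP_OFFLOAD_STEPS = (
    ("1. SYN      client->server", "nothing programmed yet"),
    ("2. SYN/ACK  server->client", "reverse entry (server->client) programmed"),
    ("3. ACK      client->server", "original entry (client->server) programmed"),
)
for packet, action in TCP_OFFLOAD_STEPS:
    print(packet, "->", action)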
● The session push
To trigger the creation of a forward entry in the NP, the kernel NP driver sends a special packet, the ‘session push’ packet, to the NP host
interface (via the PCI bus). This specially crafted packet does not contain data from the user traffic, only the administrative data of the session push.
Session push packets are intended for the SSE that will handle the remaining traffic. Upon its reception by the SSE, the forward entry is
programmed and the push packet discarded.
The NP4 and NP6 implementations are a bit different. In NP6, the session push packet follows the exact same path as a regular data packet. Because
of this, it benefits from the natural distribution of IRQs across CPU cores, so the session push CPU cost is distributed. The session push packet is sent
just before the corresponding data packets and follows the same path, uses the same queues and raises an IRQ on the same CPU core. The session push
packet is expected to arrive before the data packet. In NP4, a dedicated command channel is used on the host interface and, unlike NP6, this
command channel only raises IRQs on a single CPU core (the first one associated with the NP4 half). Upon a burst of commands, this architecture can
cause the command queue to grow and delay hardware acceleration programming, resulting in a few additional packets using the slow path. Another
consequence is a less well balanced system load on the 4 cores allocated to a half NP4.
Related reference : #365497 (possible packet out-of-order with NP6 during TCP session establishment)
● Session keepalive
When traffic is hardware accelerated, the kernel has no visibility of the packets ‘shortcut’ in the NP. As a result, the kernel firewall session timer would
decrease even though packets do flow. To keep the firewall session alive as if packets were seen, one update message is required per session. NP4 and
NP6 schedule these updates differently.
NP4 sends a keepalive message to the kernel every 40 sec for each live session.
With NP6, a new mechanism was added where sessions are updated based on the session expiration timer (the established-state timer for
TCP). The update is triggered when the session lifetime reaches a random value between 1/2 and 4/5 of the expiration timer. This is now the
default behavior on NP6. For NP6, the session update behavior is configured in ‘config system np6’, see “configuration options impacting NP6”
(#386626).
Session keepalive messages from the NP are asynchronous, so not all sessions are updated at the same time : all NP entries have a timestamp to avoid
triggering session updates at the same time (unless the sessions were created at the same time). Session statistics are also updated by the update
message. When a session is deleted, the NP generates an update message to update the statistics.
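The NP6 refresh point can be computed from the rule above (illustrative Python; 3600 s is the default TCP established timer, used here only as an example):

import random

def np6_update_time(expire_timer_s):
    # NP6 refreshes the kernel session when the session lifetime reaches a
    # random point between 1/2 and 4/5 of the expiration timer (#386626).
    return random.uniform(0.5 * expire_timer_s, 0.8 * expire_timer_s)

print(np6_update_time(3600))   # somewhere between 1800 s and 2880 s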
Prior to NP6, NPs couldn't provide accurate session statistics updates (number of packets and bytes) because the NP lacked the capability to count traffic.
This feature was added in NP6 (see Per session traffic accounting and traffic distribution). From testing (5.6.3), we see that the kernel firewall session
updates its statistic counters with the session update message, so there is no specific message for statistic counter updates; they come with the session
update message.
# traffic logging is enabled on the policy (NP6 accounting is therefore automatically enabled in this version)
# constant telnet traffic passing through the NP session; the kernel session however is not updated in real time
# session dump 3 seconds before reaching half of the session-ttl (300/2 = 150)
session info: proto=6 proto_state=01 duration=152 expire=147 timeout=300 flags=00000000 sockflag=00000000 sockport=0
av_idx=0 use=4
originshaper=
replyshaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty npu f00
statistic(bytes/packets/allow_err): org=112/2/1 reply=60/1/1 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin>sink: org pre>post, reply pre>post dev=10>32/32>10 gwy=10.5.21.2/10.100.5.2
hook=pre dir=org act=noop 10.100.5.2:2828>10.5.21.2:23(0.0.0.0:0)
hook=post dir=reply act=noop 10.5.21.2:23>10.100.5.2:2828(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=0
serial=0007e0ab tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=153/131, ipid=131/153, vlan=0x0000/0x0000
vlifid=131/153, vtag_in=0x0000/0x0000 in_npu=1/2, out_npu=1/2, fwd_en=0/0, qid=4/4
total session 1
# statistic counters are updated along with the session expiration timer reset
session info: proto=6 proto_state=01 duration=154 expire=299 timeout=300 flags=00000000 sockflag=00000000 sockport=0
av_idx=0 use=4
originshaper=
replyshaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty npu f00
statistic(bytes/packets/allow_err): org=10172/193/1 reply=7278/103/1 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin>sink: org pre>post, reply pre>post dev=10>32/32>10 gwy=10.5.21.2/10.100.5.2
hook=pre dir=org act=noop 10.100.5.2:2828>10.5.21.2:23(0.0.0.0:0)
hook=post dir=reply act=noop 10.5.21.2:23>10.100.5.2:2828(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=0
serial=0007e0ab tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=153/131, ipid=131/153, vlan=0x0000/0x0000
vlifid=131/153, vtag_in=0x0000/0x0000 in_npu=1/2, out_npu=1/2, fwd_en=0/0, qid=4/4
total session 1
# The next statistic counter update will also have to wait for the next half lifetime of the session-ttl
When a session is removed from the firewall function in the kernel, the corresponding forward entries are also removed from the NP.
Forward entries are also deleted from the NP upon a routing lookup change or a policy configuration change. After session revalidation, forward entries are
re-installed on the NP.
● Hardware accelerated sessions across 2 NPs
As mentioned before, the 2 unidirectional forward entries resulting from a bidirectional accelerated session may be created on 2 different NPs.
We will review how packets flow. There are two different cases. The first one, “cross-NP acceleration with ISF”, is when the FortiGate platform has
an Internal Switch Fabric (ISF) between NPs and ports. This is generally the case for high-end units, and always the case when NP4s are involved. The
second one is a possible option on “medium-range” units equipped with the NP6 form factor (3x 10G + 16x 1G, see NP6); this is “cross-NP
acceleration without ISF”. For each scenario, the two cases, hardware accelerated or not, will be considered.
The yellow flow on the left has ingress and egress ports attached to the
same NP. Traffic reaches the CPU using the PCIe bus, first consuming system
CPU for ‘soft IRQ’ (see MSIX interrupts); packets are then processed
by the kernel for delivery.
The green flow on the right has ingress and egress ports attached to
different NPs on the same ISF. Each NP sees a unidirectional traffic. Traffic
flows to the kernel via the PCIe bus and also consumes system CPU.
Once offloaded, the yellow flow follows the same path across the ISF but is
‘shortcut’ inside the NP, so it does not consume any CPU at all.
For the “cross-NP” green flow, we need to decompose each direction :
original direction in green (left), and reverse direction in blue (right), and
focus on which NP 10G XAUI the traffic uses to leave the NP. The rule is : the NP
10G XAUI used to egress from the NP is the XAUI at the same position as the one
to which the traffic's egress port is attached.
Without an ISF, the only way to push traffic over to the other NP's ports is
to pass through the 10G inter-NP6 link. The example shown here is a
FortiGate-900D with 10G ports + 6x 1G ports per NP, so there is a
potential to oversubscribe the 10G inter-NP6 link.
● hardware acceleration with 2 asymmetric wan interfaces
This scenario involves 3 interfaces : a single interface on the client side and 2 interfaces on the server side (wan1 and wan2). Packets of the session egress on
wan1, but reply packets are received from wan2. This scenario is supported by the kernel : a single session is used and the session is stateful. However, it
is problematic for hardware acceleration : the session will keep changing state from non-accelerated to one-way accelerated and will constantly
be dirtied. It is not possible to support stable hardware acceleration for this kind of session.
In more detail :
● the session can be offloaded if multiple consecutive packets arrive in the original direction
● as soon as a packet is received in the reply direction on an interface other than the one used for egress, the session is marked ‘dirty’ and
the NP hardware-accelerated session is removed (as expected)
● if the session is bidirectional, it may end up perpetually in the dirty state, or go in and out of the dirty state.
references :
#0464329: B1547 : no hardware acceleration when using different ports for egress to server and ingress from server
top3 #464594
● public references
http://docs.fortinet.com/uploaded/files/2855/fortigatehardwareacceleration54.pdf
The FortiGate Hardware Acceleration guide version 5.4.1 provides architecture diagrams for all platforms.
Link aggregations
Session hardware acceleration is also available in the case of aggregated interfaces. The NP allows link aggregation of interfaces attached to a single NP,
but also aggregations made of interfaces distributed over different NPs.
Using link aggregation across multiple NPs provides a way to increase performance by making use of the power of more than 1 NP.
The command showing the interface mapping to NP XAUIs ( “diag npu np6 portlist" ) does not consider whether link aggregation is configured or not. With
a redundant interface or lag, traffic received on an interface may potentially be sent to a different NP or XAUI than the one the interface is attached to.
Similarly, relying on the sw_np_port field from ‘diag hard dev nic’ to see which NP/XAUI an interface is linked to may give erroneous
information in case of a lag or redundant interface : in this case, the reference returned is a ‘trunk id reference’ and not a ‘port reference’ (#389055).
Link aggregation is supported on the NP; however, control-plane LACP traffic is handled through the kernel.
The line ‘npu: y’ of the CLI command (in the management vdom) ‘diag netlink aggregate name <aggregateName>’ provides confirmation that
hardware acceleration is performed.
● Port hashing
On NP4, the choice of which port should be used to send traffic (so the hashing) is not done by the kernel. This choice is programmed in the NP at
session creation (#126252).
On NP6, a trunk can be defined, with a LIF associated to the trunk. The NP6 SSE makes a port-to-trunk lookup in the mapping table to
determine whether the port is associated to a trunk; if so, the choice of the egress port is done through a lookup in the trunk table. The programming of the trunk
tables is done by FortiOS and can be used to control traffic distribution. The port resolution is made so that, if there is no NAT, forward and
return packets use the same port (see the sketch below).
A packet to be sent to a LAG has a destination LIF corresponding to the LAG → the port-to-trunk resolution replaces the destination LIF with the LIF of the
chosen egress port.
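The symmetry property mentioned above (forward and return packets of a non-NATed flow resolving to the same member port) can be obtained by hashing the 5-tuple in an order-independent way. A minimal Python sketch, not the actual NP6 trunk-table algorithm:

import zlib

def lag_member(src_ip, src_port, dst_ip, dst_port, proto, members):
    # Sort the two endpoints so that A->B and B->A hash identically;
    # without NAT, both directions then pick the same LAG member.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = zlib.crc32(f"{a}{b}{proto}".encode())
    return members[key % len(members)]

members = ["port25", "port26", "port27", "port28"]
print(lag_member("10.0.0.1", 1024, "10.0.0.2", 80, 6, members))
print(lag_member("10.0.0.2", 80, "10.0.0.1", 1024, 6, members))  # same member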
● asic helper: y
Refers to the feature of the ISF that distributes packets received on a FortiGate port belonging to a LAG to the multiple NP6 XAUIs involved in the lag.
#218813 defines the ASIC helper : “The NP6 introduces a new mechanism to help link failover for bond interfaces. The mechanism is disabled if the ASIC
helper is set to disable. Basically, there are two phases during link failover : link-down detection, and traffic failover to a good member of the bond
interface. The new mechanism doesn't help the first phase, link-down detection, but it helps the second phase, failing over to a good member. What does this
mean from an application point of view? The failover time is reduced; for a TCP connection, the connection doesn't need to be established again.”
The ASIC helper has to be disabled before you can add any 40G ports into an aggregated interface.
Diagram representing the processing of an inbound packet received on a FortiGate port towards the NP6 when the port is member of a LAG.
The lag is defined in the ISF; the ASIC helper makes the decision to forward the packet to one of the 2 NP6 XAUIs attached to either port1 or port2.
The balancing between NP6s is based on a 5-tuple hash. Inside the ISF, a core-tag corresponding to the LAG interface is used.
Once arrived on the NP6, the ITP module translates the LAG core-tag to the LAG LIF for NP6 processing.
(information confirmed by Yi)
Diagram representing the processing of an outbound packet when the destination FortiGate interface is a lag : inside the NP6, the packet destination is a LAG LIF.
The SSE converts the LAG LIF to a FortiGate port LIF attached to an NP XAUI (trunk-to-port conversion). In the ETP, the port LIF is translated to a core-tag for
transport inside the ISF. The decision on the egress port is not influenced by the ISF LAG; it is made only by the NP6, from its trunk-to-port conversion.
(information confirmed by Yi)
Lag enhancement feature
Introduced in top3 #464594 on the special branch br_54_nokia (5.4.7) for the FortiGate-3800D. Also implemented on non-service-module units for the FortiGate-3700D in
top3 #469106. Merged in 6.2; merges planned for 5.6.5 and 6.0.2 for FG-38xxD, 39xxE, 5001E, 6K, 7K + 1200D → 3700D.
The goal is to reduce the NP6 EHP drops caused by egress collisions from multiple sources onto one single egress XAUI.
The solution is to force NP6 traffic to egress on the same XAUI from which the traffic was originally received.
This limits the congestion for all traffic received from the ISF. A possible collision still exists between egress kernel traffic and NP6 traffic, but this condition
is less likely to happen and can still be handled by the NP6 buffers. This new distribution allows controlling egress as a direct consequence of ingress
control on the same XAUI. Ingress XAUI congestion is much better handled thanks to the larger ISF buffers.
no configuration required
→ reboot required
→ also automatically changes the lag interface mode :
What is the best choice of ports in a lag ?
Scope :
This section applies to platforms having a fixed binding between interfaces and NP6 XAUIs (FortiGate-1500D, FortiGate-3700D…).
It does not apply to platforms like the FortiGate-3800D that don't have such a binding and where all NP6 XAUIs are within a LAG on the ISF.
● if not using the lag enhancement, applicable to units with an ISF (1200D to 3700D)
Distribute the lag ports on each NP6, making sure each NP6 can be used, so the pressure of the traffic is distributed on multiple (all) NP6s.
In a lag, choose different XAUI ids, even if spread amongst different NP6s.
Use non-connected interfaces in the lag definition to distribute over even more NP6s if needed.
● with the lag enhancement : config system npu → set lag-sw-out-trunk enable (5.6.5, 6.0.2, 6.2), applicable to units with an ISF (1200D to 3700D)
Use all possible XAUIs from all NP6s; however, do not mix 10G and 40G lags on the same NP6.
Use non-connected interfaces in the lag definition to distribute over even more NP6s if needed.
In case of multiple 10G lags, also use all XAUIs, even if already used by another lag.
Egress congestion is controlled by ingress (what egresses a XAUI has ingressed from that same XAUI, so the throttling is managed at ingress using the ISF
buffers).
Note : not preferred for an IPsec concentrator
Example from Nokia using the set lag-sw-out-trunk enable feature
In this example :
2x NP6 dedicated to the 2x 40G port lag, not mixed with the 2x 10G lag XAUIs
Each 10G lag has 8 ports configured but only 4 connected, to benefit from 8 XAUIs on 2 NP6s at ingress
The 2x 10G lags share XAUIs, allowing a full distribution on all the XAUIs of the 2 NP6s; the collision of the 2 lags' traffic happens at ingress
DSCP marking
DSCP marking consists in having the NP itself set the DSCP bits on the received packet, preserving hardware acceleration.
It seems to be already supported on NP4 and also available in NP6 hardware (to be confirmed for NP6 with lab testing).
Per session traffic accounting and traffic distribution
● per-session traffic accounting may not be accurate with hardware acceleration; the same applies to traffic distribution
The NP may not have the capability, or may not be explicitly configured, to report accurate accelerated traffic volumes or packet counts. In this case, as
session accounting is done by the kernel and accelerated traffic is not visible to the kernel, the reported traffic volume and packet rate
provided at the termination of the session may be wrong. The kernel typically reports the few packets seen during session setup and termination,
which is not representative of the traffic that has flown in the session. Before NP6 (FA2, NP2, NP4), NPs did not have the capability to report
the session traffic volume accurately.
NP6 has this capability if configured to do so, with an impact on performance. To update the kernel statistics, the NP generates session updates (by default
every 40 seconds).
NP6 and SoC3 have been improved to allow, upon a configuration change, accurate per-session accounting. Enabling per-session accounting
may cause a CPS drop of up to 1/3. When the feature is enabled, for each packet received, the NP needs to update its session counters.
There were changes of behavior and default settings in the 5.2 and 5.4 branches (see below)
○ since 5.4.0 : automatic switch as soon as logging is enabled on the policy (#268426, #273376) :
“Since the session accounting is most useful in the traffic log, we should tie it to the traffic log. Specifically, if in a policy the traffic log is not enabled, we
don't enable traffic accounting on NP6; this helps preserve the NP6 performance in terms of throughput. When in a policy the traffic log is
enabled, we automatically enable session accounting for all sessions allowed by that policy. This helps the traffic log record correct
bytes/packets information. All this should be transparent to the user, so the benefit is that the end user will see correct accounting information in the
traffic log (with or without NP6)” (quoting #268476)
● upgrading from 5.2 to 5.4.x changes the per-session-accounting default setting from ‘disable’ to ‘enable-by-log’ (#273377).
When upgrading from 5.2, the npu setting per-session-accounting default changes from ‘disable’ to ‘enable-by-log’.
This causes a potential risk of a performance drop in CPS after the upgrade.
● Impact on CPU : may lower CPS by up to 1/3 (#251207) : “NP6 supports per-session accounting, but it brings extra overhead on the packet forwarding
rate, because we need to write into the DDR memory where the session is saved. The packet forwarding rate may drop by 1/3 when it is enabled for small-
packet flows. For big packets (>1K), the impact is not noticeable. It doesn't have an impact on session offloading.”
IPv6 unicast session acceleration
to be detailed
IPSec encryption/decryption and hashing
ESP packet encryption, decryption and hashing can be performed by the network processor.
The supported encryption and hashing algorithms depend on the NP model and revision (see the specific NP chapters).
This topic is covered extensively in Stephane Hamelin's IPsec guide for TAC, hosted at :
FortiVision → GCSS → TAC → TAC related Trainings → IPSec VPN Training Material
We will only review a few key points related to hardware acceleration in this chapter.
When non-NP ports are involved, the CP can be used to offload the kernel from the cryptographic functions.
● required configuration
Since FortiOS 5.0, hardware acceleration on the NP no longer needs the local gateway IP to be specified, as it used to be in 4.3.
Hardware acceleration is the default choice in the phase1 configuration (set npu-offload enable on the phase1).
● SAs installation
IPsec SAs are installed on the NPU when the first packets flow. The two SAs may be installed at two different moments. If a tunnel is up but no packets
have flown, it is expected that the tunnel list reports no hardware acceleration, because no SA has been installed yet.
● Forwarding session
IPsec acceleration SAs are only programmed on a single NP, even if the tunnel is attached to a link
aggregation or if the phase2 may see traffic from different incoming ports. In that case, an IPsec-
accelerated outbound packet may not be received on the NP carrying the encryption SA. For this
scenario, the NP6 driver also installs ‘forwarding sessions’ on the NPs attached to the other ports (of
the lag, for instance). The goal of the ‘forwarding session’ is to forward the packet to the NP handling
the encryption SA. This is done through the ISF and is transparent to the kernel.
Forwarding sessions are not processed by the NP6 SSE but by the FDB module.
Forwarding sessions are accounted in the NP6 session stats as regular sessions; they can't be distinguished.
The cost in NP resources of a forwarding session is much less than the cost of
encryption/decryption, but the NP buffers still have to process the traffic.
From Steph's lab tests with 5.4.1 on an FG-1500D : the encrypt SA and decrypt SA are installed on
the NP6 linked to the interface on the public network (NP6_2 here). A forward entry is required on
NP6_1 to push clear-text traffic received on NP6_1 to NP6_2 for encryption. If lags are used,
the encrypt and decrypt SAs may be installed on different NPs.
session info: proto=1 proto_state=00 duration=114 expire=30 timeout=0 flags=00000000 sockflag=00000000 sockport=0
av_idx=0 use=3
originshaper=
replyshaper
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/swan_p1 vlan_cos=0/255
state=may_dirty npu synced
statistic(bytes/packets/allow_err): org=168/2/1 reply=168/2/1 tuples=2
tx speed(Bps/kbps): 1/0 rx speed(Bps/kbps): 1/0
orgin>sink: org pre>post, reply pre>post dev=141>133/133>141 gwy=10.10.5.40/10.118.0.1
hook=pre dir=org act=noop 10.118.0.1:25730>10.10.5.40:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.5.40:25730>10.118.0.1:0(0.0.0.0:0)
misc=0 policy_id=1 auth_info=0 chk_client_info=0 vd=4
serial=260fa445 tos=ff/ff app_list=0 app=0 url_cat=0
dd_type=0 dd_mode=0
npu_state=0x000c00
npu info: flag=0x81/0x82, offload=8/8, ips_offload=0/0, epid=572/572, ipid=1005/572, vlan=0x80c0/0x83db
vlifid=1005/572, vtag_in=0x0000/0x03db in_npu=1/3, out_npu=1/3, fwd_en=0/0, qid=2/3
# Comment : in_npu = 1/3 => SA installed on NPU_0 (subtract 1 from the value!) and FWD on NPU_2 (subtract 1)
# Comment : out_npu = 1/3 => the same, in the opposite direction for the reply
This command does not work for NP4
● ipsec engines
NPs have multiple ipsec engines and sub-engines. For the same tunnel, load-balancing is done across the engines. NP6 has 2 ipsec engines, each
with 8 sub-engines (detailed in the NP6 chapter).
The genuine ‘diag vpn tunnel list’ command tells whether ipsec hardware acceleration is performed by NP6. npu_flag values:
00 : Session is not (or not yet) hardware accelerated in NP. No SA pushed to the NP yet
40 : NPU SA sequence number space has been exhausted. The SA should no longer be used.
80 : Dirty flag of the NPU SA is set. The SA is expiring and should no longer be used.
Warning : remember that an initial packet in each direction is required to trigger the SA copy to the NP. Because of this, it is normal for a
tunnel not to show the expected npu_flag if packets have not yet used the tunnel in each direction. Check carefully the number of dec and enc packets for both
directions: at least 1 packet must have gone through the tunnel to get a correct hardware acceleration statement for that direction.
Improvement in 5.4 : an NP index has been added to tell on which NP the SA was installed (dec_npuid and enc_npuid), where
0 means no SA copy and a non-zero value is the NPU id + 1 (subtract 1 from the value to get the NP6 id, so 5 is for NP6_4).
Because of a bug (#375910) the 2 directions are inverted! So enc is dec and vice versa (fixed in 5.6).
3810D182 # dia vpn tunnel list name p144v101
list ipsec tunnel by names in vd 0
name=p144v101 ver=1 serial=1 2.2.0.2:0->2.2.0.1:0
bound_if=33 lgwy=static/1 tun=intf/0 mode=auto/1 encap=none/8 options[0008]=npu
proxyid_num=1 child_num=0 refcnt=2064 ilast=7 olast=7 autodiscovery=0
stat: rxp=172930240 txp=0 rxb=239155512922 txb=0
dpd: mode=ondemand on=1 idle=20000ms retry=3 count=0 seqno=0
natt: mode=none draft=0 interval=0 remote_port=0
proxyid=p244v101 proto=0 sa=1 ref=2050 serial=1
src: 0:0.0.0.0/0.0.0.0:0
dst: 0:0.0.0.0/0.0.0.0:0
SA: ref=4 options=2e type=00 soft=0 mtu=1280 expire=40253/0B replaywin=2048 seqno=1 esn=0 replaywin_lastseq=0a4ea4c0
life: type=01 bytes=0/0 timeout=43175/43200
dec: spi=3ef55ca4 esp=aes key=16 8d9b784f67dd54f194c8466c0a2237ea
ah=sha1 key=20 a37578d17472ae044efc87969ce0b46f3083b8f9
enc: spi=9d214199 esp=aes key=16 4f74ec0d175d706cd52654c86b9aa1b1
ah=sha1 key=20 19490174793b766cf0692c150ec75bac6d59fa1f
dec:pkts/bytes=172926203/239149873376, enc:pkts/bytes=0/0
npu_flag=02 npu_rgwy=2.2.0.1 npu_lgwy=2.2.0.2 npu_selid=0 dec_npuid=0 enc_npuid=5
● out-of-order limitation (in ipsec context)
The load balancing across sub-engines for packets targeted to the same SA may cause out-of-order packet situations when the traffic flow is made of
long packets followed by short packets, because short packets take less time to process and may get out earlier than a preceding long packet.
Some workarounds could be made with special images (ex: special image fg_50_Orange_LTE_269247/build_tag_8942 based on 5.0.10, see top3
#269247). This build introduces new CLI commands to control the number of engines used for inbound and outbound:
Note : if anti-replay is enabled for IPsec, this should automatically configure 1 IPsec engine for decryption, and keep the
configured ones for encryption (anti-replay bug workaround)
A follow-up bug has been opened to merge the special image’s new CLI commands:
#370586 Add CLI commands to configure limited IPSEC engine on NP6 to solve out-of-order issue (not merged as of this writing)
● anti-replay limitation
Significant packet drops may occur with ipsec hardware acceleration and anti-replay enabled. This is due to a hardware limitation (#275195).
When the number of tunnels is significant (more than 50), it is not recommended to use anti-replay with NP6. If this is a strict requirement, special images
exist to limit the impact, however performance would still be significantly degraded, to ¼ or ⅕ of nominal.
● Disabling hardware acceleration
● on CP
● Disable CP globally
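A hedged sketch of the global knob most likely meant here (the exact option set varies per FortiOS version, so treat the name as an assumption):
config system global
    set ipsec-asic-offload disable   # assumed knob: stop offloading IPsec crypto to the CP
end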
Notes :
● NP4 can’t do SHA2 HMAC; watch out for the proposals to keep traffic hardware accelerated
● Legacy NP2/NP4 constraints
Replay detection : with replay detection set, encryption/decryption and hashing may be left to the Content Processor (CP) or processed by NP2/NP4,
depending on 'config system npu' settings and on the software version.
Note : enc-offload-antireplay and offload-ipsec-host must have the same value in the NP2/NP4 context where these settings are used.
For NP4 : settings applicable to NP4 between 4.3.10 and 5.2.2 (not since 5.2.3). Does not apply to NP4Lite (>= 4.3.10) and NP6, where 'config system
npu' is always ignored.
Passthrough ESP session acceleration
The goal of the feature is to have the NP accelerate IPsec ESP passthrough traffic. In this context, the FortiGate is not the IPsec tunnel endpoint but
just sits between 2 IPsec gateways without NAT. The FortiGate sees incoming ESP packets (IP proto 50) and needs to process this traffic and
egress it on the interface towards the destination IPsec gateway. Originally, ESP passthrough traffic was not eligible for hardware acceleration.
This was added as of 5.2.2 and is supported since 5.4 GA.
History:
This feature was originally implemented on NP4 through a special image (#229874)
It was also implemented in NP6 (#253221)
inter vdom (npu-vlink) traffic acceleration
Inter-vdom links have been available in FortiOS for a long time; they allow traffic to transit from one vdom to another via a logical interface handled by
the kernel, and therefore can’t be hardware accelerated, which is a big problem for performance in MSSP scenarios.
The hardware accelerated inter-vdom link, called ‘npu-vlink’, started with NP4 and was enhanced on NP6.
The genuine non-accelerated ‘vdom-link’ should be avoided when NPs are available on the unit.
● concept
● To interconnect multiple vdoms, you need to create vlan interfaces based on the npu_vlink (virtual npu_vlink). Both ends of the virtual npu_vlink
should be on the same vlan.
○ example: npu0_vlink0_100 (on vlan 100, based on interface npu0_vlink0)
and npu0_vlink1_100 (on vlan 100, based on interface npu0_vlink1)
● associate a vdom to each virtual npu link (see the sketch after this list)
● See also “Configuring Inter-VDOM link acceleration with NP6 processors” in the FortiGate hardware acceleration PDF for more details
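A minimal sketch of the two ends, assuming hypothetical vdoms ‘vdom_A’ and ‘vdom_B’ (the npu0_vlink0/npu0_vlink1 base interfaces are the documented pair):
config system interface
    edit "npu0_vlink0_100"
        set vdom "vdom_A"            # hypothetical vdom name
        set interface "npu0_vlink0"
        set vlanid 100
    next
    edit "npu0_vlink1_100"
        set vdom "vdom_B"            # hypothetical vdom name
        set interface "npu0_vlink1"
        set vlanid 100
    next
end
Routing and policies then treat the two vlan interfaces as the ends of an accelerated point-to-point link.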
● NP6 implementation
If the external ports of the 2 vdoms connected with the npu-vlink are on the same NP6, no packet leaves the
NP6 before the packet egresses from the second vdom. In the example above, if ‘port1’ and ‘port2’ are both attached
to the same NP6, once the session is accelerated, a packet enters the NP6 by port1 and is processed for both
vdoms ‘vdom_A’ and ‘vdom_B’ without leaving the NP6 at all. It only gets out of the NP6 to egress on port2.
Of course, if port1 and port2 were attached to 2 different NP6s, the packet would need to reach the second NP6 through
the NP6-to-NP6 link provided by the ISF. The same also applies if the traffic ingresses or egresses on LAG
interfaces whose ports are distributed amongst different NP6s.
NP6 : which XAUI is used on egress when reaching an npu-vlink on a different NP6 ?
In the context of the example just above (using an npu-vlink from a different NP6 to distribute vdom connections), the question is :
if the session passes through 2 NP6s using an npu-vlink and the npu-vlink is delivered by the second NP6, which XAUI is used on the first NP6 to
reach the second one? (see diagram)
The flow is :
the session is accelerated for the 1st vdom, processed on NP6_0 (because the ingressing port is attached to NP6_0)
the session is pushed via an npu-vlink delivered by NP6_1. For this, the packet needs to egress on an NP6_0 XAUI to reach the ISF
⇒ (question 1) which XAUI is used to egress on the first NP6 ?
the packet coming from the ISF enters NP6_1.
⇒ (question 2) which XAUI is used to ingress on the second NP6 ?
Tested with a FortiGate-1500D with 2 pairs of ports: [port33, port38] and [port35, port38]
Answers are :
Question 1 : the XAUI used for egress is the same as the one used to ingress on the first NP6
Question 2 : the XAUI used for ingress on the second NP6 has the same ID as the XAUI used for egress on the first NP6 :
Example port33/port38 : XAUI 0 is used to egress on NP6_0, so XAUI 0 is also used for ingress on NP6_1
Example port35/port38 : XAUI 2 is used to egress on NP6_0, so XAUI 2 is also used for ingress on NP6_1
● limitations
● NP4 implementation
The NP4 implementation requires the packet to be sent to the ISF even if the ingress and egress ports are connected to the same NP4.
In the NP4 implementation, the npu-vlink point-to-point is a logical interface where each end is attached to an NP4 10G port.
● References :
Expert Academy 2016
IPS traffic fastpath (NTurbo acceleration) / IPSA (IPS acceleration)
Keep in mind that NTurbo is essentially a software solution based on the kernel.
The role of the NP/SoC is reduced to pushing packets of traffic flows eligible for IPS/NTurbo acceleration to dedicated channels on the
Host interface instead of the default channel to the kernel. From there, traffic is handled differently by the kernel, which allows a fastpath
and a better distribution to ipsengine processes.
The main focus of NTurbo is to increase IPS processing performance by distributing the processing cost across CPU cores. One of the
ideas is to avoid using the same CPU core for IRQs and for ipsengine processing. For this, a lot of attention is given to the balancing of the IRQs used for
NTurbo IPS acceleration across the CPU cores. Each platform has its own hardware characteristics, such as different types and numbers of processors
with different numbers of cores, use of hyperthreading or not, and different types and numbers of hardware acceleration chips. The consequence is that
getting the best IPS acceleration performance requires different per-platform settings, such as:
more or fewer CPU cores (and therefore Host interface channels) used for packet transfer between NP6 and kernel and back, and more or fewer
ipsengine processes, each one bound to a dedicated CPU...
This topic has been covered in one of the chapters of the Expert Academy 2016 Support Team section where the overall solution is explained; please
refer to Expert Academy 2016.
● NP6
See NTurbo NP6 IRQ mapping for the NP6 contribution to the NTurbo based IPS acceleration.
● SoC3
The SoC3 integrated chip has all the requirements for NTurbo IPS acceleration, making it available on SoC3 based ‘E’ platforms such as:
FortiGate-60E, FortiGate-90E, FortiGate-100E, FortiGate-200E, ...
● Legacy NP4
The first NTurbo acceleration started on NP4 enabled platforms such as the FortiGate-3240C, FortiGate-3600C and FortiGate-5001C; however, NP4 has a
weakness: it can only be bound to 8 IRQs, and more are required to use dedicated channels on the Host interface.
Because of this, it is not possible to perform IPS acceleration with the NP4 alone. The solution used is to let an additional chip with large IRQ capability take
charge of pushing packets to the kernel through the dedicated host interface channels. This chip is an Intel 82599 10G interface with its XAUI
connected to the switch fabric; the NP4 only has to transfer packets to it via the ISF.
The following diagram compares the 2 architectures: the NP4/Intel 82599/ISF solution is on the left, the native NP6 solution is on the right.
HA A/A load-balancing
This feature has been available since the first NPs. It is made to offload the CPU of the master unit of an active-active cluster. In active-active
load-balancing, all traffic received by the cluster first reaches the master unit. The master unit may then retransmit the packet to one of the
active slaves for load-balancing purposes. For the packet retransmission from master to slave, the master changes the source and destination MAC
addresses: the source MAC is the MAC address of the master interface sending the packet, and the destination MAC address is the targeted slave’s real MAC
address (instead of the HA virtual MAC originally used).
When the ingressing packet reaches an NP interface, the kernel creates a session and sends a request to the NP to install the forward entry so that
packets are retransmitted to the designated slave.
The retransmission of the following packets of the session is handled by the NP until
the session is deleted.
Traffic shaping
● Concept
NPs may have built-in basic traffic shaping engines. Their only purpose is to cap the packet rate.
No real shaping is done, in the sense that packets are not delayed; the feature is much closer to traffic policing, where packets are dropped to bring the
rate down to the threshold.
There are two different independent shapers configured from the same common CLI commands:
Both do their work independently of the other. The kernel shaper has more options, such as bandwidth reservation, priority queuing and
interface-based limits, which are not available on the NP shaper. The policing algorithms are also different, so a session may be shaped with a different
pattern once it becomes accelerated.
Note from #373203 :
The difference between the software and NP shaper implementations is that software has more buffer to smooth out traffic bursts. The NP drops packets when the traffic rate is higher than the
configured value, without any buffering mechanism. Packet drops trigger TCP congestion control; the host TCP stack will lower the flow to half of the original throughput or lower.
This is a hardware limitation. Please disable NPU offloading in the policy for shapers when the bandwidth limit is low.
NP shaping should only be used in simple traffic policing scenarios, if required.
The NP measures traffic rates using shaper objects, which are limited resources on NPs and can be overrun (for instance if a per-ip shaper is used).
When more control is needed on shaping, the only way is often to bypass NP hardware acceleration so that only the kernel shaper is used (see the sketch
below). This is the only way if the flows to shape are not all ingressing/egressing on pairs of interfaces connected to the NP.
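A minimal sketch of forcing a flow back to the kernel shaper (policy id and shaper name are hypothetical; auto-asic-offload is the standard per-policy knob):
config firewall policy
    edit 10                               # hypothetical existing policy
        set auto-asic-offload disable     # keep sessions in the kernel so the kernel shaper applies
        set traffic-shaper "limit_10mbps" # hypothetical shaper object
    next
end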
● Forwarding sessions
Just like IPsec, traffic shaping in NP6 can only be done on a single NP6. All packets from different ports sharing the same shaper have to be
sent to the same NP6 for accurate accounting. This is done by installing forwarding sessions.
Syn proxy
NP6 is the first NP to implement a SYN proxy feature. This feature was available on the legacy SP2/SP3 chips and is configured the same way.
The configuration is the same as a genuine DDoS profile, with the addition of action=’proxy’.
A threshold has to be defined; it defines at which packet rate the syn proxy feature gets activated.
● Generic SYN proxy principle
The FortiGate acts as a proxy for the 3-way handshake SYN, SYN/ACK, ACK packets. It provides better protection
against SYN flood attacks than the DoS action ‘block’. Legitimate tcp connections with a proper handshake are allowed, even if their connection rate
is higher than the defined threshold, while SYN attack packets are dropped.
● references
● #218425
● #272927 (lifetime of proxy session can be defined in config system np6 → edit np6_x → set garbagesessioncollector enable & set
sessioncollectorinterval 8)
● #370592 DoS profile using parameter "tcp_syn_flood", option "set action proxy" sets the TCP window size value to 0 and no options:
FGT sends a SYN/ACK with window size 0 and no options because, with synProxy enabled, the FGT acts as a man in the
middle:
1. FGT/NP6 receives a SYN and sends a SYN/ACK to the client. If the client sends back an ACK, NP6 does the syn-flood
check; if it is not an attack, the FGT then initiates a SYN to the real server. If it is an attack, no SYN is
sent to the server.
2. The window size is set to 0 to prevent the client from sending any data packet before NP6 has checked the ACK packet,
as the real connection to the server has not been established yet.
3. No options in the first SYN/ACK, because before the SYN is sent to the server, the FGT doesn't know which options the server will
respond with.
4. After the ACK from the client passes the NP6 check, NP6 initiates a SYN to the server and the server responds with the real SYN/ACK.
Now the FGT knows all options and the window size, and sends the final correct SYN/ACK packet to the client.
● Configuration example
config firewall DoS-policy
edit 1
set interface "port5"
set srcaddr "all"
set dstaddr "all"
set service "ALL"
config anomaly
edit "tcp_syn_flood"
set status enable
set log enable
set action proxy < new option
set threshold 1 < unack'd syn threshold
next
end
next
end
● monitoring
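One hedged way to observe the feature, assuming the synproxy-stats subcommand is available on the NP6 build (output fields vary per version):
diag npu np6 synproxy-stats    # per-NP6 SYN proxy counters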
HPE protection
● concept
This feature was introduced by top3 #363398 as a workaround to limit the kernel impact of a DDoS attack on an SLBC cluster.
The concept is to apply a traffic policer on the NP towards the host interface (the path to the kernel, via the PCI bus), to protect the kernel from bursts of packets
that may affect the unit’s stability. Obviously, the HPE policer is not applied to hardware accelerated traffic.
It can be used to recover working access or to troubleshoot a unit under DoS attack.
There are several queues on the host interface; all the queues are considered.
The feature was originally planned for 5.4.3 but was only merged in 5.6.0.
Available via special image in 5.4.2: #395452 (fg_54_HPE/build_tag_9739) and #441731 (fg_54_orange_gi/build_tag_3250)
No logs are generated when the level is reached; only the DCE counter ‘diag npu np6 dce-all <np_id> → TPE_HPE’ would increase.
● configuration
type-shaping-tcpsyn-max : NPU HPE shaping based on the maximum number of TCP SYN packets received (10000 - 10000000000 pps, default = 5000000).
type-shaping-tcp-max : NPU HPE shaping based on the maximum number of TCP packets received (10000 - 10000000000 pps, default = 5000000).
type-shaping-udp-max : NPU HPE shaping based on the maximum number of UDP packets received (10000 - 10000000000 pps, default = 5000000).
type-shaping-icmp-max : NPU HPE shaping based on the maximum number of ICMP packets received (10000 - 10000000000 pps, default = 1000000).
type-shaping-sctp-max : NPU HPE shaping based on the maximum number of SCTP packets received (10000 - 10000000000 pps, default = 1000000).
type-shaping-ipsec-esp-max : NPU HPE shaping based on the maximum number of IPsec ESP packets received (10000 - 10000000000 pps, default = 1000000).
type-shaping-ip-frag-max : NPU HPE shaping based on the maximum number of fragmented IP packets received (10000 - 10000000000 pps, default = 1000000).
type-shaping-ip-others-max : NPU HPE shaping based on the maximum number of other IP packet types received (10000 - 10000000000 pps, default = 1000000).
type-shaping-arp-max : NPU HPE shaping based on the maximum number of ARP packets received (10000 - 10000000000 pps, default = 1000000).
type-shaping-others-max : NPU HPE shaping based on the maximum number of other layer 2 packet types received (10000 - 10000000000 pps, default = 1000000).
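A minimal sketch, assuming the 5.6-style hpe sub-section under config system np6 (the short option names and the enable-shaper switch are assumptions against the exact names listed above):
config system np6
    edit "np6_0"
        config hpe
            set tcpsyn-max 600000     # cap TCP SYN towards the host at 600 kpps
            set icmp-max 100000       # cap ICMP towards the host at 100 kpps
            set enable-shaper enable  # the policer is off by default
        end
    next
end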
● references
● #384692 (HPE traffic policer is not reported in "diagnose npu np6 npufeature")
● #389845 (HPE shaper does not distinguish base and fabric interface in SLBC)
The Session Search Engine component of the NP is at the heart of multicast acceleration, referencing a chain of interfaces to which the packet should be sent.
There is a current limitation of 256 destinations for 1 multicast session (a top3 case increased this to 20k, reference to find)
References : top3 #272428 (add an NP6 hash algorithm with src/dst ip addr to ensure the same multicast flow goes to the same cpu core)
● limitations :
○ #383624 multicast traffic on npuvlan cause PBA leak
⇒ Fix : Disable multicast offloading across npu intervdom link (see NP6 limitations)
SCTP traffic hardware acceleration
to be developed
● references
● #259431
fpanomaly
NPs have a simple built-in packet anomaly engine providing protection from a few well-known attacks based on malformed or unexpected types of
packets. The configuration is CLI only, per interface, under config system interface (set fp-anomaly [...]).
It is available for both IPv6 and IPv4 and is disabled per default.
config fp-anomaly
    set ipv4-proto-err {allow | drop | trap-to-host}
    set ipv4-unknopt {allow | drop | trap-to-host}
    set tcp-land {allow | drop | trap-to-host}
    set tcp-syn-fin {allow | drop | trap-to-host}
    set tcp-winnuke {allow | drop | trap-to-host}
    set tcp-fin-noack {allow | drop | trap-to-host}
    set tcp-fin-only {allow | drop | trap-to-host}
    set tcp-no-flag {allow | drop | trap-to-host}
    set tcp-syn-data {allow | drop | trap-to-host}
    set udp-land {allow | drop | trap-to-host}
end
config fp-anomaly-v6
    set ipv6-daddr-err {allow | drop | trap-to-host}
    set ipv6-land {allow | drop | trap-to-host}
    set ipv6-optendpid {allow | drop | trap-to-host}
    set ipv6-opthomeaddr {allow | drop | trap-to-host}
    set ipv6-optinvld {allow | drop | trap-to-host}
    set ipv6-optjumbo {allow | drop | trap-to-host}
    set ipv6-optnsap {allow | drop | trap-to-host}
    set ipv6-optralert {allow | drop | trap-to-host}
    set ipv6-opttunnel {allow | drop | trap-to-host}
    set ipv6-proto-err {allow | drop | trap-to-host}
    set ipv6-saddr-err {allow | drop | trap-to-host}
    set ipv6-unknopt {allow | drop | trap-to-host}
end
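A minimal usage sketch, assuming the per-NP6 form of the configuration and a unit with a single NP6 (np6_0):
config system np6
    edit "np6_0"
        config fp-anomaly
            set tcp-land drop              # drop LAND attack packets in the NP
            set ipv4-unknopt trap-to-host  # punt packets with unknown IP options to the kernel
        end
    next
end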
● references
● See hardware acceleration guide (docs.fortinet.com)
It is well known that proxy-based UTM breaks hardware acceleration; there are however more, less obvious features that lead to the same
effect. They may be configured at different levels: on the interface, on the policy, or in a dedicated configuration statement. A new feature introduced
in 5.4 adds a “no_ofld_reason” line to the session list to provide more information on why the session is not offloaded.
Note : there were actually 2 steps leading to “no_ofld_reason”. The first step, introduced in #245447, added a line to the session list like:
NPU driver internal error: code=7. < this line shows why NP4 is not offloading.
But the error code was a bit cryptic and was later changed to the more user-friendly ‘no_ofld_reason:’ line (mantis reference?)
This section tries to summarize these cases with more details. This section requires more testing and updates; inputs are welcome!
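A hedged illustration of where the line shows up in a session dump (addresses and the reason value are made up for the example; only the ‘no_ofld_reason:’ field itself comes from the feature):
# diagnose sys session list
session info: proto=6 proto_state=01 duration=2 expire=10 ...
...
npu_state=0x000000
no_ofld_reason:  not-established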
Feature / conditions | no_ofld_reason | configuration sample or diag command | status | Mantis #
session is dirty | dirty | Routing and/or config change while the session is in established state and no new revalidation packet seen yet | not verified | #381788
Session not in established state | not-established | TCP session is not in its established state (proto_state=01) | not verified | #387310
ESP acceleration not supported; protocol not suitable for offload | offload-denied | NP4 and SoC3 (NP4Lite) don't support ESP hardware acceleration by design (NP4 has special build #229874) | confirmed | #310606, #308902
Access from SSLVPN portal, local / Explicit proxy involved | | Session is accessed through an SSL portal (case 1); explicit proxy is used on the FortiGate (case 2) | not verified | #387629, #377926
Hw acceleration has been disabled on the policy | disable-by-policy | config system interface → edit <intf> → set auto-asic-offload disable | not verified | #386626
Session is inspected by ipsengine (signature, app control, ...) without nturbo acceleration possible | redir-to-ips | A flow based profile is applied and nturbo acceleration does not apply to the device | not verified | #377711
Device authentication (src-visibility) | mac-host-check | The session is inspected for source visibility; it may become hardware accelerated once the device type has been identified | | #355970
Offload is disabled because of a session helper | offload-denied | A session helper is involved. Seen with GTP-C traffic on FortiCarrier: GTP-U offload is disabled when logging is enabled on GTP-C or GTP-U | not verified | #378910
Comments:
● (1) sflow : can’t be accelerated because sflow requires periodic traffic sampling that can only be done in the kernel
“If the session flags contain any of the following (ignoring turbo mode IPS), the session will not be offloaded :
'redir' some kind of proxying,
'auth' (firewall) authentication
'srcvis' device detection
'ndr' IPS
'nb' IPS (block)
'nds' IPS
'ndri' IPS (interface-based)
'os' traffic shaping (but see below)
'rs' traffic shaping (but see below)
does the ingress and egress device support offload?
if traffic shaping is enabled, does the NPU support that specific type of traffic shaping? Some do and some (older ones) do not.
if traffic is locally terminated GRE, then no offload
if IPsec is involved, does the NPU support the chosen cipher suite?
There are also more transient reasons for offload failing, which usually only affect the first few packets:
NP chips
NP6
Form factors
The NP6 chip comes in 2 different form factors depending on its connectivity.
The first form is made of 4x 10G; it is tailored for high end units where the 10G ports are attached to an ISF.
The second, made of 3x 10G + 16x 1G, allows a more direct attachment of the FortiGate ports to the NP6 without a switch fabric. It is more common in midrange
units.
An NP6 is first of all a NIC: it generates interrupts when packets come in, so both form factors have the same bus connectivity to the CPU cores via a
dual PCIe bus, allowing a potential distribution to up to 256 CPU cores.
Integration
The public Fortinet Hardware Acceleration guide v4.1 contains all FortiGate platform architectures, so we won’t repeat them in this document.
Instead, we will focus on some representative platforms and depict their specificities. This chapter is split in two parts: integration with and without a switch
fabric.
● FortiGate-3700D
The FortiGate-3700D is a good example of a multi-NP6 platform that also includes the ‘low-latency’ feature.
It has the following block diagram:
● CPU mapping scheme
The Broadcom switch fabric chip (ISF) is attached to all FortiGate front ports (PHYs not shown in the diagram)
The FortiGate-3700D is composed of 4x NP6 in the 4x10G form factor, with each XAUI attached to the ISF
Each NP6 has 16 host queues, each one attached to a CPU block #1 core. CPU block #1 is reserved for NP interrupt processing
The second CPU block (#2) is free from any interface interrupts
● Port mapping scheme
● FortiGate3810D
(from ‘fortigate-hardware-acceleration-54.pdf’)
● FortiGate1500DT
This platform is similar to the classical 1500D but has 4x10G RJ45 copper ports
Integration without a switch fabric
This was reserved for medium range units, however we also see it on the 2500E with 4x NP6. Removing the switch fabric slightly
decreases the packet delay. Because the switch chip offers buffering and flow control features, traffic received on NP6 chips directly from ports,
without crossing the ISF, is likely to be more bursty and more susceptible to packet loss because of more tension on the NP6 queues.
Another consequence is the impossibility to create a LAG of ports attached to different NP6s.
For units with multiple NP6s without ISF, inter-NP6 port acceleration may have important limitations: if the NP6s are linked together with a 10G XAUI, there is a
potential oversubscription (FortiGate-900D); if they are not linked at all (FortiGate-2500E), hardware acceleration between them is not possible.
● Design limitations :
○ (no reference) More susceptible to packet loss with bursts (no buffering and flow control from an ISF chip)
○ #290597 Cross-NP6 link aggregation or redundant interfaces are not allowed
Note : data sheets of those platforms should have “small print” at the end about this. This is also mentioned in the official hardware guide
○ (no reference) No hardware acceleration possible if the NP6s are not interconnected (FortiGate-2500E)
No reference, but this is obvious from the design.
○ #300206 Packet loss on 1G ports directly attached to NP6. Mitigation: ECO with dedicated queue and shaping.
● FortiGate900D
● FortiGate1000D
● FortiGate2500E
Comments:
Combines the 2 NP6 form factors
Has a bypass module on 2 LC connectors where the fibers are directly connected (no SFP module needed)
No inter-NP6 interconnections
Mantis references:
● #375609 Merge FGT2000E/2500E to v5 trunk (schedule 5.6.0)
NPI with 5.4.0 GA, with special
● Other platforms without ISF but with one single NP6
These platforms don’t have the LAG restriction, because of the single NP6.
They still suffer from tension on the NP6 1G interfaces (SGMII interfaces for 1G, XAUI for 10G)
Mantis references:
○ #300206 Packet loss on 1G port directly attached to NP6. Mitigation ECO with dedicated queue and shaping
○ #389858 ICMP ping lost once traffic is offloaded to NP6 in FGT500D (same cause as #300206)
For other platforms, please refer to the official Hardware Guide section “FortiGate NP6 architecture”.
FortiGate300D
FortiGate400D
FortiGate500D
FortiGate600D
NP6 Performance figures
Improvements from NP4
Internal improvements:
● Full IRQ mapping on each NP6 port: no need to choose ports anymore to ensure maximum CPU availability. All NPs have the same IRQ/CPU
mapping, providing linear CPU usage across the load and the NPs
● No need for an extra Intel chip to provide NTurbo services: the NP6 is able to interrupt the CPU directly for user space processes like ipsengine.
● Hardware acceleration lag helper: unlike NP4, NP6 allows sessions to stay offloaded (to other NP6s) in case of LAG member loss. Traffic is still
redirected to the original NP6 port by the Broadcom switch after failover.
● Session entry purging to accommodate routing changes (any reference here?)
● Reversible hash in RX/TX queues: both directions of the same session are tied to the same CPU core
● 4-level priority queues inside NP6 (NP4 only has 2), mapped as follows:
○ Priority 3 (highest) : control plane traffic (ARP, OSPF, BGP, IKE, etc)
○ Priority 2 : data plane control packets (ICMP, TCP RST, etc)
○ Priority 1 : high priority data traffic
○ Priority 0 (lowest) : normal data traffic
● Session push packets follow the data packet path:
In NP4, a dedicated session management queue exists, with interrupts mapped to 1 CPU core:
session creation and packet data coming from 2 different sides
=> potential loss of sync between session creation and data
No CPU distribution on the command queue
In NP6, the session creation command precedes the data packet, following the same path:
improved data / command synchronization
session creation CPU load distributed on all NP6 cores
● IP fragments with multi-core distribution: all IP fragments received on NP6 are sent to the kernel. Unlike NP4, NP6 uses all RX CPU queues
to distribute IP fragments to the kernel (#401333)
From the inside
Functional blocks
Like other NPs, the NP6 is internally architected around different functional blocks. A functional block implements a set of functions organized around
a common mission.
● Interface group TX :
Deals with packets entering the NP6, either from ports or from the kernel.
● Switching group :
Session lookup and dispatching
● Service group :
Packet processing for special services
● Interface group RX :
Performs the required steps when packets leave the NP6, either to external interfaces or to the kernel via the Host interface
● global view
The following diagram shows a high level view of groups and functional block
● Detail of Interface group TX
All packets entering the NP6 first have to go through this block. This block has two parallel paths. One path (on the left side) is dedicated to packets
received from the ‘Host interface’; this corresponds to all packets arriving from the kernel via the PCIe bus. The other path (on the right side) is
dedicated to packets received on the XAUI interfaces of the NP6. There are several possible sources for the XAUIs : the ISF, a direct external
interface attachment, or another directly attached NP6. Depending on the source, different kinds of processing are required, with common missions : identify
the sender and translate the source into a LIF (Logical InterFace), sanity-check the packet format to protect the NP from fuzzing attacks, optionally apply
well-known anomaly checks, and finally register the packet inside the NP6 by copying its content to memory and creating a ‘Packet
Descriptor’ based on the L2, L3 and L4 packet headers (see Internal and external data transmission).
When a packet comes from the Host interface, it contains information about what processing is required in the NP6. This allows the NP6 to forward the
packet applying those requirements without extra processing work. There is 1 ITP and 1 IHP per XAUI (ITP0-ITP3, IHP0-IHP3) and 2 HTX (HTX0, HTX1),
one for each PCIe bus.
Details of functional groups : HTX, ITP, IHP
● Detail of Switching group
The switching group is the heart of the NP6; this is where received packets are hashed and the session matching lookup is done against the NP6 primary
and secondary tables (see ipv4 unicast session acceleration). If no session match is found, the packet is routed over to the kernel via the
Host interface. In case of a match, the processing of the packet depends on the ‘action’ field of the session: the packet may be dropped, may be
routed towards an external interface, may be routed towards a special service block for special processing, or may be sent to specific queues on
the Host interface corresponding to special handling like NTurbo.
The switching group is composed of 5 main functional blocks : ISW, FDB, SSE1, SSE2, OSW
The ISW (Inbound Switch) is the entry point. It has dedicated queues for each source and can prioritize packets on the queues. Packets from the
kernel containing predefined routing information may be sent to the Packet Forward Engine (FDB), acting as a fastpath dispatcher with little
processing required on the packet. The ISW also distributes packets to the two Session Search Engines based on a hash of the packet’s 5-tuple.
Each Session Search Engine (SSE1 and SSE2) is independent from the other; each one has access to a dedicated fast DDR3 RAM
where the session tables are stored. It hashes the Packet Descriptor received on its receiving PDQ to form a session key. The key is looked up in
the primary session table. If more than one match exists, the overflow table has to be checked until the session corresponding to the PDQ
details is found. When found, the action and the other information and flags of the session are retrieved, and the packet is sent to the OSW block.
The goal of the OSW (Outbound Switch) is to route the packet to the next required processing block, following the order received either directly from the
kernel (path via FDB) or from the SSE. The routing is based on the destination LIF. It also has the mission to perform the trunk-to-port
resolution, done when the destination LIF references a trunk instead of a port. In this case, a hash is computed using the programmed algorithm.
If no natting is done on the session, the algorithm tries to use the same interface for egress and ingress.
The OSW also contains the TPE (Traffic Policy Engine), which optionally gets involved prior to any switching function whenever traffic shaping or
accounting is requested for the session. The TPE manages indexed tables of shapers and counters. One session may have multiple indexes, for
instance a session counter and a per-ip shaper.
A “Loopback” path allows reinjecting packets from the OSW back to the ISW. This is used when npu-vlink interfaces are the target, to perform the
hardware accelerated vdom link function.
● Detail of Service group
The Service group regroups functional blocks performing packet transform actions like tunneling and translation, or further inspection functions like SYN
proxy. Each individual functional block is tied to its dedicated mission. For some of them, like IPsec encryption/decryption, the packet PDQ is not
enough for the job, and access to the packet payload in the memory buffer is required.
Note : in the case of the FortiGate-3700DX unit, the extra functions delivered by the TP2 FPGA, like GTP inspection, can be seen as
Service group functional blocks located outside the NP6.
● Detail of Interface Group RX
This is the final step for all packets leaving the NP6. This block is similar to the TX group but works in the other direction. It also has two parallel
paths. One path (on the left side) is dedicated to packets sent to the ‘Host interface’; this corresponds to all packets leaving towards the kernel
via the PCIe bus. The other path (on the right side) is dedicated to packets transmitted on the XAUI interfaces of the NP6. Depending on the destination,
different kinds of processing are required, with common missions : identify the destination and translate the destination LIF (Logical InterFace) into an
interface. The packet has to be recreated based on its updated PDQ and its payload in memory. Another mission is to deal with packet fragmentation in
case the outgoing interface MTU is smaller than the packet length. Protocol
translation (example IPv4 → IPv6) is also done here, and checksum
calculation as well. Finally, the packet is prepared to be sent out either on the Host
interface or on a XAUI. When an ISF is involved, packets need to be appended
with the correct ‘Core Tag’ required for switching in the ISF, to send the
packet on the wire corresponding to the egress interface of the FortiGate.
Egress Header Processing (EHP) recreates the packet with fragmentation and
checksum recalculation. There is one EHP per XAUI (EHP0-EHP3).
The Host Receive (HRX) block corresponds to the path towards the kernel via the
Host interface. It distributes the packet to the required host queue. The choice
of queue determines which CPU core processes the packet;
this is how CPU load is distributed. Some host queues are specific, like the
NTurbo queues. There are two HRX (HRX0, HRX1), one for each PCIe bus.
Traffic flow examples
This section provides some examples of typical packet flows passing through the NP6.
● Basic host receive path (left) and basic host transmit path (right)
The two directions of a flow between an external port and the kernel host interface take 2 different paths inside the NP6. The interface-to-kernel
direction goes through a Session Search Engine, while the kernel-to-external-port direction makes use of the FDB shortcut.
● Basic Firewall path
● IPSec inbound path (left) and IPSec outbound path (right)
A shortcut through the ‘FDB’ exists for the IPsec outbound path because the egress interface is part of the known information.
● Accelerated inter vdom path
For intervdom acceleration, the Loopback is used to reinject traffic back for a second round to process the second vdom lookup.
● Multicast with IPSec outbound
The inbound multicast packet (in yellow) reaches one of the session search engines.
Packets for the different destination egress interfaces are duplicated by the SSE. Each packet then lives its own life in the NP6, using the next
block appropriate for the interface it egresses on.
Configuration options impacting NP6
This command globally disables the hardware acceleration fastpath on the whole NP6, so all interfaces bound to that NP6 are affected.
It should not be commonly used; it may eventually be needed if the kernel shaping functions are required.
Warning : dangerous command, has problems with vlans (#372526 / #364448)
“We didn't expect users to disable fastpath at "config system np6". The CLI is only for debugging purposes. VLAN interfaces won't work when fastpath is disabled.”
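A hedged sketch of the per-NP6 knob this paragraph refers to:
config system np6
    edit "np6_0"
        set fastpath disable   # debugging only: disables acceleration for every port bound to np6_0
    next
end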
This is only applicable to the FortiGate-3700D for now, on 2 of the 4 NP6s available on the unit. This is the command to attach the FortiGate ports directly
to the NP6 without traversing the ISF. When in low-latency mode, no hardware acceleration or LAG is available with ports from other NP6s.
Benefit of low-latency : latency drops from 3.5 microseconds to 1.6 microseconds
This feature should not be needed in normal use. It is meant to be activated when a PBA leak is discovered, to periodically recover the blocked
memory. Under normal conditions, memory deallocation takes place normally when the packet is dropped in the NP6 or leaves the NP6.
It could be useful when synproxy is used, to clear attack sessions (to be confirmed)
When enabled, the lifetime of a session is limited (see the chapter below)
This timer seems to be the maximum TTL for a session in NP6. It defaults to 64 s. If a session remains in the NP6 without packets seen, the
entry is deleted after the defined number of seconds. The same timer also seems to be used for synproxy sessions (#272927)
depending on the session expiration timer, in a range between 1/2 and 4/5 of the expiration timer (#386626). For TCP, the established state timer is used.
Verified with the default session-ttl of 300 : the first update is after 150 s
● config system np6 > edit np6_x > set session-timeout-interval (default 40s)
If the method ‘set session-timeout-fixed enable’ is used, this defines the base timer for session updates (see random-range)
If the method ‘set session-timeout-fixed enable’ is used, this defines the random part added to the base timer
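A hedged sketch grouping the per-NP6 session maintenance knobs mentioned above (values are the defaults cited in this section; the random-range name is an assumption):
config system np6
    edit "np6_0"
        set garbage-session-collector enable  # only if a PBA leak / synproxy cleanup is needed
        set session-collector-interval 8      # seconds (see #272927)
        set session-timeout-fixed disable     # keep the randomized update interval
        set session-timeout-interval 40       # base session update timer (default 40s)
        set session-timeout-random-range 8    # assumed option: random part added to the base timer
    next
end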
This is the section where all NP6 anomaly checks for IPv4 traffic are configured (see fpanomaly)
This is the section where all NP6 anomaly checks for IPv6 traffic are configured
● config system npu > set dedicated-management-cpu enable (201257, 218083, 251776)
Reserves CPU0 for other processes: all interrupts originally scheduled on CPU0 are moved to CPU1 (on top of its own interrupts).
The goal is to avoid management slowdowns or management disconnections from FortiManager.
This was introduced in 5.0.5 for NP4 and applied later to NP6 in 5.0.10 and 5.2.1
The drawback of this command is a possible excessive CPU load on CPU1.
This option is available on the 3700D. It is made to increase the CPS by distributing the NP6 interrupts over 32 CPU cores, in lab conditions.
It is not recommended on production networks, where CPU power should be saved for other critical functions like HA, logging...
● renamed "np6-cps-optimization-mode" by #262981 (instead of np6_cpu_optimization_mode)
● removed from platforms having a dual CPU socket (like the 3700D), kept on 1200D/1500D as of 5.6.0 by #291819
● mantis to remove it in 5.4 as well (not done so far, may never be done) #399659
● other references : #305096, #300975, #301536
● Note : no impact from config system npu > set enc-offload-antireplay / dec-offload-antireplay / offload-ipsec-host
As a reminder, NP6 ignores the enc-offload-antireplay, dec-offload-antireplay and offload-ipsec-host settings from the ‘config system npu’ group.
Following bugs causing the stacking-up of NP6 forward entries (see #422746, #441532), monitoring capabilities were added via diag commands and SNMP
OIDs to monitor whether sessions are drifting in NP6. If so (which would mean a bug exists), a diag command to clear idle sessions has been added.
The variable drv-drift can have a negative or positive value. A negative value indicates that there are more deletions than insertions. The bug we had before
caused drv-drift to go negative because we deleted the session on the wrong NPU. When drv-drift has a negative value, there is a possibility that we failed
to delete a session from another NPU.
After the purge procedure removes idle sessions from the NPU, drv-drift will have a positive value because the value of entot decreased after purging.
Please be aware that drv-drift can have a positive value in the multicast case because a single session deletion removes all the sessions in the same
multicast chain.
● diagnose npu np6 sse-drift-summary : shows a summary of drv-drift for all the NP6 chips in the system, and calculates the sum of drv-drift.
Normally, the sum is 0.
NPU      drv-drift
np6_0    0
np6_1    0
Sum      0
The command purges idle sessions from NP6_<dev_id>. The [time] argument is optional.
The default purging time is 300 seconds. It takes roughly 2-4 seconds for the NP6 to
walk through the whole session table.
Example:
The procedure may take up to 10 Secs.
Please wait until the procedure is finished. Stopping in the middle may cause system malfunctioning.
Starting to clean up idle sessions in NP6_0.
Purging progress sse0/sse1:57470975/57470974, 0 idle sessions were purged.
NP6_0 session cleanup finished in 11.000000 Seconds.
Total session purged: 0
● diagnose npu np6 sse-stats <dev_id> : Addition of a drv-drift counter in the sse-stats
Added fgNPU counters to the system MIBs:
FORTINET-FORTIGATE-MIB::fgNPUNumber.0 = INTEGER: 2
FORTINET-FORTIGATE-MIB::fgNPUName.0 = STRING: NP6
FORTINET-FORTIGATE-MIB::fgNPUDrvDriftSum.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUIndex.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUIndex.1 = INTEGER: 1
FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.0 = Gauge32: 33554432
FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.1 = Gauge32: 33554432
FORTINET-FORTIGATE-MIB::fgNPUSessionCount.0 = Gauge32: 0
FORTINET-FORTIGATE-MIB::fgNPUSessionCount.1 = Gauge32: 0
FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.1 = INTEGER: 0
NP6 IPsec outoforder and subengine settings
Problem description : NP6 is likely to create out-of-order packets when IPsec is done in NP6. This is caused by the distribution method across the 8 IPsec
sub-engines.
Cause : packets from the same IPsec session may be processed by different sub-engines in parallel. A big packet followed by a small packet distributed to two
different sub-engines will likely be sent out in the order small-then-big, because the time needed to process the small packet is lower.
It is not possible to change the distribution algorithm so that packets from the same SA are processed by the same sub-engine.
This problem exists for both encryption (distribution of clear text packets) and decryption (distribution of ESP packets)
Distribution across IPsec engines inside NP6 :
The NP6 IPsec module is made of 2 banks, each bank made of 8 sub-engines.
SAs are first distributed across banks. For each packet of an SA, a sub-engine is selected. The selection rule is to first try the first
sub-engine; if that engine is busy, the next engine is tried, and so on. It is thus expected to see the first engine busier than the following ones.
● Engine status
diag npu np6 register <x> | grep engine_status provides the engine (aka bank) and sub-engine status (idle/busy).
Comments :
● The 2 last lines are not used; the first line is for Bank 0, the second line is for Bank 1.
● The last byte (2 hex digits) represents the sub-engine status :
FF (8 bits: 1111 1111) means all engines are IDLE (should be the case when traffic stops)
00 (8 bits: 0000 0000) means all engines are busy
● The command is a snapshot, so you may need to run it multiple times to capture sub-engine 1 in the idle state
Another diag command has been added to provide the engine status without using the long register dump:
Mitigation :
The only possible mitigation is to limit the number of sub-engines used for encryption and decryption. Doing this guarantees that all packets of a given SA are
serialized, processed one after the other.
Zero out-of-order packets can be achieved in the decryption direction if a single sub-engine is used for decryption, and zero out-of-order packets can be achieved in
encryption if a single sub-engine is used for encryption. The 2 banks can still be used simultaneously, as they process different SAs.
The selection of sub-engines has been made configurable. The command applies globally to all NP6s on the unit.
The mask to supply is a 2-hex-digit (8-bit) value; for performance reasons, it is recommended to use different sub-engines for encryption and decryption if a single
sub-engine is targeted.
A recommended configuration to guarantee zero out-of-order packets in both the encryption and decryption directions is :
config system npu
    set ipsec-dec-subengine-mask 0x01 # 0x01 = 00000001 ==> sub-engine 1 only enabled for decryption
    set ipsec-enc-subengine-mask 0x10 # 0x10 = 00010000 ==> sub-engine 5 only enabled for encryption
end
Performance impact :
CRT lab testing with a FGT-1500D has shown that each NP6 sub-engine processes roughly 2 Gbps of traffic (with anti-replay disabled). Considering the 2 banks,
with multiple SAs involved, a zero out-of-order configuration would deliver 4 Gbps max per direction (tests show between 3.9 Gbps and 5.2 Gbps depending on packet
size).
SA distribution on the banks for inbound and outbound :
This command output details the distribution of inbound and outbound SAs on the 2 banks:
IB0 : inbound SAs on bank 0 (here 127 SAs installed)
IB1 : inbound SAs on bank 1 (here 127 SAs installed)
OB0 : outbound SAs on bank 0
OB1 : outbound SAs on bank 1
Note : the distribution algorithm of SAs across the 2 banks is so far unknown, but it seems that traffic for one SA only goes to the same bank, and therefore the
same sub-engine if masking is used. This guarantees 0 out-of-order (we could not produce OOO in the lab with this).
References :
● #403883
○ details on the ipsec engine status: 2 banks of 8 sub-engines. Status is represented by 2 hex digits and each bit tells whether an engine is idle (1) or busy (0)
○ sub-engine selection : first and lowest available engine. That means it chooses the available engine from Engine_0 to Engine_7, so you may
frequently see something like: 0xF8, 0xFC, 0xF2
○ ‘PDQ_OSW_IPTO’ is the counter from the Outbound Switch to IP Tunnel Outbound
NP6 limitations and bugs
There are 2 types of problems : issues in the NP6 hardware are generally not fixable; sometimes a software based solution or mitigation exists, but it
generally comes with performance consequences, and applying it may or may not require a specific configuration. Other bugs are at the NP6 driver
level and can be fixed without impact.
Hardware limitations (hardware bugs that can’t be fixed by software, or not without an important impact on features or performance) :
Description : performance of ipsec with anti-replay on NP6 is bad. The greater the number of tunnels, the bigger the impact. It was originally observed on a
FortiGate-1500D with 500 tunnels used for LTE.
Cause : hardware bug : NP6 anti-replay cache corruption.
Mitigations :
○ special image available to improve the performance, but still with high degradation (top3 #275195).
○ top3 #373505 came with a big improvement that may be merged with #380600, without a CLI option
○ performance impact of the fix (24G → 6G, so divided by 4)
○ no specific configuration needed for the mitigation.
Comment : reducing the number of ipsec sub-engines used to encrypt or decrypt is NOT a workaround for the anti-replay issue.
Other references with valuable information : #437462 Add IPSec Anti-Replay workaround based on the 5.4.4 3700DX branch
● #370586 ipsec out-of-order caused by distribution of traffic for same SA on multiple engine/sub-engines
Description : Packets from the same IPsec session may be processed by different sub-engines in parallel. A big packet followed by a small packet,
distributed to two different sub-engines, will likely be sent out in the order small then big, because the time needed to process the small packet is lower.
Cause : hardware bug
Mitigations :
○ special image available (fg_50_Orange_LTE_269247/build_tag_8942) to improve the performance by defining/limiting the number of encryption and
decryption engines (CLI change)
○ #370486 Add CLI commands to configure limited IPSEC engine on NP6
=> requires a user-specific configuration (generally one sub-engine for encryption, one sub-engine for decryption)
=> config system npu ; set ipsec-dec-subengine-mask <engine_mask_hex> ; set ipsec-enc-subengine-mask <engine_mask_hex> ; end
=> implemented in 5.4.4
● #383624 Sending multicast traffic across an NP6 npu-vlink may cause interfaces to stop sending/receiving
Description : multicast traffic on an npu-vlink caused a PBA leak; at some point, traffic stops
Cause : hardware bug
Mitigation : the fix in software disables multicast acceleration on the npu-vlink
⇒ fixed in 5.6.0 and 5.4.6; 5.2.9 covered by 282472
Description : The maximum frame size NP6 can transmit is 15360 bytes. Packets get dropped if their size is bigger than this value.
Cause : Hardware limit
● #416102 Traffic over IPsec VPN getting dropped after 2 pings when it is getting offloaded to NPU
Description : Traffic may be dropped when a tunnel is NPU-offloaded after a routing revalidation. This is mostly the case when the tunnel is bound to a
loopback interface advertised via a dynamic routing protocol.
Cause : seems to be a mishandling of revalidation cases.
The new behavior to expect is described in 415155 : no NPU offload for IPsec when the tunnel is bound to a loopback.
As of today (170719) there is no plan to fix it; the solution seems to be to avoid acceleration in this case.
● #396027 Single flow exceeds 10GB causes all BGP peers to drop randomly
Description : Though it is understood that a 40G interface is made of 4x10G paths, saturating a single 10G path causes degradation on the other 10G paths.
Cause : The ISF is using shared buffers for all 10G paths; the buffers get used up if one path is saturated
Comment : counter sw_in_drop_pkts in ‘diag hard dev nic’ increases.
Fix : none foreseen
● #392436 Bad throughput using 10G interfaces [1G / 10G port mix, devices without ISF like FGT600D]
Description : Due to limited NP6 internal packet buffer, offloaded packets from a 10G interface to a 1G interface can be dropped
Fix/mitigation : 5.6.1 only; adds a new CLI command to control the 10G/1G flow (for units without an ISF)
Comment : units with an ISF don't have the problem, thanks to the ISF packet buffers
New CLI command :
config system npu
    set host-shortcut-mode {bi-directional | host-shortcut}
end
○ bi-directional : offload TCP and IP tunnel sessions in both directions between 10G and 1G interfaces (normal operation)
○ host-shortcut : only offload TCP and IP tunnel sessions received by 1G interfaces; select this if packets are dropped for offloaded
traffic going from 10G to 1G interfaces
Mitigation : The new feature in this top3 throttles input to the NPU, forcing the ISF buffers to be used for ingress traffic. EHP drops are seen when
traffic has short bursts, because the NP6 has short queues; the idea is to take benefit of the ISF queues, which are bigger than the NP6 ones.
Comments :
● N-Turbo throttling : N-Turbo is another source of incoming packets for the NP6. It does not use the same path as the regular 'slow-path' (except for
session establishment, log reporting, timer sync, etc.), so its ingress traffic on the NP6 is not impacted by the gtse quota. N-Turbo, which does not
use the kernel path, has its own throttling : a fixed shaper at around 6G has been programmed in the kernel to limit traffic towards the NP6 and allow the use
of the kernel buffers, bigger than the NP6 ones, to avoid dropping packets.
● drops on the ISF can be measured by counters : sw_in_drop_pkts, sw_out_drop_pkts, sw_np_in_drop_pkts, sw_np_out_drop_pkts from ‘diag hard
dev nic <port>’
● Description : much higher EHP drops were observed when a LAG is used (even with a single interface) compared to the same single interface with no LAG.
Observed with 5.4.6 and 5.6.3 (and probably with all other versions).
Seen on both service-module platforms (38xxD, 39xxE, 5001E, 6000F and 7000E) and non-service-module platforms (1200D → 3700D)
● HPE protection (SLBC clusters collapse under DDoS attacks with fragmented packets)
source : #363398
Refer to the chapter HPE Protection
Description : the kernel CPU could be impacted in case of a slow-path traffic attack.
Mitigation : use the NP6 shapers to limit DDoS attack traffic towards the kernel.
Shapers can be defined for tcp, udp, icmp, sctp, esp, ipfrag and arp on traffic egressing from the NP6 towards the kernel via the PCIe host interface; a configuration sketch follows below.
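As an illustration, a minimal sketch of such a shaper configuration, assuming the per-chip HPE CLI (the option names and pps values below are indicative and may differ per build) :

config system np6
    edit np6_0
        config hpe
            set tcpsyn-max 600000      # assumed pps ceiling for TCP SYN towards the host
            set icmp-max 200000        # assumed pps ceiling for ICMP towards the host
            set enable-shaper enable   # HPE shapers are not active until enabled
        end
    next
end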
● N-Turbo throttling
Description : A hard-coded shaper was added for traffic coming from N-Turbo to the NP6, to avoid bursts that would congest the NP6 egress interfaces.
When this shaper drops traffic, a counter in the N-Turbo stats (fnsysctl command) increases.
sources :
#257607 ECO for 1500D : apply a 7G gtse shaper and change the np6 tx spmask to 1 to slow down the TX speed
#251104 Two sets of traffic shaper are implemented for FGT 300D/500D and FGT 1500D/3700D [Actually all other platforms]
● 300D/500D : 1G shaper
● Others : about 6G shaper
● #412664 (and top3 #413388) DSCP EF (expedited forwarding) marked traffic is not prioritized in NP
Fixed bugs (bugs with a fix that has no significant impact on features or performance)
● top3 #422746 NP6 exhibited 90% failure rate for SSE insert-success.
● #284694 High CPU, NP6 counter drops and traffic loss
Description : multi-NP6 platforms with IPsec and/or npu-vlinks may stack forward-entries in the NP6
Symptoms : sessions supposedly accelerated are not (CPU increases); when limits are reached, a traffic outage is visible
Cause : the NP6 driver deletes forward sessions on the wrong NP6. This causes a session deletion failure on the NP6 where the session is not installed, and also
causes stacking of sessions on the NP6 where the forward-entry was added (because it is never removed)
Fix : 5.2.5, 5.4.0, 5.6.0
Monitoring additions : #441532 for GA merge : diag commands and SNMP monitoring for specific NPU counters have been added, see section 'NP6
monitoring additions for drift sessions'
Description : when receiving an over-padded packet, either from a clear-text interface to encrypt or from a cipher-text interface to decrypt, an IPsec sub-engine
may enter a locked-up state. Only a reboot can help to recover.
Cause : no packet sanity check is performed by the ipsec engine
Fix for the clear-text side : force packets to first loop through the NAT module, which has a sanity check
=> 5.6.1 (B1458)
Fix for the cipher-text side : force packets to first loop through the IPT module (IP tunneling)
=> special image top3 #403883 B9612 with CLI command "config system npu → set strip-esp-padding enable"
=> fixed in GA 5.6.1 (with CLI change) #416950
Comments :
more information in top3 #403883
sub-engine lockup detection : use "fnsysctl cat /proc/net/np6_0/ipsec-stats" and look at the 'ipsec engines idle status' : during low traffic, we expect to see the
counters showing the 'FF' values (all engines idle).
- Though it is a hardware problem, a safe fix exists without big impact on performance.
● #386626 kernel sessions expire for some hardware-accelerated traffic when 'virtual-wire' with a VLAN configuration is used
● top3 #255526 / Mantis #255349 IPsec multicast acceleration problem : duplicated forwarding and leftover sessions
Description : the current multicast offloading code has issues with multicast-over-IPsec, which cause duplicated multicast forwarding and leftover
sessions inside the NP6.
Fix : 5.4.0
Understanding EHP drops
Packets may be dropped when they egress from an NP6 XAUI. This happens at the end of the packet processing chain inside the NP6.
Drops are caused by the merging of different packet origins that all have to egress on the same XAUI.
In case of packet bursts from one or multiple origins, the NP6 egress queue, which is very limited, fills up and causes drops.
High bandwidth traffic is not necessarily required to produce a burst : the conjunction of traffic microbursts from different sources can be enough.
NP6 buffers are small and do not allow the storage of a lot of packets, resulting in packet drops when they are full.
The following diagram is a good summary of EHP drops. Bubbles represent packets from the different sources filling up the EHP buffer 'bucket' and causing
drops.
NP6 shaping protection summary
NTurbo NP6 IRQ mapping
From the NP6 standpoint, packets eligible for the ipsengine kernel fastpath are pushed to specific queues on the host interface.
In the reverse direction, the NP6 also receives packets from the ipsengines on specific host logical interfaces (LIF).
The interrupt cost for those queues is handled by different CPU cores than the ones used for the regular NP6-to-kernel path on the host interface, for a
better load-balancing across the cores. Different platforms may have different CPU allocations.
This is the NP6 contribution to the N-Turbo mechanics; the remaining part takes place in the kernel.
N-Turbo handles two different types of packets : control packets and data packets.
● Control packets
Control packets correspond to the first packets of new sessions. At this point, the session is also unknown to the ipsengine processes, because the
kernel has not yet notified them of a new session to inspect and with which profile. Those packets are sent by the NP6 to the kernel using the 'regular' host
path channels. The interrupts raised and CPU cores used are the usual ones; they don't benefit from any special acceleration compared to other
packets.
(3) Session created in kernel; the kernel chooses an ipsengine based on the N-Turbo scheduler info.
(4) Kernel sends to NP6 with packed info (IPS view-id, DNAT, MTU)
(5) Packet is sent to the corresponding load-balancer via the specific N-Turbo host interface
(6) Load-balancer stores the packet in the ipsengine RX queue and sends a notification interrupt to
the ipsengine
(7) ipsengine receives the packet from its N-Turbo RX queue, looks at its N-Turbo table and finds
this is a new session. It adds a new entry with the received out-of-band data, optionally refragments
here and processes with IPS with action=pass/block/shaping; fragments if needed. If application
control is enabled and the app is identified, the ipsengine notifies the kernel to update the session
(9) Load-balancer forwards the packet to the NP6 HIF TX queue; the NP6 forwards to the ISF
● Data packets
Data packets correspond to packets from already-known sessions. At this point, the session is known to the ipsengine processes. Those packets
are sent by the NP6 to the kernel N-Turbo using dedicated host channels. The interrupts raised and CPU cores used are different ones.
These packets benefit from the N-Turbo IPS acceleration in the kernel.
(1) NP6 receives the packet from the ISF, a forwarding entry is found, the ipsengine process selection
and profiles are known (see note *)
(2) NP6 OSW sends to N-Turbo via the N-Turbo specific host interface, passing by its loopback
(5) ipsengine receives the packet from its N-Turbo RX queue, looks at its N-Turbo table and finds an
existing session. The corresponding out-of-band info is retrieved from the N-Turbo session. Optionally
refragment here and process with IPS with action=pass/block/shaping; fragment if needed. If
application control is enabled and the app is identified, the ipsengine notifies the kernel to update the session
(6)(7) no change
Note : before 5.4, the IPS view-id was encoded in the packet VLAN/DMAC. There is no more view-id in 5.4
● N-Turbo out-of-band information in the packet
● inside the packet VLAN or VLAN+DMAC fields (in NAT mode) from NP6 to N-Turbo :
Note : each ipsengine maintains a local N-Turbo session list that stores the info so that it can be found for fastpath data packets.
This mechanism may have changed in 5.4 with the removal of the IPS view-id, which was initially a requirement for SP2/SP3 IPS hardware acceleration;
those are no longer supported in 5.4.
● debug commands related to N-Turbo IPS acceleration
The following command is given for reference only, as it is not hardware-oriented.
The IRQ mapping dump from "diagnose hardware sysinfo interrupts" shows the different types of IRQs and the mapping used in N-Turbo
● Limitations :
Note : An old reference #178521 refers to 'set hardware-accel-mode', which seems to have been changed to 'np-accel-mode' later.
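For reference, the current form of the command (a sketch; available values differ across releases) :

config ips global
    set np-accel-mode basic    # 'basic' (default) allows N-Turbo IPS acceleration, 'none' disables it
end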
● Example of CPU mapping for a FortiGate-1500D IPS N-Turbo
FortiGate-1500D : 2x NP6, 2x N-Turbo, each N-Turbo maps 5 ipsengines (10 ipsengines), 12 CPU cores
This mapping is extracted as follows (a consolidated sketch follows after this list) :
retrieve the interrupt names and corresponding IDs from "diag hard sys interrupt"
find for each interrupt the CPU id mapping from "diag system cpuset interrupt <irq>" (see IRQ Mapping)
find for each N-Turbo the mapped ipsengine PIDs from "fnsysctl cat /proc/nturbo/<NTurbo_id>/drv"
for each ipsengine, find the allowed CPUs from the "Cpus_allowed_list" entry in "fnsysctl cat /proc/<pid>/status"
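Consolidated, the extraction for one IRQ and one N-Turbo looks like this (a sketch; IRQ 75 and N-Turbo 0 are taken from the table below) :

    diag hardware sysinfo interrupts       # interrupt names and IRQ numbers
    diag system cpuset interrupt 75        # CPU core(s) mapped to IRQ 75 (np6_0-tx-rx0)
    fnsysctl cat /proc/nturbo/0/drv        # ipsengine PIDs attached to N-Turbo 0
    fnsysctl cat /proc/<pid>/status        # 'Cpus_allowed_list' of that ipsengine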
Ref Name IRQ CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11
[1] np6_0txrx0 75 X
[2] np6_0txrx1 76 X
[3] np6_0txrx2 77 X
[4] np6_0txrx3 86 X
[5] np6_0txrx4 87 X
[6] np6_0txrx5 88 X
[7] np6_1txrx0 97 X
[8] np6_1txrx1 98 X
[9] np6_1txrx2 99 X
[12] np6_1txrx5 105 X
[29] ipsengine 3 (NTurbo 0) X
← suggestion : a diagram like this to show each interrupt, with the reference id from the table
above on the lines. Use colors on the interrupt names in the table with the same colors in the
diagram.
● IPS/NTurbo and IPsec improvement (as of 5.6.1)
Source : #398960
Feature : N-Turbo is used for the IPsec+IPS case. The IPsec SA info is passed to N-Turbo as part of the VTAG for the control packet and will be used for the
xmit.
Note : if the packets need to go through an IPsec interface, the traffic is always offloaded to N-Turbo. But in the case where the SA has not been installed on the
NP6, because of a hardware limitation or SA offload being disabled, the packets are sent out through a raw socket by IPS instead of N-Turbo, since software
encryption is needed in this case.
Diag commands and counters
● diag npu np6 fastpath <enable*|disable> <np6_id>
A diag command to disable all hardware acceleration on the given NP6 id. Upon a reboot, this setting goes back to its default 'enable' value.
For troubleshooting purposes, when a particular session needs kernel tracing, it is recommended to apply a specific, dedicated policy to the traffic to trace, with the
option "set auto-asic-offload disable". This approach is safer and avoids a potentially huge CPU impact; a sketch follows below.
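A minimal sketch of that per-policy approach (the policy id 42 is hypothetical) :

config firewall policy
    edit 42                              # dedicated policy matching only the traffic to trace
        set auto-asic-offload disable    # sessions from this policy stay in the kernel
    next
end

● diag npu np6 dce (dce-all) <np6_id>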
Dumps the Drop Counter Engine counters for the requested np6 id.
Different types of counters are printed. The ones with 'PDQ' in the name refer to the number of packets dropped because of a full packet descriptor queue. The
name of the queue generally refers to the functional block pushing the packet and the one receiving the packet.
Each NP6 XAUI is linked to a total of 4 blocks (ITP, IHP, ETP, EHP), so you would see references such as PDQ_OSW_EHP0 and PDQ_OSW_EHP1, which are
similar packet drop counters, all linked to OSW but connected to a different XAUI (EHP0 and EHP1).
Examples :
PDQ_SSE0_SSE1 : packets sent from SSE0 to SSE1 using the loopback (npu-vlink)
PDQ_OSW_HRX0 : packets dropped between OSW and HRX0 because of a full PDQ
List of functional modules referred in NP6 drop counters
DCE TABLE 0 : HRX drops
From diagnose npu np6 hrx-drop-all <chip_id>
VHIF_TX0_DROP ~ VHIF_TX127_DROP    0x0 ~ 0x7f   Per virtual host transmit PDQ (to ISW) drop, 128 TX queues in total; generally
                                                shows that the forwarding path does not have enough processing power
VHIF_RX0_DROP ~ VHIF_RX127_DROP    0x80 ~ 0xff  Per virtual host receive PDQ (from OSW) drop, 128 RX queues in total; generally
                                                means that the host does not have enough processing power to handle all incoming
                                                packets
DCE TABLE 1 : Anomaly drops
Refer to the per-type APS drop counter table for the individual counter meaning of each group.
                                   0x0 ~ 0x1f   Per type packet anomaly drop in IHP0
                                   0x20 ~ 0x3f  Per type packet anomaly drop in IHP1
                                   0x40 ~ 0x5f  Per type packet anomaly drop in IHP0 (same ???)
Name Index Description
DCE2_IDX_DROP_MACFIL_BASE0 0x00 Destination MAC mismatch drop for packet from XAUI0
DCE2_IDX_DROP_MACFIL_BASE1 0x01 Destination MAC mismatch drop for packet from XAUI1
DCE2_IDX_DROP_MACFIL_BASE2 0x02 Destination MAC mismatch drop for packet from XAUI2
DCE2_IDX_DROP_MACFIL_BASE3 0x03 Destination MAC mismatch drop for packet from XAUI3
DCE2_IDX_DROP_MACFIL_BASE4 0x04 Destination MAC mismatch drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_MACFIL_BASE5 0x05 Destination MAC mismatch drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_MACFIL_BASE6 0x06 Destination MAC mismatch drop for packet from IP tunnel inbound
DCE2_IDX_DROP_MACFIL_BASE7 0x07 Destination MAC mismatch drop for packet from IP tunnel outbound
DCE2_IDX_DROP_MACFIL_BASE8 0x08 Destination MAC mismatch drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_MACFIL_BASE9 0x09 Destination MAC mismatch drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_MACFIL_BASE10 0x0a Destination MAC mismatch drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_MACFIL_BASE11 0x0b Destination MAC mismatch drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_MACFIL_BASE12 0x0c Destination MAC mismatch drop for packet from host transmit HTX 0
DCE2_IDX_DROP_MACFIL_BASE13 0x0d Destination MAC mismatch drop for packet from host transmit HTX 1
DCE2_IDX_DROP_MACFIL_BASE14 0x0e Destination MAC mismatch drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_MACFIL_BASE15 0x0f Destination MAC mismatch drop for packet from loopback interface
DCE2_IDX_DROP_ISW_L2ACT_TPRT0 0x10 Target interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_TPRT1 0x11 Target interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_TPRT2 0x12 Target interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_TPRT3 0x13 Target interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_TPRT4 0x14 Target interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT5 0x15 Target interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT6 0x16 Target interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT7 0x17 Target interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT8 0x18 Target interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT9 0x19 Target interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT10 0x1a Target interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT11 0x1b Target interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_TPRT12 0x1c Target interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_TPRT13 0x1d Target interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_TPRT14 0x1e Target interface action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_TPRT15 0x1f Target interface action drop for packet from loopback interface
DCE2_IDX_DROP_ISW_L2ACT_ETHR0 0x20 L2 Ethertype action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_ETHR1 0x21 L2 Ethertype action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_ETHR2 0x22 L2 Ethertype action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_ETHR3 0x23 L2 Ethertype action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_ETHR4 0x24 L2 Ethertype action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR5 0x25 L2 Ethertype action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR6 0x26 L2 Ethertype action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR7 0x27 L2 Ethertype action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR8 0x28 L2 Ethertype action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR9 0x29 L2 Ethertype action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR10 0x2a L2 Ethertype action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR11 0x2b L2 Ethertype action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_ETHR12 0x2c L2 Ethertype action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_ETHR13 0x2d L2 Ethertype action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_ETHR14 0x2e L2 Ethertype action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_ETHR15 0x2f L2 Ethertype action drop for packet from loopback interface
DCE2_IDX_DROP_ISW_L2ACT_SVIF0 0x30 Source virtual interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_SVIF1 0x31 Source virtual interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_SVIF2 0x32 Source virtual interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_SVIF3 0x33 Source virtual interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_SVIF4 0x34 Source virtual interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF5 0x35 Source virtual interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF6 0x36 Source virtual interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF7 0x37 Source virtual interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF8 0x38 Source virtual interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF9 0x39 Source virtual interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF10 0x3a Source virtual interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF11 0x3b Source virtual interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_SVIF12 0x3c Source virtual interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_SVIF13 0x3d Source virtual interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_SVIF14 0x3e Source virtual interface action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_SVIF15 0x3f Source virtual interface action drop for packet from loopback interface
DCE2_IDX_DROP_ISW_L2ACT_SPRT0 0x40 Source interface action drop for packet from XAUI0
DCE2_IDX_DROP_ISW_L2ACT_SPRT1 0x41 Source interface action drop for packet from XAUI1
DCE2_IDX_DROP_ISW_L2ACT_SPRT2 0x42 Source interface action drop for packet from XAUI2
DCE2_IDX_DROP_ISW_L2ACT_SPRT3 0x43 Source interface action drop for packet from XAUI3
DCE2_IDX_DROP_ISW_L2ACT_SPRT4 0x44 Source interface action drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT5 0x45 Source interface action drop for packet from CAPWAP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT6 0x46 Source interface action drop for packet from IP tunnel inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT7 0x47 Source interface action drop for packet from IP tunnel outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT8 0x48 Source interface action drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT9 0x49 Source interface action drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT10 0x4a Source interface action drop for packet from IPSec engine 0 outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT11 0x4b Source interface action drop for packet from IPSec engine 1 outbound
DCE2_IDX_DROP_ISW_L2ACT_SPRT12 0x4c Source interface action drop for packet from host transmit HTX 0
DCE2_IDX_DROP_ISW_L2ACT_SPRT13 0x4d Source interface action drop for packet from host transmit HTX 1
DCE2_IDX_DROP_ISW_L2ACT_SPRT14 0x4e Source interface action drop for packet from SYN/DNS proxy
DCE2_IDX_DROP_ISW_L2ACT_SPRT15 0x4f Source interface action drop for packet from loopback interface
DCE2_IDX_DROP_APS_IHP0 0x50 Packet anomaly check drop for packet from XAUI0
DCE2_IDX_DROP_APS_IHP1 0x51 Packet anomaly check drop for packet from XAUI1
DCE2_IDX_DROP_APS_IHP2 0x52 Packet anomaly check drop for packet from XAUI2
DCE2_IDX_DROP_APS_IHP3 0x53 Packet anomaly check drop for packet from XAUI3
DCE2_IDX_DROP_APS_XHP0 0x54 Packet anomaly check drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_APS_XHP1 0x55 Packet anomaly check drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_APS_CWI 0x56 Packet anomaly check drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_APS_IPTI 0x57 Packet anomaly check drop for packet from IP tunnel inbound
DCE2_IDX_DROP_APS_HTX0 0x58 Packet anomaly check drop for packet from host transmit HTX 0
DCE2_IDX_DROP_APS_HTX1 0x59 Packet anomaly check drop for packet from host transmit HTX 1
DCE2_IDX_DROP_IHP0_PKTCHK 0x5a Packet sanity check drop for packet from XAUI0
DCE2_IDX_DROP_IHP1_PKTCHK 0x5b Packet sanity check drop for packet from XAUI1
DCE2_IDX_DROP_IHP2_PKTCHK 0x5c Packet sanity check drop for packet from XAUI2
DCE2_IDX_DROP_IHP3_PKTCHK 0x5d Packet sanity check drop for packet from XAUI3
DCE2_IDX_DROP_XHP0_PKTCHK 0x5e Packet sanity check drop for packet from IPSec engine 0 inbound
DCE2_IDX_DROP_XHP1_PKTCHK 0x5f Packet sanity check drop for packet from IPSec engine 1 inbound
DCE2_IDX_DROP_CWI_PKTCHK 0x60 Packet sanity check drop for packet from CAPWAP tunnel inbound
DCE2_IDX_DROP_IPTI_PKTCHK 0x61 Packet sanity check drop for packet from IP tunnel inbound
DCE2_IDX_DROP_HTX0_PKTCHK 0x62 Packet sanity check drop for packet from host transmit HTX 0
DCE2_IDX_DROP_HTX1_PKTCHK 0x63 Packet sanity check drop for packet from host transmit HTX 1
DCE2_IDX_DROP_SSE0_SHAPER 0x64 SSE engine 0 session shaper packet drop
DCE2_IDX_DROP_SSE0_SESSION 0x65 SSE engine 0 session action dictated drop
DCE2_IDX_DROP_SSE0_TTL 0x66 SSE engine 0 IPv4 TTL or IPv6 Hop Limit check failure drop
DCE2_IDX_DROP_SSE0_MTU 0x67 SSE engine 0 MTU check failure drop
DCE2_IDX_DROP_SSE0_PROXY 0x68 SSE engine 0 SYN proxy temporary session triggered drop due to TCP SEQ mismatch
or the packet not being TCP ACK only
DCE2_IDX_DROP_SSE0_MCAST 0x69 SSE engine 0 forwarded packet count by multicast session
(Note: this is NOT a drop counter and for debugging only)
DCE2_IDX_DROP_SSE1_SHAPER 0x6a SSE engine 1 session shaper packet drop
DCE2_IDX_DROP_SSE1_SESSION 0x6b SSE engine 1 session action dictated drop
DCE2_IDX_DROP_SSE1_TTL 0x6c SSE engine 1 IPv4 TTL or IPv6 Hop Limit check failure drop
DCE2_IDX_DROP_SSE1_MTU 0x6d SSE engine 1 MTU check failure drop
DCE2_IDX_DROP_SSE1_PROXY 0x6e SSE engine 1 SYN proxy temporary session triggered drop due to TCP SEQ mismatch
or the packet not being TCP ACK only
DCE2_IDX_DROP_SSE1_MCAST 0x6f SSE engine 1 forwarded packet count by multicast session
(Note: this is NOT a drop counter and for debugging only)
DCE2_IDX_DROP_CWI_HDRCHK 0x70 CAPWAP inbound engine packet header check failure drop
(CAPWAP header and internal 802.3 header)
DCE2_IDX_DROP_CWI_RMACMIS 0x71 CAPWAP inbound engine packet inner frame radio MAC lookup failure drop
DCE2_IDX_DROP_CWI_SMACMIS 0x72 CAPWAP inbound engine packet inner frame source MAC lookup failure drop
DCE2_IDX_DROP_CWI_DMACMIS 0x73 CAPWAP inbound engine packet inner frame destination MAC lookup failure drop
DCE2_IDX_DROP_CWI_DMAC 0x74 CAPWAP inbound engine packet inner frame unicast destination MAC configured drop
DCE2_IDX_DROP_CWI_BDMIS 0x75 CAPWAP inbound engine packet inner frame broadcast domain lookup failure drop
DCE2_IDX_DROP_CWI_BC 0x76 CAPWAP inbound engine packet inner frame broadcast destination MAC configured drop
(per broadcast domain)
DCE2_IDX_DROP_CWI_MC 0x77 CAPWAP inbound engine packet inner frame multicast destination MAC configured drop
(per broadcast domain)
DCE2_IDX_DROP_CWI_ETHER 0x78 CAPWAP inbound engine packet inner frame Ethertype configured drop
DCE2_IDX_DROP_CWO_DMACMIS 0x79 CAPWAP outbound engine inner frame destination MAC lookup failure drop
DCE2_IDX_DROP_CWO_DMAC 0x7a CAPWAP outbound engine inner frame destination MAC configured drop
DCE2_IDX_DROP_CWO_BDMIS 0x7b CAPWAP outbound engine packet inner frame broadcast domain lookup failure drop
DCE2_IDX_DROP_IPTO_ODF 0x7c IP tunnel outbound engine packet drop due to the outer IPv4 header having DF set while the
packet length is bigger than the MTU
DCE2_IDX_DROP_IPTO_IDF 0x7d IP tunnel outbound engine packet drop due to the inner IPv4 header having DF set while the
packet length is bigger than the MTU
DCE2_IDX_DROP_IPTO_IV6 0x7e IP tunnel outbound engine packet drop due to the inner header being IPv6 (which cannot be
fragmented) while the packet length is bigger than the MTU
DCE2_IDX_DROP_IPSEC0_IQUEUE 0x7f IPSec engine 0 packet drop due to invalid SA, invalid crypto suite, invalid padding/length,
or insufficient tunnel traffic quota
DCE2_IDX_DROP_IPSEC0_ENGINB 0x80 IPSec engine 0 packet drop due to authentication failure
DCE2_IDX_DROP_IPSEC1_IQUEUE 0x88 IPSec engine 1 packet drop due to invalid SA, invalid crypto suite, invalid padding/length,
or insufficient tunnel traffic quota
DCE2_IDX_DROP_IPSEC1_ENGINB 0x89 IPSec engine 1 packet drop due to authentication failure
DCE2_IDX_DROP_CWO_TUNINV 0x92 CAPWAP outbound packet drop due to invalid tunnel
DCE2_IDX_DROP_IPTO_TUNINV 0x93 IP tunnel outbound packet drop due to invalid tunnel
DCE2_IDX_DROP_TPE_SHAPER 0x94 Traffic policy engine policy based traffic shaping triggered packet drop
DCE2_IDX_DROP_TPE_PRTSHP 0x95 Traffic policy engine port based traffic shaping triggered packet drop
DCE2_IDX_DROP_TPE_HPE 0x96 Host protection engine triggered packet drop (due to host protection policies)
The following lists the packet descriptor queue (PDQ) full drops. PDQs are used to move packets between the different functional blocks inside the NP6.
Each PDQ drop counter includes a source module name and a target module name. A PDQ full drop generally means the target module
cannot process packets fast enough or is stuck due to abnormal conditions.
DCE2_IDX_DROP_PDQ_ISW_SSE0 0x97
DCE2_IDX_DROP_PDQ_ISW_SSE1 0x98
DCE2_IDX_DROP_PDQ_SSE0_SSE0 0x99
DCE2_IDX_DROP_PDQ_SSE0_SSE1 0x9a
DCE2_IDX_DROP_PDQ_SSE1_SSE0 0x9b
DCE2_IDX_DROP_PDQ_SSE1_SSE1 0x9c
DCE2_IDX_DROP_PDQ_ISW_FDB 0x9d
DCE2_IDX_DROP_PDQ_IPSEC0I_XHP 0x9e
DCE2_IDX_DROP_PDQ_IPSEC1I_XHP 0x9f
DCE2_IDX_DROP_PDQ_OSW_EHP0 0xa1
DCE2_IDX_DROP_PDQ_OSW_EHP1 0xa2
DCE2_IDX_DROP_PDQ_OSW_EHP2 0xa3
DCE2_IDX_DROP_PDQ_OSW_EHP3 0xa4
DCE2_IDX_DROP_PDQ_OSW_IPSEC0I 0xa5
DCE2_IDX_DROP_PDQ_OSW_IPSEC0O 0xa6
DCE2_IDX_DROP_PDQ_OSW_IPSEC1I 0xa7
APSTYPE_CWI0 ~ APSTYPE_CWI31 0xc0 ~ 0xdf Per type packet anomaly drop in CAPWAP inbound engine
APSTYPE_IPTI0 ~ APSTYPE_IPTI31 0xe0 ~ 0xff Per type packet anomaly drop in IP tunnel inbound engine
Comment : For HPE (Host Protection Engine), see also HPE protection
FG1K5D3 # diagnose npu np6 dceall 0 IPTO_IV6 :0000000000000000 [7e] IPSEC0_IQUEUE :0000000000000000 [7f]
MACFIL_BASE0 :0000000000000009 [00] MACFIL_BASE1 :0000000000000000 [01] IPSEC0_ENGINB0 :0000000000000000 [80] IPSEC0_ENGINB1 :0000000000000000 [81]
MACFIL_BASE2 :0000000000000000 [02] MACFIL_BASE3 :0000000000000000 [03] IPSEC0_ENGINB2 :0000000000000000 [82] IPSEC0_ENGINB3 :0000000000000000 [83]
MACFIL_BASE4 :0000000000000000 [04] MACFIL_BASE5 :0000000000000000 [05] IPSEC0_ENGINB4 :0000000000000000 [84] IPSEC0_ENGINB5 :0000000000000000 [85]
MACFIL_BASE6 :0000000000000000 [06] MACFIL_BASE7 :0000000000000000 [07] IPSEC0_ENGINB6 :0000000000000000 [86] IPSEC0_ENGINB7 :0000000000000000 [87]
MACFIL_BASE8 :0000000000000000 [08] MACFIL_BASE9 :0000000000000000 [09] IPSEC1_IQUEUE :0000000000000000 [88] IPSEC1_ENGINB0 :0000000000000000 [89]
MACFIL_BASE10 :0000000000000000 [0a] MACFIL_BASE11 :0000000000000000 [0b] IPSEC1_ENGINB1 :0000000000000000 [8a] IPSEC1_ENGINB2 :0000000000000000 [8b]
TBD :0000000000000000 [0c] TBD :0000000000000000 [0d] IPSEC1_ENGINB3 :0000000000000000 [8c] IPSEC1_ENGINB4 :0000000000000000 [8d]
TBD :0000000000000000 [0e] TBD :0000000000000000 [0f] IPSEC1_ENGINB5 :0000000000000000 [8e] IPSEC1_ENGINB6 :0000000000000000 [8f]
ISW_L2ACT_TPRT0 :0000000000000000 [10] ISW_L2ACT_TPRT1 :0000000000000000 [11] IPSEC1_ENGINB7 :0000000000000000 [90] TBD_91 :0000000000000000 [91]
ISW_L2ACT_TPRT2 :0000000000000000 [12] ISW_L2ACT_TPRT3 :0000000000000000 [13] TBD_92 :0000000000000000 [92] TBD_93 :0000000000000000 [93]
ISW_L2ACT_TPRT4 :0000000000000000 [14] ISW_L2ACT_TPRT5 :0000000000000000 [15] TPE_SHAPER :0000000000000000 [94] TPE_PRTSHP :0000000000000000 [95]
ISW_L2ACT_TPRT6 :0000000000000000 [16] ISW_L2ACT_TPRT7 :0000000000000000 [17] TPE_HPE :0000000000000000 [96] PDQ_ISW_SSE0 :0000000000000000 [97]
ISW_L2ACT_TPRT8 :0000000000000000 [18] ISW_L2ACT_TPRT9 :0000000000000000 [19] PDQ_ISW_SSE1 :0000000000000000 [98] PDQ_SSE0_SSE0 :0000000000000000 [99]
ISW_L2ACT_TPRT10:0000000000000000 [1a] ISW_L2ACT_TPRT11:0000000000000000 [1b] PDQ_SSE0_SSE1 :0000000000000000 [9a] PDQ_SSE1_SSE0 :0000000000000000 [9b]
TBD :0000000000000000 [1c] TBD :0000000000000000 [1d] PDQ_SSE1_SSE1 :0000000000000000 [9c] PDQ_ISW_FDB :0000000000000000 [9d]
TBD :0000000000000000 [1e] TBD :0000000000000000 [1f] PDQ_IPSEC0I_XHP :0000000000000000 [9e] PDQ_IPSEC1I_XHP :0000000000000000 [9f]
ISW_L2ACT_ETHR0 :0000000000000000 [20] ISW_L2ACT_ETHR1 :0000000000000000 [21] TBD_A0 :0000000000000000 [a0] PDQ_OSW_EHP0 :0000000000000000 [a1]
ISW_L2ACT_ETHR2 :0000000000000000 [22] ISW_L2ACT_ETHR3 :0000000000000000 [23] PDQ_OSW_EHP1 :0000000000000000 [a2] PDQ_OSW_EHP2 :0000000000000000 [a3]
ISW_L2ACT_ETHR4 :0000000000000000 [24] ISW_L2ACT_ETHR5 :0000000000000000 [25] PDQ_OSW_EHP3 :0000000000000000 [a4] PDQ_OSW_IPSEC0I :0000000000000000 [a5]
ISW_L2ACT_ETHR6 :0000000000000000 [26] ISW_L2ACT_ETHR7 :0000000000000000 [27] PDQ_OSW_IPSEC0O :0000000000000000 [a6] PDQ_OSW_IPSEC1I :0000000000000000 [a7]
ISW_L2ACT_ETHR8 :0000000000000000 [28] ISW_L2ACT_ETHR9 :0000000000000000 [29] PDQ_OSW_IPSEC1O :0000000000000000 [a8] PDQ_OSW_CWI :0000000000000000 [a9]
ISW_L2ACT_ETHR10:0000000000000000 [2a] ISW_L2ACT_ETHR11:0000000000000000 [2b] PDQ_OSW_CWO :0000000000000000 [aa] PDQ_OSW_IPTI :0000000000000000 [ab]
TBD :0000000000000000 [2c] TBD :0000000000000000 [2d] PDQ_OSW_IPTO :0000000000000000 [ac] PDQ_OSW_SYN :0000000000000000 [ad]
TBD :0000000000000000 [2e] TBD :0000000000000000 [2f] PDQ_OSW_HRX0 :0000000000000000 [ae] PDQ_OSW_HRX1 :0000000000000000 [af]
ISW_L2ACT_SVIF0 :0000000000000000 [30] ISW_L2ACT_SVIF1 :0000000000000000 [31] PDQ_IHP0_ISW :0000000000000000 [b0] PDQ_IHP1_ISW :0000000000000000 [b1]
ISW_L2ACT_SVIF2 :0000000000000000 [32] ISW_L2ACT_SVIF3 :0000000000000000 [33] PDQ_IHP2_ISW :0000000000000000 [b2] PDQ_IHP3_ISW :0000000000000000 [b3]
ISW_L2ACT_SVIF4 :0000000000000000 [34] ISW_L2ACT_SVIF5 :0000000000000000 [35] PDQ_XHP0_ISW :0000000000000000 [b4] PDQ_XHP1_ISW :0000000000000000 [b5]
ISW_L2ACT_SVIF6 :0000000000000000 [36] ISW_L2ACT_SVIF7 :0000000000000000 [37] PDQ_IPSEC0O_ISW :0000000000000000 [b6] PDQ_IPSEC1O_ISW :0000000000000000 [b7]
ISW_L2ACT_SVIF8 :0000000000000000 [38] ISW_L2ACT_SVIF9 :0000000000000000 [39] PDQ_CWI_ISW :0000000000000000 [b8] PDQ_CWO_ISW :0000000000000000 [b9]
ISW_L2ACT_SVIF10:0000000000000000 [3a] ISW_L2ACT_SVIF11:0000000000000000 [3b] PDQ_IPTI_ISW :0000000000000000 [ba] PDQ_IPTO_ISW :0000000000000000 [bb]
TBD :0000000000000000 [3c] TBD :0000000000000000 [3d] PDQ_SYN_ISW :0000000000000000 [bc] PDQ_OSW_ISW :0000000000000000 [bd]
TBD :0000000000000000 [3e] TBD :0000000000000000 [3f] PDQ_HTX0_ISW :0000000000000000 [be] PDQ_HTX1_ISW :0000000000000000 [bf]
ISW_L2ACT_SPRT0 :0000000000000000 [40] ISW_L2ACT_SPRT1 :0000000000000000 [41] APSTYPE_CWI0 :0000000000000000 [c0] APSTYPE_CWI1 :0000000000000000 [c1]
ISW_L2ACT_SPRT2 :0000000000000000 [42] ISW_L2ACT_SPRT3 :0000000000000000 [43] APSTYPE_CWI2 :0000000000000000 [c2] APSTYPE_CWI3 :0000000000000000 [c3]
ISW_L2ACT_SPRT4 :0000000000000000 [44] ISW_L2ACT_SPRT5 :0000000000000000 [45] APSTYPE_CWI4 :0000000000000000 [c4] APSTYPE_CWI5 :0000000000000000 [c5]
ISW_L2ACT_SPRT6 :0000000000000000 [46] ISW_L2ACT_SPRT7 :0000000000000000 [47] APSTYPE_CWI6 :0000000000000000 [c6] APSTYPE_CWI7 :0000000000000000 [c7]
ISW_L2ACT_SPRT8 :0000000000000000 [48] ISW_L2ACT_SPRT9 :0000000000000000 [49] APSTYPE_CWI8 :0000000000000000 [c8] APSTYPE_CWI9 :0000000000000000 [c9]
ISW_L2ACT_SPRT10:0000000000000000 [4a] ISW_L2ACT_SPRT11:0000000000000000 [4b] APSTYPE_CWI10 :0000000000000000 [ca] APSTYPE_CWI11 :0000000000000000 [cb]
ISW_L2ACT_SPRT12:0000000000000000 [4c] ISW_L2ACT_SPRT13:0000000000000000 [4d] APSTYPE_CWI12 :0000000000000000 [cc] APSTYPE_CWI13 :0000000000000000 [cd]
ISW_L2ACT_SPRT14:0000000000000000 [4e] ISW_L2ACT_SPRT15:0000000000000000 [4f] APSTYPE_CWI14 :0000000000000000 [ce] APSTYPE_CWI15 :0000000000000000 [cf]
APS_IHP0 :0000000000000000 [50] APS_IHP1 :0000000000000000 [51] APSTYPE_CWI16 :0000000000000000 [d0] APSTYPE_CWI17 :0000000000000000 [d1]
APS_IHP2 :0000000000000000 [52] APS_IHP3 :0000000000000000 [53] APSTYPE_CWI18 :0000000000000000 [d2] APSTYPE_CWI19 :0000000000000000 [d3]
APS_XHP0 :0000000000000000 [54] APS_XHP1 :0000000000000000 [55] APSTYPE_CWI20 :0000000000000000 [d4] APSTYPE_CWI21 :0000000000000000 [d5]
APS_CWI :0000000000000000 [56] APS_IPTI :0000000000000000 [57] APSTYPE_CWI22 :0000000000000000 [d6] APSTYPE_CWI23 :0000000000000000 [d7]
APS_HTX0 :0000000000000000 [58] APS_HTX1 :0000000000000000 [59] APSTYPE_CWI24 :0000000000000000 [d8] APSTYPE_CWI25 :0000000000000000 [d9]
IHP0_PKTCHK :0000000000000000 [5a] IHP1_PKTCHK :0000000000000000 [5b] APSTYPE_CWI26 :0000000000000000 [da] APSTYPE_CWI27 :0000000000000000 [db]
IHP2_PKTCHK :0000000000000000 [5c] IHP3_PKTCHK :0000000000000000 [5d] APSTYPE_CWI28 :0000000000000000 [dc] APSTYPE_CWI29 :0000000000000000 [dd]
XHP0_PKTCHK :0000000000000000 [5e] XHP1_PKTCHK :0000000000000000 [5f] APSTYPE_CWI30 :0000000000000000 [de] APSTYPE_CWI31 :0000000000000000 [df]
CWI_PKTCHK :0000000000000000 [60] IPTI_PKTCHK :0000000000000000 [61] APSTYPE_IPTI0 :0000000000000000 [e0] APSTYPE_IPTI1 :0000000000000000 [e1]
HTX0_PKTCHK :0000000000000000 [62] HTX1_PKTCHK :0000000000000000 [63] APSTYPE_IPTI2 :0000000000000000 [e2] APSTYPE_IPTI3 :0000000000000000 [e3]
SSE0_SHAPER :0000000000000000 [64] SSE0_SESSION :0000000000000000 [65] APSTYPE_IPTI4 :0000000000000000 [e4] APSTYPE_IPTI5 :0000000000000000 [e5]
SSE0_TTLEQ0 :0000000000000000 [66] SSE0_TTLEQ1 :0000000000000000 [67] APSTYPE_IPTI6 :0000000000000000 [e6] APSTYPE_IPTI7 :0000000000000000 [e7]
SSE0_MCAST :0000000000000000 [68] SSE1_SHAPER :0000000000000000 [69] APSTYPE_IPTI8 :0000000000000000 [e8] APSTYPE_IPTI9 :0000000000000000 [e9]
SSE1_SESSION :0000000000000000 [6a] SSE1_TTLEQ0 :0000000000000000 [6b] APSTYPE_IPTI10 :0000000000000000 [ea] APSTYPE_IPTI11 :0000000000000000 [eb]
SSE1_TTLEQ1 :0000000000000000 [6c] SSE1_MCAST :0000000000000000 [6d] APSTYPE_IPTI12 :0000000000000000 [ec] APSTYPE_IPTI13 :0000000000000000 [ed]
TBD :0000000000000000 [6e] TBD :0000000000000000 [6f] APSTYPE_IPTI14 :0000000000000000 [ee] APSTYPE_IPTI15 :0000000000000000 [ef]
CWI_HDRCHK :0000000000000000 [70] CWI_RMACMIS :0000000000000000 [71] APSTYPE_IPTI16 :0000000000000000 [f0] APSTYPE_IPTI17 :0000000000000000 [f1]
CWI_SMACMIS :0000000000000000 [72] CWI_DMACMIS :0000000000000000 [73] APSTYPE_IPTI18 :0000000000000000 [f2] APSTYPE_IPTI19 :0000000000000000 [f3]
CWI_DMAC :0000000000000000 [74] CWI_BDMIS :0000000000000000 [75] APSTYPE_IPTI20 :0000000000000000 [f4] APSTYPE_IPTI21 :0000000000000000 [f5]
CWI_BC :0000000000000000 [76] CWI_MC :0000000000000000 [77] APSTYPE_IPTI22 :0000000000000000 [f6] APSTYPE_IPTI23 :0000000000000000 [f7]
CWI_ETHER :0000000000000000 [78] CWO_DMACMIS :0000000000000000 [79] APSTYPE_IPTI24 :0000000000000000 [f8] APSTYPE_IPTI25 :0000000000000000 [f9]
CWO_DMAC :0000000000000000 [7a] CWO_BDMIS :0000000000000000 [7b] APSTYPE_IPTI26 :0000000000000000 [fa] APSTYPE_IPTI27 :0000000000000000 [fb]
IPTO_ODF :0000000000000000 [7c] IPTO_IDF :0000000000000000 [7d] APSTYPE_IPTI28 :0000000000000000 [fc] APSTYPE_IPTI29 :0000000000000000 [fd]
APSTYPE_IPTI30 :0000000000000000 [fe] APSTYPE_IPTI31 :0000000000000000 [ff]
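● diag npu np6 anomaly-drop (anomaly-drop-all) <npu_id>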
This command provides counters related to the well-known anomalies (see fpanomaly) for the different inputs of the NP6 : IHP0, IHP1, IHP2, IHP3 (and, as the sample below shows, also XHP0/XHP1 and HTX0/HTX1)
XHP0:
IPV4_LAND :0000000000000000 [80] IPV4_PROTO_ERR :0000000000000000 [81]
IPV4_UNKNOPT :0000000000000000 [82] IPV4_OPTRR :0000000000000000 [83]
../..
XHP1:
IPV4_LAND :0000000000000000 [a0] IPV4_PROTO_ERR :0000000000000000 [a1]
IPV4_UNKNOPT :0000000000000000 [a2] IPV4_OPTRR :0000000000000000 [a3]
../..
HTX0:
IPV4_LAND :0000000000000000 [c0] IPV4_PROTO_ERR :0000000000000000 [c1]
IPV4_UNKNOPT :0000000000000000 [c2] IPV4_OPTRR :0000000000000000 [c3]
../..
HTX1:
IPV4_LAND :0000000000000000 [e0] IPV4_PROTO_ERR :0000000000000000 [e1]
IPV4_UNKNOPT :0000000000000000 [e2] IPV4_OPTRR :0000000000000000 [e3]
../..
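● diag npu np6 hrx-drop (hrx-drop-all) <npu_id>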
Provides packet drop counters for each host sub-interface, for the RX and TX sides. Each sub-interface has a dedicated purpose (for instance
N-Turbo), but counters are given by index for each direction; indexes run from 0 to 127
Sample :
● diag npu np6 session-stats (session-stats-clear) <npu_id>
Provides statistics on the number of sessions installed and deleted by the NP, for both IPv4 and IPv6, including v4-to-v6 and v6-to-v4 sessions.
These correspond to the special 'session_push' and 'session_delete' packets used to program the SSE (see ipv4 unicast session acceleration).
Each insert or delete order may come from a different channel (qid) of the host interface, where each channel is associated with a different
interrupt. The more the numbers are balanced across the 'qid's, the better the CPU core load distribution.
The total for each counter is provided in the last line. The difference between 'insert' and 'delete' should give the current number of sessions
installed on the NP.
FGT1500D (global) # diagnose npu np6 session-stats 0
qid ins44 ins46 del4 ins64 ins66 del6
ins44_e ins46_e del4_e ins64_e ins66_e del6_e
0 1164536088 0 1164500209 0 0 0
0 0 0 0 0 0
1 1168139559 0 1168103576 0 0 0
0 0 0 0 0 0
2 1165064519 0 1165028582 0 0 0
0 0 0 0 0 0
3 1168117367 0 1168081430 0 0 0
0 0 0 0 0 0
4 1112353090 0 1112318764 0 0 0
0 0 0 0 0 0
5 1114545834 0 1114511481 0 0 0
0 0 0 0 0 0
6 1112961759 0 1112927472 0 0 0
0 0 0 0 0 0
7 1115203965 0 1115169714 0 0 0
0 0 0 0 0 0
8 1113204955 0 1113170696 0 0 0
0 0 0 0 0 0
9 1115123725 0 1115089425 0 0 0
0 0 0 0 0 0
10 1112352398 0 1112318231 0 0 0
0 0 0 0 0 0
11 1114861876 0 1114827601 0 0 0
0 0 0 0 0 0
Total 691563247 0 691145293 0 0 0
0 0 0 0 0 0
Question : what does "_e" mean? Error?
Answer : no clue yet, but I have always seen the value '0' so far :) Guess : ephemeral? Pointers are welcome.
The counters can be reset with the command 'diag npu np6 session-stats-clear <npu_id>'.
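● diag npu np6 sse-stats (sse-stats-clear)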
This command provides session details for each SSE, plus the totals :
Active : number of sessions currently installed
insert-total / insert-success : should be the same unless some insertions failed; this is the number of insertion orders received
delete-total / delete-success : same for deleted sessions
purge-total / purge-success : refers to purge requests, which can be enabled on the NP (never tried, not sure of the usage)
search-total : number of session lookups done following the reception of a packet in the SSE
search-hit : how many of those packets had a match in the session table (not confirmed)
pht-size : size of the primary hash table in memory (see ipv4 unicast session acceleration)
oft-size : size of the overflow table
oft-free : free memory within the overflow table
PBA : number of packet slots left in the Packet Buffer Allocator. 3001 is the nominal value; it may temporarily go below but should
always come back to 3001 when the load drops. If it does not, a PBA leak may be taking place.
● diag npu np6 pdq <np6_id>
● diag npu np6 xgmac-stats (xgmac-stats-clear) <npu_id>
● diag npu np6 gmac-stats (gmac-stats-clear)
Similar command to xgmac-stats, but this time related to the 1G ports when using the mixed 1G/10G NP6 form factor, as on a
FortiGate-500D for instance. Ports are numbered from port1 to port16.
Counters port5|GIGE8 port6|GIGE11 port7|GIGE9 port8|GIGE10 Counters port13|GIGE5 port14|GIGE4 port15|GIGE7 port16|GIGE6
RX_BCAST 0 0 0 0 RX_BCAST 0 0 0 0
RX_MCAST 0 0 0 0 RX_MCAST 0 0 0 0
RX_UCAST 0 0 0 0 RX_UCAST 0 0 0 0
RX_PAUSEFRM 0 0 0 0 RX_PAUSEFRM 0 0 0 0
RX_UNDERSIZE 0 0 0 0 RX_UNDERSIZE 0 0 0 0
RX_OVERSIZEP 0 0 0 0 RX_OVERSIZEP 0 0 0 0
RX_FRAG 0 0 0 0 RX_FRAG 0 0 0 0
RX_JAB 0 0 0 0 RX_JAB 0 0 0 0
RX_FCS 0 0 0 0 RX_FCS 0 0 0 0
RX_WFULL 0 0 0 0 RX_WFULL 0 0 0 0
RX_GOODOCTET 0 0 0 0 RX_GOODOCTET 0 0 0 0
RX_OCTET 0 0 0 0 RX_OCTET 0 0 0 0
TX_BCAST 0 0 0 0 TX_BCAST 0 0 0 0
TX_MCAST 0 0 0 0 TX_MCAST 0 0 0 0
TX_UCAST 0 0 0 0 TX_UCAST 0 0 0 0
TX_COL 0 0 0 0 TX_COL 0 0 0 0
TX_LATECOL 0 0 0 0 TX_LATECOL 0 0 0 0
TX_EXCESSCOL 0 0 0 0 TX_EXCESSCOL 0 0 0 0
TX_UNDERRUN 0 0 0 0 TX_UNDERRUN 0 0 0 0
TX_XPX_QFULL 0 0 0 0 TX_XPX_QFULL 0 0 0 0
TX_GOODOCTET 0 0 0 0 TX_GOODOCTET 0 0 0 0
TX_OCTET 0 0 0 0 TX_OCTET 0 0 0 0
PKT1024TOMAX 0 0 0 0 PKT1024TOMAX 0 0 0 0
PKT512TO1023 0 0 0 0 PKT512TO1023 0 0 0 0
PKT256TO511 0 0 0 0 PKT256TO511 0 0 0 0
PKT128TO255 0 0 0 0 PKT128TO255 0 0 0 0
PKT65TO127 0 0 0 0 PKT65TO127 0 0 0 0
PKT64 0 0 0 0 PKT64 0 0 0 0
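● diag npu np6 gige-port-stats (gige-port-stats-clear) <port_name>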
Similar information to gmac-stats, but the port name is used as the argument instead of the NP id.
This table is easier to read and parse for a single port.
● diag npu np6 port-list
Provides the mapping table between the external ports and their NP6 XAUI attachment. This side-by-side output from a FortiGate-1500D, a
FortiGate-3700D and a FortiGate-900D shows the different specificities of the port-to-XAUI associations.
FGT1500D # diagnose npu np6 port-list FGT3700D # diagnose npu np6 port-list FG900D # diagnose npu np6 port-list
Chip XAUI Ports Max Crosschip Chip XAUI Ports Max Crosschip Chip XAUI Ports Max Crosschip
Speed offloading Speed offloading Speed offloading
np6_0 0 port1 1G Yes np6_0 0 port26 10G Yes np6_0 0
0 port5 1G Yes 1 port25 10G Yes 1 port17 1G Yes
0 port17 1G Yes 2 port28 10G Yes 1 port18 1G Yes
0 port21 1G Yes 3 port27 10G Yes 1 port19 1G Yes
0 port33 10G Yes 03 port1 40G Yes 1 port20 1G Yes
1 port2 1G Yes 1 port21 1G Yes
1 port6 1G Yes np6_1 0 port30 10G Yes 1 port22 1G Yes
1 port18 1G Yes 1 port29 10G Yes 1 port23 1G Yes
1 port22 1G Yes 2 port32 10G Yes 1 port24 1G Yes
1 port34 10G Yes 3 port31 10G Yes 1 port27 1G Yes
2 port3 1G Yes 03 port3 40G Yes 1 port28 1G Yes
2 port7 1G Yes 1 port25 1G Yes
2 port19 1G Yes np6_2 0 port5 10G Yes 1 port26 1G Yes
2 port23 1G Yes 0 port9 10G Yes 1 port31 1G Yes
2 port35 10G Yes 0 port13 10G Yes 1 port32 1G Yes
3 port4 1G Yes 1 port6 10G Yes 1 port29 1G Yes
3 port8 1G Yes 1 port10 10G Yes 1 port30 1G Yes
3 port20 1G Yes 1 port14 10G Yes 2 portB 10G Yes
3 port24 1G Yes 2 port7 10G Yes 3
3 port36 10G Yes 2 port11 10G Yes
3 port8 10G Yes np6_1 0
np6_1 0 port9 1G Yes 3 port12 10G Yes 1 port1 1G Yes
0 port13 1G Yes 03 port2 40G Yes 1 port2 1G Yes
0 port25 1G Yes 1 port3 1G Yes
0 port29 1G Yes np6_3 0 port15 10G Yes 1 port4 1G Yes
0 port37 10G Yes 0 port19 10G Yes 1 port5 1G Yes
1 port10 1G Yes 0 port23 10G Yes 1 port6 1G Yes
1 port14 1G Yes 1 port16 10G Yes 1 port7 1G Yes
1 port26 1G Yes 1 port20 10G Yes 1 port8 1G Yes
1 port30 1G Yes 1 port24 10G Yes 1 port11 1G Yes
1 port38 10G Yes 2 port17 10G Yes 1 port12 1G Yes
2 port11 1G Yes 2 port21 10G Yes 1 port9 1G Yes
2 port15 1G Yes 3 port18 10G Yes 1 port10 1G Yes
2 port27 1G Yes 3 port22 10G Yes 1 port15 1G Yes
2 port31 1G Yes 03 port4 40G Yes 1 port16 1G Yes
2 port39 10G Yes 1 port13 1G Yes
3 port12 1G Yes 1 port14 1G Yes
3 port16 1G Yes 2 portA 10G Yes
3 port28 1G Yes 3
3 port32 1G Yes
3 port40 10G Yes
Comments:
● 40G ports are bundling 4x10G ports at the ISF
● Lots of units have possible oversubscription on an NP6 XAUI port; examples from above :
○ FortiGate-1500D np6_0 XAUI 0 ⇒ 4x1G + 1x10G = 14G > 10G
○ FortiGate-3700D np6_2 XAUI 0 ⇒ 4x10G = 40G > 10G (x4 !)
○ FortiGate-900D np6_0 XAUI 1 ⇒ 16x1G = 16G > 10G
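● diag npu np6 ipsec-stats (ipsec-stats-clear)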
Provides IPsec-related details for all NPs. There is no precise description of the command output, but some of the counters are quite obvious.
ib_chk_null_sa 00000000000 ob_chk_null_adpt 00000000000
ob_chk_null_sa 00000000000 rx_vif_miss 00000000000
rx_sa_miss 00000000000 rx_mark_miss 00000000000
waiting_ib_sa 00000000000 sa_mismatch 00000000000
msg_miss 00000000000
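● diag npu np6 npu-feature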
Dumps a simple table with the features enabled on the different NP6s of the unit.
Example outputs from a FortiGate-900D running 5.2.9 and a FortiGate-3700DX running 5.2.7 :
FG900D # diagnose npu np6 npu-feature moiFG37DX1LAB (global) # diagnose npu np6 npu-feature
np_0 np_1 np_0 np_1 np_2 np_3
Fastpath Enabled Enabled Fastpath Enabled Enabled Enabled Enabled
Lowlatencymode Disabled Disabled Lowlatencymode Disabled Disabled Disabled Disabled
Lowlatencycap No No Lowlatencycap Yes Yes No No
IPv4 firewall Yes Yes IPv4 firewall Yes Yes Yes Yes
IPv6 firewall Yes Yes IPv6 firewall Yes Yes Yes Yes
IPv4 IPSec Yes Yes IPv4 IPSec Yes Yes Yes Yes
IPv6 IPSec Yes Yes IPv6 IPSec Yes Yes Yes Yes
IPv4 tunnel Yes Yes IPv4 tunnel Yes Yes Yes Yes
IPv6 tunnel Yes Yes IPv6 tunnel Yes Yes Yes Yes
GRE tunnel No No GRE tunnel Yes Yes Yes Yes
IPv4 Multicast Yes Yes IPv4 Multicast Yes Yes Yes Yes
IPv6 Multicast Yes Yes IPv6 Multicast Yes Yes Yes Yes
CAPWAP No No CAPWAP No No No No
● Comments :
○ CAPWAP offload is not available in 5.0; it has only been added since B0961 (5.4 GA)
○ GRE tunnel is only available on the FortiGate-3700DX (thanks to the TP2 FPGA)
○ Low latency is only available on 2 of the NP6s of the FortiGate-3700D and FortiGate-3700DX
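● diag npu np6 register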
A very detailed command on internal NP6 register values. Of unknown use for TAC; it might be requested by developers eventually.
Only the first lines are dumped below, for illustration purposes and for fans of hex…
gpio_ivcr =00000000 (ffffff0000a63420)
pcs_isr =00000000 (ffffff0000a63440)
pcs_imr =00000000 (ffffff0000a63448)
pcs_imrc =00000000 (ffffff0000a63450)
pcs_isel =00000000 (ffffff0000a63458)
pcs_ivcr =00000000 (ffffff0000a63460)
pe00_isr =00000000 (ffffff0000a63480)
pe00_isrc =00000000 (ffffff0000a63488)
pe00_imr =00000000 (ffffff0000a63490)
sse_qry0 :0
sse_qry1 :0
sse0_timeout :0
sse1_timeout :0
sse_tmout_miss :0
sse_qry_miss :0
wrong_msg_len :0
wrong_msg_type :0
sa_exp_by_trf :0
sa_sn_exhausted :0
sa_sn_update :0
sa_throughput_update:0
sa_inb_antireply_update:0
sa_reconnect :0
tce_tmo :0
cwi_tmo :0
cwo_byte :0
cwo_pkt :0
cwo_tmo :0
ipto_update :0
tpe_update :0
ipsec_vif_miss :0
ipt_vif_miss :0
gre_vif_miss :0
ulif_miss :0
mcast_null :0
tpe_mcast :0
tpe_ipt :0
tpe_gre :0
tpe_cwi :0
Design recommendations
Limitations and workarounds, fixed bugs
● Fixed bug 365497 : possible packet out-of-order with NP6 during TCP session establishment
workaround : per-policy CLI command : set delay-tcp-npu-session enable|disable* (see the sketch after this list)
● Fixed bug 309458 : Passthrough UDP 4500 not accelerated, bug fixed in 5.4.1/5.2.8
● Fixed bug 0263634 / 0270666 : multicast is not offloaded in TP mode (fixed in 5.4, not in 5.2)
● #310482 IPv6 HA A-A cluster master forwarding traffic is not offloaded
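A sketch of the per-policy workaround for bug 365497 (the policy id is illustrative; the option defaults to disable) :

config firewall policy
    edit 10                                # hypothetical policy carrying the affected TCP traffic
        set delay-tcp-npu-session enable   # delay NPU offload until the TCP 3-way handshake completes
    next
end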
NP4
Form factors
● 1 single form factor : a dual-core chip, each core has a single XAUI attachment
● PCIe x8 lane bus
Performance figures
● 20 Gbps maximum firewall throughput
● Sessions : 6 million
● 6 Gbps IPsec ESP encryption/decryption throughput
Integration
● NP4s are always connected to an ISF (legacy exception : the AMC XD4 module)
IRQ distribution
Each core of the NP4 has 4 IRQs mapped, which makes 8 IRQs overall for the NP4. This is not enough to allow direct N-Turbo IPS acceleration.
From the inside
● Made of 2 cores
● MSI-X support : multi-RX and multi-TX queues (since mantis #139358, 4.3 build 423)
● 2 Session Search Engines : one per NP4 core
● Shapers : 2048 maximum (mantis #137405)
● 1 IPsec engine shared between the 2 cores, with 8 sub-engines.
Configuration options impacting NP4
● config system npu -> set dedicated-management-cpu enable (#201257, #218083, #251776)
● config system npu -> set dedicated-tx-npu enable (FG3600C only, mantis #256367)
● config system npu -> enc-offload-antireplay / dec-offload-antireplay / offload-ipsec-host (see the IPsec part and Stephane’s IPSec Guide)
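Put together, a minimal sketch of these NP4-related settings (availability depends on the model and FortiOS version, as noted above; dedicated-tx-npu is FG3600C only):
config system npu
    set dedicated-management-cpu enable
    set enc-offload-antireplay enable
    set dec-offload-antireplay enable
    set offload-ipsec-host enable
end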
Diag commands
Provides the NP4 chip port mapping without details on which core/XAUI is used.
Provides all the details thanks to the “sw_port” and “sw_np_port” information:
sw_np_port : BCM switch port where the NP4 core 10G interface is connected
sw_port : BCM switch port where the interface is connected
sw_port :26
sw_np_port :14
half_id :1
● The hashing function used to choose one of the IRQs is missing the destination IP address in the hash (only src_ip, src_port and dst_port are used).
● #310606 : ESP passthrough is not accelerated by NP4.
A special image fix, not merged, was made via top3 (#229874).
● For maximum performance, interfaces should be chosen so as to use both NP4 cores.
● Reminder : no IPv6 acceleration, no IPv4 multicast acceleration. Such packets are all sent to the first queue (first IRQ) of the core, which may cause a distribution issue if the traffic is high (#217643, #140153).
● Dedicated command channel for the kernel : session queue (fastpath setup) and message queue (keepalive + IPsec). Both command channels trigger only the first IRQ (#217643, #263580).
● Anti-replay settings in ‘config system npu’ may or may not be considered depending on the FortiOS version (see the IPsec chapter).
● Impact from the dedicated management CPU command (#201257, #218083, #251776).
● Potential EHP drops due to congestion on the egress XAUI between “inter-core” accelerated traffic and non-accelerated traffic.
Mitigations (see the LAG hash sketch after this list) :
Use as many NP4s as possible in the LAG
Spread the ports of a LAG on multiple NP4s
Use the same LAG hash on the switch and the FortiGate (L3)
Make sure the return path goes through a different pair of NP4 cores
Try to have accelerated and non-accelerated traffic on different NP4s
Mantis #256367 for FG3600C : config system npu → set dedicated-tx-npu enable (the 3rd NP4 is dedicated to TX)
● Possibly no IPsec acceleration after an interface outage in a LAG, for outbound traffic only (#189140, #267252).
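A minimal sketch of the FortiGate side of the LAG hash mitigation, assuming an existing aggregate interface named agg1 (the interface name is only an example):
config system interface
    edit agg1
        set algorithm L3
    next
end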
SoC2 (NP4 lite + CP8 lite)
Used on FortiGate-60D, FortiGate-70D, FortiGate-90D, FortiGate-200D, FortiGate-240D, FortiGate-280D.
The SoC2 uses an NP4Lite.
● NP4Lite architecture
○ 1 single core
○ 4 x RGMII (Reduced GMII : 1G bandwidth total for TX and RX)
○ 4 different interrupts, distributed over the 2 available CPU cores
○ Another interrupt exists but shows 0 interrupts (unknown usage)
Comments :
The RGMII attachments are different between models.
There is no known command to establish the port mapping (testing with traffic may be required to see which counter increases; see the sketch below).
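A possible way to establish the mapping through testing, using the per-channel DCE counters shown below (remember that these counters reset on every read):
fnsysctl cat /proc/fsoc/npl/dce    (first read, resets the counters)
... send traffic through the port under test ...
fnsysctl cat /proc/fsoc/npl/dce    (the ChX whose “Inbound total valid Packets” increased is the RGMII carrying that port)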
FortiGate-60D and FortiGate-90D
Comments :
May have a single or a “dual-connected” switch fabric.
With a dual switch fabric, each switch has its own RGMII link, so bandwidth may be better between ports from 2 different switch fabrics.
Interrupts
FG200D1LAB # diagnose hardware sysinfo interrupts FG240D # diagnose hardware sysinfo interrupts
CPU0 CPU1 CPU0 CPU1
0: 538988 0 IO-APIC-edge timer 0: 77576284 0 IO-APIC-edge timer
2: 0 0 XT-PIC cascade 2: 0 0 XT-PIC cascade
4: 9 0 IO-APIC-edge serial 4: 5827 0 IO-APIC-edge serial
7: 0 0 IO-APIC-edge LCD_KEYPAD 7: 0 0 IO-APIC-edge LCD_KEYPAD
8: 0 0 IO-APIC-edge rtc 8: 0 0 IO-APIC-edge rtc
16: 0 0 IO-APIC-level ehci_hcd, ehci_hcd 16: 0 0 IO-APIC-level ehci_hcd, ehci_hcd
17: 5743 0 IO-APIC-level libata, usb-uhci, 17: 450364 0 IO-APIC-level libata, usb-uhci,
usb-uhci, net2280 usb-uhci, net2280
18: 0 0 IO-APIC-level usb-uhci, usb-uhci 18: 0 0 IO-APIC-level usb-uhci, usb-uhci
19: 0 0 IO-APIC-level usb-uhci 19: 0 0 IO-APIC-level usb-uhci
64: 298362 0 PCI-MSI-edge mgmt-Q0 64: 24897124 0 PCI-MSI-edge mgmt-Q0
65: 9 0 PCI-MSI-edge mgmt 65: 8 0 PCI-MSI-edge mgmt
66: 6653177 0 PCI-MSI-edge np4lite 66: 817918 0 PCI-MSI-edge np4lite
67: 0 6990207 PCI-MSI-edge np4lite 67: 0 11229931 PCI-MSI-edge np4lite
68: 8188509 0 PCI-MSI-edge np4lite 68: 893281 0 PCI-MSI-edge np4lite
69: 0 5626707 PCI-MSI-edge np4lite 69: 0 1051194 PCI-MSI-edge np4lite
70: 0 0 PCI-MSI-edge np4lite 70: 0 0 PCI-MSI-edge np4lite
71: 0 0 PCI-MSI-edge cp8 71: 10111 0 PCI-MSI-edge cp8
72: 0 0 PCI-MSI-edge cp8 72: 0 10567 PCI-MSI-edge cp8
73: 0 0 PCI-MSI-edge cp8 73: 0 0 PCI-MSI-edge cp8
74: 0 0 PCI-MSI-edge cp8 74: 0 0 PCI-MSI-edge cp8
75: 0 0 PCI-MSI-edge cp8 75: 0 0 PCI-MSI-edge cp8
NMI: 538926 538962 NMI: 77576216 77576252
LOC: 538904 538903 LOC: 77576576 77576575
ERR: 0 ERR: 0
MIS: 0 MIS: 0
Drop counter table
FG200D1LAB # fnsysctl cat /proc/fsoc/npl/dce 0083: 00000086 CRC Error drop in Ch2
000c: 00000161 Inbound total valid Packets in Ch0 008c: 027aa6f2 Inbound total valid Packets in Ch2
000d: 00000309 Outbound total valid packets in Ch0 008d: 02758887 Outbound total valid packets in Ch2
0011: 00000001 SSE to EHP PDQ Full in Ch0 0091: 00055c3c SSE to EHP PDQ Full in Ch2
0012: 0000010c SSE to HRX PDQ Full drop 009a: 00005af5 Inbound Statistic number for Broadcast Packet in Ch2
001a: 0000003c Inbound Statistic number for Broadcast Packet in Ch0 009b: 00000025 Inbound Statistic number for Multicast Packet in Ch2
001b: 00000125 Inbound Statistic number for Multicast Packet in Ch0 009c: 027a4bd8 Inbound Statistic number for Unicast Packet in Ch2
001c: 00062f9e Inbound Statistic number for Unicast Packet in Ch0 009d: 000078fc Outbound statistic number for Broadcast Packet in Ch2
001d: 000000ed Outbound statistic number for Broadcast Packet in Ch0 009e: Outbound statistic number for Multicast Packet in Ch2 ?
001e: Outbound statistic number for Multicast Packet in Ch0 ? 009f: 02750f8b Outbound Statistic number for Unicast Packet in Ch2
001f: 00000309 Outbound Statistic number for Unicast Packet in Ch0 00ba: 02758887 SSE to EHP PDQ Read Commit Number in Ch2
0036: 002c1ea7 SSE to HRX PDQ Read Commit Number 00bb: 02758887 SSE to EHP PDQ Write Commit Number in Ch2
0037: 002c1ea7 SSE to HRX PDQ Write Commit Number 00be: 027aa6f2 IHP to SSE PDQ Read Commit Number in Ch2
003a: 00000309 SSE to EHP PDQ Read Commit Number in Ch0 00bf: 027aa6f2 IHP to SSE PDQ Write Commit Number in Ch2
003b: 00000309 SSE to EHP PDQ Write Commit Number in Ch0 00c3: 0000001b CRC Error drop in Ch3
00cc: 00005ae1 Inbound total valid Packets in Ch3
003e: 00000161 IHP to SSE PDQ Read Commit Number in Ch0 00da: 00005ae1 Inbound Statistic number for Broadcast Packet in Ch3
003f: 00000161 IHP to SSE PDQ Write Commit Number in Ch0 00db: 0000006c Inbound Statistic number for Multicast Packet in Ch3
004c: 00000049 Inbound total valid Packets in Ch1 00dc: 00000221 Inbound Statistic number for Unicast Packet in Ch3
004d: 00185d5f Outbound total valid packets in Ch1 00dd: 00000009 Outbound statistic number for Broadcast Packet in Ch3
0051: 0000000f SSE to EHP PDQ Full in Ch1 00de: Outbound statistic number for Multicast Packet in Ch3 ?
0052: SSE to HRX PDQ Full drop ? 00df: 0000040f Outbound Statistic number for Unicast Packet in Ch3
005a: 00000049 Inbound Statistic number for Broadcast Packet in Ch1 00fa: 0000040f SSE to EHP PDQ Read Commit Number in Ch3
005b: 00000002 Inbound Statistic number for Multicast Packet in Ch1 00fb: 0000040f SSE to EHP PDQ Write Commit Number in Ch3
005c: 00163b83 Inbound Statistic number for Unicast Packet in Ch1 00fe: 00005ae1 IHP to SSE PDQ Read Commit Number in Ch3
005d: 00000096 Outbound statistic number for Broadcast Packet in Ch1 00ff: 00005ae1 IHP to SSE PDQ Write Commit Number in Ch3
005e: Outbound statistic number for Multicast Packet in Ch1 ? 0114: 0015ea3b HTX0 to SSE PDQ Write Commit Number
005f: 00185cc9 Outbound Statistic number for Unicast Packet in Ch1 0115: 00161707 HTX1 to SSE PDQ Write Commit Number
007a: 00185d5f SSE to EHP PDQ Read Commit Number in Ch1 0138: 00140d09 SSE to HRX0 PDQ Read Commit Number
007b: 00185d5f SSE to EHP PDQ Write Commit Number in Ch1 0139: 0018119e SSE to HRX1 PDQ Read Commit Number
007e: 00000049 IHP to SSE PDQ Read Commit Number in Ch1 013c: 00140d09 SSE to HRX0 PDQ Write Commit Number
007f: 00000049 IHP to SSE PDQ Write Commit Number in Ch1 013d: 0018119e SSE to HRX1 PDQ Write Commit Number
../..
Comments:
Only shows a line when the counter increases; this output has been built from multiple dumps on multiple units.
Each time the command is sent, the counters are reset.
Lines with a ‘?’ at the end are guessed from similar lines in other groups.
The DCE counter table is split into sections : Ch0, Ch1, Ch2, Ch3 (probably one per RGMII port) plus host interfaces 0 and 1.
Each counter has a reference (000c:, 000d:, …); there are some holes in this dump (missing references).
Seems to have 1 channel per RGMII (Ch0, Ch1, Ch2, Ch3).
Statistics
FG200D1LAB # fnsysctl cat /proc/fsoc/npl/stats
cmd_alloc_fail :0000000000 cmd_resc_flush :0000000000
cmd_resc_flush_fail :0000000000 cmd_issue_fail :0000000000
ses_ins_total :0007947784 fw_ses_ins_orig :0003973485
fw_ses_ins_reply :0003974299 ses_del_total :0007880055
fw_ses_del :0007880055 ses_timeout :0000000000
sa_set_total :0000000000 sa_set_ib :0000000000
sa_set_ib_nomem :0000000000 sa_set_ib_dfail :0000000000
sa_set_ib_ses_fail :0000000000 sa_set_ob :0000000000
sa_set_ob_nomem :0000000000 sa_set_ob_dfail :0000000000
sa_del_total :0000000000 sa_del_ib :0000000000
sa_del_ib_nomem :0000000000 sa_del_ib_dfail :0000000000
sa_del_ib_ses_fail :0000000000 sa_del_ob :0000000000
sa_del_ob_nomem :0000000000 sa_del_ob_dfail :0000000000
check_ipsec_offload :0000000000 check_ipsec_offload_ok :0000000000
Comments:
3 different zones : commands, sessions, IPsec.
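Assuming these counters are cumulative (unlike the DCE table, no reset-on-read is mentioned), running the command twice and comparing the session counters gives the setup/teardown rate over the interval; a minimal sketch:
fnsysctl cat /proc/fsoc/npl/stats    (note ses_ins_total and ses_del_total)
... wait or pass traffic ...
fnsysctl cat /proc/fsoc/npl/stats    (the deltas are the sessions installed/deleted during the interval)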
● FortiGate-240D :
Port1 to port40 share the same uplink, so the total bandwidth is 1 Gbps. The other RGMII link is used for the DMZ ports (#278064).
The 240D has 4 RGMII links which are dedicated to WAN1, WAN2, ALL_LAN_PORTS, ALL_DMZ_PORTS respectively (#278064).
⇒ This contradicts the FG240D functional diagram, which shows a different distribution. It is not known which one is correct…
● #295622 : high packet drops when using a port at 100Mb connected directly to the NP4Lite RGMII. Use a port connected to the internal switch instead, to buffer the packets.
FortiGate-3700DX overview
In short, a 3700D with the addition of 2x FPGAs called TP2, used as an external extension of the NP6 service-group to accelerate GTP traffic and GRE tunneling.
The platform has also been used to work around the NP6 out-of-order issue, using the TP2 to do packet re-ordering.
Syntax:
diagnose tp2 status <dev_id>
diagnose tp2 register <dev_id>
diagnose tp2 xgmacstats <dev_id>
diagnose tp2 xgmacstatsclear <dev_id>
diagnose tp2 selcnt <dev_id>
diagnose tp2 selcntclear <dev_id>
diagnose tp2 update
top3 #437462 "Add IPSec Anti-Replay workaround based on 5.4.4 3700DX branch"
So far (as of 2017-07-18) there has been no deployment of the 3700DX for IPsec out-of-order, but the code exists.
With regard to GTP and GRE acceleration, another solution has been found with a regular 3700D in pure CPU mode (using kernel acceleration based on N-Turbo), with higher performance.
FortiCarrier (Carrier Grade NAT) overview
Work to be done.
In short, a unit based on FortiCore hardware dedicated to high-performance NAT. No NP6 involved.
Reference websites and documents
The following useful references related to Fortinet hardware acceleration are available:
● docs.fortinet.com
● Related publications
● Expert Academy 2016 : The back of the rack [ Fortinet internal / Fortinet official partners]
https://fortivision.fortinet.com/index.php?/topic/11786emeaexpertacademy2016thebackoftherack
Hardware acceleration and the NP6 processor
NextGen Firewall IPS/AppCtrl with NTurbo
ADVPN Configuring & troubleshooting