Juniper Netscreen Troubleshooting
NOTE: This Student Guide has been developed from an audio narration, so it uses conversational English. The purpose of this transcript is to help you follow the online presentation, and you may need to refer to the presentation itself.
Slide 1
2010 Juniper Networks, Inc. All rights reserved. | www.juniper.net | Proprietary and Confidential
Welcome to the Juniper Networks NetScreen 5000 Series Security Systems and ISG Series Troubleshooting eLearning module.
Course SERT-NS5000
Slide 2
Navigation
CONFIDENTIAL
Throughout this module, you will find slides with valuable detailed information. You can stop any slide with the Pause button to study the details. You can also read the notes by using the Notes tab. You can click the Feedback link at any time to submit suggestions or corrections directly to the Juniper Networks eLearning team.
Slide 3
Course Objectives
After successfully completing this course, you will be able to:
Distinguish between ISG Series and NS5000 Series hardware configuration and packet flow
Explain the importance of the ASIC functions
Describe First Path and Fast Path in packet flow
Differentiate between functions processed in the CPU versus PPU
Use and interpret debug commands unique to high end systems
Explain the workarounds for 3 typical troubleshooting examples
Slide 4
The High End Systems
Architecture
Packet Flow
ASIC Functions
Debug
Troubleshooting Examples
Slide 5
In this section we take a look at the high end systems: the ISG Series and the NetScreen 5000 Series.
Slide 6
Section Objectives
After successfully completing this section, you will be able to:
Identify the two high end system series
List the built-in modules and the interface cards in the platform
Identify the types of SPMs available with each of the three Management modules
Slide 7
ISG2000, ISG2000-IDP
First we have the ISG Series, the lower range of the high end systems, with the ISG1000 and the ISG2000. They can also provide IDP through the security module, which, as we are going to see, is provided as a built-in card.
Slide 8
We have built-in modules and also the interface cards. The built-in modules are the Management module, the Security module for the IDP, and the ASIC module. Then we have the interface cards. These come as four-port and eight-port Fast Ethernet (FE), two-port Gigabit Ethernet (GE), and four-port GE as well. The four-port GE card is available starting with ScreenOS 5.4, and the one-port ten-gigabit card is available starting with ScreenOS 6.1.
Slide 9
NS5400
We also have the NS5000 Series. These are in the higher range of the high end systems, and there are two chassis: one is the NS5200 and the other is the NS5400. The NS5400 has two more slots for the line cards.
Slide 10
Module  2G24FE  8G   8G2  2XGE  8G2-G4  2XGE-G4
MGT     YES     YES  NO   NO    NO      NO
MGT2    YES     YES  YES  YES   NO      NO
MGT3    NO      NO   NO   NO    YES     YES
What sorts of modules do we have for this platform? We have the Management modules and the Secure Port Modules (SPMs). There are three types of Management modules, referred to as Management 1, 2 and 3. For the SPMs, there is the two-gigabit, 24-port Fast Ethernet card (2G24FE). Then there is the eight-port gigabit card and a two-port ten-gigabit card.
With ScreenOS 6.1 we have the latest version of the eight gigabit and ten gigabit cards. We will see that in a subsequent slide.
In the table here, you see how they can be used. For Management 1 we can use the 24 port FE and the eight Gig 1 card. With Management 2, we can also use the eight Gig 2 card and the 10 gigabit card, and with Management 3, we can use only the newer generation of the eight Gig and the two port 10 Gig card.
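The compatibility rules just described can be written down as a small lookup. This is only a sketch in Python, not Juniper tooling: the names SPM_SUPPORT, spm_supported and modules_for are made up for illustration, and the SPM labels (2G24FE, 8G, 8G2, 2XGE, 8G2-G4, 2XGE-G4) are the ones used on the architecture slides later in the course.

```python
# Hypothetical lookup transcribed from the Slide 10 table; not part of ScreenOS.
SPM_SUPPORT = {
    "MGT": {"2G24FE", "8G"},
    "MGT2": {"2G24FE", "8G", "8G2", "2XGE"},
    "MGT3": {"8G2-G4", "2XGE-G4"},
}


def spm_supported(mgt: str, spm: str) -> bool:
    """Return True if the SPM type can be used with the Management module."""
    return spm in SPM_SUPPORT[mgt]


def modules_for(spm: str) -> list:
    """List the Management modules that accept a given SPM type."""
    return [mgt for mgt, spms in SPM_SUPPORT.items() if spm in spms]
```

For example, modules_for("2G24FE") gives the Management 1 and Management 2 entries, which matches the first column of the table.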
Slide 11
Section Summary
In this section, we:
Identified the two high end system series
Listed the built-in modules and the interface cards in the platform
Identified the types of SPMs available with each of the three Management modules
Slide 14
Architecture
Slide 15
Section Objectives
After successfully completing this section, you will be able to:
Differentiate between the ISG and NetScreen 5000 chassis
Use the commands get system path and get chassis
Slide 16
Architecture (1 of 11)
Why is the architecture important?
To understand the packet flow
Troubleshooting depends on it
These components are directly involved in the process
Debugging at the CPU level is not always enough
System behavior depends on the architecture
E.g., in ScreenOS 5.4, TCP SYN check is done in CPU on NS5000, but it's done in PPU on ISG
Why talk about the architecture? It's very important to understand packet flow in the system and to be able to troubleshoot it, because these components are directly involved in the process. Debugging in the CPU may not always be enough to find why a packet was dropped or why traffic is not processed as expected. System behavior also depends on the architecture: depending on the card or version being used, the behavior might be different. The example here is TCP SYN check, which is done in the CPU for the NetScreen 5000 Series, but for the ISG it's done in the PPU. We are going to see what the PPU is later in the course; for now, note that the PPU is inside the ASIC chip, so it's very important for us to understand.
Another example that shows that features depend on the architecture is the fact that AES encryption is done in the ASIC for GigaScreen3 and 4, which we will see when we look at the schematic.
Slide 17
Architecture (2 of 11)
Highlights
Use of ASIC chips to increase performance and throughput
ISG Series have GigaScreen3 ASIC
NS5000 Series have 3 different ASICs:
GigaScreen2: 2G24FE/8G SPM
GigaScreen3: 8G2/2XGE SPM
GigaScreen4: 8G2-G4/2XGE-G4 SPM
Let's cover some highlights concerning the architecture. The use of ASIC chips increases the performance and throughput, which is one great advantage of this platform. The ISG Series uses the GigaScreen3 ASIC.
The NetScreen 5000 Series has three different ASIC types, depending on the Secure Port Module used. They are listed on the slide. The GigaScreen4 is the latest one; it is used in combination with the Management3 card that we saw in the table in a previous slide.
Another important point is that the Management and Security modules have dual CPUs. One CPU processes the flow of traffic, and the other CPU performs tasks such as OSPF routing or other management tasks in the system.
Slide 18
Architecture (3 of 11)
ISG Chassis
ISG Series
Management Module
1 x GigaScreen3 ASIC in the ASIC module
ASIC module has direct connection with Management and Security Modules via PCI bus
Management and Security Modules have dual CPU
Security Module has additional FPGA (Field-Programmable Gate Array)
Let's look at the ISG Series. The basic structure consists of one ASIC module. At the bottom are the interface cards that connect to the ASIC module, and the ASIC connects to the security modules, which provide the IDP functionality. The ISG2000 can have three security modules and the ISG1000 can have two. Then there's the Management module. The security module also has an FPGA to help provide high throughput to the system.
Slide 19
Architecture (4 of 11)
ISG ASIC Module
Built-in 1 x GigaScreen3 ASIC
All I/O cards connect to backplane with dedicated paths to ASIC chip
Front End Processor FPGA chips interface between I/O and ASIC (2 in ISG2000 and 1 in ISG-1000)
[Diagram: ASIC module showing the control bus, SDRAM, and slots 1 to 4]
Let's look specifically now at the ISG ASIC module. That's the focus of our attention, because that's where we need to look when we are troubleshooting the platform. We have the GigaScreen3 ASIC and the I/O cards, and there is a data bus from each I/O card to the FPGA, which is a front-end processor. You can think of it as a switch that transfers packets from the I/O cards to the ASIC chip for processing.
Slide 20
Architecture (5 of 11)
ISG2000 Architecture
ISG-1000/2000 share a similar HW architecture
Single ASIC chip, FPGA chip, IO modules are separated with chip
[Diagram: top view of the ISG2000 chassis showing slot 3 (MGT module), slots 2-0 (security modules), and the I/O modules]
Here we see the chassis, looking at it from the top. On the left-hand side is the rear of the chassis and on the right-hand side is the front. In the front are the I/O modules. Then we see the ASIC module, then three empty slots for the security modules, and in the back, in slot three, the Management module.
Slide 21
Architecture (6 of 11)
ISG-1000 Architecture
ISG-1000/2000 share a similar HW architecture
Single ASIC, Switch Fabric FPGA, IO modules are separated with chip
[Diagram: ISG-1000 chassis showing the fan module, power supply module, 2 slots for security modules, the ASIC module, and slot 3 (Mgt module)]
The ISG1000 is very similar. Here the front is on the left side of the picture. We see the ASIC module; it's always the one that's closest to the I/O cards. Then there are two slots in the middle for the security modules. Here we see again slot 3 for the Management module. Finally, there's the power supply in the back of the chassis.
Slide 22
Architecture (7 of 11)
NS5000 Chassis
GigaScreen ASIC in the SPM
15Gbps switch fabric interconnecting SPMs
Dedicated bus for control
Dedicated bus for traffic to MGT module
MGT1 has one CPU
MGT2/MGT3 have 2 CPUs
[Diagram: NetScreen 5400 with one MGT module and three SPMs]
Next, let's look at the NetScreen 5000 in general. We have more capacity here. There are 3 SPMs that share the 15-gigabit switch fabric. The chassis has a dedicated bus for control and another dedicated bus for traffic to the Management module. Later we will show that when the SPM needs to send traffic to the Management module, that dedicated bus is used to avoid any congestion.
Management2 and Management3 cards have two CPUs, one for flow and one for tasks. On Management1, flow and tasks run on the same physical CPU and are separated in the software architecture.
Slide 23
Architecture (8 of 11)
NS5000 SPM (1)
ASIC chips reside in the SPMs Number and type of ASIC depend on the SPMs:
2G24FE: 1 x GigaScreen2
8G: 2 x GigaScreen2
8G2/2XGE: 2 x GigaScreen3
8G2-G4/2XGE-G4: 2 x GigaScreen4
Front End Processor FPGA chips interface between ASICs and backplane to MGT board/ASICs in other SPMs
Here you see the Secure Port Module of the NS5000. This would be the equivalent of the ASIC module that we saw for the ISG Series.
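To keep the SPM-to-ASIC relationship from the slide handy while troubleshooting, you could hold it in a small table. The following Python sketch simply transcribes the slide; SPM_ASICS and describe_spm are hypothetical names, not ScreenOS commands.

```python
# Hypothetical table transcribed from the "NS5000 SPM (1)" slide.
# Each entry is (number of ASIC chips, ASIC type).
SPM_ASICS = {
    "2G24FE": (1, "GigaScreen2"),
    "8G": (2, "GigaScreen2"),
    "8G2": (2, "GigaScreen3"),
    "2XGE": (2, "GigaScreen3"),
    "8G2-G4": (2, "GigaScreen4"),
    "2XGE-G4": (2, "GigaScreen4"),
}


def describe_spm(spm: str) -> str:
    """Return a one-line summary of the ASICs on a given SPM type."""
    count, asic = SPM_ASICS[spm]
    return f"{spm}: {count} x {asic}"
```

Knowing how many ASICs a card carries matters later, when we distinguish traffic that stays on one ASIC from traffic that crosses between two.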
Slide 24
Architecture (9 of 11)
NS5000 SPM (2)
[Diagram: 8G2-G4 SPM with two GigaScreen4 ASICs, three FPGAs, two I/O blocks, and the backplane connection]
Here you see there are two GigaScreen ASICs in each module. There are front-end processors that do the interconnection within the cards, between the different ASICs, and also to the backplane if the traffic needs to go to another SPM.
At the bottom you see the I/O interface. This can be one 10-gigabit port or four 1-gigabit ports.
Slide 25
ns5400-> get system | in product
Product Name: NetScreen-5400-II
isg2000-> get system | in product
Product Name: NetScreen-2000
ns5200-> get system | in product
Product Name: NetScreen-5200-II
nsisg1000-> get system | in product
Product Name: NetScreen-ISG1000
How do we check the hardware configuration? This simple command shows what product we are talking about: get system | in product.
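If you collect this output from several devices, a few lines of scripting can tell you which series each one belongs to. This is a minimal Python sketch under stated assumptions: the product names are the four shown on the slide, while the series labels and the function names product_name and series_of are this sketch's own.

```python
import re
from typing import Optional

# Product names as shown on the slide; the series labels are this sketch's own.
PRODUCT_SERIES = {
    "NetScreen-5400-II": "NS5000",
    "NetScreen-5200-II": "NS5000",
    "NetScreen-2000": "ISG",
    "NetScreen-ISG1000": "ISG",
}


def product_name(output: str) -> Optional[str]:
    """Extract the product name from 'get system | in product' output."""
    match = re.search(r"Product Name:\s*(\S+)", output)
    return match.group(1) if match else None


def series_of(output: str) -> str:
    """Map the captured output to a series label, or 'unknown' if not listed."""
    return PRODUCT_SERIES.get(product_name(output), "unknown")
```

Note that the ISG2000 reports itself as NetScreen-2000, so matching on the string "ISG" alone would miss it.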
Slide 26
isg2000(M)-> get chassis
Chassis Environment:
Power Supply: Good
Fan Status: Good
CPU Temperature: 113'F ( 45'C)
Slot Information:
Slot  Type          S/N               Assembly-No  Version  Temperature
0     System Board  0079022005000207  0051-005     E01      78'F (26'C), 86'F (30'C)
4     Management    0081022005000392  0049-004     D06      113'F (45'C)
3     Security      0137062005000114  0049-001     A02      cpu1:Ready, cpu2:Ready
5     ASIC Board    000140527B050065  0050-003     C00
Marin FPGA version 9, Jupiter ASIC version 1, Fresno FPGA version 110
I/O Board
Slot  Type           S/N               Version  FPGA version
1     1 port XFP     0229062008000062  A00      3
2     4 port 10/100  0084042004000002  D01      6
3     1 port XFP     0229062008000070  A00      3
If you want to see details, use the command get chassis. Here you see an example first for a NetScreen 5400. Management3 is the card being used, and there is one ten-gig module and one eight-gig module, in slots two and three in this notation. You can see the serial number for each card, the assembly number, the temperature and the DRAM size.
At the bottom, the other output is for the ISG2000. Here also is a management board, but additionally there is the security module, and then the ASIC module as was shown in the schematic and also the I/O cards. Also in the middle you can see the FPGA version information. Jupiter is the internal name of the ASIC and Fresno is the internal name of the FPGA. Those were the names used when the command was run.
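When you compare several chassis, it can help to pull the FPGA and ASIC versions out of the get chassis output automatically. The sketch below is one possible way to do it in Python; component_versions is a hypothetical helper, and it assumes the output has one record per line, as the CLI prints it.

```python
import re


def component_versions(output: str) -> dict:
    """Collect '<name> FPGA|ASIC version <n>' entries from get chassis output."""
    pattern = re.compile(r"(\w+) (FPGA|ASIC) version (\d+)")
    return {
        f"{name} {kind}": int(version)
        for name, kind, version in pattern.findall(output)
    }


# Version line copied from the ISG2000 example on the slide.
sample = "Marin FPGA version 9, Jupiter ASIC version 1, Fresno FPGA version 110"
versions = component_versions(sample)
```

The result maps the internal names (Marin, Jupiter, Fresno) to their version numbers, so two devices can be compared with a simple dictionary comparison.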
Slide 27
Section Summary
In this section, we:
Differentiated between the ISG and NetScreen 5000 chassis
Showed how to use the commands get system path and get chassis
Slide 30
Packet Flow
Slide 31
Section Objectives
After successfully completing this section, you will be able to:
Explain the difference between packet flow in First Path and Fast Path
Describe packet flow in the NS5000 and ISG Series platforms
Identify packet types that need to be processed at the CPU level
Slide 32
Packet Flow (1 of 6)
NS5000 First Path: CPU is involved in processing
[Diagram: packet path on an 8G2-G4 SPM, from the I/O interface through the FPGA and GigaScreen4 ASIC, across the backplane to the MGT3 CPUs]
4) ASIC checks the packet and forwards it to CPU
5) CPU processes the packet and sends it back to ASIC
6) ASIC forwards the packet to FPGA
7) FPGA forwards packet to interface chip
8) Interface chip sends the packet out
Let's now look at Packet Flow. We want to show how packets go through the different components so you know what to look for when you are troubleshooting. We will start with the NetScreen 5000. The example here is for the First Path. The First Path is when the CPU is involved in processing the packet. We call it First Path because this process is most commonly used when there is a packet for a new session. A new session is always created in the CPU, so the ASIC needs to forward traffic to the CPU for processing.
You see the packet at the bottom, at step number 1. The packet arrives at the interface chip, then it goes to the FPGA, and the FPGA forwards it to the ASIC that's directly connected to the FPGA. The ASIC looks at the packet and determines that it needs to be sent to the CPU. It sends it to the CPU via the backplane, and the CPU does the processing. Let's say it creates the session; the session is installed in the ASIC chip, and the CPU sends the packet back to the same ASIC chip. The ASIC then matches the packet to the now-existing session and sends it out. At that point the FPGA gets the packet and forwards it to the correct outgoing interface. The packet goes to the interface and then leaves the system.
Slide 33
Packet Flow (2 of 6)
NS5000 First Path: CPU is involved in processing
First packet for session creation
Packets that need ALG/DI/Web Filtering Packets for the following protocols need to be processed by CPU:
0: IPv6 Hop-by-Hop Option
1: ICMP
2: IGMP
4: IP-in-IP
58: ICMPv6
89: OSPF
103: PIM
112: VRRP
132: SCTP
Here are some more details about the First Path. To repeat, first it's for session creation. When a packet doesn't match any existing flow, it has to be sent to the CPU for session creation. Also, with Application Layer Gateway (ALG) inspection, Deep Inspection (DI) or Web Filtering, the content of the packet needs to be inspected. For example, with the FTP ALG the control connection needs to be inspected so that the dynamic ports can be opened properly by the firewall. There are other packets that also need to be processed at the CPU level, mainly ICMP, IGMP, OSPF, PIM, VRRP, SCTP and so on.
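The three conditions that send a packet to the CPU can be summarized in one small predicate. This is a simplified Python sketch of the rules on the slide, not how ScreenOS implements the decision; CPU_PROTOCOLS and needs_cpu are hypothetical names.

```python
# IP protocol numbers listed on the slide as needing CPU processing.
CPU_PROTOCOLS = {
    0: "IPv6 Hop-by-Hop Option",
    1: "ICMP",
    2: "IGMP",
    4: "IP-in-IP",
    58: "ICMPv6",
    89: "OSPF",
    103: "PIM",
    112: "VRRP",
    132: "SCTP",
}


def needs_cpu(protocol: int, session_exists: bool, needs_inspection: bool = False) -> bool:
    """True when the packet takes the First Path.

    That is the case when there is no session yet, when the session is
    flagged for content inspection (ALG/DI/Web Filtering), or when the
    IP protocol is one of those listed on the slide.
    """
    return (not session_exists) or needs_inspection or protocol in CPU_PROTOCOLS
```

So a TCP packet (protocol 6) on an established session without inspection stays in the ASIC, while an OSPF packet (protocol 89) goes to the CPU regardless.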
Slide 34
Packet Flow (3 of 6)
NS5000 Fast Path: CPU is not involved: packet matches session
[Diagram: packet path on an 8G2-G4 SPM, from the I/O interface through the FPGA to the GigaScreen4 ASIC and back out, without crossing the backplane to the MGT3 CPUs]
4) ASIC checks the packet, matches session and forwards it back to FPGA
5) FPGA forwards packet to interface chip
6) Interface chip sends the packet out
Now that we have considered the First Path, which requires CPU help to process the traffic, let's check the Fast Path. It is called Fast Path because the CPU doesn't get involved. The GigaScreen ASIC is capable of processing the flow and avoids burdening the CPU. The packets are processed at the ASIC level, and that's how we get very high throughput with this system.
Let's look at how the packet flows. It first arrives at the interface chip and goes to the FPGA, and then the GigaScreen ASIC checks the packet against the session table. The session lookup engine matches the session and identifies the outgoing interface, and the ASIC sends the packet back to the FPGA. The FPGA then forwards it to the interface port, and it is sent out.
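To make the difference between the two paths concrete, here is a toy model in Python. It is only an illustration of the idea (a session miss goes to the CPU, which installs the session; a session hit stays in the ASIC). The classes, the toy_cpu function and the interface name ethernet2/2 are all invented for this sketch and do not reflect the real hardware lookup engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowKey:
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


@dataclass
class Asic:
    """Toy model of an ASIC session table; not the real lookup engine."""
    sessions: dict = field(default_factory=dict)  # FlowKey -> egress interface

    def forward(self, key, cpu):
        if key not in self.sessions:
            # First Path: no session matched, so the CPU creates one and
            # it is installed in the ASIC before the packet is sent out.
            self.sessions[key] = cpu(key)
            return ("first-path", self.sessions[key])
        # Fast Path: the session matched in the ASIC, CPU not involved.
        return ("fast-path", self.sessions[key])


def toy_cpu(key):
    """Stand-in for policy and route lookup; always picks one egress interface."""
    return "ethernet2/2"
```

Calling forward twice with the same FlowKey returns "first-path" the first time and "fast-path" the second time, which is the behavior you expect to see reflected in the CPU and ASIC counters.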
Slide 35
Packet Flow (4 of 6)
ISG2000-IDP First Path: Traffic is sent from CPU to Security Module
[Diagram: ISG2000 ASIC module with GigaScreen3, SDRAM, FPGAs, data buses and slots 1 to 4, plus the Management Module (MM) CPUs and the Security Module (SM) CPUs]
1) Packet arrives at interface card
2) Packet is forwarded to FPGA
3) FPGA forwards it to ASIC
4) ASIC checks the packet and forwards it to CPU (pass 96 bytes to MM via PCI control bus)
5) CPU processes the packet and sends it to ASIC
6) ASIC receives the packet and forwards it to IDP (A complete packet is transferred to SM through Data Bus)
7) IDP processes the packet and sends it to ASIC
8) ASIC sends packet to FPGA
9) FPGA forwards packet to interface chip
Next, we check the First Path for the ISG2000 with the IDP security module. Let's see how the packet flows in this case. We start at the same point: the packet arrives at the interface card and then goes via the data bus to the FPGA. The FPGA sends it to the ASIC chip; the ASIC chip checks the session table and does not find a match, so in this example it sends the packet to the Management module for session creation. If it's ALG traffic, the session actually is matched, but it has a flag that says this packet needs to go to the CPU for further inspection. The packet is then processed and sent back to the GigaScreen ASIC. If the security module also needs to inspect the traffic, the ASIC gets the packet and sends it to the security module. At this point the whole packet, with all of its content, is sent to the security module, because the security module needs to receive all the data to be able to inspect it. The packet is inspected and goes back to the GigaScreen ASIC, and finally it goes out to the interface.
Slide 36
Packet Flow (5 of 6)
ISG2000-IDP Fast Path: Traffic is sent directly to Security Module
[Diagram: ISG2000 ASIC module with GigaScreen3, FPGAs, data buses and slots 1 to 4, plus the Management Module (MM) CPUs and the Security Module (SM) CPUs]
1) Packet arrives at interface card
5) IDP processes the packet and sends it to ASIC
6) ASIC sends packet to FPGA
7) FPGA forwards packet to interface chip
8) Interface card sends the packet out
How does it work in the case of Fast Path? The CPU is not involved, but the security module still has to inspect the traffic. Again, the packet goes to the FPGA and then to the GigaScreen ASIC. This time it goes straight to the security module, with no CPU involvement. Then the packet is processed and sent back. The GigaScreen ASIC identifies the outgoing interface and sends the packet out through the FPGA, then to the interface card, and then out of the system.
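One way to keep the two ISG2000-IDP paths straight is to write out the sequence of components a packet visits. The Python sketch below does that from the narration; isg_idp_hops is a hypothetical helper and the hop labels are informal, with MM standing for the Management module and SM for the Security Module.

```python
def isg_idp_hops(session_exists: bool) -> list:
    """Return the ordered components a packet visits on an ISG with IDP enabled."""
    hops = ["interface card", "FPGA", "ASIC"]
    if not session_exists:
        # First Path: the ASIC hands the packet to the Management module
        # CPU, which processes it and returns it to the ASIC.
        hops += ["MM CPU", "ASIC"]
    # With IDP inspection, the complete packet visits the Security Module
    # on both paths before leaving through the FPGA.
    hops += ["SM (IDP)", "ASIC", "FPGA", "interface card"]
    return hops
```

Comparing the two lists shows that the only difference is the extra round trip to the Management module CPU on the First Path.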
Slide 37
Packet Flow (6 of 6)
What are the possible paths?
NS5000: Single-ASIC or Cross-ASIC
ISG2000: Always single-ASIC; Single FPGA or Dual FPGA
ISG-1000: Always single-ASIC/single-FPGA
[Diagram: 8G2-G4 SPM with two GigaScreen4 ASICs and three FPGAs, and the ISG ASIC module with GigaScreen3, SDRAM, two FPGAs, and slots 1 to 4]
Let's summarize the packet flow now by thinking of the possible paths. First, consider the NetScreen 5000, which can use what we refer to as Single-ASIC or Cross-ASIC. Single-ASIC is when the incoming traffic goes this way and the return traffic goes out this way, out of the same ASIC chip.
Then we have Cross-ASIC; it's going to be this way. For example, incoming traffic goes here, then the return traffic goes this way. When the traffic comes from the other side, it comes in here, on the other interface set. It goes to this ASIC for processing, and this ASIC processes the packet and sends it this way. Thus we have Cross-ASIC.
For the ISG, it's always Single-ASIC because the ASIC module has just one chip, so in this case we think in terms of the FPGA. We can have traffic coming in and going out of the same FPGA, or we can have traffic coming into the top FPGA and going out of the bottom FPGA. This is important when we look at the output, so that we know which FPGA to check and what to expect when we look at the counters.
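The path types come down to comparing where the traffic enters and where it leaves. As a rough Python sketch, assuming you already know which ASIC or FPGA serves the ingress and egress interfaces (the numeric identifiers and function names here are hypothetical):

```python
def ns5000_path(ingress_asic: int, egress_asic: int) -> str:
    """NS5000: same ASIC chip is Single-ASIC, otherwise Cross-ASIC."""
    return "single-ASIC" if ingress_asic == egress_asic else "cross-ASIC"


def isg2000_path(ingress_fpga: int, egress_fpga: int) -> str:
    """ISG2000 has one ASIC, so only the FPGA pairing varies."""
    return "single-FPGA" if ingress_fpga == egress_fpga else "dual-FPGA"
```

On the ISG-1000 there is only one FPGA, so the answer is always single-FPGA.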
Slide 38
Section Summary
In this section, we:
Explained the difference between packet flow in First Path and Fast Path
Described packet flow in NS5000 and ISG Series platforms
Identified packet types that need to be processed at the CPU level
Slide 41
ASIC Functions
Slide 42
Section Objectives
After successfully completing this section, you will be able to:
Differentiate between functions performed in the CPU versus those done in the ASIC chip and PPU
Use the get ASIC PPU command to see which functions are processed by each PPU
Slide 43
ASIC Functions (1 of 3)
ASIC benefits: Increase Performance and Throughput
FAST PATH: Traffic forwarding without using CPU
VPN Encryption and Decryption (AES, 3DES, DES, SHA-1, MD5)
TCP 4-Way close
IP fragmentation re-assembly
Screening
IPSec fragmentation and re-assembly with IKE acceleration
Byte counters / data collection from local session memory
IPv6 acceleration
Let's now look at the ASIC functions to see what the ASIC is doing. The most important objective is to increase the performance and throughput of the system. One of the benefits the system has is Fast Path. This enables the system to handle traffic forwarding without using the CPU, as we saw in the packet flow.
VPN encryption and decryption is also done in the ASIC chip, so it doesn't increase CPU utilization. The ASIC can also process TCP 4-way close, do fragmentation re-assembly, and handle some screen functions, such as UDP flood, SYN flood and ICMP flood.
It can also perform IPsec fragmentation and re-assembly with IKE acceleration. Additionally, it can provide byte counters for the policy and acceleration for IPv6 traffic, so IPv6 traffic is also processed at the ASIC level without going to the CPU.
Slide 44
ASIC Functions (2 of 3)
Packet Processing Units (PPU)
Packet Processing Units (PPU) provide additional processing capacity in ASIC level
Provide additional processing power for ASIC chip
PPU features:
Defragmentation (cleartext and encrypted)
TCP SYN check
SYN proxy
SYN cookie
TCP 4-way close
IPv6 acceleration
HA packet forwarding (ISG)
Interface with IDP Security Module (ISG)
DSCP copy
Policy counters
One important part of this architecture in the ASIC chip is the PPU; the packet processing unit. It gives additional processing capacity at the ASIC level. It is an entity that can be programmed to do different things. The features that are supported in this PPU are listed in this slide.
It can perform defragmentation for both clear-text and encrypted traffic. It can perform TCP SYN check, SYN proxy and SYN cookie, TCP 4-way close and IPv6 acceleration, as shown previously. It also does HA packet forwarding in the case of the ISG, and it interfaces with the IDP security module in the ISGs. It can also perform DSCP copy for QoS and maintain policy counters to count the number of bytes.
Slide 45
ASIC Functions (3 of 3)
How to check PPU functions
Total of 6 PPUs in GigaScreen3 and 4
Example for ScreenOS 6.3
ns5400(M)-> get asic ppu functions
PPU and XTCPU functions:
Defragmentation of encrypted packets: PPU-A
Defragmentation of clear-text packets: PPU-C
Syn-proxy function: PPU-B
Tcp-3way-check function: PPU-B sdram
HA and IDP packet forwarding: PPU-D
IDP processing: PPU-E
Syn-cookie function: PPU-F
IPV6 flow processing: PPU-A
IPV6 tunnel processing: PPU-C and PPU-D
IPV6 parser: PPU-E
Use get asic # eng ppu functions for ScreenOS 5.4 and earlier
How do you check these functions in the system? It's simple with this command: get asic ppu functions. If you run this command, you can see which function each PPU handles. We have six PPUs in GigaScreen3 and 4, the latest models, named PPU-A to PPU-F. In this example for ScreenOS 6.3, the SYN cookie function is processed by PPU-F. Another example highlighted here: defragmentation of clear-text packets is done by PPU-C.
These functions might change depending on the version, because of different features that were included. You can check using this command.
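Because the assignment of functions to PPUs can differ between versions, it may be useful to parse the output and compare it across devices. The following Python sketch shows one way to do that; parse_ppu_functions is a hypothetical helper, and the sample lines are copied from the ScreenOS 6.3 example on the slide.

```python
import re


def parse_ppu_functions(output: str) -> dict:
    """Map each function name to the list of PPUs shown for it."""
    functions = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        ppus = re.findall(r"PPU-[A-F]", value)
        # Skip the prompt line and the heading, which list no PPU after a colon.
        if sep and ppus:
            functions[name.strip()] = ppus
    return functions


sample = """PPU and XTCPU functions:
Defragmentation of clear-text packets: PPU-C
Syn-cookie function: PPU-F
IPV6 tunnel processing: PPU-C and PPU-D"""

ppu_map = parse_ppu_functions(sample)
```

With the map in hand, a question such as "which PPU handles SYN cookie on this version?" becomes a dictionary lookup.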
Slide 46
Section Summary
In this section, we:
Differentiated between functions performed in the CPU versus those done in the ASIC chip and PPU
Used the get ASIC PPU function to see which functions are processed by each PPU
Slide 49
Debug
Slide 50
Section Objectives
After successfully completing this section, you will be able to:
Review general commands used in ScreenOS
List the most important commands specific to high end systems
Explain how to collect the data and interpret the output
Run debug tag info when looking for problems related to CPU
Slide 51
Debug (1 of 49)
What are the troubleshooting commands?
Same get/debug commands from ScreenOS
Additional commands to troubleshoot different components in the system
Different commands depending on platform/card type
Different outputs depending on card type/ScreenOS version
In ScreenOS 6.2 and 6.3 the commands are visible and documented
Let's now discuss debugging and the commands that are used to troubleshoot the platform.
The first thing to note is that we have the same get and debug commands as in ScreenOS. That's going to help us here. But we are also going to see additional commands specifically for this platform. In ScreenOS 6.2 and 6.3, the latest versions, these commands are visible in the command line interface. In earlier versions they are hidden, but you can execute them as normal.
Course SERT-NS5000
52
Slide 52
Debug (2 of 49)
Common commands in ScreenOS
General information:
get tech
get log system
get log system saved
get event
Performance:
get performance cpu all detail
get performance session detail
Session Information:
get session info
get session frag
get session
The first set of commands consists of general commands that we use in ScreenOS. To check general information, we use get tech, get log system, get log system saved and get event. For performance, we use get performance cpu all detail and get performance session detail. For session information, we use get session info, and for fragmentation counters and processing we use get session frag. The get session command shows the complete session table, which you can use to investigate the data. You can also run the session analyzer using the get session output.
Slide 53
Debug (3 of 49)
Common commands in ScreenOS
Interface and Screening statistics:
get counter stat
get pps * (if ScreenOS 6.1 and later)
get zone <zone> screen counter
There are also some other things to check: interface and screening counters. First, check with get counter stat. You can use the packets per second (PPS) counters as well if you enable them with check PPS. You can check screen counters with get zone screen counter, which is the command to look at if you suspect attacks such as floods.
For memory and internal resources, use the command get net-pak s. For statistics, use get gate, get pport, get tcp and get flow. These provide general information about how the system is allocating resources.
Slide 54
Debug (4 of 49)
Additional Commands for High End Systems
get session hardware
Displays the hardware sessions installed in the ASIC chip
Now we come to what's really special about this platform. These are the most important commands we are going to cover, and they are the ones most commonly used in troubleshooting.
The command get session hardware shows the session table on the ASIC chip itself. Sometimes there may be a problem where the session table in the CPU is not the same as the one in the ASIC chip, and we can use this output to compare them. With the command get sat counters you see the read and write pointers that are used for the queues. There are different queues in the ASIC, and it's very important to see the state of the queues: whether they are full or free, and whether packets are being dropped, in which case you look for queue full.
Then there's get sat demux. This is important because it lets you see packets going to the CPU and packets dropped by the screening function. Then there's get sat frq1, which shows the free buffer queue. This is basically to see how the packet buffers are being used.
With get sat x-context you see the output of some memory tables, and also some reset counters that are important.
Slide 55
Debug (5 of 49)
Additional Commands for High End Systems
get arp asic <asicnumber>
Displays the ARP entries in an ASIC
This second set of commands is also specific to high end systems. With get sat session we see how sessions are allocated in the hardware, in the chip. With get ARP ASIC, we see the ARP entries in the ASIC chip. You can also use get ASIC demux. It's the same as get sat demux, but it gives information for the whole system.
If you have a NetScreen 5000 with three cards, you have six ASIC chips. When you use get ASIC demux, you see the counters for all of them in aggregate.
Then we have the command get ASIC PPU to check how the PPU is performing. Use get ASIC PPU defrag for defragmentation and get ASIC PPU SYN-cookie for the SYN cookie feature.
Slide 56
Debug (6 of 49)
Additional Commands for High End Systems
get asic ppu syn-proxy
Displays statistics for syn-proxy Screening feature (SYN flood)
The get ASIC PPU SYN-proxy command displays statistics for the SYN-proxy screening feature (SYN flood); get ASIC PPU TCP 3-way check displays statistics for the TCP SYN check feature.
Use get ASIC PPU ipv6 for IPv6 traffic acceleration in the PPU. The command get ASIC PPU HA-IDP fwd is used to display HA or IDP forwarding in the ISG. In the ISG the PPU can do the HA forwarding and also send packets to the security module.
If you run get ASIC PPU IDP, you also get counters for the packets sent or received by the IDP security module.
Then there's a debug command, debug tag info. This is very useful when you need to see what's going to the CPU. You can run this command to see the packet tags that go to the CPU for processing.
Slide 57
Debug (7 of 49)
Specific Commands per Platform
NS5000-2G24FE
get michigan
Displays specific information for front end processor in 2G24FE card
NS5000-8G2/2XGE/8G2-G4/2XGE-G4
get arch
Displays counters for front end processor in the SPMs using GigaScreen3 and 4
ISG
get fresno
Displays counters for front end processor in the ASIC module
Let's go ahead and look at the specific commands for each platform as well. If you have the 24 FE card, you use get michigan. If you have an 8 Gig or 10 Gig card, you use get arch, and if you have an ISG, you use get fresno. These commands are for the different FPGA chips that exist in each platform, so you use a different command for each FPGA.
Slide 58
Debug (8 of 49)
Commands to Collect
NS5000 with 2G24FE SPM
get sat <asicnumber> d
get sat <asicnumber> x-c
get sat <asicnumber> fr
get sat <asicnumber> c
get sat <asicnumber> s
get arp asic <asicnumber>
get michigan <slotnumber> count
get michigan <slotnumber> igmac
This is a simple example of the commands to collect. Here are the commands that you'd use for the NetScreen 5000 with the 24 FE card.
Slide 59
Debug (9 of 49)
Commands to Collect
NS5000 with 8G2/2XGE/8G2-G4/2XGE-G4 SPM
get asic demux (if 6.0r2 or later)
get sat <asicnumber> d
get sat <asicnumber> x-c
get sat <asicnumber> fr
get sat <asicnumber> c
get sat <asicnumber> s
get arp asic <asicnumber>
get arch <slotnumber>
Here you see example commands for the 8 Gig or 10 Gig card. The get sat and get ASIC commands are always common, but now we use get arch instead of get michigan.
Slide 60
In the ISG we use get fresno. In the ISG1000 there is only get fresno 0, since there is only one FPGA.
Slide 61
How:
Copy/paste commands in console session
Script in ScreenOS (if 6.0 or later)
Script in external tool
Now the question is how to collect this output. You know the commands, but you need to know how to actually collect them. The tip here is that most counters are absolute, so they keep incrementing from run to run and a single value tells you little. The idea is to run the commands five times during a 30 second interval, so that later you can check the delta between each output and see whether a counter is still incrementing or not.
You may see some counter with a very high number, but it could be that it's not incrementing anymore. That's why we run the commands a few times, usually five. How do you do that? You can copy and paste in the session (console, Telnet or SSH), or you can create a script in ScreenOS itself. Alternatively, you can use an external tool to connect to the firewall and execute the commands.
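To make the delta idea concrete, here is a minimal sketch in Python of the comparison you would do on the collected outputs. It assumes you have already extracted the counters from each run into a dictionary; the counter names and values are hypothetical and are not real device output, and the function names are only for illustration.

```python
from typing import Dict, List


def counter_deltas(snapshots: List[Dict[str, int]]) -> Dict[str, List[int]]:
    """Return, for each counter, the change between consecutive snapshots."""
    deltas: Dict[str, List[int]] = {}
    for earlier, later in zip(snapshots, snapshots[1:]):
        for name, value in later.items():
            if name in earlier:
                deltas.setdefault(name, []).append(value - earlier[name])
    return deltas


def still_incrementing(deltas: Dict[str, List[int]]) -> List[str]:
    """Return the counters that grew at least once during the collection window."""
    return sorted(name for name, steps in deltas.items() if any(step > 0 for step in steps))


# Hypothetical counter values from five runs of the same command.
samples = [
    {"queue_full": 9000, "ttl_zero": 12},
    {"queue_full": 9000, "ttl_zero": 15},
    {"queue_full": 9000, "ttl_zero": 15},
    {"queue_full": 9000, "ttl_zero": 21},
    {"queue_full": 9000, "ttl_zero": 30},
]

deltas = counter_deltas(samples)
print(deltas)
print(still_incrementing(deltas))
```

In this made-up data the large queue_full value never changes across the five runs, so it is historical, while the small ttl_zero counter is the one that is actually moving.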
Slide 62
There is one thing to know about the NetScreen 5000: how do you know the exact numbers to put in the command? In this case, get chassis shows that the slot number is 4, so the command is get arch 2, because we need to subtract two from the slot number. For the ASIC number, we always use zero for the ISG because there is only one, but for the 5000 Series we have to use get ASIC mapping, which lets you easily see which ASIC you need to check. Let's say you have a problem with Ethernet 4/1; then you go check ASIC 4.
Slide 63
get sat 0 c
get sat 0 s
get arp asic 0
get sat 1 d
get sat 1 x-c
get sat 1 fr
get sat 1 c
To summarize, here is an example. This is a NetScreen 5400 with an 8 Gig card and a 10 Gig card, and the ASIC numbers are 0, 1, 4 and 5. This means there is one card in slot zero and one card in slot two. Here are the commands to run to get the data for the whole system. The get ASIC PPU and get ASIC demux commands are common, so you run them only once. The get sat and get arp commands have to be run for each ASIC.
Slide 64
get arp asic 4
get sat 5 d
get sat 5 x-c
get sat 5 fr
get sat 5 c
get sat 5 s
get arp asic 5
KB13216 - How to troubleshoot ASIC issues on Juniper Firewalls: NS5000 and ISG Series
The get arch command is run for each card, so get arch 0 and get arch 2. Refer to Knowledge Base document KB13216 for a more detailed explanation, as well as other examples.
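If you collect this data with an external tool, the command list for the whole system can be generated rather than typed. The following is a minimal sketch in Python, assuming the per-ASIC options from the slides (d, x-c, fr, c, s) and the subtract-two rule for get arch. The function names are hypothetical, rejecting slots below 2 is an assumption, and the chassis slots 2 and 4 are inferred from get arch 0 and get arch 2. The get ASIC PPU commands are left out because which ones you need depends on the features in use.

```python
from typing import List

# Per-ASIC "get sat" options as listed on the Commands to Collect slides.
PER_ASIC_OPTIONS = ["d", "x-c", "fr", "c", "s"]


def arch_index(chassis_slot: int) -> int:
    """Map a chassis slot number to the 'get arch' number (slot minus two)."""
    if chassis_slot < 2:
        # Assumption: slots below 2 do not hold a card that answers to 'get arch'.
        raise ValueError("no 'get arch' number for this slot")
    return chassis_slot - 2


def collection_commands(asic_numbers: List[int], chassis_slots: List[int]) -> List[str]:
    """Build the list of commands to run once per collection pass."""
    commands = ["get asic demux"]  # system-wide, so it is run only once
    for asic in asic_numbers:
        commands += [f"get sat {asic} {option}" for option in PER_ASIC_OPTIONS]
        commands.append(f"get arp asic {asic}")
    commands += [f"get arch {arch_index(slot)}" for slot in chassis_slots]
    return commands


# The NetScreen 5400 example: ASICs 0, 1, 4 and 5, cards in chassis slots 2 and 4.
for command in collection_commands([0, 1, 4, 5], [2, 4]):
    print(command)
```

Running the whole list five times during the 30 second window gives you the outputs needed to compare deltas.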
Slide 65
Now that you have seen how to collect the data, even more importantly, you need to see how to interpret this output. It's very important that you know what you are looking at. The get ASIC demux and get sat demux commands provide similar output. Here you see the packets going to the CPU. In the right-most column you can see the PPS count, the packets per second. This is the most important thing you need to check in this output. The slide is highlighted to show there are 5000+ packets going to the CPU per second.
This is something we consider very important when we are looking at performance problems. For example, if we are seeing high CPU in the system, we want to know why. We can run this command to see how many packets per second are going to the CPU. Then you can understand whether that is expected or whether it is overloading the system, and you can decide what to do next.
We also see here a breakdown of the packets that go to the CPU. They can be packets to the host or packets for the First Session. In this case, most of the packets going to the CPU are for First Session, so they are packets that don't match any session in the ASIC chip and were sent to the CPU for further processing.
Here we also see the counters for packets that were dropped, such as ttl_zero, invalid source address, TCP checksum error, or UDP checksum error. These were all packets that were dropped.
Slide 66
Why is it important?
Troubleshooting of high CPU issues
What to do next?
Determine if the pps observed is expected or solve problem in the network to reduce the load
Investigate the type of packet that is going to CPU with high pps
We look at the PPS counters to understand what's going to the CPU. This is important for seeing whether there's an attack, or why the traffic is going to the CPU.
Slide 67
Then there is get ASIC PPU defrag, which you use to check statistics about fragmentation. What is important here is to check the new session error and defrag fail counters. Usually, when there is a problem with defragmentation, that's where the counters increment.
Slide 68
Why is it important?
Fragmented traffic may be getting dropped
Detect fragmentation in the network
What else can you do in this case? You can check whether you really expect this fragmentation. Do you want this fragmented traffic in the network?
Slide 69
Next, you can check the get session frag output for the fragmentation counts, to see how many packets arrived as a first fragment or without a first fragment. Fragments that couldn't be reassembled can also be checked with this command.
You can also correlate the data with the other ASIC commands to help pinpoint the issue, and you can do some packet captures. You want to see whether the firewall really received all the fragments that were sent to it. Maybe the firewall is not receiving all the fragments.
Then you can also tweak the policy configuration: set no hardware session to see if that solves the problem. When you do that, you bypass the PPU defragmentation processing, and you can possibly isolate the issue.
Slide 70
Similarly, you can use get ASIC PPU TCP-3-way check. Most important here are total drop and invalid session count. This helps you understand how the ASIC is processing the 3-way handshake. You can see here there is a total drop of three in ASIC one, and ASIC two shows receive stage five, also with a count of three.
Slide 71
What is important?
Find out if there are dropped packets
Why is it important?
TCP sessions are not being established due to TCP SYN check
TCP SYN check feature is faulty
This is an example of a problem where the TCP 3-way check was not working properly when the session involved two ASIC chips. The packet was being dropped by one chip while the other was waiting at stage 5.
Slide 72
What else can you check with this output? You can try to understand the condition: is it all TCP traffic, or is it a specific source, destination, or service? In the problem we looked at, traffic was going through both ASIC chips, so it was a special case.
Also check whether there is asymmetric traffic, where only one direction of the flow goes through the firewall. This could be something that's having an influence.
Also check the other ASIC commands. Look at the data not only from one output but as a whole. One action you can take is to disable TCP SYN check to see if that helps.
You can use get session ID to see the status of the session, whether it's going normally or not completing properly.
Slide 73
The other command is get ASIC PPU SYN-cookie. It's the same idea; the most important things to check are VLAN check fail and invalid ack.
Slide 74
What is important?
Find out if there are packets dropped by SYN cookie feature
Why is it important?
Unable to pass TCP traffic
Network under attack
Slide 75
Do we have a SYN flood attack, and do we have the proper settings for SYN flood protection? We can also disable the feature for troubleshooting to see if that avoids the problem. Typically, if you see packet drops, you can disable it and check again.
Slide 76
For SYN-proxy, the counter you usually check is unexpected packet drop, which tells you if there is a problem.
Slide 77
Why is it important?
Packets are being dropped due to SYN Proxy
SYN Proxy feature is being triggered
What to do next?
Determine if SYN flood thresholds are expected
Check syn cookie counters if enabled
Determine whether there is a SYN flood attack
Disable SYN Proxy to see if the problem is solved
Check other ASIC commands
We can look further at SYN flood attacks. Look at the configuration to see if the threshold is as expected, and look at the traffic to see if the load is expected or if it may be some kind of attack.
Slide 78
Now let's go to get sat counters. This is also a very important command, because here you look at the status of the queues. Each line is one queue in the ASIC chip, and the queues send packets to each other. In the example, the session lookup queue is highlighted, with a high queue full count. You need to look at the queue full count to see if it is incrementing. Queue full means the queue has reached capacity and cannot accept any more packets, so packets can be dropped because the queue was full and couldn't receive them.
It's also important to check the full column: if this is 1, the queue is currently full and may block all the traffic. If the queue is full all the time, it will block the traffic all the time. We'll see that in an example further on.
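The two checks described here, the queue full count and the full column, can be combined into a simple rule once you have several samples of the same queue. This is a minimal sketch in Python; the tuple layout, the function name and the labels "stuck", "dropping" and "ok" are assumptions made for illustration, not ScreenOS terminology.

```python
from typing import List, Tuple


def classify_queue(samples: List[Tuple[int, int]]) -> str:
    """Classify one queue from repeated samples of (full flag, queue full count).

    Samples are ordered oldest first, one per run of the command.
    """
    if not samples:
        raise ValueError("need at least one sample")
    flags = [flag for flag, _ in samples]
    counts = [count for _, count in samples]
    if all(flag == 1 for flag in flags):
        return "stuck"      # full in every sample: the queue may be blocking all traffic
    if counts[-1] > counts[0]:
        return "dropping"   # queue full count is still incrementing
    return "ok"             # count may be high, but it is not moving


# Hypothetical samples from five runs.
print(classify_queue([(1, 500), (1, 900), (1, 1300), (1, 1700), (1, 2100)]))
print(classify_queue([(0, 500), (0, 500), (1, 520), (0, 520), (0, 540)]))
print(classify_queue([(0, 500), (0, 500), (0, 500), (0, 500), (0, 500)]))
```

A queue that comes back as "stuck" across the whole window is the condition where a reset may be needed, while "dropping" points more toward load or a specific feature.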
Slide 79
As was mentioned, each line is for a different queue. The queues exist inside the chip: the parser queue, transmit queue, CPU queue, host queue, session lookup engine queue, PPU queue, and free buffer queue.
Slide 80
What to do next?
Determine which traffic/services are being affected
Disable the feature corresponding to the queue to see if the problem stops
Check other ASIC commands
Check PPS to determine if traffic load is too high
Check if full goes back to 0; if not, system reset is required
Check get log sys for ASIC reinit messages
If full is always 1 and doesn't go back to zero, a reset may be required to recover the system from the failure.
Slide 81
This is a very important command as well: get sat <asic> x-context. Here you look for the free buffer reinit and engine reset counts. These two counters help us understand whether there was any reset in the ASIC chip for any reason. If the ASIC had to reset, you will see it in these counters. If you are seeing packet drops in the network, you can look at these counters and see if there was a reinit, which means packets were dropped.
You also check packet up/down between CPU and ASIC to see if, for any reason, there was a loop between the CPU and the ASIC. One example is when the session exists in the CPU but doesn't exist in the ASIC. The ASIC receives a packet from the CPU, doesn't know where to send it, and sends it back to the CPU. The packet then stays in a loop, and these are the counters you can check. This is good to check in the case of high CPU, because you might have a packet looping inside the system.
Slide 82
Why is it important?
ASIC reinits drop traffic
System may be overloaded
To understand if there is ASIC failure
Slide 83
In the 6.1 release and later, this output also shows defragmentation information and some additional buffers. You usually don't need to check these, only when you get a special request from our engineering team.
Slide 84
wr=0x0000f29a, rd=0x0000e6a5, 0xbf5 bufs in frq2.
FRQ2 buf HEALTHY, 11 bufs held expected:
No.1 buf 0x00200902
No.2 buf 0x00201102
No.4 buf 0x00202102
No.5 buf 0x00202902
No.6 buf 0x00203102
No.7 buf 0x00203902
Let's now check another very important command, get SAT FRQ. This shows the state of the free buffers that are used to store packets. When you look here you see buffer missing messages, but please note that they do not always indicate an issue. The buffers are reported as missing, but the ASIC itself can deal with that and avoid any problem. Also, you can see here that the state is HEALTHY, so you don't really need to worry about it.
Slide 85
The condition you do need to worry about is when you run get SAT 0 FRQ | include bufs and see that the read and write pointers are always the same.
Slide 86
Why is it important?
Buffer leak eventually can cause ASIC reinit
Performance is affected
System may be overloaded
To understand if there is ASIC failure
When the read and write pointers are always the same, it means you might have a leak: all the buffers are used and no more buffers are available, so no more packets can be processed. The consequence for the network is that the system just stops forwarding traffic.
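The pointer check can be sketched in Python as follows. This assumes the line format shown on the earlier slide (wr=0x..., rd=0x..., then the buffer count); the function names are hypothetical, and the "stuck" line is invented to illustrate the failure case rather than taken from a device.

```python
import re
from typing import List, Optional, Tuple

# Matches the pointer fields in a line such as:
#   wr=0x0000f29a, rd=0x0000e6a5, 0xbf5 bufs in frq2.
FRQ_LINE = re.compile(r"wr=0x([0-9a-fA-F]+),\s*rd=0x([0-9a-fA-F]+)")


def parse_pointers(line: str) -> Optional[Tuple[int, int]]:
    """Return (write pointer, read pointer) from one output line, or None."""
    match = FRQ_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def possible_buffer_leak(lines: List[str]) -> bool:
    """True when every sample shows the read and write pointers equal."""
    pointers = [p for p in map(parse_pointers, lines) if p is not None]
    return bool(pointers) and all(wr == rd for wr, rd in pointers)


healthy = "wr=0x0000f29a, rd=0x0000e6a5, 0xbf5 bufs in frq2."
stuck = "wr=0x0000a000, rd=0x0000a000, 0x0 bufs in frq2."  # hypothetical line

print(parse_pointers(healthy))
print(possible_buffer_leak([healthy] * 5))
print(possible_buffer_leak([stuck] * 5))
```

In the healthy sample from the slide, the write pointer minus the read pointer equals the 0xbf5 buffers reported on the same line. A single sample with equal pointers is not enough; the condition to worry about is the pointers being the same in every run.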
Slide 87
You can always correlate that with the get sat counter command, because it tells you if any queue is full. If the free buffer queue is affected, you are going to see that frq is full in the get sat counter output.
Slide 88
What is important?
Find out if there are sessions leaking in the ASIC session table
Why is it important?
Session leak can cause packet loop between CPU and ASIC -> high CPU problem
Then you have get sat session. This one usually is not a problem, but sometimes you may have a leak, meaning there are sessions in the ASIC that don't match the CPU session table.
Slide 89
This is usually nothing to worry about, because the ASIC can deal with it and the CPU can correct it as well. It's only a problem if the number of leaked sessions in this output starts increasing to a very high value.
Slide 90
Now let's check some of the specific commands introduced earlier. With get michigan, for the FPGA on the 24FE card, you look for the drop counters.
Slide 91
Why is it important?
System capacity is being reached
Hardware fault
This usually is not a problem. When you do have drops at this level of the FPGA chip, most of the time there is a hardware issue.
Slide 92
In such cases, you can do a replacement. If system capacity is being reached, there is nothing else to do but increase the number of cards or change the design.
Slide 93
The other specific command is get arch, for the 8 Gig or 10 Gig card. In this output you see the names BigSur and Alpine, which are the FPGA chips, and the counters rx, tx, packet and error. What you look for here are errors; you need to pay attention to those.
Another thing that might help is to check whether all the expected counters are incrementing. For example, you have four channels here. If you have the 8 Gig card, you expect each channel to be related to one port, so you can run this command and see how the counters increment when you send traffic through the system.
Slide 94
Most of the time, when you see errors here, they are hardware errors, in which case you do an RMA. While it is certainly possible that there is a problem in how the packets are sent, that's not very common.
Slide 95
The get fresno output is similar; with it you check the FPGA counters on the ISG platform. You also look for errors to see if they are incrementing. There is one extra detail here: you see the transmit queues. If you remember, the get sat counters output shows the queues. Here you see how the queues are used, so slot 2 is using transmit queue three (XMTQ3), for example.
Slide 96
Why is it important?
Throughput is not as high as expected
Hardware failure
System capacity is being reached
What to do next?
Determine if traffic load is not reaching system capacity
Check get sat <asic> c for full queues or queue full increments
Check get log sys for ASIC reinit messages
Check other ASIC outputs
Possible RMA
That's what you look for with get fresno. The errors follow basically the same idea as with get arch. Most of the time, it's either a hardware failure or you are really reaching the system capacity.
Slide 97
demux 4: first packet for the session
Src-ip: 10.227.5.200 -> dst-ip: 4.4.4.4
Src port: c52e -> dst port: 17
Incoming interface eth2/1.400
IPID = 0xe4ca
Now we come to the debug command, debug tag info. This is very important. You run it when you are looking for problems related to the CPU. This command shows us only packets going to the CPU. If packets are being processed only by the ASIC chip, we don't see them in the debug.
The same applies to debug flow basic; it only shows packets going to the CPU.
Why do we run debug tag info? Here you see the information from the packet going to the CPU in a lot of detail. You see the packet length and also the queue index, which shows which queue sent the packet to the CPU. If you go to get sat counters, you can see which queue has queue index 6. You also see the address of the buffer, so if you want to see the whole packet's content, you can look at this buffer.
The protocol is six, and then comes the demux tag, which is very important since it indicates why the packet went to the CPU. Demux 4 means it's the first packet for the session: if there is no session in the table in the ASIC chip, the ASIC has to send the packet to the CPU for session creation.
You also see the source address, destination address, source port and destination port here, in abbreviated notation.
Another important thing is the IPID of the packet. If you are looking for a packet loop, you can run this debug and look for the same packet ID appearing five, ten, or 100 times. If you keep seeing the same packet, there is a loop.
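Looking for a repeated IPID in a long debug capture is tedious by eye, so here is a minimal sketch in Python that counts them. It assumes the captured text contains fields in the form shown on the slide ("IPID = 0xe4ca"); the function name and the default threshold of five repeats are assumptions chosen to match the narration, and the sample text is made up.

```python
import re
from collections import Counter
from typing import Dict

# Matches the IPID field as it appears in the slide output: "IPID = 0xe4ca".
IPID_FIELD = re.compile(r"IPID\s*=\s*0x([0-9a-fA-F]+)")


def repeated_ipids(debug_text: str, min_repeats: int = 5) -> Dict[str, int]:
    """Return the IPIDs seen at least min_repeats times, with their counts."""
    counts = Counter(match.group(1).lower() for match in IPID_FIELD.finditer(debug_text))
    return {ipid: seen for ipid, seen in counts.items() if seen >= min_repeats}


# Hypothetical capture: one packet seen six times, another seen once.
capture = "\n".join(["IPID = 0xe4ca"] * 6 + ["IPID = 0x0001"])
print(repeated_ipids(capture))
```

This sketch keys only on the IPID, so before concluding there is a loop it is worth confirming that the repeated entries also share the same source and destination addresses.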
Please remember that the debug command can be service affecting depending on the load in the system because it takes a lot of CPU time to do this debug. If the load is very high, you might create some interference.
Slide 98
What we usually do is run the debug for 10 seconds, then press ESC to abort immediately, and then inspect the output.
Another example is the tag value. Tag 1 is a packet that had to be sent to the CPU for processing: even if there is a session, the packet needs to go to the CPU, for example in the case of ALG. Tag 25 is for ICMP, and ICMP always goes to the CPU.
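The three tag values mentioned in this module can be kept in a small lookup while you read through debug output. This is a minimal sketch in Python covering only tags 1, 4 and 25 as described here; other values exist but are not covered in this module, and the function name and the "demux N" line format are assumptions based on the slide.

```python
import re
from typing import Optional

# Only the tag values described in this module.
DEMUX_TAGS = {
    1: "packet that has to go to the CPU even with a session (for example ALG)",
    4: "first packet for the session (no session in the ASIC yet)",
    25: "ICMP, which always goes to the CPU",
}

DEMUX_FIELD = re.compile(r"demux\s+(\d+)")


def explain_demux(line: str) -> Optional[str]:
    """Return a short explanation for the demux tag in a debug line, if any."""
    match = DEMUX_FIELD.search(line)
    if match is None:
        return None
    tag = int(match.group(1))
    return DEMUX_TAGS.get(tag, f"demux {tag}: not covered in this module")


print(explain_demux("demux 4: first packet for the session"))
print(explain_demux("demux 25"))
```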
Slide 99
Why is it important?
Investigation of high CPU
Packets that should be processed only by ASIC are going to CPU incorrectly
Packet loop between ASIC and CPU
What to do next?
Determine if the packets going to CPU are expected
If not, investigate the traffic pattern and policy configuration
Check ASIC commands for queue full increments or reinits
That's it for this debug command. We always do correlation, so we also check the get sat commands, especially get sat demux, because then we know how many packets are going to the CPU per second.
Slide 100
Section Summary
In this section, we:
Reviewed general commands used in ScreenOS
Listed the most important commands specific to high end systems
Explained how to collect the data and interpret the output
Showed how to run debug tag info when looking for problems related to CPU
Reviewed general commands used in ScreenOS Listed the most important commands specific to high end systems Explained how to collect the data and interpret the output, and Showed how to run debug tag info when looking for problems related to CPU
Slide 101
Slide 102
Slide 103
Troubleshooting Examples
Slide 104
Section Objectives
After successfully completing this section, you will be able to:
Describe workarounds provided in the three most critical troubleshooting examples occurring in the field
Apply the commands described in each troubleshooting example
Describe workarounds provided in the three most critical troubleshooting examples occurring in the field, and Apply the commands described in each troubleshooting example
Slide 105
Problem
Master unit stops forwarding traffic
Failover to backup unit doesn't occur
Manual failover needed to recover the services
Reset needed to recover the system
ns5400-> get chass | in mb
Slot  Type             S/N               Assembly-No  Temperature    DRAM Size
1     Management       0102032007000009  0058-005     109'F (43'C)   2048MB
2     Processing-2XGE  0143072006000013  0063-003     114'F (46'C)   1024MB
The first real-world troubleshooting example, and the one that is most service affecting, is when the system stops forwarding traffic. This example was a NetScreen 5400 with Management 2 and the two-port 10 Gigabit card, in an active/passive cluster running the 6.2r1 release. What was the problem? The master unit just stopped forwarding traffic; no traffic was being processed. It was service affecting because the two units were still exchanging heartbeats, so no failover to the backup unit was triggered.
How was the situation resolved? A manual failover to the backup unit was done, and since the backup unit was running well, it recovered the services. Then the old master had to be reset to recover from that situation. Here we show the get chassis output so you can see the information about the cards.
Slide 106
How did we investigate this problem? We collected the get sat commands. These are the most important ones: get sat demux, get sat x-context, get sat frq, get sat counter and get sat session, plus get arp asic to look at the ARP table. Also, use get arch 0 to see the counters in the front-end processor.
Use the command get asic mapping to know which ASIC you need to check. Here you have to check zero and one, which is why you see both get sat 0 and get sat 1.
Slide 107
ASIC reinits
LISNS5400:FW1(M)-> get log sys | in reinit
## 2008-12-08 13:40:42 : reinit chip 0, invalid buf (380a7100).
## 2008-12-08 13:41:42 : reinit chip 0, invalid buf (380bf900).
CONFIDENTIAL
SERT-NS5000
www.juniper.net | 107
What did we see with this output? We were looking for the counters, so the first thing we note is the slu queue in the get sat counter command was showing a lot of queue full. This was incrementing constantly. Every time we ran the command the number was higher. Then we also noted that the queue full was always 1, so that meant no packets were being processed, the queue was full and stuck. It was dropping all the traffic. Thats why no packets were being processed; no traffic was running.
Then we kept on checking the data and we also saw a lot of packets up and down between the CPU and ASIC. Also we see that re-initialization in the ASIC chip. With the get log sys command, we saw reinit chip zero so there was an invalid buffer.
So we had three pieces of evidence that there were problems on the ASIC chip. We then tried disabling the TCP SYN check, and the problem no longer occurred.
Slide 108
Root Cause
Software defect: TCP SYN check was corrupting packets for cross-ASIC sessions, causing packet loop between ASIC/CPU and slu queue stuck.
Solution
Code was modified to implement the necessary corrections
So the workaround is to disable the TCP SYN check. What's important here is that the investigation we did with engineering determined that the TCP SYN check was corrupting packets in the case of cross-ASIC sessions. Because of the resulting packet loop between the ASIC and the CPU, the session lookup queue got stuck and couldn't recover, so it couldn't process any more packets. That's why the system stopped forwarding traffic.
The solution in this case was to modify the code so that the packets are no longer corrupted, and that solved the problem. We don't have this issue anymore.
Slide 109
Problem
Specific users cannot do TFTP transfers through the cluster
Transfer starts but after a few seconds it hangs
If no-hw-session is enabled in policy transfer is successful
SDU:Jabbar-NS5400(M)-> get chas | in mb
Slot  Type                S/N               Assembly-No  Temperature   DRAM Size
1     Management-III      0225082008000060  0072-001     109'F (43'C)  2048MB
2     Processing-2XGE-G4  0227062008000032  0085-001     123'F (51'C)  1024MB
3     Processing-8G2-G4   0226092008000027  0084-001     116'F (47'C)  1024MB
The second example is also with the NetScreen 5400, but now with the Management-3 card and the new interface cards, 10 Gig and 8 Gig. In this case we have an active/active cluster running ScreenOS 6.1r4, and all the sessions were cross-ASIC, going from a 10 Gig port to an 8 Gig port. The problem was that some specific users couldn't do TFTP transfers through the cluster. From the client side, we could see the transfers starting, but after a few seconds they would just hang. We suspected the problem was at the ASIC level, so we enabled no hardware session in the policy specifically for that client, and the transfer then succeeded. That told us something in the ASIC was causing the problem, because no hardware session bypasses the processing in the PPU.
Slide 110
What analysis did we do here? We did some packet captures to see why only those specific clients were having a problem, and saw that they were doing transfers with fragmented packets. The TFTP block size was 8000 bytes or so, which was causing fragmentation. So what do we do? Let's check get ASIC PPU defrag, because that's where the defragmentation is done. But here we see zero: no defragmentation errors and no null session errors, so the PPU processing seemed to be fine. We continued to look at the other ASIC commands, but they also didn't show anything that could really pinpoint the problem. What do we do next? We did a debug tag info.
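To see why a block size of roughly 8000 bytes leads to fragmentation, here is a small worked calculation in Python. It assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, an 8-byte UDP header, and a 4-byte TFTP data header; none of these values were stated in the case, and the function name is hypothetical.

```python
import math


def fragment_count(tftp_block, mtu=1500, ip_header=20, udp_header=8, tftp_header=4):
    """Number of IPv4 fragments needed to carry one TFTP DATA packet."""
    ip_payload = tftp_block + tftp_header + udp_header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    print(fragment_count(8000))  # large negotiated block size
    print(fragment_count(512))   # default TFTP block size
```

Under these assumptions an 8000-byte block needs six fragments, while the default 512-byte block fits in a single packet, which is consistent with only certain clients being affected.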
Slide 111
We decided to see whether something wrong was going up to the CPU. We ran debug tag info and then saw what the problem was: these fragments were going up to the CPU. They belonged to a flow that already existed, but they were being sent to the CPU with demux tag four, meaning they were treated as first packets of a new session. This confused the CPU, because it already had a session for that traffic. The packet was not sent out; it was dropped when the ASIC received it. That was the issue.
Slide 112
Root Cause
Software defect: PPUC fragment handling was incorrect, causing ASIC session matching to fail and send packet to CPU
Solution
Code was modified to implement the necessary corrections
As a workaround, we used no hardware session in the policy, so that the packets are processed in the CPU. The root cause was that the fragment handling in the PPUC, which is the component that handles defragmentation, was incorrect. We saw zero errors, but the PPUC was using a bad hashing mechanism to match the session table in the ASIC, which caused session matching to fail. Because no session was found in the ASIC, the packet was sent to the CPU; the CPU was confused, and the packet was not sent out. The solution here was also to modify the code, and we don't have this problem anymore in the latest version.
Slide 113
Problem
System showing high CPU
Determine the reason for this behavior
ns5400-> get perf cpu all detail
Average System Utilization: 5% (flow 6 task 3)
Last 60 seconds:
59: 20(30 2)     58: 20(30 1)     57: ... 3)       56: 79(89 7)**
55: 78(88 8)**   54: 78(88 7)**   53: ... 6)**     52: 77(87 6)**
51: 77(87 6)**   50: 77(87 6)**   49: ... 6)**     48: 77(87 6)**
47: 77(87 6)**   46: 77(87 6)**   45: ... 5)**     44: 77(87 7)**
43: 76(86 6)**   42: 77(87 6)**   41: ... 6)**     40: 76(86 5)**
Here's another example, which concerns abnormally high CPU. This is also important for the system.
What causes high CPU? In this example we have a NetScreen 5400 with the 10 Gigabit card running ScreenOS 6.2r1, and the system is showing high CPU. The first command to use when CPU is high is get perf CPU all detail. The word all is critical, because it breaks down the CPU utilization.
The output shows both flow and task CPU utilization. In this case it reveals that flow CPU was high, but task CPU was not. What does this tell you? It tells you that flow processing is what's causing the high CPU utilization, which means it's traffic: we are processing a lot of traffic. Let's focus on the traffic that's being processed.
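The flow-versus-task comparison can also be sketched in Python for saved output. The example below assumes each per-second entry has the form "NN: total(flow task)", which matches the flow and task labels in the header line but is otherwise an interpretation; the function names are hypothetical.

```python
import re

# Matches entries such as "55: 78(88 8)**"; trailing asterisks are ignored.
ENTRY_RE = re.compile(r"(\d+):\s*(\d+)\(\s*(\d+)\s+(\d+)\)")


def parse_entries(text):
    """Return {second: (total, flow, task)} for each entry found in text."""
    return {
        int(sec): (int(total), int(flow), int(task))
        for sec, total, flow, task in ENTRY_RE.findall(text)
    }


def dominant_component(entries):
    """Return "flow" if flow utilization outweighs task, otherwise "task"."""
    flow = sum(values[1] for values in entries.values())
    task = sum(values[2] for values in entries.values())
    return "flow" if flow > task else "task"


if __name__ == "__main__":
    sample = "55: 78(88 8)** 54: 78(88 7)**"
    print(dominant_component(parse_entries(sample)))
```

When the result is "flow", the next step is to look at the traffic reaching the CPU rather than at management tasks.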
Slide 114
~8000 packets per second were sent to CPU because of ALG processing
ns5400-> get asic demux
                          Current(02:57:15)  Last(02:57:15)  PPS( 17s)
to_host_packet:                      612430          612430          0
first_packet:                      13400782        13258685       8147
brcst:                                  243             243          0
no_ip_ether_net:                        708             708          0
total packet:                      14014163        13872066       8147
clsf counters:
icmp                                     40              40
To CPU traffic analysis:
ALG:                                4152761         4010664       8147
DMA required:                            59              59          0
The next thing we did was look at get ASIC demux. We checked the PPS column and saw about 8,000 packets per second going to the CPU for ALG processing. The next questions we asked were: which ALG is being triggered, and which traffic is this? We didn't expect this amount of traffic for the ALG.
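When only two counter readings are available, a rate can be estimated from their difference. The Python sketch below uses the ALG counters from the slide with the displayed 17-second window; the function name is hypothetical, and because the exact sampling interval the device uses is not known, the result should only be expected to land in the same range as the PPS column, not to match it exactly.

```python
def packets_per_second(current, last, interval_seconds):
    """Estimate a packet rate from two counter readings and the time between them."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current - last) / interval_seconds


if __name__ == "__main__":
    # ALG counters as shown on the slide; the 17 s interval is taken from
    # the PPS column header and may not be the exact sampling window.
    alg_current, alg_last = 4152761, 4010664
    print(alg_current - alg_last)
    print(round(packets_per_second(alg_current, alg_last, 17)))
```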
Slide 115
Destination ports were identified
There were services using well-known ports and matching ALGs
Packets go to CPU if needed to be processed by ALG
****************** 11236.0: tag (03a15f00) ******************
pak length: 46 vlan qidx:6 slot:0 port:0 buffer:0x806e191c
protcol:17 demux:4 l2idx:5190 ipid:0 flags:0x00000007
session pointer:0x000e1d91
src:192.134.71.124 dst:212.60.215.99 sport:13c4 dport:13c4
********************** end tag info *************************
st_tag_2_ifp: 192.134.71.124 -> 212.60.215.99, incoming ifp=ethernet2/1.400
start demux process 4
Next we ran debug tag info, which shows the tags of the packets going to the CPU. In the tag, we can see the destination port. We can match it to a service and then understand which ALG is being triggered.
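When many tags have been captured, reading them one by one is slow. As a minimal sketch, assuming the key:value layout shown on the slide, the Python below collects the fields of a tag line into a dictionary; the function name is hypothetical, and fields written with a space after the colon (such as pak length) are not captured by this simple pattern.

```python
import re


def parse_tag_fields(text):
    """Collect key:value pairs such as "dport:13c4" from a tag dump into a dict."""
    return dict(re.findall(r"(\w+):(\S+)", text))


if __name__ == "__main__":
    line = "src:192.134.71.124 dst:212.60.215.99 sport:13c4 dport:13c4"
    fields = parse_tag_fields(line)
    print(fields["dport"])
```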
In this case, 13c4 is hexadecimal for 5060, which is the port for the SIP service used for Voice over IP. We then knew why the CPU was high: there was a lot of traffic going through the firewall to the SIP port.
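The ports in the tag are printed in hexadecimal, so they need converting before they can be matched to a service. The short Python sketch below does the conversion and looks the result up in a small table; the table and function name are hypothetical, and only the SIP entry comes from this example.

```python
# Illustrative lookup table; extend it with the services used in your network.
WELL_KNOWN_PORTS = {5060: "SIP"}


def describe_port(hex_port):
    """Convert a hexadecimal port string to decimal and look up its service name."""
    port = int(hex_port, 16)
    return port, WELL_KNOWN_PORTS.get(port, "unknown")


if __name__ == "__main__":
    print(describe_port("13c4"))
```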
We asked ourselves, do we expect this high amount of traffic for the SIP service? We can try a packet capture in the network or check, for example, the source, to see why it's sending all this traffic, and hopefully understand what's going wrong.
In this case there was no problem in the system. The CPU load was high because the packets sent to the CPU represented a relatively high load. What had happened was that the port was being used by a different service that didn't need any ALG processing. But because it was using the port assigned to SIP, its traffic was going to the CPU for ALG processing.
Slide 116
Root Cause
System working as expected, traffic load for CPU processed packets was too high.
Solution
Change services to use non well-known ports
Or disable the ALGs if not needed
The idea here was either to change the port that the application uses on that specific network, or to disable the ALG if you don't need it, that is, if you don't have any SIP service in the network.
With these three examples, we saw the most important problems that we have had in the field. First, the system stopped forwarding traffic. Second, certain applications or services were dropped, and we needed to check exactly which service it was and look at the details. Third was the high CPU. Again, these three are the most important types of problems we have had.
We also have the Knowledge Base reference document KB 9453, which provides a good starting point and also covers the analysis presented here.
Slide 117
More Information
Juniper Knowledge Base: http://kb.juniper.net
The Knowledge Base has several articles that can help you.
Via the Technical Documentation link you can get to the ScreenOS Concepts and Examples Guide, which can help you understand the expected behavior, and to the ScreenOS CLI Guide, which can help you review the syntax of the commands.
You can also use J-NET to discuss problems you may encounter.
Slide 118
Section Summary
In this section, we:
Described workarounds provided in the three most critical troubleshooting examples occurring in the field
Showed how to apply the commands described in each troubleshooting example
Slide 119
Slide 120
Slide 121
Course Summary
In this Course, we:
Distinguished between ISG Series and NS5000 Series hardware configuration and packet flow
Explained the importance of the ASIC functions
Described First Path and Fast Path in packet flow
Differentiated between functions processed in the CPU versus PPU
Used and interpreted debug commands unique to high end systems
Explained the workarounds for three typical troubleshooting examples
Slide 122
Additional Resources
Education Services training classes
http://www.juniper.net/training/technical_education/
For additional resources or to contact the Juniper Networks eLearning team, click the links on the screen.
Slide 123
You have reached the end of this Juniper eLearning module. You should now return to your Juniper Learning Center to take the Practice Test and the Student Survey. The test will allow you to gauge your knowledge of the material covered in this course. The survey will allow you to give feedback on the quality and usefulness of the course.
Slide 124
Juniper Networks, Junos, Steel-Belted Radius, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. The Juniper Networks Logo, the Junos logo, and JunosE are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
Slide 125
education services courseware
Corporate and Sales Headquarters Juniper Networks, Inc. 1194 North Mathilda Avenue Sunnyvale, CA 94089 USA Phone: 888.JUNIPER (888.586.4737) or 408.745.2000 Fax: 408.745.2100 www.juniper.net
APAC Headquarters Juniper Networks (Hong Kong) 26/F, Cityplaza One 1111 King's Road Taikoo Shing, Hong Kong Phone: 852.2332.3636 Fax: 852.2574.7803
EMEA Headquarters Juniper Networks Ireland Airside Business Park Swords, County Dublin, Ireland Phone: 35.31.8903.600 EMEA Sales: 00800.4586.4737 Fax: 35.31.8903.601
Copyright 2010 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. All other trademarks, service marks, registered marks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.