Troubleshooting Latency Issues


By
ANAND SINGH

© 2017 Juniper Networks, Inc. All rights reserved.
What is latency/slowness/delay?
• Network latency is the time it takes for a packet of data to get from one designated point to another.
• High latency/slowness should always be judged relative to a reference when troubleshooting.
• When a customer raises concerns about slow transfer rates, we need a point of reference for what is considered the expected speed.

Causes of Latency
• Processing delay by device(s)

• Transmission Delay

• Misconfiguration

Prerequisites for Engineers before Troubleshooting
• Customer claims high latency due to the Juniper firewall:
>What is the expected bandwidth/transfer rate/latency that can be used as a point of reference?
>Confirm whether bypassing the Juniper device gives the expected speed, to know whether the firewall is the culprit or not.
>How was the speed test performed? Proper tests such as FTP/HTTP transfers and tools such as iperf should be used to test the bandwidth.
*Note: Connection speeds are generally quoted in bits/second, while transfer rates are reported in Bytes/second. For example, a 100 Mbps link should IDEALLY give a 12.5 MBps transfer rate.
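
The bits-versus-bytes conversion in the note can be sketched in a few lines of Python. The function name is illustrative only, and the result is an ideal figure that ignores protocol overhead:

```python
def link_rate_to_transfer_rate(mbps: float) -> float:
    """Convert a link speed in megabits/s to the ideal transfer rate in megabytes/s."""
    # 8 bits per byte; protocol overhead is ignored, so real transfers will be slower.
    return mbps / 8


print(link_rate_to_transfer_rate(100))  # 12.5
```

Any measured transfer rate should be compared against this ideal figure, not against the raw link speed.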

• Further, file sharing over CIFS/SMB, speed test sites and ICMP response times are not good tests and will not give correct results:
CIFS/SMB: It is a chatty protocol, so transfer rates over a WAN are always hurt by latency.
Speed test websites: Cannot be trusted; use multi-threaded tests.
ICMP: It is a low-level protocol, and different devices may process it at low priority. For example, the following KB describes slow ICMP responses from Juniper devices:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB28157&actp=search

• Which protocol shows high latency: ICMP, UDP or TCP? Latency with TCP alone is common; if UDP and ICMP are also affected, the network as a whole is degraded.

Processing Delays
1. An unhealthy device causes delays in processing and packet drops (which cause retransmissions and slowness)
• Commands to check data plane health:
• J/Branch/All SPUs (10.1 and later):
show security monitor performance spu
show security monitor performance sess
PFE commands:
show octeon cpu detail all
show octeon session detail
• HE specific SPU (10.1 and later):
show security monitor performance spu fpc <slot#> pic <0-1>
show security monitor performance sess fpc <slot#> pic <0-1>
PFE commands:
show xlr cpu detail summary
show xlr cpu detail all
show xlr session detail
• If high CPU is seen or the session count exceeds the platform limit, it will cause latency and must be troubleshot first.
2. Packet drops on device

• Interface errors
>show interfaces extensive | match error

• Traffic statistics
>show system statistics ip | match drop
>show system statistics tcp | match drop
>show system statistics udp | match drop

*Take the outputs multiple times to check whether the counters are incrementing abnormally fast.
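
Comparing repeated outputs comes down to subtracting one snapshot of the counters from the next. The sketch below assumes the counter values have already been copied out of two runs of the same command. The counter names and numbers are hypothetical, and no real command output is parsed:

```python
def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that increased between two snapshots, with the increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Hypothetical values, typed in by hand from two runs of the same command.
first = {"input errors": 10, "output errors": 0, "framing errors": 3}
second = {"input errors": 250, "output errors": 0, "framing errors": 3}
print(counter_deltas(first, second))  # {'input errors': 240}
```

Counters that stay flat between runs are historical and can usually be set aside; the ones that keep climbing point to where drops are happening now.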

3. Features Enabled
• With multiple features configured, processing delay increases, as the SRX has to perform an additional task on the traffic stream for each feature.
• The following features increase the delay:
UTM Antivirus > IDP > UTM Web filtering > Application FW > IPSEC > NAT

• The layered performance analysis documents can be used to understand throughput limitations when these features are used:
For Branch:
https://junipernetworks.sharepoint.com/sites/nok/technology/security/DiscussionsListDocLibrary/bb6d99d7-ac46-27b2-2cad-baed8e18e700/co1pr05mb39610531d247cd319e6d81ebe4a0@co1pr05mb396.namprd05.prod.outlook.com/Branch_SRX_Series_Layered_Performance_Analysis.pdf#search=layered%20performance%20analysis%20srx

For HE:
https://scale.juniper.net/rbusnp/landing.php

NexGen:
https://junipernetworks.sharepoint.com/sites/nok/technology/security/DiscussionsListDocLibrary/0101d25c-77cc-f16f-0883-5e504f42ab6d/21C966E1-ABF8-4336-BF76-02B7B03A2E51@juniper.net/X49-D60-D65-Branch-Mid-Range-SRX_Performance.pdf

• Slowness due to UTM Antivirus:
>Isolate whether the issue is caused by UTM Antivirus by disabling it (either completely or on a single security policy for testing)
>Optimize/tune trickling (not supported after 15.1X49) and the content size limit
>After 15.1X49-D70, TCP options are copied when the TCP proxy kicks in; in addition, window scaling can be enabled under tcp-options in security policies

• Slowness due to IDP:
>Isolate whether the issue is caused by IDP by disabling it (either completely or on a single security policy for testing)
>Check whether IDP is causing CPU spikes
>Check whether there are any drops or errors in the IDP counters (>show security idp counters packet and >show security idp counters log)

Transmission Delays
1. Important parameters:
• TCP retransmission:
>If the sender does not receive an ACK from the receiver within a certain time (the retransmission timeout, RTO), it assumes the segment was dropped and retransmits it.
• MSS: Maximum Segment Size
>The maximum amount of data in a layer 4 segment, calculated by subtracting the TCP and IP header sizes from the MTU.
MSS = MTU - (TCP + IP)

• MTU: Maximum Transmission Unit
>The maximum size of a layer 3 packet that can be handed to layer 2 for encapsulation.
MTU = MSS + (TCP + IP)

Fragmentation:
>Every interface on a device has an MTU value. The standard MTU for Ethernet is 1500 bytes, and the layer 2 frame size ranges from 64 to 1518 bytes.

>This is a standard requirement for optimum layer 2 encapsulation and data transfer.

>If a device sends a packet that is larger than the MTU of the receiving interface, the packet will be dropped.

>If a device needs to send a packet out of an interface whose MTU is lower than the packet size, the packet will be dropped unless it can be fragmented.

>Fragmentation is the breaking of a packet into multiple smaller ones to fit within the MTU.

>Fragments are created by the sending device (the one with the lower MTU) but are reassembled only on the receiver; no intermediate device reassembles these packets.

>Fragmentation happens at layer 3 of the OSI model and is both a necessity and a drawback.
>It is a necessity because devices may send packets of different sizes, and without it traffic that exceeds the MTU would be dropped.
>It is a drawback because if even one fragment is dropped on the network, the complete packet (all fragments) has to be resent, as the receiver cannot reassemble the packet.
>As it happens at layer 3, the IP header has fields to facilitate it:
Flags (DF bit, MF bit), Fragment Offset, Identifier

Flags:
DF bit: Don't Fragment. If set, no intermediate device will fragment the packet.
MF bit: More Fragments. If set, the packet is a fragment and more fragments follow it. If unset, it is the last fragment of the packet.

Fragment Offset: Tells the reassembling device where a particular fragment belongs in the complete packet. The field is 13 bits wide, so the offset can be from 0 to 8191. Offsets are specified in units of 8 bytes, which is why the data length of every fragment except the last must be a multiple of 8.

Identifier: 2-byte identification value for a packet; it remains the same for all fragments of that packet.
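
A short worked example shows how the MF bit and Fragment Offset fit together. This is a minimal sketch that assumes a 20-byte IPv4 header with no options. The function name and the tuple layout are illustrative, not taken from any tool:

```python
def fragment(total_length: int, mtu: int, ip_header: int = 20):
    """Split an IPv4 packet of total_length bytes so that each fragment fits the MTU.

    Returns a list of (fragment_offset_in_8_byte_units, data_bytes, more_fragments).
    """
    data = total_length - ip_header
    if total_length <= mtu:
        return [(0, data, False)]  # fits as-is, no fragmentation needed
    # Data carried by every fragment except the last must be a multiple of 8 bytes.
    max_data = (mtu - ip_header) // 8 * 8
    fragments = []
    sent = 0
    while sent < data:
        size = min(max_data, data - sent)
        more = sent + size < data  # MF bit: set on all fragments except the last
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments


# A 1500-byte packet leaving an interface with an MTU of 1400.
for offset, size, mf in fragment(1500, 1400):
    print(offset, size, mf)
# 0 1376 True
# 172 104 False
```

The second fragment carries offset 172 because 172 x 8 = 1376 bytes of data precede it in the original packet.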

2. Studying Captures in Wireshark
• Check the 3-way handshake to see the negotiated MSS (the lower of the two values sent by each end).
• Captures are needed on the end devices; if detailed analysis is needed, they are also required on the ingress and egress of the SRX.
• Check whether there are a lot of retransmissions and reassembled PDUs.
• Also check the timestamps to see whether there are any delays in responses.

3. Troubleshooting
• Transmission delays can be attributed either to packet drops or to fragmentation-related issues.

• A lot of retransmissions in the captures means that there is an issue with packet drops in the network.

• On the SRX, flow traceoptions can tell us whether the packets are dropped on the SRX; otherwise we can only modify certain settings to minimize the effects.

• What happens if there are packet drops in the network and a lot of fragmentation is occurring?

• Fine-tuning the tcp-options on the SRX can help us reduce these effects:
Run multiple pings from a PC on the internal network to a remote PC (over the VPN) or to a website on the internet. On the command prompt, set the DF bit on the ICMP packets and vary the size to find the largest packet that can be sent without being fragmented (the largest size that still gets a reply; one step larger returns a message saying that fragmentation is required).
Eg:
ping -f -l 1372 <destination>
-f > sets the DF bit
-l > specifies the ICMP payload size in bytes
The -l value counts only the ICMP payload, so add the IP (20) and ICMP (8) headers to get the packet size: 1372 + 28 = 1400. This is the path MTU, and from it we can derive the MSS. For example, in the above case:
MSS = 1400 - (TCP(20) + IP(20)) = 1360
• This is the optimum MSS value that avoids fragmentation of the packet.
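
The manual search for the largest unfragmented ping can be sketched as a binary search. This is only an illustration: `probe` is a hypothetical callable that would send one DF-bit ping with the given ICMP payload size and return True if a reply comes back. Here it is replaced by a stand-in that pretends the path MTU is 1400 bytes. The sketch assumes the probe size is the ICMP payload (as with the Windows ping -l option), so 28 bytes of IP and ICMP headers are added to get the packet size:

```python
from typing import Callable

ICMP_OVERHEAD = 28   # 20-byte IPv4 header + 8-byte ICMP header
TCP_IP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def largest_unfragmented_payload(
    probe: Callable[[int], bool], low: int = 0, high: int = 1472
) -> int:
    """Binary-search the largest ICMP payload size for which probe(size) succeeds."""
    if not probe(low):
        raise ValueError("even the smallest probe failed; check basic reachability")
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid passed unfragmented, try larger sizes
        else:
            high = mid - 1  # mid needed fragmentation, try smaller sizes
    return low


# Stand-in for a real DF-bit ping: pretend the path MTU is 1400 bytes.
def fake_probe(payload: int) -> bool:
    return payload + ICMP_OVERHEAD <= 1400


payload = largest_unfragmented_payload(fake_probe)
path_mtu = payload + ICMP_OVERHEAD
print(payload, path_mtu, path_mtu - TCP_IP_HEADERS)  # 1372 1400 1360
```

The default upper bound of 1472 corresponds to a 1500-byte Ethernet MTU minus the 28 bytes of headers.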

The SRX can modify the MSS value by rewriting it during the TCP 3-way handshake.
>For normal traffic:
Modify TCP MSS
[edit security flow]
user@host# set tcp-mss all-tcp mss 1360

>For IPSEC traffic:
In tunnel mode, IPSEC traffic carries an additional ESP + IP header, an overhead of 50 bytes compared with clear text. The packet that can be sent is therefore smaller, and the MSS needs to be changed accordingly:
[edit security flow]
user@host# set tcp-mss ipsec-vpn mss 1310

Also, as IPSEC packets are encapsulated, the df-bit in the outer header can be set, copied or cleared. Clearing the df-bit is the right option, as it allows fragmentation to take place: if a device on the path has a lower MTU and fragmentation is not allowed, the packet will be dropped.

set security ipsec vpn <VPN Name> df-bit clear (default)
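
The two MSS values used in the commands above follow from simple subtraction. The sketch below assumes a path MTU of 1400 bytes, TCP and IP headers without options, and the 50-byte tunnel overhead quoted in this section. The real ESP overhead varies with the algorithms in use, so treat the constant as a placeholder:

```python
TCP_IP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options
IPSEC_OVERHEAD = 50  # figure quoted in this section; actual overhead depends on the algorithms


def clear_text_mss(path_mtu: int) -> int:
    """MSS for traffic that is not encapsulated."""
    return path_mtu - TCP_IP_HEADERS


def ipsec_mss(path_mtu: int, overhead: int = IPSEC_OVERHEAD) -> int:
    """MSS for traffic that will be wrapped in an IPSEC tunnel."""
    return clear_text_mss(path_mtu) - overhead


print(clear_text_mss(1400), ipsec_mss(1400))  # 1360 1310
```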

Configuration Issue

1. Interface configuration:

• The interface configuration on the SRX should match that of the connected interface on the next-hop device: full-duplex, auto-negotiation, and the correct speed and MTU settings.

2. Class of Service
• Check the configuration to see whether this traffic is marked and placed in a queue that limits its bandwidth.
• Interface queue statistics can also be checked for drops:
>show interfaces queue

THANKS
