Case Study On Troubleshooting BGP Flaps Over IPSec Using FFT
Problem Description:
AOL reported random BGP session flaps over IPsec between two M10i’s.
Topology:
Configuration
Troubleshooting
Frame freer tracking (FFT) shows which function or process of Services is dropping the
packets.
-----------------------------------------------------------------------------
../common/applications/svcs/fwnat/fwnat_process.c 227 26
-----------------------------------------------------------------------------
../common/applications/svcs/fwnat/fwnat_process.c 227 26
As seen above, the counter for line number 459 of ipsec_data_process.c is being incremented.
Let’s see what is at line 459 of ipsec_data_process.c. The JUNOS release in use is 8.3R3.4.
/8.3R3.4/src/juniper/pfe/common/applications/svcs/ipsec/ipsec_data_process.c
457 PREPROCESS_DONE:
458 START_PROCESS_ERROR:
459 sf_free((Frame *)ipf);
460 return(IPSEC_COMPLETE);
461
462 START_PROPCESS_NEED_FRAG:
463 return(IPSEC_NEED_FRAG);
Line 459 calls the function ‘sf_free’, which releases the memory blocks used by the IPSec
packet and thereby drops the packet. The packet is dropped because execution reached the
START_PROCESS_ERROR label.
From here, we understand that the packet is dropped on the START_PROCESS_ERROR path. Now, let’s
see under what circumstances this label is reached.
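The idea behind this kind of tracking can be sketched in a few lines of C. This is a hypothetical illustration only, not the JUNOS implementation: `tracked_free`, `TRACKED_FREE`, `process_frame`, and the `sites` table are names invented here. Each free is counted against the file and line of its caller, so a counter that keeps climbing for one line points at the label where frames are being dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of per-call-site free tracking; not JUNOS code. */
#define MAX_SITES 32

struct free_site {
    const char *file;     /* source file of the caller */
    int line;             /* source line of the caller */
    unsigned long count;  /* frames freed from that line */
};

static struct free_site sites[MAX_SITES];
static int n_sites;

/* Free a buffer and count the free against the caller's file and line. */
static void tracked_free(void *p, const char *file, int line)
{
    int i;

    assert(file != NULL);
    for (i = 0; i < n_sites; i++) {
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            break;
    }
    if (i == n_sites && n_sites < MAX_SITES) {
        /* First free seen from this call site: add a new entry. */
        sites[i].file = file;
        sites[i].line = line;
        sites[i].count = 0;
        n_sites++;
    }
    if (i < n_sites)
        sites[i].count++;
    free(p);
}

#define TRACKED_FREE(p) tracked_free((p), __FILE__, __LINE__)

enum { FRAME_FORWARDED = 0, FRAME_COMPLETE = 1 };

/*
 * Mimics the shape of the listing above: the error path jumps to a label
 * that frees the frame, so drops are counted against that label's line.
 */
static int process_frame(size_t len, size_t limit)
{
    void *frame = malloc(len ? len : 1);

    if (frame == NULL)
        return FRAME_COMPLETE;
    if (len > limit)
        goto process_error;
    TRACKED_FREE(frame);   /* stands in for handing the frame onward */
    return FRAME_FORWARDED;

process_error:
    TRACKED_FREE(frame);   /* drop: this line's counter increments */
    return FRAME_COMPLETE;
}
```

Calling `process_frame` with a mix of normal and oversized lengths leaves two entries in `sites`, and only the entry for the `process_error` line grows when frames are dropped.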
ESP Counters:
clear_bytes_sent: 533891474709
clear_bytes_recvd: 3006595054821
protected_bytes_sent: 544476412456
protected_bytes_recvd: 3030019310200
packets_sent: 1556401415
packets_recvd: 3837987853
AH Counters:
clear_bytes_sent: 0
clear_bytes_recvd: 0
protected_bytes_sent: 0
protected_bytes_recvd: 0
packets_sent: 0
packets_recvd: 0
ESP Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 200
rplDuplicate: 7
rplZero: 0
AH Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 0
rplDuplicate: 0
rplZero: 0
----------------------------------------------
esp_seq_num: 0
ah_seq_num: 0
auth_failed: 0
replay_errors: 0
ipsec_bad_headers: 0
esp_bad_trailers: 0
decrypt_errors: 0
tunnel_rcv_errors: 0
ipsec_frags: 0
ipsec_frag_errors: 0
ipsec_defrags: 0
ipsec_defrag_errors: 0
auth_sent: 1556401415
auth_successful: 3837987646
esp_auth_failed: 0
ESP Counters:
clear_bytes_sent: 533891586144
clear_bytes_recvd: 3006598787904
protected_bytes_sent: 544476531184
protected_bytes_recvd: 3030023064544
packets_sent: 1556402438
packets_recvd: 3837991922
AH Counters:
clear_bytes_sent: 0
clear_bytes_recvd: 0
protected_bytes_sent: 0
protected_bytes_recvd: 0
packets_sent: 0
packets_recvd: 0
ESP Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 200
rplDuplicate: 7
rplZero: 0
AH Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 0
rplDuplicate: 0
rplZero: 0
----------------------------------------------
esp_seq_num: 0
ah_seq_num: 0
auth_failed: 0
replay_errors: 0
ipsec_bad_headers: 0
esp_bad_trailers: 0
decrypt_errors: 0
tunnel_rcv_errors: 0
ipsec_frags: 0
ipsec_frag_errors: 0
ipsec_defrags: 0
ipsec_defrag_errors: 0
auth_sent: 1556402438
auth_successful: 3837991715
esp_auth_failed: 0
Total Drops: 0
ESP Counters:
clear_bytes_sent: 533894458814
clear_bytes_recvd: 3006667546824
protected_bytes_sent: 544479531288
protected_bytes_recvd: 3030092266408
packets_sent: 1556421387
packets_recvd: 3838067394
AH Counters:
clear_bytes_sent: 0
clear_bytes_recvd: 0
protected_bytes_sent: 0
protected_bytes_recvd: 0
packets_sent: 0
packets_recvd: 0
ESP Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 200
rplDuplicate: 7
rplZero: 0
AH Replay:
rplBeyondWindow: 0
rplOutOfOrder: 0
rplBeforeWindow: 0
rplDuplicate: 0
rplZero: 0
----------------------------------------------
esp_seq_num: 0
ah_seq_num: 0
auth_failed: 0
replay_errors: 0
ipsec_bad_headers: 0
esp_bad_trailers: 0
decrypt_errors: 0
tunnel_rcv_errors: 0
ipsec_frags: 0
ipsec_frag_errors: 0
ipsec_defrags: 0
ipsec_defrag_errors: 0
auth_sent: 1556421387
auth_successful: 3838067187
esp_auth_failed: 0
Total Drops: 0
From the above two captures, we can see that the ‘rejected DF packets from local side’ counter has incremented.
Let’s see which variable ‘rejected DF packets from local side’ corresponds to:
The variable here is l_pmtu. Let’s look at when this counter is incremented. We need to search for
something like l_pmtu++ (l_pmtu = l_pmtu + 1).
As seen above, l_pmtu is incremented at line number 136. Reading the code, this counter is
incremented when the packet size exceeds the tunnel MTU and the packet cannot be fragmented. The
function then returns -1, which indicates an error condition.
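The condition described above can be sketched in C as follows. This is a hypothetical illustration, not the JUNOS code: `struct tunnel_stats`, `check_tunnel_mtu`, and the parameter names are assumptions, and only the `l_pmtu` counter and the -1 return value come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; only l_pmtu and the -1 return come from the text. */
struct tunnel_stats {
    unsigned long l_pmtu;  /* "rejected DF packets from local side" */
};

/*
 * Returns 0 if the packet may be encapsulated (fragmenting it first if
 * that is allowed), or -1 if it is too large for the tunnel and has the
 * DF bit set, in which case the l_pmtu counter is incremented.
 */
int check_tunnel_mtu(struct tunnel_stats *st, size_t inner_pkt_len,
                     size_t ipsec_overhead, size_t tunnel_mtu, int df_set)
{
    assert(st != NULL);
    if (inner_pkt_len + ipsec_overhead <= tunnel_mtu)
        return 0;          /* fits as is */
    if (!df_set)
        return 0;          /* too large, but fragmentation is allowed */
    st->l_pmtu++;          /* too large and DF set: count and reject */
    return -1;
}
```

With a 1500-byte tunnel MTU and an assumed 64-byte overhead, a 1500-byte packet with DF set is rejected and counted, while a 1400-byte packet passes.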
#ifdef IPSEC_PROCESS_AUTH
int ah_auth_size = esp_crypt_size + ip_ipsec_size + ESP_CHECKSUM_SIZE;
int outer_buf_size = ah_auth_size;
#else
int ah_auth_size = esp_crypt_size + ip_ipsec_size;
int outer_buf_size = ah_auth_size;
#endif
In IPSec, we take an incoming "clear" packet and place it inside a second packet, called the outer IPSec
packet. The inner packet is then encrypted and authenticated as configured. The outer IPSec packet has
the two tunnel endpoints as its source and destination, and its size is the original inner packet size
(inner_pkt_len) plus the IPSec overhead. The exact size of the IPSec overhead depends
on which encryption and authentication are performed.
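As a rough illustration of how the overhead depends on the algorithms, the sketch below computes the outer packet size for ESP in tunnel mode over IPv4, following the standard ESP packet layout. The function name and the parameter values used in the note after it are assumptions chosen for illustration; they are not taken from the configuration in this case.

```c
#include <assert.h>
#include <stddef.h>

#define OUTER_IPV4_HDR 20  /* outer IPv4 header without options */
#define ESP_HDR         8  /* SPI (4 bytes) + sequence number (4 bytes) */
#define ESP_TRAILER     2  /* pad length (1 byte) + next header (1 byte) */

/*
 * Size of the outer packet for an inner packet of inner_pkt_len bytes.
 * iv_len and block_size depend on the cipher, icv_len on the
 * authentication algorithm. The inner packet plus the 2-byte trailer is
 * padded up to a multiple of the cipher block size (at least 4 bytes).
 */
size_t esp_tunnel_pkt_len(size_t inner_pkt_len, size_t iv_len,
                          size_t block_size, size_t icv_len)
{
    size_t align = block_size < 4 ? 4 : block_size;
    size_t to_pad = inner_pkt_len + ESP_TRAILER;
    size_t pad = (align - to_pad % align) % align;

    assert(inner_pkt_len > 0);
    return OUTER_IPV4_HDR + ESP_HDR + iv_len + to_pad + pad + icv_len;
}
```

Assuming a 16-byte IV, a 16-byte block size, and a 12-byte authentication value, `esp_tunnel_pkt_len(1400, 16, 16, 12)` gives 1464, an overhead of 64 bytes. Other packet sizes and algorithms give slightly different overheads because of the padding.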
Based on the l_pmtu counter statistics, it looks like START_PROCESS_ERROR (seen in frame-freer-tracking)
was reached as a result of the following:
If the BGP packet size plus the IPSec overhead is more than the tunnel MTU, we drop the BGP packet and
send an ICMP PMTU message asking the sender for a lower MSS.
We can approach this problem in two ways. The first one is preferable from an IPSec performance
perspective. You cap the size of the inner packet so that when you stuff that inner packet inside the
IPSec packet, the overall size is not more than the tunnel MTU. In this case the IPSec packet (with the
original packet inside) travels to the other end of the tunnel, where it is received by the IPSec process
running on the remote end. It is decrypted, and the original packet is recovered and sent along its way.
For example: if tcp-mss keeps the inner packet to 1400 bytes and IPSec adds, say, 64 bytes, the total
packet size is 1464 bytes, which is lower than the 1500-byte tunnel and interface MTU. The packet is
encapsulated by IPSec, sent to the other end, and decapsulated. (The MSS value counts only the TCP
payload, so the configured value also has to allow for the IP and TCP headers, typically 40 bytes.)
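The arithmetic behind the first option can be sketched as below. The function names are invented for illustration, and the flat overhead passed in is an assumption (the text only says "say 64"). The MSS counts TCP payload only, so the IPv4 and TCP headers (20 bytes each without options) are subtracted as well.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR 20  /* IPv4 header without options */
#define TCP_HDR  20  /* TCP header without options */

/* Largest inner IP packet that still fits the tunnel MTU once encapsulated. */
size_t max_inner_pkt_len(size_t tunnel_mtu, size_t ipsec_overhead)
{
    assert(tunnel_mtu > ipsec_overhead);
    return tunnel_mtu - ipsec_overhead;
}

/* Largest TCP MSS whose full-sized segments produce such an inner packet. */
size_t max_tcp_mss(size_t tunnel_mtu, size_t ipsec_overhead)
{
    size_t inner = max_inner_pkt_len(tunnel_mtu, ipsec_overhead);

    assert(inner > IPV4_HDR + TCP_HDR);
    return inner - IPV4_HDR - TCP_HDR;
}
```

For a 1500-byte tunnel MTU and an assumed 64 bytes of overhead, the inner packet can be at most 1436 bytes, which corresponds to an MSS of at most 1396 bytes.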
The second option is to RAISE the tunnel MTU. Let's jump right to the example.
We can raise the tunnel MTU to 1600 bytes. IPSec takes the 1500-byte packet and stuffs it into an IPSec
packet, adding an additional 64 bytes. IPSec sends this new IPSec packet to the other end of the tunnel.
IPSec does not have a problem with this packet because the packet size is less than the tunnel MTU. However,
this 1564-byte packet is too large to go out over a GigE link, so the GigE interface fragments the packet before
sending it out (this is OK because the IPSec packet does not have the DF bit set; the packet inside the
IPSec packet did, but that is a different packet). The fragments arrive at the remote end of the tunnel,
where IPSec reassembles the fragments, forming the original 1564-byte packet, and decapsulates the original
1500-byte packet, which is again sent on its way.
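To see what the fragmentation in the second option costs, the sketch below counts the IPv4 fragments produced for a given packet size and link MTU. The function names are invented for illustration. The code assumes a 20-byte IPv4 header without options and uses the IPv4 rule that every fragment except the last carries a multiple of 8 payload bytes.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR 20  /* IPv4 header without options */

/* Payload bytes carried by each fragment except possibly the last. */
static size_t frag_payload(size_t link_mtu)
{
    assert(link_mtu >= IPV4_HDR + 8);
    return (link_mtu - IPV4_HDR) / 8 * 8;
}

/*
 * Number of IPv4 fragments needed to send a packet of pkt_len bytes
 * (header included) over a link with the given MTU.
 */
size_t ipv4_fragment_count(size_t pkt_len, size_t link_mtu)
{
    size_t payload, per_frag;

    assert(pkt_len >= IPV4_HDR);
    if (pkt_len <= link_mtu)
        return 1;                       /* fits unfragmented */
    payload = pkt_len - IPV4_HDR;
    per_frag = frag_payload(link_mtu);
    return (payload + per_frag - 1) / per_frag;
}

/* Total length (header included) of the last fragment. */
size_t ipv4_last_fragment_len(size_t pkt_len, size_t link_mtu)
{
    size_t payload, per_frag, rest;

    assert(pkt_len >= IPV4_HDR);
    if (pkt_len <= link_mtu)
        return pkt_len;
    payload = pkt_len - IPV4_HDR;
    per_frag = frag_payload(link_mtu);
    rest = payload % per_frag;
    return IPV4_HDR + (rest ? rest : per_frag);
}
```

For the 1564-byte IPSec packet on a 1500-byte link this gives two fragments: one of 1500 bytes and one of 84 bytes. Every full-sized inner packet therefore costs two packets on the wire plus reassembly at the remote end, which is why the first option is preferable for performance.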