White Paper: CSG2 Key Performance Indicators CSG2 Release 3.5
Revision: 4
White Paper
CSG2 Key Performance Indicators
CSG2 Release 3.5
Overview
The Cisco CSG2 delivers best-in-class content-aware billing, content filtering, service
control, traffic analysis, and data mining in a highly scalable, fault-tolerant package. Its
deep packet inspection (DPI) capability allows mobile operators to analyze, optimize,
secure, and meter all traffic flows, including content-based services.
The Cisco CSG2 is supported on SAMI (Service and Application Module for IP). SAMI
is a next-generation high performance Cisco IOS software application module that
occupies a single slot in the Cisco 7600 series router platform.
This document is intended to provide key performance indicators for monitoring and
maintaining Cisco CSG2, and will be updated as needed.
To better understand KPI monitoring for CSG2, it is important to be familiar with the hardware
architecture of SAMI/CSG2.
SAMI is a multi-processor blade that contains six PowerPC processors: one is dedicated as the
control processor (processor 3, referred to as the CP) and five are used as traffic processors
(processors 4 to 8, referred to as TPs). The CP processes control messages, including RADIUS and
GTP’. User traffic is directed to the individual TPs for processing.
This document focuses on the key performance indicators that the operator should monitor on an
ongoing basis for network capacity planning. The major KPIs (CPU, memory, throughput, load
management, and user and session statistics) are discussed in detail in section 2 below.
Additional KPIs that can be used for further analysis are discussed in section 3 below. Most of
the KPIs discussed in this document are supported by SNMP MIBs; operators can use Cisco MWTM
(Mobile Wireless Transport Manager) or other network management tools to retrieve the KPIs via
SNMP.
To optimize CSG2 performance and capacity, the following KPIs should be closely monitored, and
capacity growth shall be considered based on the recommendations below.
Monitoring commands should be executed from the CP only. For CPU, memory, and interface
throughput, statistics are displayed for each processor and shall be monitored per processor.
2.1. CPU
CPU utilization on the Control Processor is mainly driven by control (RADIUS, GTP’) traffic,
and CPU utilization on the Traffic Processors is mainly driven by user data traffic.
Note:
The number of quota server messages processed mostly impacts CPU utilization of the
CP; increasing the quota grant helps reduce the CPU utilization of the CP.
The number of CDRs generated impacts the CPU utilization of both the CP and the TPs;
configuring service-level CDRs helps reduce CPU utilization of both the CP and
the TPs.
2.2. Memory
Memory usage on CSG2 is driven by the number of concurrent users, data sessions, active
services per user, RADIUS attributes reported per user, and so on. This section refers mainly
to the usage on each traffic processor.
2.3. Throughput
Throughput usage on CSG2 is driven by the packets per second rate and average packet size.
As a general guideline, the following recommendations should be followed:
The data packet per second rate should not exceed the CSG2 capacity*. Note that the
maximum throughput rate can vary depending on the average packet size.
The operator shall start planning capacity growth when the data packet per second rate
reaches 70% of the maximum capacity, and shall increase capacity when the data rate exceeds 90%.
The operator shall start planning capacity growth when control and management data usage
reaches 70% of the maximum capacity, and shall increase capacity when the control and
management data rate exceeds 90%.
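The 70% and 90% guidance above can be written as a simple check. The following Python sketch is illustrative only: the function name `capacity_action` and its return strings are hypothetical, and the maximum capacity value must be taken from the CSG2 capacity figures that apply to the deployment's average packet size.

```python
def capacity_action(current_rate: float, max_capacity: float) -> str:
    """Classify a measured rate against the 70% / 90% planning guidance.

    current_rate and max_capacity must use the same unit (for example
    packets per second). max_capacity is supplied by the caller; no
    capacity figure is assumed here.
    """
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    utilization = current_rate / max_capacity
    if utilization > 0.90:
        return "increase capacity"
    if utilization >= 0.70:
        return "plan capacity growth"
    return "ok"
```

The same function can be applied to the data rate on the traffic processors and to the control and management rate on the control processor, since both use the same thresholds.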
Data throughput utilization on CSG2 can be monitored via the following CLI:
show interfaces GigabitEthernet 0/0 | include bits/s
Note: The overall data rate of the CSG2 card can be calculated by summing the input or output
packet per second rate of all the traffic processors (4-8). Data traffic going through CSG2
shows up as both input and output traffic under the interface statistics, so only one of the
two rates should be used for the calculation to avoid counting the data traffic twice. The
interface load interval can be lowered from the default of 5 minutes to a minimum of 30 seconds
to get a more accurate rate for peak-time usage.
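The summation described in the note can be sketched as follows. This is a minimal Python example under stated assumptions: the rate line is assumed to have the usual IOS form "<interval> input rate N bits/sec, M packets/sec", the per-processor output is assumed to have been collected already into a dictionary keyed by processor number, and the names `parse_rates` and `card_data_rate` are hypothetical.

```python
import re

# Assumed line format: "30 second input rate 1000000 bits/sec, 200 packets/sec"
RATE_RE = re.compile(
    r"(?P<direction>input|output) rate (?P<bps>\d+) bits/sec, (?P<pps>\d+) packets/sec"
)

def parse_rates(show_output: str) -> dict:
    """Return {'input': (bps, pps), 'output': (bps, pps)} for one processor."""
    rates = {}
    for match in RATE_RE.finditer(show_output):
        rates[match.group("direction")] = (
            int(match.group("bps")),
            int(match.group("pps")),
        )
    return rates

def card_data_rate(per_processor_output: dict, direction: str = "input") -> tuple:
    """Sum one direction only across traffic processors 4-8.

    Using a single direction avoids counting the same data traffic twice.
    Processor 3 (the CP) is skipped because it carries control traffic.
    """
    total_bps = 0
    total_pps = 0
    for processor, text in per_processor_output.items():
        if processor not in range(4, 9):
            continue
        bps, pps = parse_rates(text)[direction]
        total_bps += bps
        total_pps += pps
    return total_bps, total_pps
```

For example, five traffic processors each reporting 200 packets per second of input yield a card-level rate of 1000 packets per second, regardless of what the output direction reports.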
Control and management traffic utilization on CSG2 can be monitored via the following CLI:
Note: Control and management traffic utilization on CSG2 can be monitored on the control
processor (3); this includes RADIUS and GTP’ traffic.
Interface packet drops and errors should also be monitored so that the operator is alerted
when an overload or error condition has occurred:
CSG2#show interfaces g0/0 | incl errors|throttles|drops
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 output errors, 0 collisions, 0 interface resets
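A monitoring script could flag any non-zero counter in this filtered output. The Python sketch below is one possible approach, not a Cisco-provided tool: the function name `nonzero_counters` is hypothetical, and the parser assumes the input contains only counter lines of the "<number> <name>" form shown above.

```python
import re

# Matches "<number> <counter name>", e.g. "0 input errors" or "0 CRC".
COUNTER_RE = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z ]*)")

def nonzero_counters(show_output: str) -> dict:
    """Return the counters whose value is greater than zero."""
    counters = {}
    for value, name in COUNTER_RE.findall(show_output):
        if int(value) > 0:
            counters[name.strip()] = int(value)
    return counters
```

An empty dictionary means that no drops, throttles, or errors were reported in the sampled output.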
Protocol transaction rate is another KPI on CSG2. R3.5 introduced a new feature that
provides additional statistics for transaction rates per protocol via the “show ip csg stats
protocol” command*. It displays the count, rate, maximum rate, and maximum rate timestamp
for the transactions, bytes, and packets of each protocol that is configured on the CSG2.
Note that for IP, TCP, and UDP accounting, the Transaction Rate fields display the
Layer 4 transaction rates.
For all other protocols, the Transaction Rate fields display the Layer 7
transaction rates.
To configure the interval, in seconds, that the CSG2 uses when calculating
the rates for this command, use the ip csg statistics protocol interval command
in global configuration mode.
CSG2's load management feature provides a central facility for protecting the CSG2
when it is overloaded and for warning the operator that an overload exists. The philosophy
applied throughout the CSG2 application is that existing users and sessions should continue to
receive service during an overload situation, and events related to new sessions should be
discarded to protect critical system resources.
This component monitors critical system resources and sheds load caused by external
events in a specific order. Those events, in load-shedding order, are: RADIUS Accounting
Start messages, user database requests, new data session creation, BMA messages, QS
messages, and user idle processing. For each of these items, CSG2 reports:
Allowed (The total number of events allowed since the last stats clear.)
[Allowed] (per second) (The number of events per second over the last 30 second period.)
Max per second (The highest number of events per second over a 30 second interval.)
IPC:
IPC Queue Depth Tolerance (The first number is the maximum number of queued IPC messages
allowed by load management for this transaction type. The second number is the maximum
number of queued IPC messages allowed for the entire CSG. The last number is the tolerance
expressed as a percentage of the overall IPC queue maximum. These numbers are fixed and not
meant to be tuned.)
Denied (The total number of events denied due to exceeding the defined IPC queue depth since
the last stats clear.)
[Denied] (per second) (The events per second denied over the last 30 second interval due to
exceeding the defined IPC queue depth.)
Max per second (The highest number of events denied per second over a 30 second interval due to
exceeding the defined IPC queue depth.)
Rate (if applicable):
Limit (The number of transactions allowed per second. It is recommended to leave these
settings as default.)
Denied (The total number of events denied due to exceeding the defined rate limit since the last
stats clear.)
[Denied] (per second) (The events per second denied over the last 30 second interval due to
exceeding the defined rate limit.)
As a general guideline, any increase in a “Denied” counter indicates that system resources
on CSG2 have been overutilized. The “Max per second” rates record the historical
maximum to help the operator perform traffic analysis and plan for capacity growth.
Radius Start
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 10000 / 50000 (20 percent)
Denied (per second) = 0 (0), Max per second = 0
Rate: Limit 5000 on control processor
Denied (per second) = 0 (0), Max Per Second = 0
Allowed (per second) in the “Radius Start” section represents the total number of
new users and the new user sign-on rate.
When critical system resources are running low (either the IPC queue depth
tolerance is exceeded or the transaction rate limit is exceeded), new RADIUS
Accounting Start messages are dropped (the “Denied” counter increases).
When critical system resources become available again, CSG2 resumes
processing new RADIUS Accounting Start messages.
Database Request
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 10000 / 50000 (20 percent)
Denied (per second) = 0 (0), Max per second = 0
Rate: Limit 1800 per traffic processor
Denied (per second) = 0 (0), Max Per Second = 0
Allowed (per second) in the “Database Request” section represents the total
number of new users and the new user sign-on rate when the user database is
configured to look up the username from the subscriber IP address.
When critical system resources are running low (either the IPC queue depth
tolerance is exceeded or the transaction rate limit is exceeded), new user database
request messages are dropped (the “Denied” counter increases).
Session Create
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 18000 / 50000 (36 percent)
Denied (per second) = 0 (0), Max per second = 0
Allowed (per second) in the “Session Create” section represents the total number
of new data sessions and the data session rate.
When critical system resources are running low (in this case, the IPC queue depth
tolerance is exceeded), new data sessions are dropped (the “Denied” counter
increases).
When critical system resources are available again (in this case, the IPC queue depth
has dropped below the tolerance threshold), CSG2 resumes processing new data
sessions.
BMA Messages
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 30000 / 50000 (60 percent)
Denied (per second) = 0 (0), Max per second = 0
Quota Server
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 30000 / 50000 (60 percent)
Denied (per second) = 0 (0), Max per second = 0
Allowed (per second) in the “Quota Server” section represents approximately the
total number of messages sent to the QS since the last module reload or counter
clearing, and the average rate over the last 30 seconds on CSG2.
When critical system resources are running low (in this case, the IPC queue depth
tolerance is exceeded), QS messages are dropped (the “Denied” counter
increases).
User Idle *
Allowed (per second) = 0 (0), Max per second = 0
IPC: Queue Depth Tolerance 10000 / 50000 (20 percent)
Denied (per second) = 0 (0), Max per second = 0
Rate: Limit 1000 per traffic processor
Denied (per second) = 0 (0), Max Per Second = 0
Allowed (per second) in the “User Idle” section represents approximately the total
number of IPC messages sent to clean up the KUT when a KUT element idles out, and the
average rate over the last 30 seconds on CSG2.
When critical system resources are running low* (either the IPC queue depth
tolerance is exceeded or the transaction rate limit is exceeded), the User Idle IPC
messages are dropped.
When critical system resources are available again, these IPC messages are
sent again to clean up the KUT.
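Because any non-zero “Denied” counter signals an overload, the load management output can be scanned automatically. The following Python sketch assumes the text layout of the samples shown in this section (a section name line followed by Allowed, IPC, Denied, and optional Rate lines); the function name `denied_events` and the cause labels are hypothetical.

```python
import re

# "Denied (per second) = <total> (<per second>), ..."
DENIED_RE = re.compile(r"Denied \(per second\) = (\d+) \((\d+)\)")
FIELD_PREFIXES = ("Allowed", "IPC:", "Rate:", "Denied")

def denied_events(load_output: str) -> list:
    """Return (section, cause, total denied, denied per second) tuples
    for every Denied line whose total is greater than zero."""
    findings = []
    section, cause = None, None
    for raw_line in load_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if not line.startswith(FIELD_PREFIXES):
            # Any other line is treated as a section name, e.g. "User Idle *".
            section, cause = line.rstrip(" *"), None
        elif line.startswith("IPC:"):
            cause = "IPC queue depth"
        elif line.startswith("Rate:"):
            cause = "rate limit"
        elif line.startswith("Denied"):
            match = DENIED_RE.match(line)
            if match and int(match.group(1)) > 0:
                findings.append(
                    (section, cause, int(match.group(1)), int(match.group(2)))
                )
    return findings
```

Each Denied line is attributed to the IPC or Rate line that precedes it, which mirrors how the output groups the two denial causes.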
CSG2 buffer management is part of the load management feature. It tracks IO memory
usage for the following categories:
CSG2 enforces a fixed memory limit for each of the above categories except “Unlimited”
to prevent the system from overloading under specific network conditions,
such as a large number of out-of-order (OOO) packets or IP fragments.
The Create counter gives the number of IO buffers that have been created for the category
since the last module reload or counter clearing.
The number of current users (current counter) and the maximum number of users that have
existed (highwater counter) in the KUT can be monitored via the following CLI:
I. “show ip csg stats | section CSG User Stats”
CSG User Stats:
max = 500000, current = 4338, highwater = 7996
…
The number of current data sessions (current counter) and the maximum number of sessions
that have existed (highwater counter) can be monitored via the following CLI:
II. “show ip csg stats | section CSG Session Stats”
CSG Session Stats:
user sessions = 133243, highwater = 156097, ha_overrun = 0
…
Please refer to section 4 for maximum users and data sessions supported per CSG2.
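The counters in these outputs can be turned into utilization percentages for trending. The Python sketch below parses the "name = value" pairs shown above; the function names `parse_counters` and `percent_of_max` are hypothetical. The user statistics line reports its own maximum, whereas the session maximum is not part of the session output and would have to be supplied by the operator.

```python
import re

def parse_counters(line: str) -> dict:
    """Parse 'name = value' pairs such as 'max = 500000, current = 4338'."""
    pairs = re.findall(r"([A-Za-z_ ]+?)\s*=\s*(\d+)", line)
    return {name.strip(): int(value) for name, value in pairs}

def percent_of_max(value: int, maximum: int) -> float:
    """Express a counter as a percentage of the supported maximum."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * value / maximum

users = parse_counters("max = 500000, current = 4338, highwater = 7996")
current_pct = percent_of_max(users["current"], users["max"])
highwater_pct = percent_of_max(users["highwater"], users["max"])
```

Graphing the highwater percentage over time shows how close the user table has come to its limit, while the current percentage shows the load at the time of sampling.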
2.7. MIBs
The following table lists the MIBs that are available to monitor the KPIs discussed in this section:
KPI                 MIB
CPU                 CISCO-PROCESS-MIB
Memory              CISCO-ENHANCED-MEMPOOL-MIB
Throughput          IF-MIB
User and Session    CISCO-CONTENT-SERVICES-MIB
This section discusses additional KPIs that can be monitored on CSG2 to help the
operator understand the overall system capacity.
Because the performance of the external billing system plays a key role in the overall billing
solution, it is important to monitor the related KPIs. BMA and QS statistics are discussed
in this section.
3.1. BMA Statistics
The key performance indicators (highlighted in bold above) to graph over time are:
Packet rate vs. ack rate – These two rates should be close to equal if the BMA is
acknowledging in time all the packets that CSG2 is sending. The rate itself
indicates the current message rate exchanged between CSG2 and the BMA. This KPI indicates
how the BMA is performing at present.
Retransmits vs. data packets sent – If this ratio is high, the total number of
messages that the BMA can receive is insufficient for the number of records the CSG2 is
sending. This KPI gives a view of how the BMA has been performing over time. By
combining it with the first KPI above, the operator can determine whether there is an issue
with BMA performance at present or in the past.
3.2. QS Statistics
The key performance indicators (highlighted in bold above) to graph over time are:
Packet rate vs. ack rate – These two rates should be close to equal if the QS is
acknowledging in time all the packets that CSG2 is sending. The rate itself
indicates the current message rate exchanged between CSG2 and the QS. This KPI indicates
how the QS is performing at present.
Retransmits vs. data packets sent – If this ratio is high, the total number of
messages that the QS can receive is insufficient for the number of records the CSG2 is
sending. This KPI gives a view of how the QS has been performing over time. By
combining it with the first KPI above, the operator can determine whether there is an issue
with QS performance at present or in the past.
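The two comparisons described for the BMA and the QS can be computed in the same way. The Python sketch below is illustrative only: the function name `billing_peer_warnings` is hypothetical, and the default thresholds (`ack_tolerance` and `retransmit_limit`) are arbitrary placeholders, since this document does not define what counts as "close to equal" or "high". Operators would choose values based on their own baselines.

```python
def billing_peer_warnings(
    packet_rate: float,
    ack_rate: float,
    retransmits: int,
    packets_sent: int,
    ack_tolerance: float = 0.05,     # placeholder, not a Cisco recommendation
    retransmit_limit: float = 0.01,  # placeholder, not a Cisco recommendation
) -> list:
    """Return warnings for a BMA or QS based on the two KPIs above."""
    warnings = []
    # Present performance: the ack rate should track the packet rate.
    if packet_rate > 0:
        ack_gap = abs(packet_rate - ack_rate) / packet_rate
        if ack_gap > ack_tolerance:
            warnings.append("ack rate is not keeping up with packet rate")
    # Performance over time: share of sent data packets that were retransmits.
    if packets_sent > 0:
        retransmit_ratio = retransmits / packets_sent
        if retransmit_ratio > retransmit_limit:
            warnings.append("retransmit ratio is high")
    return warnings
```

A result with only the retransmit warning suggests a past problem that has since cleared, while a result with the ack-rate warning points to a problem at the time of sampling.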
3.3. MIBs
The following table lists the MIBs that are available to monitor the KPIs discussed in this section:
KPI               MIB
BMA Statistics    CISCO-CONTENT-SERVICES-MIB
QS Statistics     CISCO-CONTENT-SERVICES-MIB