Simbox Case Study Anonymised
SIM Box/Bypass Fraud Detection
PROPRIETARY & CONFIDENTIAL - Do not adapt, duplicate or distribute without written consent of Neural Technologies Limited
Table of Contents
1 Machine Learning Case Study with SIM Box data
1.1 Model Training
1.2 Model Training Outputs
1.2.1 Cluster Description Table
1.2.2 Target Proportion Table
1.2.3 Fraud Clusters Analysis
1.2.4 Anomaly Clusters Analysis
1.2.5 Large Clusters Analysis
1.2.6 Classification Report
1.2.7 Anomaly Entities
1.2.8 Un-labelled Fraud Entities
This Machine Learning Case Study describes how to apply and interpret the analysis output from the
Clustering ML model fed with SIM Box/Bypass fraud data.
The dataset used in this case study has 100,391 subscribers and 70 features that describe various
activities of the subscriber. Of these subscribers, 99,973 are non-fraud accounts and 418 are fraud
accounts (about 0.42%).
The features used in this case study contain summarised calling, billing and recharging activity
for each SIM. Bypass fraud is indicated by the ‘BYPASS’ column: ‘1’ is a confirmed bypass MSISDN,
‘0’ is probably clean. Example features include:
| Feature | Source | Description |
|---|---|---|
| countMoLocalCalls | Switch | Count of MO local calls (Call_Type = Voice MO) |
| countMoIntlCalls | Switch | Count of MO calls (Call_Type = Voice MO and call_class = INTL) |
| countMoVASCalls | Switch | Count of MO calls (Call_Type = Voice MO and call_class = VAS) |
| countMoFREECalls | Switch | Count of MO calls (Call_Type = Voice MO and call_class = TOLL_FREE) |
| durMoLocalCalls | Switch | Duration of MO local calls (Call_Type = Voice MO) |
| durMoIntlCalls | Switch | Duration of MO INT calls (Call_Type = Voice MO and call_class = INTL) |
| durMoVASCalls | Switch | Duration of MO VAS calls (Call_Type = Voice MO and call_class = VAS) |
| … | | |
| G_countMoIntlCalls | Switch | Same as above, but summed over all previous days in the OP + last day |
| … | | |
| G_valueChargedCalls | Billing System | Same as above, but summed over all previous days in the OP + last day |
| G_totalChargedData | Billing System | Same as above, but summed over all previous days in the OP + last day |
| G_totalUnchargedData | Billing System | Same as above, but summed over all previous days in the OP + last day |
| … | | |
| G_countRecharges | Voucher Server | Same as above, but summed over all previous days in the OP + last day |
| G_valueRecharges | Voucher Server | Same as above, but summed over all previous days in the OP + last day |
| number_of_days | | Combined number of days where activity occurred, based on the criteria taken (call types mentioned above) |
| tariff | CRM | Tariff profile of the subscriber |
| POS | CRM | Point of Sale |
| FC_DATE | | First call made by the subscriber |
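To make the feature definitions above more concrete, the following is a minimal sketch of how per-SIM count and duration features could be derived from raw call records. The record layout (`msisdn`, `call_type`, `call_class`, `duration`), the sample records and the helper `summarise_mo_calls` are illustrative assumptions, not the case study's actual pipeline; only the feature names and the Call_Type / call_class conditions come from the table.

```python
from collections import defaultdict

# Hypothetical call records; the field names and values are assumptions.
CALLS = [
    {"msisdn": "A", "call_type": "Voice MO", "call_class": "INTL", "duration": 120},
    {"msisdn": "A", "call_type": "Voice MO", "call_class": "INTL", "duration": 60},
    {"msisdn": "A", "call_type": "Voice MO", "call_class": "VAS", "duration": 30},
    {"msisdn": "A", "call_type": "Voice MT", "call_class": "INTL", "duration": 45},
    {"msisdn": "B", "call_type": "Voice MO", "call_class": "TOLL_FREE", "duration": 10},
]

# call_class values whose count and duration features both appear in the table.
CLASS_TO_SUFFIX = {"INTL": "Intl", "VAS": "VAS"}


def summarise_mo_calls(calls):
    """Return {msisdn: {feature_name: value}} for MO voice count/duration features."""
    features = defaultdict(lambda: defaultdict(int))
    for call in calls:
        if call["call_type"] != "Voice MO":
            continue  # only mobile-originated voice calls feed these features
        suffix = CLASS_TO_SUFFIX.get(call["call_class"])
        if suffix is None:
            continue  # call classes not covered by this sketch are skipped
        row = features[call["msisdn"]]
        row[f"countMo{suffix}Calls"] += 1
        row[f"durMo{suffix}Calls"] += call["duration"]
    return {msisdn: dict(row) for msisdn, row in features.items()}


summary = summarise_mo_calls(CALLS)
print(summary["A"])

# A G_ feature sums the daily value over all previous days plus the last day.
daily_count_mo_intl = [2, 0, 3]  # hypothetical daily countMoIntlCalls values
g_count_mo_intl = sum(daily_count_mo_intl)
print("G_countMoIntlCalls:", g_count_mo_intl)
```

In this sketch the MT call for SIM "A" and the toll-free call for SIM "B" are ignored, so only SIM "A" appears in the summary.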
Two important properties of the clusters can be inferred from the clustering model:
1. Cluster population (i.e. the number of data records in each cluster), and
2. Spread of data records within each cluster
From the table, one can observe that clusters ‘0’, ‘10’ and ‘11’ are the major clusters with the largest
populations, together making up about 75% of the total population. As is evident from the distance
columns, they also tend to be dense (i.e. data records within the same cluster are quite similar to each other).
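The two properties named above, population and spread, can be illustrated with a small sketch. The source does not say how its distance columns are computed, so the sketch assumes the spread is the mean Euclidean distance of a cluster's records to the cluster centroid; the toy points, labels and the helper `cluster_summary` are hypothetical.

```python
import math
from collections import defaultdict


def cluster_summary(points, labels):
    """Return {label: (population, share_of_total, mean_distance_to_centroid)}."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    total = len(points)
    summary = {}
    for label, pts in members.items():
        dims = len(pts[0])
        # Centroid is the per-dimension mean of the cluster's records.
        centroid = [sum(p[d] for p in pts) / len(pts) for d in range(dims)]
        # Assumed spread measure: mean Euclidean distance to the centroid.
        mean_dist = sum(math.dist(p, centroid) for p in pts) / len(pts)
        summary[label] = (len(pts), len(pts) / total, mean_dist)
    return summary


# Toy data (hypothetical): a dense cluster 0 and a more spread-out cluster 1.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (10.0, 10.0), (20.0, 10.0)]
labels = [0, 0, 0, 0, 1, 1]

result = cluster_summary(points, labels)
for label, (population, share, spread) in sorted(result.items()):
    print(f"cluster {label}: n={population}, share={share:.0%}, mean distance={spread:.2f}")
```

A smaller mean distance indicates a denser cluster, which is the sense in which the major clusters are described as dense.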
At the opposite end of the spectrum, the model found several clusters with very few records each: clusters ‘3’,
‘13’, ‘14’ and ‘15’. The data records in these clusters are treated as anomalous by the anomaly
detection function (marked as anomaly case 1), because the clustering model found that these records
are not similar to those in the larger clusters.
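As a rough check of the population figures, the sketch below uses the per-cluster totals (non-fraud plus fraud counts as reported in this case study) to compute the share held by the three major clusters and to list the very small clusters. The 0.2% cutoff `MIN_SHARE` is an assumption chosen for illustration; the source does not state the threshold used by the anomaly detection function.

```python
# Records per cluster (non-fraud + fraud), summed from the per-cluster counts
# reported in this case study.
CLUSTER_SIZES = {
    0: 23_487, 1: 6_357, 2: 4_518, 3: 127,
    4: 511, 5: 437, 6: 6_918, 7: 4_747,
    8: 440, 9: 415, 10: 19_779, 11: 31_977,
    12: 248, 13: 83, 14: 152, 15: 195,
}

MIN_SHARE = 0.002  # assumed cutoff (0.2% of all records); not stated in the source


def population_share(sizes, clusters):
    """Fraction of all records that fall in the given clusters."""
    total = sum(sizes.values())
    return sum(sizes[c] for c in clusters) / total


def small_clusters(sizes, min_share):
    """Clusters whose share of all records is below min_share."""
    total = sum(sizes.values())
    return sorted(c for c, n in sizes.items() if n / total < min_share)


major_share = population_share(CLUSTER_SIZES, [0, 10, 11])
anomaly_case_1 = small_clusters(CLUSTER_SIZES, MIN_SHARE)

print(f"clusters 0, 10 and 11 hold {major_share:.1%} of all records")
print("small clusters:", anomaly_case_1)
```

With this assumed cutoff, the small clusters returned are 3, 13, 14 and 15, matching the clusters named in the text.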
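Target proportion table (count and within-cluster proportion of records by BYPASS label):

| Cluster | Non-fraud (0) count | Non-fraud (0) % | Fraud (1) count | Fraud (1) % |
|---|---|---|---|---|
| 0 | 23,483 | 100.0% | 4 | 0.0% |
| 1 | 6,355 | 100.0% | 2 | 0.0% |
| 2 | 4,517 | 100.0% | 1 | 0.0% |
| 3 | 19 | 15.0% | 108 | 85.0% |
| 4 | 508 | 99.4% | 3 | 0.6% |
| 5 | 436 | 99.8% | 1 | 0.2% |
| 6 | 6,894 | 99.7% | 24 | 0.3% |
| 7 | 4,745 | 100.0% | 2 | 0.0% |
| 8 | 438 | 99.5% | 2 | 0.5% |
| 9 | 408 | 98.3% | 7 | 1.7% |
| 10 | 19,753 | 99.9% | 26 | 0.1% |
| 11 | 31,891 | 99.7% | 86 | 0.3% |
| 12 | 222 | 89.5% | 26 | 10.5% |
| 13 | 83 | 100.0% | - | 0.0% |
| 14 | 26 | 17.1% | 126 | 82.9% |
| 15 | 195 | 100.0% | - | 0.0% |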
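The per-cluster proportions above can be reproduced from the raw counts, and clusters with an unusually high fraud share can be picked out. The sketch below is illustrative only: the 10% cutoff `FRAUD_SHARE_CUTOFF` is an assumption, since the source does not say how a cluster is designated a fraud cluster.

```python
# cluster: (non_fraud_count, fraud_count), taken from the table above.
COUNTS = {
    0: (23_483, 4), 1: (6_355, 2), 2: (4_517, 1), 3: (19, 108),
    4: (508, 3), 5: (436, 1), 6: (6_894, 24), 7: (4_745, 2),
    8: (438, 2), 9: (408, 7), 10: (19_753, 26), 11: (31_891, 86),
    12: (222, 26), 13: (83, 0), 14: (26, 126), 15: (195, 0),
}

FRAUD_SHARE_CUTOFF = 0.10  # assumed threshold; not stated in the source


def fraud_share(non_fraud, fraud):
    """Fraction of a cluster's records labelled BYPASS = 1."""
    return fraud / (non_fraud + fraud)


fraud_clusters = sorted(
    cluster
    for cluster, (non_fraud, fraud) in COUNTS.items()
    if fraud_share(non_fraud, fraud) >= FRAUD_SHARE_CUTOFF
)

total_fraud = sum(fraud for _, fraud in COUNTS.values())
captured = sum(COUNTS[cluster][1] for cluster in fraud_clusters)

print("high-fraud clusters:", fraud_clusters)
print(f"{captured} of {total_fraud} labelled fraud SIMs ({captured / total_fraud:.1%})")
```

Under this assumed cutoff the high-fraud clusters are 3, 12 and 14, which together contain 260 of the 418 labelled fraud SIMs.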
The Nt ML Package can generate explanatory plots describing “what are the distinctive features for
each cluster” and “why they are distinctive” in order to aid users in understanding the profile of a
cluster.
This is shown in the following sub-sections, first for the fraud clusters (3, 12 and 14), then for the
anomaly clusters (13 and 15) and the large clusters (0, 10 and 11).
The plots list the features from most distinctive at the top to least distinctive at the bottom;
distinctiveness refers to how important each feature is in separating this cluster of subscribers from
the general population of all subscribers.
A colour bar is shown under each data feature giving the level of distinctiveness, with the
numeric value printed underneath the data feature label; this maps to the colour scale shown on the
right side of each plot. The top horizontal axis also shows this numeric scale. The measure of
distinctiveness is global across all clusters, i.e. one cluster may not have features as strongly
distinguishing as those of another cluster. For example, Cluster 11 only shows as yellow even for the
most distinctive feature as compared to Cluster 3 where stronger distinguishing features are shown
with a red colour bar.
Cluster details are also shown for each data feature by the blue circle and line; the circle gives the
typical value of the data feature for the cluster and the blue lines show the typical lower and upper values
associated with the cluster. Note that the spread is typically not symmetric. The green arrow shows
how this cluster differs from the general data feature value for all subscribers. The direction of the arrow
indicates whether the cluster represents a higher than normal value (points right) or a lower one (points left).
Labels on the blue circle, its associated bar and the green arrows give the original data feature values.
The above indicates that the SIMs in the cluster are most differentiated due to their high ratio of
unique destinations on local calls as well as high duration and count of MO international calls.
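The plot elements described above (typical value, typical lower and upper values, and the direction of the arrow) can be sketched as follows. The source does not say which statistics the Nt ML Package uses, so this sketch assumes the median for the typical value and the quartiles for the lower and upper values; the function `describe_feature` and the sample values are hypothetical.

```python
import statistics


def describe_feature(cluster_values, population_values):
    """Summarise one feature for a cluster relative to the whole population.

    Assumptions: typical value = median, lower/upper = first/third quartile.
    """
    lower, typical, upper = statistics.quantiles(cluster_values, n=4)
    population_typical = statistics.median(population_values)
    if typical > population_typical:
        direction = "higher than normal (arrow points right)"
    elif typical < population_typical:
        direction = "lower than normal (arrow points left)"
    else:
        direction = "same as normal"
    return {
        "typical": typical,
        "lower": lower,
        "upper": upper,
        "population_typical": population_typical,
        "direction": direction,
    }


# Hypothetical per-SIM values of one feature (e.g. a count of MO international calls).
cluster_values = [2, 3, 4, 5, 20]
population_values = [0, 1, 1, 1, 2, 2, 3, 4, 5, 20]

profile = describe_feature(cluster_values, population_values)
for key, value in profile.items():
    print(f"{key}: {value}")
```

In this toy example the upper value lies much further from the typical value than the lower value does, which illustrates why the spread shown in the plots is typically not symmetric.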
Below is a plot showing the fraud classification reasons analysis, which highlights the significant features
associated with fraud vs. non-fraud within the cluster. It identifies distinct local destinations
together with low values of distinct cells, recharge value and data charging.
Cluster 12 is distinguished by its count of MT free SMS services; this might indicate
camouflage activity performed by fraudsters to disguise the underlying SIM box activity.
The fraud reasons analysis, however, highlights the high ratio of unique destinations for local calls as
well as a low count of MO local calls and recharge value.
As with cluster 12, this cluster is distinguished by free SMS services, but in this case MO rather than MT.
Again, this would appear to be camouflage activity.
The fraud reasons analysis shows a low duration and count of MT local calls and of
uncharged calls, together with a low ratio of MT or MO local call count.
This shows that these SIMs are most distinguished by their count and duration of MO VAS calls
and MO free calls.
This shows that these SIMs are most distinguished by their duration and count of MT international
calls.
This large cluster represents SIMs with a low duration of uncharged calls, a low count and duration of MT local
calls, etc. Overall, these are low-usage SIMs.
The small amount of fraud found in this cluster is strongly identified based on the ratio of unique
destinations for local calls.
In this case, SIMs are distinguished by their lower than normal time gap between calls
(i.e. more frequent calling) and more movement around cell sites. In other words, these are frequent users.