Machine Learning Techniques For SIM Box Fraud Detection
Abstract—In today’s competitive environment, telecommunication operators and service providers need to generate revenue by designing and delivering innovative services to subscribers. At the same time, a prime consideration is to minimize cost and prevent revenue leakage. In this context, the industry faces numerous challenges and different types of fraud. There is a continuous effort to tackle this problem by improving the implementation methodology and the network protocols. However, fraud detection is in general difficult and is currently addressed in a proactive manner. Fraudulent callers exploit the weaknesses of specific protocol-level solutions so that gray traffic avoids detection. Our research work is inspired by classification algorithms used in machine learning and employed in different fields of science and engineering, e.g. image processing, speech recognition, and spam email detection. We have applied these machine learning techniques (MLTs) to the classification of normal and fraudulent (SIM Box) subscribers. We have used the call detail records (CDRs) of normal and fraudulent subscribers as input to identify the important attributes, 25 for each customer. These attributes are used to classify normal and fraudulent subscribers using a Neural Network (NN) and a Support Vector Machine (SVM). A comparative performance analysis of both techniques is also presented using various evaluation parameters. SVM with the polynomial, radial, and sigmoid kernels shows the best performance, with an accuracy of 99.24%. The SVM linear kernel shows the worst performance, with an accuracy of 95.18% and 0.19 regression. In the case of NN, the Bayesian Regularization and Resilient Back-Propagation algorithms show the best and worst performance, with accuracies of 99.87% and 99.53% respectively.
Index Terms—Gray Traffic, Interconnect Bypass, Machine Learning, Neural Network, SIM Box Fraud, Support Vector Machine, Telecommunication
978-1-5386-5106-3/19/$31.00 ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
I. INTRODUCTION
The telecom industry is growing rapidly because of an increasing volume of communication among people, making the world a global village. The same is true of Pakistan, which was reported to be the world’s third fastest growing telecom industry in 2008. At the end of June 2013, the total number of mobile subscribers was approximately 128.93 million [1]. The extensive use of IP services on cellular networks is likely to inflate global mobile traffic by significant margins, approaching about 50 exabytes per month by 2021 [2]. In today’s increasingly competitive environment, all telecommunication operators and service providers need to protect and generate revenue by designing and delivering innovative services that attract subscribers. The primary objective of telecommunication operators is to maintain a healthy volume of subscribers by providing the best products and services. On the other hand, an increasing trend in the variety and intensity of telecommunication fraud has been observed, which leads to huge revenue losses.
Telecom fraud has been increasing sharply year after year [3]. Telecom fraud is defined as the use of any telecom service without paying the usage charges [4]. According to the survey of the Communications Fraud Control Association (CFCA) conducted in 2013, the estimated global loss to telecommunication fraud was about 46.3 billion US dollars, an estimated increase of 15% since 2011. A serious effort improved the statistics in this regard, and the number observed in 2017 is 29.2 billion US dollars, an estimated 23% decrease over the last four years [5]. In November 2013, the Pakistan Telecommunication Authority (PTA) confiscated more than 700 illegal international call termination gateways in the country [6]. Thereafter, PTA implemented a SIM verification system in collaboration with the National Database and Registration Authority (NADRA) to control the illegal sale of SIMs and reduce SIM Box fraud [7].
International calling rates that are higher than domestic rates attract fraudsters to terminate international calls on any local operator. Classified as SIM Box fraud, this involves using illegal means to terminate the international operator’s traffic onto the intended receiver after re-initiating it as a domestic call. Since telecom operators must maintain the confidentiality of subscriber information, the data available for experimental research is limited, which leads to a small set of candidate solutions. Fraud management systems are generally specific to particular fraud types and have little capability to detect emerging threats [8]. In addition, the huge repository maintained by the operators further complicates real-time decision making on user classification. Furthermore, SIM Box fraudulent subscribers change their usage behavior and also frequently change SIMs, so no historical data is available for analysis. A SIM Box fraudulent subscriber pretends to be a normal subscriber, and it is only after a detailed investigation that a classification is possible.
The research community has offered different approaches to detecting telecom fraud. The authors in [9] applied a data mining technique to the detection of subscription fraud. In this approach, the system maintains a usage profile for each subscription, and this customer profile is matched against those of subscribers who have already committed subscription fraud. [10] proposed a rough fuzzy set based approach to detect fraud in 3G mobile telecommunication networks. The authors designed a rule-based system called Citi FMS to detect abnormalities and raise an alarm when an anomaly is detected. In [11], the authors used the statistical and probabilistic KL-divergence to find dissimilarities between the characteristics of normal and fraudulent subscribers. The authors in [12] propose a cooperative workflow design for telecommunications fraud control and a network embedding based approach for fraud detection. Experimental data is used to demonstrate the effectiveness of the proposed method. A model is discussed in [13] that attributes the behavioral sequences generated from consecutive behaviors in order to capture sequential patterns. This approach declares behaviors that deviate from the established pattern as fraudulent.
The objective of this research work is to study the performance of SIM Box fraud detection using machine learning techniques. We have applied machine learning techniques to the classification of normal and fraudulent (SIM Box) subscribers. CDRs are used to identify 25 attributes forming the feature set of each customer. These attributes are used as input to two well-known MLTs, i.e. Neural Network and Support Vector Machine, for classification into fraudulent and non-fraudulent subscribers. A comparative performance analysis of both techniques is also carried out using different evaluation parameters.
The remainder of the paper is organized as follows. Section II discusses the legitimate international call flow and the international call termination flow using the SIM Box. Section III discusses our research methodology. Section IV presents the methodology and results using the artificial NN, while classification using SVM and the respective results are described in Section V. Section VI summarizes the comparison of both techniques.
II. PROBLEM STATEMENT
In legitimate international call termination, telecom operators have an interconnect agreement with an international call carrier for the termination of international calls between two countries. As shown in Fig. 1, a subscriber of country ‘A’ wants to call a subscriber of country ‘B’. When the call is made, the subscriber’s network in country ‘A’ routes the call through its core network to an international traffic carrier. The call is then handed over to the core network of the destination country, which is responsible for landing the call to the destination subscriber. In this legitimate connectivity, the core networks in the originating and destination countries are involved, and accounting for such calls is a normal process. Each call made or received through this channel is accounted for not only by the two networks but also by a third party, i.e. the international carrier.
Fig. 1. Legitimate and Fraudulent Call Setup.
In the case of a fraudulent call, the fraudster has an agreement with the telecom network in country ‘A’ for the termination of its international traffic for country ‘B’. In most cases, such arrangements are made to reduce the call termination charges; however, a loophole is used by the fraudster to avoid subscription charges. As shown with a dotted line in Fig. 1, a subscriber of country ‘A’ wants to call a subscriber of country ‘B’. When the call is made, the network of country ‘A’ hands over the call to the fraudulent subscriber, which is responsible for landing the traffic at the destination. The fraudster takes the call over IP and has no agreement in the destination country for landing traffic. This is done by using a SIM box placed in the destination country. The SIM box holds multiple SIMs of operators working in the destination country. The received call is passed through the SIM box and landed directly on the designated subscriber as a local call. In doing so, the fraudster is not only tricking the network in the destination country by landing international traffic as local traffic but also depriving the network of its rightful share of revenue from international traffic.
III. PROPOSED METHODOLOGY
Our research work focuses on the performance evaluation of SIM Box fraud detection using machine learning techniques. The process consists of the following steps:
1) Collection of the normal and SIM Box fraud CDRs.
2) Pre-processing of the CDRs to extract the required input attributes for classification.
3) Division of the data into three different groups (training, testing and validation).
4) Performing the classification using the SVM and NN.
5) Comparison of the results for analysis.
In the data set, the normal and fraudulent subscribers are identified with a numerical flag:
• NN Flags
TABLE I
SUMMARY OF DATA USED FOR CLASSIFICATION USING NEURAL NETWORK AND SVM.
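The pre-processing and data-division steps of the proposed methodology (steps 2 and 3) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the CDR fields (caller, callee, duration), the three aggregated attributes (the paper uses 25, which are not listed in this section), and the 70/15/15 split ratio are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical CDR rows: (caller_id, callee_id, duration_seconds).
# Field names and values are illustrative assumptions, not the paper's data.
CDRS = [
    ("s1", "a", 60), ("s1", "b", 30), ("s1", "a", 90),
    ("s2", "c", 10), ("s2", "d", 12), ("s2", "e", 8), ("s2", "f", 9),
]


def extract_features(cdrs):
    """Aggregate raw CDR rows into one feature vector per calling subscriber."""
    calls = defaultdict(list)
    for caller, callee, duration in cdrs:
        calls[caller].append((callee, duration))
    features = {}
    for caller, rows in calls.items():
        total_calls = len(rows)
        distinct_callees = len({callee for callee, _ in rows})
        mean_duration = sum(d for _, d in rows) / total_calls
        # Three illustrative attributes; the paper extracts 25 per customer.
        features[caller] = [total_calls, distinct_callees, mean_duration]
    return features


def three_way_split(items, train=0.7, validation=0.15, seed=0):
    """Shuffle items and divide them into training, validation and testing groups.

    The 70/15/15 ratio is an assumed default, not a value stated in the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Calling `extract_features(CDRS)` yields one attribute vector per subscriber, and `three_way_split` can then be applied to the list of subscriber identifiers so that all records of one subscriber fall into the same group.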
TABLE II
PERFORMANCE COMPARISON OF VARIANTS OF NEURAL NETWORK FOR CLASSIFICATION.
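Accuracy figures such as those reported in the abstract are computed by comparing predicted flags against the actual flags of the test subscribers. The sketch below shows one common way to do this. The flag encoding (1 for fraudulent, 0 for normal) is an assumption, and precision and recall are included as widely used companions to accuracy, not as parameters confirmed by the paper.

```python
def confusion_counts(actual, predicted, fraud_flag=1):
    """Count true/false positives and negatives, treating fraud as the positive class.

    The flag value 1 for fraudulent subscribers is an assumed encoding.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == fraud_flag:
            if a == fraud_flag:
                tp += 1
            else:
                fp += 1
        else:
            if a == fraud_flag:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all subscribers that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of subscribers flagged as fraudulent that really are fraudulent."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of fraudulent subscribers that the classifier caught."""
    return tp / (tp + fn)
```

Because fraudulent subscribers are typically a small share of the data, a high accuracy alone can hide missed fraud cases, which is why recall is worth reporting alongside it.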
Fig. 7. Classification of data using SVM with linear kernel.
TABLE III
PERFORMANCE COMPARISON OF DIFFERENT SVM KERNELS.
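For reference, the four SVM kernels compared in the paper (linear, polynomial, radial and sigmoid) can be written in their standard textbook forms, as in the sketch below. The formulas are the usual definitions and are not quoted from the paper. The hyperparameter names and default values (`gamma`, `coef0`, `degree`) are assumptions, since the settings used by the authors are not given in this section.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)
```

The linear kernel can only separate classes with a hyperplane in the original attribute space, whereas the other three induce non-linear decision boundaries, which is one plausible reason for the accuracy gap between the linear kernel and the rest reported in the abstract.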