
Architecture for Fraud Detection in Financial Institutions

Abstract:
Financial fraud analysis, identification and risk assessment are important for banks,
companies and other organizations. Stakeholders including financial institutions,
regulators, investors, academics, the community and businesses continue to pay considerable
attention to this subject. To ensure financial stability, assess risk and minimize
expected financial losses, it is important to make the fraud probability estimate as
decisive as possible and to minimize its uncertainty. Artificial intelligence techniques
are a powerful and effective solution. This article discusses existing patents relating to
the detection of financial fraud using one or more technologies of artificial intelligence,
machine learning or statistics, describes the key characteristics of contemporary
intelligent solutions for the detection of fraud and summarizes the major trends.

Key words: Fraud detection, financial fraud, machine learning, performance.

Contents
Abbreviations
List of Tables
List of Figures
1 Fraud
2 Financial fraud
2.1 Financial fraud Types
3 Financial fraud detection Techniques
3.1 Predictive Techniques
3.2 Descriptive or Unsupervised Techniques
3.3 Artificial and Computational Intelligence Techniques
4 Related work
5 Comparative analysis review
5.1 Classification based on fraud types and Techniques
5.2 Performance measures of different techniques
6 Discussion and analysis
7 Conclusion
REFERENCES

Abbreviations
FFD: Financial Fraud Detection
LR: Logistic Regression
NN: Neural Network
SVM: Support Vector Machine
FSF: Financial Statement Fraud
SOM: Self-Organizing Map
AIS: Artificial Immune System

List of Tables
Table 1 Techniques used for financial Fraud detection
Table 2 Classification based on fraud types and techniques
Table 3 Classification based on Techniques & Accuracy
Table 4 Overview of current technique benefits and limitations

List of Figures
Figure 1 Type of Financial Frauds
Figure 2 Linear Regression model
Figure 3 Decision Tree model
Figure 4 SVM Classifier
Figure 5 NN Model
Figure 6 Bayesian belief network
Figure 7 SOM model

1 Fraud
Fraud involves one or more individuals who act secretly and deliberately to deprive
others of something of value for their own financial or personal gain. Fraud is as old
as society itself and can take countless forms. The emergence of new technologies in
recent years has created more opportunities for criminals to commit fraud. Fraud
threatens a company's reputation and its relationships with external stakeholders,
including customers, suppliers, financiers and business partners. It can also cause
enormous financial harm. The main motivation of fraudsters is monetary profit, which
puts banks at particular risk because they hold large amounts of capital. Financial
institutions have therefore long been a priority target for criminals.
This paper aims to survey current research trends in financial fraud detection (FFD).
Selected approaches based on machine learning techniques that have been used in FFD
are examined. Our aim is to highlight their strengths and limitations and to identify
open issues in fraud detection.
The remainder of the paper is organized as follows. Section 2 provides a description
and the types of financial fraud; Section 3 gives an overview of current methods for
detecting financial fraud. Section 4 reviews related work and Section 5 presents the
comparative analysis of the various techniques. Finally, Sections 6 and 7 discuss the
results and future research.

2 Financial fraud
According to the definition of the Association of Certified Fraud Examiners (ACFE),
fraud is any intentional or deliberate act that deprives another of property or money
by guile, deception, or other unfair means [1].
2.1 Financial fraud Types

Financial fraud can be classified into many categories. Those that affect the economy
include credit card fraud, insurance fraud, financial statement fraud, money laundering,
corporate fraud, and securities and commodities fraud. Figure 1 shows different types of
financial fraud [1].

Credit card fraud is of two types: application fraud and behavioral fraud [2].
Application fraud involves obtaining new cards from issuers using false information or
another person's information. Behavioral fraud takes four forms: mail theft, stolen or
lost cards, counterfeit cards and 'card holder not present' fraud.
Insurance fraud may be perpetrated by customers, agents and brokers, insurance
company staff and health care workers, and at many stages in the insurance process (i.e.
application, eligibility, rating, billing and claims) [3].
Financial statement fraud: financial statements represent the company's financial
condition [4]. Falsifying them can have the objective of reducing tax obligations.
Many research papers on credit card fraud, insurance fraud and financial statement
fraud have been published because of their relatively broad influence on the global
economy. Work on money laundering has also recently gained a great deal of attention.
Money laundering is a method by which criminals "wash dirty money" in order to hide
its illegal origin and make it appear "clean" and legal.

Figure 1 Type of Financial Frauds

3 Financial fraud detection Techniques
In the past, fraud was identified mainly with statistical and neural network
approaches, and these are still used for some data mining problems in the detection
of fraud. To understand each form of fraud, we examine the machine learning
techniques used for detecting complex fraud and classify them according to fraud
type. Classifying the techniques for the detection of fraud is useful for
understanding the value of each technique for a specific problem. Table 1 summarizes
several techniques used in the detection of financial fraud in recent years.

3.1 Predictive Techniques

Logistic regression (LR): Logistic regression is a type of generalized linear model.
Simple linear regression is unsuitable when the dependent variable is binary, because
its normality assumptions are violated. The LR model is shown in Figure 2.

Figure 2 Linear Regression model
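
As an illustration of how a logistic model handles a binary dependent variable, the
following minimal sketch fits a one-variable logistic regression by gradient descent.
The data, the single attribute (transaction amount) and the function names are
hypothetical and chosen only for this example; they are not taken from the surveyed
studies.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: transaction amount (in thousands) and a 0/1 fraud label.
amounts = [0.1, 0.3, 0.5, 0.8, 2.0, 2.5, 3.0, 3.5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(amounts, labels)
p_small = sigmoid(w * 0.2 + b)  # estimated fraud probability for a small amount
p_large = sigmoid(w * 3.2 + b)  # estimated fraud probability for a large amount
```

The output of the model is a probability between 0 and 1, which is why it suits a
binary fraud / non-fraud label better than a straight-line fit.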

Decision Trees (DT): A decision tree is a tree structure in which each node is a test
on an attribute and each branch is an outcome of the test. The tree thus seeks to
separate the observations into mutually exclusive subgroups. Figure 3 shows a decision
tree model.

Figure 3 Decision Tree model
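
A single node test of such a tree can be sketched as follows. The sketch assumes the
Gini impurity as the split criterion, which is one common choice and is not stated in
the text above; the attribute and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return the threshold on one attribute that minimizes the weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical attribute: number of transactions in the last hour.
tx_per_hour = [1, 2, 2, 3, 8, 9, 10, 12]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

# The chosen test "tx_per_hour <= threshold" separates the two subgroups.
threshold, impurity = best_split(tx_per_hour, is_fraud)
```

A full tree would apply the same search recursively to each resulting subgroup.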

Support Vector Machines (SVM) use a linear model to implement non-linear class
boundaries: input vectors are mapped in a non-linear manner into a high-dimensional
feature space, and an optimum separating hyperplane is constructed within the new
space. The SVM classifier model is shown in Figure 4.

Figure 4 SVM Classifier
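
The effect of the non-linear mapping can be illustrated with a small sketch. It is
not an SVM training procedure: the feature map x -> (x, x^2) and the hyperplane
parameters are hand-picked assumptions for this example, chosen to show that data
which no single threshold can separate in the input space becomes linearly separable
in the feature space.

```python
def feature_map(x):
    """Map a 1-D input non-linearly into a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)

def classify(x, w=(0.0, 1.0), b=-2.5):
    """Sign of the linear decision function w . phi(x) + b in the feature space."""
    phi = feature_map(x)
    score = w[0] * phi[0] + w[1] * phi[1] + b
    return 1 if score > 0 else -1

# Hypothetical 1-D data: the positive class lies on both sides of the negative
# class, so no single threshold on x separates the two classes.
points = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

predictions = [classify(x) for x in points]
```

An actual SVM would learn w and b from the data so that the margin between the two
classes is maximized, and would usually apply the mapping implicitly through a kernel
function.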

Neural Networks (NN): Neural networks are a technology with proven theory and fields
of application. A NN contains multiple neurons, i.e. interconnected processors. Every
connection is associated with a numerical value called a weight.

Figure 5 NN Model
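
The role of the weights can be sketched with the forward pass of a very small
network. The layer sizes, the sigmoid activation and the hand-set weights below are
assumptions made for illustration; a trained network would learn its weights from
data.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out)

# Hypothetical hand-set weights: two inputs, two hidden neurons, one output.
hidden_layer = [([0.5, -0.2], 0.0), ([-0.3, 0.8], 0.1)]
output_neuron = ([1.0, -1.0], 0.0)

score = forward([1.0, 2.0], hidden_layer, output_neuron)  # a value in (0, 1)
```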

Naïve Bayes (NB) is a classification tool that simply applies Bayes' conditional
probability rule. Each attribute and the class label are treated as random variables
and, assuming that the attributes are independent given the class, naïve Bayes assigns
a new observation to the class that maximizes its probability given the values of the
attributes.
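
The decision rule can be sketched as follows for categorical attributes. The
attributes, the data and the use of add-one (Laplace) smoothing are assumptions made
for this example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts, n_values=2):
    """Return the class maximizing P(class) * product of P(attribute_i | class)."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            score *= (value_counts[(i, c)][value] + 1) / (count + n_values)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical categorical attributes: (foreign merchant?, night-time?)
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"),
        ("no", "no"), ("no", "yes"), ("yes", "yes")]
labels = ["fraud", "fraud", "legit", "legit", "legit", "fraud"]

model = train_naive_bayes(rows, labels)
suspicious = predict(("yes", "yes"), *model)
ordinary = predict(("no", "no"), *model)
```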

The Bayesian belief network (BBN) allows dependencies between subsets of attributes
to be expressed. A BBN is a directed acyclic graph (Figure 6) in which each node
represents an attribute and each arrow represents a probabilistic dependency between
attributes.

Figure 6 Bayesian belief network

3.2 Descriptive or Unsupervised Techniques

The Self-Organizing Map (SOM) is an unsupervised neural network approach. A SOM
makes it possible for users to visualize data by projecting it from a high-dimensional
space onto a low-dimensional one. Figure 7 shows the SOM model.

Figure 7 SOM model
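
One training step of a SOM can be sketched as follows. The sketch assumes a
one-dimensional map of three units, squared Euclidean distance, and fixed learning
rate and neighbourhood radius; practical implementations usually shrink both over
time.

```python
def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to the input x."""
    def sq_dist(unit):
        return sum((w - v) ** 2 for w, v in zip(unit, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))

def som_update(weights, x, lr=0.5, radius=1):
    """One SOM step on a 1-D map: pull the BMU and its grid neighbours toward x."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, unit in enumerate(weights):
        if abs(i - bmu) <= radius:
            unit = [w + lr * (v - w) for w, v in zip(unit, x)]
        updated.append(list(unit))
    return bmu, updated

# Hypothetical map of three units with two-dimensional weight vectors.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu, new_weights = som_update(weights, [0.9, 0.8])
```

Repeating this step over many inputs arranges similar inputs on neighbouring units,
which gives the low-dimensional view of the data described above.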

3.3 Artificial and Computational Intelligence Techniques

A genetic algorithm (GA) is inspired by natural evolution: the rules are randomly
generated as the initial population and are then evolved.
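
A minimal sketch of this idea is given below. Each candidate rule is a single
threshold on the transaction amount; the selection scheme (keep the fitter half),
the averaging crossover, the Gaussian mutation and all parameter values are
assumptions made for this illustration.

```python
import random

def fitness(threshold, amounts, labels):
    """Fraction of transactions that the rule 'amount > threshold' labels correctly."""
    correct = sum((a > threshold) == bool(y) for a, y in zip(amounts, labels))
    return correct / len(amounts)

def evolve(amounts, labels, pop_size=20, generations=30, seed=0):
    """Evolve a threshold rule; returns (best rule, its fitness, initial best fitness)."""
    rng = random.Random(seed)

    def score(t):
        return fitness(t, amounts, labels)

    # Initial population: randomly generated candidate rules.
    population = [rng.uniform(0.0, 10.0) for _ in range(pop_size)]
    initial_best = max(score(t) for t in population)
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0               # crossover: average two parents
            child += rng.gauss(0.0, 0.5)        # mutation: small random change
            children.append(child)
        population = parents + children         # parents survive unchanged
    best = max(population, key=score)
    return best, score(best), initial_best

# Hypothetical toy data: amounts and 0/1 fraud labels.
amounts = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
best_threshold, best_fitness, initial_best = evolve(amounts, labels)
```

Because the parents survive unchanged into the next generation, the best fitness in
the population never decreases from one generation to the next.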

Genetic programming (GP) is an extension of the genetic algorithm (GA). It is a
search technique from the family of evolutionary computation. GP generates a random
population of solutions. The initial population is then modified using specific
genetic operators to generate new populations.

A hidden Markov model (HMM) differs from the usual Markov model in that its states
are hidden; each hidden state randomly generates a visible observation. A hidden
Markov model can be regarded as the most basic dynamic Bayesian network.
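
How such a model scores a sequence of visible observations can be sketched with the
forward algorithm. The two hidden states, the observation symbols and all
probabilities below are hypothetical values chosen for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of an observation sequence under an HMM."""
    # alpha[s] = probability of the observations so far, ending in hidden state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model of a cardholder's spending profile.
states = ("low_spender", "high_spender")
start_p = {"low_spender": 0.8, "high_spender": 0.2}
trans_p = {
    "low_spender": {"low_spender": 0.9, "high_spender": 0.1},
    "high_spender": {"low_spender": 0.3, "high_spender": 0.7},
}
emit_p = {
    "low_spender": {"small": 0.9, "large": 0.1},
    "high_spender": {"small": 0.4, "large": 0.6},
}

usual = forward_likelihood(["small", "small", "small"], states, start_p, trans_p, emit_p)
unusual = forward_likelihood(["large", "large", "large"], states, start_p, trans_p, emit_p)
```

A sequence with a low likelihood under the cardholder's model, such as the second
one here, could then be flagged for further inspection.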

The Artificial Immune Recognition System (AIRS) represents self cells, non-self cells
and detector cells as feature vectors. An ARB (Artificial Recognition Ball) represents
a group of identical memory cells and is used to reduce redundancy.

Scatter Search (SS) is an evolutionary algorithm that shares some common features
with the GA. It operates on a set of solutions, the reference set, and combines these
solutions to create new ones.

Table 1 Techniques used for financial Fraud detection

Classification                                      Techniques
Descriptive or Unsupervised Technique               Self-organizing map (SOM)
Predictive Techniques                               Logistic regression
                                                    Decision Tree
                                                    Neural Network
                                                    Support Vector Machine
                                                    Naïve Bayes
                                                    Bayesian Belief Network
Artificial & Computational Intelligence Techniques  Genetic Algorithm
                                                    Genetic Programming
                                                    Hidden Markov model
                                                    Artificial Immune system

4 Related work

With the advancement of information systems, data has become one of the most important
assets. Methods and techniques for effective data access, data exchange, data
extraction and data use are therefore important. There are several alternative
approaches to identify and prevent fraud [5]. Bolton and Hand [6] review fraud
detection techniques in a variety of areas such as credit card and telecommunications
fraud, and in associated fields such as money laundering and intrusion detection. Kou
et al. [7] survey methods for credit card fraud, telecommunications fraud and
intrusion detection. Weatherford [8] suggests neural networks, recurrent neural
networks and artificial immune systems for fraud detection. Green and Choi [9]
proposed a neural network fraud classification model that uses endogenous financial
data. The classification model is then applied to a test sample based on the observed
behavior. Data mining has been one of the main techniques for deriving knowledge from
data sets in recent years. The technology has uncovered useful commercial and
scientific information hidden in large amounts of data. Applications of data mining
have been described in a variety of fields. Different data mining approaches,
including neural networks [10] [11] [12] and decision trees (DT) [13], have been used
in FFD. Richhariya [14] and Wang [15] provided thorough surveys and assessments of the
various data mining techniques used to detect financial fraud. In this report, we
review in detail publicly accessible Internet articles and journals that deal
explicitly with data mining and accounting for FFD.

5 Comparative analysis review


In order to find the current patterns and methods of identifying financial fraud, we
organize our survey in three parts. First, we categorize the different techniques used
in fraud detection. Second, we present work published in the literature on the
detection of financial fraud in relation to the machine learning technologies that
were used to detect fraud. Third, the findings of the current work using machine
learning techniques are analyzed.

5.1 Classification based on fraud types and Techniques

Early research on fraud detection was mainly based on neural networks and
mathematical models. In recent years, fraud detection has continued to rely primarily
on statistical methods and neural networks, which are still used in some data mining
problems. To understand every form of fraud, data mining techniques must be
investigated in order to identify and classify complex fraud according to each type.
Classifying the techniques for the detection of fraud is useful for understanding the
value of each technique for a specific problem. The classification system is outlined
in this section. The studies on the identification of financial fraud are classified
according to the methods used, as shown in Table 2.

Table 2 Classification based on fraud types and techniques

Study               Fraud Type                 Technique Used
[16], [17, 18, 19]  Credit card fraud          Logistic Regression
                                               Bayes Classifier
                                               Artificial Neural Network
                                               Artificial immune system
                                               Genetic algorithm, Scatter search
                                               Hidden Markov Model
[20], [21]          Financial statement Fraud  Neural Network
                                               Support Vector Machine
                                               Logistic Regression
                                               Beneish M-score model
                                               Decision tree
[22, 23]            Insurance Fraud            Social Network analysis
                                               Logistic regression
                                               Fuzzy Logic
[24]                Money Laundering           Network analysis

5.2 Performance measures of different techniques

Several measures are usually taken into consideration when evaluating performance.
Accuracy, sensitivity and specificity are the most widely used metrics. Accuracy is
the ratio of correctly classified samples to all classified samples. Sensitivity is
the number of samples correctly identified as fraud divided by the total number of
fraudulent samples. Specificity is the corresponding measure for legitimate samples:
the number correctly identified as legitimate divided by the total number of
legitimate samples.
Table 3 demonstrates the accuracy of techniques for detecting financial fraud in
relation to the various forms of fraud as stated by the researchers.
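
The three measures can be computed from the counts of true and false positives and
negatives, as in the following sketch. It uses the standard confusion-matrix
definitions; the labels and predictions are hypothetical and do not come from the
studies in Table 3.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, with 1 = fraud and 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Share of fraudulent samples that were detected as fraud."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of legitimate samples that were recognized as legitimate."""
    return tn / (tn + fp)

# Hypothetical labels (4 fraudulent, 6 legitimate) and classifier predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Because fraud is usually much rarer than legitimate activity, accuracy alone can look
high even when few frauds are caught, which is why sensitivity and specificity are
reported alongside it.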

Table 3 Classification based on Techniques & Accuracy

Study     Technique                 Type of fraud    Accuracy
[23, 20]  Logistic regression       Credit card      96.6%-99.4%
                                    Insurance fraud  60.68%
                                    FSF              19%-79%
[25, 26]  SVM                       Credit card      95.5%-96.6%
                                    Insurance fraud  70.41%-73.41%
                                    FSF              65.8%
[27]      Self-Organizing Map       Credit card      100%
[20]      Genetic programming       FSF              89.27%-94.14%
[20]      Neural Network            FSF              95.64%-96.4%
[28]      Artificial Immune system  Credit card      94.65%-96.4%

6 Discussion and analysis
Examination of the various methods and their fraud identification outcomes shows
that predictive methods perform better than the others. Nevertheless, other approaches
have been found to be more accurate than the predictive techniques in some cases. The
findings of the analytical and statistical approaches can differ depending on the
feature selection for a specific form of fraud. The strengths and weaknesses of the
methodologies discussed are summarized in Table 4.
The accuracy results show that the NN was the best option in most cases, with
marginally higher accuracy (98.09%) in financial statement fraud, followed by the
genetic algorithm (95%). Logistic Regression and SVM give strong credit card fraud
results (99.4% and 96.6%). We noticed that:

• SOM clustering helps to detect new trends in input data that cannot otherwise
be detected through conventional statistical approaches. Used as transaction
filters, SOMs reduce total costs and the processing time of further analysis.
• Logistic regression is well suited to linear data in credit card fraud
detection.
• The support vector machine approach can detect fraudulent activity during
transactions.

Table 4 Overview of current technique benefits and limitations

Method  Strengths                                       Limitations
SVM     - Can accommodate non-linear classification     - The need to transform the input set
          for fraud detection.                            creates problems in tasks such as
        - Requires low computing power and minimal        fraud identification.
          preparation; appropriate for real-time use.
LR      - Easy to use and successful for detecting      - Its efficiency is poor compared to other
          fraud.                                          data mining techniques, and it has
                                                          difficulties with the complexity of fraud
                                                          detection.
NN      - Well established in fraud detection and       - Can overfit if the training data set does
          applicable to many binary classification        not fully represent the problem space,
          problems.                                       requiring constant updates for new types
                                                          of fraud.
                                                        - Training and operation require high
                                                          computing power; hence not ideal for
                                                          real-time use.
GP      - Easy to implement using classification        - Training and operation require high
          accuracy as the fitness measure.                computing power; not fit for real-time
        - Suitable for other (non-algorithmic)            deployment.
          binary classification problems.               - The local maximum/minimum problem makes
                                                          it difficult to adapt to new kinds of
                                                          fraud.
DT      - Easy to use and to comprehend.                - Can overfit the training data set if the
        - Requires low computation and minimal            problem area is not fully represented;
          preparation, so it is ideal for real-time       requires constant updating to remain
          use.                                            appropriate for new fraud types.
                                                        - High computational requirements.
AIS     - Suitable for tasks related to data            - Computationally intensive and therefore
          mismatch, for example the detection of          not suitable for real-time use.
          fraud.
SOM     - Auditors can easily comprehend the visual     - The analysis cannot be automated easily;
          nature of the output.                           hence manual supervision by the auditor
        - Easy to implement.                              is needed.

7 Conclusion
In recent years, fraud has increased, particularly in large and sensitive technical
areas. Combating fraud is therefore an urgent need. The best defense against fraud
combines fraud prevention and detection, because prevention alone is not enough. This
paper analyzed existing methods for detecting fraud in different areas of fraud. In
addition, the approaches and techniques of fraud detection have been classified and
analyzed. Neural networks, support vector machines (SVM), logistic regression,
decision trees and meta-heuristic methods such as genetic algorithms are the most
common techniques for the detection of fraud. Each technique can be used alone or
combined with others through an ensemble or meta-learning technique to create powerful
detection classifiers. In our future research, we will continue to study card fraud
with a view to improving present algorithms. We will provide a hybrid model that can
handle both the imbalanced dataset and the real-time problem and that responds with
better accuracy throughout the financial transaction.

REFERENCES

[1] Lisic, L.L., Silveri, S.D., Song, Y. and Wang, K. (2015) ‘Accounting fraud, auditing,
and the role of government sanctions in China’, Journal of Business Research, Vol. 68,
No. 6, pp.1186–1195.
[2] Y. Jin, R.M. Rejesus, B.B. Little, Binary choice models for rare events data: a crop
insurance fraud application, Applied Economics Volume 37, Issue 7, p841–848.
(2005)
[3] J.L. Kaminski, Insurance Fraud, OLR Research Report,
http://www.cga.ct.gov/2005/rpt/2005-R-0025.htm. 2004
[4] W.H. Beaver, Financial ratios as predictors of failure, Journal of Accounting
Research 4 p71–111. (1966)
[5] Brockett, P. L., Derrig, R. A., Golden, L. L., Levine, A., Alpert, M., 2002, Fraud
classification using principal component analysis of ridits. The Journal of Risk and
Insurance, 69:pp.341–371.
[6] Bolton, R., and Hand, D., 2002, Statistical fraud detection: A review. Statistical Science,
17(3): pp. 235–255.
[7] Kou, Y., Lu, C., Sirwongwattana, S., Huang, Y., 2004, Survey of Fraud Detection
Techniques. International Conference on Networking, Sensing, and Control, 2004, pp. 749-754.
[8] Weatherford, M., 2002, Mining for fraud. IEEE Intelligent Systems July/August: pp. 4-6.
[9] B. P. Green, and J. H. Choi, “Assessing the risk of management fraud through neural
network technology,” Auditing, vol. 16, pp. 14-28, 1997.
[10] Cerullo, M. J., Cerullo, V., 1999, Using neural networks to predict financial reporting
fraud, Computer Fraud and Security May/June 14–17.
[11] Dorronsoro, J.R., Ginel, F., Sánchez, C., Cruz, C.S., 1997, Neural fraud detection in credit
card operations, IEEE Transactions on Neural Networks 8 (4). pp. 827–834.
[12] Kirkos, E., Spathis, C., Manolopoulos, Y., 2007, Data mining techniques for the detection
of fraudulent financial statements, Expert Systems with Applications 32 (4). pp. 995–1003.
[13] Kotsiantis, S., Koumanakos, E., Tzelepis, D., Tampakas, V., 2006, Forecasting fraudulent
financial statements using data mining, International Journal of Computational Intelligence
3 (2). pp. 104–110.

[14] Richhariya, Pankaj, 2012. A Survey on Financial Fraud Detection Methodologies.
[15] Wang, Shiguo, 2010. A comprehensive survey of data mining-based accounting fraud
detection research. In: Proceedings of the 2010 International Conference on Intelligent
Computation Technology and Automation, ICICTA. 1, pp. 50–53. doi:
http://dx.doi.org/10.1109/ICICTA.2010.831.
[16] Yeh, I-C. and Lien, C-h. (2007) ‘The comparisons of data mining techniques for the
predictive accuracy of probability of default of credit card clients’, Expert Systems with
Applications, Vol. 36, No. 2, Part 1, pp.2473–2480.
[17] Duman, E. and Ozcelik, H.M. (2011) ‘Detecting credit card fraud by genetic algorithm
and scatter search’, Expert Syst. Appl., Vol. 38, pp.13057–13063.
[18] Gadi, M.F.A., Wang, X. and Lago, A.P.d.(2008) ‘Comparison with parametric
optimization in credit card fraud detection’, Machine Learning and Applications,
ICMLA ‘08, Seventh International Conference on, pp.279–285.
[19] Srivastava, A., Kundu, A., Sural, S. and Majumdar, A. (2008) ‘Credit card fraud
detection using hidden Markov model’, IEEE Transactions on Dependable and Secure
Computing, Vol. 5, No. 1, pp.37–48.
[20] Ravisankar, P., Ravi, V., Rao, G.R. and Bose, I. (2011) ‘Detection of financial
statement fraud and feature selection using data mining techniques’, Decision Support
Systems, Vol. 50, No. 2, pp.491–500.
[21] Aris, N.A., Othman, R., Arif, S.M.M., Malek, M.A.A. and Omar, N. (2013) ‘Fraud
detection: Benford’s law vs Beneish model’, IEEE Symposium on Humanities, Science
and Engineering Research, pp.726–731.
[22] Subelj, L., Furlan, S. and Bajec, M. (2011) ‘An expert system for detecting automobile
insurance fraud using social network analysis’, Expert Systems with Applications, Vol.
38, No. 1, pp.1039–1052.
[23] Bermudez, L.l. and Perez, J.M., Ayuso, M., Gomez, E. and Vazquez, F.J. (2008) ‘A
Bayesian dichotomous model with asymmetric link for fraud in insurance’, Insurance:
Mathematics and Economics, Vol. 42, No. 2, pp.779–786.
[24] Gao, Z. and Ye, M. (2007) ‘A framework for data mining–based anti–money laundering
research’, Journal of Money Laundering Control, Vol. 10, No. 2, pp.170–179.
[25] Bhattacharyya, S., Jha, S., Tharakunnel, K. and Westland, J.C. (2011) ‘Data mining for
credit card fraud: a comparative study’, Decision Support System, Vol. 50, pp.602–613.
[26] Huang, S.Y., Tsaih, R.H. and Yu, F. (2014) ‘Topological pattern discovery and feature
extraction for fraudulent financial reporting’, Expert Systems with Applications, Vol.
41, No. 9, pp.4360–4372.
[27] Humpherys, S.L., Moffitt, K.C., Burns, M.B., Burgoon, J.K. and Felix, W.F. (2011)
‘Identification of fraudulent financial statements using linguistic credibility analysis’,
Decision Support Systems, Vol. 50, No. 3, pp.585–594.
[28] Halvaiee, N.S. and Akbari, M.K. (2014) ‘A novel model for credit card fraud detection
using artificial immune systems’, Applied Soft Computing, Vol. 24, pp.40–49.
