Vikash Sharma*
Faculty of Computer Studies,
Symbiosis International University,
Pune, 412115 Maharashtra, India
Email: vikash0810@gmail.com
*Corresponding author
Bhavna Pandey
Symbiosis School of Banking & Finance,
Symbiosis International University,
Pune, 412115 Maharashtra, India
Email: bhavna.pandey@ssbf.edu.in
Vipin Kumar
Neurapses Technologies Pvt Ltd.,
Bangalore, 560100 Karnataka, India
Email: vipinkumar.work@gmail.com
Abstract: Big Data plays a significant role in taking any industry forward. In the context of the financial sector, automated fraud detection tries to gather useful information to reduce financial frauds by analysing and mining Big Data, especially structured data. Significant attention in Big Data usage has also shifted towards supply chain management (SCM). Although many dimensions of Big Data have been studied and researched in SCM, there remains a gap in the understanding of unstructured data for financial fraud detection. With this paper, we propose a theoretical framework to study this dimension and analyse how individual enablers of unstructured data impact SCM. As such, the paper intends to evaluate how useful unstructured data can be in reducing financial frauds. Finally, we outline the limitations and challenges of our study and further research directions.
Reference to this paper should be made as follows: Sharma, V., Pandey, B. and
Kumar, V. (2016) ‘Importance of Big Data in financial fraud detection’,
Int. J. Automation and Logistics, Vol. 2, No. 4, pp.332–348.
1 Introduction
Trillions of data points are generated online every day on Twitter, Facebook, and other social media websites. Gradually, we are coming to rely heavily on online social communication. Thus, most of the data in today's world are presented in an unstructured form that arrives with large volume, velocity, variety, and variability. Despite the significant value hidden in these data, it is difficult to extract relevant information from them. In recent years, analysts have discovered valuable data hidden in unstructured form and recognised the need to understand it for the future benefit of business. The velocity at which data are generated makes them difficult to analyse with traditional analysis tools. Size is the first, and at times, the only dimension that leaps out at the mention of Big Data (Gandomi and Haider, 2015). With the vast amount of data now available, companies in almost every industry are focused on exploiting data for competitive advantage. The volume and variety of data have far outstripped the capacity of manual analysis, and in some cases have exceeded the capacity of conventional databases (Provost and Fawcett, 2013). Therefore, the need of the hour is to analyse this huge amount of data and solve many business problems. As data are produced at an ever faster rate, it is becoming difficult for data analysts to understand new problems and provide real-time solutions. Although analysts have been working on data for a long time, this vastness of information in such a short time is the reason the term Big Data was coined.
Wang et al. (2016) also stated that “Big Data can provide unique insights into, inter alia,
market trends, customer buying patterns, and maintenance cycles, as well as into ways of
lowering costs and enabling more targeted business decisions”.
With this advent, Big Data seems to be a ‘blessing in disguise’. Although it is hard to manage such massive data, it also opens doors for analytics to extract hidden information from them. The introduction of electronic communication technologies and platforms such as Skype and WhatsApp (commonly referred to as ‘over the top’ services) that deliver audio, video and other media over the internet protocol has created extremely competitive markets that have forced many companies across most industries to explore and deliver innovative and smarter products and services to the customer (LaValle et al., 2011). With this comes a rise in unstructured data, available online on social media, news, and product review websites. Insurance et al. (2013) stated that around 80–85% of data are in unstructured form; later, Gandomi and Haider (2015) reported the figure to be 95%. Thus, there is a huge need to understand these unstructured data and solve as many business problems as possible.
One such business problem is the detection of ‘financial frauds’, which alone brings billions in losses to many companies. Many efforts have been made to study and automate fraud detection to reduce these huge financial losses. Still, many challenges and limitations lie ahead (Ngai et al., 2011). The main limitation in analysing frauds lies in data that are in unstructured form, i.e., for which no proper data model is defined. The purpose of this paper is to make industry and researchers aware of a new dimension for data extraction, namely unstructured data, especially in the context of financial fraud detection and its impact on supply chain management (SCM), and later to propose a framework that will help to reduce this gap.
Further, social networking is also bringing significant behavioural changes to one's decision-making capability. Although data analysis has been studied to a good extent, the impact of Big Data analysis on logistics needs to be better identified for companies to remain competitive. A long-term customer may not take much time to change product preference due to behavioural, peer-influence, homophily, pricing, cultural and political effects. This can quickly impact logistics, warehousing and return on investment. If one can correctly predict the best ways to retain customers, understand future needs and assess the possibility of frauds, then it will surely help in arranging competitive pricing, increasing customer satisfaction, filtering out fraudulent customers based on fraud analysis and reducing unnecessary stock maintenance. As such, there is a real need to study financial fraud detection and its impact from an SCM perspective, to anticipate future business problems and help the business take necessary action beforehand. The objective of this research is to develop a theoretical framework to detect financial fraud using unstructured data, which constitutes around 95% of all data (Gandomi and Haider, 2015). With this paper, we try to study and reduce the gap between unstructured data and the detection of financial frauds.
The current study on the ‘importance of Big Data in financial fraud detection’ emphasises the need to give importance to unstructured data for detecting financial frauds. The main contributions of this paper are as follows. In Section 2, we review the literature and describe Big Data, unstructured data, financial fraud detection and how unstructured data can help reduce frauds related to financial transactions. In Section 3, we describe the research methodology adopted for this paper. In Section 4, we discuss the theoretical research framework designed for this study. In Section 5, we share some limitations and research directions one can take. Finally, in Section 6, we draw our conclusions.
2 Literature review
In this section, we first review the literature related to Big Data, unstructured data and financial frauds. We then examine the need for and relevance of unstructured data in fraud detection.
Further, there are three essential categories of data, as defined below and explained with the help of Figure 1. Let us first understand how data is presented to the data analyst:
a Structured data: these data come in a proper format with a database schema. Proper modelling is already defined to understand them; examples are databases and spreadsheets. Querying the data and extracting relevant information is quite easy. This is how most organisations keep their data, with a well-defined database schema.
b Semi-structured data: these data also exist in a structured format, but are not maintained in a database; rather, they live in flat files. They are mostly a dynamic combination of data and metadata before being presented to the user. Examples are XML data, JSON files, and the source code of a website in HTML format.
c Unstructured data: these constitute the largest share of data being generated in today's world. Every company follows its own standards to represent these data based on users' requirements and necessity. The source is either internal, in the form of memos or meeting notes, or external, as reports and journals. Most of these data are descriptive in nature, and no proper format is defined; it is also tough to cast such data into a common format. Other examples are given in Figure 1.
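The distinction between these categories can be made concrete with a short sketch: a semi-structured JSON record (the field names here are invented for illustration) is flattened into a structured, queryable row.

```python
import json

# A semi-structured record, e.g. scraped from a web API (hypothetical fields).
raw = '{"user": {"name": "A. Kumar", "location": "Pune"}, "reviews": [{"text": "good", "stars": 5}]}'

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested JSON into a flat column->value mapping (structured form)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name + ".count"] = len(value)  # summarise repeated groups
        else:
            flat[name] = value
    return flat

row = flatten(json.loads(raw))
print(row)  # {'user.name': 'A. Kumar', 'user.location': 'Pune', 'reviews.count': 1}
```

Once flattened, such rows can be loaded into an ordinary database table, which is exactly the structured form most fraud detection systems already consume.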
by that time, these data might have taken some other form or may not be relevant. Certain
companies are explicitly focusing on analysing this real-time data by streaming directly
into the analytics system (Sensmeier, 2013).
Despite being in unstructured form, we must accept the fact that a substantial amount of information is stored there, which might look worthless at first sight but contains much valuable information. Moreover, the job of an analyst is to give meaning to the data. We will study further how much importance these data hold and how they can help save billions of dollars in the financial sector.
Figure 2 Hierarchy chart of white-collar crime perpetrators from both firm-level and
community-level perspectives
Fraud Bureau estimates that the total cost of undetected insurance fraud in the UK is
£2.1 bn, which translates into an average £50 additional premium for every policyholder.
The ABI states that every week 2,670 fraudulent insurance claims are exposed in the UK,
worth £19 m. Fraud is a big ticket item both for the perpetrators and for those dedicated
to exposing it and bringing fraudsters to justice (Insurance et al., 2013).
Financial statement fraud is a serious threat to market participants' confidence in published audited financial statements. Capital market participants expect vigilant and active corporate governance to ensure the integrity, transparency and quality of financial information (Rezaee, 2005; Gupta and Gupta, 2015). Li (2010) and Gupta and Gupta (2015) also studied corporate financial fraud and detection using an empirical framework that models the strategic interdependence between fraud and detection and accounts for the possibility that some fraud remains undetected.
Here, automated fraud detection tries to detect fraud based on certain anomalies. Fraud detection systems have mostly worked on structured data internal to the company; even the data they obtain from external sources are in structured form. Our objective is to understand financial frauds and the importance of unstructured data in detecting them. For this, we also need to give importance to real-time unstructured data that exist on the web. We first study the need for and relevance of unstructured data in fraud detection, and later discuss the theoretical framework developed to overcome fraud-related issues.
after 2004 have also created an abundance of user-generated content from various online
social media such as forums, online groups, web blogs, social networking sites, social
multimedia sites (for photos and videos), and even virtual worlds and social games
(O’Reilly, 2005; Chen et al., 2012).
For social media analytics of customer opinions, text analysis and sentiment analysis techniques are frequently adopted (Pang and Lee, 2008). Various analytical techniques
have also been developed for product recommender systems, such as association rule
mining, database segmentation and clustering, anomaly detection, and graph mining
(Adomavicius and Tuzhilin, 2005; Chen et al., 2012).
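As a minimal illustration of the sentiment analysis techniques mentioned above, a lexicon-based scorer can be sketched as follows; the word lists here are illustrative stand-ins, not a published sentiment lexicon.

```python
import re

# Tiny illustrative lexicons; real systems use curated sentiment dictionaries.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "fraud", "scam", "terrible", "angry"}

def sentiment_score(text: str) -> int:
    """Return (#positive - #negative) word hits; >0 positive, <0 negative."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("great product, love it"))   # 2
print(sentiment_score("this is a scam, terrible")) # -2
```

Aggregating such scores over a stream of posts about a company or product is one simple way to turn unstructured opinion text into a structured signal.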
Arias et al. (2014) tested the impact of Twitter data on two different datasets, the stock market and movie box office revenue, to make time-series predictions, and showed that adding these data does improve the overall prediction. The test was conducted on three years of Twitter data between 2011 and 2014. Another paper (Gray and Debreceny, 2014) addressed financial fraud detection using data mining techniques. The authors conducted a literature review of publications between 1997 and 2008, classifying the articles into four categories of financial fraud (bank fraud, insurance fraud, securities and commodities fraud, and other related financial fraud) and six classes of data mining techniques (classification, regression, clustering, prediction, outlier detection, and visualisation). The findings revealed that data mining techniques have mostly been applied to the detection of insurance fraud, in comparison to corporate fraud and credit card fraud, which have gained some attention in recent years. However, there seems to be a lack of research on mortgage fraud, money laundering, and securities and commodities fraud.
Aral et al. (2009) developed a matched sample estimation framework to distinguish influence and homophily effects in dynamic networks, and applied that framework to a unique dataset documenting product adoption in a vast network. They found that previous methods significantly overestimate peer influence in such networks, mistakenly identifying homophilous diffusion as influence-driven contagion. With all its potential in both the academic and commercial world, the effect of Big Data on the behavioural sciences is already apparent in the ubiquity of online surveys and psychology experiments that outsource projects to a distributed network of people (Bentley et al., 2014; Rand, 2012; Sela and Berger, 2012; Twenge et al., 2012). As such, there is a need to understand social behavioural patterns from unstructured data for better future forecasting and detection of financial frauds.
Over the last decade, high-profile financial frauds committed by large companies in both developed and developing countries have been discovered and reported, such as Enron, Lucent, WorldCom, Parmalat, YGX, SK Global, Satyam, Harris Scarfe and HIH (Wadhwa and Pal, 2012). With the massive increase in accounting fraud witnessed, there is a demand for efficient financial accounting fraud detection from investors, academic researchers, the media, the financial community and regulators (Wong and Venkatraman, 2015). However, the accuracies of existing fraud detection models are generally not high (Li, 2015). This brings us to the conclusion that there is a need to study Big Data further, or more specifically unstructured data, to detect fraudulent activities.
3 Research methods
This research adopts a conceptual framework-based approach. The prime aim was to investigate Big Data, financial frauds, and how unstructured data can help to improve financial fraud detection. To determine the current state and future directions for Big Data and financial fraud detection, we conducted an extensive literature review.
To achieve our objective, we developed and proposed a framework based on unstructured data analysis using online data such as Twitter and Facebook, and use these data to illustrate the practical implementation of the framework and analyse the results obtained. The first phase of the review was to determine the scope and relevant source material. Since Big Data and the detection of financial frauds is an interdisciplinary topic, related articles have been published in a wide variety of journals. Furthermore, unstructured data and financial fraud detection are still an emerging research area, so little substantial literature was found that discusses both in detail, apart from some conference proceedings. Therefore, we included references on both topics, from conference proceedings and journals across various disciplines, indexed in Science Direct, IEEE, Google Scholar, ACM, Scopus, and Springer. From the papers identified, we backtracked through the relevant articles cited by other authors. The search was based on ‘Big Data’, ‘data analytics’, ‘financial fraud detection’ and ‘SCM’. We mostly considered those papers where there was a close link between the searched keywords.
In the second stage, we classified the papers into different categories. The classification proceeded as follows. Two researchers independently collected the articles: one focused on Big Data related articles while the other focused on financial fraud related articles. Later, all the papers were classified into corresponding factors within the framework. The two classifications were subsequently compared and, in the case of differing results, a third researcher repeated and evaluated the classification.
Further, two other researchers focused on the framework design, drawing on the existing data mining literature and their personal industrial research experience; with the aid of another researcher, the framework was finalised. Altogether, we reviewed the title, abstract, discussion and conclusion sections of each paper to identify its main topic, for example, the importance of unstructured information. Some publications touched on various factors but covered none in detail, nor a combination of both topics; those were classified as ‘multiple categories’ papers.
Several other papers presented market overviews summarising the state of Big Data and its challenges or potential; such papers were classified as ‘overviews’. In parallel, we analysed the methodologies used. The classification was first drawn into ‘conceptual’ and ‘empirical’, and the ‘empirical’ category was further split into ‘qualitative’, ‘quantitative’ and ‘design research’. Most of the technical papers proposed conceptual constructs, but some mainly described technologies; therefore, we divided the conceptual studies into proposed constructs and descriptions. In the end, we produced a new theoretical model.
4 Research framework
In this section, we define a new theoretical model to detect future frauds. With the help of this framework, we can focus on data modelling, extraction and mapping of unstructured data to structured data, and then find the essential enablers to identify frauds. In Figure 3, one can see that the overall unstructured data processing framework consists of a collection of modules, or in other words, sub-processes.
f Feature extraction and segmentation: the extracted features can then be segmented into different categories based on functional knowledge, for ease of usage. We have to check different combinations of features on the models.
g Extraction linking and mapping with the base system: the information we obtain can be represented in two ways:
• It can simply be added as new features to the base model (e.g. location, organisation info, online activity percentage, married/single, number of friends, etc.).
• The information can also be added in link format, where we can find users having multiple locations, dates of birth or email IDs, or who know other users from our user set. Graph databases can be used for such linking.
h Pattern detection and behavioural analysis: first, the new features are added to the base features; then, by applying models, we check whether we can find patterns pointing towards fraudulent users. Later, through the links that we have discovered, we can check for unknown fraud cycles.
i Fraud analysis: based on the machine-generated output, we can build cases to check the authenticity of the flagged frauds. This information can then be fed back into the framework model for further learning enhancement.
j Machine learning and knowledge base: this overall framework requires machine learning at various stages, along with a sound knowledge base of the domain. Some of this knowledge base is industry- and need-specific, some is data-specific, and some is standard across industries and data sources. As such, this process is used across all other sub-processes.
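The linking described in item (g) can be sketched with a plain adjacency-list graph; the user records below are hypothetical, and a production system would use a graph database as noted above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical user records extracted from unstructured online sources.
users = [
    {"id": "u1", "email": "x@mail.com", "location": "Pune"},
    {"id": "u2", "email": "x@mail.com", "location": "Delhi"},
    {"id": "u3", "email": "y@mail.com", "location": "Pune"},
]

def link_users(records, key):
    """Build an undirected graph linking users that share the same `key` value."""
    by_value = defaultdict(list)
    for r in records:
        by_value[r[key]].append(r["id"])
    graph = defaultdict(set)
    for ids in by_value.values():
        for a, b in combinations(ids, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

shared_email = link_users(users, "email")
print(sorted(shared_email["u1"]))  # ['u2'] -> u1 and u2 share an email ID
```

Traversing such a graph is what makes the "unknown fraud cycles" of item (h) discoverable: a ring of accounts sharing emails or addresses shows up as a connected component.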
The immediate challenge at the start of data collection is to chunk the source data into something manageable and meaningful. Following the principles of grounded theory, data analysis will begin immediately, because future data collection and sampling depend on the information that emerges from the first set of data; delaying this step would create an unmanageable workload in the research. Accordingly, grounded theory is known as ‘the constant comparative method of analysis’ (Glaser and Strauss, 1967). Further, the study will be carried out on a mathematical foundation, with graphs and charts, considering both qualitative and quantitative approaches. Factor analysis will be done in two ways. The first uses the ISM/TISM model to develop a behavioural model for the enablers of the SCM industry; the structural model developed with this methodology will help us understand the interaction between the various elements of the system. After the model is developed, it is further subjected to assessment by a different group of domain experts so as to enhance its validity (Jayalakshmi and Pramod, 2015). The second way is through text analytics, computing term frequency-inverse document frequency scores; experts can then validate the results based on experience, which further helps to train the model.
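The term frequency-inverse document frequency computation mentioned above can be sketched from first principles; the claim narratives below are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus of claim narratives (illustrative text, not real data).
docs = [
    "claim approved after review",
    "duplicate claim suspicious duplicate invoice",
    "routine claim approved",
]

def tf_idf(corpus):
    """Compute term frequency-inverse document frequency for each document."""
    n = len(corpus)
    tokenised = [doc.split() for doc in corpus]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenised for term in set(doc))
    scores = []
    for doc in tokenised:
        tf = Counter(doc)
        scores.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return scores

scores = tf_idf(docs)
# 'duplicate' is distinctive to document 1; 'claim' appears everywhere, so its IDF is 0.
print(scores[1]["duplicate"] > scores[1]["claim"])  # True
```

Terms with high TF-IDF in a claim narrative are exactly the candidates an expert would inspect when validating the factor analysis.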
Finally, as discussed above, the same system can be directly applied to various industries; we only have to replace the base system. In the context of SCM and logistics, extra information needs to be incorporated into the base system besides customer information; this will help to identify customer behaviour. As anomaly detection models are largely similar everywhere, the framework provides extra features to improve the pattern signal; only the functional and domain knowledge used will differ.
5 Limitations and research directions
Data mining is a threat to privacy and data security: when data can be viewed from many different angles at different abstraction levels, it threatens the goal of keeping data secure and guarding against intrusions on privacy. For example, it is relatively easy to compose a profile of an individual (e.g. personality, interests, spending habits) from data drawn from various sources (Jun Lee and Siau, 2014). This makes it challenging to obtain correct results and identify frauds.
Another big limitation in the area of unstructured data analysis is the collection of real data, because of restricted access to many data points related to the search criteria. Online web sources may not allow these data to be accessed in vast amounts. For example, for a bank it is hard to filter users down to targeted customers: a particular bank is only interested in knowing how its own customers behave online, in order to identify possible chances of future frauds. Because of this, instead of enhancing state-of-the-art automated fraud detection systems, most companies still rely on human intervention to a great extent. Given the sensitive nature of fraud situations, and because of the rather low frequency of detected fraud cases, most empirical studies in this area use experimental settings or surveys focusing on auditors' perceptions, rather than actual fraud cases, as a source of data (Hassink et al., 2010). Comparing the proposed model with such manual detection is therefore problematic.
There are also geographical limitations and challenges for certain data. A user might be located in one country but conducting fraudulent business from another. As for kinds of fraudsters, an average fraudster might well be caught, but professional fraudsters tend to stay away from online social media and smartphones to keep themselves aloof.
The next research direction will be to collect sample data and filter only those related to fraud detection. This will help us to develop a final model that combines multi-dimensional unstructured data from various online sources, for example, Twitter and Facebook. After collection, only financial fraud-detection related words and sentences will be retained, using text analytics and other machine learning models.
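This pruning step can be sketched with a simple keyword filter; the term list here is illustrative, and in practice would be derived through text analytics and expert validation.

```python
import re

# Illustrative fraud-related term list (hypothetical; a real system would
# learn and validate these terms rather than hard-code them).
FRAUD_TERMS = re.compile(r"\b(fraud|scam|launder\w*|fake invoice|chargeback)\b", re.I)

posts = [
    "Lovely weather in Pune today",
    "Got hit by a chargeback scam, avoid this seller",
    "New money laundering scheme reported",
]

# Keep only posts containing at least one fraud-related term.
relevant = [p for p in posts if FRAUD_TERMS.search(p)]
print(len(relevant))  # 2
```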
We will also maintain other information such as the user's name, age, and location. This will help to identify users who might be suspected of fraud, and to make connections between base user details and multiple online identities. Surely, the culprit may not always be identified correctly, but common behavioural patterns and cluster analysis will help to determine possible sets of users. Among other means of classification and detection, the model will also use graphs to highlight possible correlations between various anomalies, since, as stated by Akoglu and Faloutsos (2013), a graph representation of datasets inherently imposes long-range correlations among the data objects.
Like any other system, the proposed theoretical framework also has some limitations. When registering the base system for a specific domain or industry, the proposed system requires deep knowledge of the domain, and human experts may not be available or willing to share that knowledge. Integration might need a lot of human resources and would thus be a costly affair. Further, direct system access and linkage are needed to get data out of it. If data is copied to a new system for analysis, data consistency and redundancy become issues: if the data is updated frequently, it may no longer match the data already copied for analysis. Combining all sources of data is a challenge too, owing to the complexity of matching different features from various sources. Also, the privacy factor will not allow access to sensitive data that is a must for profile matching; thus, data may not be available for collaboration.
To add further, there is a downside to big-data research: without clear objectives and a unifying framework, behavioural scientists may ask whether it is useful, for example, to infer from millions of Facebook pages or Twitter feeds that “men are more influential than women … [and] that influential person with influential friends help spread” information (Aral and Walker, 2012), or that “people awaken later on weekends” (Golder and Macy, 2011). Bentley et al. (2014) stated that, in the best case, big-data studies will not compete with more traditional behavioural science but will instead allow us to see better how known behavioural patterns apply in novel contexts. As long as people trust their individual experiences, even in observing the behaviour of others, a collective wisdom is possible. The previous studies have already laid the foundation for further research design, and further evaluation will help us to review the system better. Moreover, many existing fraud detection systems operate through supervised approaches on labelled data, hybrid approaches on labelled data, semi-supervised approaches with legal (non-fraud) data, or unsupervised approaches with unlabelled data (Chintalapati and Jyotsna, 2013). Our intention is to find the implications of unstructured data in fraud detection.
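As a minimal example of an unsupervised approach on unlabelled data, a z-score outlier check over transaction amounts (the figures are invented for illustration) can be sketched as follows.

```python
from statistics import mean, stdev

# Unlabelled transaction amounts (illustrative); unsupervised approaches flag
# points that deviate strongly from the bulk of the data.
amounts = [120, 110, 130, 125, 115, 118, 122, 5000]

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(zscore_outliers(amounts))  # [5000]
```

Real systems use far richer models, but the principle is the same: no labels are required, only an assumption about what "normal" looks like.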
Therefore, we believe that if all these factors are considered, slow and steady steps are taken to finalise the system, and data is then fed into this newly proposed system, identification and detection of fraud are feasible to a very good extent, though some limitations and challenges remain. This extent can only be evaluated through further implementation of the theoretical framework. Our goal is to start at some point and include as many possibilities as possible. This theoretical study will surely help us, and anyone working in this area, to get closer to the common goal.
6 Concluding remarks
In this paper, we have carried out a literature review to understand Big Data, financial frauds and the importance of unstructured data. We have also touched upon the benefits in the context of SCM. The literature mostly discusses the potential for automated fraud detection from unstructured data; most studies still need to understand this unstructured format better and focus more on getting relevant data out of it, checking, where necessary, whether meaningful data can be extracted without affecting privacy. We have also formalised a theoretical research framework to carry out our further study; this framework focuses on unstructured data and the extraction of relevant information from it.
In future, we plan to implement and execute a better decision-making model to test this theory by using newer data analytics and machine learning tools on a realistic system. We are currently in the process of formalising a close-to-generic model that can work with data of variable types received from different sources. This framework will help future researchers to solve other similar business-related problems, and will surely contribute to creating a model that can be applied to multiple industries with minimum modifications.
Acknowledgements
The authors are grateful for the valuable comments of the referees on an earlier version of
this paper.
References
Adomavicius, G. and Tuzhilin, A. (2005) ‘Toward the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions’, IEEE Transactions on Knowledge and
Data Engineering, Vol. 17, No. 6, pp.734–749 [online] http://ieeexplore.ieee.org/lpdocs/
epic03/wrapper.htm?arnumber=1423975 (accessed 14 December 2015).
Akoglu, L. and Faloutsos, C. (2013) ‘Anomaly, event, and fraud detection in large network
datasets’, Proceedings of the Sixth ACM International Conference on Web Search and Data
Mining – WSDM ‘13, p.773 [online] http://dl.acm.org/citation.cfm?id=2433396.2433496
(accessed 11 December 2015).
Aral, S. and Walker, D. (2012) ‘Identifying influential and susceptible members of social
networks’, Science, Vol. 337, No. 6092, pp.337–341, New York (NY) [online]
http://www.scopus.com/inward/record.url?eid=2-s2.0-84863993298&partnerID=tZOtx3y1
(accessed 25 January 2016).
Aral, S., Muchnik, L. and Sundararajan, A. (2009) ‘Distinguishing influence-based contagion from
homophily-driven diffusion in dynamic networks’, Proceedings of the National Academy of
Sciences of the United States of America, Vol. 106, No. 51, pp.21544–21549 [online]
http://www.scopus.com/inward/record.url?eid=2-s2.0-76049088642&partnerID=tZOtx3y1
(accessed 25 January 2016).
Arias, M., Arratia, A. and Xuriguera, R. (2014) ‘Forecasting with Twitter data’, ACM
Trans. Intell. Syst. Technol., Vol. 5, No. 1, pp.8:1–8:24 [online] http://doi.acm.org/10.1145/
2542182.2542190 (accessed 16 April 2016).
Bentley, R.A., O’Brien, M.J. and Brock, W.A. (2014) ‘Mapping collective behavior in the big-data
era’, Behavioral and Brain Sciences, Vol. 37, No. 1, pp.63–76 [online] http://www.journals.
cambridge.org/abstract_S0140525X13001659 (accessed 25 January 2016).
Beulke, D. (2011) Big Data Impacts Data Management: The 5 Vs of Big Data [Blog post], Dave
Beulke Blog [online] http://davebeulke.com/big-data-impacts-data-management-the-five-vs-
of-big-data/ (accessed 4 January 2016).
Boucher Ferguson, R. (2013) ‘How eBay uses data and analytics to get closer to its (massive)
customer base’, MIT Sloan Management Review [online] http://sloanreview.mit.edu/article/
how-ebay-uses-data-and-analytics-to-get-closer-to-its-massive-customer-base/ (accessed 17
December 2015).
Boyd, D. and Crawford, K. (2012) ‘Critical questions for Big Data’, Information, Communication
& Society, Vol. 15, No. 5, pp.662–679 [online] http://www.tandfonline.com/doi/abs/10.
1080/1369118X.2012.678878 (accessed 10 December 2015).
Chen, H., Chiang, R.H.L. and Storey, V.C. (2012) ‘Business intelligence and analytics: from Big
Data to big impact’, MIS Quarterly, Vol. 36, No. 4, pp.1165–1188 [online] http://dl.acm.org/
citation.cfm?id=2481683&CFID=614785593&CFTOKEN=52416716 (accessed 17 May
2016).
Chintalapati, S.S. and Jyotsna, G. (2013) ‘Application of data mining techniques for financial
accounting fraud detection scheme’, International Journal of Advanced Research in Computer
Science and Software Engineering, Vol. 3, No. 11, pp.717–724.
Clifton, P. et al. (2010) ‘A comprehensive survey of data mining-based fraud detection research’,
International Conference on Intelligent Computation Technology and Automation, Vol. 1,
pp.50–53.
Davenport, T.H., Barth, P. and Bean, R. (2012) ‘How “Big Data” is different’, MIT Sloan
Management Review, Vol. 54, No. 1, pp.22–24.
346 V. Sharma et al.
Dijcks, J. (2012) Oracle: Big Data for the Enterprise, Oracle White Paper, June, p.16 [online]
http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Oracle+:+Big+Data+for
+the+Enterprise#0 (accessed 12 December 2015).
Doan, A., Ramakrishnan, R. and Halevy, A.Y. (2011) ‘Crowdsourcing systems on the World-Wide
Web’, Communications of the ACM, Vol. 54, No. 4, p.86.
Ferguson, G., Mathur, S. and Shah, B. (2005) ‘Evolving from information to insight’, MIT Sloan
Management Review, Vol. 46, No. 2, pp.51–57.
Gandomi, A. and Haider, M. (2015) ‘Beyond the hype: Big Data concepts, methods, and analytics’,
International Journal of Information Management, Vol. 35, No. 2, pp.137–144 [online]
http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007 (accessed 17 May 2016).
Glaser, B.G. and Strauss, A.L. (1967) The Discovery of Grounded Theory: Strategies for
Qualitative Research [online] http://www.amazon.com/dp/0202302601 (accessed 13 February
2016).
Golder, S.A. and Macy, M.W. (2011) ‘Diurnal and seasonal mood vary with work, sleep, and day
length across diverse cultures’, Science, Vol. 333, No. 6051, pp.1878–1881, New York (NY)
[online] http://www.scopus.com/inward/record.url?eid=2-s2.0-80053345545&partnerID=
tZOtx3y1 (accessed 17 January 2016).
Gray, G.L. and Debreceny, R.S. (2014) ‘A taxonomy to guide research on the application of data
mining to fraud detection in financial statement audits’, International Journal of Accounting
Information Systems, Vol. 15, No. 4, pp.357–380 [online] http://dx.doi.org/10.1016/
j.accinf.2014.05.006 (accessed 12 May 2016).
Gupta, P.K. and Gupta, S. (2015) ‘Corporate frauds in India – perceptions and emerging issues’,
Journal of Financial Crime, Vol. 22, No. 1, pp.79–103 [online] http://www.emeraldinsight
.com/doi/abs/10.1108/JFC-07-2013-0045 (accessed 11 December 2015).
Hassink, H., Meuwissen, R. and Bollen, L. (2010) ‘Fraud detection, redress and reporting by
auditors’, Managerial Auditing Journal, Vol. 25, No. 9, pp.861–881.
Holzinger, A. et al. (2013) ‘Human-computer interaction and knowledge discovery in complex,
unstructured’, in Holzinger, A. and Pasi, G. (Eds.): Big Data, Springer, Berlin Heidelberg
[online] http://link.springer.com/10.1007/978-3-642-39146-0 (accessed 10 December 2015).
IBM (2013) Infographic: The Four V’s of Big Data | The Big Data Hub [online] http://www.
ibmbigdatahub.com/infographic/four-vs-big-data (accessed 10 December 2015).
Insurance, T. et al. (2013) Fraud Detection – The Unstructured Data Goldmine, Blakehead Limited
[online] http://www.blakehead.co.uk/user_uploads/fraud detection – the unstructured data
goldmine.pdf (accessed 9 December 2015).
Jacobs, A. (2009) ‘The pathologies of Big Data’, Communications of the ACM, Vol. 52, No. 8, p.36
[online] http://portal.acm.org/citation.cfm?doid=1536616.1536632 (accessed 15 December
2015).
Jayalakshmi, B. and Pramod, V.R. (2015) ‘Total interpretive structural modeling (TISM) of the
enablers of a flexible control system for industry’, Global Journal of Flexible Systems
Management, Vol. 16, No. 1, pp.63–85 [online] http://dx.doi.org/10.1007/s40171-014-0080-y
(accessed 4 April 2016).
Jun Lee, S. and Siau, K. (2014) ‘A review of data mining techniques’, Industrial Management &
Data Systems, Vol. 101, No. 1, pp.41–46.
Kern, E. (2012) Facebook is Collecting your Data – 500 Terabytes a Day, Gigaom
[online] https://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-
terabytes-a-day/ (accessed 14 December 2015).
Kwon, O., Lee, N. and Shin, B. (2014) ‘Data quality management, data usage experience and
acquisition intention of Big Data analytics’, International Journal of Information
Management, Vol. 34, No. 3, pp.387–394 [online] http://dx.doi.org/10.1016/j.ijinfomgt.
2014.02.002 (accessed 17 May 2016).
Importance of Big Data in financial fraud detection 347
Laney, D. (2001) ‘3D Data management: controlling data volume, velocity, and variety’,
Application Delivery Strategies, Vol. 949, p.4 [online] https://blogs.gartner.com/doug-
laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-
Variety.pdf (accessed 2 February 2016).
LaValle, S. et al. (2011) ‘Big Data, analytics and the path from insights to value’, MIT Sloan
Management Review [online] http://sloanreview.mit.edu/article/big-data-analytics-and-the-
path-from-insights-to-value/ (accessed 9 April 2016).
Li, R. (2015) ‘Detection of financial reporting fraud based on clustering algorithm of automatic
gained parameter K value’, International Journal of Database Theory and Application, Vol. 8,
No. 1, pp.157–168 [online] http://www.sersc.org/journals/IJDTA/vol8_no1/17.pdf (accessed
14 May 2016).
Li, S. (2010) ‘Corporate financial fraud: an application of detection controlled estimation’, SSRN
Electronic Journal [online] http://papers.ssrn.com/abstract=1698038 (accessed 12 December
2015).
Manyika, J. et al. (2011) Big Data: The Next Frontier for Innovation, Competition, and
Productivity, June, p.156, McKinsey Global Institute [online] http://www.mckinsey.com/~/
media/McKinsey/Business Functions/Business Technology/Our Insights/Big data The next
frontier for innovation/MGI_big_data_full_report.ashx (accessed 10 December 2015).
McAfee, A. and Brynjolfsson, E. (2012) ‘Big Data: the management revolution’, Harvard
Business Review, Vol. 90, No. 10, pp.61–68 [online] http://www.buyukverienstitusu.com/
s/1870/i/Big_Data_2.pdf (accessed 10 December 2015).
Ngai, E.W.T. et al. (2011) ‘The application of data mining techniques in financial fraud detection: a
classification framework and an academic review of literature’, Decision Support Systems,
Vol. 50, No. 3, pp.559–569 [online] http://dx.doi.org/10.1016/j.dss.2010.08.006 (accessed 12
May 2016).
O’Reilly, T. (2005) ‘What is web 2.0: design patterns and business models for the next generation
of software’, Social Science Research Network Working Paper Series, Vol. 2007, No. 65,
pp.17–37 [online] http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-
20.html (accessed 14 December 2015).
Opresnik, D. and Taisch, M. (2015) ‘The value of Big Data in servitization’, International Journal
of Production Economics, Vol. 165, pp.174–184 [online] http://linkinghub.elsevier.com/
retrieve/pii/S0925527314004307 (accessed 10 December 2015).
Pang, B. and Lee, L. (2008) ‘Opinion mining and sentiment analysis’, Foundations and Trends in
Information Retrieval, Vol. 2, Nos. 1–2, pp.1–135 [online] http://www.nowpublishers.com/
article/Details/INR-011 (accessed 14 May 2016).
Provost, F. and Fawcett, T. (2013) ‘Data science and its relationship to Big Data and data-driven
decision making’, Big Data, Vol. 1, No. 1, pp.51–59 [online] http://online.liebertpub.com/
doi/abs/10.1089/big.2013.1508 (accessed 10 December 2015).
Rand, D.G. (2012) ‘The promise of Mechanical Turk: how online labor markets can help theorists
run behavioral experiments’, Journal of Theoretical Biology, Vol. 299, pp.172–179 [online]
http://www.scopus.com/inward/record.url?eid=2-s2.0-84857918066&partnerID=tZOtx3y1
(accessed 21 January 2016).
Rezaee, Z. (2005) ‘Causes, consequences, and deterrence of financial statement fraud’, Critical
Perspectives on Accounting, Vol. 16, No. 3, pp.277–298 [online] http://www.sciencedirect
.com/science/article/pii/S1045235403000728 (accessed 18 November 2015).
Rijmenam, M. van (2015) Datafloq – Why the 3V’s are Not Sufficient to Describe Big Data
[online] https://datafloq.com/read/3vs-sufficient-describe-big-data/166 (accessed 16 December
2015).
Sela, A. and Berger, J. (2012) ‘Decision quicksand: how trivial choices suck us in’, Journal of
Consumer Research, Vol. 39, No. 2, pp.360–370 [online] http://www.scopus.com/inward/
record.url?eid=2-s2.0-84864011467&partnerID=tZOtx3y1 (accessed 25 January 2016).
Sensmeier, L. (2013) How Big Data is Revolutionizing Fraud Detection in Financial Services,
Hortonworks [online] http://hortonworks.com/blog/how-big-data-is-revolutionizing-fraud-
detection-in-financial-services/ (accessed 10 December 2015).
Twenge, J.M., Campbell, W.K. and Gentile, B. (2012) ‘Increases in individualistic words and
phrases in American books, 1960–2008’, PloS One, Vol. 7, No. 7, p.e40181 [online]
http://www.scopus.com/inward/record.url?eid=2-s2.0-84863705253&partnerID=tZOtx3y1
(accessed 25 January 2016).
Wadhwa, L. and Pal, V. (2012) ‘Forensic accounting and fraud examination in India’, International
Journal of Applied Engineering Research, Vol. 7, No. 11 Suppl., pp.2006–2009.
Wang, G. et al. (2016) ‘Big Data analytics in logistics and supply chain management: certain
investigations for research and applications’, International Journal of Production Economics,
Vol. 176, pp.98–110 [online] http://dx.doi.org/10.1016/j.ijpe.2016.03.014 (accessed 12 May
2016).
Wong, S. and Venkatraman, S. (2015) ‘Financial accounting fraud detection using business
intelligence’, Asian Economic and Financial Review, Vol. 5, No. 11, pp.1187–1207 [online]
http://www.pakinsight.com/archive/3/11-2015/11 (accessed 14 May 2016).
Zhong, R.Y. et al. (2015) ‘A Big Data approach for logistics trajectory discovery from
RFID-enabled production data’, International Journal of Production Economics,
Vol. 165, pp.260–272 [online] http://linkinghub.elsevier.com/retrieve/pii/S0925527315000481
(accessed 10 December 2015).
Zhu, H. and Madnick, S.E. (2009) ‘Finding new uses for information’, MIT Sloan Management
Review, Vol. 50, No. 4, pp.17–21 [online] http://sloanreview.mit.edu/article/finding-new-uses-
for-information/ (accessed 12 December 2015).