Medical Insurance Fraud Detection Based On Block Chain and Deep Learning Approach

2022 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications

2022 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON) | 978-1-6654-6374-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/CENTCON56610.2022.10051253

Bijaya Kumar Sethi
Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
sbijayakumar@gmail.com

Debabrata Singh
Department of Computer Applications, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
debabratasingh@soa.ac.in

Prakash Kumar Sarangi
Department of CSE (AI & ML), Vardhaman College of Engineering, Hyderabad, India
prakashsarangi89@gmail.com

Abstract-To control medical expenses, people take out insurance plans, and the health insurance department's duty of controlling medical expenses has become increasingly vital. Traditional medical insurance settlements are paid per service, which results in many unnecessary costs. Nowadays, the single-disease payment mechanism is frequently employed to address this issue. However, single-disease payments leave room for fraud. In this work, we present a methodology for detecting health insurance fraud, based on block chain and deep learning techniques, that can automatically recognize suspicious medical records to assure sustainable execution of single-disease payment and reduce the workload of medical insurance workers. We also propose a medical record storage and management procedure based on a consortium block chain to assure data security, immutability, traceability, and auditability. The suggested system may effectively identify fraud and considerably increase the efficiency of medical insurance evaluations, as demonstrated by experiments on two real datasets from two hospitals.

Keywords: Fraud Detection, Block Chain, Medical Insurance, CNN, RNN, Deep Learning, International Classification of Diseases (ICD)

I. INTRODUCTION

The rapid growth of medical information has resulted in a massive amount of data being accumulated in hospital information systems, propelling the medical industry into the big data era. The medical field has benefited greatly from hospital big data, which has sparked a lot of interest from academia and industry. Medical cost control is a major area of big data research in medical science. Under the conventional healthcare insurance system, expenses are reimbursed based on the service items provided by hospitals, which gives rise to excessive medical treatment and increases treatment prices. To address the concerns with service-item-based pricing methods, single-disease payment based on Diagnosis Related Groups has been extensively investigated and deployed. Under the single-disease payment method, each disease has a predetermined payment standard. Hospitalization costs are reimbursed by the social healthcare insurance organization based on the disease's prescribed standard [1].

To better understand the difference between a single-disease payment system and a service-item-based payment technique, consider the following example. Under the usual health insurance settlement method, if a patient is diagnosed with cancer, the treatment costs are 200,000 rupees, and the reimbursement ratio is 80%, the reimbursement will be 160,000 rupees. In the Diagnosis Related Groups payment model, if the Diagnosis Related Groups weighting rate for cancer is 7.4 and the payment per unit price is 15,000 rupees, the health insurer has to compensate 87,800 rupees [2].

After the single-disease payment method is established, hospital income depends on each medical case and its diagnosis and has nothing to do with the patient's actual treatment costs. If the treatment expenses for a condition are higher than the group's average, hospitals must bear the difference, which helps to better control medical insurance costs.

It also promotes consistent use of medical resources, i.e., a medical institution's resource consumption corresponds to the number of patients admitted, the complexity of their diseases, and the service intensity. To summarize, the single-disease payment model creates a defined health insurance payment criterion for every disease in order to reduce hospital expenditures by preventing excessive medical behavior. This approach ensures that medical services are of high quality while remaining simple to utilize.

The chief complaint describes the symptoms or conditions of a patient, as well as their duration, that the physician records in the patient's clinical report. The chief complaint is the primary item in the clinical report. It should articulate the features of the initial disease identification; its wording should be brief, polished, and correct, with no more than 30 characters according to the Chinese electronic medical record writing requirements.

The International Classification of Diseases (ICD) is a globally standardized illness categorization system that uses a coding system to categorize diseases based on morphology, embryology, physical environment, and clinical substantiations. The World Health Organization created it, and it is generally used in health care for purposes such as determining the cause of death, illness demography and summarization, evaluating medical features, and medical compensation.

II. RELATED WORK

In order to uncover inaccurate clinical reports, the plausibility of the diagnosed illness must be checked against the patient's primary symptoms. To do this, the goal of identifying fraudulent medical records is transformed into



Authorized licensed use limited to: AMITY University. Downloaded on October 16,2023 at 09:43:25 UTC from IEEE Xplore. Restrictions apply.
a text classification task, where we forecast the likelihood of every ICD-10 category based on the chief complaint. After that, the predicted probabilities of all ICD-10 codes are arranged in decreasing order. A medical record is regarded as plausible if the ICD-10 code allocated to it is among the top-k predicted results. Otherwise, it is deemed potentially fraudulent and requires human auditing. The value of k may be chosen based on the circumstances; a higher value of k means that auditing is less stringent [4][5].

Fig. 1: Model for anti-fraud Detection System

Fig. 1 above shows the general anti-fraud detection system used in medical institutions. In this paper, Section III discusses the proposed work and Section IV presents experimental results and analysis. Finally, in Section V we conclude our proposed work and mention some directions for future study.

III. PROPOSED WORK

Our methodology proceeds as a pipeline. First, we collect data from different sources; second, we store and manage medical records using block chain technology; and finally, we classify these datasets using deep learning techniques.

A. Storage of medical records and management using Block chain technology

Chief complaints and ICD codes are crucial evidence in antifraud investigations. We provide a block chain based framework for medical data storage and administration in order to provide strong tamper resistance, non-repudiation, reliability, and incorruptibility. Our key contributions to the proposed framework are in the following three areas:
• We have created a data storage approach, based on the features of medical data, which minimizes storage space and enhances the performance of the block chain.
• To improve block chain consensus efficiency, we have optimized the consensus process by splitting block chain nodes into consensus nodes and application nodes.
• We have created a smart contract-based system for storing and querying medical data that ensures the tamper resistance, traceability, and authenticity of medical information.

Fig. 2: Medical system based on block chain

Through transparency and trust standards, block chain technology manages the production, access, and utilization of reliable data. A peer-to-peer network typically manages a block chain, and the data saved on the block chain is unmodifiable, remarkable, and traceable. Block chain is a comprehensive solution in terms of technical architecture, encompassing distributed storage, block chain data structures, peer-to-peer networks, consensus algorithms, cryptography algorithms, game theory, smart contracts, and other data and AI technologies [9].

Public, private, and consortium block chains are the three basic types of block chain systems. A public chain is one in which anybody, even anonymously, can participate. A private block chain is one that is managed and used only within a firm [8].

Fig. 2 shows the suggested medical block chain structure. Regulatory agencies, medical insurance centers, hospitals, and doctors are among the block chain's users. A doctor creates an electronic medical record [10]. To upload the medical record to the block chain, the doctor must digitally sign it and initiate a transaction. The medical record is then reviewed by the doctor's hospital and digitally signed for block chain certification. The medical insurance center analyzes block chain data to assess the validity of disease diagnoses and audits questionable medical records. To synchronize all information for inspection and to track the overall transaction process, regulatory bodies can connect to the block chain as a node or communicate with it via terminals [6][7].

B. Classification using Deep Learning Techniques

During manual audits, the medical insurance center's staff examines the block chain for relevant medical data and signatures, and then links them to the doctors and hospitals concerned. Doctors, hospitals, and medical insurance centers sign the audit results, which are then kept on the block chain. Medical insurance fraud by doctors and hospitals will be penalized, and the frauds
will be documented on their credit reports. Doctors and hospitals with bad credit may face further restrictions from the medical insurance center. Using these datasets, we classify records into two classes: fraud and not fraud. This binary classification is performed with TextCNN (character-level), TextCNN (word-level), DPCNN, TextRNN, HAN, fastText, and LEAM [3][11].

TextCNN is a standard CNN-based text classification method. It uses numerous one-dimensional convolution kernels to learn feature representations of phrases, giving it a great capacity to extract shallow text features. Due to its good results and fast training, it is widely employed in short-text classification. DPCNN is a text classification technique that uses deep CNN layers to capture a text's long-distance dependencies and outperforms several other methods. The RNN is a classic sequence modeling approach that excels at modeling text sequences of varied lengths while capturing natural language features. Researchers often prefer RNNs to Convolutional Neural Networks in numerous natural language processing tasks. In this research, the Recurrent Neural Network consists of 2 LSTM layers. The input characters are first embedded, and then the embedding outputs are passed to the 2 LSTM layers. The results from the LSTM layers are then fed into a fully connected layer. Finally, the softmax function is used to calculate the predicted probabilities of the ICD-10 codes. A hierarchical RNN containing attention mechanisms is known as a Hierarchical Attention Network (HAN). The system separates the text into a group of sentences before applying encoders and attention to the words and sentences. Because the texts in our dataset are brief, they are not split.

fastText is a text classification and representation learning tool proposed by Facebook that makes text classification and representation learning simple and efficient. It produces results similar to those of deep learning models, but much faster. LEAM is a newer attention model for text classification. It trains words and labels jointly in order to learn their embeddings in the same vector space. It offers faster convergence and requires fewer parameters than other deep learning approaches.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

We have taken two hospital datasets from medical institutions in Jiangsu Province, China (hospital-X and hospital-Y), containing 198810 and 434319 patient records, respectively. Based on the analysis, each hospital dataset contains only 8 identified diseases, and using the block chain we created a new dataset indicating whether or not each patient has insurance or a card. The implementation uses the Python 3 programming language on a system with 8 GB RAM and an i5 processor. The ICD codes were taken from the main page of hospital records, while the chief complaints of patients were taken from admission records. Because expert coders have revised the ICD codes on the main page of hospital records according to the complete medical information of the admitted patients, they are more accurate than the diagnostic codes entered by doctors at the time of admission. We used only the first 3 characters of every ICD code in the studies. The datasets are sparse and imbalanced since most of the ICD codes are assigned to only a small number of chief complaints. As a result, there is a need for data preprocessing. ICD-10 codes with fewer than 500 records have been removed.

Fig. 3: Precision, Recall, and F1 Score for first dataset

In Fig. 3, the X-axis shows the different deep learning techniques, and the Y-axis shows the Precision, Recall, and F1 Score of each model. For the first dataset, for the models TextCNN (character-level), TextCNN (word-level), DPCNN, TextRNN, HAN, fastText, and LEAM, the precisions are 97.61, 97.72, 97.60, 97.28, 96.98, 96.71, and 97.56, the recalls are 95.71, 95.22, 95.59, 95.48, 96.91, 93.02, and 95.33, and the F1 scores are 96.55, 96.12, 96.47, 96.27, 96.88, 94.50, and 96.30, respectively.

Fig. 4: Precision, Recall, and F1 Score for second dataset

In Fig. 4, the X-axis shows the different deep learning techniques, and the Y-axis shows the Precision, Recall, and F1 Score of each model. For the second dataset, for the models TextCNN (character-level), TextCNN (word-level), DPCNN, TextRNN, HAN, fastText, and LEAM, the precisions are 96.60, 96.04, 96.16, 95.78, 96.05, 96.52, and 95.31, the recalls are 90.85, 90.38, 90.61, 90.53, 90.72, 86.44, and 89.92, and the F1
scores are 93.28, 92.86, 93.03, 92.79, 93.06, 89.85, and 92.59, respectively.

Convolutional Neural Network based methods outperform RNN-based ones, with LEAM being the worst performer. DPCNN's network is larger, but it does not outperform TextCNN, likely because the chief complaint text is short and our task does not benefit from its advantage in capturing long-distance dependencies. The performance of HAN is marginally better than that of the regular RNN, but the differences are minor, showing that the attention mechanism provides little benefit for brief chief complaints.

V. CONCLUSION AND FUTURE WORK

For detecting fraud in health insurance, this paper's approach is based primarily on the block chain. It forecasts the probability of an illness based on a patient's chief complaint and assesses the validity of the ICD code entered in the health record using label and character representations. We have presented a solution for storing and managing medical records that is based on block chain methodology, which ensures information security, reliability, immutability, traceability, and authentication. The suggested method has the potential to reduce the burden on healthcare insurance workers while also enhancing efficiency. According to experiments on real datasets from two reputed hospitals, our approach can efficiently perform antifraud checks for health insurance and has good interpretability. We have applied seven deep learning models for classification. Convolutional Neural Network based methods outperform RNN-based ones, with LEAM being the worst performer. DPCNN's network is larger, but it does not outperform TextCNN, likely because the chief complaint text is short and our task does not benefit from its advantage in capturing long-distance dependencies. The performance of HAN is marginally better than that of the regular RNN, but the differences are minor, showing that the attention mechanism provides little benefit for brief chief complaints. In future, we plan to extend this work to other deep learning models with different activation functions.

REFERENCES

[1]. Mathauer, Inke, and Friedrich Wittenbecher. "Hospital payment systems based on diagnosis-related groups: experiences in low- and middle-income countries." Bulletin of the World Health Organization 91 (2013): 746-756A.
[2]. Gao, Feng, et al. "Discussion on the implementation of single disease payment." Chinese Medical Record English Edition 1.1 (2013): 8-10.
[3]. Viloria, Amelec, et al. "Diabetes diagnostic prediction using vector support machines." Procedia Computer Science 170 (2020): 376-381.
[4]. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[5]. Gao, Weichao, William G. Hatcher, and Wei Yu. "A survey of blockchain: Techniques, applications, and challenges." 2018 27th International Conference on Computer Communication and Networks (ICCCN). IEEE, 2018.
[6]. Mohapatra, Srikanta Kumar, et al. "Text classification using NLP based machine learning approach." AIP Conference Proceedings. Vol. 2463. No. 1. AIP Publishing LLC, 2022.
[7]. Mohapatra, Srikanta Kumar, et al. "Internet of Things: Security, Challenges, Open Problems & Solutions." 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE, 2021.
[8]. Sahu, Premananda, et al. "Detection and Classification of Encephalon Tumor Using Extreme Learning Machine Learning Algorithm Based on Deep Learning Method." Biologically Inspired Techniques in Many Criteria Decision Making. Springer, Singapore, 2022. 285-295.
[9]. Bamunif, Abdullah Omar Ali. "Sports Information & Discussion Forum Using Artificial Intelligence Techniques: A New Approach." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12.11 (2021): 2847-2854.
[10]. Verma, Akhil. "Encryption and decryption of images based on steganography and cryptography algorithms: a new model." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12.11 (2021): 2839-2846.
[11]. Ngai, Eric WT, et al. "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature." Decision Support Systems 50.3 (2011): 559-569.

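As a rough illustration of the top-k plausibility rule described in Section II, the following Python sketch ranks the predicted ICD-10 probabilities for one chief complaint and flags the record for human auditing when the assigned code falls outside the top-k set. The function names (`top_k_codes`, `needs_audit`) and the probability values are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
from typing import Dict, List


def top_k_codes(probabilities: Dict[str, float], k: int) -> List[str]:
    """Return the k ICD-10 codes with the highest predicted probability."""
    ranked = sorted(probabilities, key=lambda code: probabilities[code], reverse=True)
    return ranked[:k]


def needs_audit(assigned_code: str, probabilities: Dict[str, float], k: int) -> bool:
    """Flag a record for human auditing when its assigned code is outside the top-k set."""
    return assigned_code not in top_k_codes(probabilities, k)


# Hypothetical classifier output for one chief complaint (illustrative values only).
predicted = {"C34": 0.62, "J18": 0.21, "I10": 0.09, "E11": 0.05, "K29": 0.03}

print(needs_audit("J18", predicted, k=2))  # assigned code is ranked second
print(needs_audit("E11", predicted, k=2))  # assigned code is ranked fourth
```

With k = 2 the first record is accepted and the second is sent to auditing. Raising k accepts more records, which matches the statement that a higher k makes auditing less stringent.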
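The preprocessing described in Section IV (keeping the first three characters of each ICD code and discarding rare codes) could look roughly like the sketch below. It assumes that the threshold of 500 refers to the number of records per code, which is one reading of the text; the helper names, the sample records, and the lowered threshold in the demonstration are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str]  # (chief complaint text, ICD-10 code)


def truncate_code(icd_code: str) -> str:
    """Keep only the first three characters of an ICD-10 code, e.g. 'J18.9' -> 'J18'."""
    return icd_code.strip().upper()[:3]


def filter_rare_codes(records: List[Record], min_count: int = 500) -> List[Record]:
    """Truncate every code, then drop records whose code occurs fewer than min_count times."""
    truncated = [(text, truncate_code(code)) for text, code in records]
    counts = Counter(code for _, code in truncated)
    return [(text, code) for text, code in truncated if counts[code] >= min_count]


# Tiny made-up sample; min_count is lowered so the effect is visible.
sample = [
    ("cough for 3 days", "J18.9"),
    ("fever and cough", "J18.1"),
    ("chest pain", "I20.0"),
]
print(filter_rare_codes(sample, min_count=2))
```

Truncating before counting matters here: the two pneumonia records only reach the threshold together once both are mapped to the same three-character category.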

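To make the tamper-resistance argument of Section III-A more concrete, the following minimal sketch stores signed medical records in a hash-linked list and shows that altering a stored diagnosis is detectable. It is only an illustration under strong simplifications: there is no network, consensus, or smart contract, and HMAC with shared demo keys stands in for the doctors' and hospitals' real digital signatures. All names (`append_record`, `chain_is_intact`, `signatures_valid`) and the sample records are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

Block = Dict[str, Any]
GENESIS_HASH = "0" * 64


def sign(record: Dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over the canonical JSON form; a stand-in for a real digital signature."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def block_hash(block: Block) -> str:
    """SHA-256 hash of a whole block, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: List[Block], record: Dict[str, str],
                  doctor_key: bytes, hospital_key: bytes) -> None:
    """Append a record signed by the doctor and the hospital, linked to the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else GENESIS_HASH
    chain.append({
        "record": dict(record),
        "doctor_signature": sign(record, doctor_key),
        "hospital_signature": sign(record, hospital_key),
        "prev_hash": prev_hash,
    })


def signatures_valid(block: Block, doctor_key: bytes, hospital_key: bytes) -> bool:
    """Check that both signatures still match the stored record."""
    record = block["record"]
    return (hmac.compare_digest(block["doctor_signature"], sign(record, doctor_key))
            and hmac.compare_digest(block["hospital_signature"], sign(record, hospital_key)))


def chain_is_intact(chain: List[Block]) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    expected = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != expected:
            return False
        expected = block_hash(block)
    return True


doctor_key, hospital_key = b"doctor-demo-key", b"hospital-demo-key"
ledger: List[Block] = []
append_record(ledger, {"complaint": "cough for 3 days", "icd10": "J18"}, doctor_key, hospital_key)
append_record(ledger, {"complaint": "chest pain", "icd10": "I20"}, doctor_key, hospital_key)
print(chain_is_intact(ledger))

ledger[0]["record"]["icd10"] = "C34"  # simulate tampering with a stored diagnosis
print(chain_is_intact(ledger))
print(signatures_valid(ledger[0], doctor_key, hospital_key))
```

The hash links expose changes to any block that has a successor, while the signature check covers the most recent block as well, which is why the sketch performs both.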