MACHINE LEARNING
A Major Project Report
Submitted in partial fulfillment of the requirement for the award of the Degree of
Bachelor of Engineering with specialization in Electronics and
Communication Engineering
Submitted To:
Asst. Prof. JYOTI PIPARIYA
(Department of ECE)
ACROPOLIS INSTITUTE OF TECHNOLOGY & RESEARCH,
INDORE (M.P.)
2023-2024
CERTIFICATE
Date:
ACKNOWLEDGEMENT
This dissertation would not have been possible without the guidance and help of several individuals who, in
one way or another, contributed and extended their valuable assistance in the preparation and completion of this
study.
Primarily, I would like to express my deep respect and gratitude towards my guide, Mr. Upendra Tiwari, who
not only guided the academic project work but also stood by me as a teacher and philosopher, helping turn
imagination into practice. I want to thank him for introducing me to the field of research with his abundant
knowledge and valuable guidance.
I wish to express my profound sense of gratitude to Dr. UBS Chandrawat (Head, Department of Electronics and
Communication Engineering). His presence and optimism have been an invaluable influence on my work and
outlook. I consider it my good fortune to have had the opportunity to work under the guidance of such a wonderful
and great person.
I would also like to take this opportunity to acknowledge Dr. S.C. Sharma (Director) and other faculty members
and staff of the institute for extending all help in conducting the dissertation work directly or indirectly. They
have been a reliable source of inspiration to me, and I thank them sincerely. I would like to acknowledge my
institute, Acropolis Institute of Technology & Research, Indore, for providing all the necessary facilities to
complete my thesis work.
I am especially indebted to my parents for their love, sacrifice, and support. They are my teachers after I came to
this world and have set notable examples for me about how to live, study and work. It is all because of the values
and ethics taught by them, which made me capable of doing my work with great honesty and responsibility.
At last, I would like to thank God, who has provided me with all His blessings, without which this work
would have been impossible.
Prayag Parashar
(0827EC201018)
TABLE OF CONTENTS
❖ Acknowledgement
1. Introduction
➔ Introduction to Auto Insurance Fraud
➔ Introduction to Machine Learning
2. Data Pre-Processing in Machine Learning
➔ Why do we need Data Pre-Processing?
3. EDA (Exploratory Data Analysis)
➔ EDA Tools
➔ Types of EDA
4. Machine Learning Algorithms
➔ Linear Regression
➔ Logistic Regression
➔ Decision Tree
➔ Support Vector Machine (SVM)
➔ kNN Algorithm
➔ K-Means Algorithm
5. Clustering
6. Model Building
7. Existing System for Fraud Detection
➔ Private Investigators
➔ Analytics Tools
8. Technology Implemented
➔ Python - The New Generation Language
➔ Why Python is Perfect Language for Machine Learning?
➔ Technology Behind Fraud Detection
9. System Design
10. Implementation
11. Project Report
➔ Dataset Description
➔ Result
➔ Project Output after Deployment
12. Advantages of Machine Learning
13. Disadvantages of Machine Learning
14. Conclusion
15. References
INTRODUCTION
Insurance fraud is any act committed to defraud an insurance process. It occurs when a
claimant attempts to obtain some benefit or advantage they are not entitled to, or when an
insurer knowingly denies a benefit that is due. False insurance claims are insurance
claims filed with fraudulent intent against an insurance provider.
Insurance fraud has existed since the beginning of insurance as a commercial enterprise.
Fraudulent claims account for a significant portion of all claims received by insurers, and
cost billions of dollars annually. Types of insurance fraud are diverse and occur in all areas
of insurance. Insurance crimes also range in severity, from slightly exaggerating claims to
deliberately causing accidents or damage. Fraudulent activities affect the lives of innocent
people, both directly through accidental or intentional injury or damage, and indirectly
by the crimes leading to higher insurance premiums. Insurance fraud poses a significant
problem, and governments and other organizations try to deter such activity.
Those who commit insurance fraud range from organized criminals, who steal large sums through
fraudulent business activities, to ordinary people who want to cover their deductible or who view
filing a claim as an opportunity to make a little money.
Auto insurance fraud ranges from misrepresenting facts on insurance applications and
inflating insurance claims to staging accidents and submitting claim forms for injuries or
damage that never occurred, to false reports of stolen vehicles.
There are two main categories for auto insurance fraud—hard insurance fraud and soft
insurance fraud.
Hard insurance fraud involves creating a situation that allows you to make a claim on an
auto insurance policy. A common example of this type of fraud is staging an auto wreck with
the goal of benefiting from the resulting claim.
Soft insurance fraud involves exaggerating or outright fabricating the damages or expenses
involved in an auto insurance policy or claim. A common example of this type of fraud is a
mechanic making unnecessary repairs to your auto with the goal of charging the insurance
company a higher bill.
Soft insurance fraud is the more common form because it’s easier to commit and harder to
spot. Hard insurance fraud is usually more costly to the insurance provider and comes with
higher penalties.
Auto fraud is one of the largest and most well-known problems that insurers face.
Fraudulent claims can be highly expensive for each insurer. Therefore, it is important to
know which claims are correct and which are not. Ideally, an insurance agent would have
the capacity to investigate each case and conclude whether it is genuine or not. However,
this process is not only time consuming but costly. Sourcing and funding the skilled labor
required to review each of the thousands of claims filed every day is simply unfeasible.
This is where machine learning comes in: once the proper data is fed to the system, it
becomes far easier to determine whether a claim is genuine.
Machine learning is a subset of artificial intelligence concerned with the development of
algorithms that allow a computer to learn from data and past experience on its own.
Machine learning is a method of data analysis that automates
analytical model building. It is a branch of artificial intelligence based on the idea that
systems can learn from data, identify patterns, and make decisions with minimal human
intervention.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, to look for patterns in data and make better decisions in the future
based on the examples that we provide. The primary aim is to allow computers to learn
automatically, without human intervention or assistance, and to adjust actions accordingly.
Fig 1.1 Block Diagram for Machine learning Steps
Supervised machine learning algorithms can apply what has been learned in the past to
new data using labeled examples to predict future events. Starting from the analysis of a
known training dataset, the learning algorithm produces an inferred function to make
predictions about the output values. The system can provide targets for any new input
after sufficient training. The learning algorithm can also compare its output with the
correct, intended output and find errors to modify the model accordingly.
Types of Supervised Learning:
● Classification: a supervised learning task where the output has defined labels
(discrete values). Example: Gmail classifies mail into more than one class, such as
social, promotions, updates, and forum.
● Regression: a supervised learning task where the output has a continuous value.
Examples of Supervised Learning Algorithms:
● Linear Regression
● Nearest Neighbor
● Decision Trees
● Support Vector Machine (SVM)
● Random Forest
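The supervised idea above (learn from labeled examples, then predict labels for new inputs) can be illustrated with a tiny from-scratch sketch. The one-feature threshold classifier and the claim amounts below are purely hypothetical illustrations, not the report's dataset or any library API:

```python
def learn_threshold(xs, labels):
    """Learn, from labeled examples, the cut-off value that misclassifies
    the fewest training points (predicting 1 at or above the cut-off)."""
    best_t, best_err = xs[0], len(xs) + 1
    for t in sorted(set(xs)):
        err = sum(1 for x, y in zip(xs, labels) if (x >= t) != (y == 1))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(threshold, x):
    return 1 if x >= threshold else 0

# Labeled training data: claim amounts with 1 = fraudulent, 0 = genuine.
amounts = [100, 200, 250, 900, 1100, 1500]
labels  = [0,   0,   0,   1,   1,    1]
t = learn_threshold(amounts, labels)
print(t, predict(t, 1200))  # → 900 1
```

After training on the labeled examples, the learned cut-off is applied to a new, unseen claim amount, exactly the "provide targets for any new input after sufficient training" step described above.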
Unsupervised machine learning algorithms are used when the information used to
train is neither classified nor labeled. Unsupervised learning studies how systems can
infer a function to describe a hidden structure from unlabeled data. The system doesn't
figure out the right output on its own; instead, it explores the data and can draw
inferences from datasets to describe hidden structures in unlabeled data.
Unsupervised learning is classified into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also tend to
buy Y.
Types of Unsupervised Learning:
● Clustering
● Exclusive (partitioning)
● Agglomerative
● Overlapping
Clustering Types:
● Hierarchical clustering
● K-means clustering
● Principal Component Analysis
● Singular Value Decomposition
● Independent Component Analysis
Semi-supervised learning allows the algorithm to learn from a small amount of labeled
text documents while still classifying a large amount of unlabeled text documents in
the training data.
How semi-supervised learning works
The way that semi-supervised learning manages to train the model with less labeled
training data than supervised learning is by using pseudo-labeling. This can combine
many neural network models and training methods. Here's how it works:
● Train the model with the small amount of labeled training data just like you
would in supervised learning, until it gives you good results.
● Then use it with the unlabeled training dataset to predict the outputs, which are
pseudo labels since they may not be quite accurate.
● Link the labels from the labeled training data with the pseudo labels created in
the previous step.
● Link the data inputs in the labeled training data with the inputs in the
unlabeled data.
● Then, train the model the same way as you did with the labeled set in the
beginning to decrease the error and improve the model’s accuracy.
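The pseudo-labeling steps above can be sketched in a few lines of Python. The one-threshold "model" below is a deliberately trivial stand-in for any supervised learner, and the numbers are made up for illustration:

```python
def fit(data):
    """Train a toy one-threshold model on (value, label) pairs, a stand-in
    for any supervised learner."""
    ones = [x for x, y in data if y == 1]
    zeros = [x for x, y in data if y == 0]
    return (min(ones) + max(zeros)) / 2  # midpoint threshold

def predict(model, x):
    return 1 if x >= model else 0

# Step 1: train on the small labeled set until it performs reasonably.
labeled = [(100, 0), (200, 0), (1100, 1), (1500, 1)]
model = fit(labeled)

# Step 2: predict pseudo labels for the unlabeled set.
unlabeled = [150, 1300, 90, 1250]
pseudo = [(x, predict(model, x)) for x in unlabeled]

# Steps 3-5: link real and pseudo-labeled data together and retrain.
model = fit(labeled + pseudo)
print(predict(model, 1000))  # → 1
```

The retrained model now benefits from the pseudo-labeled points even though only four examples were labeled by hand.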
Reinforcement learning is a method in which an agent interacts with its environment by
producing actions and discovering errors or rewards. Trial-and-error search and delayed
reward are the most relevant characteristics of reinforcement learning. This method
allows machines and software agents to automatically determine the ideal behavior
within a specific context in order to maximize performance. Simple reward feedback,
known as the reinforcement signal, is required for the agent to learn which action is
best.
Positive–
Positive Reinforcement is defined as strengthening of a behavior because a desirable
event occurs as a result of that behavior.
● Advantages of positive reinforcement:
● Maximizes performance.
● Sustains change for a long period of time.
● Disadvantages of positive reinforcement:
● Too much reinforcement can lead to an overload of states, which can diminish
the results.
Negative–
Negative Reinforcement is defined as strengthening of a behavior because a negative
condition is stopped or avoided.
● Advantages of negative reinforcement:
● Increases behavior.
● Provides defiance to a minimum standard of performance.
● Disadvantages of negative reinforcement:
● It only provides enough to meet the minimum behavior.
DATA PREPROCESSING IN MACHINE LEARNING
Data preprocessing is a process of preparing the raw data and making it suitable for a machine
learning model.
It is the first and crucial step while creating a machine learning model.
When creating a machine learning project, it is not always the case that we come across clean
and formatted data, and while doing any operation with data, it is mandatory to clean it and
put it in a formatted way. Common preprocessing steps include:
Data imputation. Most ML frameworks include methods and APIs for filling in missing data.
Techniques generally include imputing missing values with the standard deviation, mean,
median, or k-nearest neighbors (k-NN) of the data in the given field.
Oversampling. Under-represented classes can be balanced by creating additional
observations/samples with methods like repetition, bootstrapping, or the Synthetic Minority
Over-Sampling Technique (SMOTE), and then adding them to the under-represented classes.
Data normalization. The size of a dataset affects the memory and processing required for
iterations during training. Normalization reduces the size by reducing the order and magnitude
of data.
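As a sketch of the imputation idea, missing entries in a field can be filled with the mean or median of the observed values (pure standard-library Python; the ages below are made-up sample data):

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace missing entries (None) with the mean or median of the
    observed values in the same field."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 40, 35, None]
print(impute(ages, "median"))  # → [25, 35, 40, 35, 35]
```

In practice a framework routine would be used instead, but the underlying computation is the same: a summary statistic of the present values replaces the absent ones.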
EDA (Exploratory Data Analysis)
Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets
and summarize their main characteristics, often employing data visualization methods.
It helps determine how best to manipulate data sources to get the answers you need, making it
easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check
assumptions. EDA is primarily used to see what data can reveal beyond the formal modeling or
hypothesis-testing task, and it provides a better understanding of dataset variables and the
relationships between them. It can also help determine if the statistical techniques you are
considering for data analysis are appropriate.
Specific statistical functions and techniques you can perform with EDA tools include:
● Clustering and dimension reduction techniques, which help create graphical
displays of high-dimensional data containing many variables.
● Univariate visualization of each field in the raw dataset, with summary statistics.
● Bivariate visualizations and summary statistics that allow you to assess the relationship
between each variable in the dataset and the target variable you’re looking at.
● Multivariate visualizations, for mapping and understanding interactions between different
fields in the data.
● K-means Clustering is a clustering method in unsupervised learning where data points are
assigned into K groups, i.e., the number of clusters, based on the distance from each group’s
centroid. The data points closest to a particular centroid will be clustered under the same
category. K-means Clustering is commonly used in market segmentation, pattern
recognition, and image compression.
● Predictive models, such as linear regression, use statistics and data to predict outcomes.
Types of exploratory data analysis:
● Univariate non-graphical. This is the simplest form of data analysis, where the data being
analyzed consists of just one variable. Since it's a single variable, it doesn't deal with causes
or relationships. The main purpose of univariate analysis is to describe the data and find
patterns that exist within it.
● Univariate graphical. Non-graphical methods don’t provide a full picture of the data.
Graphical methods are therefore required. Common types of univariate graphics include:
● Stem-and-leaf plots, which show all data values and the shape of the distribution.
● Histograms, a bar plot in which each bar represents the frequency (count) or proportion
(count/total count) of cases for a range of values.
● Box plots, which graphically depict the five-number summary of minimum, first quartile,
median, third quartile, and maximum.
● Multivariate non-graphical: Multivariate data arises from more than one variable.
Multivariate non-graphical EDA techniques generally show the relationship between two or
more variables of the data through cross-tabulation or statistics.
● Multivariate graphical: Multivariate graphical EDA uses graphics to display relationships
between two or more sets of data. The most used graphic is a grouped bar plot or bar chart,
with each group representing one level of one variable and each bar within a group representing
the levels of the other variable.
● Other common types of multivariate graphics include:
● Scatter plot, which is used to plot data points on a horizontal and a vertical axis
to show how much one variable is affected by another.
● Multivariate chart, which is a graphical representation of the relationships
between factors and a response.
● Run chart, which is a line graph of data plotted over time.
● Bubble chart, which is a data visualization that displays multiple circles (bubbles)
in a two-dimensional plot.
● Heat map, which is a graphical representation of data where values are depicted by color.
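As one concrete example of the univariate statistics above, the five-number summary that a box plot depicts can be computed directly (this sketch uses a simple midpoint-based quartile convention; other conventions differ slightly):

```python
def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum:
    the statistics a box plot depicts."""
    s = sorted(values)

    def med(v):
        n = len(v)
        mid = n // 2
        return v[mid] if n % 2 else (v[mid - 1] + v[mid]) / 2

    lower = s[: len(s) // 2]        # values below the median
    upper = s[(len(s) + 1) // 2 :]  # values above the median
    return (s[0], med(lower), med(s), med(upper), s[-1])

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # → (1, 3, 7, 11, 13)
```

In an EDA workflow these five numbers are what a plotting library draws as the box, whiskers, and median line.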
Machine Learning Algorithms
There are many types of Machine Learning Algorithms specific to different use cases. As we work with
datasets, a machine learning algorithm works in two stages. We usually split the data around 20%-80%
between testing and training stages. Under supervised learning, we split a dataset into a training data and test
data in Python ML. Followings are the Algorithms of Python Machine Learning -
1. Linear Regression-
Linear regression is one of the supervised Machine learning algorithms in Python that observes continuous
features and predicts an outcome. Depending on whether it runs on a single variable or on many features, we
can call it simple linear regression or multiple linear regression.
This is one of the most popular Python ML algorithms and often under-appreciated. It assigns
optimal weights to variables to create a line ax+b to predict the output. We often use linear
regression to estimate real values, like the number of calls or the cost of houses, based on
continuous variables. The regression line is the best-fitting line Y=a*X+b denoting the
relationship between the independent and dependent variables.
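Fitting the line Y = a*X + b by ordinary least squares can be sketched from scratch; the points below are illustrative, chosen to lie exactly on y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line Y = a*X + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # → 2.0 1.0
```

The slope is the covariance of X and Y divided by the variance of X, and the intercept places the line through the mean point, which is exactly what a library implementation computes for one feature.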
2. Logistic Regression -
Logistic regression is a supervised classification algorithm in Python that finds its use in
estimating discrete values like 0/1, yes/no, and true/false. This is based on a given set of independent variables. We
use a logistic function to predict the probability of an event, and this gives us an output between 0 and 1. Although it
says ‘regression’, this is actually a classification algorithm. Logistic regression fits data into a logit function and is
also called logit regression.
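The logistic function that maps any input to a probability between 0 and 1 can be sketched as follows; the weight and bias are hypothetical, not learned from real data:

```python
import math

def sigmoid(z):
    """The logistic function: maps any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(w * x + b)

def predict(w, b, x, cutoff=0.5):
    return 1 if predict_proba(w, b, x) >= cutoff else 0

# Hypothetical weight and bias for a single feature.
w, b = 1.5, -3.0
print(round(predict_proba(w, b, 4.0), 3))      # → 0.953
print(predict(w, b, 4.0), predict(w, b, 0.0))  # → 1 0
```

Despite the name "regression", the final output is a class: the probability is thresholded at a cutoff (0.5 here) to yield 0 or 1.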
3. Decision Tree -
A decision tree falls under supervised machine learning algorithms in Python and is used for both
classification and regression, although mostly for classification. This model takes an instance, traverses the
tree, and compares important features with a determined conditional statement. Whether it descends to the
left or the right child branch depends on the result. Usually, more important features are closer to the root.
Decision Tree, a Machine Learning algorithm in Python can work on both categorical and continuous
dependent variables. Here, we split a population into two or more homogeneous sets. Tree models where the
target variable can take a discrete set of values are called classification trees; in these tree structures, leaves
represent class labels and branches represent conjunctions of features that lead to those class labels.
Decision trees where the target variable can take continuous values (typically real numbers) are called
regression trees.
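A full decision tree is beyond a short example, but its core building block, choosing the feature and threshold whose split leaves each side as homogeneous as possible, can be sketched for a single split (a one-level "stump" on made-up rows):

```python
def majority(ys):
    """Most common label in a list (0 if the list is empty)."""
    return max(set(ys), key=ys.count) if ys else 0

def best_split(rows, labels):
    """Find the (feature_index, threshold) pair whose split misclassifies
    the fewest rows when each side is assigned its majority label."""
    best = (None, None, len(rows) + 1)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            ml, mr = majority(left), majority(right)
            err = sum(y != ml for y in left) + sum(y != mr for y in right)
            if err < best[2]:
                best = (f, t, err)
    return best[0], best[1]

# Two features per row; only the second cleanly separates the classes.
rows = [(1, 10), (4, 12), (2, 50), (3, 55)]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))  # → (1, 50)
```

A real tree learner applies this search recursively to each side of the split (and uses impurity measures such as Gini or entropy rather than raw error), which is how the more important features end up closer to the root.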
5. kNN Algorithm -
This is a Python machine learning algorithm for classification and regression, though mostly for
classification. It is a supervised learning algorithm that uses a distance function, usually Euclidean, to
compare a new point with the stored training points, and assigns the point to the group of its closest
neighbors. It classifies new cases using a majority vote of k of its neighbors. The
case it assigns to a class is the one most common among its K nearest neighbors. For this, it uses a distance
function. k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. k-NN is a special case of a
variable-bandwidth kernel density "balloon" estimator with a uniform kernel.
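The majority-vote idea can be sketched from scratch in one dimension (the values and labels below are made up for illustration; in one dimension the Euclidean distance reduces to an absolute difference):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by the majority vote of its k nearest
    training points."""
    by_distance = sorted(train, key=lambda item: abs(item[0] - query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# (value, label) pairs: 1 = fraudulent claim, 0 = genuine.
train = [(100, 0), (150, 0), (200, 0), (1100, 1), (1200, 1), (1300, 1)]
print(knn_predict(train, 180))   # → 0
print(knn_predict(train, 1250))  # → 1
```

Note the "lazy learning" property described above: nothing is computed at training time; all the work happens when a query arrives.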
6. K-Means Algorithm -
k-Means is an unsupervised algorithm that solves the clustering problem. It groups data into a chosen
number of clusters: the data points inside a cluster are homogeneous, and heterogeneous with respect to
other clusters. k-means
clustering is a method of vector quantization, originally from signal processing, that is popular for cluster
analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each
observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. k-means
clustering is rather easy to apply to even large data sets, particularly when using heuristics such as Lloyd's
algorithm. It often is used as a preprocessing step for other algorithms, for example to find a starting
configuration. The problem is computationally difficult (NP-hard). k-means originates from signal
processing, and still finds use in this domain. In cluster analysis, the k-means algorithm can be used to
partition the input data set into k partitions (clusters). k-means clustering has been used as a feature learning
(or dictionary learning) step, in either (semi-)supervised learning or unsupervised learning.
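Lloyd's algorithm mentioned above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal one-dimensional sketch with made-up data:

```python
def kmeans(points, k, iters=10):
    """Lloyd's algorithm in one dimension: repeatedly assign each point to
    its nearest centroid, then move each centroid to the mean of its points."""
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1, 2, 3, 100, 101, 102]
print(kmeans(data, 2))  # → [2.0, 101.0]
```

Real implementations add smarter initialization (e.g. k-means++) and convergence checks, but the assign-then-average loop is the whole heuristic.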
7. Random Forest -
A random forest is an ensemble of decision trees. To classify every new object based on its attributes,
each tree provides a classification, voting for a class. The classification with the most votes wins in the forest.
Random forests or random decision forests are an ensemble learning method for classification, regression
and other tasks that operates by constructing a multitude of decision trees at training time and outputting the
class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
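The vote-of-many-trees idea can be sketched with bootstrap samples and a deliberately simplified one-split "tree" standing in for real decision trees (all data below is made up for illustration):

```python
import random
from collections import Counter

def fit_stump(sample):
    """A one-split 'tree': the midpoint between the largest class-0 value
    and the smallest class-1 value (a stand-in for a full decision tree)."""
    ones = [x for x, y in sample if y == 1]
    zeros = [x for x, y in sample if y == 0]
    return (min(ones) + max(zeros)) / 2

def fit_forest(data, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        if {y for _, y in sample} == {0, 1}:  # keep samples with both classes
            forest.append(fit_stump(sample))
    return forest

def forest_predict(forest, x):
    """Each tree votes; the classification with the most votes wins."""
    votes = Counter(1 if x >= t else 0 for t in forest)
    return votes.most_common(1)[0][0]

# (value, label) pairs: 1 = fraudulent claim, 0 = genuine.
data = [(100, 0), (150, 0), (200, 0), (1100, 1), (1200, 1), (1300, 1)]
forest = fit_forest(data)
print(forest_predict(forest, 1250), forest_predict(forest, 120))
```

Because every tree sees a slightly different bootstrap sample, the trees disagree on borderline cases, and taking the mode of their outputs averages away individual errors.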
CLUSTERING
Clustering is the task of dividing data points into groups such that points in the same group are
more similar to each other than to points in other groups. For example, the data points clustered
together in the graph below can be classified into one single group; we can distinguish the
clusters and identify that there are 3 clusters in the picture.
Fig 1.8 Clusters of Data Points
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): data points are clustered
using the basic idea that a point belongs to a cluster if it lies within a given distance
constraint of the cluster's dense center. Various distance methods and techniques are used for
identifying outliers.
MODEL BUILDING
Building machine learning models that generalize well on future data requires thoughtful
consideration of the data at hand and of assumptions about the various available training
algorithms. Machine learning consists of algorithms that can automate analytical model
building. Using algorithms that iteratively learn from data, machine learning models enable
computers to find hidden insights in Big Data without being explicitly programmed where to
look.
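A first step toward models that generalize is holding out part of the data, as in the 80/20 split mentioned earlier, so the model is scored on data it never saw. A sketch with the standard library alone:

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows, then split roughly 80% for training and
    20% for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # → 80 20
```

Fixing the shuffle seed makes the split reproducible, which matters when comparing candidate models against the same held-out set.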
EXISTING SYSTEM
The current existing system of insurance fraud detection usually involves insurance companies
keeping a separate team of people, analytics tools, policies, and private investigators to do
background checks and find out whether a claim made by an insured person is fraudulent.
Sometimes, it’s obvious. For example, a claim for something the insured probably never owned
and can’t produce any documentation for, will be scrutinized. One of the front-line defenses
though is to analyze claims history. Insurance companies all share information about claims into a
central database, and both the insured and insurer generally have access to it. Was there a similar
claim shortly before you cancelled your last policy? Big red flag. Do you file a claim every few
months? Big red flag. Do you have two policies covering the same thing and you’re trying to
claim on both? Yet another, big red flag. The insurance companies usually know more about our
claims history than we do.
POLICIES IN PLACE
Insurance companies usually share a list of things that make a claim suspicious. A few of the
items on that list are:
● Claimant is in financial distress, worse off than they were when the policy was taken out
● Altered documentation
● Delayed reporting of a claim
These things create reasons to look deeper into the claim and investigate the authenticity of the claim.
PRIVATE INVESTIGATORS
Insurance companies sometimes hire private investigators when they need someone experienced
and capable of looking into someone who seems suspicious to them. PIs are great for
gathering information about the insured person and collecting proof, if any. They are cheaper
for companies to afford, and they avoid the publicity the company might face if it were to
start a police investigation for every single suspicious claim.
ANALYTICS TOOLS
Many organizations nowadays provide a range of data analytics and pattern recognition tools that
help insurance companies detect insurance fraud at a very early stage. These tools also help them
recognize patterns among similar types of fraudulent claims and deny a fraudulent claim at the
earliest stage possible, so that the company's loss is minimized and time is saved.
The insurance industry has developed sophisticated algorithms to detect fraud, as well as relying
on tried-and-true methods like raising questions if multiple claim checks are being sent to the
same address.
Technology Implemented
Features
● Interpreted
In Python there are no separate compilation and execution steps as in C/C++. Python runs the
program directly from the source code. Internally, Python converts the source code into an
intermediate form called bytecode, which is then translated into the native language of the
specific computer to run it.
● Platform Independent
Python programs can be developed and executed on multiple operating system platforms. Python
can be used on Linux, Windows, Macintosh, Solaris and many more.
● Multi- Paradigm
Python is a multi-paradigm programming language. Object-oriented programming and structured
programming are fully supported, and many of its features support functional programming and
aspect-oriented programming.
● Simple
Python is a very simple language. It is very easy to learn, as it is close to the English
language. In Python, more emphasis is on the solution to the problem than on the syntax.
● Rich Library Support
The Python standard library is very vast. It can help with various tasks involving regular
expressions, documentation generation, unit testing, threading, databases, web browsers, CGI,
email, XML, HTML, WAV files, cryptography, GUI and many more.
● Free and Open Source
Firstly, Python is freely available. Secondly, it is open-source. This means that its source code is
available to the public. We can download it, change it, use it, and distribute it. This is called FLOSS
(Free/Libre and Open Source Software). As the Python community, we’re all headed toward one goal-
an ever-bettering Python.
Why Python Is a Perfect Language for Machine Learning?
Flexibility-
Python for machine learning is a great choice, as this language is very flexible:
▪ Programmers can combine Python and other languages to reach their goals.
Community Support-
It’s always very helpful when there’s strong community support built around the programming
language. Python is an open-source language which means that there’s a bunch of resources open for
programmers starting from beginners and ending with pros. A lot of Python documentation is
available online as well as in Python communities and forums, where programmers and machine
learning developers discuss errors, solve problems, and help each other out. Python programming
language is absolutely free as is the variety of useful libraries and tools.
Growing Popularity-
As a result of the advantages discussed above, Python is becoming more and more popular among
data scientists. According to Stack Overflow, the popularity of Python is predicted to grow until
2020, at least. This means it’s easier to search for developers and replace team players if required.
Also, the cost of their work may not be as high as when using a less popular programming language.
THE TECHNOLOGY BEHIND FRAUD DETECTION
In the world of Data Science, there are a great number of methodologies and algorithms that can
accurately leverage large amounts of user data. Each of them has proven to perform effectively
in particular scenarios and situations. Machine learning experts divide them into two main
scenarios depending on the available dataset:
Scenario 1: The dataset has a sufficient number of fraud examples.
In this case, classic machine learning or statistics-based techniques are applied to detect
fraudulent attacks. This involves training a machine learning model or employing adequate
algorithms to estimate transaction legitimacy. We'll go through the most commonly used
algorithms below.
Scenario 2: The dataset has no (or very few) fraud examples.
In the case that no previous information on fraudulent transactions was stored, the learning
model is built on examples of legitimate transactions. Before jumping into the commonly used
learning models applied to fraud detection, it is worth saying that most of them serve the
same purpose and differ only in their mathematical characteristics. Hence, the available data,
rather than the algorithm itself, becomes the decisive factor when choosing an appropriate
learning model.
● Random Forest or random decision forests. This algorithm ensembles decision trees and
accurately analyzes missing data, noise, outliers and errors. It is fast to train and score and,
as a consequence, has become one of the preferred algorithms among fraud detection professionals.
● Artificial Neural Networks (ANN). This system simulates the function of the brain to
perform tasks by learning from the past, extracting rules, and predicting future activity based
on the current situation. It can predict whether a transaction is fraudulent or not by
classifying an input into predefined groups.
● Support Vector Machines (SVMs). It’s an excellent prediction tool that can resolve a wide
range of learning problems, such as handwritten digit recognition, classification of web
pages and face detection. This method is capable of detecting fraudulent activity at the time
of transaction.
● K-Nearest Neighbors (KNN). Also known as the “lazy learning” algorithm due to its
simplicity: instead of making calculations once the data is introduced, it just stores it for
further classification. The KNN algorithm rests on feature similarity and its proximity. When
the nearest neighbor is fraudulent, the transaction is classified as fraudulent and when the
nearest neighbor is legal, it is classified as legal.
● Logistic Regression is a prediction algorithm borrowed by machine learning from the field
of statistics. It's widely used for credit card fraud detection and credit scoring.
Finally, the goal of artificial intelligence in the field of insurance fraud is to make it easier for
human agents to find and investigate fraudulent claims and transactions, rather than sifting through
tons of claims in an exhausting and time-consuming way.
SYSTEM DESIGN
Fig 1.3 System Design for our model
IMPLEMENTATION
Overview-
A dataset of auto insurance claims is given. This project classifies whether a given claim is
fraudulent or genuine.
Dataset Description-
The Insurance Fraud Detection dataset was obtained from Kaggle. The dataset consists of
anonymized policy and claim information such as the policyholder's age and sex, the vehicle
make and price, the policy type, the accident area, and more. Each row is labelled as either
fraudulent or genuine.
Dataset Source: https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection
Result-
Our project successfully classifies insurance claims as fraudulent or genuine with 80% accuracy.
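Accuracy here means the fraction of claims classified correctly. A sketch with hypothetical predictions (not the project's actual outputs):

```python
def accuracy(predictions, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predictions, actual) if p == a)
    return correct / len(actual)

# Hypothetical labels: 1 = fraudulent, 0 = genuine (8 of 10 correct).
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
truth     = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(accuracy(predicted, truth))  # → 0.8
```

Because fraud datasets are usually imbalanced, accuracy alone can be misleading; precision and recall on the fraudulent class are typically reported alongside it.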
Advantages of Machine Learning
Every coin has two faces, and each face has its own properties and features. It's time to
uncover the faces of ML, a very powerful tool that holds the potential to revolutionize the
way things work.
3. Continuous Improvement -
As ML algorithms gain experience, they keep improving in accuracy and efficiency, which lets
them make better decisions. Say we need to build a weather forecast model: as the amount of
data we have keeps growing, our algorithms learn to make more accurate predictions faster.
5. Wide Applications -
Whether we are an e-seller or a healthcare provider, we can make ML work for us. Where it
applies, it can help deliver a much more personal experience to customers while also targeting
the right customers.
Disadvantages of Machine Learning
With all those advantages to its powerfulness and popularity, Machine Learning isn’t perfect.
The following factors serve to limit it:
1. Data Acquisition -
Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased
and of good quality. There can also be times when we must wait for new data to be generated.
3. Interpretation of Results -
Another major challenge is the ability to accurately interpret the results generated by the
algorithms. We must also carefully choose the algorithms for our purpose.
4. High error-susceptibility -
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm
on data sets too small to be inclusive: you end up with biased predictions from a biased
training set, which can lead, for example, to irrelevant advertisements being displayed to
customers. In the case of ML, such blunders can set off a chain of errors that go undetected
for long periods of time. And when they do get noticed, it takes quite some time to recognize
the source of the issue, and even longer to correct it.
Conclusion
In conclusion, the integration of machine learning into insurance fraud detection processes offers a
transformative leap in accuracy, efficiency, and cost-effectiveness. By harnessing the power of advanced
algorithms, insurers can proactively identify fraudulent activities in real-time, resulting in improved risk
management, cost savings, and enhanced customer experiences. As the landscape of insurance fraud
evolves, the adaptability of machine learning models becomes a critical asset in staying ahead of
sophisticated tactics. While not without challenges, the promising outcomes suggest that the synergy
between machine learning and traditional methods holds great potential for the ongoing fight against
insurance fraud.
References
➢ https://expertsystem.com/
➢ https://www.geeksforgeeks.org/
➢ https://www.wikipedia.org/
➢ https://www.coursera.org/learn/machine-learning
➢ https://machinelearningmastery.com/
➢ https://towardsdatascience.com/machine-learning/home