Professional Documents
Culture Documents
Data Mining in Different Fields: A Study
Data Mining in Different Fields: A Study
Figure 1 illustrates how data is processed using data useful information from the data that surrounds us. Those
mining techniques. A collection of datasets can yield a who master this technology and its methods will benefit
variety of information that can be used for a number of greatly and acquire an advantage over competitors. This
different things. An exponential growth in data production cutting-edge data mining technique is presently used in a
and storage is observed. We'll examine how data mining, a variety of areas, including education, healthcare, social
current topic, might be applied in several fields to glean media, retail sales, and crime investigation.
Predictive Analysis have, and for a better outcome, this prediction process
Thirdly, it will be simpler to forecast and move on to needed that certain criterion be followed. Table 1 in [5]
the next phase once the correlation has been discovered. Various algorithms with the capacity for prediction have
Industries create predictions based on the information they been mentioned.
Take Sensible Action sector, including medical research, drugs, medical gear,
In decision support systems, data mining techniques hospital management, health insurance, and fraud detection
are used. Management needs knowledge to make decisions. and prevention. Benefits of data mining, which are
Data analysis can lead to issues when working with large significant for patients, healthcare organizations,
amounts of data, which is where knowledge extraction researchers, and insurers, are growing in the healthcare
comes in helpful. Data is evaluated automatically in order to sector. As a second opinion for a doctor's treatment choice,
make wise decisions since a data mining technique is predictive models based on data mining can be used. It
necessary. could help people with similar illnesses or health issues be
grouped together [7].
Effective Techniques
Industries involved in data mining must adhere to One of the most significant financial operations is the
certain strategies, and in order to have these strategies, data forecasting of the stock market, currency exchange rates,
mining requires the usage of data mining tools for analysis. bankruptcies, trade futures, financial risk management,
Finally, industries will be able to select from a variety of credit ratings, loan management, bank client profiling, and
strategies. money laundering studies. According to the author, data
mining for consumer profiling in other industries and some
III. OBJECTIVES of these jobs, such bank customer profiling, have a lot in
common. There has been a lot of investigation into the long-
Data mining has created a world of new opportunities term advantages of overcoming these challenges [8].
for businesses. By comparing millions of isolated pieces of
data, businesses may learn about and predict consumer Sentiment analysis models have received a lot of
behavior using this area of computational statistics. It aims attention from researchers who are interested in using social
to create new commercial opportunities. Data is transformed network data to draw statistical conclusions. Bo Pang and
into knowledge through data mining. Lilliam Lee presented a thorough talk on sentiment analysis.
They examined the proportion of motivational affirmations
Relations with Customers to total words to gauge their point of view [9].
Data mining techniques are the most popular and
effective approach for customer behaviour analysis. Social media platforms like Facebook, Twitter, and
Instagram are examples of places where data mining is used
Determine how well data mining tools complement the to analyze user activity and locate target customers using
conventional approach by first identifying a consumer's techniques like clustering, k-mean analysis, and the priority
behaviour, psychological state, and decision-making during algorithm. For every firm, it aids in the development of web
a transaction. For use in purchased data in a retailer's store marketing.
or organization, necessary consumer rules are extracted
using various association criteria. Afterward, the necessary The use of data mining tools has made it possible for
rule can be used in other industries, such as online retailers to operate in a dynamic and competitive market. As
commerce, medical, and so forth [7]. in order to develop a globalization and competitiveness increase, businesses are
relationship with the client. seeking more advertising and promotion. A bank or other
financial institution can utilize data mining to examine loan
Revenue Growth payments, customer credit policies, target marketing, and
Once the first two goals are verified, every look into money laundering and other financial crimes [10].
organization can instantly boost their revenue by modifying
these data mining approaches. We found that there is very little study on the use of
data mining technology in the retail industry to target
IV. ASSESSMENT customers for marketing efforts. Most shops struggle to
execute successful campaigns and acquire acceptable
Our attention was on various articles that took into customers.
account the industries of Education, Healthcare, Finance,
social media, Retail Marketing, Telecommunication, and Crime analysis is the process of systematically
Fraud Detection. We noticed that different data mining identifying and assessing patterns and trends in criminal
approaches, including clustering, decision trees, k-means, activity. In the areas of crime data mining and criminal
etc., are applied in different industries to examine data that investigation data mining, numerous researchers have
has been extracted from datasets in accordance with their conducted studies. To lessen, prevent, and enhance the
needs. In the article, the author describes Educational Data quality of corruption procedures, researchers are continually
Mining (EDM) as a study area aimed at examining the working. The authors used a data mining approach, a study
specific forms of data that arise in educational contexts in of crime frequency reports, and evidence analysis to explain
order to address educational research issues [6]. the causes of crime and criminal activities [11].
Data mining in healthcare is mostly used to foresee In a number of situations and ways, the telecoms
various illnesses and support physicians' decision-making. industry benefited from data mining tools. Given the
Data mining has several applications in the healthcare extensive customer data that telecom companies maintain
and the intense competition in the market, there is a strong V. DATA MINING - FIELDS
motivation to investigate consumer data for multiple
objectives [12]. We focused on eight industries where data mining is
used, and we looked at how these industries are able to grow
Fraud detection techniques are always being improved both their business and activities thanks to data mining.
to prevent fraudsters from changing their tactics. The According to their needs, the eight industries of education,
creation of current fraud detection techniques has gotten healthcare, finance, social media, retail marketing, crime
harder because of the considerable restrictions on the analysis, telemarketing, and fraud detection use data mining
sharing of ideas in that field [13]. tools and methodologies. The following is a succinct
description.
Overall, it is clear that data mining has a significant
impact on practically every well-known industry. Industries Education
employ data mining techniques including anomaly The term "educational data mining" (EDM) refers to
detection, clustering, classification, discriminant analysis, the use of data mining techniques to educational data, such
decision trees, etc. to evaluate data for forecasting, forming as student information, academic records, exam results,
appropriate strategies, and maximizing commercial profit. academic achievement predictions, classroom engagement,
and the frequency of student questioning. In order to help
with the outcome, educational data mining uses a number of
optimization techniques to increase educational attainment.
EDM is a process that transforms unprocessed educational
system data into crucial information that can be used to
make statistical judgments.
Fig 3 Illustrates how Students' Learning can be Determined using Data Mining [14]
transaction analysis, bankruptcy forecasting, and other police agency and victims are the main beneficiaries in this
financial services heavily rely on data mining. The three field [11].
financial statements that financial analysts look at are the
cash balance, the statement of cash flow, and retained To extract from a big amount of data the information
earnings. the investigator or police need is the core aim of crime
analysis. Criminal analysis is a challenging endeavor since a
Data mining has a significant positive impact on the variety of factors, including emotional instability, fury,
financial sector since it allows large shareholders to increase envy, retaliation, identity crisis, ego, gain, and harm to the
their profits by strategically exchanging assets at the right reputation or property of others, may have contributed to the
periods. The finance sector uses classification, clustering, k- commission of any crime, shortcuts to financial success, etc.
means, and HC algorithms the most [15]. So, in addition to reports, knowledge of outside
circumstances is necessary to understand a crime. The best
Social Media algorithms must be taken into account when a computer is
Social media mining entails examining and gathering required to carry out this kind of intelligent task. Crime
unprocessed data from these channels. Data mining gathers analysis is difficult in general because it necessitates
or extracts user behavior, feelings, wishes, mindset, collecting and reporting vast amounts of data [18].
personality, needs, likes-dislikes, etc. patterns, or
correlations, from social media platforms like Facebook, Telecommunication
Instagram, Twitter, TikTok, LinkedIn, YouTube, etc. By Data produced and stored by the telecoms sector is
examining user age, gender, job title, geography, frequently enormous. These data comprise customer data, which
visited websites, content sharing, search history, comments, represents telecommunication customers, network data,
likes, and clicks from social media networks, raw social which describes the condition of the hardware and software
media data can offer applicable services or components in the network, and call detail data, which
recommendations. Trend analysis, social spam detection, e- details the calls that travel over telecommunication networks
commerce, digital media, research, and other areas all [12].
employ social media mining.
The telecommunications industry has quickly
Most commonly employed in social media mining are expanded from providing unique and long-distance phone
decision trees, naive bayes classifiers, K-nearest neighbors, services to supplying a wide range of other data services.
and clustering [16]. AdaBoost Artificial Neural Network Many different types of communication and computation,
(ANN), Apriori, Bayesian Networks (BN), Decision Trees including telecommunications, computer networks, the
(DT), Density Based Algorithm (DBA), Fuzzy, Genetic Internet, and others, are being merged. A call detail record is
Algorithm (GA), Hierarchical Clustering (HC), Linear created [15] to keep the call information.
Discriminant Analysis (LDA), Linear-Regression (Lin-R),
Logistic Regression (LR), Markov, Maximum Entropy Telecommunications data is fundamentally
(ME), Novel, Support Vector Machine (SVM), and Wrappe multidimensional and includes elements like calling time,
[17]. frequency, caller location, and call mode. A mixture of
information from such data can be used to identify and
Retail Marketing compare data traffic, system load, resource management,
Worldwide, a lot of businesses try to remake and consumer group action, and earnings.
rebuild themselves. They also benefit greatly from data
mining. Retail data mining in the retail sector identifies Fraud Detection
customer behavior, shopping patterns and trends, customer The goal of fraud detection is to stop people from
acquisition and retention, customer buying preferences, obtaining money or other items under false pretences. High-
delivery performance, reduces business costs, conducts level skill is required for fraud detection. The most popular
shopping cart analysis, risk management, fraud detection, AI methods for fraud detection use data mining to
and improves customer service for retailers' prediction and categorize, cluster, and segment data and automatically find
proactive decision-making from a large amount of sales linkages and rules in the data that may imply interesting
record and shopping history. patterns, including those associated to fraud.
Fig 4 We can use Data Mining to Lower this Rate because some Popular Types of Fraud Increased Across Industries
Globally from 2020 to 2021 [13]
Datawatch
KNIME
Alteryx
Orange
K-core
KEEL
Solver
Board
Spark
Weka
SPSS
Fields
Education ✔ ✔ ✔ ✔ ✔ ✔
Healthcare ✔ ✔ ✔
Finance ✔ ✔ ✔
Social media ✔ ✔ ✔
Retail Marketing ✔ ✔ ✔ ✔
Crime Analysis ✔ ✔
Telecommunication ✔ ✔
Fraud Detection ✔ ✔ ✔
Data Mining Tools are Described behavior of the algorithms, KEEL offers a straightforward
data stream GUI.
R
Programming tool R is available for free. Data Solver
cleansing, analysis, and graphing are all possible with R's Solver data mining is a group of tools that displays
statistical computing and graphical capabilities. Data precise data in a number of ways. It provides methods for
processing and storage are made easy with the R tool [25]. classification, prediction, affinity analysis, data exploration,
The most potent and widely used programming language for and reduction in context using statistical and machine-
data research, visualization, and computational statistics is learning techniques. Excel data mining, forecasting, and
R. visualization are all made simple using Solver's XLminer, a
professional data mining program. It includes a wide variety
Rapidminer of data preparation tools for gathering and cleansing user
The data mining software RapidMiner is available data.
without cost. With it, you can prepare data, learn algorithms,
and deploy models. Numerous options for creating fresh Spark
data mining activities and evaluating potential setups are Spark tools, which are utilized for effective and
included in this free data mining program. Its drag-and-drop reliable data processing in big data analytics, are the
interface and pre-built models enable non-programmers to framework's primary software components. Open-sourcing
create predictive processes for specific use cases like fraud of the Spark framework is done under the Apache license.
detection and customer attrition. Rapid Miner's R and For interactive searches, machine learning, and real-time
Python extensions allow programmers to customize their workloads, use the free and open-source Spark platform.
data mining in the interim. Powerful data mining software
called RapidMiner Studio supports everything from model Datawatch
deployment to model operations. An industrial intelligence and data mining system is
Datawatch Desktop. It enables the user to concentrate on
Weka real-time data visualization. Without writing a single line of
Weka is open-source mining program built on the Java code, it offers customers capabilities to construct and
platform. It was created at the Waikato University in New maintain monitoring and analytical systems. Renters,
Zealand. It is a situation where the necessary steps in data building owners, and property managers can all rely on
mining can be completed. Weka is a collection of machine Datawatch Systems for cutting-edge technology solutions
learning algorithms for data mining jobs. It includes tools that provide complete asset security.
for data preparation, categorization, regression, clustering,
association rule mining, and visualization [21]. K-Core
Network analysis, spam detection, and biology are
Orange three fields where data mining algorithms regularly use the
Orange is a strong platform for data analysis and k-core feature. Compared to other density-based measures
visualization, allowing users to see data flow and increase like the densest sub graph, it has the advantage of giving a
productivity. With a superior debugger that improves code score to every node in the network. The highest number of
debugging, Orange is the tool that is the simplest to learn. items that are individually related to at least k other entities
Python can be used to mine data by developers, on the other in the collection are gathered by a k-core tool. On a network,
hand. This platform includes tools for data analytics, text small, interconnected core sites are referred to as K-Core.
mining, bioinformatics, and machine learning.
VII. APPLICATION
Knime
Data scientists can build their own independent apps Each Field can Benefit Fundamentally from Data
and services using the robust drag-and-drop drag-and-drop Mining. Several of them are Listed below.
data mining tool KNIME [25]. Data science is produced
using the free source KNIME Analytics Platform. By being Education
straightforward, open, and constantly incorporating new Educational data mining is a method for turning
technologies, KNIME enables understanding data and unstructured data from enormous educational databases into
creating data science workflows and reusable components useful and insightful information (EDM). The advantages of
accessible to everyone. data mining for the education sector are: Identifying student
performance and making performance predictions.
Keel Predicting dropouts' academic achievement. To monitor the
KEEL is a Java open-source software tool that enables academic progress of the kids and the effectiveness of their
users to assess the performance of evolutionary learning and teachers. To improve both the teaching process and teacher
Soft Computing based algorithms for a range of data mining performance.
problems, such as regression, classification, clustering,
pattern mining, etc. For constructing experiments with
diverse datasets and soft computing techniques to assess the
Classification
The data mining function of classification divides up
the objects in a collection into specific groups or
classifications. To group data into a class or category,
classification techniques are utilized. Both structured and
unstructured data may be used in its execution. Generally
speaking, a classification algorithm is a function that divides
a class into positive and negative values based on the input
features [24].
Association Rule
Data mining function called association determines the Fig 5 Specific Popular Algorithms
likelihood of elements in a collection occurring together. For (https://link.springer.com/article/10.1007/s10916-018-1018-2)
finding intriguing relationships between variables in sizable
databases, association rule learning is a rule-based machine IX. DISCUSSION
learning technique [12]. Large volumes of data are analyzed
using association rule mining to uncover intriguing linkages Large unsupervised data sets can be mined for valuable
and relationships. It searches for the rules governing how or information using data mining techniques. It is a technology
why many things are frequently purchased together in a that uncovers patterns and correlations in a sizable dataset
same transaction [24]. and then analyses those patterns to generate helpful data for
additional study. Nearly all industries, including those in
Artificial Neural Network education, healthcare, banking, social media, retail, criminal
A neural network is a programming paradigm that analytics, telecommunication, fraud detection, and others,
mimics the function of brain nerve cells. A computational have greatly benefited from its application. It has sped up
model called an artificial neural network (ANN) is made up the creation of management and analytical tasks and cut
of numerous processing elements that take inputs and down on the amount of time and human effort needed to
outputs based on a specified perceptron [17]. organize massive data sets.
Naïve Bayes Data mining offers many advantages, but it also has
The Naive Bayes algorithm, a supervised learning some disadvantages, much like every instrument.
method, is based on the Bayes theorem. A probabilistic Dependence on this method can occasionally neglect the
computational model that can be used for a variety of value of human intervention because it is a strong strategy
classification applications is the Naive Bayes method. that relies heavily on computers. Therefore, if someone is
Useful applications include text classification, spam familiar with the method and the variables used in data
filtering, and sentiment analysis. mining, he or she can input false data in a way that the
method won't be able to tell apart from the true data, leading
Support Vector to the production of misleading information. It's important
Both classification and regression problems can be to acknowledge some of these disadvantages.
resolved using the sophisticated Supervised Learning
method known as the Support Vector Machine. A particular Education
kind of statistical model is a technology known as a support One such technology that deals with educational data
vector machine. It applies the kernel trick, a technique for for educational institutions or the entire educational system
transforming user data, to get the best margin between is called educational data mining (EDM). Any student's
potential values. One of the best and most effective personal information, academic history, test scores,
classification algorithms is support vector machine. For both attendance in class, and other details are used. Utilizing the
linear and nonlinear data, it is a new classification book information also contributes to enhancing the library
technique. system. As a result, there are certain issues in this situation.
Any official with access to the database system can change a
student's data, including their name, contact information,
grades, and attendance.
[5]. G. M. Weiss and B. D. Davison, "Data Mining," To [21]. R. Ratra, P. Gulia and N. S. Gill, "Performance
appear in the Handbook of Technology Management, Analysis of Classification Techniques in Data
2010. Mining using WEKA," in Proceedings of the
[6]. C. Romero and S. Ventura, "Data mining in International Conference on Innovative Computing &
education," WIREs Data Mining and Knowledge Communication (ICICC) , 2021.
Discovery, vol. 3, no. 1, pp. 12-27, 2012. [22]. R. Ranjan, S. Agarwal and D. S. Venkatesan,
[7]. M. N. O. Sadlku, K. G. Eze and S. M. Musa, "Data "Detailed Analysis of Data Mining Tools,"
Mining in Healthcare," International Journal of International Journal of Engineering Research &
Advances in Scientific Research and Engineering, Technology, vol. 6, no. 5, 2017.
vol. 4, no. 9, pp. 90-92, 2018. [23]. S. Mirza, D. S. Mittal and D. M. Zaman, "A Review
[8]. E. Vityaev and B. Kovalerchuk, "Data mining in of Data Mining Literature," International Journal of
finance: From extremes to realism," Journal of Computer Science and Information Security
Financial Transformation, vol. 11, pp. 81-89, 2004. (IJCSIS), vol. 14, no. 11, 2016.
[9]. K. C. Khatib, T. D. Kamble, R. R. Chendake and G. [24]. G. Weiss, "Data Mining in the Telecommunications
N. Sonavane, "Social Media Data Mining For Industry," Networking and Telecommunications:
Sentiment Analysis," International Research Journal Concepts, Methodologies, Tools, and Applications,
of Engineering and Technology (IRJET), vol. 3, no. p. 8, 2010.
4, 2016. [25]. G. M. Weiss, "Data Mining in Telecommunications,"
[10]. B. M. Ramageri and D. B. L. Desai, "ROLE OF Data Mining and Knowledge Discovery Handbook,
DATA MINING IN RETAIL," International Journal pp. 1189-1201, 2005.
on Computer Science and Engineering, vol. 1, p. 5, [26]. S. Jones and O. K. Gupta, "Web Data Mining: A
2013. Case Study," Communications of the IIMA, vol. 6,
[11]. N. Jabeen and P. Agarwal, "Data Mining in Crime no. 4, 2006.
Analysis," in Proceedings of Second International [27]. M. Bansal and M. Kaur, "Analysis and Comparison
Conference on Smart Energy and Communication, of Data Mining Tools Using Case Study of Library
2021. Management System," International Journal of
[12]. N. Petrovic, "Adopting Data Mining Techniques in Information and Electronics Engineering, vol. 3(5),
Telecommunications Industry: Call Center Case pp. 466-469, 2013.
Study," in IEEESTEC - 11th Student Projects
Conference, Serbia, 2018.
[13]. R. G. Bhati, "A Review on Present Technologies for
Fraud," International Journal of Applied Engineering
Research, vol. 14, no. 7, 2019.
[14]. R. LIorente and M. Morant, "Data mining in higher
education," New fundamental technologies in data
mining, pp. 201-220, 2011.
[15]. I. Ijarcet, "A Review on Data Mining in Healthcare,"
International Journal of Advanced Research in
Computer Engineering & Technology, 2018.
[16]. H. A. A. Hafez, "Mining Big Data in
Telecommunications Industry: Challenges,
Techniques, and Revenue Opportunity," International
Journal of Computer and Information Engineering,
vol. 10, no. 1, 2016.
[17]. P. A. Joshi and S. N. Bhamare, "A Study of Social
Media Data and Data Mining Techniques,"
International Journal of Engineering Research and
Application, vol. 7, no. 4, pp. 46-52, 2017.
[18]. M. Injadat, F. Salo and A. B. Nassif, "Data mining
techniques in social media: A survey,"
Neurocomputing, vol. 214, pp. 654-670, 2016.
[19]. S. Sathyadevan, M. S. Devan and S. S. Gangadharan,
"Crime analysis and prediction using data mining," in
First international conference on networks & soft
computing (ICNSC2014), 2014.
[20]. B. Khalid and N. Abdelwahab, "A Comparative
Study of Various Data Mining Techniques: Statistics,
Decision Trees and Neural Networks," International
Journal of Computer Applications Technology and
Research, vol. 5, no. 3, pp. 172-175, 2016.