Professional Documents
Culture Documents
Sharma 2022 J. Phys. Conf. Ser. 2267 012157
Sharma 2022 J. Phys. Conf. Ser. 2267 012157
Minakshi Sharma
1. Introduction
Health care researches are drowning in data but still starving for the knowledge. There is wealth of
health data available but due to lack of effective analysis tool to develop the relationship between them
or to find out the hidden patterns make it of no use. Traditional methods are time consuming and among
all data Mining is one of the best techniques that have a good response from govt, Healthcare
organisations, enterprise and many organisations. These techniques mainly used for interpreting big
data, to ease the work in hospitals by helping all the employees of the hospitals to serve better treatments
and facilities to the patients. With the help of these techniques’ doctors can go through the stored data
of patients and can draw out conclusion for the operations, OPDs, and many more works accordingly.
To know the patient’s future based on the patient’s data stored by electronic means is one of the best
features which can be used in healthcare for the wellbeing of patients.
Data mining has found its role in almost all the fields of day-to-day life medical field also being no
exception. In the present prevailing situation of covid crisis internet can be effectively used for public
health e.g. internet can be used to divide people into different age groups with various medical
anomalies. The role of data mining is especially much more in our country where people still hesitate to
get modern healthcare facilities e.g., vaccination etc. Uploading the information on a digital platform
helps the government to identify the loopholes. The countries like USA, UK, European Nations have
progressed a lot because all the government schemes, projects, development are all available online.
Making the information and the government schemes online enables transparency, good governance and
better public government relations. Data mining techniques can effectively use to address the medical
problems in a country with less resources and high population density. The non-availability of specialist
doctors in far flung areas can be best tackled by developing online medical consultation app in Play
Store that enabling better health care facilities to general paper public.
Knowledge discovery in databases is a process to extract knowledge from big data. Data mining is a
seven-step process to draw out the knowledge from big data. This consists of many approaches like
clustering, classification, summarization, association, analysing variations and visualization. Through
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd 1
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
this paper, we focus on the data mining techniques which can be implemented in health sector. This
study also presenting a brief literature survey and discussing various techniques for various disease.
Healthcare facilities in our country are quite ambiguous and Data Mining Techniques helps in regulating
these healthcare facilities by maintaining records with amazing array of patient’s data. The traditional
methods are not accurate and easy hence the modern technique like data mining is the need of hour for
best treatments and day to day monitoring of the patients. At present big companies are trying to invest
in artificial intelligence for healthcare to develop the online consultation app in play store to tackle with
the people of far-flung areas by increasing medical services and by recommending medicines online
after predictive analytics.
2
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
2.1.7. Knowledge
Where visualization & knowledge representation techniques are used to present the mined knowledge
to the user.
Thus, it is concluded that Knowledge discovery on databases (KDD) is three step processes i.e.
Pre-processing Mining Post processing
In pre-processing technique Data cleaning, Data integration and Data Transformation takes place to
improve the quality of data. It is one of the most critical steps in data mining process which deals with
the preparation and transformation of the initial dataset and then in Post processing technique is to
extract the knowledge in such form that user can insight into data for better decision making and thus it
includes Pattern evaluation and visualization for extracting the knowledge.
3. Related works
Baek et al. [1] characterizes the depression risk in four stages as good, bad, risky and very risky which
depends on changing context by multiple regression. In this the 14 variables for the prediction of
depression risk are used also as the data increases number of layers gets accumulated. This problem can
be solved by the individual learning with less time. The predictive values of this study lie on the unit
interval i.e., in between 0 and 1. Khan et al. [2] presented the various data mining techniques for diabetes
detection, classification and prediction. In this study it is concluded that for accurate results the data
should be pre-processed and parallel models should be used instead of one. Delen et al. [3] compared
the three methods i.e., Artificial neural network, decision Trees, logistic Regression on breast cancer
survivability with the predictive accuracy of all the three models. This study used the statistical methods
and data mining techniques as the research methods. They developed the predictive models and discover
relationships between independent variables and the cancer survivability. In these 2,00,000 cases used
as dataset and 10-fold cross validation for the relative prediction ability of different data mining
methods. The results in this study shows the decision tree is the best predictor than others.
the untimely deaths may reduce. Analytical skills allow us to clarify and explain patterns in big data
related to improved patient responses, helping us select treatments for optimal patient outcomes. A well-
established company of medical equipment with the focused aim of hospitals and clinics increases their
sales and got more return of investment. With the analysis of data fraud can reduces, patient records can
be easily accessed and well treated. As the data is written in books are just for the shelf cover with no
use but by using data mining the diseases can be diagnosed and the proper treatment can be suggested
so that the people who cannot come physically can be benefitted by using these techniques. Several
medical resources companies analyzed their products for the usage in medical fields, clinics etc. and
develop them according to the standards set for specific and modified treatment plans. With the
advancement of the tools in healthcare industry nurses can keep well check on individuals and
responsible for day to day monitoring also. They can provide motivational, emotional support to the
patients which is one of the natural therapies to recover and this is one of the major contributions in
biomedical field of research using data mining tools.
4.1.2. Descriptive Model: The descriptive model classifies customers or prospects into groups based on
the analyzing relationship between the data as in Clustering, Summarization, Association rule, Sequence
discovery, etc.
4.1.3. Summarization: Summarization is of great importance in data mining. It is like the summary or
conclusion the data. In data mining summarization can be done by using various methods like mean,
standard deviation etc. This gives the general information of the whole data.
4.1.5. Classification: It is the process to find a model which is well suitable for the problem according
to the class objects whose class label is unknown. It accurately predicts the target for each class in the
4
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
data. Basically, it divides the data into training and testing data. By analysing the pattern on training
data prediction is applied on testing data by preparing a model based on training data It can be used to
classify loan applicant as low, medium, high risk.
4.1.6. Clustering: It is an unsupervised machine learning based algorithm which consists of set of data
points and form into clusters. Similar type of data objects belong to the one group and the subset of this
group is known as cluster. Clustering techniques basically are of two types hard clustering and soft
clustering. In hard clustering one data point can belong to one cluster only but in soft clustering it can
belong to more than one cluster also.
4.1.7. Trend analysis: By analysing the data that what happened in the past it suggests that what will
happen in the future. Nowadays as the number of cases are reducing and hence it follows the declining
or down trend.
4.1.8. Regression Analysis: It is a statistical process for estimating the relationships between the
dependent variables or one or more independent variables. In this continuous values or range of
numerical values can be predicted.
4.1.9. Decision Tree: The name shows the tree shaped structures of decisions. In this all the branches
gives their results and the best one to draw out as the decision for the classification of dataset. It includes
ID3, C4.5 and CART (classification and regression) Trees.
5. Healthcare Techniques
5
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
units) which are interconnected by nodes. These input units receive various information and neural
network tried to learn about the knowledge produced as a output. There is also a back propagation in
which the network works backward from output to input units to adjust weight of its connection until
the difference between the actual output and desired outcome shows less error as possible. It is also
applied for improving the patient’s disease management [3]. It helps to select the appropriate method
which can be used in health care industry for best results [4] artificial neural network technique is most
common technique used in major disease area like cancer etc according to a survey of artificial
intelligence applications [5].
5.4.2. Iterative Dichotomiser: It is introduced by Ross Quinlan used to take decision from data set and
it is the predecessor of C4.5 which takes its output as input for making decisions.
5.4.3. C4.5 algorithm: C4.5 is one of the decision tree algorithms. It is also known as J48 in WEKA tool
and as its decision tree can be used for the classification and that is why it is known as statistical
6
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
classifier. In this the normalized attribute selection measure is known as gain ratio and which can be
measured as Gain(A)/Split info(A) = Gain ratio (A) where, A shows the attribute in a data set D. It is
quite time efficient and can be used for building more accurate decision trees.
5.4.4. CHAID: Chi-squared automatic interaction detector produces multiple branches of single or
parent node and it is frequently used for descriptive analytics.
5.4.5. Random Tree Classifier: It is a machine learning algorithm and does ensemble classification. In
this it takes the decision from all the branches and the highest no. of vote prediction will be the final
decision tree. It is also time saving and efficient classifier.
6. Comparative study
Table 1. Comparative study of different prediction in healthcare.
S. No Techniques used Algorithm Field
1 Decision Tree C4.5 Diabetes
2 Naïve Bayes Classification Heart Disease
3 Neural Network Back Propagation Eye disease
4 Decision tree (SPSS) Chi square Diabetics
5 SVM - Stroke mortality in brain
6 Neural networks Trend analysis Heart disease
7 Decision tree induction method C5 Breast cancer
8 Artificial Neural Network Not discussed Breast cancer
9 Logistic Regression Not discussed Breast cancer
10 Artificial Neural network Not discussed Chest Disease
11 K- Nearest Neighbour Classification Diabetes, Cancer
12 SVM classifier classification Diabetes
13 Apriori algorithm Prediction Chronic Disease
7. Conclusion
In recent times there has been sharp hike in amount of medical data in the field of medical care which is
gathered by various electronic means and this along with rise in availability of economical and
dependable computing equipment has encouraged many researchers to start exploring these collected
data. However, despite all available data our healthcare system is still crumbling which shows that these
data have not been properly used. This paper explores the application of data mining techniques using
medical data effectively for disease prediction which can be diagnosis at proper time to enable good
human health. It is observed that some of the techniques of data mining has already been employed for
medical data and some can be in the coming time for better results. Huge amount of data available in
the medical field has made it mandatory to use data mining techniques for arriving at a decision and
prediction in the field of medical care such as to identify the kind of disease., availability of various
medical facilities at different medical centres. More over the J48, decision trees in Weka shows the
wonderful results in the healthcare sector. This paper checks various techniques in contrast with others
it is seen that random forest classifier shows effective results. These techniques can be applied in various
healthcare subfields also to predict and allow to prepare better for the upcoming situation.
8. References
[1] Baek J W and Chung K 2020 Context deep neural network model for predicting depression risk
using multiple regression IEEE Access 8 pp 18171-81
[2] Khan F A, Zeb K A M, Derhab A, Bukhari S A C 2021 Detection and Prediction of Diabetes
using Data Mining A Comprehensive Review IEEE Access
7
RAFAS-2021 IOP Publishing
Journal of Physics: Conference Series 2267 (2022) 012157 doi:10.1088/1742-6596/2267/1/012157
[3] Delen D, Walker G, Kadam A 2005 Predicting breast cancer survivability: a comparison of three
data mining methods Artificial intelligence in medicine 34(2) pp 113-27
[4] Luo L, Luo L, Zhang X, He X 2017 Hospital daily outpatient visits forecasting using a
combinatorial model based on ARIMA and SES models BMC health services research 17(1) pp
1-13
[5] B Taneja A 2013 Heart disease prediction system using data mining techniques Oriental Journal
of Computer science and technology 6(4) pp 457-66
[6] B Krishnaiah V, Narsimha G, Chandra N S 2016 Heart disease prediction system using data
mining techniques and intelligent fuzzy approach a review International Journal of Computer
Applications 136(2) pp 43-51
[7] Mahmoud H, Abbas E, Fathy I 2018 Data mining and ontology-based techniques in healthcare
management International Journal of Intelligent Engineering Informatics 6(6) pp 509-26
[8] Naveenkumar S, Kirubhakaran R, Jeeva G, Shobana M, Sangeetha K Smart Health Prediction
Using Machine Learning
[9] Kumar H and Singh N 2017 Review paper on Big Data in healthcare informatics International
Research Journal of Engineering and Technology 4(2) pp 197-201
[10] Er O, Yumusak N, Temurtas F 2010 Chest diseases diagnosis using artificial neural
networks. Expert Systems with Applications 37(12) pp 7648-55
[11] Tang P H and Tseng M H 2009 Medical data mining using BGA and RGA for weighting of
features in fuzzy k-NN classification In 2009 International Conference on Machine Learning and
Cybernetics IEEE 5 pp 3070-75
[12] Balakrishnan S and Narayanaswamy R 2009 Feature selection using fcbf in type ii diabetes
databases International Journal of the Computer the Internet and the Management 17(1) pp 50-8
[13] Chaurasia V and Pal S 2013 Early prediction of heart diseases using data mining techniques
Caribbean Journal of Science and Technology pp 208-17
[14] Bahrami B and Shirvani M H 2015 Prediction and diagnosis of heart disease by data mining
techniques Journal of Multidisciplinary Engineering Science and Technology (JMEST) 2(2) pp
164-8