Professional Documents
Culture Documents
DWM Review Paper
DWM Review Paper
Abstract — The healthcare environment is generally perceived by demanding a more quantitative assessment of information
as being ‘information rich’ yet ‘knowledge poor’. There is a with the supporting of all clinical and imaging data. For instance
wealth of data available within the healthcare systems. Valuable it might now be possible for the physicians to compare
knowledge can be discovered from application of data mining diagnostic information of various patients with identical
techniques in healthcare system. In this study, we briefly conditions. Likewise, physicians can also confirm their findings
examine the potential use of classification-based data mining with the conformity of other physicians dealing with an identical
techniques such as Rule based, Decision tree of healthcare data. case from all over the world [2].
The healthcare industry collects huge amounts of healthcare Clinical decisions are often made based on doctors’ intuition
data which, unfortunately, are not “mined” to discover hidden and experience rather than on the knowledge rich data hidden in
information. For data preprocessing and effective decision the database. This practice leads to unwanted biases, errors and
making One Dependency Augmented Naïve Bayes classifier excessive medical costs which affects the quality of service
(ODANB) and naive credal classifier 2 (NCC2) are used. This is provided to patients. Wu, et al proposed that integration of
an extension of naive Bayes to imprecise probabilities that aims clinical decision support with computer based patient records
at delivering robust classifications also when dealing with small could reduce medical errors, enhance patient safety, decrease
or incomplete data sets. Discovery of hidden patterns and unwanted practice variation, and improve patient outcome [3].
This suggestion is promising as data modeling and analysis tools,
relationships often goes unexploited. Using medical profiles
e.g., data mining, have the potential to generate a knowledge-rich
such as age, sex, blood pressure and blood sugar it can predict
environment which can help to significantly improve the quality
the likelihood of patients getting a heart disease. It enables
of clinical decisions.
significant knowledge, e.g. patterns, relationships between
medical factors related to heart disease, to be established.
A. Knowledge discovery in medical databases
Key words: Data mining, Healthcare, Naive Bayes, ODANB,
NCC2 Data mining is an essential step of knowledge discovery. In
recent years it has attracted great deal of interest in Information
I. INTRODUCTION industry [4]. Knowledge discovery process consists of an
iterative sequence of data cleaning, data integration, data
Knowledge discovery in databases is well-defined process selection, data mining pattern recognition and knowledge
consisting of several distinct steps. Data mining is the core step, presentation. In particulars, data mining may accomplish class
which results in the discovery of hidden but useful knowledge description, association, classification, clustering, prediction
from massive databases. A formal definition of Knowledge and time series analysis. Data mining in contrast to traditional
discovery in databases is given as follows: “Data mining is the non data analysis is discovery driven.
trivial extraction of implicit previously unknown and potentially
useful information about data” [1]. Data mining technology
provides a user-oriented approach to novel and hidden patterns in B. Heart Disease
the data. The discovered knowledge can be used by the healthcare
administrators to improve the quality of service. The discovered The term Heart disease encompasses the diverse diseases
knowledge can also be used by the medical practitioners to reduce that affect the heart. Heart disease kills one person every 34
the number of adverse drug effect, to suggest less expensive seconds in the United States [7]. Coronary heart disease,
therapeutically equivalent alternatives. Anticipating patient’s Cardiomyopathy and Cardiovascular disease are some
future behavior on the given history is one of the important categories of heart diseases. The term “cardiovascular disease”
applications of data mining techniques that can be used in health includes a wide range of conditions that affect the heart and the
care management. blood vessels and the manner in which blood is pumped and
Health care data is massive. It includes patient centric data, circulated through the body. Cardiovascular disease (CVD)
resource management data and transformed data. Health care results in severe illness, disability, and death [6]. Narrowing of
organizations must have ability to analyze data. Treatment the coronary arteries results in the reduction of blood and
records of millions of patients can be stored and computerized oxygen supply to the heart and leads to the Coronary heart
and data mining techniques may help in answering several disease (CHD).A sudden blockage of a coronary artery,
important and critical questions related to health care. generally due to a blood clot results in a heart attack. Chest
The availability of integrated information via the huge pains arise when the blood received by the heart muscles is
patient repositories, there is a shift in the perception of clinicians, inadequate [5] and inductive logic programming.
patients and payers from qualitative visualization of clinical data
2
Many healthcare organizations struggle with the data mining. We now describe a few Classification data mining
utilization of data collected through an organization online techniques with illustrations of their applications to healthcare.
transaction processing (OLTP) system that is not integrated for
decision making and pattern analysis. For successful healthcare A. Rule set classifiers
organization it is important to empower the management and
staff with data warehousing based on critical thinking and Complex decision trees can be difficult to understand, for
knowledge management tools for strategic decision making. instance because information about one class is usually
Data warehousing can be supported by decision support tools distributed throughout the tree. C4.5 introduced an alternative
such as data mart, OLAP and data mining tools. A data mart is formalism consisting of a list of rules of the form “if A and B
a subset of data warehouse. It focuses on selected subjects. and C and ... then class X”, where rules for each class are
Online analytical processing (OLAP) solution provides a multi- grouped together. A case is classified by finding the first rule
dimensional view of the data found in relational databases. With whose conditions are satisfied by the case; if no rule is satisfied,
stored data in twodimensional format OLAP makes it possible the case is assigned to a default class
to analyze potentially large amount of data with very fast
response times and provides the ability for users to go through
1) IF conditions THEN conclusion
the data and drill down or roll up through various dimensions
as defined by the data structure.
This kind of rule consists of two parts. The rule antecedent
(the IF part) contains one or more conditions about value of
II. PROPOSED METHOD FOR PREDICTION
predictor attributes where as the rule consequent (THEN part)
contains a prediction about the value of a goal attribute. An
For negotiate the medical data these days diverse information
accurate prediction of the value of a goal attribute will improve
systems incidental to healthcare are being used because of the
decision-making process. IF-THEN prediction rules are very
immense collection. The primary grail is to design a system
popular in data mining; they represent discovered knowledge at
which is used for early anticipation of heart related sickness.
a high level of abstraction. In the health care system it can be
Figure employee the methodology for the prediction of heart
applied as follows:
unwellness. The dataset from UCI/Kaggle in CSV format then
preprocess the data has been done which includes data
transformation, data cleaning, and data integration. After
preprocessing data mining classification algorithms such as
Decision Tree, SVM, Naive Bayes, k-NN are applied for the
prediction with and without filtering the attributes and compare
their accomplishment.[4]
REFERENCES
V.WHAT IS HEART DISEASE? [2] Miller, A., B. Blott and T. Hames, 1992. Review of
neural network applications in medical imaging and signal
Cardiovascular disease (CVD) can take many forms, and processing. Med. Biol. Engg. Comp., 30: 449-464.
includes a range of conditions that affect the heart. These include
coronary artery disease, heart failure, stroke, and high blood [3] Chen, J., Greiner, R.: Comparing Bayesian Network
pressure. The good news is that fewer Americans are smoking and Classifiers. In Proc. of UAI-99, pp.101–108 ,1999.
more Americans are increasing their physical activity, both of
which have improved the statistics regarding heart disease. The [4]https://www.researchgate.net/publication/345383677_
bad news is that in 2017, guidelines for high blood pressure were Heart_Disease_Prediction_Using_Frequent_Item_Set_Mi
changed, and many more people than before are now considered ning_and_Classification_Technique
to have high blood pressure, putting them at higher risk for having
heart attacks or strokes. [5] Glymour, C., D. Madigan, D. Pregidon and P.Smyth,
1996. Statistical inference and data mining.
Communication of the ACM, pp: 35-41.