


DATA MINING IN HEALTHCARE

Research · January 2018


DOI: 10.13140/RG.2.2.22189.38887


2 authors, including:

Indrajit Sen
Texas State University

All content following this page was uploaded by Indrajit Sen on 28 January 2018.



DATA MINING IN HEALTHCARE

Indrajit Sen, Krati Khandelwal


CIS 5364; SPRING 2014
TEXAS STATE UNIVERSITY

TABLE OF CONTENTS
Abstract
Introduction
Background
Definition and Usage
Tools, Software and Algorithms
  Common data analysis tools
  Data mining algorithms
  Choosing the right algorithm
  Choosing an algorithm by type
Applications of Data Mining in Healthcare
  Cardiovascular diseases
  Cancer
  Pediatrics
  Outpatient healthcare
  Transnational medicine
  Life expectancy calculations
Legal Aspects of Data Mining
Ethics
Findings and Proposed Solutions
Summary and Conclusion
Appendix (List of figures)
  Figure 1
  Figure 2
  Figure 3
Bibliography

ABSTRACT
The focus of this paper is to examine the benefits of data mining in everyday life, especially in healthcare. With the spread of computing technology, statistical analysis has received a boost. Using and extending established statistical techniques, data mining helps predict human behavior in sectors as diverse as supermarket purchasing and cancer vaccine development.

The paper starts with a brief introduction to data mining, including its popular everyday applications in retail. Data mining technologies and algorithms are then briefly analyzed, and a short interview with a firm that uses data mining to its benefit is described.

Next, the paper describes various research papers that have used data mining to answer critical health questions. Which age group is most susceptible to cardiovascular diseases? Which cancers are most often targeted in vaccine trials? How many such trials have been successful? What is a good treatment for a rare children's disease? How can data mining be used to solve medical problems across nations? How can life expectancy be accurately determined? This paper finds answers to most of these questions.

We then examine legal and ethical aspects of data mining. Finally, we close on an optimistic note about the future prospects of this promising technology.

INTRODUCTION
In today's world it is difficult to plan without data mining. Imagine waking up one day to find that you cannot access any of the information that is valuable to you. Suppose you are a doctor and discover that there is no way to look up a patient's habits and activities on the computer, no way to search for effective treatments and best practices, and no way to analyze data to avoid some of the complications involved in the industry. We know that data is powerful and valuable. But how?

With advances in data mining, we can now answer crucial questions such as “What kinds of surgeries resulted in hospital stays longer than five days?” and “What were the common pre-surgery symptoms of patients who stayed in the hospital longer?” Data mining is not limited to the healthcare industry; it also helps all industries improve customer satisfaction, target marketing campaigns better, identify high-risk clients, and improve production processes. However, since this paper is about healthcare applications of data mining, our focus will be on healthcare.

BACKGROUND
Data mining can be considered a relatively new methodology and technology, coming into prominence only in 1994. It aims to identify valid, novel, potentially useful, and understandable correlations and patterns by combing through large data sets to sniff out patterns that are too subtle or complex for humans to detect. A huge amount of data is collected during different processes, and traditional methods take too much time and effort to analyze it. With data mining tools and algorithms, it is much easier to get to the core of the information quickly and accurately (Hian).

Because of its importance, data mining has been used intensively by many organizations, and it is becoming increasingly popular and important in healthcare. For example, data mining can help healthcare insurers detect fraud and abuse, healthcare organizations make customer relationship management decisions, physicians identify effective treatments and best practices, and patients receive better and more affordable healthcare services (Koh HC1, n.d.).

Major areas of application include the evaluation of treatment effectiveness, healthcare management, customer relationship management, and the detection of fraud and abuse. The same paper also gives an illustrative example of a healthcare data mining application: identifying risk factors associated with the onset of diabetes.

Imagine that you are running and reach a point where you should stop, but you keep pushing yourself, until your doctor calls and tells you to slow down. How useful that would be! Already, some mobile apps and trackers collect fitness data and send it to the cloud. Microsoft HealthVault, Microsoft's web-based electronic health records platform, lets doctors access data that patients have uploaded themselves from fitness trackers such as the Fitbit or Nike+ FuelBand and from glucose and heart monitors (Hernandez, 2014).

Today, with advances in technology, you do not have to fill out a new form every time you see another doctor, because doctors can share that information with each other. Apple, Adidas, Samsung, GPS maker Garmin, audio technology company Jawbone, and gaming hardware manufacturer Razer are developing products that measure biological functions at ever faster clips. Startups across the country are creating gadgets such as pill boxes that monitor whether patients are taking their medications and under-the-mattress sensors that measure heart rate, breathing and movement. The aim is to create a one-stop shop for health information (Hernandez, 2014).

DEFINITION AND USAGE


Data mining is a powerful technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. Data mining can reveal a great deal about patterns and behaviors, which helps businesses make valuable decisions. Typical uses include:

1) Fraud detection: Large stores such as Macy's or J. C. Penney, as well as small businesses, can keep track of customers who buy items and return them after using them. This information can be tracked when the transactions are made with one particular credit card. During a job search, one of the authors spoke with a business analyst at Buckle, Inc., Mr. Shane Johnson, who said that many customers buy an item such as children's clothing or a women's dress and return it a few days later. These dresses have usually been worn. After analyzing the credit card information in detail, the store found that the customers doing this were mainly women between 18 and 29 years old and of Hispanic origin. According to Johnson, there is nothing the store can do to fix the problem; at most, it can tell these customers that they have a strong return history. This way, this segment of customers knows that the store is aware of what they are doing (Johnson, 2014).
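The return-tracking idea in the fraud detection example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Buckle's or any retailer's actual system: the transaction records, the card IDs and the thresholds (`min_purchases`, `max_return_rate`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction log: (credit card ID, item, was the item returned?).
transactions = [
    ("card-A", "dress", True),
    ("card-A", "dress", True),
    ("card-A", "jeans", False),
    ("card-B", "shirt", False),
    ("card-B", "child clothing", True),
    ("card-C", "dress", False),
]

def flag_frequent_returners(rows, min_purchases=2, max_return_rate=0.5):
    """Return {card_id: return_rate} for cards whose return rate exceeds max_return_rate."""
    bought = defaultdict(int)
    returned = defaultdict(int)
    for card, _item, was_returned in rows:
        bought[card] += 1
        if was_returned:
            returned[card] += 1
    flagged = {}
    for card, n in bought.items():
        rate = returned[card] / n
        # Require a minimum purchase history so a single return does not flag a card.
        if n >= min_purchases and rate > max_return_rate:
            flagged[card] = rate
    return flagged

print(flag_frequent_returners(transactions))  # {'card-A': 0.6666666666666666}
```

In this made-up log only card-A (two returns out of three purchases) is flagged; card-B sits exactly at the 50% threshold and is not.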

2) Identifying complementary goods for a particular product:

a) Amazon offers a useful example of how descriptive findings are used for prediction. By looking at users' purchase histories, Amazon was able to find an association between cocktail shaker and martini glass purchases (The Atlantic, 2012).

Another similar example:

b) Target assigns every customer a Guest ID number, tied to their credit card, name, or e-mail address, that becomes a bucket storing a history of everything they have bought and any demographic information Target has collected from them or bought from other sources (Hill, 2012).
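Associations such as the cocktail shaker and martini glass example are usually expressed as association rules scored by support (how often the items appear together) and confidence (how often the second item appears given the first). The source does not describe Amazon's actual method; the sketch below only computes these two standard measures on a handful of made-up baskets.

```python
# Made-up shopping baskets; only the item names echo the Amazon example.
baskets = [
    {"cocktail shaker", "martini glass"},
    {"cocktail shaker", "martini glass", "olives"},
    {"cocktail shaker", "ice tray"},
    {"martini glass"},
    {"olives"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C together) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"cocktail shaker", "martini glass"}, baskets)
rule_confidence = confidence({"cocktail shaker"}, {"martini glass"}, baskets)
print(rule_support)  # 0.4
print(rule_confidence)
```

Here the rule "cocktail shaker implies martini glass" has support 0.4 and a confidence of about 0.67: two of the three baskets containing a shaker also contain a glass.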

TOOLS, SOFTWARE AND ALGORITHMS
COMMON DATA ANALYSIS TOOLS
Orange: A component-based data mining and machine learning software suite written in the Python language (Wikipedia, 2014).

R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.

RapidMiner: An environment for machine learning and data mining experiments (7).

SCaViS: A cross-platform Java data analysis framework developed at Argonne National Laboratory.

SenticNet API: A semantic and affective resource for opinion mining and sentiment analysis.

UIMA: The Unstructured Information Management Architecture is a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM.

Weka: A suite of machine learning software applications written in the Java programming language.

Many more tools are available.

One of the authors of this paper interned at the Keller Williams Realty firm and used R for her research work. Keller Williams is a renowned realty firm that collects and analyzes customer data. It collects data from various sources such as other companies, seminars, online enquiries and walk-ins. It then gathers vital information about clients, for example whether people living in a particular location are looking for a high-budget or a low-budget house, and how age relates to the size of the house. Based on this, it creates and organizes marketing campaigns designed for particular target groups identified by the analysis. Data mining helped the firm considerably, because it could now target a limited group of people with specific attributes instead of the whole population, much of which does not need a high-budget house.

We interviewed the manager of Keller Williams South Austin, who said that data mining and its applications have resulted in more focused marketing and have improved results compared with the past, when campaigns targeted all clients as a single entity. He added that campaigns and marketing events are now more specific and take customer needs into account, rather than relying on bulk marketing and sending thousands of emails on a regular basis to people whose requirements are not met by those ad campaigns.

DATA MINING ALGORITHMS


A data mining algorithm is a set of calculations that builds a model from data. The algorithm first analyzes the data, looking for specific kinds of connections and patterns. It then uses the results of this analysis to define the optimal parameters for the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.

More than one algorithm can be used to build a model. It is not unusual for seasoned analysts to mine data with an initial algorithm and then use a more complex one to refine their results. Research papers that mine healthcare databases often find that a second algorithm improves their findings, as a later section of this paper shows. Depending on the algorithm used, different information is extracted, which can then be used to make valuable decisions.
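The two phases described above (searching the data for the best parameters, then applying them to the whole data set) can be illustrated with about the simplest model there is, a one-split decision stump. This is a minimal Python sketch on invented data, assuming a single numeric attribute and a 0/1 label; it is not how any particular product implements its algorithms.

```python
# Invented training data: (attribute value, 0/1 label).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

def fit_stump(rows):
    """Phase 1: search for the threshold t that minimises errors of 'predict 1 if x > t'."""
    values = sorted({x for x, _ in rows})
    # Candidate thresholds are midpoints between consecutive distinct values
    # (assumes at least two distinct values; otherwise None is returned).
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = sum(1 for x, y in rows if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def apply_stump(t, xs):
    """Phase 2: apply the learned parameter to any data set."""
    return [1 if x > t else 0 for x in xs]

threshold = fit_stump(train)
print(threshold)                                       # 3.5
print(apply_stump(threshold, [0.5, 3.2, 3.9, 10.0]))  # [0, 0, 1, 1]
```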

CHOOSING THE RIGHT ALGORITHM

Choosing the best algorithm is not always easy; it can be tricky and cumbersome at times, because every algorithm produces a different result. How different the results are can sometimes be used to judge the efficacy of a research method (Microsoft Technet, 2014). For example, suppose you work for Sam's Club, have data on tens of thousands of customers, and need to reduce the data but cannot decide which columns to delete and which to keep. In this case the Microsoft Decision Trees algorithm can be of great use, because it can identify which columns are least important and can therefore be deleted.
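The text does not say how Microsoft Decision Trees scores columns, so the sketch below substitutes information gain, the entropy-based split criterion behind classic decision tree learners such as ID3 (C4.5 uses a normalized variant, the gain ratio), as an illustrative stand-in. A column whose information gain about the target is near zero is a candidate for deletion. The Sam's Club-style column names (`member_tier`, `store_id`, `renewed`) and the records are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, column, target):
    """Entropy reduction in `target` after splitting rows on the values of `column`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[column]].append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder

# Hypothetical customer records; column names are invented for illustration.
customers = [
    {"member_tier": "plus", "store_id": 1, "renewed": "yes"},
    {"member_tier": "plus", "store_id": 2, "renewed": "yes"},
    {"member_tier": "basic", "store_id": 1, "renewed": "no"},
    {"member_tier": "basic", "store_id": 2, "renewed": "no"},
]

gains = {c: information_gain(customers, c, "renewed") for c in ("member_tier", "store_id")}
print(gains)  # {'member_tier': 1.0, 'store_id': 0.0}
```

In this toy data `store_id` tells us nothing about renewal, so it is the column a tree-based ranking would suggest dropping first.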

CHOOSING AN ALGORITHM BY TYPE

• Classification algorithms: A data set usually has several attributes. A classification algorithm predicts one or more discrete variables based on the other attributes. Examples are Support Vector Machines (SVM) and C4.5 (Yang, 2007), as well as AdaBoost and Naïve Bayes.

• Regression algorithms: While classification algorithms predict discrete variables, regression algorithms predict continuous variables. Examples are linear regression and regression trees such as those built by CART.

• Association algorithms: These are useful for finding associations between attributes in a data set. The best-known example is the Apriori algorithm.

• Segmentation algorithms: These divide the data into groups, or clusters, of items with similar properties. The Microsoft Clustering Algorithm is a good example.

• Sequence analysis algorithms: These summarize frequent sequences or episodes in data, such as a Web path flow. An example is the Microsoft Sequence Clustering Algorithm (Microsoft Technet, 2014).
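The text names the Microsoft Clustering Algorithm as a segmentation example. As a stand-in, the sketch below uses plain k-means, a different but closely related and widely taught clustering method, on one hypothetical attribute (annual customer spend). It is an illustration under those assumptions, not Microsoft's implementation.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Plain k-means on one numeric attribute; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k distinct data points
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# Hypothetical annual spend per customer (two obvious segments).
annual_spend = [120.0, 130.0, 125.0, 900.0, 950.0, 920.0]
centroids, labels = kmeans_1d(annual_spend, k=2)
print(sorted(centroids))
print(labels)
```

With this data the low spenders and high spenders end up in separate segments, with the low-spend centroid at 125.0.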

APPLICATIONS OF DATA MINING IN HEALTHCARE

According to the WHO, cardiovascular disease and cancer are, in that order, the two deadliest killers in the world (Mathers CD, 2009). Better knowledge of causes and symptoms can no doubt reduce or delay many of these deaths. Patient data are held in hospital databases around the world, but there is little consistency in either the format of the data or its availability. Even if all or most of the data could be brought into a mutually intelligible format, it would not be humanly possible to find the hidden patterns by hand. Most of the hidden information would go unnoticed, and the usefulness of this precious data would be limited to a small group of local patients. Physicians in technologically advanced nations such as the US and the UK would not be able to research that data fruitfully and find groundbreaking cures for all of humankind.

CARDIOVASCULAR DISEASES

A group of three Iranian scientists used classical data mining algorithms such as Decision Trees, Artificial Neural Networks (ANNs), and Support Vector Machines (SVM) to attempt to predict the early onset of Coronary Artery Disease (CAD) (Peyman Rezaei Hachesu1, 2013). Although the study was local, and the onset of CAD also depends on race, it provides valuable insight into the prediction of CAD. Around 5,000 patients with CAD were analyzed using the three algorithms. The following steps were followed to preserve the validity and integrity of the research:

1) The sample population was carefully chosen with expert medical advice, so that patients of a particular heart hospital in Tehran, Iran qualified for the study.

2) Not all patients in the available pool had consistent or complete data. The data were pre-processed to remove noise: missing values were in most cases replaced with average values, and outliers were removed. Outliers were defined as values lying outside the first and third quartiles. Minitab 14 was used to further investigate the data distribution.

3) After the clean-up, only around 2,000 records were found to be complete and valid. Since separating the data into a training set and a testing set is an important aspect of data mining, 80% of the data was used for training and 20% for testing.
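The three preparation steps above (average-value imputation, quartile-based outlier removal, and an 80/20 split) can be sketched in plain Python for a single numeric attribute. This is a minimal illustration under stated assumptions: the study worked on real patient records and reportedly used Minitab 14 to examine the data, whereas the ages below are invented, and the parameter `k` is our own addition to contrast the rule as stated in the text (`k=0.0`) with the more common 1.5 × IQR fence.

```python
import random
import statistics

def clean(values, k=0.0):
    """Mean-impute missing values (None), then keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=0.0 follows the rule as stated in the text (drop values outside the first and
    third quartiles); k=1.5 is the more common Tukey fence.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    filled = [mean if v is None else v for v in values]
    q1, _median, q3 = statistics.quantiles(filled, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in filled if low <= v <= high]

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical ages at CAD onset; None marks a missing value.
ages = [50, 55, None, 58, 60, 62, 64, 90, 30, 57]
cleaned = clean(ages, k=1.5)
train, test = train_test_split(cleaned)
print(len(cleaned), len(train), len(test))  # 8 6 2
```

Note how much harsher the literal rule is: with `k=0.0` only the middle half of the distribution survives, which would shrink a data set far more than the usual fence does.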

The mean age at onset of CAD was found to be 58, with the 54-64 age group being the most susceptible. Overall, the SVM technique was found to be the most accurate.

Using similar data sets from other countries and the same algorithm (SVM), the onset of CAD could also be predicted in other countries, including the US. According to the American Heart Association, the cost of treating heart disease in the United States will triple by 2030 (American Heart Association, 2011). Further research into the factors causing CAD could reduce this expense significantly.

CANCER

Although cardiovascular disease is the biggest killer, cancer is not far behind. In fact, cancer is catching up, with global cancer deaths projected to increase from 7.1 million in 2002 to 11.5 million in 2030 (World Health Organization, 2007). The largest pharmaceutical companies in the world are (literally) in a rat race to invent new medications and compounds to cure cancer. Clinical trials are a vital part of introducing any new drug or vaccine. Clinical trials are research studies that explore whether a medical strategy, treatment, or device is safe and effective for humans (National Institutes of Health, 2014). They therefore involve huge data sets; however, collecting the data is of little use if it cannot be mined or analyzed usefully. A wealth of publicly available clinical trial data can be found on the US government's website ClinicalTrials.gov.

Using the cancer data in particular, three US researchers tried to summarize and visualize cancer vaccine clinical trials (Xiaohong Cao*1, 2008). The researchers observed that although a large volume of data was available, only simple querying techniques had been applied to it so far. Using more sophisticated data mining and bioinformatics, they were able to answer critical questions such as how long trials had been running and with what success, which vaccine platforms were used, and which phase the trials were in. The most important question answered, however, was whether any types of cancer were being neglected in research and trials. The researchers (not so surprisingly) found that several equally deadly cancers, such as bladder, liver, pancreatic, stomach and esophageal cancer, had been neglected. This finding is sure to rattle the boardrooms of many multinational pharmaceutical companies.

A few other major findings from applying data mining techniques to the publicly available cancer clinical trial data are:

1) Although the first cancer vaccine trial (for lung cancer) was conducted in 1971, trials became common only as late as the early 2000s. The number of trials has been increasing steadily since then.

2) The top five cancers targeted by vaccine therapy in clinical trials are melanoma (skin cancer), cervical, prostate, breast, and leukemia. Melanoma is the largest trial candidate, and cervical cancer is second.

3) Regarding the institutions performing the trials, the National Cancer Institute was the undisputed leader, followed by GSK (GlaxoSmithKline). All other pharmaceutical companies contributed more or less equally to cancer vaccine trials and research.

4) Cancer vaccine trials can also be characterized by the type of vaccine strategy used. The researchers found that the majority of trials used an antigen-based vaccine, followed by a cellular-based one. Together, antigen- and cellular-based vaccines account for over 80% of the trials.

5) A scatter plot with cancer incidence rates on the X-axis and five-year survival rates on the Y-axis gives an informative picture of current cancer prevalence and of survival rates with existing medication. The four most common cancers (prostate, melanoma, breast and cervical) all have high clinical trial rates (dark red circles). Interestingly, prostate cancer also has a very high survival rate. Please see figure 1 in the appendix.
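Summary findings such as the most-targeted cancers, the neglected cancer types and the share of antigen- and cellular-based vaccines come down to counting over trial records. The sketch below shows that tallying step on a few invented records; the record layout and the watch list are hypothetical and do not reflect the actual ClinicalTrials.gov export format or the researchers' code.

```python
from collections import Counter

# Invented trial records: (cancer type, vaccine platform). NOT real ClinicalTrials.gov data.
trials = [
    ("melanoma", "antigen"), ("melanoma", "cellular"), ("melanoma", "antigen"),
    ("cervical", "antigen"), ("cervical", "antigen"),
    ("prostate", "cellular"), ("breast", "antigen"),
    ("leukemia", "cellular"), ("lung", "viral vector"),
]

by_cancer = Counter(cancer for cancer, _ in trials)
by_platform = Counter(platform for _, platform in trials)

# Hypothetical watch list: cancer types with no trials at all count as "neglected".
watch_list = ["melanoma", "cervical", "bladder", "liver", "pancreatic", "stomach", "esophageal"]
neglected = [c for c in watch_list if by_cancer[c] == 0]

antigen_or_cellular_share = (by_platform["antigen"] + by_platform["cellular"]) / len(trials)

print(by_cancer.most_common(2))   # [('melanoma', 3), ('cervical', 2)]
print(neglected)                  # ['bladder', 'liver', 'pancreatic', 'stomach', 'esophageal']
print(round(antigen_or_cellular_share, 2))
```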

PEDIATRICS

Pediatrics is receiving increasing attention in healthcare. With specialized hospitals such as St. Jude in Memphis, TN, mining all of the available inpatient data is more important than ever. The aptly named KID, or Kids' Inpatient Database, is a one-stop shop for pediatric clinical data (Bliss-Holtz, 2012). The KID belongs to the HCUP (Healthcare Cost and Utilization Project) family of databases, created through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), a federal agency. The database is large enough that even relatively rare children's diseases, such as prune belly syndrome, can be analyzed. Variables contained in the KID include primary and secondary diagnoses; primary and secondary procedures; admission and discharge status; patient demographics, including gender, age, race and median income (by ZIP code); total charges; length of stay; and hospital characteristics (e.g., ownership, size, teaching status). The KID is thus a veritable gold mine and, if properly mined, can help answer many of the pediatric questions that physicians face.
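A typical first question put to a database like the KID is a cohort summary for one diagnosis. The sketch below filters hypothetical discharge records on a primary diagnosis and reports cohort size, median length of stay and mean total charges. The field names and the plain-text diagnosis labels are invented for illustration; the real KID files use their own variable names and coded diagnoses.

```python
import statistics

# Hypothetical discharge records with a few KID-like variables. The field names
# and the plain-text diagnosis labels are invented for illustration only.
records = [
    {"dx1": "PRUNE_BELLY", "age": 0, "los_days": 12, "total_charges": 85000},
    {"dx1": "PRUNE_BELLY", "age": 2, "los_days": 6, "total_charges": 40000},
    {"dx1": "ASTHMA", "age": 7, "los_days": 2, "total_charges": 9000},
    {"dx1": "PRUNE_BELLY", "age": 1, "los_days": 9, "total_charges": 61000},
]

def summarize(rows, diagnosis):
    """Cohort size, median length of stay and mean charges for one primary diagnosis.

    Assumes at least one record matches; statistics raises an error on an empty cohort.
    """
    cohort = [r for r in rows if r["dx1"] == diagnosis]
    return {
        "n": len(cohort),
        "median_los_days": statistics.median(r["los_days"] for r in cohort),
        "mean_charges": statistics.mean(r["total_charges"] for r in cohort),
    }

print(summarize(records, "PRUNE_BELLY"))
# {'n': 3, 'median_los_days': 9, 'mean_charges': 62000}
```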

OUTPATIENT HEALTHCARE

Outpatients typically receive less elaborate care than inpatients in a hospital, presumably because they pay much less. However, outpatient illnesses can be very involved, and adequate knowledge of diseases, conditions and medications can mean cost savings for both the patient and the care provider. A research paper based on the medical database of a Taiwanese hospital (Huang, 2013) aims to determine the best algorithm for analyzing such a data set. Association rules can be constructed between abnormal health examination results and outpatient illnesses, and a disease prevention knowledge database can then be built to assist healthcare providers with follow-up treatment and prevention. The author also proposes a new algorithm that can analyze such a data set more efficiently. Although the algorithm is a candidate for more rigorous testing, the study clearly demonstrates the power of data mining and the potential for further research.

A few points on the choice of data mining algorithms and the research methodology used in the study:

1) Apriori algorithms are generally used to discover association rules such as those required by this study. Apriori algorithms were first discussed in 1993 and have been popular since then (Huang, 2013). However, Apriori requires repeated database scans, which makes it inefficient. As medical research advances, the need to correlate multiple diseases and causes has come to the forefront.

2) The data came from a hospital in Taiwan and consisted of two parts: health examination results and outpatient medical records. No distinction was made by medical department. Patient health checkup data were coded as normal (01), below normal (02) and above normal (03). Normal results were filtered out, since the association sought was between abnormal health results and outpatient illness (around 100,000 data points).

3) Outpatient illness records were obtained for the six months before and after the clinical data were recorded. Incomplete, prenatal and dental data were also removed from the data set.

Please see figure 2 in the appendix for a flowchart of the data integration process.

4) A new algorithm, DCSM (Data Cutting and Sorting Method), was proposed in view of the limitations of the Apriori method. DCSM is a seven-step method consisting of:

a. Convert the data into a Boolean matrix.

b. Establish large itemsets for high-frequency data.

c. Establish a reduction matrix: essentially, remove unpaired data.

d. Iterate step (b).

e. Iterate step (c).

f. Iterate step (d).

g. Return to step (b) and repeat steps (c) to (f).

5) Empirical analysis showed that the association rules found using DCSM and Apriori were exactly the same, which validates the new algorithm. However, DCSM was found to be around ten times faster than the classical Apriori.

6) The association rules were corroborated by medical doctors and by independent research.
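DCSM's internals are only outlined above, so the sketch below shows the classical Apriori baseline instead, run over the same kind of Boolean patient-by-item matrix that DCSM's first step produces. Each level of the search rescans the matrix, which is the repeated-scan cost the study set out to avoid. The items (abnormal examination flags and later outpatient diagnoses), the thresholds and the patients are all hypothetical; this is not the study's code or data.

```python
from itertools import combinations

# Hypothetical patients: abnormal exam results plus later outpatient diagnoses.
patients = [
    {"glucose_high", "diabetes"},
    {"glucose_high", "diabetes", "bp_high"},
    {"bp_high", "hypertension"},
    {"glucose_high", "diabetes", "bp_high", "hypertension"},
    {"cholesterol_high"},
]

def to_boolean_matrix(transactions):
    """Rows are patients, columns are items; True where a patient has the item."""
    items = sorted(set().union(*transactions))
    return items, [[item in t for item in items] for t in transactions]

def apriori(items, matrix, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    column = {item: j for j, item in enumerate(items)}

    def support(itemset):
        # One full scan of the matrix per candidate itemset.
        hits = sum(all(row[column[i]] for i in itemset) for row in matrix)
        return hits / len(matrix)

    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = {s: support(s) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        if not kept:
            break
        size = len(next(iter(kept))) + 1
        # Join frequent itemsets into larger candidates, then prune any candidate
        # with an infrequent subset (the Apriori property).
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [c for c in joined
                 if all(frozenset(sub) in kept for sub in combinations(c, size - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Return a list of (lhs, rhs, confidence) for rules lhs -> rhs from frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

items, matrix = to_boolean_matrix(patients)
found = association_rules(apriori(items, matrix, min_support=0.4), min_confidence=0.9)
for lhs, rhs, conf in found:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With these made-up records the rules include glucose_high -> diabetes with confidence 1.0, while bp_high -> hypertension (confidence about 0.67) falls below the 0.9 cut-off.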

TRANSNATIONAL MEDICINE

A disease-causing pathogen knows no international boundaries: diseases and conditions travel visa-free across borders and time zones. The problem is further complicated by different lifestyles in different countries. A particular cause of cancer in one country might not be the culprit in another, but a related cause may very well be. Data mining comes to the rescue again. Sequence clustering algorithms are very useful for identifying patterns of related causes of a deadly disease. Given the geographical distances between countries, technologies such as Service Oriented Architecture (SOA) and cloud computing can be used to retrieve and query geographically dispersed data sets. With ever-increasing Internet speeds, large data sets can be transmitted quickly and intact across oceans. Virtualization technology eliminates most licensing needs and hides difficult technology from ordinary physician assistants (Jigjidsuren, 2011). A representative diagram of transnational medicine is given in figure 3 in the appendix.

LIFE EXPECTANCY CALCULATIONS

Life expectancy is a very useful metric, not only for healthcare administration but also for social applications such as insurance and Medicare. A group of researchers sought to determine the life expectancy of a sample of outpatients aged 50 and over (Jason Scott Mathias, 2013). They used predictive data mining and high-dimensional analytics. According to the authors, predictive data mining is already used by companies such as Amazon and Google to recommend products to their customers, and its applications in healthcare include improving cancer and infectious disease treatments.

The study had around 7,500 subjects: patients over 50 with at least one visit to a large medical facility in 2003. A total of 980 health attributes were extracted from their electronic health records and run through complex statistical techniques, including predictive data mining. The attributes included demographics, known diseases, hospital visits, patient vital signs, medications and healthcare utilization. Using Correlation-based Feature Selection (CFS), all attributes were tested for correlation with each other and with a dichotomous variable representing death within five years. The number of patients who died within five years was recorded. Using rotation forest ensembles combined with alternating decision trees, the researchers were able to develop an index that identifies a group of high-risk patients with a life expectancy of less than five years.
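The CFS step can be illustrated with the merit score from Hall's correlation-based feature selection, which rewards attributes that correlate with the outcome and penalizes attributes that correlate with each other: Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the number of attributes in the subset, r_cf their mean absolute correlation with the outcome and r_ff the mean absolute correlation between pairs of them. The sketch below only scores subsets (the full method also searches over them), uses Pearson correlation (the point-biserial correlation when the outcome is coded 0/1), and runs on invented data for six patients; it is not the study's code.

```python
import math
import statistics
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; with a 0/1 y this is the point-biserial correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(features, outcome, subset):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for the named attributes."""
    k = len(subset)
    r_cf = statistics.mean(abs(pearson(features[f], outcome)) for f in subset)
    r_ff = 0.0
    if k > 1:
        r_ff = statistics.mean(abs(pearson(features[a], features[b]))
                               for a, b in combinations(subset, 2))
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Invented data for six patients; died_within_5y is the dichotomous outcome.
died_within_5y = [0, 0, 0, 1, 1, 1]
features = {
    "age": [55, 60, 62, 78, 82, 85],
    "age_in_months": [12 * a for a in [55, 60, 62, 78, 82, 85]],  # redundant copy of age
}

single = cfs_merit(features, died_within_5y, ["age"])
pair = cfs_merit(features, died_within_5y, ["age", "age_in_months"])
print(round(single, 3), round(pair, 3))
```

Adding the redundant `age_in_months` attribute leaves the merit essentially unchanged, because the two attributes are perfectly correlated; this is how CFS steers selection away from redundant attributes when there are many candidates, such as the study's 980.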

The research has great ramifications, since patients who are more likely to survive longer could be given preference for diagnostic treatments and organ transplants.

LEGAL ASPECTS OF DATA MINING

Data mining of health care related databases has two broad-based uses in the

legal world. The first being its use in non-healthcare legal matters where data mining can

be used as credible evidence while testifying. It is to be noted that Federal Rule of

Evidence 404(b) makes no provision for treating prior acts found by humans any

differently than prior acts found by computer using data mining. Thus, a plaintiff with a

claims related case can very well use reasonable data mining techniques to hold his stand

in a court of law.

The second legal aspect of data mining concerns the healthcare data itself. A good starting point is the June 2011 US Supreme Court ruling in Sorrell v. IMS Health Inc. The Court held that Vermont's law prohibiting pharmacies from selling prescription data to "data-mining companies" violated the Free Speech Clause of the First Amendment (Cohen, 2012). When it comes to healthcare data, HIPAA (the Health Insurance Portability and Accountability Act of 1996) plays a leading role. The Supreme Court ruling is somewhat surprising, because the federal Privacy Rule that implements HIPAA prohibits any unauthorized use or disclosure of protected health information for marketing purposes. However, laws are usually interpreted in context, and this was a marketing context, not a research context. The ruling therefore poses many challenges for data mining advocates who seek to make all healthcare research data globally accessible. Where marketing stops and beneficial research starts must be carefully determined.

Privacy laws, however, differ around the world, and what is legal in the US may be illegal in another country. Especially when implementing transnational healthcare systems, due diligence must be conducted before any significant commitment of money or time.

ETHICS
Ethics questions start where the law ends. Data mining firms might masquerade as research firms, extract large amounts of diverse data, and sell it for their own profit. The first question to ask is how useful such a mining exercise will be to society at large. Hospitals are always cash-strapped and look for ways of making money (other than over-billing insurance companies). A large hospital might well be tempted to sell its data for ‘research purposes’ on a continuing basis. Such a step might be legal in some states or countries but is still unethical. Primary care physicians have their own ethical role to play, too. Bypassing HIPAA for research-related data mining may make quick money, but it puts patient privacy and ethics at stake.

As the ‘digital divide’ narrows, data travels internationally in seconds. Most privacy laws, however, apply only within national borders. Data can easily be traded across borders, and not illegally, since laws in most countries have not yet caught up. It can also be traded very cheaply, given the income levels in developing and poor nations. Such international data mining ‘cartels’ can easily expose large populations of a region to privacy risks without their prior consent.

FINDINGS AND PROPOSED SOLUTIONS

Technology is addictive (and lucrative, too), but legal regulations must be in place to prevent misuse. Currently, there is no international legislation on healthcare data mining, and only a few advanced countries have relevant laws. Consortia of major countries, including emerging markets, must be formed to deliberate and legislate on the transnational and ethical aspects of data mining. Such laws must favor the poorer economies to prevent misuse.

Education is vital in a complex field like data mining. Many large universities have started offering courses in data mining, but much more needs to be done to reach the general public. Data mining does not have only elite applications; it could be used in everyday life in the near future.

SUMMARY AND CONCLUSION


Data mining is a new technology and is still in its infancy. Applications are few, and only a very small slice of the pie has been discovered so far. Current applications are restricted to more experimental areas, but data mining should become easier and more commonplace every day. In the near future, data mining algorithms should be able to ‘self-tune’ and help researchers, especially in healthcare, to eliminate deadly diseases such as cancer. Also, most patterns derived through data mining are currently more mathematical than practical. They are virtually ‘rocket science’ to most people not trained in the underlying science.

The future should see application software add more layers of abstraction. These layers should make data mining technologies as easy to use and interpret as e-mail is today (Borgwardt, 2007).

APPENDIX (LIST OF FIGURES)
FIGURE 1

FIGURE 2

FIGURE 3

BIBLIOGRAPHY
(2012). Retrieved from The Atlantic: http://www.theatlantic.com/technology/archive/2012/04/everything-you-wanted-to-know-about-data-mining-but-were-afraid-to-ask/255388/

(2014). Retrieved from Microsoft Technet: http://technet.microsoft.com/en-us/library/ms175595.aspx

American Heart Association. (2011). Cost to treat heart disease in United States will triple by 2030. Retrieved from www.sciencedaily.com/releases/2011/01/110124121545.htm

Bliss-Holtz, J. (2012). The Kids’ Inpatient Database (KID) and data mining. Informa Healthcare USA, Inc.

Borgwardt, H.-P. K. (2007). Future trends in data mining. Springer Science+Business Media.

Cohen, B. (2012). Regulating data mining post-Sorrell: Using HIPAA to restrict marketing uses of patients' private medical information. Wake Forest Law Review.

Hernandez, D. (2014). Doctors monitor patients remotely via smartphones and fitness trackers. Retrieved from
http://www.pbs.org/newshour/updates/doctors-monitor-patients-vitals-via-smartphones-fitness-trackers

Hian, C. K. (n.d.). Data mining applications in healthcare. Retrieved from Journal of Healthcare Information
Management: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.3184&rep=rep1&type=pdf

Hill, K. (2012). How Target figured out a teen girl was pregnant before her father did. Retrieved from http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/

Huang, Y. C. (2013). Mining association rules between abnormal health examination results and outpatient medical
records. Health Information Management Journal.

Mathias, J. S., et al. (2013). Development of a 5 year life expectancy index in older adults using predictive mining of electronic health record data. Journal of the American Medical Informatics Association.

Jigjidsuren, C.-P. S. (2011). A Data-Mining Framework for Transnational Healthcare System. Journal of Medical
Systems.

Johnson, S. (2014). (K. K, Interviewer)

Koh, H. C., & Tan, G. (n.d.). US National Library of Medicine, National Institutes of Health. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15869215

Mathers, C. D., & Loncar, D. (2009). Projections of global mortality and burden of disease from 2002 to 2030.

National Institutes of Health. (2014). Retrieved from https://www.nhlbi.nih.gov/health/health-topics/topics/clinicaltrials/

Hachesu, P. R., et al. (2013). Cardiac diseases prediction and rule extract with data mining - Classification techniques. HealthMed.

Wikipedia. (2014). Retrieved from http://en.wikipedia.org/wiki/Data_mining

World Health Organization. (2007). Retrieved from Department of Measurement and Health Information Systems:
World Health Statistics.

Cao, X., et al. (2008). Data mining of cancer vaccine trials: a bird's-eye view. Immunome Research.

Yang, X. W. (2007). Top 10 algorithms in data mining. Retrieved from http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf
