ANALYSIS AND IDENTIFICATION OF CANCEROUS FACTORS USING RULE BASED CLASSIFIER OF DATA MINING TECHNIQUES


PALLAVI MIRAJKAR (1), G. PRASANNA LAKSHMI (2)
(1) Research Scholar, Faculty of Computer Science, Pacific Academy of Higher Education and Research University, Udaipur.
(2) Guide, (WOS-A) Andhra University
E-mail: (1) pallavi_jamdar@rediffmail.com, (2) prasannalakshmigandi@gmail.com

Abstract - Data mining techniques are widely used in medical decision support systems to predict and diagnose various diseases with good accuracy. Predicting cancer at an early stage is a crucial task. Studies have reported that patients affected by cancer often consume food that contains cancer-causing substances. Processed meat, processed sugar, pastry, a poor diet, and a low intake of fish and vegetables can all stimulate the appearance of a dangerous disease like cancer. The proposed study centers on the application of data mining techniques, using a rule-based algorithm, to predict cancer at an early stage. The aim of this paper is to alert the user early, which saves time and reduces the cost of treatment.

Keywords - Cancer, Data Mining, Rule-based algorithm, Decision Tree

I. INTRODUCTION

The current trend of eating fast food makes it harder to cope with cancer, and perhaps with cancer treatment too. Food and drink are among the vital causes of cancer. Salt and salt-preserved food can also cause cancer. Because cancer is a deadly disease, it is necessary to understand how diet influences the risk of developing it. Nowadays life has become so busy that people turn to ready meals. Carcinogenic foods include microwave popcorn, processed meat, non-organic fruits, potato chips, refined sugar and so on. People differ in their capability to eliminate from their body the cancer-causing agents to which they have been exposed, or to repair DNA damage caused by such agents. It is therefore necessary to analyze the foods that cause cancer and to estimate the chance of developing cancer due to processed food.

Data mining is the process of collecting, searching and analyzing a vast amount of data to discover patterns and relationships. It lets the user analyze data from many different dimensions, categorize it, and summarize the relationships found.

The data mining framework includes:
• Association – connection of one event to another event.
• Sequence pattern – finding the patterns where one event leads to another later event.
• Classification – systematic arrangement in groups or categories according to established criteria.
• Clustering – finding and visually documenting groups of facts not previously known. [1]

Data mining techniques are used to develop a system to predict cancer at an early stage. A rule-based data mining classifier makes use of a set of IF-THEN rules for classification. A rule can be expressed in the format: IF <condition> THEN <conclusion>. Here, we propose a new algorithm to classify and predict cancer at an early stage. Early prediction requires an accurate and reliable diagnosis procedure that doctors can use to distinguish the disease. A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label.

II. RELATED WORK

Dr. P. Indra Muthu Meena and Dr. Vani Perumal (2016) proposed that stomach cancer can be predicted using the C4.5 and Naïve Bayes algorithms. They also analyzed the performance of these algorithms in terms of prediction accuracy. Dr. Vani Perumal et al. (2016) used training data and real-time data to predict stomach cancer with the Naïve Bayes algorithm and concluded that it is the most suitable technique for predicting stomach cancer. Dr. T. Christopher and J. Jamera banu (2016) applied Bayesian network and Naïve Bayes algorithms to a lung cancer data set in the WEKA tool. They analyzed lung cancer prediction using classification algorithms such as Naïve Bayes, Bayesian network and J48. The authors provided a performance analysis of the classification algorithms and earlier notification to the user. Satyam Shukla et al. (2016) compared different studies that use data mining techniques to predict cancer. They found that data mining techniques can be used to construct different prediction models that are useful for diagnosing disease at an early stage.

Tanvi Sharma and Anand Sharma (2016) focused on different data mining classification techniques by

Proceedings of WRFER International Conference, 16th April 2017, Pune, India



using the WEKA tool and RapidMiner to analyze a public health care dataset. The best technique for a particular data set is chosen on the basis of highest accuracy. They analyzed the performance of data mining classification techniques for health care systems.

Neelam Singh and Santosh Kumar Singh Bhadauria (2016) introduced a cancer prediction system using data mining. They proposed an approach for extracting significant patterns from a data warehouse for efficient prediction of cancer. They implemented the proposed method in Java and reported that it can efficiently and successfully predict the risk level of cancer. B. Muthazhagan et al. (2016) surveyed recent research on early prediction of lung cancer using data mining and image processing. They reviewed various data mining techniques such as classification, clustering and prediction, and reported that such systems provide highly accurate cancer predictions.

Kumar Anita (2015) studied cancer prediction using four data mining classification algorithms: Naïve Bayes, Logistic Model Tree, Random Forest, and Classification and Regression Tree. The results showed that the Random Forest classification method performs better than the others. Peter Adebayo Idowu et al. (2015) used data mining techniques to predict breast cancer. To understand the risk factors of breast cancer they studied a number of case studies. They compared two methods on the data, Naïve Bayes and the J48 decision tree. The results showed that the J48 decision tree is the best model for predicting the risk of breast cancer.

Tasnuba Jesmin et al. (2013) collected data from 150 people, preprocessed it, and then clustered the relevant and non-relevant data for brain cancer using the K-means algorithm. They developed a tool for brain cancer detection using data mining techniques, which saves time and reduces cost.

Er. Tapas Ranjan Baitharu et al. (2015) analyzed data classification accuracy by comparing different data mining classification techniques on lung cancer data. They compared the predictive performance of popular classifiers quantitatively. The authors also performed computer simulation on a lung cancer dataset. Jaimini Majali et al. (2015) used data mining techniques for the detection of cancer. They used a decision tree algorithm to predict cancer at an early stage and applied the frequent pattern growth algorithm for cancer detection. They observed that the proposed work would give high accuracy in predicting cancer. Kawsar Ahmed et al. (2013) compared different supervised learning algorithms to find the best classifier. They collected data from 200 people and preprocessed it. After preprocessing, they applied clustering using the K-means algorithm to separate the data relevant to skin cancer. Finally, they implemented a prediction system for skin cancer using Lotus Notes. Shweta Kharya (2012) reviewed current research that uses data mining techniques to improve breast cancer prognosis and diagnosis. Ada et al. (2013) proposed a segmentation method that covers chest position, size and the hidden portion of the lung area. They used data mining techniques such as feature extraction and classification to detect lung cancer. Ronak Sumbaly et al. (2014) proposed a data model using the decision tree technique to predict breast cancer at an early stage. They also discussed different data mining approaches for the prediction of breast cancer.

III. PROPOSED WORK

R. Agrawal [1] introduced the association rule learning technique of data mining. Association rule analysis is a technique to uncover how items are associated with each other. The algorithm is implemented using a rule set of the form:

IF A & B THEN C

where A and B form the conjunction of conditions and C is the prediction class. There is no limit on the number of conditions in the conjunction, but there is a constraint on the number of predicted dimensions. Association rules are constructed by searching the data for frequent if/then patterns and identifying the most important relationships using the criteria support and confidence.

Cancer risk factors and their domains:

(Table No.1: Risk Factors)
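The IF A & B THEN C rule form and the support and confidence criteria described in this section can be sketched in Python. This is a minimal illustration and not the authors' implementation: the `Rule` class, the function names and the sample records are hypothetical, and support and confidence follow the standard association-rule definitions, which the paper is assumed to use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF every item in `conditions` is present THEN predict `conclusion`."""
    conditions: frozenset  # conjunction of conditions (A & B ...)
    conclusion: str        # prediction class (C)


def support(rule, records):
    """Fraction of all records that contain the conditions and the conclusion."""
    if not records:
        return 0.0
    items = rule.conditions | {rule.conclusion}
    return sum(items <= record for record in records) / len(records)


def confidence(rule, records):
    """Among records that satisfy the conditions, the fraction that also
    contain the conclusion."""
    covered = [record for record in records if rule.conditions <= record]
    if not covered:
        return 0.0
    return sum(rule.conclusion in record for record in covered) / len(covered)


# Hypothetical records: each is the set of items observed for one person.
records = [
    frozenset({"processed_meat", "refined_sugar", "high_risk"}),
    frozenset({"processed_meat", "refined_sugar", "high_risk"}),
    frozenset({"processed_meat", "refined_sugar"}),
    frozenset({"vegetables", "fish"}),
]
rule = Rule(frozenset({"processed_meat", "refined_sugar"}), "high_risk")

print(support(rule, records))     # 2 of 4 records match the whole rule
print(confidence(rule, records))  # 2 of the 3 records matching the conditions
```

A rule with high support but low confidence fires often and is often wrong, so both criteria are usually thresholded together when selecting rules.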


Attribute and Score Values:

(Table No.2: Score Values)

Decision Tree: A decision tree can easily be transformed into a set of rules by following each path from the root node to a leaf node, one path at a time.

Proposed algorithm steps are as given below:

Step 1: Take input information from users.

Step 2: Initialize weights with some appropriate values.

Step 3: Compute the Risk Prediction by using the rule set (Table No. 2)

Step 4: Compute the association rule using the form IF 'A1 … An' THEN R. Here 'A1 … An' is a conjunction of conditions, each of which may be satisfied or unsatisfied, and R is the set of predicted dimensions. The rule-based classification technique yields four outcomes, TP, TN, FP, and FN, as defined below:
TP: the patient has cancer and cancer is predicted (correct prediction).
FP: the patient does not have cancer but cancer is predicted (incorrect prediction).
TN: the patient does not have cancer and no cancer is predicted (correct prediction).
FN: the patient has cancer but no cancer is predicted (incorrect prediction).

Step 5: Calculate the confidence to measure the number of times the if/then statements have been found true.
Confidence =

Step 6: Determine the support to identify how many times the items appear in the database.
Support =

Step 7: Compute the value of cover to measure the number of tuples covered by the conjunction of conditions having predicted dimensions.
Cover =

Step 8: Calculate the accuracy of prediction
Accuracy =

Step 9: Highlight the result.

CONCLUSION

Cancer is one of the leading causes of death in the world; prediction of cancer at an early stage is the key to better treatment and survival of the patient. This paper evaluates the accuracy of cancer prediction based on the attributes provided in Table No. 1. Various data mining techniques can be used for cancer prediction. In this paper, the researchers analyzed cancer data using rule-based classification techniques to predict cancer and compute the accuracy of the result. The objective of this paper is to give an earlier warning to users. This will ultimately save time, reduce the cost of treatment, and increase the prospect of survival. This prediction algorithm will help patients as well as medical practitioners to predict cancer at an early stage. Further studies can be conducted on the
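The evaluation part of the steps above (the four outcomes of Step 4, the cover of Step 7 and the accuracy of Step 8) can be sketched in Python. The formulas themselves are missing from the text, so the definitions below are the commonly used ones and are an assumption, not necessarily the authors' exact formulas. FP is taken in its usual sense (no cancer, but cancer predicted) and FN as cancer present but not predicted. All function names and sample data are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for boolean labels (True means cancer)."""
    tp = fp = tn = fn = 0
    for has_cancer, flagged in zip(actual, predicted):
        if flagged and has_cancer:
            tp += 1          # cancer present and predicted
        elif flagged and not has_cancer:
            fp += 1          # cancer absent but predicted
        elif not flagged and not has_cancer:
            tn += 1          # cancer absent and not predicted
        else:
            fn += 1          # cancer present but missed
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct (assumed standard formula)."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def cover(conditions, records):
    """Share of records that satisfy every condition of a rule
    (assumed definition of 'cover')."""
    if not records:
        return 0.0
    return sum(conditions <= record for record in records) / len(records)


# Hypothetical ground truth and predictions for six patients.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)
print(accuracy(tp, fp, tn, fn))

# Hypothetical records for the cover computation.
records = [{"processed_meat", "salt"}, {"fish"}]
print(cover({"processed_meat"}, records))
```

Keeping the four counts separate, instead of computing accuracy directly, makes it easy to add other measures later without recounting.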
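The Decision Tree remark in this section, that a tree can be transformed into a set of rules by mapping from the root node to the leaf nodes, can be illustrated with a short Python sketch. The nested-dictionary tree layout, the attribute names and the class labels are hypothetical and chosen only for illustration.

```python
def tree_to_rules(node, path=()):
    """Yield one (conditions, label) pair per root-to-leaf path."""
    if "label" in node:
        # Leaf node: the tests collected along the path form the IF part.
        yield list(path), node["label"]
        return
    for outcome, child in node["branches"].items():
        test = (node["attribute"], outcome)
        yield from tree_to_rules(child, path + (test,))


# Hypothetical tree: internal nodes test an attribute, leaves hold a class label.
tree = {
    "attribute": "processed_meat",
    "branches": {
        "high": {
            "attribute": "vegetables",
            "branches": {
                "low": {"label": "high_risk"},
                "high": {"label": "medium_risk"},
            },
        },
        "low": {"label": "low_risk"},
    },
}

rules = list(tree_to_rules(tree))
for conditions, label in rules:
    clause = " & ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"IF {clause} THEN {label}")
```

Each leaf produces exactly one rule, so the resulting rules are mutually exclusive and together cover every input the tree can classify.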


prediction of cancer for a particular organ of the body using classification techniques.

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In the Proc. of the ACM SIGMOD Int’l Conf. on Management of Data (ACM SIGMOD ‘93), Washington, USA, May 1993.
[2] https://en.wikipedia.org/wiki/Data_mining.
[3] Dr. P. Indra Muthu Meena, Dr. Vani Perumal “Performance of C4.5 and Naïve Bayes Algorithm to Predict Stomach Cancer - An analysis” International Journal of Advanced Research in Computer and Communication Engineering ISO 3297:2007 Certified Vol. 5, Issue 11, November 2016.
[4] B. Muthazhagan, T. Ravi “an early diagnosis of lung cancer disease using data mining and medical image processing methods: A survey” Middle-East Journal of Scientific Research 24(10): 3263-3267, 2016.
[5] Neelam Singh and Santosh Kumar Singh Bhadauria “Early Detection of Cancer Using Data Mining” International Journal of Applied Mathematical Sciences ISSN 0973-0176 Volume 9, Number 1 (2016), pp. 47-52.
[6] Dr. Vani Perumal, Shibu Samuel, Dr. P. Indra Muthu Meena “Application of Training Dataset using Naïve Bayes Classifier for Prediction of Stomach Cancer in Female Population” International Journal of Scientific Engineering and Technology Research, ISSN 2319-8885 Vol.05, Issue.45 November-2016.
[7] Dr. T. Christopher, J. Jamera banu “Study of Classification Algorithm for Lung Cancer Prediction” IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 3 Issue 2, February 2016. ISSN 2348 – 7968.
[8] Kawsar Ahmed, Tasnuba Jesmin, Md. Zamilur Rahman “Early Prevention and Detection of Skin Cancer Risk using Data Mining” International Journal of Computer Applications (0975 – 8887) Volume 62– No.4, January 2013.
[9] V.Krishnaiah “Diagnosis of Lung Cancer Prediction System Using Data Mining Classification Techniques” International Journal of Computer Science and Information Technologies, Vol. 4 (1) 2013, 39 – 45 www.ijcsit.Com ISSN: 0975-9646.
[10] Cancer Prevention and Control Retrieved from http://www.cdc.gov/cancer/dcpc/resources/features/worldcancerday/ Retrieved on: 15 November 2013.
[11] Shweta Kharya “Using Data Mining Techniques for Diagnosis and Prognosis of Cancer Disease” International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.2, April 2012.
[12] http://naturalon.com/10-of-the-most-cancer-causing-foods/
[13] Peter Adebayo Idowu, Kehinde Oladipo Williams, Jeremiah Ademola Balogun and Adeniran Ishola Oluwaranti “Breast Cancer Risk Prediction Using Data Mining Classification Techniques”, Transactions on Networks and Communications, Volume 3 No 2, April (2015); pp: 1-11.
[14] Tasnuba Jesmin, Kawsar Ahmed, Md. Zamilur Rahman, Md. Badrul Alam Miah “Brain Cancer Risk Prediction Tool Using Data Mining” International Journal of Computer Applications (0975 – 8887) Volume 61– No.12, January 2013.
[15] Er. Tapas Ranjan Baitharu, Dr.Subhendu Kumar Pani “A Comparative Study of Data Mining Classification Techniques using Lung Cancer Data” International Journal of Computer Trends and Technology (IJCTT) – volume 22 Number 2– April 2015.
[16] Tanupriya Choudhury, Prof.Dr. Vivek Kumar, Dr. Darshika Nigam “Intelligent Classification & Clustering Of Lung & Oral Cancer through Decision Tree & Genetic Algorithm” International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 12, December 2015 ISSN: 2277 128X.
[17] Jaimini Majali, Rishikesh Niranjan, Vinamra Phatak, Omkar Tadakhe “Data Mining Techniques For Diagnosis And Prognosis Of Cancer” International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 3, March 2015.
[18] Ada, Rajneet Kaur “A Study of Detection of Lung Cancer Using Data Mining Classification Techniques”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 3, March 2013 ISSN: 2277 128X.
[19] Ronak Sumbaly, N. Vishnusri, S. Jeyalatha “Diagnosis of Breast Cancer using Decision Tree Data Mining Technique” International Journal of Computer Applications (0975 – 8887) Volume 98– No.10, July 2014.
[20] Kumar Anita “A Study on Cancer Perpetuation Using the Classification Algorithms” International Journal of Advance Research in Computer and Communication, 2015.
[21] Williams, Kehinde, et al. "Breast cancer risk prediction using data mining classification techniques." Transactions on Networks and Communications 3.2, 2015.


