Project Synopsis On Breast Cancer Detection Using Data Mining
Group Members
Team Leader - Rishabh Kumar
Contact - 8756916748
E-mail - rishabhkumar7847@gmail.com
Team Member 1 - Rohit Kumar
Team Member 2 - Navneet Prasad
Team Member 3 - Manjeet Singh
Introduction
This is a research project on the detection of breast cancer. Breast cancer is
one of the most common cancers among women worldwide and a leading cause of
cancer-related deaths, making it a significant public health problem in today's
society. It ranks among the most common cancers, along with lung and bronchus
cancer, prostate cancer, colon cancer, and pancreatic cancer, which makes it a
research topic of great value.
Risk factors for breast cancer
Risk factors for breast cancer include age, gender, genes, drinking alcohol,
having dense breast tissue, giving birth at an older age, and never having been
pregnant.
The Dataset
The machine learning algorithms were trained to detect breast cancer using the
Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset consists of
features which were computed from a digitized image of a fine needle aspirate
(FNA) of a breast mass.
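As a minimal sketch of what the data looks like, the snippet below loads the copy of the WDBC data that ships with scikit-learn. Using scikit-learn and its `load_breast_cancer` loader is an assumption made for illustration; the synopsis itself only names Python with Jupyter Notebook.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn is installed; it bundles a copy of the WDBC data.
data = load_breast_cancer()
X, y = data.data, data.target  # feature matrix and class labels

# Number of samples and number of features computed from the FNA images.
print("Shape of feature matrix:", X.shape)

# Class names; label 0 is the first name, label 1 the second.
print("Classes:", list(data.target_names))

# How many samples fall in each class.
for label, name in enumerate(data.target_names):
    print(name, int((y == label).sum()))
```

Each row describes one fine needle aspirate sample, and each column is one feature computed from its digitized image.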
Objective
In this project we aim to improve on the results that different experts have
previously obtained on the WDBC data. We focus on increasing classification
accuracy on this dataset so that the model can predict more reliably.
Methodology
Data mining techniques are used in this project. Data mining, also known as
Knowledge Discovery in Databases (KDD), refers to the broad process of
finding knowledge in data, and emphasizes the "high-level" application of
particular data mining methods. The overall aim of the KDD process is to extract
knowledge from large data repositories. It applies data mining methods to
extract information according to specified conditions, using a database along
with any required preprocessing, sampling, and transformations of that database.
Data Selection - Data relevant to the analysis task are retrieved from the
database.
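The selection step can be sketched as follows. Keeping only the "mean" measurements is a hypothetical choice made purely for illustration, not the feature set this project commits to, and the scikit-learn loader is likewise an assumption.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: the scikit-learn copy of WDBC stands in for "the database".
data = load_breast_cancer()

# Hypothetical selection rule: keep only features whose name starts with "mean ".
mask = np.array([name.startswith("mean ") for name in data.feature_names])
X_selected = data.data[:, mask]
selected_names = list(data.feature_names[mask])

print("Selected", X_selected.shape[1], "of", data.data.shape[1], "features")
print(selected_names)
```

Any other rule for picking relevant columns or rows can be used in its place by building a different boolean mask.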
Classification is the process of finding a model that describes the data classes.
The purpose is to use this model to predict the class of objects whose class
label is unknown. The model is derived from the analysis of sets of training
data and can be presented in the following forms:
Decision Trees
Mathematical Formulae
Neural Networks
Classification - It predicts the class of objects whose class label is unknown. Its
purpose is to find a derived model that describes and distinguishes data classes
or concepts. The derived model is based on the analysis of a set of training data,
i.e. data objects whose class labels are known.
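The idea can be illustrated with a short sketch: a decision tree is fitted on labelled training data and then predicts labels for held-out samples. The choice of scikit-learn, the 75/25 split, the tree depth, and the random seed are all assumptions for illustration, not the project's final configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the scikit-learn copy of the WDBC data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples; their labels are hidden from the model in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Derive the model from training data whose class labels are known.
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)

# Predict the class of objects the model has not seen, then compare to the truth.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Swapping `DecisionTreeClassifier` for another scikit-learn estimator, such as a neural network classifier, leaves the rest of the sketch unchanged, which makes it easy to compare model forms on the same split.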
Hardware Components:
1. Processor - Core i3
2. Hard Disk - 200 GB
3. Memory - 2 GB
4. Monitor
Software Components:
1. Windows 7 or Higher
2. Python with Jupyter Notebook
3. Anaconda distribution