Machine Learning for Breast Cancer Diagnosis: A Proof of Concept
P. K. SHARMA
Email: from_pramod@yahoo.com
Introduction
Machine learning is a branch of data science that incorporates a large set of statistical techniques.
These techniques enable data scientists to create models that learn from past data and detect
patterns in massive, noisy, and complex data sets.
Researchers use machine learning for cancer prediction and prognosis.
Machine learning supports inferences and decisions that could not otherwise be made with conventional
statistical methods.
With a robustly validated machine learning model, the chances of a correct diagnosis improve.
It is especially helpful in interpreting results for borderline cases.
Breast Cancer: An overview
Data File Name   Description File Name   # of Records   # of Attributes
The data files are in CSV format without column headers. Column meanings are taken from the associated "names"
files.
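Because the files carry no header row, column names have to be supplied when the data is read. The sketch below uses pandas `read_csv` with `header=None` and `names=...`. The column names are hypothetical labels typed in by hand. They assume, as in the UCI "names" file, that fields 1 and 2 of wdbc.data are an ID and the diagnosis, followed by 30 features grouped as 10 means, 10 standard errors, and 10 "worst" values.

```python
import io

import pandas as pd

# Hypothetical column names for wdbc.data, written by hand from the "names" file:
# ID, diagnosis, then 10 features x 3 statistics (mean, SE, worst) = 30 columns.
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
]
STATS = ["mean", "se", "worst"]
# Statistic-major order: all means first (fields 3-12), then SEs, then worst values.
WDBC_COLUMNS = ["id", "diagnosis"] + [f"{f}_{s}" for s in STATS for f in FEATURES]


def load_headerless_csv(source, column_names):
    """Read a CSV file (path or file-like object) that has no header row
    and attach the given column names."""
    return pd.read_csv(source, header=None, names=column_names)


# Tiny in-memory row (placeholder ID, diagnosis, 30 placeholder values) just to
# show the call; with the real file, pass "wdbc.data" instead of the StringIO.
example = load_headerless_csv(
    io.StringIO("12345,M," + ",".join(["1.0"] * 30)), WDBC_COLUMNS
)
print(example.iloc[:, :5])
```

Passing explicit names also stops pandas from consuming the first data row as a header, which would otherwise silently drop one record.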
Flow of Data: Biopsy Procedure → Measurements → Reports → Evaluation → Diagnosis
Data files: wdbc.data, breast-cancer-wisconsin.data
Modeling and visualization: RandomForestClassifier, StratifiedKFold, GridSearchCV, pyplot
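One way these scikit-learn components can fit together is sketched below: a RandomForestClassifier whose hyperparameters are tuned by GridSearchCV over StratifiedKFold cross-validation. This is an illustrative sketch, not the author's exact setup. The 20% test split, the 5 folds, the random seeds, and the small parameter grid are all assumptions. It uses scikit-learn's bundled copy of the WDBC data (`load_breast_cancer`) so that it runs without wdbc.data on disk.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# scikit-learn ships a copy of the WDBC data (569 records, 30 features).
# Note: its target encodes 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: hold out 20% of the records as a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5 stratified folds keep the benign/malignant ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Hypothetical, deliberately small hyperparameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # refits the best model on all training data

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.4f}")
```

Tuning only on the training folds and scoring once on the held-out test set keeps the reported accuracy from being inflated by the hyperparameter search.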
Fields 3-32: Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.
All feature values are recoded with four significant digits.
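The sketch below shows how the per-image statistics and the field numbering relate. It assumes that features appear in the a) to j) order within each block of 10 fields. It also assumes that the standard error is the sample standard deviation divided by the square root of the number of nuclei; the description does not say which estimator was used. The function names are hypothetical.

```python
import numpy as np

FEATURES = ["radius", "texture", "perimeter", "area", "smoothness",
            "compactness", "concavity", "concave points", "symmetry",
            "fractal dimension"]
STATS = ["mean", "SE", "worst"]


def field_number(feature, stat):
    """1-based field number, assuming fields 3-12 hold the means,
    13-22 the standard errors and 23-32 the worst values, each block
    listing the features in the a)-j) order."""
    return 3 + 10 * STATS.index(stat) + FEATURES.index(feature)


def compactness(perimeter, area):
    """Compactness as defined in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


def summarize(values):
    """Mean, standard error and "worst" (mean of the three largest values)
    of one feature measured over all nuclei in an image (needs >= 3 values)."""
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    se = v.std(ddof=1) / np.sqrt(v.size)  # assumption: sample std / sqrt(n)
    worst = np.sort(v)[-3:].mean()
    return mean, se, worst


print(field_number("radius", "mean"), field_number("radius", "SE"),
      field_number("radius", "worst"))
print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For a perfect circle, perimeter^2 / area equals 4π, so compactness is 4π - 1 and grows as the contour becomes more irregular.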
wdbc.data: Mean Smoothness, Mean Texture, Mean Fractal Dimension, Mean Symmetry, and Mean Compactness do not appear to influence classification. Both types of cases (benign and malignant) are spread across the range of each of these features.
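A per-class histogram is one way to make this kind of visual check with pyplot. The sketch below overlays benign and malignant distributions for the five features named above. It uses scikit-learn's bundled copy of the WDBC data instead of reading wdbc.data. The bin count and figure size are arbitrary choices, and the output file name in the comment is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# Bundled WDBC data as a DataFrame; target 0 = malignant, 1 = benign.
data = load_breast_cancer(as_frame=True)
df = data.frame

features = ["mean smoothness", "mean texture", "mean fractal dimension",
            "mean symmetry", "mean compactness"]

fig, axes = plt.subplots(1, len(features), figsize=(4 * len(features), 3))
for ax, name in zip(axes, features):
    # Overlay the two classes; heavily overlapping histograms suggest the
    # feature on its own separates benign from malignant cases poorly.
    for label, class_name in enumerate(data.target_names):
        ax.hist(df.loc[df["target"] == label, name], bins=30, alpha=0.5,
                label=class_name)
    ax.set_title(name)
axes[0].legend()
fig.tight_layout()
# fig.savefig("feature_histograms.png")  # hypothetical output file
```

A visual overlap like this is only a first screen. A feature that looks uninformative alone can still help a random forest in combination with others.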
Data Description: breast-cancer-wisconsin.data
5. Marginal Adhesion 1 - 10
7. Bare Nuclei 1 - 10
8. Bland Chromatin 1 - 10
9. Normal Nucleoli 1 - 10
10. Mitoses 1 - 10
11. Class (2 for benign, 4 for malignant)
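Many scikit-learn workflows expect 0/1 labels, so the 2/4 class codes are often recoded before training. The sketch below assumes a recoding to 0 = benign and 1 = malignant, which makes malignant the positive class. The column name `class` and the helper `recode_class` are hypothetical. Any code other than 2 or 4 raises an error instead of silently becoming a missing label.

```python
import pandas as pd

# Class codes from the description: 2 = benign, 4 = malignant.
# Assumption: recode to 0 = benign, 1 = malignant (malignant is positive).
CLASS_MAP = {2: 0, 4: 1}


def recode_class(df, column="class"):
    """Return a copy of df with the class column mapped to 0/1.

    Raises ValueError if any value is not 2 or 4.
    """
    out = df.copy()
    out[column] = out[column].map(CLASS_MAP)  # unknown codes become NaN
    if out[column].isna().any():
        raise ValueError("unexpected class code; expected only 2 or 4")
    out[column] = out[column].astype(int)
    return out


# Toy rows for illustration only.
example = recode_class(pd.DataFrame({"mitoses": [1, 3, 1], "class": [2, 4, 2]}))
print(example)
```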
breast-cancer-wisconsin.data
Classification Report:
Class   Precision   Recall   F1-score   Support
0       0.96        0.97     0.97       71

Confusion Matrix (excerpt):
True Malignant   3   40
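Reports like this one are usually produced with scikit-learn's `classification_report` and `confusion_matrix`. The sketch below is a hypothetical helper run on toy labels, not the deck's data. It assumes 0 = benign and 1 = malignant. In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import classification_report, confusion_matrix


def evaluate(y_true, y_pred, labels=(0, 1)):
    """Print per-class precision, recall, F1 and support, then the confusion
    matrix (rows = true class, columns = predicted class); return the matrix."""
    labels = list(labels)
    print(classification_report(y_true, y_pred, labels=labels))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(cm)
    return cm


# Toy labels for illustration only (assumption: 0 = benign, 1 = malignant).
cm = evaluate([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0])
```

The "Support" column is the number of true instances of each class in the evaluated set, so it sums to the test-set size.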
breast-cancer-wisconsin.data
The training data is divided into 5 folds.
The test data has 140 records.
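The sketch below checks how a hold-out split and 5 folds produce these sizes. It assumes the full file has 699 records (the count in the UCI distribution of breast-cancer-wisconsin.data) and that 20% was held out. Under that assumption, scikit-learn's `train_test_split` rounds the test size up to 140. Placeholder features and alternating placeholder labels stand in for the real data because only the row counts matter here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def split_sizes(n_records, test_size=0.2, n_splits=5, seed=0):
    """Return (test-set size, validation-fold sizes) for a stratified hold-out
    split followed by stratified k-fold on the remaining training records."""
    X = np.zeros((n_records, 1))  # placeholder features
    y = np.arange(n_records) % 2  # placeholder labels, only used for stratifying
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = [len(val_idx) for _, val_idx in skf.split(X_train, y_train)]
    return len(X_test), folds


# Assumption: 699 records in total and a 20% test split.
test_n, fold_sizes = split_sizes(699)
print(test_n, fold_sizes)
```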
Accuracy Score: 0.9643
• High accuracy.
• Supports the diagnosis.
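Accuracy is the fraction of test records classified correctly. With 140 test records, a score of 0.9643 corresponds to 135 correct predictions and 5 misclassifications. This is the only whole count that rounds to 0.9643, since 134/140 ≈ 0.9571 and 136/140 ≈ 0.9714:

```latex
\text{Accuracy} = \frac{\text{correct predictions}}{\text{test records}}
               = \frac{135}{140} \approx 0.9643
```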
Classification Report:
Class   Precision   Recall   F1-score   Support
0       0.98        0.97     0.97       95