WBCD Slide
Purpose: To build the best predictive model for accurately diagnosing breast cancer from a biopsy.
Classify each patient as having a malignant or benign case of breast cancer.
30 data points are derived from each fine needle aspirate (FNA) specimen.
Source: University of Wisconsin Medical School, Dr. Wolberg, Professor of Oncology.
Dataset: 569 patients (212 malignant, 357 benign); 10 captured variables + 20 derived data points.
Variables: Diagnosis; Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry, Fractal dimension
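A minimal sketch of how the 30 data points could be laid out, assuming (as in the commonly distributed version of this dataset) that each of the ten measured variables is summarized by a mean, a standard error and a "worst" value. The slide itself only says "10 captured variables + 20 derived data points", so the statistic names and the column naming scheme below are illustrative assumptions, not the presenters' actual layout.

```python
# Ten measured variables, as listed on the slide.
FEATURES = [
    "Radius", "Texture", "Perimeter", "Area", "Smoothness",
    "Compactness", "Concavity", "Concave points", "Symmetry",
    "Fractal dimension",
]

# Assumed summary statistics per variable (not stated on the slide).
STATISTICS = ["mean", "se", "worst"]


def build_columns(features=FEATURES, statistics=STATISTICS):
    """Return one hypothetical column name per (statistic, variable) pair."""
    return [
        f"{stat}_{feat.lower().replace(' ', '_')}"
        for stat in statistics
        for feat in features
    ]


columns = build_columns()
print(len(columns))   # number of predictor columns
print(columns[:3])    # first few hypothetical names
```

Under this assumption, 10 variables times 3 statistics gives the 30 predictors, with Diagnosis kept separately as the outcome.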
FNA Specimen
No missing values are present.
Some variables are highly correlated.
Data can be grouped into size, shape and texture features.
Scatter plots and box plots of the variables show clear separation of Y (diagnosis) for concavity, texture and radius.
Random sampling: 50% training (TR), 30% validation (VA), 20% test (TE).
Cutoff = 0.5, with success = Malignant.
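The sampling scheme and cutoff can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, the rounding of partition sizes, and the choice to treat a probability exactly equal to the cutoff as Malignant are all hypothetical, since the slide does not specify them.

```python
import random


def split_indices(n, fractions=(0.5, 0.3, 0.2), seed=0):
    """Shuffle row indices 0..n-1 and cut them into TR, VA and TE lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


def classify(prob_malignant, cutoff=0.5):
    """Label a predicted probability; 'success' is the Malignant class."""
    return "Malignant" if prob_malignant >= cutoff else "Benign"


train, valid, test = split_indices(569)
print(len(train), len(valid), len(test))
print(classify(0.72), classify(0.31))
```

Lowering the cutoff below 0.5 would flag more cases as Malignant, trading extra false positives for fewer missed cancers; the slide uses 0.5.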
Best predictors: Worst Radius, Worst Texture, Worst Concavity
OR
Principal components (1 & 2)
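As a rough illustration of what "principal components 1 & 2" means, the sketch below standardizes a small table, finds the leading direction of its covariance matrix by power iteration, and then obtains the second direction by deflation. The five rows are made-up numbers, not values from the dataset, and power iteration is used here only to keep the example dependency-free; the slide does not say how the components were computed.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_direction(matrix, iterations=500):
    """Power iteration: unit eigenvector for the largest eigenvalue."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def deflate(matrix, v):
    """Remove the component along unit vector v from the matrix."""
    d = len(matrix)
    lam = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return [[matrix[i][j] - lam * v[i] * v[j] for j in range(d)]
            for i in range(d)]


# Made-up rows (radius, perimeter, texture), for illustration only.
rows = [[12.0, 78.0, 15.0], [14.0, 91.0, 19.0], [18.0, 120.0, 22.0],
        [20.0, 133.0, 17.0], [11.0, 71.0, 21.0]]

z = standardize(rows)
cov = covariance(z)
pc1 = leading_direction(cov)
pc2 = leading_direction(deflate(cov, pc1))

# Scores: each row projected onto the two component directions.
scores = [(sum(a * b for a, b in zip(r, pc1)),
           sum(a * b for a, b in zip(r, pc2))) for r in z]
print(scores[0])
```

Because highly correlated variables such as radius and perimeter load on the same component, two component scores can stand in for many of the original predictors.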
Pruned Tree
Data mining is useful for analyzing FNA specimens for cancer.
The procedures described here provide high accuracy and rival standard methods of diagnosis.