
December 7, 2010

Purpose: To build the best predictive model that accurately diagnoses a biopsy for breast cancer
Classify the patient as having a malignant or benign case of breast cancer
30 data points derived from a fine needle aspirate (FNA) specimen

University of Wisconsin Medical School
Dr. Wolberg, Professor of Oncology
Dataset: 569 patients, 212 malignant, 357 benign
10 captured variables + 20 derived data points

Diagnosis, Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry, Fractal dimension

FNA Specimen

No missing values are present
Some variables are highly correlated
Data can be grouped based on size, shape and texture features
Scatter plots and box plots of variables show clear separation of Y (diagnosis) for concavity, texture and radius
Random sampling: 50% training (TR), 30% validation (VA), 20% test (TE)
Cutoff = 0.5, success = Malignant
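The partitioning and cutoff rule above can be sketched roughly as follows. This is only an illustration: the function names `partition` and `classify`, the fixed seed, and the choice to treat a probability exactly equal to the cutoff as Malignant are assumptions, not details taken from the slides.

```python
import random

def partition(records, fractions=(0.5, 0.3, 0.2), seed=0):
    """Shuffle records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for repeatability
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains, roughly 20%
    return train, valid, test

def classify(prob_malignant, cutoff=0.5):
    """Label a case Malignant when its predicted probability reaches the cutoff.

    Treating the boundary value itself as Malignant is an assumption.
    """
    return "Malignant" if prob_malignant >= cutoff else "Benign"

# 569 patient indices stand in for the real records.
train, valid, test = partition(range(569))
print(len(train), len(valid), len(test))
print(classify(0.73), classify(0.12))
```

Holding the validation and test partitions out of model fitting is what makes the reported test error an estimate of performance on unseen patients.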

Box plot of concavity for different Y (diagnosis)



Reduce variables: 30 > 2 or 3 principal components (PC1 & PC2)
No correlation between the principal components
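A minimal sketch of the idea behind this reduction, using only the Python standard library. The slides do not say how the components were computed; power iteration with deflation on standardized variables is an assumption chosen for readability, and the six rows of measurements are made-up illustrative values, not records from the dataset.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and renormalize a start vector."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def principal_components(rows, k=2):
    """Return the first k component directions and the projected scores."""
    z = standardize(rows)
    cov = covariance(z)
    p = len(cov)
    components = []
    for _ in range(k):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(p) for j in range(p))
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in z]
    return components, scores

# Hypothetical (radius, perimeter, texture) measurements for six specimens.
rows = [
    [12.0, 78.0, 15.0],
    [14.5, 94.0, 21.0],
    [20.1, 132.0, 18.0],
    [11.2, 71.0, 24.0],
    [17.9, 118.0, 27.0],
    [13.4, 86.0, 12.0],
]
components, scores = principal_components(rows, k=2)
cross_product = sum(s[0] * s[1] for s in scores)
print(abs(cross_product) < 1e-6)  # PC1 and PC2 scores are uncorrelated
```

Radius and perimeter are nearly collinear in the toy data, so most of their shared variation ends up in PC1, which is the same effect the slides rely on when collapsing 30 correlated variables.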

K-Nearest Neighbor
Logistic Regression
Classification Tree
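Of the three methods listed, k-nearest neighbor is the simplest to show in a few lines. The sketch below is an illustration only: the value k = 3, the two features, and all data values are hypothetical, and the slides do not report which k or which software was used.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (worst radius, worst concavity) pairs, not real patient data.
train_points = [
    (12.0, 0.10), (13.1, 0.15), (11.5, 0.08),
    (22.4, 0.60), (25.0, 0.71), (20.3, 0.55),
]
train_labels = ["Benign", "Benign", "Benign",
                "Malignant", "Malignant", "Malignant"]

print(knn_predict(train_points, train_labels, (23.0, 0.65)))
print(knn_predict(train_points, train_labels, (12.5, 0.12)))
```

In practice the features would normally be rescaled first, because Euclidean distance is otherwise dominated by whichever variable has the largest numeric range (radius in this toy example).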


Best Predictors: Worst Radius, Worst Texture, Worst Concavity
OR
Principal components (1 & 2)

% Error Test 1.75

% Error Test 3.79


Pruned Tree

% Error Test 7.89
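A test error percentage of this kind is the share of test records whose predicted class differs from the actual diagnosis. The sketch below shows the arithmetic. The slides do not state the size of the test partition; 114 records (about 20% of 569) is an assumption, under which 9 misclassified cases would give the pruned-tree figure. The function name `percent_error` is hypothetical.

```python
def percent_error(actual, predicted):
    """Percentage of records where the predicted class differs from the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return 100.0 * wrong / len(actual)

# Assumed test partition of 114 records; labels are placeholders.
actual = ["Benign"] * 114
predicted = list(actual)
for i in range(9):  # flip 9 predictions to simulate misclassifications
    predicted[i] = "Malignant"

print(round(percent_error(actual, predicted), 2))
```

Under the same assumption, 2 misclassified records out of 114 would correspond to an error of about 1.75%.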


Data mining is useful for analyzing FNA specimens for cancer
The procedures mentioned here provide high accuracy and rival normal methods of diagnosis

