ML Classification
CLASSIFICATION BOOTCAMP CHEATSHEET
1. CONFUSION MATRIX / CLASSIFICATION REPORT
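[Figure: confusion matrix. Columns: true class (+ / –). Cells shown: true positives (TP) and false positives (FP). A false positive is a Type I error; a false negative is a Type II error.]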
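The confusion-matrix cells and the two error types can be counted directly from true and predicted labels. The following is a minimal sketch in plain Python; the function names (`confusion_counts`, `precision_recall`) and the label lists are made up for illustration and are not part of the original cheatsheet.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive:
            fp += 1  # Type I error: predicted positive, actually negative
        elif actual == positive:
            fn += 1  # Type II error: predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Two of the metrics that a classification report usually lists."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In sklearn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` provide the same information without hand-written counting.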
2.A Concept
The dataset is typically split into 75% for training and 25% for testing.
Training set: used to fit the model.
Testing set: used to evaluate the trained model. Make sure the model has never seen the testing data during training.
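A 75/25 split can be sketched as a shuffle followed by a cut. The example below uses only the standard library; `split_train_test`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices, then hold out `test_fraction` of the rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 20 single-feature samples with alternating labels.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Because every index lands in exactly one of the two lists, no testing row can leak into training. With sklearn installed, `train_test_split(X, y, test_size=0.25)` from `sklearn.model_selection` performs the equivalent split.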
3. LOGISTIC REGRESSION
[Figure: logistic regression model. x-axis: hours of studying; y-axis: pass/fail. Outputs above the 0.5 threshold are assigned to Class 1, outputs below it to Class 0.]
Step #1: Start with a linear equation: y = b0 + b1 * x
Step #2: Apply the sigmoid function to get a probability: P(x) = 1 / (1 + e^(-y))
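The two steps can be sketched in a few lines of Python, followed by the 0.5 threshold that separates Class 1 from Class 0. The coefficient names `b0` and `b1` and their values are hypothetical; in practice they are learned from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_pass(hours, b0, b1, threshold=0.5):
    """Step 1: linear score. Step 2: sigmoid. Then apply the threshold."""
    z = b0 + b1 * hours           # Step 1: linear equation
    probability = sigmoid(z)      # Step 2: probability of Class 1 (pass)
    predicted_class = int(probability >= threshold)
    return probability, predicted_class


# Hypothetical coefficients, chosen so that 4 hours sits exactly on the boundary.
b0, b1 = -4.0, 1.0

p_low, c_low = predict_pass(2.0, b0, b1)
p_mid, c_mid = predict_pass(4.0, b0, b1)
p_high, c_high = predict_pass(6.0, b0, b1)
print(p_low, c_low)
print(p_mid, c_mid)
print(p_high, c_high)
```

The probability crosses 0.5 exactly where the linear score is zero, so the threshold on the probability corresponds to a cutoff on hours of studying.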
4. K-NEAREST NEIGHBORS
[Figure: k-nearest neighbors scatter plot. x-axis: weight (kg), 55 to 70; y-axis: height (cm), 150 to 180. Classes: size small (S) and size large (L). The new point is classified as blue (S size).]
4.B K-nearest neighbors in sklearn
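# Import the k-NN classifier from sklearn.neighbors
from sklearn.neighbors import KNeighborsClassifier
# k = 3 neighbors; Minkowski distance with p = 2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
classifier.fit(X_train, y_train)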
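To show what a 3-neighbor classifier with Euclidean distance does when it predicts, here is a from-scratch sketch using only the standard library. It illustrates the idea and is not sklearn's implementation; `knn_predict` and the (weight, height) points are made up for this example.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(X_train, y_train)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up (weight in kg, height in cm) samples labeled with a T-shirt size.
X_train = [(55, 152), (58, 158), (60, 160), (66, 172), (68, 176), (70, 180)]
y_train = ["S", "S", "S", "L", "L", "L"]

small_guess = knn_predict(X_train, y_train, (59, 157))
large_guess = knn_predict(X_train, y_train, (67, 175))
print(small_guess, large_guess)
```

An odd `k` avoids tied votes when there are only two classes.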
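[Figure: decision tree over age and savings. Decision nodes: savings > $1M (yes/no) and age 45 years (yes/no). Leaves: Class #1 and Class #0. Accompanying plot marks the 45-year and $1 million savings boundaries.]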
7. NAIVE BAYES
[Figure: scatter plot with the support vectors marked. Axes: savings and age.]