
Artificial Intelligence

Chapter: Classification
Marouane Ben Haj Ayech
Outline
• Presentation
• KNN
• Learning process
• Prediction process
• Evaluation

Presentation
• Prediction
  • Prediction task: Classification
  • Description: Assigning data points to predefined categories or classes based on their features, typically used for supervised learning.
  • Output nature: Discrete categories or labels
  • Examples:
    - Email spam classification (spam or not spam).
    - Image classification (cat, dog, car, etc.).

• Learning
  • Learning type: Supervised
  • Dataset type: Labeled
  • Prediction tasks: Classification
  • Learning models:
    - K-Nearest Neighbors (KNN)
    - Naïve Bayes
    - Decision Tree
    - Logistic Regression
    - Neural Network

Presentation
Classification problem
• Input: x = house = (surface, nb rooms)
• Output: y = class label ∈ {0 = 'cheap', 1 = 'expensive'}

[Figure: scatter plots of houses, with axes nb rooms and surface. Learning process: the labeled training dataset (class cheap, class expensive) is used to build the model. Prediction process: the model predicts the class of a new house.]

K Nearest Neighbors (KNN)
• Technique: KNN, a non-parametric technique
• Learning process
  - KNN doesn't learn a model
  - Model: the training dataset
• Prediction process
  - Given a new data point x, KNN computes the distances between x and all training data points
  - The points are sorted based on their distances (ascending sort)
  - x takes the dominant class label of the set of K nearest neighbors
• Hyperparameters
  - K: the number of nearest neighbors
  - Distance metric (Euclidean, …)
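
The prediction steps of KNN can be sketched in a few lines of Python. This is only a minimal illustration, assuming the Euclidean distance and a simple majority vote; the function name `knn_predict` and the house values are hypothetical and are not taken from the slides.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train_points, train_labels, x, k=3):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Step 1: compute the distance between x and every training point.
    distances = [(dist(p, x), label) for p, label in zip(train_points, train_labels)]
    # Step 2: sort by distance (ascending) and keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: x takes the dominant class label among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training dataset: (surface, nb rooms) for each house.
train_points = [(50, 2), (60, 3), (70, 3), (150, 5), (180, 6), (200, 7)]
train_labels = ["cheap", "cheap", "cheap", "expensive", "expensive", "expensive"]

# There is no learning step: the "model" is simply the stored training dataset.
print(knn_predict(train_points, train_labels, (65, 3), k=3))  # prints: cheap
```

Since the distance mixes surface and number of rooms, features with very different scales are usually normalized before applying KNN; the sketch skips that step for brevity.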

KNN
• Learning process

[Figure: the labeled training dataset (class cheap, class expensive), plotted with axes nb rooms and surface, is kept as-is as the model of the classifier.]
KNN
• Prediction process (K=3)

[Figure: prediction for a new house with K=3, axes nb rooms and surface. The new house is placed among the training points of the model (class cheap, class expensive), its K nearest neighbors are selected, and the majority class among them is cheap, so the new house is classified as cheap.]
Evaluation
• The evaluation is performed using a labeled test dataset.
• The most commonly used metrics to evaluate the model (classifier) performance are:
• Confusion matrix
• Accuracy score
• Recall
• Precision

• In binary classification, we first have to define:
• the "negative" class
• the "positive" class

• In our example of houses:
• "cheap" is the "negative" class
• "expensive" is the "positive" class
Evaluation
• Confusion matrix:
• It is a matrix composed of:
• True Negatives (TN): The number of "cheap" houses correctly classified as "cheap"
• False Positives (FP): The number of "cheap" houses incorrectly classified as "expensive"
• False Negatives (FN): The number of "expensive" houses incorrectly classified as "cheap"
• True Positives (TP): The number of "expensive" houses correctly classified as "expensive"

                       Predicted cheap (N)    Predicted expensive (P)   Total
Actual cheap (N)       TN                     FP                        TN + FP (Actual Negatives)
Actual expensive (P)   FN                     TP                        FN + TP (Actual Positives)
Total                  TN + FN                FP + TP
                       (Predicted Negatives)  (Predicted Positives)
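
As an illustration, the four cells of the confusion matrix can be counted by comparing actual and predicted labels one by one. The sketch below assumes "expensive" is the positive class, as in the houses example; the function name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive="expensive"):
    """Return (TN, FP, FN, TP) for a binary classification problem."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive correctly classified as positive
        elif a == positive:
            fn += 1  # positive incorrectly classified as negative
        elif p == positive:
            fp += 1  # negative incorrectly classified as positive
        else:
            tn += 1  # negative correctly classified as negative
    return tn, fp, fn, tp


# Hypothetical test labels and classifier outputs.
actual = ["cheap", "cheap", "cheap", "expensive", "expensive", "expensive"]
predicted = ["cheap", "expensive", "cheap", "expensive", "cheap", "expensive"]

print(confusion_counts(actual, predicted))  # prints: (2, 1, 1, 2)
```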
Evaluation
• Accuracy score:
• It measures how often the classifier predicts the correct class, for both "cheap" and "expensive" houses.
• Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example

                       Predicted cheap (N)    Predicted expensive (P)   Total
Actual cheap (N)       8                      2                         TN + FP = 10 (Actual Negatives)
Actual expensive (P)   4                      6                         FN + TP = 10 (Actual Positives)
Total                  TN + FN = 12           FP + TP = 8
                       (Predicted Negatives)  (Predicted Positives)

Accuracy = (8 + 6) / (2 + 8 + 4 + 6) = 14 / 20 = 70%
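
The accuracy computation of the example can be checked with a short script. The slides list recall and precision as metrics without giving their formulas, so the sketch below assumes the standard definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP); the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute accuracy, recall and precision from the confusion matrix counts."""
    return {
        # Share of all houses that were classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Share of actual positives ("expensive") that were found (standard definition).
        "recall": tp / (tp + fn),
        # Share of predicted positives that are actually positive (standard definition).
        "precision": tp / (tp + fp),
    }


# Counts from the example: TN = 8, FP = 2, FN = 4, TP = 6.
metrics = classification_metrics(tn=8, fp=2, fn=4, tp=6)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
# prints:
# accuracy = 0.70
# recall = 0.60
# precision = 0.75
```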
