Supervised Learning: SVM & DT
By Ahlem Marzouk
Definition
Supervised learning trains a model on labeled examples (input-output pairs) so that it can predict the output for new, unseen inputs.
Data
The dataset is divided into a training set (at most 80% of the data) and a testing set.
[Diagram: during training, the algorithm takes X Train as input and Y Train as output; after training, it takes X Test as input and produces a prediction Y, which an evaluation metric compares against Y Test.]
Supervised Learning
Supervised learning divides into classification (e.g. KNN, SVM, Decision Tree) and regression (e.g. Linear Regression).
Classification
Classification is the process of classifying input data into distinct classes. It entails training a model on a labeled dataset to learn the link between inputs and outputs, allowing it to make predictions about new, unknown data.
This process is used to detect specific types of new observations, such as determining whether an email is spam, whether traffic behavior is normal or abnormal, whether activity is an attack or normal, and so on.
Evaluation Metrics
❑ To evaluate the performance of our model, we compare predicted results with actual labels.
❑ Commonly used measures for classification are accuracy, precision and recall.
❑ A confusion matrix provides an easy summary of the prediction results in a classification problem: correct and incorrect predictions are counted and broken down by class.
Evaluation Metrics
Accuracy: the percentage of instances correctly predicted by the model, i.e. (TP + TN) / (TP + TN + FP + FN) in the binary case.
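To illustrate these metrics, here is a minimal sketch assuming scikit-learn is available; the label vectors are made up purely for the example:

# Minimal sketch: classification metrics with scikit-learn.
# The label vectors below are made up purely for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (e.g. 1 = spam, 0 = not spam)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # labels predicted by the model

print("Accuracy :", accuracy_score(y_true, y_pred))    # correct / total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print(confusion_matrix(y_true, y_pred))                # counts per class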
Example:
[Worked examples shown as figures on the slides.]
KNN [K-Nearest Neighbor]
KNN [K-Nearest Neighbor]
Concept
Points that are close to each other are called "neighbors". KNN classifies a new point by the majority class among its K nearest neighbors in the training set.
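To make the majority-vote idea concrete, here is a minimal from-scratch sketch; the tiny dataset and the helper name knn_predict are invented for illustration:

# Minimal from-scratch KNN sketch: classify a point by the majority
# class among its K nearest training points (Euclidean distance).
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)     # distance to every training point
    nearest = np.argsort(distances)[:k]                     # indices of the K closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.0, 4.0])))  # -> 1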
KNN [K-Nearest Neighbor]
[Figures: step-by-step illustration of classifying a new point with KNN.]
KNN [K-Nearest Neighbor]
Choice of the number of neighbors K:
To avoid classification ties, it is advised that K be an odd integer, and cross-validation techniques can assist you in determining the best K for your data set.
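A sketch of this tuning with scikit-learn; the dataset and the grid of odd K values are illustrative choices:

# Sketch: picking K by cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},  # odd values to avoid ties
    cv=5,                                             # 5-fold cross-validation
)
grid.fit(X, y)
print("Best K:", grid.best_params_["n_neighbors"])
print("Cross-validated accuracy:", grid.best_score_)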
KNN [K-Nearest Neighbor]
Advantages of KNN:
KNN [K-Nearest Neighbor]
Disadvantages of KNN:
❑ KNN does not build a model to make a prediction. The cost is that it must keep all observations in memory in order to make its predictions, so the size of the training set must be chosen carefully.
❑ The choice of the distance metric and of the number of neighbors K may not be obvious. We need to try several combinations and tune the algorithm to get a satisfactory result.
Linear SVM
Linear SVM
Core principle of the algorithm:
In the SVM algorithm we define:
o The points of each class closest to the separating line, called support vectors.
o The distance between the support vectors and the hyperplane, called the margin.
The objective of SVM is to find a separator that maximizes this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
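In standard notation (the well-known hard-margin formulation, not taken verbatim from the slides), this objective can be written as:

% Hard-margin SVM, standard formulation.
% The hyperplane is w . x + b = 0 and the margin width is 2 / ||w||.
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i \, (w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n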
Linear SVM
Mathematical Approach [the two-dimensional linear case]:
Linear SVM
Mathematical Approach [the two-dimensional linear case]:
We consider the vectors w = (a, -1) and X = (x, y); the separating line y = ax + b can then be written as the equation we will use to define the two classes:
w · X + b = ax - y + b = 0
Linear SVM
Mathematical Approach [the two-dimensional linear case]:
The next step is to decide whether a new point belongs to one of the two classes by evaluating the equation at the point's coordinates: a point on the positive side of the hyperplane (w · X + b >= 0) is classed +1, and a point on the negative side (w · X + b < 0) is classed -1.
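As a tiny sketch of this decision rule (the coefficients a and b are made up, giving the line y = x):

# Sketch: classify a point by the sign of w . X + b = a*x - y + b.
def classify(point, a=1.0, b=0.0):
    x, y = point
    return +1 if a * x - y + b >= 0 else -1

print(classify((2.0, 1.0)))  # w.X + b = 1  -> +1
print(classify((1.0, 3.0)))  # w.X + b = -2 -> -1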
Non-Linear SVM
In this case, it is impossible to draw a single straight line or hyperplane that classifies the points correctly. SVM handles this by mapping the data into a higher-dimensional space in which a linear separator exists (the kernel trick).
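A minimal sketch with scikit-learn; the moons dataset is an illustrative choice of data that no straight line separates:

# Sketch: a non-linear SVM using the RBF kernel.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
clf = SVC(kernel="rbf")   # kernel trick: implicit higher-dimensional mapping
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))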
Decision Tree [DT]
A decision tree contains nodes that can be divided into three categories:
o The root node: the node where the tree starts; it has no parent.
o Internal nodes (or decision nodes): nodes that have descendants (or children), which are themselves nodes.
o Terminal nodes (or leaf nodes): nodes with no children, each carrying a final decision.
Decision Tree [DT]
[Figure: a tree whose root node branches (Yes/No) into inner and end nodes; inner nodes split again until every path terminates in an end node.]
Decision Tree [DT]
Example:
Does the applicant have a permanent job?
o No: No credit.
o Yes: Does the candidate have a house?
  o Yes: Continue the discussion.
  o No: No credit.
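The same tree can be written as nested conditions; a sketch with invented attribute names:

# The example tree above as nested if/else conditions.
def credit_decision(has_permanent_job, has_house):
    if not has_permanent_job:
        return "No credit."
    if has_house:
        return "Continue the discussion."
    return "No credit."

print(credit_decision(True, False))   # -> No credit.
print(credit_decision(True, True))    # -> Continue the discussion.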
Decision Tree [DT]
The purpose of a decision tree is to progressively purify the data; it employs a "divide and conquer" technique.
It starts with all the data and then asks a series of questions (splits) to categorize the data points. Each new node added to the tree should enhance the purity of the resulting groups as much as possible. This procedure continues until most data points are categorized and we have a clear answer.
Decision Tree [DT]
The most popular splitting criteria for decision tree models are information gain and Gini impurity. They help evaluate the quality of each test condition and how well it will classify samples into a class.
Both have the same goal, splitting the data into more homogeneous subsets, but they use different approaches to measure impurity.
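In scikit-learn the splitting criterion is simply a constructor parameter; a sketch (the dataset is an illustrative choice):

# Sketch: choosing the splitting criterion in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree_gini = DecisionTreeClassifier(criterion="gini").fit(X, y)
tree_ig = DecisionTreeClassifier(criterion="entropy").fit(X, y)  # information gain
print(tree_gini.score(X, y), tree_ig.score(X, y))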
Decision Tree [DT]
Information gain is the reduction in entropy after splitting on a certain feature. It prefers splits whose resulting subsets are as pure as possible.
Entropy: evaluates the impurity in the dataset. If the entropy decreases, the data is becoming purer.
A Gini impurity of zero implies that all the items in the dataset belong to the same class, whereas values near the maximum (1 - 1/k for k classes) show that the items are uniformly distributed across all categories.
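A sketch of these measures computed from class counts (log base 2 for entropy; the function names are invented):

# Sketch: entropy, Gini impurity, and information gain from class counts.
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def information_gain(parent_counts, children_counts):
    # Parent entropy minus the weighted entropy of the children.
    total = sum(parent_counts)
    weighted = sum(sum(ch) / total * entropy(ch) for ch in children_counts)
    return entropy(parent_counts) - weighted

print(entropy([4, 4]))                             # 1.0: maximally impure 50/50 node
print(gini([8, 0]))                                # 0.0: pure node
print(information_gain([4, 4], [[4, 0], [0, 4]]))  # 1.0: a perfect split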
Decision Tree [DT]
Limitations:
1. Unstable: Small changes in data might result in significantly distinct tree structures,
affecting predictions.
2. Overfitting: Decision trees can become very complicated, fitting the training data
effectively but failing to generalize to new data.
3. Biased towards features with lots of splits: Features with a higher number of
category levels can influence decisions more than other pertinent features.
Decision Tree [DT]
Advantages:
1. Easy interpretation for simple trees.
2. Robust against outliers and missing data: Decision trees don't require a lot of data
preparation to handle outliers and missing values.
3. Nonparametric.
4. Flexible: They can deal with categorical and numerical data, which allows them to adapt to different kinds of problems.
5. Fast Training.
Decision Tree [DT]
Example
[Figure: the set S with 8 examples; the root node holds 4 Yes and 4 No (50% / 50%). The split is based on Age, producing an internal node and terminal nodes.]
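Using the entropy sketch from earlier on this root node:

print(entropy([4, 4]))  # -> 1.0: a 50/50 root node is maximally impure;
                        # a good Age split should reduce this value.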