
Pattern Recognition (60014703-3)

Lecture 3 part 2

Classifiers
(Support Vector Machines, Decision Trees,
Nearest Neighbor Classification)

Instructor: Amany Al Luhaybi


Review of Concepts

n-fold cross-validation
• The available data is partitioned into n equal-size disjoint subsets.
• Use each subset as the test set and combine the remaining n-1 subsets as the training set to learn a classifier.
• 10-fold and 5-fold cross-validation are commonly used.
• This method is used when the available data is not large.
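A minimal sketch of the procedure, assuming scikit-learn is available; the iris dataset and a linear SVC are stand-ins for "the available data" and "a classifier":

```python
# n-fold cross-validation: each subset serves once as the test set,
# and the remaining n-1 subsets form the training set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # stand-in dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    clf = SVC(kernel="linear")               # learn a classifier on n-1 subsets
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # evaluate on the held-out subset

print(f"10-fold accuracy: {np.mean(scores):.3f}")
```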
Outlier data points

Support Vector Machine (SVM)

Main Ideas

• If we have a new observation near the threshold, it will be classified as obese.

Main Ideas
• We can do better in choosing the threshold, such that it lies as far as possible from the nearest observation of each class:

Main Ideas

• Therefore, the new observation will be classified as not obese.
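A minimal sketch of this thresholding idea in one dimension, using made-up mass values: the improved threshold sits halfway between the closest observations of the two classes, so a new point just below it is classified as not obese.

```python
import numpy as np

not_obese = np.array([52.0, 58.0, 61.0])   # hypothetical masses
obese = np.array([74.0, 79.0, 85.0])       # hypothetical masses

# Place the threshold halfway between the closest observations of the
# two classes -- the 1-D version of a maximum-margin classifier.
threshold = (not_obese.max() + obese.min()) / 2   # 67.5

new_mass = 66.0
print("obese" if new_mass > threshold else "not obese")   # -> not obese
```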


Main Ideas

Q: how do we know which soft margin is better?
A: compare the candidate soft margins with cross-validation and keep the one that classifies held-out data best.
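A minimal sketch of that answer, assuming scikit-learn: in SVC the softness of the margin is controlled by the penalty parameter C, and cross-validation compares candidate values (the blob data here is made up).

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Overlapping blobs force some misclassifications inside the margin.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

# Smaller C = softer margin (more violations allowed); larger C = harder.
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)                 # 5-fold cross-validation
search.fit(X, y)
print("best soft margin (C):", search.best_params_["C"])
```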

Main Ideas

• Example: each observation x gets a second coordinate y = x²; if x = 0.5 => y = 0.25
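A minimal sketch of this step with made-up 1-D values: giving each point the extra coordinate y = x² puts the data on a parabola, where a straight line can cut the middle class off from the outer one.

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0, 2.5, 4.0, 4.5])   # hypothetical 1-D data
labels = np.array([-1, -1, +1, +1, -1, -1])     # middle class sits in between

lifted = np.column_stack([x, x ** 2])           # add the y = x^2 coordinate
print(lifted[0])                                # [0.5  0.25], as on the slide
```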


Main Ideas behind Support Vector Machine

Mathematics behind Support Vector Machine

Tennis example

[Figure: training examples plotted on Temperature and Humidity axes; legend: play tennis vs. do not play tennis]
Linear Support Vector Machines

Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {-1, +1}

[Figure: the training points plotted in the (x1, x2) plane, one marker per class (+1 and -1)]
Linear SVM

Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {-1, +1}

[Figure: a separating hyperplane f(x), with the f(x) = +1 region on one side and the f(x) = -1 region on the other]

All hyperplanes in R^d are parameterized by a vector w and a constant b.
They can be expressed as w•x + b = 0 (remember the equation of a hyperplane from algebra!).
Our aim is to find a hyperplane that correctly classifies our data.
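A minimal sketch of how such a hyperplane classifies, with made-up w and b: the sign of w•x + b says which side of the hyperplane x lies on.

```python
import numpy as np

w = np.array([0.4, -0.7])   # hypothetical normal vector of the hyperplane
b = 0.1                     # hypothetical offset

def classify(x):
    """Return +1 or -1 according to the side of w.x + b = 0 that x falls on."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([2.0, 1.0])))   # +1: this point is on the positive side
```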
Key Points: Linear SVM Mathematically
• Goal: 1) Correctly classify all training data:
      w•xi + b ≥ +1 if yi = +1
      w•xi + b ≤ -1 if yi = -1
  which is the same as yi(w•xi + b) ≥ 1 for all i
  2) Maximize the margin M = 2/||w||, which is the same as minimizing (1/2)||w||²

• We can formulate a quadratic programming optimization problem:
      minimize (1/2)||w||² subject to yi(w•xi + b) ≥ 1 for all i
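A minimal sketch of that quadratic program, assuming the cvxpy solver library and four made-up, linearly separable points:

```python
import cvxpy as cp
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])  # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# minimize (1/2)||w||^2  subject to  y_i (w.x_i + b) >= 1 for all i
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print("w =", w.value, "b =", b.value,
      "margin =", 2 / np.linalg.norm(w.value))
```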

Constrained Optimization Problem: the dual problem

Characteristics:
• Many of the αi are zero
• w is a linear combination of a small number of data points: w = Σi αi yi xi
• xi with non-zero αi are called support vectors (SV)
• The decision boundary is determined only by the SVs
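A minimal sketch of these characteristics, assuming scikit-learn: after fitting, dual_coef_ stores αi·yi for the support vectors only, and w can be rebuilt from those few points alone (the blob data is made up).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)   # toy data
clf = SVC(kernel="linear").fit(X, y)

# Only the support vectors carry non-zero alpha_i ...
print("support vectors:", len(clf.support_vectors_), "of", len(X), "points")

# ... and w is a linear combination of just those points: w = sum(alpha_i y_i x_i)
w = clf.dual_coef_ @ clf.support_vectors_
print("w rebuilt from SVs equals coef_:", np.allclose(w, clf.coef_))
```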
Problems with linear SVM

[Figure: two classes (+1 and -1) arranged so that no straight line can separate them]

What if the decision function is not linear?


Non-linear SVMs: Feature spaces
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)
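A minimal sketch of one such map, using the classic degree-2 example as an assumption (the slide's Φ is generic): points that a circle separates in 2-D become separable by a plane in 3-D.

```python
import numpy as np

def phi(x):
    """Map (x1, x2) to (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

inner = np.array([0.5, 0.5])    # close to the origin  (say class -1)
outer = np.array([2.0, 2.0])    # far from the origin  (say class +1)

# The first and last feature-space coordinates sum to the squared radius,
# so a plane on that sum separates inner points from outer points.
print(phi(inner), phi(outer))
```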

Examples of Kernel Functions
• Polynomial kernel with degree d: K(x, z) = (x•z + 1)^d

• Radial basis function kernel with width σ: K(x, z) = exp(-||x - z||² / (2σ²))

  – Its feature space is infinite-dimensional
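A minimal sketch of both kernels in NumPy, checking the kernel trick for the homogeneous degree-2 polynomial case: K(x, z) equals the inner product of the explicit degree-2 features without ever computing them.

```python
import numpy as np

def poly_kernel(x, z, d=2):
    return np.dot(x, z) ** d                 # homogeneous polynomial kernel

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def phi2(x):                                 # explicit degree-2 feature map
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(np.isclose(poly_kernel(x, z), np.dot(phi2(x), phi2(z))))  # True
print(rbf_kernel(x, z))   # the RBF kernel's implicit phi is infinite-dimensional
```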
