
Pattern Recognition (60014703-3)

Lecture 3 part 2

Classifiers
(Support Vector Machines, Decision Trees,
Nearest Neighbor Classification)

Instructor: Amany Al Luhaybi


Review of Concepts

n-fold cross-validation
• The available data is partitioned into n equal-size disjoint subsets.
• Use each subset as the test set and combine the remaining n-1 subsets as the training set to learn a classifier.
• 10-fold and 5-fold cross-validation are commonly used.
• This method is used when the available data is not large.
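A minimal sketch of the procedure, assuming scikit-learn is available; the iris dataset and a linear SVC are stand-ins for "the available data" and "a classifier":

```python
# n-fold cross-validation: each subset serves once as the test set,
# and the remaining n-1 subsets form the training set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # stand-in dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    clf = SVC(kernel="linear")               # learn a classifier on n-1 subsets
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # evaluate on the held-out subset

print(f"10-fold accuracy: {np.mean(scores):.3f}")
```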
Outlier data points

Support Vector Machine (SVM)

Main Ideas

• If we have a new observation near the threshold, it will be classified as obese.

Main Ideas
• We can do better in choosing the threshold, such that it lies as far as possible from the nearest observation of each class:

Main Ideas

• Therefore, the new observation will be classified as not obese.
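A minimal sketch of this thresholding idea in one dimension, using made-up mass values: the improved threshold sits halfway between the closest observations of the two classes, so a new point just below it is classified as not obese.

```python
import numpy as np

not_obese = np.array([52.0, 58.0, 61.0])   # hypothetical masses
obese = np.array([74.0, 79.0, 85.0])       # hypothetical masses

# Place the threshold halfway between the closest observations of the
# two classes -- the 1-D version of a maximum-margin classifier.
threshold = (not_obese.max() + obese.min()) / 2   # 67.5

new_mass = 66.0
print("obese" if new_mass > threshold else "not obese")   # -> not obese
```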


Main Ideas

Q: how do we know which soft margin is better?
A: compare the candidate soft margins with cross-validation and keep the one that classifies held-out data best.
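A minimal sketch of that answer, assuming scikit-learn: in SVC the softness of the margin is controlled by the penalty parameter C, and cross-validation compares candidate values (the blob data here is made up).

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Overlapping blobs force some misclassifications inside the margin.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

# Smaller C = softer margin (more violations allowed); larger C = harder.
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)                 # 5-fold cross-validation
search.fit(X, y)
print("best soft margin (C):", search.best_params_["C"])
```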

Main Ideas

• Example: each observation x gets a second coordinate y = x²; if x = 0.5 => y = 0.25
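A minimal sketch of this step with made-up 1-D values: giving each point the extra coordinate y = x² puts the data on a parabola, where a straight line can cut the middle class off from the outer one.

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0, 2.5, 4.0, 4.5])   # hypothetical 1-D data
labels = np.array([-1, -1, +1, +1, -1, -1])     # middle class sits in between

lifted = np.column_stack([x, x ** 2])           # add the y = x^2 coordinate
print(lifted[0])                                # [0.5  0.25], as on the slide
```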


Main Ideas behind Support Vector Machine

Mathematics behind Support Vector Machine

Tennis example

[Figure: training examples plotted on Temperature and Humidity axes; legend: play tennis vs. do not play tennis]
Linear Support Vector Machines

Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {-1, +1}

[Figure: the training points plotted in the (x1, x2) plane, one marker per class (+1 and -1)]
Linear SVM

Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {-1, +1}

[Figure: a separating hyperplane f(x), with the f(x) = +1 region on one side and the f(x) = -1 region on the other]

All hyperplanes in R^d are parameterized by a vector w and a constant b.
They can be expressed as w•x + b = 0 (remember the equation of a hyperplane from algebra!).
Our aim is to find a hyperplane that correctly classifies our data.
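A minimal sketch of how such a hyperplane classifies, with made-up w and b: the sign of w•x + b says which side of the hyperplane x lies on.

```python
import numpy as np

w = np.array([0.4, -0.7])   # hypothetical normal vector of the hyperplane
b = 0.1                     # hypothetical offset

def classify(x):
    """Return +1 or -1 according to the side of w.x + b = 0 that x falls on."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([2.0, 1.0])))   # +1: this point is on the positive side
```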
Key Points: Linear SVM Mathematically
• Goal: 1) Correctly classify all training data:
      w•xi + b ≥ +1 if yi = +1
      w•xi + b ≤ -1 if yi = -1
  which is the same as yi(w•xi + b) ≥ 1 for all i
  2) Maximize the margin M = 2/||w||, which is the same as minimizing (1/2)||w||²

• We can formulate a quadratic programming optimization problem:
      minimize (1/2)||w||² subject to yi(w•xi + b) ≥ 1 for all i
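A minimal sketch of that quadratic program, assuming the cvxpy solver library and four made-up, linearly separable points:

```python
import cvxpy as cp
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])  # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# minimize (1/2)||w||^2  subject to  y_i (w.x_i + b) >= 1 for all i
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print("w =", w.value, "b =", b.value,
      "margin =", 2 / np.linalg.norm(w.value))
```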

Constrained Optimization Problem: the dual problem

Characteristics:
• Many of the αi are zero
• w is a linear combination of a small number of data points: w = Σi αi yi xi
• xi with non-zero αi are called support vectors (SV)
• The decision boundary is determined only by the SVs
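A minimal sketch of these characteristics, assuming scikit-learn: after fitting, dual_coef_ stores αi·yi for the support vectors only, and w can be rebuilt from those few points alone (the blob data is made up).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)   # toy data
clf = SVC(kernel="linear").fit(X, y)

# Only the support vectors carry non-zero alpha_i ...
print("support vectors:", len(clf.support_vectors_), "of", len(X), "points")

# ... and w is a linear combination of just those points: w = sum(alpha_i y_i x_i)
w = clf.dual_coef_ @ clf.support_vectors_
print("w rebuilt from SVs equals coef_:", np.allclose(w, clf.coef_))
```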
Problems with linear SVM

[Figure: two classes (+1 and -1) arranged so that no straight line can separate them]

What if the decision function is not linear?


Non-linear SVMs: Feature spaces
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)
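A minimal sketch of one such map, using the classic degree-2 example as an assumption (the slide's Φ is generic): points that a circle separates in 2-D become separable by a plane in 3-D.

```python
import numpy as np

def phi(x):
    """Map (x1, x2) to (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

inner = np.array([0.5, 0.5])    # close to the origin  (say class -1)
outer = np.array([2.0, 2.0])    # far from the origin  (say class +1)

# The first and last feature-space coordinates sum to the squared radius,
# so a plane on that sum separates inner points from outer points.
print(phi(inner), phi(outer))
```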

Examples of Kernel Functions
• Polynomial kernel with degree d: K(x, z) = (x•z + 1)^d

• Radial basis function kernel with width σ: K(x, z) = exp(-||x - z||² / (2σ²))

  – Its feature space is infinite-dimensional
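A minimal sketch of both kernels in NumPy, checking the kernel trick for the homogeneous degree-2 polynomial case: K(x, z) equals the inner product of the explicit degree-2 features without ever computing them.

```python
import numpy as np

def poly_kernel(x, z, d=2):
    return np.dot(x, z) ** d                 # homogeneous polynomial kernel

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def phi2(x):                                 # explicit degree-2 feature map
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(np.isclose(poly_kernel(x, z), np.dot(phi2(x), phi2(z))))  # True
print(rbf_kernel(x, z))   # the RBF kernel's implicit phi is infinite-dimensional
```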
