Title: Implement Support Vector Machine Classifier: Department of Computer Science and Engineering


Green University of Bangladesh
Department of Computer Science and Engineering


1 Objective(s)
• To learn to implement the Support Vector Machine (SVM) algorithm in Python.
• To learn how SVM is applied to different types of data-sets.

2 Problem analysis
A support vector machine is a discriminative classifier formally defined by a separating hyper-plane. It
represents examples as points in space, mapped so that the points of different categories are separated by
as wide a gap as possible. A classification task usually involves separating data into training and testing sets.
Each instance in the training set contains one "target value" (i.e. the class label) and several "attributes" (i.e.
the features or observed variables). The goal of SVM is to produce a model (based on the training data) which
predicts the target values of the test data given only the test data attributes.

Figure 1: Linearly separable data

Linearly separable data:
We want to find the hyper-plane (i.e. decision boundary) that linearly separates our classes. The boundary has
the equation w^T x + b = 0. Anything above the decision boundary should have label 1, i.e. any x_i with
w^T x_i + b > 0 has corresponding y_i = 1. Similarly, anything below the decision boundary should have label -1,
i.e. any x_i with w^T x_i + b < 0 has corresponding y_i = -1 [Fig. 1].
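This decision rule can be sketched with a hypothetical weight vector and bias for a 2-D toy problem (in practice, w and b are learned by the SVM optimizer, not chosen by hand):

```python
import numpy as np

# Hypothetical, hand-picked parameters for illustration only;
# a trained SVM would learn these from the data.
w = np.array([1.0, -1.0])
b = -0.5

def classify(x):
    # Decision rule: the sign of w^T x + b gives the class label (+1 or -1).
    return 1 if w @ x + b > 0 else -1

print(classify(np.array([2.0, 0.0])))  # above the boundary -> 1
print(classify(np.array([0.0, 2.0])))  # below the boundary -> -1
```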

Nonlinear decision boundary:


Mapping the data vectors x_i into a higher-dimensional (even infinite-dimensional) feature space may make them
linearly separable in that space (whereas they may not be linearly separable in the original space).
Kernel Trick: Because we are working in a higher-dimensional (potentially even infinite-dimensional) space,
computing the dot products of the mapped vectors directly may be intractable. However, it turns out that there
are special kernel functions that operate on the lower-dimensional vectors x_i and x_j to produce a value
equivalent to the dot product of the higher-dimensional vectors [Fig. 2].
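A minimal sketch of this equivalence, using a degree-2 polynomial kernel K(x, z) = (x^T z)^2 on 2-D vectors. The feature map phi below is the standard explicit expansion for this kernel; the specific input vectors are illustrative:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D vector:
    # phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, z):
    # Kernel computed on the original low-dimensional vectors only.
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

# Both routes give the same value (25.0), but the kernel
# never constructs the higher-dimensional vectors.
print(poly_kernel(x, z))   # (1*3 + 2*1)^2 = 25.0
print(phi(x) @ phi(z))     # 9 + 12 + 4 = 25.0
```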
Figure 2: Non-linearly separable data

3 Algorithm
Algorithm 1: SVM Algorithm
Input: A labeled data-set D; a set C of all the class labels; and the maximum number of iterations T.
Output: The class of each instance in the unlabeled set Du
/* SVM in python */
1 Initialize data from file
2 for t = 1 to T do
3   for each c ∈ C and d ∈ D do
4     Split (dx, dy) into train, test
5   end
6   for each train set do
7     Map the data vectors
8   end
9   Find(hyperPlane)
10  for each d ∈ Du do
11    findClass(d)
12  end
13 end
4 Flowchart

5 Implementation in Python
# SVM Code
import pandas as pd

dt = pd.read_csv('iris.csv')

x = dt.iloc[:, [0, 1, 2, 3]].values
y = dt.iloc[:, -1].values

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=5)

from sklearn.svm import SVC
clf = SVC(kernel='linear')
clf.fit(x_train, y_train)

pr = clf.predict(x_test)

count = 0
match = 0
for i in range(len(pr)):
    if pr[i] == y_test[i]:
        match = match + 1
    count = count + 1

print(f"Number of Test Cases: {count}")
print(f"Number of Correctly Identified Classes: {match}")
print(f"Accuracy = {match / count * 100}%")

# Example single prediction:
# print(clf.predict([[4.9, 2.4, 3.3, 1]]))

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(y_test, pr)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot()

6 Sample Input/Output (Compilation, Debugging & Testing)


Input:
Here, the input is taken from a CSV file, iris.csv
Download link is given here: https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv.
Output:

7 Discussion & Conclusion


SVM works well even when little is known in advance about the structure of the data. It handles even
unstructured and semi-structured data such as text, images, and trees. The kernel trick is the real strength of
SVM: with a proper kernel function, many complicated non-linear problems can be solved. It also scales
reasonably well to high-dimensional data.
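As a rough illustration of the kernel's effect, the linear model from Section 5 can be compared with an RBF kernel on the same split. This sketch assumes scikit-learn's bundled iris data rather than the downloaded CSV file:

```python
# Compare a linear and an RBF kernel on the same train/test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=5)

scores = {}
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel).fit(x_train, y_train)
    scores[kernel] = clf.score(x_test, y_test)
    print(kernel, scores[kernel])
```

On a well-separated data-set like iris, both kernels typically score highly; the gap between them becomes visible on data with non-linear class boundaries.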

8 Lab Task (Please implement yourself and show the output to the
instructor)
1. Implement SVM algorithm on categorical data.

9 Lab Exercise (Submit as a report)


• Analyze a new data-set by using SVM, and explain how the percentages of test and training data-sets
affect the result.

10 Policy
Copying from the internet, classmates, seniors, or any other source is strictly prohibited. 100% of marks will be
deducted if any such copying is detected.
