Machine Learning SVM - Supervised

Support Vector Machines (SVM)

SVM stands for support vector machine. SVMs are typically used for
classification tasks, similar to what we did with K Nearest Neighbors. They
work very well for high-dimensional data and allow us to classify data
that is not linearly separable, for example a data set like the one below.

Attempting to use K Nearest Neighbors on data like this would likely give us a very
low accuracy score, which is not what we want. This is where SVMs are useful.

What Does an SVM Do?


An SVM has many applications. However, in machine learning it is
typically used for classification. It is a powerful tool and a good choice for
classifying complicated data with a large number of dimensions (features).
Note that K-Nearest Neighbors does not perform well on high-dimensional
data.

How A Support Vector Machine Works


In short, a support vector machine works by dividing data into multiple
classes using something called a hyper-plane. A hyper-plane is simply a flat
decision boundary that separates data points. In 2D space a hyper-plane is
a line, in 3D space a hyper-plane is a plane, and in any space with more than
three dimensions it is simply called a hyper-plane.
Here’s an example of a hyper-plane for the data points on the 2D graph
below.

Hyper-Planes
When we create a hyper-plane we need to do the following: we must pick
two points known as our support vectors. These must be the
two points closest to the hyper-plane, and their distances from the hyper-
plane must be identical. In the example above we can see that the two
circled points are our support vectors: their distances to the hyper-plane
are the same, and they are the closest points. With this rule we can actually
create an infinite number of hyper-planes (see below).
All of the images above are valid hyper-planes.
Picking a Hyper Plane
Once we create a hyper-plane we are going to use it to classify our data. If
a test point is on the left side of the plane we would classify it as red (in our
examples above) and if it is on the right we would classify it as green. So
how can we pick a hyper-plane that will give us the best classification
predictions?
Have a look at the hyper-planes above and determine which you think would
give the best classification for a mystery test point. What do you notice
about that hyper-plane?
Well, the best possible hyper-plane is the one in the first image on this page.
Notice that the distance between the support vectors and the hyper-plane is far
greater than for the other generated hyper-planes.
When we pick a hyper-plane we want to pick the one with the greatest
possible margin.

Margin
The margin is the distance between the hyper-plane and the closest points on
either side of it. The blue lines below show the margin for this particular data and hyper-
plane. Typically, the greater our margin, the better our classification will be.
Note: Imagine the blue lines are parallel to the black…

Kernels
So you now have a very basic understanding of how an SVM works. It seems
pretty simple in theory, but in practice we can run into a lot of issues.
Let’s say our data isn’t as pretty and we have some points that look like
this:

Can you determine which hyper-plane would be the best for this data? Even
if you could, it would make a horrible classifier. This is where we introduce
something called kernels.
Kernels provide a way for us to create a hyper-plane for data like that seen
above. We use a kernel to bring our data up to a higher dimension (in this
case from 2D to 3D). We hope that by doing this our points will end up
plotted in a way that lets us divide them using a hyper-plane.
By applying a kernel to our data above we hope to get something that looks
like the following:
You can see that we can now divide our points with a plane in 3D. By
applying the kernel our data has become separable.

What Is A Kernel?
A kernel is simply a function that takes our features as input (x1 and x2 in our
example) and returns a value that serves as the third-dimensional coordinate (x3).
An example of such a kernel could be the equation:
x3 = (x1)^2 + (x2)^2
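
To make this concrete, here is a small sketch (not part of the original example, using made-up sample points) that lifts 2D points into 3D with this function:

import numpy as np

# Hypothetical sample points (x1, x2), chosen just for illustration.
points = np.array([[0.5, -0.2], [2.0, 1.5], [-1.8, 0.4]])

# Apply x3 = (x1)^2 + (x2)^2 to get the new third coordinate.
x3 = points[:, 0] ** 2 + points[:, 1] ** 2

# Each point now has coordinates (x1, x2, x3) in 3D.
lifted = np.column_stack([points, x3])
print(lifted)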

Typically when we use a kernel we use a pre-existing one. There is much
debate about which kernel is best, but here are some examples of
popular kernels:
– Linear
– Polynomial
– Circular
– Hyperbolic Tangent (Sigmoid)

Soft & Hard Margin


The last topic to touch on is soft and hard margins. A hard margin is
precisely what you’ve learned already: no points may exist inside the
margin. However, sometimes if we have outlier points we want to allow them
to exist inside the margin, and to use points that are not the closest to the
hyper-plane as our support vectors. Doing this is called using a soft
margin.
You can see in the example above that there is a point inside the
margin. If we had not allowed this, not only would it be difficult to create a
hyper-plane, but our classification would perform poorly.
The number of points you allow to exist inside the margin is something we
can define as a hyper-parameter.

Importing Modules
Before we start we need to import a few things from sklearn.
import sklearn
from sklearn import svm
from sklearn import datasets

Loading Data
In previous tutorials we did quite a bit of work to load in our data sets from
places like the UCI Machine Learning Repository. That is a very useful skill
and something you will often have to do when applying these algorithms to
your own data. However, now that we have learned this, we will use the data
sets that come with sklearn. These are much nicer to work with and have
some nice methods that make loading data very quick.
For this tutorial we will be using a breast cancer data set. It consists of
many features describing a tumor and classifies each one as either cancerous
or non-cancerous.
To load our data we will simply do the following.

cancer = datasets.load_breast_cancer()

To see a list of the features in the data set we can do:

print("Features: ", cancer.feature_names)

Similarly for the labels.


print("Labels: ", cancer.target_names)

Splitting Data
Now that we have loaded in our data set it is time to split it into training and
testing data. We will do this as in previous tutorials; a minimal sketch of the split is shown below.
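
A minimal sketch of the split, assuming sklearn's train_test_split and a 20% test set (the exact split size is an assumption):

from sklearn.model_selection import train_test_split

x = cancer.data    # the tumor features
y = cancer.target  # the labels (see cancer.target_names for their meaning)

# Hold back 20% of the samples for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)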
If we want to have a look at our data we can print the first few instances.

print(x_train[:5], y_train[:5])

Implementing an SVM
Implementing the SVM is actually fairly easy. We can simply create a new
model and call .fit() on our training data.
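
A minimal sketch of that, assuming the default SVC parameters:

# Create an SVM classifier with default parameters and train it.
clf = svm.SVC()
clf.fit(x_train, y_train)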

To score our data we will use a useful tool from the sklearn module.
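
The tool assumed here is metrics.accuracy_score; as a sketch:

from sklearn import metrics

# Predict the test set and compare against the true labels.
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc)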

And that is all we need to do to implement our SVM. Now we can run the
program and take note of our amazing accuracy!
Wait... our accuracy is close to 60%, and that is horrible! Looks like we need
to add something else.

Adding a Kernel
The reason we received such a low accuracy score is that we forgot to add a
kernel! We need to specify which kernel to use in order to increase our
accuracy.
In machine learning, kernel methods are a class of algorithms for pattern analysis, whose
best known member is the support vector machine (SVM). The general task of pattern
analysis is to find and study general types of relations (for
example clusters, rankings, principal components, correlations, classifications) in datasets.
For many algorithms that solve these tasks, the data in raw representation have to be
explicitly transformed into feature vector representations via a user-specified feature map:
in contrast, kernel methods require only a user-specified kernel, i.e., a similarity
function over pairs of data points in raw representation.
Kernel Options:
- linear
- poly
- rbf
- sigmoid
- precomputed

We will use linear for this data-set.
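
A sketch of the same model with the linear kernel specified:

# Specify the linear kernel when creating the classifier.
clf = svm.SVC(kernel="linear")
clf.fit(x_train, y_train)
print(metrics.accuracy_score(y_test, clf.predict(x_test)))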

After running this we receive a much better accuracy of close to 90%.

If we run it again the accuracy may vary slightly….
Changing the Margin
By default our classifier uses a soft margin with a value of 1. This parameter is known
as C. Decreasing C makes the margin softer (more points are allowed inside it),
while increasing C pushes towards a hard margin; C must stay greater than 0.
Playing with this value should alter your results slightly.
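
As a sketch, C can be passed alongside the kernel (the value 2 below is just an example):

# C controls how strongly points inside the margin are penalized.
clf = svm.SVC(kernel="linear", C=2)
clf.fit(x_train, y_train)
print(metrics.accuracy_score(y_test, clf.predict(x_test)))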

If you want to play around with some other parameters have a look  here.

Comparing to KNearestNeighbors
If we want to see how this algorithm runs in comparison to KNN we can run
the KNN classifier on this data-set and compare our accuracy values.
To change to the KNN classifier is quite simple.
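
A sketch of the swap, assuming KNeighborsClassifier with an arbitrary choice of 9 neighbors:

from sklearn.neighbors import KNeighborsClassifier

# Same training and scoring steps, just a different model.
clf = KNeighborsClassifier(n_neighbors=9)
clf.fit(x_train, y_train)
print(metrics.accuracy_score(y_test, clf.predict(x_test)))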
The accuracy is a little worse, but note that KNN still does well on this data set,
hovering around the 90% mark.
HANDWRITTEN NUMBER RECOGNITION

We load the libraries we are going to need:

matplotlib.pyplot for drawing plots.

datasets to load the example data sets that ship with sklearn.

metrics to measure accuracy.

svm to load the Support Vector Machine (SVM) algorithm.

We load the “digits” dataset.
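
These steps correspond to the following lines from the full code at the end of this document:

import matplotlib.pyplot as plt
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# Load the digits dataset and look at the first image and its label.
digits = datasets.load_digits()
print(digits.images[0])
print(digits.target[0])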

The data that we are interested in is made of 8x8 images of digits, let's have a look at the first
image, stored in the `images` attribute of the dataset. If we were working from image files, we
could load them using matplotlib.pyplot.imread. Note that each image must have the same
size. For these images, we know which digit they represent: it is given in the 'target' of the
dataset.

We pair the features (data) with their labels (targets).

The total number of elements in our sample:


To apply our model, we need to reshape the image data into a (samples,
features) matrix.

We have changed each 8x8 matrix into a flat list of 64 values.
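
This is the flattening step from the full code below:

# Flatten each 8x8 image into a vector of 64 values.
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(data[0])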

We create the SVM model.

We prepare the data.

Split data into train and test subsets.

In this case we split at 50%, which seems a bit high; the test set should normally not exceed 20%.

898 samples for TRAIN and 898 samples for TEST.

Understand that each time the split is made, different elements are chosen; it is random, and
therefore the final results can change.
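
The model creation and the split, as they appear in the full code below (note that the full code passes shuffle=False, so that particular split is not shuffled):

# Create a support vector classifier and split the data 50/50.
classifier = svm.SVC(gamma=0.001)
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)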

We train the model.

We check the model.

We test the model with the TEST dataset.

We measure the accuracy of the model.

We compare the predictions with the true labels of the TEST dataset.
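
These steps, taken from the full code below:

# Train on the training half, predict the test half, and measure accuracy.
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
print("accuracy:", metrics.accuracy_score(y_test, predicted))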

VISUALIZATION.

CHANGING THE MODEL

1. SVC algorithm, gamma=0.001

2. SVC algorithm, kernel="linear"

3. SVC algorithm, kernel="poly"

4. SVC algorithm, kernel="sigmoid"

5. SVC algorithm, kernel="rbf"

6. SVC algorithm, default parameters.

PREDICTING A SINGLE ELEMENT.

We are going to predict which number corresponds to a given image.

We take image number 750 from our TEST dataset.

As we can see it is a vector; to work with the model we need to turn it into a matrix,
so we use the reshape function.

We can see that it is a matrix with a single element: our vector.

We pass it through the model.

It says it is a 1.

Let’s view the tested image graphically. To do that we have to reshape the given vector
back to 8x8.

We use pyplot to draw it.

Another element:
FULL CODE:

print(__doc__)

# Author: Gael Varoquaux <gael dot varoquaux at normalesup dot org>


# License: BSD 3 clause

# Standard scientific Python imports


import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics


from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# The digits dataset


digits = datasets.load_digits()

# The data that we are interested in is made of 8x8 images of digits, let's
# have a look at the first 4 images, stored in the `images` attribute of the
# dataset. If we were working from image files, we could load them using
# matplotlib.pyplot.imread. Note that each image must have the same size. For these
# images, we know which digit they represent: it is given in the 'target' of
# the dataset.
print(digits.images[0])
print(digits.target[0])
images_and_labels = list(zip(digits.images, digits.target))
print("Numero de datos",len(images_and_labels))
_, axes = plt.subplots(2, 6)
for ax, (image, label) in zip(axes[0, :], images_and_labels[:6]):
    ax.set_axis_on()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Train: %i' % label)

# To apply a classifier on this data, we need to flatten the image, to


# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
print(data[0])

# Create a classifier: a support vector classifier


classifier = svm.SVC(gamma=0.001)

# Split data into train and test subsets


X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# We learn the digits on the first half of the digits


classifier.fit(X_train, y_train)

# Now predict the value of the digit on the second half:


predicted = classifier.predict(X_test)
print("prediccion:",predicted)
images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
for ax, (image, prediction) in zip(axes[1, :], images_and_predictions[:6]):
    ax.set_axis_on()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Pred: %i' % prediction)

plt.show()

acc = metrics.accuracy_score(y_test, predicted)


print("accuracy con gamma 0.001:", acc)

classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy con kernel linear:", acc)

classifier = svm.SVC(kernel="poly")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy con kernel poly:", acc)

classifier = svm.SVC(kernel="sigmoid")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy con kernel sigmoid:", acc)

classifier = svm.SVC(kernel="rbf")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy con kernel rbf:", acc)

classifier = svm.SVC()
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy:", acc)

test = X_test[750].reshape(1,-1)
prediction = classifier.predict(test)
test8x8 = test.reshape(8,8)
plt.imshow(test8x8, cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Prediccion: %i' % prediction)
plt.show()

test = X_test[770].reshape(1,-1)
prediction = classifier.predict(test)
print(prediction)
test8x8 = test.reshape(8,8)
plt.imshow(test8x8, cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Prediccion: %i' % prediction)
plt.show()
