
Linear SVM Classifier

Let's first generate some data in 2 dimensions and make the two classes a little separated. After
setting the random seed, you make a matrix x of normally distributed values: 20 observations in 2
classes on 2 variables. Then you make a y variable, which is either -1 or 1, with 10 observations in
each class. For y = 1, you move the means from 0 to 1 in each of the coordinates. Finally, you can plot
the data and color code the points according to their response. The plotting character 19 gives you
nice big visible dots, coded blue or red according to whether the response is 1 or -1.

set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
plot(x, col = y + 3, pch = 19)

Code explanation:

- This code generates a scatter plot with two groups of points.
- First, the set.seed() function sets a seed for the random number generator to
ensure reproducibility.
- Next, a matrix x is created using the matrix() function.
- The rnorm() function generates 40 random numbers from a normal distribution with
mean 0 and standard deviation 1.
- The numbers are arranged into a matrix with 20 rows and 2 columns.
- A vector y is created using the rep() function.
- It contains 20 elements, with the first 10 being 1 and the last 10 being -1.
- The code then modifies the values of x for the rows where y is equal to 1.
- The rows are selected using the logical expression y == 1, and 1 is added to the values in
those rows, which moves the mean of class 1 from 0 to 1 in each coordinate.
- Finally, the plot() function is used to create a scatter plot of the data.
- The col argument uses y + 3 as an index into the color palette: y = 1 gives color 4 (blue)
and y = -1 gives color 2 (red).
- The pch argument sets the shape of the points to a filled circle.
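As a quick check of the steps above, here is a minimal sketch that regenerates the data and verifies the class sizes, the label order, the shift applied to class 1, and the palette indices used for coloring. The copy x0 is a hypothetical helper added only for this check.

```r
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))

x0 <- x                              # hypothetical copy, kept only to check the shift
x[y == 1, ] <- x[y == 1, ] + 1       # move class 1 from mean 0 to mean 1

print(table(y))                      # 10 observations in each class
print(y)                             # ten 1s followed by ten -1s
print(colMeans(x[y == 1, ]))         # roughly (1, 1), up to sampling noise
print(colMeans(x[y == -1, ]))        # roughly (0, 0), up to sampling noise
print(sort(unique(y + 3)))           # palette indices: 2 (red) for -1, 4 (blue) for 1
```

Only the rows with y == 1 change; the class -1 rows keep their original values.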

Now load the package e1071, which contains the svm function.

library(e1071)

Code explanation:

- This code loads the e1071 package in R.
- The e1071 package provides functions for statistical learning and data mining.
- By using the library() function, the package is loaded into the current R session and
its functions can be used in subsequent code.

Now you make a data frame of the data, turning y into a factor variable. After that, you call svm
on this data frame, using y as the response variable and the other variables as the predictors.
The data frame unpacks the matrix x into 2 columns named X1 and X2. You tell svm that the kernel
is linear, that the tuning parameter cost is 10, and that scale is FALSE; in this example, you ask
it not to standardize the variables.
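To see exactly what the data frame looks like, the sketch below regenerates the data from the first step and prints the column names that data.frame() assigns when it unpacks an unnamed matrix, plus the factor levels produced by as.factor(). It uses only base R.

```r
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1

dat <- data.frame(x, y = as.factor(y))

print(names(dat))      # "X1" "X2" "y": the unnamed matrix columns become X1 and X2
print(levels(dat$y))   # "-1" "1": levels of a numeric vector are sorted numerically
str(dat)
```

Because the predictors are called X1 and X2, any new data passed to predict() later must use the same column names.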

dat <- data.frame(x, y = as.factor(y))

svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10,
              scale = FALSE)
print(svmfit)

Call:
svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10,
scale = FALSE)

Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 10

Number of Support Vectors: 6

Code explanation:

- This code is written in R and performs support vector machine (SVM) classification on a
dataset.
- The first line creates a data frame called dat with three columns: X1 and X2, taken from
the matrix x, and y.
- The matrix x is assumed to already exist in the workspace, while the y column is created
by converting the existing variable y into a factor with the as.factor() function.
- The second line fits an SVM model to the data using the svm() function.
- The formula y ~ . specifies that the response variable is y and all other columns in the
data frame should be used as predictors.
- The kernel argument specifies that a linear kernel should be used, while the cost
argument sets the cost parameter to 10.
- The scale argument is set to FALSE, which means that the data will not be scaled
before fitting the model.
- The third line prints the SVM model to the console.
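To see which observations are the support vectors and, for a linear kernel, the separating hyperplane itself, you can inspect the fitted object. This sketch relies on the index, SV, coefs, and rho components of e1071's svm object; the names w, b0, f_manual, and f_svm are introduced here for illustration. With a linear kernel and scale = FALSE, the decision function is f(x) = w · x + b0, where w is the sum of coefs_i * SV_i over the support vectors and b0 = -rho.

```r
library(e1071)

set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x, y = as.factor(y))

svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)

print(svmfit$index)     # row numbers of the support vectors in dat
print(svmfit$tot.nSV)   # total number of support vectors

# For a linear kernel, the weight vector is a weighted sum of the support vectors
w  <- drop(t(svmfit$coefs) %*% svmfit$SV)   # length-2 vector (X1, X2)
b0 <- -svmfit$rho                            # intercept of the decision function

# Decision values computed by hand: f(x) = w . x + b0
f_manual <- drop(x %*% w) + b0

# Compare with the decision values reported by predict()
pred  <- predict(svmfit, dat, decision.values = TRUE)
f_svm <- drop(attr(pred, "decision.values"))
print(all.equal(f_manual, f_svm, check.attributes = FALSE))
```

Points with |f(x)| <= 1 lie on or inside the margin, which is why they end up as support vectors.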

Printing svmfit gives its summary. You can see that the number of support vectors is 6: they are
the points that are close to the boundary or on the wrong side of it.
There's a plot function for SVM that shows the decision boundary, as you can see below. There
doesn't seem to be much control over the colors, and it breaks with convention by putting X2 on
the horizontal axis and X1 on the vertical axis.

plot(svmfit, dat)

Code explanation:

- This code is written in R.
- The plot() function dispatches to the plot method that e1071 provides for svm objects and
plots the model svmfit on the data dat.
- The plot shows the decision regions of the SVM model and marks the support vectors.
- The svmfit object is the result of fitting an SVM model to the data.
- The dat object contains the data used to fit the SVM model.
- Overall, this code visualizes the fitted decision boundary relative to the training data.
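If you prefer the conventional orientation (X1 on the horizontal axis) and your own colors, one option is to draw the linear boundary yourself with base graphics. This is a minimal sketch, assuming the linear fit above; it reuses the relationship w = t(coefs) %*% SV and b0 = -rho, and draws the margins where the decision function equals +1 and -1. The names w and b0 are illustrative.

```r
library(e1071)

set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)

w  <- drop(t(svmfit$coefs) %*% svmfit$SV)   # weight vector (X1, X2)
b0 <- -svmfit$rho                            # intercept: f(x) = w . x + b0

# Data with X1 on the horizontal axis, colored as in the first plot
plot(x, col = y + 3, pch = 19, xlab = "X1", ylab = "X2")
# Circle the support vectors
points(x[svmfit$index, ], pch = 1, cex = 2)

# Solve w1 * x1 + w2 * x2 + b0 = c for x2, with c = 0 (boundary) and c = +/-1 (margins)
slope <- -w[1] / w[2]
abline(a = -b0 / w[2], b = slope)
abline(a = (1 - b0) / w[2], b = slope, lty = 2)
abline(a = (-1 - b0) / w[2], b = slope, lty = 2)
```

This only works for a linear kernel; for other kernels the boundary is not a straight line and has to be traced from predictions on a grid.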

Non-Linear SVM Classifier

plot(x, col = y + 1)

Code explanation:

- This code is written in R.
- The plot() function is used to create a scatter plot.
- The first argument x is a two-column matrix: its first column is plotted on the horizontal
axis and its second column on the vertical axis.
- The col parameter is used to specify the color of the points in the plot.
- In this case, the color is y + 1, which is used as an index into the color palette.
- This assumes y is coded 0/1, so the points are drawn in colors 1 (black) and 2 (red). With
the -1/1 coding from the linear example, y + 1 would give color 0, the background color, and
those points would not be visible.
- Overall, this code creates a scatter plot with the color of the points determined by the y
variable.
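If your labels are coded -1/1, as in the linear example, one way to keep the col = y + 1 idiom is to recode them to 0/1 first. A minimal sketch; y_pm and y01 are hypothetical names.

```r
y_pm <- rep(c(1, -1), c(10, 10))   # labels coded -1/1, as in the linear example
print(sort(unique(y_pm + 1)))      # 0 and 2: color 0 is the background, so those points vanish

y01 <- as.integer(y_pm == 1)       # recode: 1 stays 1, -1 becomes 0
print(sort(unique(y01 + 1)))       # 1 (black) and 2 (red) in the default palette

set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
plot(x, col = y01 + 1, pch = 19)
```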

dat <- data.frame(y = factor(y), x)

fit <- svm(factor(y) ~ ., data = dat, scale = FALSE, kernel = "radial",
           cost = 5)

Code explanation:

- This code is written in R and performs SVM classification on a dataset.
- The first line creates a data frame called dat with a y column followed by the columns of
the matrix x (named X1 and X2).
- The y column is converted to a factor using the factor() function.
- The second line fits an SVM to the data in dat.
- The formula used for the model is factor(y) ~ . which means that the y column is
the response variable and all other columns in dat are used as predictors.
- The data argument specifies the dataset to use, scale = FALSE turns off scaling the
data, kernel = "radial" specifies the radial basis function kernel to use, and cost
= 5 sets the cost parameter to 5.
- Overall, this code is fitting an SVM model to a dataset with a radial basis function kernel
and a cost parameter of 5.
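The section does not show where its non-linear data come from, so the sketch below simulates a hypothetical two-class pattern (class 1 outside a circle, class 0 inside), coded 0/1 so that col = y + 1 works. It fits the same radial-kernel call and draws the decision boundary by predicting decision values on a grid and tracing the contour where they equal 0. The data-generating rule, the seed, and the 75 x 75 grid are assumptions chosen for illustration.

```r
library(e1071)

# Hypothetical non-linear data: class 1 lies outside a circle of radius sqrt(1.5)
set.seed(1)
x <- matrix(rnorm(200), 100, 2)
y <- ifelse(rowSums(x^2) > 1.5, 1, 0)

dat <- data.frame(y = factor(y), x)   # columns y, X1, X2
fit <- svm(factor(y) ~ ., data = dat, scale = FALSE, kernel = "radial", cost = 5)

# Training accuracy, as a rough check of the fit
train_pred <- predict(fit, dat)
print(mean(train_pred == dat$y))

# Predict decision values on a regular grid covering the data
px1 <- seq(min(x[, 1]), max(x[, 1]), length.out = 75)
px2 <- seq(min(x[, 2]), max(x[, 2]), length.out = 75)
grid <- expand.grid(X1 = px1, X2 = px2)
dv <- attr(predict(fit, grid, decision.values = TRUE), "decision.values")

# expand.grid varies X1 fastest, so the values fill a length(px1) x length(px2) matrix
plot(x, col = y + 1, pch = 19)
contour(px1, px2, matrix(dv, length(px1), length(px2)),
        levels = 0, add = TRUE, drawlabels = FALSE)
```

The grid column names must match the predictor names X1 and X2 used when fitting; otherwise predict() cannot find the variables.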
