Wart Treatment Using Machine Learning Support Vector Algorithm



Abstract: The support vector machine (SVM) has become a powerful problem-solving tool in machine learning, with applications in face detection, image classification, text categorization, and more. In this paper, we model the outcome of wart treatment with immunotherapy as a classification problem and evaluate it using a support vector machine and other machine learning algorithms. Immunotherapy is a class of treatment that works by harnessing the innate powers of the patient's own immune system. We run several classification algorithms and compare their performance in predicting the wart treatment outcome. The evaluations are performed on the same feature set with different classification methods, and a comparison of the results is presented. The immunotherapy dataset has 90 instances and 8 attributes of integer and real type. The data mining tool used for these experiments was scikit-learn (sklearn).

1. Introduction

SVM is a supervised machine learning algorithm that can be used for classification or regression problems. It uses a technique called the kernel trick to transform the data and, based on these transformations, finds an optimal boundary between the possible outputs. Simply put, it performs some complex data transformations, then figures out how to separate the data based on the labels or outputs we have defined. The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane; these extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the diagram below, in which two different categories are classified using a decision boundary, or hyperplane.

Fig 1.1 SVM model


Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of the SVM. The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features (as shown in Fig 1.1), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane. We always create the hyperplane with the maximum margin, which means the maximum distance to the nearest data points.
Support vectors: The data points or vectors that lie closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
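The hyperplane and support vectors can be inspected directly in scikit-learn. The sketch below uses toy 2-D data (illustrative values, not the wart dataset): for a linear-kernel SVC, the fitted `support_vectors_`, weight vector and intercept define the maximum-margin hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in 2-D (toy data for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The extreme points that define the maximum-margin hyperplane
print("support vectors:\n", clf.support_vectors_)
# For a linear kernel, the hyperplane is w . x + b = 0
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
# New points are classified by which side of the hyperplane they fall on
print(clf.predict([[1.2, 1.1], [5.8, 5.9]]))
```

New points near the class-0 cluster fall on one side of the hyperplane and near class 1 on the other, which is exactly the "gap" picture in Fig 1.1.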
2. Methods
SVM can be understood with an example similar to the one used for the KNN classifier. Suppose a wart successfully treated with immunotherapy is labeled 1 and an unsuccessful treatment is labeled 0. If we want a model that can accurately identify whether a wart will respond to treatment, such a model can be created using the SVM algorithm. We first train the model on many examples with their features and labels (treated or not), so that it can learn the different characteristics of treated and untreated cases, and then we test it with new data. The support vector machine creates a decision boundary between the two classes (treated or not) by choosing the extreme cases (support vectors), and on the basis of these support vectors it classifies a new case as treated or not.
3. Model Development
Model deployment is the process of making a model available in production environments, where it can provide predictions on new data; it is only once models are deployed to production that they start adding value, so deployment should be kept simple. All of the available data is split into two sets. In the training phase, we use 90% of the data to train the model; the remaining 10% is used in the testing phase to validate the accuracy of the model built. In the prediction phase, the model is deployed in production and actual live data is used to predict the outcome. I use 10-fold cross validation to split the data into training and testing sets in scikit-learn and develop the model using three classification algorithms: support vector machine, Naive Bayes and decision tree.
Fig 3.1 supervised learning model
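The 10-fold cross-validation comparison described above can be sketched as follows. Since the immunotherapy data file is not bundled here, a synthetic 90-instance, 7-feature stand-in from `make_classification` is used in its place; with the real data, only the loading step changes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in shaped like the immunotherapy data:
# 90 instances, 7 predictive attributes, binary outcome
X, y = make_classification(n_samples=90, n_features=7, n_informative=5,
                           random_state=0)

models = {"SVM": SVC(),
          "Naive Bayes": GaussianNB(),
          "Decision Tree": DecisionTreeClassifier(random_state=0)}

for name, model in models.items():
    # 10-fold cross validation: each fold trains on 90% and tests on 10%
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```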
4. Related Works

Lung cancer is one of the most common and leading causes of cancer death in human beings. Early detection of the cancer plays a major role in increasing a patient's probability of surviving the disease. Other researchers have demonstrated the performance of support vector machine (SVM) and logistic regression (LR) algorithms in predicting the survival rate of lung cancer patients, comparing the effectiveness of the two algorithms through accuracy, precision, recall, F1 score and confusion matrix. These techniques have been applied to detect the survival possibilities of lung cancer victims and to help physicians make decisions on the prognosis of the disease.

4.1. Support Vector Machine

A support vector machine is a representation of the training data as points in space, separated into categories by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. SVM is capable of both classification and regression; here I focus on using SVM for classification, in particular non-linear SVM, i.e. SVM using a non-linear kernel. With a non-linear kernel, the boundary that the algorithm calculates does not have to be a straight line. The benefit is that you can capture much more complex relationships between your data points without having to perform difficult transformations on your own. The downside is that training takes much longer, as it is much more computationally intensive.
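The benefit of a non-linear kernel can be demonstrated on data that no straight line can separate. In this sketch (toy concentric circles, used only for illustration), an RBF-kernel SVC succeeds where a linear kernel cannot:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: the two classes cannot be split by a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    acc = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{kernel}: test accuracy = {acc:.2f}")
```

The linear kernel performs near chance on this data, while the RBF kernel recovers the circular boundary almost perfectly.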

4.2. Decision Tree

A decision tree is a tree-like graph whose nodes represent the places where we pick an attribute and ask a question, whose edges represent the answers to that question, and whose leaves represent the actual output or class label. Decision trees are used for non-linear decision making with simple linear decision surfaces.

Fig 4.2.1 decision tree model
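The question-at-each-node structure described above can be inspected in scikit-learn. A small sketch on the standard iris data (used here purely for illustration), printing each node's attribute test and the leaf class labels:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node asks a question about one attribute;
# each leaf holds the predicted class label
print(export_text(tree, feature_names=list(iris.feature_names)))
```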

4.3. Random Forest


Random forest is a solid choice for nearly any prediction problem (even non-linear ones). It is a relatively recent machine learning strategy (it came out of Bell Labs in the 90s) and can be used for just about anything. It belongs to a larger class of machine learning algorithms called ensemble methods. The algorithm that induces a random forest automatically creates many random decision trees. Since the trees are generated at random, most individual trees (perhaps 99.9% of them) are not especially meaningful for the classification/regression problem on their own, but combining their votes yields a strong predictor.

Fig 4.3.2 random forest model
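The ensemble-of-random-trees idea can be sketched as below; the forest's `estimators_` attribute exposes the individual randomized trees whose majority vote forms the prediction. A synthetic stand-in dataset is used for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data in the immunotherapy shape (90 x 7, binary target)
X, y = make_classification(n_samples=90, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of randomized decision trees; the final prediction
# is the majority vote across all trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

print("number of trees:", len(forest.estimators_))
print("test accuracy:", forest.score(X_te, y_te))
```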


5. Dataset
The wart treatment using immunotherapy dataset has 8 attributes and 90 instances. I collected the data from the UCI machine learning repository. Attribute information, as reported by pandas:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 8 columns):
 #  Column               Non-Null Count  Dtype
--- ------               --------------  -----
 0  sex                  90 non-null     int64
 1  age                  90 non-null     int64
 2  Time                 90 non-null     float64
 3  Number_of_Warts      90 non-null     int64
 4  Type                 90 non-null     int64
 5  Area                 90 non-null     int64
 6  induration_diameter  90 non-null     int64
 7  Result_of_Treatment  90 non-null     int64
dtypes: float64(1), int64(7)
memory usage: 5.8 KB
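Given the column layout above, the 7 features and the binary target can be separated as sketched below. The few rows shown are illustrative values in the dataset's shape, not actual records from the UCI file:

```python
import pandas as pd

# Illustrative rows in the same shape as the immunotherapy dataset
# (column names from the df.info() output above; values are made up)
df = pd.DataFrame({
    "sex": [1, 2, 1], "age": [22, 35, 15],
    "Time": [2.25, 9.0, 4.5], "Number_of_Warts": [2, 6, 3],
    "Type": [3, 1, 1], "Area": [20, 100, 45],
    "induration_diameter": [25, 7, 50],
    "Result_of_Treatment": [1, 1, 0],
})

X = df.drop(columns="Result_of_Treatment")  # the 7 predictive attributes
y = df["Result_of_Treatment"]               # 1 = treated, 0 = not treated
print(X.shape, y.shape)
```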


6. Experimental Results

In this paper I compare three different classification algorithms on the same dataset to measure the performance of each model, and I provide comparison results in terms of accuracy, confusion matrix and classification report.

Result 1: Accuracy, confusion matrix and classification report for decision tree

Result 2: Accuracy, confusion matrix and classification report for SVM

Result 3: Accuracy, confusion matrix and classification report for random forest
As shown in the bar chart below, I compared the different classification techniques in terms of accuracy, confusion matrix and classification report using 10-fold cross validation; random forest achieves around 86.6% accuracy.
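The reported metrics can be produced as sketched below, using per-instance predictions from 10-fold cross validation. A synthetic stand-in dataset is used here, so the numbers this prints will differ from the paper's results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the 90-instance immunotherapy data
X, y = make_classification(n_samples=90, n_features=7, random_state=0)
model = RandomForestClassifier(random_state=0)

# One out-of-fold prediction per instance, from 10-fold cross validation
y_pred = cross_val_predict(model, X, y, cv=10)

print("accuracy:", accuracy_score(y, y_pred))
print("confusion matrix:\n", confusion_matrix(y, y_pred))
print(classification_report(y, y_pred))
```

Swapping in `SVC()` or `DecisionTreeClassifier()` for the model yields the corresponding rows of the comparison.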

7. Conclusion

In this paper, I compared support vector machine, random forest and decision tree on the immunotherapy dataset, using 10-fold cross validation (90% for training and 10% for testing) for all algorithms. Based on these classifications, the accuracy of the decision tree is 83.33%, the accuracy of random forest is 86.66% and the accuracy of SVM is 78.88%. From these results, we can see that on the same dataset different algorithms achieve different accuracies, so we should select the most effective one by comparing them.

8. References

https://www.researchgate.net/publication/319870836_Predicting_Lung_Cancer_Survivability_using_SVM_and_Logistic_Regression_Algorithms
https://archive.ics.uci.edu/ml/datasets/Immunotherapy+Dataset
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
