Professional Documents
Culture Documents
Research Survey On Support Vector Machine
Research Survey On Support Vector Machine
10th, July 2017, Chongqing, People’s Republic of China Wang et al.
80
multimedia [8]. We also need to put a great effort in de- Female
signing applications that take into account the way the 75 Male
user perceives the overall quality of the provided service. 70
SVM has been applied to many areas of mobile multi-
media. Thus, this article will focus on SVM applications 65
Weight
in the field of mobile multimedia. 60
The remainder of the paper is organized as follows. 55
Section 2 introduces the principle of SVM from three
cases. In Section 3, some algorithm optimizations and 50
parameter optimizations about SVM will be represented 45
and compared. Then, Section 4 includes some popular
40
extensions of SVM like FSVM, TWSVM and MSVM. 145 150 155 160 165 170 175 180 185 190
Height
The applications of SVM in many popular areas, espe-
cially, the area of mobile multimedia, will be presented
Figure 1: Points of Males and Females
in Section 5. Finally, in Section 6, there is a conclusion
referring to the direction of the further SVM improve-
80
ment.
75
45
Case of Linearly Separable
40
Classification problems originated from the linearly sep- 145 150 155 160 165 170 175 180 185 190
Height
arable situation, the method of SVM was applied to lin-
early separable problems at first [3]. Classification result-
Figure 2: Classification Lines.
s often can bring some valuable information for further
prediction or analysis. In hence, people are committed to
using effective classification algorithms to get valuable in which w refers to the weight vector and b is the thresh-
results. For instance, say we are offered some training old. Thus, the relation between xi and f (xi ) can be de-
data with some people’s weight, height and their cor- fined as:
responding sex, then we want to make use of them to
f (x) = wT x + b. (2)
predict the unknown gender data [2].
As Figure 1 shows, these two types of points represent Seeding the optimal hyperplane is equivalent to max-
male and female, respectively. It can be seen that there imal the distance between the closest vectors to the hy-
are a number of lines we can draw to divide the space perplane [1]. Define the Euclidean distance between the
into two regions in Figure 2. And it’s easily to find that nearest points and the hyperplane f (x) as:
the black solid line would be the optimal line, which
f (x)
maximizes the margin between itself and the nearest r=| |, (3)
points of each class. ||w||
SVM extends the two-dimensional linear separable where f (x) refers to functional margin.
problem to multidimensional, and aims to seed the opti- Assume the functional margin f (x) between the n-
mal classification surface, also called as optimal hyper- earest points and the hyperplane is 1 as the Figure 3
plane. The optimal hyperplane can be defined as: shows. The assumption was accompanied with a con-
straint condition yi (wT xi + b) ≥ 1, i = 1, 2, . . . , n, which
implies that all training data are on the two hyperplane
wT x + b = 0, (1) or behind them and the training data on the hyperplane
Research Survey on Support Vector Machine 10th, July 2017, Chongqing, People’s Republic of China
FODVV
FODVVí
Z7[E
Z7[E
Z7[E í
í í í í
Figure 3: Optimal Classification Line.
Figure 4: Case of Non-linearly Separable.
are called support vectors (SVs). Thus, the margin be- L, is given by,
tween the parallel bounding planes d can be defined as: n
2 ∂L(w, b, α)
d = 2r = ||w|| . =0 ⇒ w= αi yi xi ,
Thus the objective function can be presented as: ∂w i=1
n
(9)
∂L(w, b, α)
2 ⇒
max s.t. yi (wT xi + b) ≥ 1, i = 1, 2, . . . , n. (4) ∂b
=0 αi yi = 0.
||w|| i=1
For ease of calculation, translate equation (4) into: Hence from equations (6), (9) and the KKT condition,
the dual problem d∗ can be changed into:
1 n n
min ||w||2 s.t. yi (wT xi + b) ≥ 1, i = 1, 2, . . . , n. (5) 1
2 θ(α) = L(w, b, α) = αi + αi αj yi yj (xi )T xj ,
i=1
2 i=1,j=1
The solution to this optimization problem is given by
(10)
the saddle point of the Lagrange functional:
which is a two convex optimization problems.
n Supposing that αi is the result of equations (10), Xr
1
L(w, b, α) = ||w||2 − αi [yi (wT xi + b)] (6) is a support vector belonging to the positive class, while
2 i=1 Xs belongs to the negative class. Then the optimal so-
where αi are Lagrange multipliers, subject to αi ≥ 0, lution w and b can be expressed as follows:
n
and equation (6) has to be minimized with respect to 1
w, b and maximized with respect to αi . The objective w= αi yi xi , b = − w[Xr + Xs ]. (11)
i=1
2
function has been translated as:
Hence from equations (2), (9) and (11), the final de-
p∗ = min(w,b) θ(w) = minw,b max(αi ≥0) L(w, b, α), (7) cision function is:
The classical Lagrange duality enables the primal prob- f (x) = sgn[ αi yi (xi )T x + b]. (12)
lem to be transformed into its dual problem, when sat- SV s
isfying the Karush-Kuhn-Tucker (KKT) condition. In Case of Non-linearly Separable
hence, as
for equation (7), when it fulfills the KKT con-
n In addition to the linearly separable problems, there are
dition of i=1 αi [yi (wT xi + b) − 1] = 0, equation (7) is
equal to its dual function: much more nonlinearly cases in real life. As the example
showed in Figure 4, there is no line can classify the two
d∗ = max(αi ≥0) minw,b L(w, b, α). (8) classes well, only curves.
SVM shows its superiorities for nonlinearly problems,
It is the KKT condition that determines only the SVs which takes the way of mapping the input vectors in
work on the extremum of the objective function. Then low dimension feature space to a high dimension feature
the minimum with respect to w and b of the Lagrange, space to hunt for a linear optimal separating hyperplane.
10th, July 2017, Chongqing, People’s Republic of China Wang et al.
And it has been demonstrated that it did not account Case of Non-separable
for large storage and computational expense [2]. Last circumstance, some points belong to positive class
The following is an example of mapping process. De- may be classified into negative class. Another way to say,
fine of the curve in two-dimensional space as: there is no hyperplane can separated the points of dif-
g(x) = C0 + C1 x + C2 x2 . ferent categories accurately. Usually these error points
are regarded as noise which can be ignored. However,
Map it into a four-dimensional space as follows: machines can’t deal with the error points as human do.
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ Thus, Cortes [1] introduced slack variables ξi ≥ 0 (a
y3 1 a1 C0
measure of the misclassification error) and punishment
y = ⎣y2 ⎦ = ⎣ x ⎦ , a = ⎣a 2 ⎦ = ⎣C 1 ⎦ .
y1 x2 a3 C2 coefficient C (the emphasis on the outlier) to offset the
effect of noise. And the constraint yi (wT ·xi +b) ≥ 1, i =
Then, the primal function is translated into: 1, 2, . . . , n is modified to:
g(x) ⇒ f (x) =< a, y >= ay. yi (wT · xi + b) ≥ 1 − ξi , i = 1, 2, . . . , n.
It’s a linear function, which means the primal nonlin- C is a given value and subjects to above constraint. Too
early problem has been converted to a linearly separa- small ξ may lead to over learning, while too large may
ble problem. SVM employs the empathy, mapping the lead to owe learning. The equation (5) and equation (6)
w and x into high dimensional space, then gets new w are changed into:
and x . n
1
Φ(x, ξ) = ||w||2 + C ξi , (18)
f (x) =< w, x > +b ⇒ g(x ) =< w , x > +b. 2 i=1
In hence, new decision function can be defined as fol- n
1
lows: L(w, b, α) = ||w||2 − αi [yi (wT xi + b) − (1 − ξ)].
n
2 i=1
g(x ) = αi yi < xi , xi > +b, (13) (19)
i=1 While after derivation, the decision function for this
where dot product is core operation. According to the case still is equation (12).
existence theorem of kernel function: for any sample set In summary, SVM is a powerful classifier, which is
K, there is a map of F , and in this map, f (k) (in the suitable for any case of classification with the same de-
high dimensional space) is a linear separable. Based on cision function, high classification accuracy and small
this theorem, SVM employs the kernel function during computation.
mapping process, and realizes the dot product in low di-
mensional space, which is supposed to be done in high 3 THE OPTIMIZATION OF SVM ALGORITHM
dimensional space. Decision function with kernel func- SVM shows its superiorities in many ways, however, the
tion is shown as follows: high classification accuracy and small computation of
n SVM give the credit to well treatments of quadratic pro-
f (x) = αi yi K < xi , x > +b. (14) gramming problem and parameters selection. Thus, this
i=1 section will introduce some optimization algorithms for
Different kernel functions can be choose to construct SVM.
different types SVM, the following are a few commonly
used kernel functions: Algorithms for Quadratic Programming Optimization
(1) Polynomial For the implementation point of view, training a SVM
is equivalent to solving a linearly constrained Quadratic
K(x, xi ) = (x · xi )q q = 1, 2, . . . ; (15) Programming (QP) problem [11]. QP problem is chal-
(2) Gaussian Radial Basis Function lenging when the size of the data points exceeds few
thousands, which is often the case in practical applica-
(x − xi )2 tions. Thus, much researches for the difficult QP prob-
K(x, xi ) = exp(− ); (16)
2σ 2 lem have been proposed.
(3) Multi-Layer Perceptron A comparison for three common used QP optimiza-
tion algorithms is shown as Table 1. From the compar-
K(x, xi ) = tanh(b(x · xi ) − c). (17) ison, the superiority of SMO is apparent and is most
Research Survey on Support Vector Machine 10th, July 2017, Chongqing, People’s Republic of China
widely used in SVM nowdays. But for the large scale 4 VARIANT SUPPORT VECTOR MACHINE
problems, it still needs further improvements. SVM is a strong classifier with a very good performance
owing to its solid theoretical foundation and good classi-
fication results. Whereas SVM shows limitations in solv-
ing large-scale problems and multi-classification prob-
lems. Therefore, this section will represents some new
Parameter Optimization for SVM
SVM for improving traditional SVM.
SVM refers to many parameters, slack variables, penal-
ty parameter and Kernel parameters, and the selection Fuzzy Support Vector Machine (FSVM)
of parameters takes many effects on the efficiency of Many research results show that SVM is very sensitive
SVM. Traditional optimization methods include cross- to noises and outliners in the training sample due to
validation [17] and grid algorithm. The former improves overfitting [22]. In hence, FSVM [23, 24] was proposed
the classification accuracy, while accompanying with high to deal with the problem. Traditional SVM treated all
computation and low efficiency. The latter is a simple training points uniformly and each training point be-
method and easy to use. However, it’s sensitive to the longs to either one or the other class. However, some in-
initial value of the parameters and is time-consuming put points may not be exactly assigned to one of these t-
with low accuracy and intensive computation. Account- wo classes. Some are more important to be fully assigned
ing for such defects, many new optimization methods to one class so that SVM can seperate these points more
have been proposed which are collectively referred to as correctly. Some data points corrupted by noises are less
swarm intelligence optimization. meaningful and should be discarded. The key point of
The Table 2 is a comparison for parameter optimiza- FSVM is that it applies a fuzzy membership (impor-
tion methods. PSO and DE have shown their own supe- tance weight) to each sample in the training datasets
riorities. PSO has been combined with many other im- such that different samples can make different contribu-
provement algorithms to improve its performance. DE tions in the construction of the SVM hyperplane. The
as a newer hot spot with significant advantages, still fuzzy membership enhances the SVM in reducing the
needs much more further research. effect of outliers and noises in data points. Some new-
er methods assigned smaller weight or even zero weight
10th, July 2017, Chongqing, People’s Republic of China Wang et al.
to noisy and outlier data [25, 26] and a new type of etc. And recently, a novel twin support vector machine
FSVM [27] combined with some other algorithms was with the pinball loss (Pin-TSVM) was proposed to deal
proposed to increase the difference of the meaningful with the quantile distance to reduce the noise sensitivity
training points and outliers. The future work for FSVM and work fast [34].
would focus on the selection of fuzzy membership func-
tion.
Multi-class Support Vector Machine
Twin Support Vector Machine (TWSVM) SVM was originally designed for binary classification,
TWSVM was proposed by Jayadeva et al. [28] for the while in real life, most applications involve multiple clas-
binary classification problem and it has become the cur- sification problems. Multi-class SVM [35] refers to two
rent researching hot spot in machine learning during the approaches: (1) consider all data in one optimization
last few years [29]. TSVM generates two nonparallel hy- formulation directly; (2) construct and combine sever-
perplanes by solving a pair of smaller-sized QPPs, such al binary classifiers. We typically construct a multiclass
that each hyperplane is closer to one class and as far classifier by combining several binary classifiers because
as possible from the other. The strategy of solving t- it made a great progress for algorithm research while
wo smaller-sized QPPs rather than a single larger-sized the former way accounts for high computational com-
QPP makes the learning speed of the TSVM faster than plexity and achieves low efficiency on application [36].
that of the standard SVM in theory. Due to TWSVM’s Multi-class problems require that the number of vari-
lower computational complexity and better generaliza- ables should be proportional to the number of class-
tion ability, it has been studied extensively and devel- es, which denotes that solve a multi-class problem con-
oped rapidly in the last few years. Many variants of the sumes more calculations. Corresponding to such prob-
TSVMs have been proposed, such as twin bounded sup- lems, some novel methods have been proposed. Lee et
port vector machine [30], nonparallel SVM [31], least- al. [37] proposed a multi-category SVM, which is a right-
square TSVM [32], twin support vector regression [33], ful extension from a binary SVM to a multi-category
Research Survey on Support Vector Machine 10th, July 2017, Chongqing, People’s Republic of China
SVM. Madzarov et al. [38] presented a novel architec- in all areas of life. Especially, it shows its own signifi-
ture of SVM classifiers based on Binary Decision Tree cant ability in pattern recognition, such as text recog-
architecture, takes advantage of both the efficient com- nition [47], handwriting recognition [48], face recogni-
putation of the decision tree architecture and the high tion [49], etc. In economic field, SVM is used for re-
classification accuracy of SVMs. Recently, a novel mul- al estate forecasting [50], stock forecasting [51], etc. In
tiple projection twin support vector machine (PTSVM) transportation field, SVM is applied to identify license
was proposed [39], which improves the generalization a- plate number[52], trained to distinguish driving perfor-
bility great. mance and physiological measurements [53], etc. In med-
ical field, SVM has been used in automatic medical de-
cision support system for medical image [54], prediction
Distribute Support Vector Machine (DSVM) of substrate specificities of membrane transport proteins
DSVM is based on connected networks. The network [55], gene classification [56], etc.
is divided into several sub-regions by some kind of cer- Except for the applications in traditional areas, SVM
tain strategy, and SVM is trained in each sub-region also has been applied in the area of mobile multimedia.
in serially or distributed. Finally, the sub-region adopts SVM is mainly used in the field of multimedia retrieval
obtain the approximate solution or global optimal solu- and multimedia event detection (MED). As for the mul-
tion of the original SVM problem through some com- timedia retrieval, SVM has been applied to music and
munications. Navia-Vazque et al. [40] proposed a true image retrieval in the early time. In 2001, Tong et al.
DSVM, which is relative to parallel SVM, includes two [57] proposed the use of support vector active machine
distributed schemes: a distributed packet method for algorithm for conducting effective relevance feedback for
propagating raw data, and a more detailed distributed image retrieval, which can quickly learn a boundary that
semiparametric SVM. It reduces the total amount of separates the images and achieves high search accuracy.
information transferred between nodes and provides a Mandel et al. (2006) [58] described a system for per-
privacy protection mechanism for information sharing. forming flexible music similarity queries using SVM ac-
Then, Lu et al. [41] presented a strong connectivity net- tive learning to achieve high search accuracy. As the
work, Flouri et al. [42] proposed a DSVM over wireless demand for multimedia grows and with the evolution
sensor network and Kim et al. [43] proposed an improve- of digital technology, searching and organizing large s-
ment for this method later. Recently, distributed semi- cale datasets becomes a challenging task. Thus, some
supervised SVM [44], resilient DSVM [45] and some oth- improved approaches have been presented in succession.
er new methods over DSVM has been proposed, which For example, Yildizer et al. [59] proposed an extremely
are different from existing parallel algorithms with good fast content-based image retrieval systems (CBIR) sys-
performance and experimental results. The various sub- tem which uses Multiple Support Vector Machines En-
problems only need to transfer hull information with semble, CBIR have become very popular for browsing,
each other, without the central processing unit. Account- searching and retrieving images from a large database of
ing for such significant advantages, DSVM has become digital images with minimum human intervention; Arya-
a popular direction for research. far et al. [60] introduced a classifier fusion of multimodal
It is worth mentioning that the new type of SVM audio and lyrics data to address single modality classi-
achieves better and better performance, like above men- fication limitations. As for multimedia event detection,
tioned DSVM. A more newer method is called SVM SVM has been applied to multimedia event detection
classification trees [46], which is designed for multi-class (MED) based on audio or video in recent years. Wang
problems. Giving the credit to these new methods’ sig- et al. (2016) [61] proposed the method of classifying clip-
nificant odds, they must be the future research hot spot- s for events using “recurrent SVM” and Tzelepis et al.
s. (2016) [62] using kernel SVM with isotropic gaussian
sample uncertainty for video event detection, and both
of them achieved good results. Except for those, SVM
5 APPLICATIONS OF SVM also can provide service of real-time multimedia quali-
SVM has a strong advantage on the basis of theory, it ty assessment [63], mixed type audio classification [64],
can guarantee that the solution of the extreme solution etc. In the future, accompanying with the developmen-
is the global optimal solution rather than the local min- t of SVM and its extension, SVM may be more widely
imum. In real application, SVM also has been success- used in the areas of mobile multimedia and achieve much
fully used for classification, regression and forecasting more success.
10th, July 2017, Chongqing, People’s Republic of China Wang et al.
Research Survey on Support Vector Machine 10th, July 2017, Chongqing, People’s Republic of China
method for large real world datasets. Turk J Elec Eng and [47] Xiao-Liang Liu, SF Ding, Hong Zhu, and et al. Appropriate-
Comp Sci, 24(1):219–233, 2016. ness in applying svms to text classification. Comput Eng Sci,
[28] R Khemchandani, Suresh Chandra, et al. Twin support vec- 32(6):106–108, 2010.
tor machines for pattern classification. IEEE Trans. pattern [48] Mohamed Elleuch and Monji Kherallah. An improved arabic
analysis and machine intelligence, 29(5), 2007. handwritten recognition system using deep support vector
[29] Divya Tomar and Sonali Agarwal. Twin support vector ma- machines. IJMDEM, 7(2):1–20, 2016.
chine: A review from 2007 to 2014. Egyptian Informatics [49] W Ouarda, H Trichili, A. M Alimi, and B Solaiman. Face
Journal, 16(1):55–69, 2015. recognition based on geometric features using support vector
[30] Yuan Hai Shao, Chun Hua Zhang, Xiao Bo Wang, and machines. In Soft Computing and Pattern Recognition, pages
Nai Yang Deng. Improvements on twin support vector ma- 89–95, 2014.
chines. IEEE Trans. Neural Netw, 22(6):962–968, 2011. [50] Xibin Wang, Junhao Wen, Yihao Zhang, and et al. Real
[31] Y. Tian, Z. Qi, X. Ju, Y. Shi, and et al. Nonparallel sup- estate price forecasting based on svm optimized by pso. Op-
port vector machines for pattern classification. IEEE Trans. tik - International Journal for Light and Electron Optics,
Cybernetics, 44(7):1067, 2014. 125(125):1439–1443, 2014.
[32] Yitian Xu, Xianli Pan, Zhijian Zhou, and et al. Structural [51] Wei Shen, Yunyun Zhang, and Xiaoyong Ma. Stock return
least square twin support vector machine for classification. forecast with ls-svm and particle swarm optimization. In
Applied Intelligence, 42(3):527–536, 2015. BIFE, pages 143–147, 2009.
[33] Sugen Chen, Xiaojun Wu, and Renfeng Zhang. A novel twin [52] Michael Del Rose and Jack Reed. Support vector machine
support vector machine for binary classification problems. application on vehicles. In International Symposium on Op-
Neural Processing Letters, 44:1–17, 2016. tical Science and Technology, pages 144–149, 2001.
[34] Yitian Xu, Zhiji Yang, and Xianli Pan. A novel twin support- [53] Huiqin Chen and Lei Chen. Support vector machine classifi-
vector machine with pinball loss. IEEE Trans. neural net- cation of drunk driving behaviour. International Journal of
works and learning systems, 28(2):359–370, 2017. Environmental Research and Public Health, 14(1), 2017.
[35] Rameswar Debnath, N. Takahide, and Haruhisa Takahashi. [54] P LATHA et al. Svm based automatic medical decision sup-
A decision based one-against-one method for multi-class sup- port system for medical image. JATIT, 66(3), 2014.
port vector machine. Pattern Anal. Appl., 7(2):164–175, [55] L Li, J Li, W Xiao, and Y Li. Prediction the substrate speci-
2004. ficities of membrane transport proteins based on support vec-
[36] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods tor machine and hybrid features. IEEE/ACM Trans. Com-
for multiclass support vector machines. IEEE Trans. Neural putational Biology and Bioinformatics, pages 1–1, 2016.
Networks, 13(2):415–425, 2002. [56] Bin Yu, Shan Li, and Haijun Liu. A hybrid gene selection
[37] Vojtech Franc and Václav Hlavác. Multi-class support vector method for tumor classification based on genetic algorithm
machine. In IEEE Trans. Pattern Recognition, volume 2, and support vector machine. J COMPUT THEOR NANOS,
pages 236–239, 2002. 12(11):4730–4735, 2015.
[38] Gjorgji Madzarov, Dejan Gjorgjevikj, and Ivan Chorbev. A [57] Simon Tong. Support vector machine active learning for im-
multi-class SVM classifier utilizing binary decision tree. In- age retrieval. In ACM International Conference on Multi-
formatica (Slovenia), 33(2):225–233, 2009. media, pages 107–118, 2001.
[39] Chun-Na Li, Yun-Feng Huang, He-Ji Wu, and et al. Multiple [58] Michael I Mandel, Graham E Poliner, and Daniel PW Ellis.
recursive projection twin support vector machine for multi- Support vector machine active learning for music retrieval.
class classification. IJMLC, 7(5):729–740, 2016. Multimedia systems, 12(1):3–13, 2006.
[40] A Navia-Vazquez and E Parrado-Hernandez. Distributed [59] Ela Yildizer, Ali Metin Balci, Mohammad Hassan, and Reda
support vector machines. IEEE Trans. Neural Networks, Alhajj. Efficient content-based image retrieval using multi-
17(4):1091–1097, 2006. ple support vector machines ensemble. Expert Systems with
[41] Yumao Lu, V Roychowdhury, and L Vandenberghe. Dis- Applications, 39(3):2385–2396, 2012.
tributed parallel support vector machines in strongly con- [60] K Aryafar and A Shokoufandeh. Multimodal sparsity-eager
nected networks. IEEE Trans. Neural Networks, 19(7):1167– support vector machines for music classification. In ICMLA,
1178, 2008. pages 405–408, 2014.
[42] K Flouri, B Beferull-Lozano, and P Tsakalides. Distribut- [61] Yun Wang and Florian Metze. Recurrent support vector ma-
ed consensus algorithms for svm training in wireless sensor chines for audio-based multimedia event detection. In ICMR,
networks. In EUSIPCO, pages 1–5, 2008. pages 265–269, 2016.
[43] W Kim, M. S Stankovic, K. H Johansson, and et al. A dis- [62] Christos Tzelepis, Vasileios Mezaris, and Ioannis Patras.
tributed support vector machine learning over wireless sensor Video event detection using kernel support vector machine
networks. IEEE Trans. Cybernetics, 45(11):2599–2611, 2015. with isotropic gaussian sample uncertainty (ksvm-igsu). In
[44] S Scardapane, R Fierimonte, Lorenzo P Di, and et al. Dis- MMM, pages 3–15. Springer, 2016.
tributed semi-supervised support vector machines. Neural [63] Wenbi Rao and Yingshu Li. Real-time multimedia quality
Netw, 80:43–52, 2016. assessment based on media delay index and support vector
[45] Zhixiong Yang and Waheed U. Bajwa. Rd-svm: A resilient machine. In IEEE, CINC’09, volume 1, pages 111–114, 2009.
distributed support vector machine. In ICASSP, pages 2444– [64] Lei Chen, Sule Gunduz, and M. Ozsu. Mixed type audio
2448, 2016. classification with support vector machine. In IEEE, ICME,
[46] Peter De Boves Harrington. Support vector machine classifi- pages 781–784, 2006.
cation trees based on fuzzy entropy of classification. Analyt-
ica Chimica Acta, 2016.