
Research Survey on Support Vector Machine

Huibing Wang
Faculty of Software, Fujian Normal University, Fuzhou, China 350108
wanghbfjnu@163.com

Jinbo Xiong
Fujian Engineering Research Center of Public Service Big Data Mining and Application, Fuzhou, China 350108
jinbo810@163.com

Zhiqiang Yao∗
Fujian Engineering Research Center of Public Service Big Data Mining and Application, Fuzhou, China
zyy837603010g@163.com

Mingwei Lin
Fujian Engineering Research Center of Public Service Big Data Mining and Application
linmwcs@163.com

Jun Ren
Faculty of Software, Fujian Normal University
jun ren@outlook.com

∗ Corresponding author.
ABSTRACT

Support vector machine (SVM) as a learning machine has shown good learning ability and generalization ability in classification, regression and forecasting. Because of its excellent learning performance, SVM has always been a hot spot in machine learning. Hence, this paper gives a systematic introduction to SVM, including the theory of SVM, a summary and comparison of the quadratic programming optimizations and parameter optimizations for SVM, and an introduction to some new SVMs, like FSVM, TSVM, MSVM, etc. Then, the applications of SVM in real life are presented, especially in the area of mobile multimedia. Finally, we conclude with a discussion of the directions of further SVM improvement.

KEYWORDS

SVM, mobile multimedia, optimization algorithms, SVM extension, application

ACM Reference format:
Huibing Wang, Jinbo Xiong, Zhiqiang Yao, Mingwei Lin, and Jun Ren. 2017. Research Survey on Support Vector Machine. In Proceedings of EAI International Conference on Mobile Multimedia Communications, Chongqing, People's Republic of China, July 2017 (10th), 9 pages.
DOI: 10.475/123 4

1 INTRODUCTION

SVM was first proposed by Vapnik in 1995 [1]. It was named "support-vector networks" in the early days; it is a binary classification algorithm that implements the idea of non-linearly mapping input vectors to a very high-dimensional feature space and constructing a linear decision surface (hyperplane) in that feature space. Hence, it achieves good results on both separable and non-separable problems. This hyperplane is optimal in the sense of being a maximal-margin classifier with respect to the training data [2]. The Structural Risk Minimisation (SRM) principle that SVM follows equips SVM with a greater ability to generalise. Owing to such significant advantages, SVM was quickly applied to classification and regression problems [3]. Moreover, SVM copes with small samples, non-linear problems and the "curse of dimensionality", for which it has always drawn the attention of many research scholars. As for applications, it has been widely used in all areas of daily life, such as the economic field, the transportation field, the medical field and so on. Beyond these traditional areas, SVM is also used in new hot areas, like mobile multimedia.

Mobile multimedia refers to various types of content that are either accessed via portable devices or created using them. One of the areas where mobile multimedia has become ubiquitous is smartphones that incorporate video and music playback capability [4], cameras, and wireless content streaming [5, 6]. It involves many salient technology dimensions, like media storage, retrieval and synchronization [7]; imaging, display and human-machine interfaces in terminals; and multimedia applications and services. Successfully deploying multimedia services and applications in mobile environments requires adopting an interdisciplinary approach where multimedia [8]. We also need to put great effort into designing applications that take into account the way the user perceives the overall quality of the provided service. SVM has been applied to many areas of mobile multimedia. Thus, this article focuses on SVM applications in the field of mobile multimedia.
The remainder of the paper is organized as follows. Section 2 introduces the principle of SVM in three cases. In Section 3, some algorithm optimizations and parameter optimizations for SVM are presented and compared. Then, Section 4 covers some popular extensions of SVM, like FSVM, TWSVM and MSVM. The applications of SVM in many popular areas, especially the area of mobile multimedia, are presented in Section 5. Finally, Section 6 concludes with the directions of further SVM improvement.

2 THEORY OF SUPPORT VECTOR MACHINE

SVM shows significant advantages on both separable problems and non-separable problems. Separable problems include linearly separable problems and non-linearly separable problems. Thus, this section presents the theory of SVM in these three cases.

Case of Linearly Separable

Classification problems originated from the linearly separable situation, and the method of SVM was applied to linearly separable problems first [3]. Classification results often carry valuable information for further prediction or analysis; hence, people are committed to using effective classification algorithms to obtain valuable results. For instance, say we are offered training data with some people's weight, height and corresponding sex, and we want to use these data to predict the sex of unknown samples [2].

Figure 1: Points of Males and Females (height vs. weight, with female and male samples marked).

As Figure 1 shows, the two types of points represent males and females, respectively. It can be seen in Figure 2 that there are a number of lines we can draw to divide the space into two regions, and it is easy to find that the black solid line would be the optimal line: it maximizes the margin between itself and the nearest points of each class.

Figure 2: Classification Lines.
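To make this toy example concrete, the following minimal sketch (our illustration, not part of the original survey) fits a linear SVM to invented height/weight data with scikit-learn; the data values and the large C (which approximates the hard-margin classifier discussed below) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical (height cm, weight kg) samples; labels: 0 = female, 1 = male.
X = np.array([[155, 50], [160, 55], [158, 52], [163, 57],
              [175, 70], [180, 75], [178, 72], [185, 80]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel with a very large C approximates the hard-margin case,
# i.e. the "black solid line" that maximizes the margin between classes.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
print("predicted sex for (170 cm, 65 kg):", clf.predict([[170, 65]])[0])
```

Only the few points nearest the separating line end up as support vectors; the rest of the data could be removed without changing the learned boundary.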
SVM extends the two-dimensional linearly separable problem to multiple dimensions and aims to find the optimal classification surface, also called the optimal hyperplane. The optimal hyperplane can be defined as:

w^T x + b = 0,  (1)

in which w refers to the weight vector and b is the threshold. Thus, the relation between x_i and f(x_i) can be defined as:

f(x) = w^T x + b.  (2)

Seeking the optimal hyperplane is equivalent to maximizing the distance between the closest vectors and the hyperplane [1]. Define the Euclidean distance between the nearest points and the hyperplane f(x) as:

r = \frac{|f(x)|}{\|w\|},  (3)

where f(x) refers to the functional margin.

Assume that the functional margin between the nearest points and the hyperplane is 1, as Figure 3 shows. This assumption is accompanied by the constraint condition y_i(w^T x_i + b) \ge 1, i = 1, 2, \ldots, n, which implies that all training data lie on the two bounding hyperplanes or beyond them; the training data lying on these hyperplanes are called support vectors (SVs).
Figure 3: Optimal Classification Line (the two classes, the separating plane w^T x + b = 0 and the bounding planes w^T x + b = ±1).

Figure 4: Case of Non-linearly Separable.

Thus, the margin d between the parallel bounding planes can be defined as:

d = 2r = \frac{2}{\|w\|}.

The objective function can then be presented as:

\max \frac{2}{\|w\|} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; i = 1, 2, \ldots, n.  (4)

For ease of calculation, translate equation (4) into:

\min \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; i = 1, 2, \ldots, n.  (5)

The solution to this optimization problem is given by the saddle point of the Lagrange functional:

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right],  (6)

where the \alpha_i are Lagrange multipliers, subject to \alpha_i \ge 0; equation (6) has to be minimized with respect to w and b and maximized with respect to the \alpha_i. The objective function is thus translated into:

p^* = \min_{w,b} \theta(w) = \min_{w,b} \max_{\alpha_i \ge 0} L(w, b, \alpha).  (7)

Classical Lagrange duality enables the primal problem to be transformed into its dual problem when the Karush-Kuhn-Tucker (KKT) condition is satisfied. Hence, when equation (7) fulfills the KKT condition \sum_{i=1}^{n} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right] = 0, equation (7) is equal to its dual:

d^* = \max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha).  (8)

It is the KKT condition that determines that only the SVs act on the extremum of the objective function. The minimum of the Lagrangian L with respect to w and b is given by:

\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0.  (9)

Hence, from equations (6), (9) and the KKT condition, the dual problem d^* becomes:

\theta(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j,  (10)

which is a convex quadratic optimization problem. Supposing that the \alpha_i are the solution of equation (10), X_r is a support vector belonging to the positive class and X_s one belonging to the negative class, the optimal solution w and b can be expressed as follows:

w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad b = -\frac{1}{2} w^T (X_r + X_s).  (11)

Hence, from equations (2), (9) and (11), the final decision function is:

f(x) = \mathrm{sgn}\Big[ \sum_{\mathrm{SVs}} \alpha_i y_i \, x_i^T x + b \Big].  (12)
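The derivation can be checked numerically. The sketch below (our illustration, not the authors' code) maximizes the dual (10) under the constraints from (9) for a tiny made-up dataset with scipy, recovers w from (9), and computes b by averaging y_i - w^T x_i over the support vectors (an equivalent alternative to the midpoint formula (11)).

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (invented for illustration).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [2.0, 0.5], [3.0, 1.0], [4.0, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(a):
    # Negated dual objective (10): minimize -(sum(a) - 0.5 a^T Q a).
    return -(a.sum() - 0.5 * a @ Q @ a)

cons = ({"type": "eq", "fun": lambda a: a @ y},)  # sum_i alpha_i y_i = 0, eq. (9)
bnds = [(0, None)] * len(y)                       # alpha_i >= 0
res = minimize(neg_dual, np.zeros(len(y)), bounds=bnds, constraints=cons)

alpha = res.x
w = ((alpha * y)[:, None] * X).sum(axis=0)        # eq. (9): w = sum alpha_i y_i x_i
sv = alpha > 1e-6                                 # support vectors have alpha > 0
b = np.mean(y[sv] - X[sv] @ w)                    # from y_i (w.x_i + b) = 1 on SVs
print("w =", w, "b =", b)
print("margins:", y * (X @ w + b))                # all >= 1 up to solver tolerance
```

Only a handful of the alpha_i come out nonzero, which is exactly the statement that only the SVs act on the extremum.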
Case of Non-linearly Separable

In addition to linearly separable problems, there are many more non-linear cases in real life. In the example shown in Figure 4, no line can classify the two classes well, only curves.

SVM shows its superiority on non-linear problems by mapping the input vectors from a low-dimensional feature space to a high-dimensional feature space, in which it hunts for a linear optimal separating hyperplane.
It has been demonstrated that this does not incur a large storage and computational expense [2].

The following is an example of the mapping process. Define a curve in two-dimensional space as:

g(x) = C_0 + C_1 x + C_2 x^2.

Map it into a higher-dimensional space as follows:

y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}, \qquad a = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} C_0 \\ C_1 \\ C_2 \end{bmatrix}.

Then the primal function is translated into:

g(x) \;\Rightarrow\; f(y) = \langle a, y \rangle.

This is a linear function, which means the primal non-linear problem has been converted into a linearly separable one. SVM employs the same idea, mapping w and x into a high-dimensional space to obtain the new w' and x':

f(x) = \langle w, x \rangle + b \;\Rightarrow\; g(x') = \langle w', x' \rangle + b.

Hence, the new decision function can be defined as follows:

g(x') = \sum_{i=1}^{n} \alpha_i y_i \langle x_i', x' \rangle + b,  (13)

where the dot product is the core operation. According to the existence theorem of kernel functions, for any sample set K there is a map F under which f(k) is linearly separable in the high-dimensional space. Based on this theorem, SVM employs a kernel function during the mapping process and realizes in the low-dimensional space the dot product that would otherwise have to be computed in the high-dimensional space. The decision function with a kernel function is:

f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b.  (14)

Different kernel functions can be chosen to construct different types of SVM; the following are a few commonly used kernel functions:

(1) Polynomial:

K(x, x_i) = (x \cdot x_i)^q, \quad q = 1, 2, \ldots;  (15)

(2) Gaussian Radial Basis Function:

K(x, x_i) = \exp\Big( -\frac{\|x - x_i\|^2}{2\sigma^2} \Big);  (16)

(3) Multi-Layer Perceptron:

K(x, x_i) = \tanh\big( b (x \cdot x_i) - c \big).  (17)
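As a quick illustration (ours, not from the paper), the three kernels (15)-(17) can be written directly as functions of two vectors; the parameter values q, sigma, b and c below are arbitrary assumptions.

```python
import numpy as np

def polynomial_kernel(x, xi, q=2):
    # Eq. (15): K(x, xi) = (x . xi)^q
    return np.dot(x, xi) ** q

def gaussian_rbf_kernel(x, xi, sigma=1.0):
    # Eq. (16): K(x, xi) = exp(-||x - xi||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

def mlp_kernel(x, xi, b=1.0, c=0.5):
    # Eq. (17): K(x, xi) = tanh(b (x . xi) - c); note this is a valid
    # (positive semi-definite) kernel only for some choices of b and c.
    return np.tanh(b * np.dot(x, xi) - c)

x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(polynomial_kernel(x1, x2),
      gaussian_rbf_kernel(x1, x2),
      mlp_kernel(x1, x2))
```

Swapping the kernel changes only how the dot product in (13) is evaluated; the rest of the training procedure is unchanged, which is why libraries expose the kernel as a single interchangeable parameter.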
Case of Non-separable

In the last circumstance, some points belonging to the positive class may be classified into the negative class. Put another way, there is no hyperplane that can separate the points of the different categories exactly. Usually these error points are regarded as noise that can be ignored; however, machines cannot deal with error points the way humans do. Thus, Cortes [1] introduced slack variables \xi_i \ge 0 (a measure of the misclassification error) and a punishment coefficient C (the emphasis placed on outliers) to offset the effect of noise, and the constraint y_i(w^T x_i + b) \ge 1, i = 1, 2, \ldots, n is modified to:

y_i(w^T x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, n.

C is a given value subject to the above constraint: too large a C may lead to over-learning (overfitting), while too small a C may lead to under-learning (underfitting). Equations (5) and (6) are changed into:

\Phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i,  (18)

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(w^T x_i + b) - (1 - \xi_i) \right].  (19)

After the derivation, the decision function for this case is still equation (12).

In summary, SVM is a powerful classifier, suitable for every case of classification with the same decision function, high classification accuracy and small computation.
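A small sketch of the role of C (our illustration, on assumed synthetic data): with scikit-learn's SVC, a small C tolerates slack variables cheaply while a large C punishes every violation, exactly the trade-off in (18).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters: no hyperplane separates them exactly.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# C is the punishment coefficient of eq. (18): small C risks underfitting,
# large C risks overfitting to the noisy points.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```

As C grows, fewer points are allowed inside the margin, so the number of support vectors typically shrinks while the training accuracy creeps up.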
3 THE OPTIMIZATION OF SVM ALGORITHM

SVM shows its superiority in many ways; however, its high classification accuracy and small computation are owed to careful treatment of the quadratic programming problem and of parameter selection. Thus, this section introduces some optimization algorithms for SVM.

Algorithms for Quadratic Programming Optimization

From the implementation point of view, training an SVM is equivalent to solving a linearly constrained Quadratic Programming (QP) problem [11]. The QP problem becomes challenging when the number of data points exceeds a few thousand, which is often the case in practical applications. Thus, much research on this difficult QP problem has been carried out.

A comparison of three commonly used QP optimization algorithms is shown in Table 1.
Table 1: Comparison of QP Optimization Algorithms

Chunking algorithm [9]. Introduction: breaks the large QP problem into a series of smaller QP problems. Advantages: reduces the storage requirement. Disadvantages: relies heavily on the number of SVs; has no effect on large data sets with a high percentage of SVs. Improvements: double chunking technique for solving large-scale problems [10].

Decomposition algorithm [11]. Introduction: breaks the large QP problem into a series of smaller QP problems (the splitting differs from the chunking algorithm); its main work is selecting a suitable working set. Advantages: reduces the training cost. Disadvantages: its results depend heavily on the choice of working set. Improvements: new methods for working set selection [12]; two new decomposition algorithms for training bound-constrained SVMs [13].

Sequential Minimal Optimization (SMO) [14]. Introduction: a special case of the decomposition algorithm; instead of using numerical QP as an inner loop as the former methods do, SMO uses an analytic QP step. Advantages: much faster than the former methods; reduces the storage requirement efficiently. Disadvantages: still time-consuming for large-scale problems; training slows when large margin values are used. Improvements: methods for solving large problems [15]; methods for speeding up training efficiency [16].
From this comparison, the superiority of SMO is apparent, and it is the most widely used algorithm for SVM nowadays. But for large-scale problems, it still needs further improvement.

Parameter Optimization for SVM

SVM involves many parameters (slack variables, the penalty parameter and the kernel parameters), and the selection of these parameters strongly affects the efficiency of SVM. Traditional optimization methods include cross-validation [17] and the grid algorithm. The former improves classification accuracy at the price of heavy computation and low efficiency. The latter is a simple method and easy to use; however, it is sensitive to the initial values of the parameters and is time-consuming, with low accuracy and intensive computation. To address such defects, many new optimization methods have been proposed, collectively referred to as swarm intelligence optimization.
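As a concrete illustration of the traditional grid approach with cross-validation (our sketch, using scikit-learn's GridSearchCV and the iris dataset as stand-ins; the grid values are arbitrary assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustive grid over the penalty parameter C and the RBF width gamma
# (gamma = 1 / (2 sigma^2) in the notation of eq. (16)), each candidate
# scored by 5-fold cross-validation.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Every grid cell costs a full cross-validated training run, which is exactly the "intensive computation" criticized above and the motivation for the swarm methods of Table 2.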
Table 2 compares these swarm intelligence optimization methods. PSO and DE have shown their own superiorities: PSO has been combined with many other improvement algorithms to boost its performance, and DE, a newer hot spot with significant advantages, still needs much further research.
Table 2: Comparison of Swarm Intelligence Optimization

Ant colony optimization algorithm (ACO) [18]. Introduction: an evolutionary algorithm based on population optimization, also called a heuristic-search swarm intelligence algorithm. Advantages: strong robustness and strong global search ability; not sensitive to the initial values of the parameters. Disadvantages: slow convergence; falls into local optima easily.

Genetic algorithms (GAs) [19]. Introduction: follow Darwin's evolutionary thought; reach the optimal value through a series of operations: initializing the population, calculating the fitness by a fitness function, selection, crossover, mutation and so on. Advantages: high efficiency and global optimization; reduced reliance on initial value selection. Disadvantages: complex; falls into partial optima easily; not suited to multi-peak problems.

Particle Swarm Optimization (PSO) [20]. Introduction: an evolutionary algorithm newer than GAs, which originated in the foraging behavior of birds. Advantages: has GAs' advantages while being simpler; higher optimization efficiency and easy to implement. Disadvantages: premature convergence; not suitable for multi-peak problems; falls into partial optima easily.

Differential evolution (DE) [21]. Introduction: arguably one of the most powerful stochastic real-parameter optimization algorithms in current use; it has drawn the attention of many researchers all over the world and generated many variants of the basic algorithm with improved performance. Advantages: simpler and faster; relatively few control parameters; strong robustness and stronger global search ability; can solve multi-peak problems. Disadvantages: premature convergence; weak local search ability.
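To make the PSO row of Table 2 concrete, here is a minimal PSO loop tuning C and gamma of an RBF-kernel SVM by cross-validation score. This is our sketch, not the algorithm of [20]: the swarm size (10), iteration count (20), inertia (0.7) and acceleration coefficients (1.5) are assumed values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(1)

def fitness(p):
    # p = (log10 C, log10 gamma); fitness = mean 5-fold CV accuracy.
    clf = SVC(kernel="rbf", C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(clf, X, y, cv=5).mean()

n, lo, hi = 10, np.array([-2.0, -4.0]), np.array([3.0, 1.0])
pos = rng.uniform(lo, hi, (n, 2))
vel = np.zeros((n, 2))
pbest, pfit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pfit.argmax()].copy()

for _ in range(20):                      # a few PSO iterations
    r1, r2 = rng.random((n, 2)), rng.random((n, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pfit
    pbest[better], pfit[better] = pos[better], fit[better]
    gbest = pbest[pfit.argmax()].copy()

print("best C = %.3g, gamma = %.3g, CV accuracy = %.3f"
      % (10 ** gbest[0], 10 ** gbest[1], pfit.max()))
```

Unlike the grid, the swarm concentrates its evaluations around promising regions of the (C, gamma) plane, which is the source of the efficiency claimed in Table 2.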

4 VARIANT SUPPORT VECTOR MACHINE

SVM is a strong classifier with very good performance owing to its solid theoretical foundation and good classification results. Nevertheless, SVM shows limitations in solving large-scale problems and multi-classification problems. Therefore, this section presents some new SVMs that improve on the traditional SVM.

Fuzzy Support Vector Machine (FSVM)

Many research results show that SVM is very sensitive to noise and outliers in the training sample due to overfitting [22]. Hence, FSVM [23, 24] was proposed to deal with this problem. Traditional SVM treats all training points uniformly, and each training point belongs to either one class or the other. However, some input points may not be exactly assignable to one of these two classes: some are more important and should be fully assigned to one class so that SVM can separate these points more correctly, while some data points corrupted by noise are less meaningful and should be discarded. The key point of FSVM is that it applies a fuzzy membership (an importance weight) to each sample in the training set, such that different samples make different contributions to the construction of the SVM hyperplane. The fuzzy membership enhances SVM by reducing the effect of outliers and noise in the data points. Some newer methods assign smaller or even zero weights to noisy and outlier data [25, 26], and a new type of FSVM [27], combined with some other algorithms, was proposed to increase the separation between meaningful training points and outliers. Future work on FSVM will focus on the selection of the fuzzy membership function.
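The membership-weighting idea can be approximated with scikit-learn's per-sample weights, which scale each sample's error penalty much as FSVM scales C by the membership s_i. The distance-based membership function below is a crude stand-in of our own, not one proposed in [23-27], and the data are invented.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.8, (40, 2)), rng.normal(2, 0.8, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])
X[0] = [2.5, 2.5]                 # inject an outlier into the negative class

# A crude membership function: weight decays with distance from the
# sample's own class mean, so outliers contribute little to the hyperplane.
weights = np.empty(len(X))
for label in (-1, 1):
    mask = y == label
    d = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
    weights[mask] = 1.0 - d / (d.max() + 1e-12)

clf = SVC(kernel="linear", C=10.0)
clf.fit(X, y, sample_weight=weights)  # weighted penalties, cf. FSVM's s_i * C
print("outlier weight: %.3f" % weights[0])
```

The injected outlier receives a weight near zero, so it barely drags the hyperplane, whereas an unweighted fit with large C would bend toward it.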
Twin Support Vector Machine (TWSVM)

TWSVM was proposed by Jayadeva et al. [28] for the binary classification problem, and it has become a hot research topic in machine learning over the last few years [29]. TWSVM generates two nonparallel hyperplanes by solving a pair of smaller-sized QPPs, such that each hyperplane is closer to one class and as far as possible from the other. The strategy of solving two smaller-sized QPPs rather than a single larger-sized QPP makes the learning speed of TWSVM faster than that of the standard SVM in theory. Owing to TWSVM's lower computational complexity and better generalization ability, it has been studied extensively and developed rapidly in the last few years. Many variants of TWSVM have been proposed, such as the twin bounded support vector machine [30], nonparallel SVM [31], least-squares TWSVM [32], twin support vector regression [33], etc. Recently, a novel twin support vector machine with the pinball loss (Pin-TSVM) was proposed, which uses the quantile distance to reduce noise sensitivity and works fast [34].
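A compact way to see the twin-hyperplane idea is the least-squares variant (in the spirit of the least-squares TWSVM [32]), where each of the two problems reduces to a linear system instead of a QPP. The following sketch is our illustration, with assumed data and penalty values, not the authors' formulation.

```python
import numpy as np

def lstsvm_fit(A, B, c1=1.0, c2=1.0):
    """Least-squares twin SVM sketch: one hyperplane per class, each found
    by solving a small linear system (squared slack replaces the QPP)."""
    e1, e2 = np.ones((len(A), 1)), np.ones((len(B), 1))
    H, G = np.hstack([A, e1]), np.hstack([B, e2])
    # Plane 1: close to class A, pushed to distance >= 1 from class B.
    u1 = -c1 * np.linalg.solve(H.T @ H + c1 * G.T @ G, G.T @ e2).ravel()
    # Plane 2: close to class B, pushed to distance >= 1 from class A.
    u2 = c2 * np.linalg.solve(G.T @ G + c2 * H.T @ H, H.T @ e1).ravel()
    return (u1[:-1], u1[-1]), (u2[:-1], u2[-1])

def lstsvm_predict(x, planes):
    (w1, b1), (w2, b2) = planes
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)   # distance to class-A plane
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)   # distance to class-B plane
    return 1 if d1 < d2 else -1                  # assign to the nearer plane

rng = np.random.default_rng(3)
A = rng.normal([0, 0], 0.5, (30, 2))   # class +1 samples
B = rng.normal([2, 2], 0.5, (30, 2))   # class -1 samples
planes = lstsvm_fit(A, B)
print(lstsvm_predict(np.array([0.2, -0.1]), planes),
      lstsvm_predict(np.array([2.1, 1.8]), planes))
```

Each system involves only one class's matrix plus a penalty term, which is why the twin formulation trains faster than one large SVM in theory.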
Multi-class Support Vector Machine

SVM was originally designed for binary classification, while in real life most applications involve multi-classification problems. Multi-class SVM [35] follows two approaches: (1) consider all data in one optimization formulation directly; (2) construct and combine several binary classifiers. We typically construct a multi-class classifier by combining several binary classifiers, because this approach has made great progress in algorithm research, whereas the former incurs high computational complexity and achieves low efficiency in applications [36]. Multi-class problems require the number of variables to be proportional to the number of classes, which means that solving a multi-class problem consumes more computation. For such problems, some novel methods have been proposed. Lee et al. [37] proposed a multi-category SVM, which is a rightful extension from the binary SVM to a multi-category SVM.

Madzarov et al. [38] presented a novel architecture of SVM classifiers based on a binary decision tree, which takes advantage of both the efficient computation of the decision tree architecture and the high classification accuracy of SVMs. Recently, a novel multiple projection twin support vector machine (PTSVM) was proposed [39], which greatly improves the generalization ability.
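Approach (2) is what most libraries implement. The sketch below (ours, on the iris dataset as a stand-in) builds one-vs-one and one-vs-rest ensembles of binary SVMs with scikit-learn; note that scikit-learn's SVC already applies a one-vs-one scheme internally when given more than two classes.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

# Strategy (2): combine several binary classifiers.
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)   # k(k-1)/2 = 3 SVMs
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)  # k = 3 SVMs
print("one-vs-one training accuracy: ", ovo.score(X, y))
print("one-vs-rest training accuracy:", ovr.score(X, y))
```

One-vs-one trains more but smaller classifiers (each on two classes' data), while one-vs-rest trains fewer classifiers that each see the whole dataset; this is the computation/efficiency trade-off discussed above.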
Distributed Support Vector Machine (DSVM)

DSVM is based on connected networks. The network is divided into several sub-regions by some strategy, and an SVM is trained in each sub-region serially or in a distributed fashion. Finally, the sub-regions obtain the approximate solution or the global optimal solution of the original SVM problem through some communication. Navia-Vazquez et al. [40] proposed a true DSVM, as opposed to a parallel SVM, which includes two distributed schemes: a distributed packet method for propagating raw data, and a more detailed distributed semiparametric SVM. It reduces the total amount of information transferred between nodes and provides a privacy protection mechanism for information sharing. Then, Lu et al. [41] presented a strongly connected network, Flouri et al. [42] proposed a DSVM over wireless sensor networks, and Kim et al. [43] later proposed an improvement on this method. Recently, a distributed semi-supervised SVM [44], a resilient DSVM [45] and some other new DSVM methods have been proposed, which differ from existing parallel algorithms and show good performance and experimental results. The various sub-problems only need to transfer hull information to each other, without a central processing unit. On account of such significant advantages, DSVM has become a popular research direction.
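The following cascade-style sketch conveys the flavor of the distributed idea; it is our simplification, not the algorithm of [40-45]. Each simulated node trains on its own partition and forwards only its support vectors (the "hull information"), and a final SVM is trained on their union.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
parts = np.array_split(np.arange(len(X)), 4)        # 4 simulated nodes

sv_X, sv_y = [], []
for idx in parts:
    local = SVC(kernel="linear", C=1.0).fit(X[idx], y[idx])
    sv = local.support_                             # indices of local SVs
    sv_X.append(X[idx][sv])
    sv_y.append(y[idx][sv])

# Only the support vectors travel between nodes, not the raw partitions.
cascade = SVC(kernel="linear", C=1.0).fit(np.vstack(sv_X), np.hstack(sv_y))
central = SVC(kernel="linear", C=1.0).fit(X, y)
print("cascade accuracy:     %.3f" % cascade.score(X, y))
print("centralized accuracy: %.3f" % central.score(X, y))
```

Because an SVM's solution depends only on its support vectors, the cascade typically lands close to the centralized solution while exchanging a small fraction of the data.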
It is worth mentioning that these new types of SVM achieve better and better performance, like the above-mentioned DSVM. An even newer method, called SVM classification trees [46], is designed for multi-class problems. Given these new methods' significant advantages, they are bound to be future research hot spots.

5 APPLICATIONS OF SVM

SVM has a strong advantage on the basis of theory: it guarantees that the extremum found is the global optimal solution rather than a local minimum. In real applications, SVM has been successfully used for classification, regression and forecasting in all areas of life. Especially, it shows significant ability in pattern recognition, such as text recognition [47], handwriting recognition [48], face recognition [49], etc. In the economic field, SVM is used for real estate forecasting [50], stock forecasting [51], etc. In the transportation field, SVM is applied to identify license plate numbers [52] and is trained to distinguish driving performance from physiological measurements [53], etc. In the medical field, SVM has been used in automatic medical decision support systems for medical images [54], prediction of the substrate specificities of membrane transport proteins [55], gene classification [56], etc.

Beyond the applications in traditional areas, SVM has also been applied in the area of mobile multimedia, mainly in the fields of multimedia retrieval and multimedia event detection (MED). As for multimedia retrieval, SVM was applied to music and image retrieval early on. In 2001, Tong et al. [57] proposed the use of a support vector active learning algorithm for conducting effective relevance feedback in image retrieval, which can quickly learn a boundary that separates the images and achieves high search accuracy. Mandel et al. (2006) [58] described a system for performing flexible music similarity queries using SVM active learning, likewise achieving high search accuracy. As the demand for multimedia grows and digital technology evolves, searching and organizing large-scale datasets becomes a challenging task; thus, some improved approaches have been presented in succession. For example, Yildizer et al. [59] proposed an extremely fast content-based image retrieval (CBIR) system which uses a Multiple Support Vector Machines Ensemble; CBIR systems have become very popular for browsing, searching and retrieving images from large databases of digital images with minimal human intervention. Aryafar et al. [60] introduced a classifier fusion of multimodal audio and lyrics data to address the limitations of single-modality classification. As for multimedia event detection, SVM has been applied to MED based on audio or video in recent years. Wang et al. (2016) [61] proposed classifying clips for events using a "recurrent SVM", and Tzelepis et al. (2016) [62] used a kernel SVM with isotropic Gaussian sample uncertainty for video event detection; both achieved good results. Beyond those, SVM can also provide real-time multimedia quality assessment [63], mixed-type audio classification [64], etc. In the future, with the development of SVM and its extensions, SVM may be even more widely used in the areas of mobile multimedia and achieve much more success.

6 CONCLUSION

SVM rests on statistical learning theory, which systematically studies the problem of machine learning, especially in the case of finite samples. It is based on the VC-dimension theory and the structural risk minimization principle, and it can effectively overcome the curse of dimensionality with the kernel function. As a result of these significant advantages, it has been successfully applied in many areas with good efficiency. With the development of technology, big data has become mainstream; however, SVM is at a disadvantage in dealing with large-data problems. Hence, much further work remains: further improvement of parameter selection for the SVM algorithm, integration of SVM with other disciplines, etc.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (61402109, 61370078, 61502102 and 61502103); the Natural Science Foundation of Fujian Province (2015J05120, 2016J05149, 2017J01737 and 2017J05099); the Fujian Provincial Key Laboratory of Network Security and Cryptology Research Fund (Fujian Normal University) (15008); and the Distinguished Young Scientific Research Talents Plan in Universities of Fujian Province (2015, 2017).

REFERENCES

[1] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[2] M. O. Stitson, J. A. E. Weston, A. Gammerman, V. Vovk, and V. Vapnik. Theory of support vector machines. University of London, 117(827):188–191, 1996.
[3] Richard G. Brereton and Gavin R. Lloyd. Support vector machines for classification and regression. Analyst, 135(2):230, 2010.
[4] Dapeng Wu, Junjie Yan, Honggang Wang, et al. Socially incentive mechanisms for video distribution in device-to-device communications. IEEE Trans. Multimedia, 2017.
[5] Dapeng Wu, Hongpei Zhang, Honggang Wang, et al. Quality-of-protection-driven data forwarding for intermittently connected wireless networks. IEEE Wireless Commun., 22(4):66–73, 2015.
[6] Dapeng Wu, Yanyan Wang, Honggang Wang, et al. Dynamic coding control in social intermittent connectivity wireless networks. IEEE Trans. Vehicular Technology, 65(9):7634–7646, 2016.
[7] Dapeng Wu, Boran Yang, Honggang Wang, et al. An energy-efficient data forwarding strategy for heterogeneous WBANs. IEEE Access, 4:7251–7261, 2016.
[8] Dapeng Wu, Boran Yang, Honggang Wang, et al. Privacy-preserving multimedia big data aggregation in large-scale wireless sensor networks. ACM Trans. Multimedia Computing, Communications and Applications, 12(4s):60:1–60:19, 2016.
[9] Vladimir Naumovich Vapnik and Samuel Kotz. Estimation of Dependences Based on Empirical Data, volume 40. Springer-Verlag New York, 1982.
[10] Fernando Pérez-Cruz, Figueiras-Vidal, et al. Double chunking for solving SVMs for very large datasets. Proceedings of Learning, 2004.
[11] E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing, pages 276–285, 1997.
[12] Chih-Wei Hsu and Chih-Jen Lin. A simple decomposition method for support vector machines. Machine Learning, 46(1-3):291–314, 2002.
[13] Lingfeng Niu, Ruizhi Zhou, Xi Zhao, et al. Two new decomposition algorithms for training bound-constrained support vector machines. Foundations of Computing and Decision Sciences, 40(1):67–86, 2015.
[14] John C. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. In Advances in Kernel Methods: Support Vector Learning, pages 212–223, 1998.
[15] Lijuan Cao, S. Sathiya Keerthi, Chong Jin Ong, et al. Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans. Neural Networks, 17(4):1039–1049, 2006.
[16] Shigeo Abe. Fusing sequential minimal optimization and Newton's method for support vector training. IJMLC, 7(3):345–364, 2016.
[17] Grace Wahba, Yi Lin, and Hao Zhang. Margin-like quantities and generalized approximate cross validation for support vector machines. In Proceedings of the 1999 IEEE Signal Processing Society Workshop, pages 12–20, 1999.
[18] C. B. Liu, X. F. Wang, and F. Pan. Parameters selection and stimulation of support vector machines based on ant colony optimization algorithm. Journal of Central South University, 39(6):1309–1313, 2008.
[19] Stefan Lessmann, Robert Stahlbock, and Sven F. Crone. Genetic algorithms for support vector machine model selection. In IJCNN, pages 3063–3069, 2006.
[20] J. Kennedy and R. Eberhart. Particle swarm optimization. In IEEE Trans. Neural Networks, pages 1942–1948, 1995.
[21] S. Das and P. N. Suganthan. Differential evolution: A survey of the state-of-the-art. IEEE Trans. Evolutionary Computation, 15(1):4–31, 2011.
[22] Yongqiao Wang, Shouyang Wang, and K. K. Lai. A new fuzzy support vector machine to evaluate credit risk. IEEE Trans. Fuzzy Systems, 13(6):820–831, 2005.
[23] Han Pang Huang and Yi Hung Liu. Fuzzy support vector machines for pattern recognition and data mining. Intl. J. of Fuzzy Systems, 4(3):826–835, 2001.
[24] Chun-fu Lin and Sheng-De Wang. Fuzzy support vector machines. IEEE Trans. Neural Networks, 13(2):464–471, 2002.
[25] L. U. Yan-Ling, L. I. Lei, Meng Meng Zhou, et al. A new fuzzy support vector machine based on mixed kernel function. In ICMLC, pages 526–531, 2009.
[26] Wan Mei Tang. Fuzzy SVM with a new fuzzy membership function to solve the two-class problems. Neural Processing Letters, 34(3):209–219, 2011.
[27] O. N. Almasi and M. Rouhani. Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turk J Elec Eng and Comp Sci, 24(1):219–233, 2016.

[28] R. Khemchandani, Suresh Chandra, et al. Twin support vector machines for pattern classification. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(5), 2007.
[29] Divya Tomar and Sonali Agarwal. Twin support vector machine: A review from 2007 to 2014. Egyptian Informatics Journal, 16(1):55–69, 2015.
[30] Yuan Hai Shao, Chun Hua Zhang, Xiao Bo Wang, and Nai Yang Deng. Improvements on twin support vector machines. IEEE Trans. Neural Netw., 22(6):962–968, 2011.
[31] Y. Tian, Z. Qi, X. Ju, Y. Shi, et al. Nonparallel support vector machines for pattern classification. IEEE Trans. Cybernetics, 44(7):1067, 2014.
[32] Yitian Xu, Xianli Pan, Zhijian Zhou, et al. Structural least square twin support vector machine for classification. Applied Intelligence, 42(3):527–536, 2015.
[33] Sugen Chen, Xiaojun Wu, and Renfeng Zhang. A novel twin support vector machine for binary classification problems. Neural Processing Letters, 44:1–17, 2016.
[34] Yitian Xu, Zhiji Yang, and Xianli Pan. A novel twin support-vector machine with pinball loss. IEEE Trans. Neural Networks and Learning Systems, 28(2):359–370, 2017.
[35] Rameswar Debnath, N. Takahide, and Haruhisa Takahashi. A decision based one-against-one method for multi-class support vector machine. Pattern Anal. Appl., 7(2):164–175, 2004.
[36] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, 2002.
[37] Vojtech Franc and Václav Hlavác. Multi-class support vector machine. In Pattern Recognition, volume 2, pages 236–239, 2002.
[38] Gjorgji Madzarov, Dejan Gjorgjevikj, and Ivan Chorbev. A multi-class SVM classifier utilizing binary decision tree. Informatica (Slovenia), 33(2):225–233, 2009.
[39] Chun-Na Li, Yun-Feng Huang, He-Ji Wu, et al. Multiple recursive projection twin support vector machine for multi-class classification. IJMLC, 7(5):729–740, 2016.
[40] A. Navia-Vazquez and E. Parrado-Hernandez. Distributed support vector machines. IEEE Trans. Neural Networks, 17(4):1091–1097, 2006.
[41] Yumao Lu, V. Roychowdhury, and L. Vandenberghe. Distributed parallel support vector machines in strongly connected networks. IEEE Trans. Neural Networks, 19(7):1167–1178, 2008.
[42] K. Flouri, B. Beferull-Lozano, and P. Tsakalides. Distributed consensus algorithms for SVM training in wireless sensor networks. In EUSIPCO, pages 1–5, 2008.
[43] W. Kim, M. S. Stankovic, K. H. Johansson, et al. A distributed support vector machine learning over wireless sensor networks. IEEE Trans. Cybernetics, 45(11):2599–2611, 2015.
[44] S. Scardapane, R. Fierimonte, P. Di Lorenzo, et al. Distributed semi-supervised support vector machines. Neural Netw., 80:43–52, 2016.
[45] Zhixiong Yang and Waheed U. Bajwa. RD-SVM: A resilient distributed support vector machine. In ICASSP, pages 2444–2448, 2016.
[46] Peter De Boves Harrington. Support vector machine classification trees based on fuzzy entropy of classification. Analytica Chimica Acta, 2016.
[47] Xiao-Liang Liu, S. F. Ding, Hong Zhu, et al. Appropriateness in applying SVMs to text classification. Comput Eng Sci, 32(6):106–108, 2010.
[48] Mohamed Elleuch and Monji Kherallah. An improved Arabic handwritten recognition system using deep support vector machines. IJMDEM, 7(2):1–20, 2016.
[49] W. Ouarda, H. Trichili, A. M. Alimi, and B. Solaiman. Face recognition based on geometric features using support vector machines. In Soft Computing and Pattern Recognition, pages 89–95, 2014.
[50] Xibin Wang, Junhao Wen, Yihao Zhang, et al. Real estate price forecasting based on SVM optimized by PSO. Optik - International Journal for Light and Electron Optics, 125(125):1439–1443, 2014.
[51] Wei Shen, Yunyun Zhang, and Xiaoyong Ma. Stock return forecast with LS-SVM and particle swarm optimization. In BIFE, pages 143–147, 2009.
[52] Michael Del Rose and Jack Reed. Support vector machine application on vehicles. In International Symposium on Optical Science and Technology, pages 144–149, 2001.
[53] Huiqin Chen and Lei Chen. Support vector machine classification of drunk driving behaviour. International Journal of Environmental Research and Public Health, 14(1), 2017.
[54] P. Latha et al. SVM based automatic medical decision support system for medical image. JATIT, 66(3), 2014.
[55] L. Li, J. Li, W. Xiao, and Y. Li. Prediction of the substrate specificities of membrane transport proteins based on support vector machine and hybrid features. IEEE/ACM Trans. Computational Biology and Bioinformatics, pages 1–1, 2016.
[56] Bin Yu, Shan Li, and Haijun Liu. A hybrid gene selection method for tumor classification based on genetic algorithm and support vector machine. J COMPUT THEOR NANOS, 12(11):4730–4735, 2015.
[57] Simon Tong. Support vector machine active learning for image retrieval. In ACM International Conference on Multimedia, pages 107–118, 2001.
[58] Michael I. Mandel, Graham E. Poliner, and Daniel P. W. Ellis. Support vector machine active learning for music retrieval. Multimedia Systems, 12(1):3–13, 2006.
[59] Ela Yildizer, Ali Metin Balci, Mohammad Hassan, and Reda Alhajj. Efficient content-based image retrieval using multiple support vector machines ensemble. Expert Systems with Applications, 39(3):2385–2396, 2012.
[60] K. Aryafar and A. Shokoufandeh. Multimodal sparsity-eager support vector machines for music classification. In ICMLA, pages 405–408, 2014.
[61] Yun Wang and Florian Metze. Recurrent support vector machines for audio-based multimedia event detection. In ICMR, pages 265–269, 2016.
[62] Christos Tzelepis, Vasileios Mezaris, and Ioannis Patras. Video event detection using kernel support vector machine with isotropic gaussian sample uncertainty (KSVM-iGSU). In MMM, pages 3–15. Springer, 2016.
[63] Wenbi Rao and Yingshu Li. Real-time multimedia quality assessment based on media delay index and support vector machine. In IEEE CINC'09, volume 1, pages 111–114, 2009.
[64] Lei Chen, Sule Gunduz, and M. Ozsu. Mixed type audio classification with support vector machine. In IEEE ICME, pages 781–784, 2006.


