
Applied Intelligence

https://doi.org/10.1007/s10489-019-01428-1

Multi-view learning with fisher kernel and bi-bagging for imbalanced problem
Zhe Wang1,2 · Yiwen Zhu2 · Zhaozhi Chen2 · Jing Zhang2 · Wenli Du1

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Existing approaches for handling the imbalanced problem are mostly based on discriminative approaches, while little attention is dedicated to mining the probability information provided by generative approaches. Moreover, multi-view learning trains a classifier by combining different representations of the data, which improves the performance of the classifier in imbalanced classification. In this paper, a learning framework consisting of the fisher kernel and Bi-Bagging is proposed for the imbalanced problem. The fisher kernel is employed to integrate the probability information into the pristine features of the data, so the generated fisher vectors contain better discriminatory information. However, the generated fisher vectors may lead to high-dimensional overfitting. The dataset represented by the fisher vectors is therefore processed by Bi-Bagging to generate multi-view data and balanced training subsets, which not only reduces the high dimension of the generated fisher vectors but also promotes the accuracy on the minority instances. In one word, the combination of the fisher kernel and Bi-Bagging makes use of the probability information in the pristine features and generates balanced multi-view training subsets of adequate dimension. The proposed learning framework is independent of specific models, and its base classifier can be replaced by different linear classifiers. Two experimental strategies are implemented on 30 KEEL datasets to validate the effectiveness of the proposed learning framework for imbalanced datasets.

Keywords Fisher kernel · Multi-view learning · Ensemble learning · Imbalanced learning · Pattern recognition

Zhe Wang
wangzhe@ecust.edu.cn

Jing Zhang
jingzhang@ecust.edu.cn

Wenli Du
wldu@ecust.edu.cn

1 Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, China

2 Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, People's Republic of China

1 Introduction

Imbalanced classification is a common problem in pattern recognition. In an imbalanced problem, the size of one class is far smaller than that of the others. Commonly, the class with the smaller size is referred to as the positive class, and the class with the larger size is defined as the negative class. The positive class is usually the main objective of the imbalanced classification task, and due to the scarcity of positive instances the classification results usually suffer in terms of the accuracy on the positive class. Moreover, pattern recognition usually consists of two kinds of methodologies, namely the generative approach and the discriminative approach [4]. The generative approach concentrates on building probability density functions, while the discriminative approach focuses directly on the classification task. As the discriminative approach makes decisions based on posterior knowledge, its computational cost is lower than that of the generative approaches and its classifiers usually perform better than the generative ones. In other words, the generative approaches pay attention to the process and mechanism behind a classifier model, while the discriminant approaches learn the decision model of a data distribution directly. Thus, a learning framework that combines the generative approach with the discriminative approach to handle the imbalanced problem is proposed in this paper.

The existing approaches for handling the imbalanced problem include the sampling method, the cost function, and the ensemble approach. The sampling method re-balances the size between the classes to make the decision boundary fair on each class, either by cutting partial instances from the negative class or by adding instances similar to the positive class.
The cost function adjusts the cost of inaccurately classified instances belonging to different classes. Usually, the size difference between classes leads to the loss on the minority class being insufficiently minimised. Hence, the positive instances can be strengthened by either increasing the cost of misclassified positive instances or decreasing the cost of misclassified negative instances. The ensemble approach combines the sub-classifiers trained from the subsets, which ensures that the sub-classifiers are locally balanced while preserving the diversity of the entire dataset.

The conventional imbalanced algorithms are proposed based on single-view learning. The ordinary single-view methodologies concatenate the multi-view data into one single-view representation to satisfy the training setting [45]. The concatenation leads to overfitting because each view has a particular statistical property. Multi-view learning describes the identical objective by using multiple attribute sets. The attribute sets can make up for the deficiency that a single attribute set cannot demonstrate the entire property of the objective. The different attribute sets provide the training process with different discriminant information, and they can align with each other through the information from corresponding views. Therefore, the overall accuracy of the classification can be improved by multi-view learning.

Inspired by multi-view learning and the generative approach, we aim to exploit a method that involves both of them. Typically, the imbalanced datasets involve recognizing a few of the instances. The minority instances are hard to learn because they cannot provide enough statistical information. Multi-view learning plays the role of generating new views to increase the training data, so the over-bias of the decision boundary can be fixed with the generated views. Since the generative approach induces the distribution parameters of the dataset, a feature representation that fuses the distribution information has more precise discriminative performance than the vanilla feature representation. Therefore, multi-view learning is combined with the generative approach to improve the performance of the classification task for imbalanced data.

This paper proposes a distinct method for handling the imbalanced classification task. The method represents the dataset with a Gaussian Mixture Model (GMM). The parameters of the GMM are used to construct the fisher vector, fusing the generative approach with the discriminant approach. The fisher vector has better discriminatory features than normal features, so it can induce more separable boundaries for the classification task. The length of the fisher vector is directly proportional to the number of components of the GMM, so an oversized fisher vector may cause overfitting. Therefore, the fisher vectors are split into different parts, each of which includes part of the whole fisher features. These separated parts are different views; they reduce the risk of overfitting while preserving the diversity of the features, so that not too much information of the data is lost. The different views provide complementary information for each other [45]. Moreover, the dataset is sampled multiple times to generate re-balanced subsets for handling the imbalanced problem. The subsets are used to train corresponding sub-classifiers. Every sub-classifier represents part of the whole dataset, so they are combined to describe the whole dataset. In other words, both the features and the instances of the data are handled by bagging; we denote the double bagging as Bi-Bagging. The sub-classifiers are finally connected by an ensemble. We name the method Multi-view Learning with Fisher Kernel and Bi-Bagging (MLFKBB). The contributions of this paper are summarized as follows:

– This paper combines the discriminant approach with the generative approach by introducing the fisher kernel to the imbalanced problem. The fisher kernel brings probability information to the pristine feature representation by fusing the statistical parameters. The fisher features denoted by the statistical parameters are better discriminatory features. The statistical parameters enhance the representation of the minority instances when the number of minority instances is extremely smaller than that of the majority instances.
– This paper handles the imbalanced classification task by combining Bi-Bagging with the fisher kernel. The fisher kernel maps the pristine features into a new feature space for a discriminant representation, and Bi-Bagging reduces the high-dimensional overfitting and re-balances the training set so that the boundary does not over-bias to one side of the dataset.
– This paper strategically develops and evaluates a flexible learning framework for handling the imbalanced problem. The learning framework is an ensemble method independent of the specific classifier. In practice, the base classifier varies according to the classification problem. Thus the MLFKBB provides a flexible learning framework for handling the imbalanced problem.

The rest of this paper is organized as follows. Section 2 reviews the existing imbalanced learning methods and the multi-view learning approaches. Section 3 presents the architecture of the MLFKBB. Section 4 reports all the experimental results. Section 5 concludes this paper.

2 Related work

In this section, the existing methods for the imbalanced problem and the multi-view learning approaches are reviewed.
The methods that handle the imbalanced problem are mainly categorized into three classes: the sampling methodology, the cost function, and the ensemble method [16].

The sampling methodology aims to change the distribution of samples so that the model treats the minority instances fairly. The distribution of samples is adjusted by either increasing the minority class or decreasing the majority class. Random Under-Sampling (RUS) and Random Over-Sampling (ROS) are common ideas for adjusting the distribution. They balance the size of the training data by either randomly wiping majority instances off or adding minority instances. A different over-sampling method was proposed to improve on the randomness of ROS: the Synthetic Minority Oversampling Technique (SMOTE) [8] generates similar instances located around real instances to ensure that the oversampled instances follow the original distribution when the number of instances increases. Moreover, improvements on the SMOTE method have been proposed [6, 15]. The CBO algorithm uses the K-means clustering method [21] to generate instances for imbalanced data. The BEBS [43] over-samples the minority samples near multiple SVMs' soft margins to revise the initial decision boundary towards the correct direction for imbalanced data.
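As a rough illustration of the interpolation idea behind SMOTE described above (a simplified sketch under our own assumptions, not the reference implementation of [8]), each synthetic point is placed on the segment between a minority instance and one of its k nearest minority neighbours; the function name and parameters are ours.

import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    randomly chosen minority instances and their nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]      # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                          # interpolation factor in [0, 1)
        new_points.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(new_points)

Because every synthetic sample lies between two real minority samples, the oversampled set roughly follows the original minority distribution instead of merely duplicating points as ROS does.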
The cost function improves the imbalanced problem by weighting the samples and the error criteria [41]. The objective function is modified by the weights, which focus the objective function on the minority class. Masnadi-Shirazi et al. proposed the support vector machine (SVM) with a cost-sensitive method [29], and improved cost-sensitive SVMs have also been proposed [12, 23]. Maloof observed a decision-tree threshold-moving scheme [28]. Zhu et al. [48] proposed a variant of the nearest neighbour method that estimates the weight of samples using the gravitational force of samples for imbalanced classification.

The ensemble methods iteratively learn the training data until the properties of the data are learned completely. Every iteration induces a base classifier that describes the local information of a part of the data. The existing ensemble algorithms include Random Forest [17], EasyEnsemble [27], cost-sensitive AdaBoost [11], XGBoost [9] and so on. There are also hybrid ensemble methods for the imbalanced problem. RUSBoost is an ensemble method that preprocesses the dataset with random undersampling [37]. The WPOBoost algorithm improves performance for the imbalanced problem differently from RUSBoost [26]. A method that combines bagging with undersampling is proposed by Sun et al. [40]. The BPSO-AdaBoost-KNN algorithm is a synthetical ensemble method that consists of feature selection and boosting [14].

Multi-view learning can mainly be summarized into three groups: 1) multiple kernel learning (MKL), 2) co-training, and 3) subspace learning [45]. Multiple kernel learning involves the representation with different views. The views are mapped by different kernels, and they represent the instance in different kernel spaces. Lanckriet et al. [24] regarded MKL as a semi-infinite programming problem. Bach et al. [3] handled MKL in the view of a second-order cone program problem and came up with an SMO method to efficiently induce the optimal solution. Sonnenburg et al. [38] proposed an efficient semi-infinite linear program and enabled MKL to handle large-scale problems. Rakotomamonjy et al. [35, 36] explored an adaptive 2-norm regularization formulation with SimpleMKL. Szafranski et al. [42], Xu et al. [46] and Subrahmanya and Shin [39] combined MKL and group-LASSO based on the model group structure.

Co-training trains alternately on distinct views to maximize the agreement of the outputs of the views, and many variants have been developed. Nigam and Ghani [33] improved the generalization of expectation-maximization (EM) by combining the probabilistic labels and unlabeled data. Muslea et al. [30–32] improved active learning with co-training and developed a robust semi-supervised learning method. Yu et al. [47] proposed a method that implements co-training with a Bayesian undirected graphical model and a method that co-trains with kernels for Gaussian process classifiers. Wang and Zhou [44] proposed a learning framework that fuses the combinative label propagation over two views into graph-based and disagreement-based semi-supervised learning.

Subspace learning approaches aim to explore a latent subspace that the multiple views share together. The subspace is generated from the original space. The dimensionality of the subspace is lower than that of the original space, so subspace learning is fast and effective for overcoming the curse of dimensionality. The classification and clustering tasks can be conducted directly on the subspace. Canonical correlation analysis (CCA) [19] and kernel canonical correlation analysis (KCCA) [1] find basis vectors for two views by maximizing the dependency between the two views, so the basis vectors span a subspace that maximizes the correlation between the views. Furthermore, multi-view clustering and regression have been developed [7, 22]. Here, the MLFKBB method is expected to explore a new way of combining the existing methods for the imbalanced problem. The details of MLFKBB are demonstrated explicitly in the next section.

3 Bi-bagging and fisher kernel

In this section, we introduce the Bi-Bagging and the fisher kernel in three parts. The first subsection introduces the construction of the fisher vector. The second subsection describes the Bi-Bagging and the process of generating multi-view training sets. The last subsection depicts the procedure of the training and classification.
3.1 Fisher kernel and fisher vector

The instances can be regarded as points located in the Euclidean space for the discriminative model, and these points can be classified by the decision boundary. However, the discriminative model cannot induce the probability information directly. The fisher kernel takes advantage of the merits of both the generative and the discriminative models [20]. It can integrate the probability information into the discriminative models, so that a classical discriminative model like the SVM can be applied directly to classification based on the probability information.

The original dataset with N instances is represented by a GMM, and λ is the set of parameters of the GMM, λ = {w_i, μ_i, Σ_i, i = 1, ..., K}, where the w_i are the mixture weights, the μ_i are the means of the Gaussians, and the Σ_i are the covariances. For simplicity, the covariances are assumed to be diagonal. In this study, the parameters of the GMM are calculated by the Expectation Maximization (EM) algorithm. Let L(X|λ) = log p(X|λ). We expand L(X|λ) as follows:

L(X|\lambda) = \sum_{n=1}^{N} \log p(x_n|\lambda),  (1)

where the probability that the instance x_n is generated by the GMM is

p(x_n|\lambda) = \sum_{i=1}^{K} w_i p_i(x_n|\lambda),  (2)

the weights are subject to the constraint

\sum_{i=1}^{K} w_i = 1,  (3)

and the Gaussian components are defined as

p_i(x|\lambda) = \frac{\exp\left(-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)\right)}{(2\pi)^{D/2} |\Sigma_i|^{1/2}},  (4)

where D denotes the dimensionality of the instances and |·| is the determinant operator. Differentiating (1) with respect to λ gives

\frac{\partial L(X|\lambda)}{\partial \mu_i^d} = \sum_{n=1}^{N} \gamma_n(i) \, \frac{x_n^d - \mu_i^d}{(\sigma_i^d)^2},  (5)

\frac{\partial L(X|\lambda)}{\partial \sigma_i^d} = \sum_{n=1}^{N} \gamma_n(i) \left[ \frac{(x_n^d - \mu_i^d)^2}{(\sigma_i^d)^3} - \frac{1}{\sigma_i^d} \right].  (6)

Normalizing the input vectors is important to the discriminant model. Hence the fisher information matrix F_λ is used to normalize the input vectors, and the normalized gradient vector is represented as

F_\lambda^{-1/2} \nabla_\lambda L(X|\lambda).  (7)

The F_λ is computationally difficult, so the diagonal approximation is used as an alternative. The diagonal components of the approximate information matrix are as follows:

f_{\mu_i^d} = \frac{N w_i}{(\sigma_i^d)^2},  (8)

f_{\sigma_i^d} = \frac{2 N w_i}{(\sigma_i^d)^2}.  (9)

Equations (8) and (9) are applied to (7) in order to construct the subvectors with respect to μ and σ, respectively. Hence, the subvectors are constructed as

F_{\mu_i}(x) = \frac{1}{\sqrt{w_i}} \, \gamma_n(i) \, \frac{x - \mu_i}{\sigma_i},  (10)

F_{\sigma_i}(x) = \frac{1}{\sqrt{2 w_i}} \, \gamma_n(i) \left[ \frac{(x - \mu_i)^2}{(\sigma_i)^2} - 1 \right].  (11)

The fisher vector is the concatenation of all F_{μ_i}(x) and F_{σ_i}(x). It is defined as φ(x) = (F_{μ_1}(x), ..., F_{μ_K}(x), F_{σ_1}(x), ..., F_{σ_K}(x)). Here γ_n(i) represents the posterior of the i-th Gaussian component, \gamma_n(i) = \frac{w_i g_i(x,\lambda)}{\sum_{j=1}^{K} w_j g_j(x,\lambda)}.

To represent an instance with several descriptors, for instance an image X = {x_1, ..., x_M} with M descriptors, the final representation is the average of the fisher vectors of all descriptors, \Phi(X) = \frac{1}{M} \sum_{m=1}^{M} \phi(x_m). Specially, Φ(X) equals φ(x) when M is one; in other words, the instance is normal non-image data when M = 1.
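To make the construction of (1)–(11) concrete for plain non-image data (M = 1), the following is a minimal sketch, not the authors' code: a diagonal-covariance GMM is fitted by EM with scikit-learn's GaussianMixture, and every instance is then mapped to its 2·K·D-dimensional fisher vector. The helper name fisher_vector and the toy data are our own assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    """Map each row of X to its fisher vector under a fitted diagonal GMM,
    concatenating the mean subvectors (10) and the variance subvectors (11)."""
    gamma = gmm.predict_proba(X)                  # posteriors gamma_n(i), shape (n, K)
    w, mu = gmm.weights_, gmm.means_              # shapes (K,) and (K, D)
    sigma = np.sqrt(gmm.covariances_)             # diagonal standard deviations, shape (K, D)
    mu_parts, sigma_parts = [], []
    for i in range(gmm.n_components):
        diff = (X - mu[i]) / sigma[i]             # standardized residuals, shape (n, D)
        mu_parts.append(gamma[:, [i]] * diff / np.sqrt(w[i]))                        # eq (10)
        sigma_parts.append(gamma[:, [i]] * (diff ** 2 - 1.0) / np.sqrt(2.0 * w[i]))  # eq (11)
    return np.hstack(mu_parts + sigma_parts)      # (F_mu_1 .. F_mu_K, F_sigma_1 .. F_sigma_K)

# usage sketch: fit the GMM by EM on the training features, then map them
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 7))               # stand-in for a 7-dimensional KEEL feature matrix
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X_train)
fv_train = fisher_vector(X_train, gmm)            # 7-dim pristine features -> 2*2*7 = 28-dim fisher vectors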
3.2 The fisher kernel-based bi-bagging

In order to create the multi-view subsets, the dataset is re-sampled multiple times with feature bagging and instance bagging, respectively. As the fisher space is a space mapped from the original space, the dimension of the fisher vector is directly proportional to the number of components of the GMM. Therefore, the length of the fisher vector is larger than that of the pristine feature vector, and for a small dataset a feature of too large a dimension is prone to cause overfitting. Hence, feature bagging is implemented to generate multiple views and to reduce the risk of overfitting. The multiple views also compensate for the information reduction caused by feature bagging. According to Bryll et al. [5], a voting classifier achieves the best performance for feature subset sizes between 1/3 and 1/2 of the total number of features. In this study, the feature sampling is implemented multiple times to generate several feature subsets. Note that the optimal number of sampled features is treated as a hyperparameter to be adjusted on different datasets. The instance sampling adjusts the ratio of the sizes of the instance subsets between classes, which is one of the classical methodologies for handling the imbalanced problem. The re-sampling creates instance subsets for the training of the voters. In conclusion, the MLFKBB combines the fisher kernel and the dual sampling to improve the performance of the imbalanced classification task.
In detail, the Bi-Bagging consists of two steps. First, the original dataset is represented by computing the components of the GMM; the components of the GMM contain the probability information of the dataset. The fisher vectors are then computed with (10) and (11). Similarly to the kernel approaches, the fisher vectors are regarded as instances that are mapped from the original space to another Hilbert space by the fisher kernel. The mapped instances act as the training set, including discriminant information and probability information simultaneously. Then, the mapped instances, called fisher features, are sampled multiple times. The sampled subsets consist of majority instances and minority instances. We denote by N the size of the majority instances and by P that of the minority instances. The majority instances in each subset are under-sampled to the size P of the minority instances, so that the sub-classifiers trained from the subsets are locally balanced. The ensemble of the sub-classifiers can describe the decision boundary precisely. The feature subsets are obtained based on the instance subsets: each instance subset generates a number of feature subsets. Hence, the number of all subsets is NF × NI, where NF is the number of feature subsets and NI is that of the instance subsets.
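The sampling step just described can be sketched as follows (a simplified illustration with our own function and parameter names, not the published implementation): NI balanced instance subsets are drawn by keeping all P minority instances and under-sampling the majority class to the same size P, and each instance subset is then paired with NF random feature subsets of the fisher vectors; the feature fraction used here is only an assumed default inside the 1/3–1/2 range suggested by [5].

import numpy as np

def bi_bagging_subsets(FV, y, n_instance_subsets=5, n_feature_subsets=5,
                       feature_fraction=0.4, minority_label=1, seed=0):
    """Generate NI x NF locally balanced multi-view training subsets from fisher vectors FV."""
    rng = np.random.default_rng(seed)
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    P, D = len(minority), FV.shape[1]
    d = max(1, int(feature_fraction * D))         # number of sampled fisher features per view
    subsets = []
    for _ in range(n_instance_subsets):
        # instance bagging: all minority instances plus P randomly drawn majority instances
        rows = np.concatenate([minority, rng.choice(majority, size=P, replace=False)])
        for _ in range(n_feature_subsets):
            # feature bagging: each view keeps a random subset of the fisher features
            cols = rng.choice(D, size=d, replace=False)
            subsets.append((FV[np.ix_(rows, cols)], y[rows], cols))
    return subsets                                # NI * NF balanced, reduced-dimension training sets

The column indices are returned with each subset so that the same view can be re-applied to test instances before the sub-classifier trained on that subset is queried.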
3.3 The process of combining base classifiers

The NF × NI subsets from Bi-Bagging are used to train the base classifiers. The subsets can be divided into two categories, the multi-view subsets and the instance subsets. The model trained on an instance subset follows the distribution of the corresponding multi-view subset. Thus, the classifiers following the same distribution are regarded as one set of multi-view classifiers, and hence the NF feature subsets yield NF sets of multi-view classifiers. The base classifier can be replaced by any model, and the choice of the base classifier varies according to the needs of the problem. In this study, we take different linear classifiers as the base classifier, since linear classifiers have stable generalization and perform better against overfitting. In the ensemble phase, the base classifiers output results on their own. The ensemble phase can be divided into two steps. The first step is to integrate the classifiers trained from the same multi-view subset. The step of integrating the multi-view classifiers is described as

C(x|F_j, y) = \sum_{z_j=1}^{T} I(H_{z_j}(x) = y),  (12)

where I(·) is the indicator function, whose value equals 1 when the inner equation is satisfied and 0 otherwise. C(x|F_j, y) is the number of polls for an instance x belonging to label y under the view F_j. The final voting for an instance x sums the polls of all views. The testing expression is described as

H(x) = \arg\max_{y \in Y} \sum_{j=1}^{J} C(x|F_j, y),  (13)

and the most voted label y is the final label of the instance x. The procedure of the method is shown as Table 1.

Table 1 Algorithm I: Architecture of the MLFKBB
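The two-stage voting of (12) and (13) can be sketched as below (our own simplified illustration, assuming scikit-learn-style sub-classifiers and our own data layout): the sub-classifiers of one view are polled as in (12), and the polls of all views are summed as in (13). Since the polls are simply added over views, the two stages collapse into a single plurality vote over all NF × NI sub-classifiers.

import numpy as np

def mlfkbb_predict(x, view_classifiers, view_columns, labels=(0, 1)):
    """Predict the label of one fisher vector x by the two-stage voting.

    view_classifiers[j] holds the T sub-classifiers trained under view F_j,
    and view_columns[j] holds the fisher-feature indices of that view.
    """
    polls = {y: 0 for y in labels}
    for clfs, cols in zip(view_classifiers, view_columns):
        for clf in clfs:                              # eq (12): votes inside view F_j
            y_hat = clf.predict(x[cols].reshape(1, -1))[0]
            polls[y_hat] += 1                         # eq (13): polls summed over all views
    return max(polls, key=polls.get)                  # most voted label wins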
4 Experiments

In this section, we report two experimental results to demonstrate the effectiveness of the proposed method. The experiments are run on the KEEL datasets.

4.1 KEEL datasets and estimate criteria

The KEEL benchmark datasets [2] are employed to estimate the performance of MLFKBB, and Table 2 lists the detailed properties of the used datasets. Note that KEEL is an abbreviation of the Knowledge Extraction based on Evolutionary Learning (KEEL) repository. These datasets vary in the imbalanced rate and the number of samples; the Imbalanced Rate (IR) varies from 1.85 to 127.42. Each dataset is a binary-class dataset. We select 30 datasets randomly for the experiments. The selected datasets follow the principles below: 1) the dimension of the datasets varies from small to large; 2) datasets with IR larger than 9.0 and datasets with IR less than 9.0 are both included; 3) the sizes of the datasets involve both small scale and large scale. These principles make sure that the experimental results are not biased by the selection of the experimental datasets.

Table 2 Description of the used KEEL datasets

Dataset Dim size pos neg IR Dataset Dim size pos neg IR

Ecoli0VS1 7 220 143 77 1.85 wisconsin 9 683 239 444 1.86


pima 8 768 268 500 1.87 glass0 9 214 70 144 2.06
yeast1 8 1484 429 1055 2.46 vehicle1 18 846 217 629 2.90
vehicle3 18 846 212 634 2.99 newthyroid2 5 215 35 180 5.14
glass6 9 214 29 185 6.38 ecoli034vs5 7 200 20 180 9.00
ecoli0234vs5 7 202 20 182 9.10 yeast0256vs3789 8 1004 99 905 9.14
ecoli0346vs5 7 205 20 185 9.25 ecoli01vs235 7 244 24 220 9.17
ecoli067vs35 7 222 22 200 9.09 ecoli0267vs35 7 224 22 202 9.18
glass015vs2 9 172 17 155 9.12 ecoli067vs5 6 220 20 200 10.00
ecoli01vs5 6 240 20 220 11.00 led7digit02456789vs1 7 443 37 406 10.97
ecoli0147vs56 6 332 25 307 12.28 cleveland0vs4 13 173 13 160 12.31
pageblocks13vs4 10 472 28 444 15.86 glass016vs5 9 184 9 175 19.44
yeast2vs8 8 482 20 462 23.10 shuttlec2vsc4 9 129 6 123 20.50
yeast4 8 1484 51 1433 28.10 yeast6 8 1484 35 1449 41.40
abalone918 8 731 42 689 16.40 ecoli0137vs26 7 281 7 274 39.14

To compare the performance of the algorithms on imbalanced data, we evaluate the performance with the area under the ROC curve (AUC), as follows [13]:

AUC = \frac{1 + TPR - FPR}{2},  (14)

where TPR and FPR are the accuracy rate on the positive class and the error rate on the negative class, respectively. The optimal parameters are determined by 5-fold cross-validation. The 30 compared datasets are shown in Table 2. The compared algorithms are all launched on an Intel Xeon CPU E5-2403 at 1.80 GHz with 8 GB DDR3 RAM, Windows Server 2012 and the MATLAB environment.
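For reference, a small sketch of (14) for hard binary predictions (our own helper, with assumed variable names): TPR is computed as the accuracy on the positive class and FPR as the error rate on the negative class.

import numpy as np

def auc_from_predictions(y_true, y_pred, positive=1):
    """AUC = (1 + TPR - FPR) / 2 for hard binary predictions, as in (14)."""
    pos = (y_true == positive)
    neg = ~pos
    tpr = np.mean(y_pred[pos] == positive)            # accuracy on the positive class
    fpr = np.mean(y_pred[neg] == positive)            # error rate on the negative class
    return (1.0 + tpr - fpr) / 2.0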
4.2 MLFKBB with different linear base classifiers

First, we experimentally evaluate the promotion brought by our proposed method by comparing the original base classifiers with our improved versions. Three linear classifiers are used to validate the effectiveness of the fisher-based sampling learning framework.

4.2.1 Set-up

The Support Vector Machine (SVM) [10], Logistic Regression (LR) [18] and the Modified Ho-Kashyap classifier (MHKS) [25] act as the base classifiers under the learning framework, respectively. The hyperparameter C of the SVM and the MHKS is selected from {0.01, 0.1, 1, 10, 100}. The feature sampling is implemented along with the instance sampling. ρ and ξ are the iteration step and the termination criterion of the MatMHKS; the initial values of ρ and ξ are set to 0.99 and 0.0001, respectively. The initial value of the margin vector b is set to 10^{-6} [1_1, ..., 1_N]^T, and the initial value of the weight vector u of the MatMHKS is set to [0.5_1, ..., 0.5_N, 1]^T. The parameter k of the GMM components is selected from {2, 3, 4, 5} in order to find the best number of components k into which the dataset needs to be divided.

4.2.2 The classification performance of MLFKBB with different linear base classifiers

The results of each base classifier under the framework are shown in Table 3. The framework works on all three base classifiers. Since the original versions of the three base classifiers are not designed for the imbalanced problem, the performance of the three linear classifiers on the imbalanced datasets is improved by the framework. As shown in Figs. 1, 2 and 3, respectively, the different classifiers are embedded into the learning framework and different improvements of performance are achieved. The frameworks based on the SVM and the MHKS have more obvious improvements. The best performance, resulting from the SVM, suggests that the support vectors provide a more precise boundary for classification. For high IR, the SVM has more decent improvements than the others, since the SVM can construct a robust decision boundary according to the support vectors. Though the improvements of the other base classifiers are less than those of the SVM version, the effectiveness still appears when our proposed learning framework is applied.

Table 3 Test AUC (%) comparison for experimental method (The best result is written in bold)

Dataset MLFKBB(SVM) SVM MLFKBB(LR) LR MLFKBB(MHKS) MHKS
        AUC         AUC AUC        AUC AUC          AUC

ecoli0vs1 97.64 (k = 2) 98.67 96.67 (k = 2) 96.57 96.69 (k = 2) 96.25
wisconsin 97.99 (k = 2) 96.88 94.32 (k = 2) 96.56 95.83 (k = 2) 95.30
pima 75.40 (k = 2) 71.67 75.00 (k = 2) 75.75 72.48 (k = 2) 71.74
glass0 78.24 (k = 2) 73.07 75.97 (k = 2) 74.68 78.07 (k = 2) 70.97
yeast1 72.82 (k = 2) 59.97 70.21 (k = 2) 76.18 69.08 (k = 2) 67.72
vehicle1 84.64 (k = 2) 71.40 83.98 (k = 2) 79.72 81.13 (k = 2) 70.94
vehicle3 83.86 (k = 2) 70.24 82.05 (k = 2) 77.08 81.86 (k = 2) 64.77
newthyroid2 99.44 (k = 2) 98.02 94.17 (k = 2) 98.61 96.35 (k = 2) 96.87
glass6 91.53 (k = 2) 90.86 93.69 (k = 2) 89.28 94.23 (k = 2) 82.44
ecoli034vs5 91.67 (k = 2) 86.67 93.33 (k = 2) 89.44 94.17 (k = 2) 82.78
ecoli0234vs5 94.18 (k = 2) 86.95 92.79 (k = 2) 89.77 95.02 (k = 2) 82.79
yeast0256vs3789 82.03 (k = 2) 61.86 77.53 (k = 2) 76.28 75.26 (k = 2) 72.70
ecoli0346vs5 93.92 (k = 2) 86.96 91.96 (k = 2) 85.20 94.80 (k = 2) 82.84
ecoli01vs235 93.46 (k = 2) 83.59 89.18 (k = 2) 86.86 90.55 (k = 2) 79.59
ecoli067vs35 93.25 (k = 2) 85.50 87.00 (k = 2) 82.50 87.25 (k = 3) 67.50
ecoli0267vs35 88.79 (k = 2) 83.26 87.77 (k = 2) 86.07 85.03 (k = 2) 72.06
glass015vs2 79.41 (k = 2) 50.00 76.80 (k = 2) 71.59 79.79 (k = 2) 49.49
ecoli067vs5 89.00 (k = 2) 87.00 89.25 (k = 2) 83.75 88.50 (k = 2) 76.50
ecoli01vs5 95.23 (k = 2) 86.82 91.14 (k = 2) 90.68 93.18 (k = 2) 83.64
led7digit02456789vs1 91.66 (k = 2) 87.00 87.60 (k = 2) 87.39 83.50 (k = 2) 84.50
ecoli0147vs56 94.54 (k = 2) 87.19 90.41 (k = 2) 86.33 87.02 (k = 2) 83.55
cleveland0vs4 93.41 (k = 2) 76.44 87.98 (k = 2) 79.80 87.92 (k = 2) 77.80
pageblocks13vs4 98.76 (k = 2) 83.33 98.76 (k = 2) 93.05 99.33 (k = 2) 82.99
glass016vs5 94.29 (k = 2) 93.86 95.71 (k = 2) 92.86 94.29 (k = 4) 86.86
yeast2vs8 80.56 (k = 2) 77.39 76.81 (k = 2) 73.06 76.22 (k = 2) 81.31
shuttlec2vsc4 100.00 (k = 2) 94.60 100.00 (k = 2) 75.07 100.00 (k = 2) 98.78
yeast4 83.24 (k = 2) 50.00 81.74 (k = 2) 81.75 83.05 (k = 2) 65.74
yeast6 90.42 (k = 2) 52.72 88.30 (k = 2) 86.04 88.11 (k = 2) 77.47
abalone918 86.80 (k = 2) 73.73 84.95 (k = 2) 89.27 87.69 (k = 2) 84.53
ecoli0137vs26 86.54 (k = 2) 85.00 86.17 (k = 2) 79.24 93.97 (k = 3) 89.45
Average rank 1.46 – 2.32 – 2.15 –

The experimental results suggest that linear base classifiers can be applied to this learning framework for handling the imbalanced problem. For further research, improved linear classifiers can be embedded into the learning framework for the sake of better performance. The best component parameter k on KEEL is k = 2; since the datasets are binary classification tasks, two GMM components can represent the distribution of most of the KEEL datasets.

4.3 MLFKBB compared with similar approaches

In this experiment, we compare the learning framework with similar methods, validating the improvements for the imbalanced problem. The compared methods consist of different categories of methodologies which are usually employed to handle the imbalanced problem, including the sampling method, the cost-sensitive method, and the ensemble method. According to the first experiment, the MLFKBB with the linear SVM performs best among the tested base classifiers; hence, the linear SVM is taken as the base classifier in this experiment. The sampling method re-balances the size of the instances in each class; the SMOTE, an effective and common oversampling method, is taken as a comparison in this experiment. The cost-sensitive SVM is also taken into the comparison. AdaBoost, EasyEnsemble and XGBoost are taken as representatives of the ensemble methods. The BEBS is taken as a compound of the sampling and ensemble methods. As the MLFKBB also adopts SVMs as the base classifier, the comparison between MLFKBB and BEBS can demonstrate the superiority of the probability information provided by the fisher kernel.

Fig. 1 The MLFKBB with SVM on the 30 KEEL datasets; the used KEEL datasets are listed in Table 2. For high IR, the SVM improves the performance on the imbalanced datasets obviously. The support vectors increase the generalization of the linear classifier, so a robust boundary can be achieved. The dimensionality of the dataset is reduced by Bi-Bagging, therefore the training is faster than training with the complete fisher vector.

Fig. 2 The MLFKBB with LR on the 30 KEEL datasets; the used KEEL datasets are listed in Table 2. Though the improvement is not as obvious as with the other base classifiers, the LR shows the effectiveness for high IR as well.

Fig. 3 The MLFKBB framework with MHKS on the 30 KEEL datasets; the used KEEL datasets are listed in Table 2. The mechanism of the MHKS is similar to that of the SVM: the margin between the two classes is maximized to find the optimal decision boundary by using a margin vector instead of support vectors. The performance of the MHKS is close to that of the SVM, thus similar improvements are reached under the framework.

Table 4 Summary of experimental parameters

Hyper parameter          Method                        Range
Csvm                     linear SVM, BEBS              {0.01, 0.1, 1, 10, 100}
D                        MLFKBB                        {2, 3, 4, ..., 2*K*datadim}

Setting value            Method                        Value
K                        GMM                           2
J                        MLFKBB                        5
T                        MLFKBB, EasyEnsemble, BEBS    5

Parameters for XGBoost   Method                        Value
eta                      XGBoost                       0.1
subsample                XGBoost                       0.8
maxdepth                 XGBoost                       {3, 4, 5}
minchildweight           XGBoost                       {0.1, 0.5, 1}
scaleposweight           XGBoost                       {1, 3, 5, 7}

Table 5 Test AUC (%) comparison for experimental method (The best result is written in bold)

Dataset MLFKBB EasyEnsemble AdaBoost SMOTE+SVM Cost Sensitive SVM BEBS XGBoost
AUC AUC AUC AUC AUC AUC AUC

ecoli0vs1 97.64 ± 2.78 97.61 ± 1.53 98.32 ± 2.38 97.96 ± 2.19 97.96 ± 2.19 97.96 ± 1.96 98.66 ± 1.63
wisconsin 97.99 ± 0.83 97.48 ± 0.81 95.03 ± 2.28 97.38 ± 1.04 97.38 ± 1.03 97.65 ± 0.81 97.38 ± 0.77
pima 75.40 ± 2.29 73.88 ± 2.65 72.38 ± 3.53 75.38 ± 3.39 75.10 ± 3.13 73.63 ± 2.06 76.72 ± 1.79
glass0 78.24 ± 9.98 86.32 ± 4.52 76.01 ± 2.56 83.37 ± 9.2 74.46 ± 5.38 75.49 ± 3.83 85.89 ± 4.7
yeast1 72.82 ± 2.50 73.25 ± 1.82 71.15 ± 1.81 71.67 ± 3.28 71.20 ± 4.09 70.85 ± 2.60 73.91 ± 2.5
vehicle1 84.64 ± 4.71 77.24 ± 5.72 73.63 ± 4.31 79.28 ± 1.89 81.27 ± 3.12 80.99 ± 0.85 78.51 ± 5.32
vehicle3 83.86 ± 2.86 76.78 ± 5.97 75.37 ± 3.33 78.27 ± 2.83 78.57 ± 3.53 78.09 ± 2.38 80.04 ± 3.23
newthyroid2 99.44 ± 0.76 98.89 ± 1.16 94.64 ± 4.63 99.44 ± 1.26 99.44 ± 0.76 98.02 ± 2.65 97.73 ± 3.19
glass6 92.88 ± 6.11 90.95 ± 4.63 92.84 ± 4.74 93.48 ± 4.71 93.38 ± 5.48 91.17 ± 6.59 94.18 ± 4.2
ecoli034vs5 93.89 ± 11.39 90.00 ± 12.32 84.72 ± 9.21 90.56 ± 11.13 90.56 ± 11.13 89.72 ± 5.73 91.94 ± 6.5
ecoli0234vs5 95.02 ± 5.26 92.79 ± 10.85 85.30 ± 9.39 90.29 ± 12.08 90.56 ± 11.58 87.24 ± 9.87 89.45 ± 9.8
yeast0256vs3789 82.03 ± 2.67 78.39 ± 6.45 78.17 ± 5.79 81.78 ± 5.6 80.09 ± 4.48 80.26 ± 4.29 81.72 ± 6.8
ecoli0346vs5 93.92 ± 7.18 90.68 ± 9.40 88.45 ± 10.48 90.88 ± 6.7 89.53 ± 5.87 87.91 ± 7.12 89.45 ± 5.3
ecoli01vs235 93.46 ± 6.03 89.59 ± 5.72 84.32 ± 15.55 88.73 ± 8.61 88.73 ± 87.82 84.45 ± 13.72 87.59 ± 13.31
ecoli067vs35 93.25 ± 4.73 84.50 ± 16.07 86.00 ± 16.11 88.00 ± 15.73 86.75 ± 15.53 84.75 ± 18.29 89.25 ± 15.03
ecoli0267vs35 88.79 ± 9.73 86.06 ± 10.00 83.52 ± 10.79 86.54 ± 10.51 86.53 ± 9.89 86.04 ± 9.03 86.27 ± 11.4
glass015vs2 79.49 ± 9.69 73.92 ± 11.81 70.89 ± 16.91 78.36 ± 9.35 72.93 ± 13.34 55.48 ± 2.62 65.75 ± 13.05
ecoli067vs5 89.25 ± 9.95 87.00 ± 7.53 88.25 ± 5.42 88.75 ± 5.66 88.50 ± 7.09 86.75 ± 4.65 86.75 ± 13.47
ecoli01vs5 95.23 ± 4.50 88.18 ± 6.80 87.05 ± 11.72 90.68 ± 6.69 90.45 ± 7.30 88.18 ± 9.6 86.59 ± 11.75
led7digit02456789vs1 91.66 ± 4.33 87.81 ± 9.98 90.31 ± 8.24 90.19 ± 8.57 86.07 ± 10.90 87.78 ± 7.12 90.92 ± 7.01
ecoli0147vs56 94.54 ± 4.92 90.61 ± 3.27 88.08 ± 4.27 92.53 ± 4.57 91.75 ± 3.64 88.24 ± 5.89 89.83 ± 6.33
cleveland0vs4 93.41 ± 2.63 91.67 ± 6.45 75.85 ± 10.08 92.29 ± 6.84 89.47 ± 10.46 87.8 ± 13.37 78.13 ± 11.7
pageblocks13vs4 98.76 ± 1.15 98.54 ± 0.64 97.66 ± 3.41 70.65 ± 14.07 95.40 ± 4.30 63.24 ± 7.8 99.77 ± 0.44
glass016vs5 94.86 ± 3.86 90.29 ± 7.27 92.57 ± 3.96 94.14 ± 3.92 98.86 ± 1.20 98.57 ± 1.28 89.42 ± 12.38
yeast2vs8 80.56 ± 8.06 73.84 ± 4.56 76.64 ± 10.39 79.03 ± 10.77 76.53 ± 9.64 75.76 ± 8.05 79.45 ± 9.6
shuttlec2vsc4 100.00 ± 0.00 90.00 ± 6.78 90.00 ± 22.36 100.00 ± 0 94.60 ± 10.99 100.00 ± 0.00 95.0 ± 10.0
yeast4 84.76 ± 2.38 82.39 ± 22.36 82.71 ± 3.90 82.61 ± 10.52 82.41 ± 1.86 84.59 ± 2.51 79.90 ± 6.67
yeast6 90.42 ± 6.85 84.99 ± 3.07 86.94 ± 9.06 90.20 ± 3.41 87.82 ± 6.67 89.56 ± 4.83 85.72 ± 9.51
abalone19 86.85 ± 4.68 78.75 ± 7.36 74.02 ± 5.78 88.94 ± 7.72 91.08 ± 2.84 72.65 ± 9.3 58.86 ± 6.64
ecoli0137vs26 86.72 ± 21.27 78.33 ± 20.07 75.72 ± 19.44 87.81 ± 21.66 86.72 ± 21.09 84.81 ± 20.1 74.81 ± 22.36
Average AUC 89.65 86.02 83.88 87.30 87.13 84.25 85.31
Average rank 1.75 4.67 5.65 2.97 3.78 5.07 4.12

According to the previous experiment, the SVM used as the base classifier achieves the best improvements on the imbalanced datasets. Hence, the linear SVM acts as the base classifier under the learning frameworks that are designed for the imbalanced problem in this experiment. AdaBoost, EasyEnsemble and BEBS take the linear SVM as the base classifier as well. The hyperparameter C is selected by grid search over {0.01, 0.1, 1, 10, 100}, and the optimization approach is Sequential Minimal Optimization (SMO) [34]. Based on the conclusion of the previous experiment, the component parameter K is set to 2. In EasyEnsemble, the sampling number is set to 5 and the number of iterations in every subset is set to 10, as recommended in [27]. The hyperparameters are listed in Table 4, where Csvm is the parameter of the linear SVM, D is the under-sampling number of fisher features for MLFKBB, T is the number of under-sampling rounds on the majority instances, and J is the number of feature under-sampling rounds for MLFKBB. The performances of all the used algorithms are reported based on their optimal hyperparameters.

The results of the comparison are listed in Table 5, which shows the performance of MLFKBB. According to the average rank, the MLFKBB sits on the highest rank. For the datasets with IR less than 7.0, the best performing method varies. EasyEnsemble outperforms the other methods on glass0. XGBoost outperforms the other methods on ecoli0vs1, pima, glass6 and cleveland0vs4. AdaBoost performs best on ecoli0vs1. The SMOTE and the cost-sensitive method do not stand out, but they have a higher rank than the other methods. In the range of middle IR (larger than 9.0 and less than 30.0), the MLFKBB outperforms the compared methods, which indicates that the MLFKBB is well suited to middle-IR problems. For the datasets with IR higher than 30.0, the MLFKBB and the other ensemble methods do not perform well, because the ensemble methods are not stable when the minority instances are extremely outnumbered by the majority instances. The SMOTE and the cost-sensitive method can precisely adjust the classification error when the minority instances are obviously fewer than the majority instances, and the empirical risk on the minority instances is well controlled by the cost function and the SMOTE; so the performance of the sampling and the cost-sensitive methods is better than that of the ensemble methods. Compared with BEBS and XGBoost, the MLFKBB has superior performance on the datasets with different IR. Hence, according to the results, we can conclude that the MLFKBB is well qualified for IR within the middle range. The proposed MLFKBB has potential application value.

5 Conclusion

The learning framework MLFKBB is proposed for handling the imbalanced problem; it is independent of the model of the base classifier. The MLFKBB takes advantage of the information obtained from the generative approach to improve the classification task and combines the multi-view method and the ensemble method to increase the learning ability for the minority instances in the imbalanced problem. The experimental results suggest that the MLFKBB achieves the highest rank in the comparison experiments and works well on the imbalanced problem. For high IR and low IR, the improvement of the MLFKBB is not obvious, but its performance is comparable with the conventional methods for the imbalanced problem; for middle IR, the MLFKBB has an obvious improvement. The MLFKBB successfully combines the fisher kernel and the multi-view approach to achieve satisfying classification performance. The fisher kernel constructed based on the GMM works for providing probability information, and the multi-view data generated by Bi-Bagging reduces the overfitting and re-balances the subsets for training. Thus, the proposed MLFKBB is an effective method for handling the imbalanced problem.

Acknowledgments This work is supported by Natural Science Foundation of China under Grant No. 61672227, "Shuguang Program" supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission, and National Science Foundation of China for Distinguished Young Scholars under Grant 61725301.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Akaho S (2006) A kernel method for canonical correlation analysis. arXiv:cs/0609071
2. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17:255–287
3. Bach F, Lanckriet GR, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: International conference on machine learning. ACM, pp 6–13
4. Bishop CM (2007) Pattern recognition and machine learning. Springer
5. Bryll R, Gutierrez-Osuna R, Quek F (2003) Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets. Pattern Recogn 36(6):1291–1302
6. Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2009) Safe-level-SMOTE: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia conference on advances in knowledge discovery and data mining, pp 475–482

7. Chaudhuri K, Kakade SM, Livescu K, Sridharan K (2009) Multi-view clustering via canonical correlation analysis. In: International conference on machine learning, pp 129–136
8. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
9. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD '16. ACM, New York, pp 785–794
10. Cortes C, Vapnik V (1995) Support vector machine. Mach Learn 20(3):273–297
11. Fan W, Stolfo SJ, Zhang J, Chan PK (1999) AdaCost: misclassification cost-sensitive boosting. In: International conference on machine learning, vol 99, pp 97–105
12. Fumera G, Roli F (2002) Support vector machines with embedded reject option. Pattern Recogn Support Vector Mach, 68–82
13. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(4):463–484
14. Guo H, Li Y, Li Y, Liu X, Li J (2016) BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification. Eng Appl Artif Intel 49:176–193
15. Han H, Wang W, Mao BH (2005) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Advances in intelligent computing, vol 3644. Springer, pp 878–887
16. He HB, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
17. Ho TK (1995) Random decision forests. In: International conference on document analysis and recognition, vol 1. IEEE, pp 278–282
18. Hosmer DW Jr, Lemeshow S, Sturdivant RX (1991) Applied logistic regression. Stat Med 10(7):1162–1163
19. Hotelling H (1935) Relations between two sets of variants. Biometrika 28(3-4):312–377
20. Jaakkola TS, Haussler D (1998) Exploiting generative models in discriminative classifiers. Adv Neural Inf Process Syst 11(11):487–493
21. Jo T, Japkowicz N (2004) Class imbalances versus small disjuncts. ACM Sigkdd Explor Newslett 6(1):40–49
22. Sham MK, Dean PF (2007) Multi-view regression via canonical correlation analysis. Lect Notes Comput Sci 4539:82–96
23. Kwok T (1999) Moderating the outputs of support vector machine classifiers. IEEE Trans Neural Netw 10(5):1018–1031
24. Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5(Jan):27–72
25. Leski J (2003) Ho–Kashyap classifier with generalization control. Pattern Recogn Lett 24(14):2281–2290
26. Li Q, Li G, Niu WJ, Cao Y, Chang L, Tan J, Guo L (2016) Boosting imbalanced data learning with Wiener process oversampling. Front Comput Sci, 1–16
27. Liu XY, Wu JX, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B (Cybern) 39(2):539–550
28. Maloof MA (2003) Learning when data sets are imbalanced and when costs are unequal and unknown. In: International conference on machine learning workshop learning from imbalanced data sets II
29. Masnadi-Shirazi H, Vasconcelos N, Iranmehr A (2012) Cost-sensitive support vector machines. arXiv:1212.0975
30. Muslea I, Minton S, Knoblock CA (2002) Adaptive view validation: a first step towards automatic view detection. In: International conference on machine learning, pp 443–450
31. Muslea I, Minton S, Knoblock CA (2003) Active learning with strong and weak views: a case study on wrapper induction. In: International joint conference on artificial intelligence, vol 3, pp 415–420
32. Muslea IA (2011) Active learning with multiple views. J Artif Intell Res 27(1):203–233
33. Nigam K, Ghani R (2000) Analyzing the effectiveness and applicability of co-training. In: International conference on information and knowledge management, pp 86–93
34. Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines. In: Advances in kernel methods-support vector learning, pp 212–223
35. Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. In: International conference on machine learning, pp 775–782
36. Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2008) SimpleMKL. J Mach Learn Res 9(3):2491–2521
37. Seiffert C, Khoshgoftaar TM, Van HJ, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern-Part A: Syst Humans 40(1):185–197
38. Sonnenburg S (2005) A general and efficient multiple kernel learning algorithm. Adv Neural Inf Process Syst 18:1273–1280
39. Subrahmanya N, Shin YC (2010) Sparse multiple kernel learning for signal processing applications. IEEE Trans Pattern Anal Mach Intell 32(5):788–798
40. Sun B, Chen HY, Wang J, Xie H (2018) Evolutionary under-sampling based bagging ensemble method for imbalanced data classification. Front Comput Sci 12(2):331–350
41. Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recogn Artif Intell 23(4):687–719
42. Szafranski M, Grandvalet Y, Rakotomamonjy A (2010) Composite kernel learning. Mach Learn 79(1–2):73–103
43. Wang Q, Luo Z, Huang J, Feng Y, Liu Z (2017) A novel ensemble method for imbalanced data learning: bagging of extrapolation-SMOTE SVM. Comput Intell Neurosci 2017:11
44. Wang W, Zhou ZH (2010) A new analysis of co-training. In: International conference on machine learning, pp 1135–1142
45. Xu C, Tao D, Xu C (2013) A survey on multi-view learning. arXiv:1304.5634
46. Xu Z, Jin R, Yang H, King I, Lyu MR (2010) Simple and efficient multiple kernel learning by group lasso. In: International conference on machine learning, pp 1175–1182
47. Yu S, Krishnapuram B, Rosales R, Rao RB (2011) Bayesian co-training. J Mach Learn Res 12(3):2649–2680
48. Zhu YJ, Wang Z, Gao DQ (2015) Gravitational fixed radius nearest neighbor for imbalanced problem. Knowl-Based Syst 90:224–238

Zhe Wang received the B.Sc. and Ph.D. degrees from the Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2003 and 2008, respectively. He is now an Associate Professor in the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China. His research interests include feature extraction, kernel-based methods, image processing, and pattern recognition. At present, he has several first-author papers published in international journals including IEEE Trans. Pattern Anal. and Mach. Intell., IEEE Trans. Neural Networks, Pattern Recognition, etc.

Yiwen Zhu received his bachelor degree from Shanghai Jiao Tong University. He is now a postgraduate student in the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China. His research interests include pattern recognition and feature extraction.

Zhaozhi Chen received his bachelor degree from the Department of Computer Science and Technology, Hubei University, and his master degree from the Department of Computer Science and Technology, East China University of Science and Technology. As a graduate student, he focused on machine learning for designing classification algorithms, including imbalanced problem classification and ensemble learning.

Jing Zhang received the Ph.D. degree from the Department of Computer Science and Engineering, Fudan University, Shanghai, China, in 2007. She is now a full associate professor in the Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China. Her research interests include computer vision, image understanding and video semantic analysis. At present, she has more than 20 papers as first or corresponding author published in famous international journals and conferences.

Wenli Du received the BS and MS degrees in chemical process control from the Dalian University of Technology, Dalian, China, in 1997 and 2000, respectively, and the PhD degree in control theory and control engineering from the East China University of Science and Technology, Shanghai, China, in 2005. She is currently a professor and the dean of the College of Information Science and Engineering and vice dean of the Key Lab of Advanced Control and Optimization for Chemical Process, Ministry of Education, East China University of Science and Technology, China. Her research interests include control theory and application, system modeling, advanced control, and process optimization.