Nikolaos L. Tsakiridis, Christos G. Chadoulos, John B. Theocharis, Eyal Ben-Dor, George C. Zalidis

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Article history: Received 13 March 2019; Revised 22 November 2019; Accepted 7 January 2020; Available online 11 January 2020. Communicated by Dr. Ivor Tsang.

Keywords: Multiple Kernel Learning (MKL); Kernel alignment; Heterogeneous source combination; Soil organic carbon; Soil texture; VNIR–SWIR spectroscopy

Abstract: To ensure the sustainability of the soil ecosystem, which is the basis for food production, efficient large-scale baseline predictions and trend assessments of key soil properties are necessary. In that regard, visible, near-infrared, and shortwave infrared (VNIR–SWIR) spectroscopy can provide an alternative to expensive wet chemistry. In this paper, we examined the application of the Multiple-Kernel Learning (MKL) approach to soil spectroscopy by integrating the information from heterogeneous features. In particular, the proposed three-level MKL framework acts in the following way: at the first level, it uses multiple kernels at each spectral feature (wavelength) to maximize the information of each band. At the second level, it performs implicit feature selection at the spectral source level, enabling it to provide interpretable results. Finally, at the third level of integration it combines the complementary information contained within a pool of spectral sources, each derived from its own set of pre-processing techniques. Additionally, at this stage, the proposed approach is also capable of fusing heterogeneous sources of information, such as auxiliary predictors, which can assist the spectral predictions. The experimental analysis was conducted using the pan-European LUCAS (Land Use/Cover Area frame statistical Survey) topsoil database, with the goal of predicting from the VNIR–SWIR spectra the concentration of soil organic carbon (SOC), a key indicator of agricultural productivity and environmental resilience. The particle size distribution, which describes the soil texture, was selected as the set of auxiliary predictors. The proposed MKL framework was compared with other state-of-the-art approaches, and the results indicated that it attains the best performance in terms of accuracy, whilst at the same time producing interpretable results.

https://doi.org/10.1016/j.neucom.2020.01.008
© 2020 Elsevier B.V. All rights reserved.
N.L. Tsakiridis, C.G. Chadoulos, J.B. Theocharis et al. / Neurocomputing 389 (2020) 27–41
produce accurate estimates for key physical and chemical soil properties [5–7]. Compared to traditional wet chemistry, it is a faster and more cost-effective solution. Soil is a significantly complex material, extremely variable in physical and chemical composition, comprised of all three phases of matter: solid (a mixture of inorganic and organic matter in a concoction of primary and secondary minerals, organic components, and salts), liquid (water and dissolved anions), and gas. Due to this inherent complexity, the reflectance spectrum of a given soil sample is influenced by a number of chromophores, which are parameters or substances (chemical or physical) affecting the shape and nature of the spectrum. In addition, the spectral signals related to one chromophore often overlap with those of other chromophores, making the association of the VNIR–SWIR bands with the concentrations of soil properties a challenging task [8]. This also renders the task of wavelength assignment, i.e. of identifying whether the wavelengths used by a prediction model are ascribed to a chromophore or are just spectral noise, very important.

In recent years, much work has concentrated on the development of large soil spectral libraries (SSLs), where soil samples are collected using a sampling strategy, and the VNIR–SWIR spectrum of each sample and its key soil properties (obtained using wet chemistry) are then recorded. In that regard, many datasets around the world have been developed, ranging from national [9,10] to continental scales [11], whilst a recent effort focused on assimilating local SSLs into a global one [12]. In the European Union, the European Statistical Office (EUROSTAT) organizes regular, harmonised surveys across all member states to gather information on land cover and land use, as detailed in [13] and abbreviated as LUCAS (Land Use/Cover Area frame statistical Survey).

At the same time, novel machine learning methods are being developed to better identify and interpret the relation between the VNIR–SWIR spectra and the soil properties [14–18]. As a first step, most approaches use spectral pre-processing techniques (also called pre-treatments) to enhance the absorption peaks or perform scatter correction, and thus assist the models in attaining enhanced accuracy [19]. However, these approaches usually disregard the complementary information contained by these spectral sources, since they retain only the best (in terms of accuracy) spectral source. In [20] a simple approach was proposed to use model stacking and implicitly use this information by combining the predictions of different models (developed using various machine learning models and pre-processing techniques). Another effort focused on combining the information of two spectral sources using a memory-based learning technique [21]. The focus in [22] was placed on deriving interpretable results to identify the relationship between the spectrum and the output property. The PARACUDA II® data mining engine [23,24] is another example, which creates a fused spectral source by picking individually for each wavelength the pre-processing technique exhibiting the highest correlation with the output. However, no concrete effort has been made hitherto to explicitly identify and use the information contained within a pool of pre-processed spectra during the model building step in a more structured approach. A different way to enhance the predictions is to use auxiliary variables, such as the geographical coordinates, or other easily and inexpensively measurable properties such as the pH or the soil particle size distribution [25]. The latter describes the physical texture by measuring the distribution of three soil separates according to their relative size: sand (0.063–2 mm), silt (0.002–0.063 mm), and clay (≤ 0.002 mm).

Thus, the present study is driven by the need for more accurate and interpretable global models. Kernel methods such as the support vector machines (SVM) [26,27] and Gaussian Processes [28] have been shown to be more robust and less sensitive to the increase of dimensionality compared to other techniques. The Multiple Kernel Learning (MKL) approach [29,30] has gained significant attention lately, due to its ability to represent or discriminate between data using multiple base kernels in a more efficient way. MKL can address the aforementioned shortcomings by: (i) performing feature selection by combining and selecting the most appropriate feature kernels, and (ii) combining heterogeneous sources of information for learning the decision function by constructing optimal base kernels for each source. The two most important aspects of most MKL methods are the selection of the kernels and how they contribute towards the final kernel. With respect to the first, many diverse approaches for optimizing the kernel mixture coefficients have emerged over the years, such as gradient descent methods [31], localized methods [32,33], and kernel alignment methods [34,35]. Their relative contribution is usually addressed using a linear or convex combination of the base kernels. Models may also be developed in either one or two stages: in the former, the optimal kernel combination parameters and the structural parameters of the classifier/regressor are learned simultaneously, whilst the latter decouples these processes. MKL has been applied in a plethora of domains, such as image classification [36], remote sensing [37], financial distress prediction [38], face recognition [39], multimedia [40], and the identification of drug side effects [41].

Recent advances in MKL have proposed a hybrid kernel alignment method, introducing a combination of the traditional global alignment and a local one [42]. In essence, the incorporation of local information (i.e. from the nearest neighbors of each sample) when computing the kernel alignment can lead to better performance. Another important point is whether a sparse or a non-sparse kernel weight mixture should be preferred; sparse MKL is helpful in interpreting the results (i.e. it performs implicit feature selection), whereas non-sparse MKL can lead to better performance (the sparsity-accuracy trade-off [43]). In fact, the l1-norm MKL, which promotes sparsity, is rarely observed to outperform the trivial uniform weight mixture. To that end, a more efficient solution has been proposed, involving the use of arbitrary norms (i.e. lp-norms with p ≥ 1) which attain better accuracy but are non-sparse [44].

In this paper we present a framework for multi-source combination and feature selection for VNIR–SWIR soil spectroscopy, which can be extended with the use of additional predictors. The goal of this work is to demonstrate that:

1. Combining multiple spectral sources yields better results than using only the best source, considering that there exists sufficient complementary information which must be appropriately combined;
2. The inclusion of auxiliary predictors is straightforward and can assist the VNIR–SWIR predictions;
3. The use of multiple spectral sources and additional predictors can happen simultaneously, which can lead to the best results compared to the current state-of-the-art.

The presented framework entails the following three levels of MKL integration:

MKL at the feature level: different kernels are defined at each wavelength in order to maximize the use of its information;
MKL at the source level: the kernels associated with some of the individual features are combined to form the kernel of each spectral source, thereby performing implicit feature selection;
MKL at the source combination level: the heterogeneous sources, namely the different spectral sources originating from different spectral pre-treatments and the auxiliary information in the form of the textural information, are combined to yield enhanced predictions.
Thus, the specific contribution of the present work is the adaptation of the MKL framework to the domain of soil spectroscopy, and in particular: (i) the integration/fusion of multiple spectral sources in a structured approach, where the complementary information contained within them is effectively combined, and (ii) the integration of heterogeneous sources, whereby the spectral information of each sample is combined with the textural one.

The rest of the paper is organized as follows: Section 2 describes the LUCAS topsoil database (Section 2.1) and provides an overview of the Support Vector Regression (SVR) algorithm (Section 2.2) and of past MKL approaches (Section 2.3). Section 3 presents the proposed MKL framework for soil spectroscopy and details the application as well as the adaptations made to the past MKL approaches. The experimental set-up is given in Section 4, whereas the results are presented in Section 5. A discussion of the positive attributes of the proposed framework is made in Section 6, whereas Section 7 presents the conclusions and future directions of this work.

2. Materials and methods

2.1. The LUCAS spectral library

The LUCAS survey is an effort to build a large and consistent spatial database of the topsoils across the European Union (EU), based on a single sampling protocol and analysis (spectral and chemical) carried out in a single laboratory. Approximately 20,000 topsoil samples were collected to assess the state of soil across the continent in 2009–2012, with more samples collected in 2015 and 2018 [13]. Herein we used the LUCAS 2009 topsoil database, as it is the one currently available to the public. In this period, the following properties were collected for each sample: particle size distribution, pH in H2O, pH in CaCl2, organic carbon, carbonates, nitrogen, phosphorus, potassium, cation exchange capacity (CEC), and the VNIR–SWIR diffuse reflectance spectrum. The chemical analyses were performed using traditional soil analysis methods (the exact description for each property may be found in the LUCAS documentation). The spectra were measured using a spectrometer operating in the 400–2500 nm wavelength range with a 2 nm spectral resolution, resulting in a total of 1050 spectral bands for each sample [45]. The whole database is freely available for download for non-commercial purposes (http://eusoils.jrc.ec.europa.eu/projects/Lucas/data.html).

2.2. Support vector regression

The support vector machine is a state-of-the-art learning machine employed in a wide variety of modeling applications. Introduced in [46] within the context of structural risk minimization theory, it aims to maximize its generalization ability through solving a quadratic optimization problem [26].

Given a training sample of N data points D = {(x_1, y_1), ..., (x_N, y_N)} ⊂ X × ℝ, where (x_i, y_i) denote the pairs of input variables and real output, and X denotes the space of input patterns (e.g. X = ℝ^M, with M being the dimensionality of the input patterns x_i), our goal is to find a hypothesis f ∈ H, where H is a high-dimensional feature space, that generalizes well on new and unseen data. Regularized risk minimization returns a minimizer f*,

f* ∈ argmin_{f∈H} R_emp(f) + λΩ(f)   (1)

where R_emp(f) = (1/N) Σ_{i=1}^N l(f(x_i), y_i) is the empirical risk of the hypothesis f with respect to a loss function l: ℝ × Y → ℝ, Ω: H → ℝ is a regularizer, and λ > 0 is a constant parameter controlling the trade-off between accuracy and smoothness. We typically consider linear models of the form

f(x) = ⟨w, φ(x)⟩ + b   (2)

where ⟨·, ·⟩ is the inner product operator, φ: X → H denotes a possibly nonlinear mapping from the original input space to a Hilbert space H, and b is a bias term. The regularization function assumes the form Ω(f) = (1/2)‖w‖²₂, which constrains the decision function f to be smooth.

Plugging Vapnik's ε-insensitive loss function l = |y − f(x)|_ε = max{0, |y − f(x)| − ε} into (1), where ε codifies our tolerance to the prediction errors, the minimizer f is obtained through the solution of the following optimization problem

min_w (1/2)‖w‖² + C Σ_{i=1}^N |y_i − f(x_i)|_ε   (3)

where C corresponds to the regularization parameter of (1), performing the same trade-off between accuracy and smoothness. Using the method of Lagrange multipliers, the above unconstrained optimization problem can be reformulated as a constrained one in the dual:

max_α −(1/2) Σ_{i,j=1}^N (α_i − α_i*)(α_j − α_j*)⟨φ(x_i), φ(x_j)⟩ − ε Σ_{i=1}^N (α_i + α_i*) + Σ_{i=1}^N y_i(α_i − α_i*)
s.t. Σ_{i=1}^N (α_i − α_i*) = 0 and α_i, α_i* ∈ [0, C]   (4)

The resulting decision function assumes the following form:

f(x) = Σ_{i∈nSV} (α_i* − α_i)⟨φ(x_i), φ(x)⟩ + b   (5)

where nSV is the number of support vectors.

Through the use of the kernel trick [47], the inner product ⟨φ(x_i), φ(x)⟩ can be computed without explicit knowledge of the non-linear mapping φ(·), through an appropriate kernel function:

K(x_i, x) = ⟨φ(x_i), φ(x)⟩,  K: X × X → ℝ   (6)

The kernel function implicitly maps the input patterns into a high-dimensional feature space, with the implicit assumption that for a sufficiently high dimensionality the learning machine will be able to identify a linear optimizer. The structure of this feature space, and subsequently the performance of the kernel learning machine, is heavily influenced by the choice of the kernel function and its corresponding parameters.

2.3. Multiple Kernel Learning

Selecting the kernel function and its corresponding parameters is one of the main concerns in the learning process. Generally, this is achieved by searching for the optimal parameters in the parameter space, a process that becomes prohibitive if there are other structural parameters to be optimized. Multiple Kernel Learning (MKL) [30] offers an elegant alternative, allowing us to make use of multiple kernel functions instead of having to choose a particular one along with its parameters.

In this paper we solely focus on Multiple Kernel Learning algorithms that implement a linear combination of P base kernels

K_μ(x_i, x_j) = Σ_{m=1}^P μ_m K_m(x_i, x_j)   (7)
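The linear combination of base kernels in Eq. (7) can be illustrated with a short NumPy sketch; the kernel widths and mixture weights below are arbitrary illustrative values, not taken from the paper:

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / sigma**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # 20 samples, 3 features (synthetic)

# P = 3 base kernels with different widths, as in Eq. (7)
sigmas = [0.5, 1.0, 2.0]
base = [rbf_kernel(X, s) for s in sigmas]

# mixture weights on the unit l1-sphere (the p = 1 case of Eq. (8))
mu = np.array([0.2, 0.5, 0.3])
K_mu = sum(m * K for m, K in zip(mu, base))

print(K_mu.shape)                  # (20, 20)
print(np.allclose(K_mu, K_mu.T))   # True: the combination stays symmetric PSD
```

Since every base kernel here has a unit diagonal, the diagonal of the combined kernel equals the sum of the weights, which makes the effect of the constraint in Eq. (8) easy to check numerically.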
‖μ‖_p = (Σ_{m=1}^P |μ_m|^p)^{1/p} = 1   (8)

controlling its structure; for p = 1, the sparsity-inducing l1-norm is implemented, while for p > 1 the corresponding lp-norm gives rise to increasingly denser mixture weights, approaching the uniform kernel combination as p → ∞. While sparsity is an appealing property in many statistical modeling applications, as it always leads to more interpretable models and can additionally be used for implicit feature selection, it has been observed that sparse kernel combinations often yield worse performance than their non-sparse counterparts. On the other hand, non-sparse kernel combinations are found to be lacking in model interpretability. This conflicting behaviour is known as the sparsity-accuracy trade-off [48,49], and the ultimate choice for p depends on the particular aim of the corresponding application.

2.3.1. Kernel alignment

Kernel alignment is a measure of similarity between two kernel functions, which captures the degree of agreement between a kernel and a given learning task. In the context of MKL, kernel alignment can serve as the objective function to be maximized by an optimization process, providing the coefficients of the kernel combination. The definition of kernel alignment, as given by [50,51], is:

Definition 2.1 (Kernel Alignment). The (empirical) alignment of a kernel k1 with a kernel k2 is:

A(k1, k2) = ⟨K1, K2⟩_F / √(⟨K1, K1⟩_F ⟨K2, K2⟩_F)   (9)

where K_i is the kernel matrix derived from kernel function k_i.

For two centered kernel matrices K_c and K′_c with ‖K_c‖_F, ‖K′_c‖_F ≠ 0, the centered alignment between them is defined by:

ρ = ⟨K_c, K′_c⟩_F / (‖K_c‖_F ‖K′_c‖_F)   (10)

The mixture weights are obtained by maximizing the alignment, subject to the sparsity-promoting l1-norm. Alignment maximization lends itself to a computationally efficient two-stage approach; the optimal combined kernel is learned in the first stage by (12), and is applied, in the second stage, in a standard kernel machine such as the SVR.

The definition of alignment (Definitions 2.1 and 2.2) can be regarded as global, in that it involves all the available samples of the training data in its computation. As such, it may fail to account for any local structure in the data, while also forcing sample pairs to be equally aligned to their corresponding target similarities, regardless of their own distance or dissimilarity. To overcome these inefficacies of the global alignment, Wang et al. [42] introduced the notion of local alignment, defined on local kernels, and employed it in conjunction with the corresponding global one to achieve greater performance.

Definitions and Algorithm. For each sample i = 1, 2, ..., N and each base kernel {K_m}_{m=1}^P, a local kernel is defined through the use of the sample's k nearest neighbors by the following formula:

K_m^(i) = [K_m(j, l)]_{k×k},  x_j, x_l ∈ N_k(x_i)   (13)

where N_k(x_i) denotes the neighborhood of the ith sample. Having obtained the local kernels, the local kernel alignment between any two kernels can then be computed using a modification of the definition of its global counterpart:

ρ_l = (1/N) Σ_{i=1}^N ⟨K_c^(i), K′_c^(i)⟩_F / (‖K_c^(i)‖_F ‖K′_c^(i)‖_F)   (14)

This approach to alignment maximization consists of defining a new quantity, called hybrid alignment, by combining both the local and the global definitions. Letting ρ_g denote the global alignment as defined in (10), the hybrid alignment between two kernels can then be obtained as a convex combination of ρ_g and ρ_l.

Defining the auxiliary variable τ_i below,

τ_i = ‖Σ_{m=1}^P μ_m K_m^(i)‖_F / ‖Σ_{m=1}^P μ_m K_m‖_F   (17)
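The centered alignment of Eq. (10) can be sketched in a few lines of NumPy, assuming the usual centering K_c = HKH with H = I − (1/N)11ᵀ; the data below are synthetic and only serve to exercise the formula:

```python
import numpy as np

def center(K):
    """Center a kernel matrix: K_c = H K H with H = I - (1/N) 1 1^T."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered kernel alignment rho = <K1c, K2c>_F / (||K1c||_F ||K2c||_F), Eq. (10)."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

rng = np.random.default_rng(1)
y = rng.normal(size=10)
K_Y = np.outer(y, y)                      # target kernel built from the outputs
K = np.exp(-np.subtract.outer(y, y)**2)   # an RBF kernel on the same variable

print(round(centered_alignment(K_Y, K_Y), 6))       # 1.0: perfect self-alignment
print(abs(centered_alignment(K, K_Y)) <= 1.0)       # Cauchy-Schwarz bounds rho by 1
```

The self-alignment of any kernel is exactly 1, which gives a quick sanity check when the alignment is used as the objective of the weight optimization.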
Proposition 2.1 (Cortes et al. [52]). Let v* be the solution to the following quadratic program (QP):

min_{v≥0} vᵀMv − 2vᵀc   (20)

Then the solution μ* of the alignment maximization problem is given by μ* = v*/‖v*‖.

Algorithm 1: Hybrid kernel alignment maximization (HKAM).
Input: {K_p}_{p=1}^P, Y, ε_0, k, λ
Output: μ
1  Initialize τ^(0) and set t = 0
2  Obtain the neighbor list of each sample
3  Calculate K_c and K_Y for each sample
4  while (obj^(t−1) − obj^(t)) / obj^(t) ≥ ε_0 do
5    Update v^(t+1) with fixed τ^(t) by (20)
6    μ^(t+1) = v^(t+1) / ‖v^(t+1)‖
7    Update τ^(t+1) by (17) using fixed μ^(t+1)
8    Compute obj^(t) by (12) and (15)
9    t = t + 1

The resulting problem is solved repeatedly using a standard SVR on the learned combined kernel. The overall solution is achieved by alternating between optimizing with respect to (w.r.t.) the weights μ and w.r.t. the remaining variables (the structural parameters of the SVR). The basic idea for this approach is that, for a given, fixed set of primal variables (w, b), the optimal μ can be calculated analytically:

μ_m = ‖w_m‖^{2/(p+1)} / (Σ_{m′=1}^P ‖w_{m′}‖^{2p/(p+1)})^{1/p}

The next step is to devise a way to solve the optimization problem (21) w.r.t. the variables (w, b) given fixed kernel mixture coefficients μ. Omitting the detailed derivation, the resulting dual optimization problem is:

max_{α: αᵀ1 = 0} −C Σ_{i=1}^N l*(−α_i/C, y_i) − (1/2) Σ_{m=1}^P μ_m αᵀK_m α   (23)

where l* denotes the conjugate function of the loss l. Using the KKT conditions at the optimal point, we can derive the following formula for w_m:

‖w_m‖² = μ_m² αᵀK_m α,  ∀m = 1, ..., P   (24)
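The analytical weight update above can be sketched as follows; the helper name `lp_mu_update` is hypothetical, and the squared norms stand in for the quantities obtained from Eq. (24):

```python
import numpy as np

def lp_mu_update(w_norms_sq, p):
    """Analytical mixture-weight update for lp-norm MKL:
    mu_m = ||w_m||^(2/(p+1)) / (sum_m' ||w_m'||^(2p/(p+1)))^(1/p).
    By construction the result satisfies ||mu||_p = 1 (cf. Eq. (8))."""
    num = w_norms_sq ** (1.0 / (p + 1))                 # ||w_m||^(2/(p+1))
    den = np.sum(w_norms_sq ** (p / (p + 1.0))) ** (1.0 / p)
    return num / den

w_sq = np.array([4.0, 1.0, 0.25])    # illustrative ||w_m||^2 per kernel
for p in (1.0, 2.0, 4.0):
    mu = lp_mu_update(w_sq, p)
    # the lp-norm of the updated weights is 1 for every p
    print(p, round(np.sum(mu ** p) ** (1.0 / p), 6))
```

Note how larger ‖w_m‖² values receive larger weights, while increasing p flattens the mixture towards the uniform combination, matching the sparsity discussion around Eq. (8).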
Algorithm 3: lp-norm MKL chunking-based algorithm via analytical update. The kernel weighting μ and the SVR α are optimized interleavingly.
Input: subproblem size Q, accuracy ε
1  Initialize: g_{m,i} = ĝ_i = α_i = 0, ∀i; L = S = −∞; μ_m = (1/P)^{1/p}, ∀m = 1, ..., P
2  while optimality conditions are not met do
3    Select Q variables α_{i_1}, ..., α_{i_Q} based on the gradient ĝ of (23) w.r.t. α
4    Store α^old and then update α according to (23) with respect to the selected variables
5    Update the gradients g_{m,i} ← g_{m,i} + Σ_{q=1}^Q (α_{i_q} − α_{i_q}^old) k_m(x_{i_q}, x_i), ∀m = 1, ..., M, i = 1, ..., N
6    Compute the quadratic terms S_m = (1/2) Σ_i g_{m,i} α_i and q_m = 2μ_m² S_m, ∀m = 1, ..., M
7    L_old = L, L = Σ_i y_i α_i; S_old = S, S = Σ_m μ_m S_m
8    if |1 − (L − S)/(L_old − S_old)| ≥ ε then
9      μ_m = q_m^{1/(1+p)} / (Σ_{m′=1}^M q_{m′}^{p/(1+p)})^{1/p}, ∀m = 1, ..., M
10   else
11     break
12   ĝ_i = Σ_m μ_m g_{m,i}, ∀i = 1, ..., N

Algorithm 4: Per-feature kernel construction.
Data: spectral bands and SOC content: x̃_m^(s) ∈ ℝ^N, m = 1, ..., M; y ∈ ℝ^N
Result: kernel mixture weights per spectral band: h_m^(s) ∈ ℝ^L, m = 1, ..., M
1  for m = 1, ..., M do
2    Sample L values σ_m^(s) = [σ_{m,1}^(s), ..., σ_{m,L}^(s)] within the 0.1–0.9 quantile range of ‖x_m^(s)(i) − x_m^(s)(j)‖, i, j = 1, ..., N, i ≠ j
3    for ℓ = 1, ..., L do
4      Compute the base kernel K_{m,ℓ}^(s) = exp(−‖x_m^(s)(i) − x_m^(s)(j)‖² / σ_{m,ℓ}^{(s)2})
5      Center the kernel according to Definition 2.2 and add a regularization parameter to the diagonal elements
6      Compute the corresponding kernel weight by h_{m,ℓ}^(s) = ρ(K_{m,ℓ}^(s), K_Y)
7    Normalize the kernel mixture weights so that ‖h_m^(s)‖ = 1

The composite kernel of each spectral feature is then formed as:

K_m^(s)(x_m^(s)(i), x_m^(s)(j)) = Σ_{ℓ=1}^L h_{m,ℓ}^(s) · K_{m,ℓ}^(s)(x_m^(s)(i), x_m^(s)(j))   (25)

3.2. MKL at the single spectral source level

The next step involves the formation of the composite source kernel comprised of the individual feature kernels, by calculating a feature weight vector d^(s) ∈ ℝ^M.

3.2.1. Feature selection

By promoting the use of sparse solutions for d^(s) it is possible to effectively perform feature selection, that is, to retain only the most relevant features with respect to y. This is achieved through the use of the HKAM (Algorithm 1), so that the composite kernel is formed through the following function:

K^(s)(x^(s)(i), x^(s)(j)) = Σ_{m=1}^M d_m^(s) K_m^(s)(x_m^(s)(i), x_m^(s)(j))   (26)

3.3. MKL at the source combination level

The third level of integration entails the implicit combination of the complementary information contained by the different spectral sources, which are calculated from the original spectra using different pre-treatments. The goal is to establish a more robust and accurate statistical model.

3.3.1. Combining spectral sources

As detailed above at the single source level (Eq. (28)), for each source s = 1, ..., Q a separate source kernel K^(s) is defined. These source kernels can be combined into an overall spectral kernel K_comb^spc = Σ_{s=1}^Q μ_s K^(s). The kernel mixture weights μ ∈ ℝ^Q may be interpreted as the significance of each spectral source towards the final kernel, i.e. they can act as a source ranking method.
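The per-wavelength kernel construction of Algorithm 4 and the composite kernels of Eqs. (25) and (26) can be sketched together as below. The data are synthetic; the uniform feature weights d and the small floor applied to the alignment weights are illustrative simplifications, not the paper's learned values:

```python
import numpy as np

def rbf(col, sigma):
    """RBF kernel on a single spectral band (one column of the data matrix)."""
    d2 = np.subtract.outer(col, col) ** 2
    return np.exp(-d2 / sigma**2)

def centered_alignment(K1, K2):
    """Centered kernel alignment (Eq. (10))."""
    N = K1.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    A, B = H @ K1 @ H, H @ K2 @ H
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))             # 30 samples, M = 4 bands (synthetic)
y = X[:, 0] + 0.1 * rng.normal(size=30)  # output driven mostly by band 0
K_Y = np.outer(y, y)                     # target kernel built from the outputs

M, L = X.shape[1], 3
band_kernels, h = [], np.zeros((M, L))
for m in range(M):
    # L widths within the 0.1-0.9 quantiles of the pairwise distances (step 2)
    d = np.abs(np.subtract.outer(X[:, m], X[:, m]))
    sigmas = np.quantile(d[d > 0], np.linspace(0.1, 0.9, L))
    Ks = [rbf(X[:, m], s) for s in sigmas]
    # weight each width by its alignment with the target kernel (step 6);
    # a tiny floor keeps the normalization well defined (an added safeguard)
    h[m] = np.maximum([centered_alignment(K, K_Y) for K in Ks], 1e-12)
    h[m] /= h[m].sum()                   # normalize the per-band weights (step 7)
    band_kernels.append(sum(w * K for w, K in zip(h[m], Ks)))  # Eq. (25)

# composite source kernel with (here uniform) feature weights d, as in Eq. (26)
d_weights = np.full(M, 1.0 / M)
K_source = sum(w * K for w, K in zip(d_weights, band_kernels))
print(K_source.shape)
```

In the actual framework the feature weights d^(s) are learned (sparsely) by the HKAM, so informative bands such as band 0 here would dominate the source kernel.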
The additional predictors are thus defined in the X^(a) ∈ ℝ² space as follows:

x^(a) = [x_sand, x_clay]ᵀ   (29)

Algorithm 5 (excerpt):
6  ρ = ⟨K_c^(s), K_Y⟩_F / (‖K_c^(s)‖_F ‖K_Y‖_F)
7  Choose the σ^(s) corresponding to max ρ

Textural Kernel Construction. The additional information is not subject to spectral pre-processing, and hence the constructed kernel is common across all spectral sources. For both the sand and the clay contents we construct a feature kernel using Algorithm 4. The additional textural kernel is then constructed via Algorithm 1, defined as the sum of two composite kernels:

K_a = d_1^(a) K_sand^(a) + d_2^(a) K_clay^(a)   (30)

where the weighting vector d^(a) = [d_1^(a), d_2^(a)] encodes the relative importance of the two additional predictors.

Single Source Stage + Additional predictors. First, we consider the incorporation of the additional kernel into the single source stage. For each spectral kernel K^(s), a combined spectral-textural kernel K_comb^{(s)+a} is calculated through Algorithm 3, with the weighting vector μ^(s) encoding the relative importance of the spectral and textural information.

Multiple Source Stage + Additional predictors. In the final step we incorporate the additional textural information into the multiple source combination kernel to enhance the performance of the resulting model. Here, all spectral source kernels K^(s), s = 1, ..., Q, and the textural kernel K_a are treated as base kernels, and used as inputs in Algorithm 3. The result is an augmented multiple-source statistical model with a kernel K_comb^{spc+a} and a weighting vector μ′.
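A minimal sketch of the textural kernel of Eqs. (29) and (30) follows; the sand and clay contents are synthetic, the median pairwise distance is an illustrative width choice, and the fixed weights d^(a) stand in for the values that Algorithm 1 would learn:

```python
import numpy as np

def rbf(col, sigma):
    """RBF kernel on a single predictor (a column vector)."""
    d2 = np.subtract.outer(col, col) ** 2
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(3)
N = 25
sand = rng.uniform(0, 100, N)        # sand content (%), hypothetical values
clay = rng.uniform(0, 100 - sand)    # clay content (%), kept below 100 - sand

# one feature kernel per additional predictor x^(a) = [x_sand, x_clay]^T (Eq. (29))
K_sand = rbf(sand, np.median(np.abs(np.subtract.outer(sand, sand))))
K_clay = rbf(clay, np.median(np.abs(np.subtract.outer(clay, clay))))

# textural kernel K_a = d1 * K_sand + d2 * K_clay (Eq. (30));
# the weights d^(a) are fixed here purely for illustration
d_a = np.array([0.6, 0.4])
K_a = d_a[0] * K_sand + d_a[1] * K_clay
print(K_a.shape, np.allclose(K_a, K_a.T))
```

Because the textural kernel is independent of the spectral pre-processing, the same K_a can be reused when it is fused with every spectral source kernel at the combination stage.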
4. Experimental set-up
Fig. 1. Overview of the three-level MKL approach for spectral source combination
Fig. 2. The different spectral sources considered for the Mineral dataset – depicted are the 5th, 16th, 50th, 84th and 95th percentiles.
sources), each corresponding to a different combination of preprocessing steps.

The different spectral sources for the Mineral dataset are depicted in Fig. 2.

In order to limit the large dimensionality of the dataset, rather than working with all the 1050 spectral bands of each sample, we chose to downsample the available spectra in intervals of 10 nm, thus keeping 210 spectral bands. Finally, for each subset, the Conditioned Latin Hypercube Sampling [57] algorithm was implemented in order to create training (66.6%) and testing (33.3%) sets, with the training set of each subset being further split into five folds through the use of the Fuzzy c-means algorithm [58].

4.2. MKL model calibration - hyperparameter estimation

The MKL models described above rely on the optimization of their respective hyperparameters.

4.2.1. Optimizing HKAM

The HKAM method (Algorithm 1) depends on two hyperparameters, namely the number of nearest neighbors k used in the computation of the local kernel alignment, and the regularization parameter λ controlling the trade-off between the local and global information incorporated in the model. In addition, because the SVR optimization step is calculated using the final source kernel (Eq. (28)), two more hyperparameters need to be optimized: the regularization parameter C and the ε-tube width, ε. To alleviate the computation cost associated with the simultaneous optimization of all 4 parameters, we solve two optimization sub-problems, each concerned with a pair of parameters, whilst the other two are kept fixed. To avoid selecting arbitrary values for both C and ε in the first sub-problem, data-driven estimates (line 2 of Algorithm 6) are used while the grid search over (k, λ) is performed. At each grid point (k, λ), exploiting the two-stage nature of the HKAM algorithm, we construct the composite (source) kernel by maximizing the hybrid alignment (Algorithm 1) of the feature kernels K_m^(s) with the target kernel K_Y, and proceed to evaluate the constructed kernel through a 5-fold CV where in each fold an SVR problem is solved with the estimated hyperparameters (C, ε). After obtaining the optimal values k, λ and the corresponding source kernel K^(s), a second grid search and 5-fold CV is performed, this time to identify the optimal values of C and ε and to obtain a final evaluation for our single source model. The whole process is detailed in Algorithm 6.

Algorithm 6: Source kernel construction - single source model evaluation.
Data: feature (base) kernels and SOC content: K_m^(s), y ∈ ℝ^N
Result: source kernel and feature ranking vector: K^(s), d^(s)
1  Create a 2-D grid of hyperparameters (k, λ)
2  C = max(|ȳ + 3σ_y|, |ȳ − 3σ_y|), ε = 0.01
3  for each pair (k, λ) in the grid do
4    Calculate the source kernel mixture weights d^(s) by Algorithm 1 with the feature kernels as inputs
5    Calculate the additional kernel by Algorithm 5
6    Calculate the source kernel by (28)
7    Run 5-fold CV to evaluate the pair (k, λ)
8  Save the K^(s), d^(s) corresponding to the (k, λ) with the best performance
9  Create a 2-D grid of hyperparameters (C, ε)
10 for each pair (C, ε) in the grid do
11   Run 5-fold CV to evaluate the pair (C, ε)
12 Evaluate the single source model with the optimal hyperparameters
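The 10 nm downsampling of the spectra and the data-driven initialization of C (line 2 of Algorithm 6) can be sketched as follows; the output values are synthetic, and only the band bookkeeping and the heuristic itself come from the text:

```python
import numpy as np

# wavelength axis: 400-2500 nm at 2 nm resolution -> 1050 bands
wavelengths = np.arange(400, 2500, 2)
assert wavelengths.size == 1050

# keep every 5th band -> 10 nm intervals, 210 bands
kept = wavelengths[::5]
print(kept.size, kept[1] - kept[0])   # 210 10

# data-driven starting values for the SVR hyperparameters (Algorithm 6, line 2)
rng = np.random.default_rng(4)
y = rng.normal(loc=2.0, scale=0.5, size=100)   # e.g. log-transformed SOC values
C = max(abs(y.mean() + 3 * y.std()), abs(y.mean() - 3 * y.std()))
eps = 0.01
print(C > 0)   # True: C covers roughly the 3-sigma range of the outputs
```

The C heuristic scales the regularization with the spread of the target variable, so the first grid search over (k, λ) can run without hand-picked SVR settings.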
4.3. Nomenclature
1. The simple SVR algorithm for regression using a Gaussian kernel.
2. The Partial Least Squares (PLS) regression algorithm [60], which performs regression in a transformed input space formed by successively selecting orthogonal factors (latent variables) maximizing the covariance between the predictors and the response variable.
3. The Cubist algorithm [61,62], a rule-based model constructed as a tree, whose branches formulate the premise part, while the leaves contain linear regression models; a boosting-like scheme (termed committees) and an error correction mechanism are further employed to enhance the accuracy of the predictions.
4. The Spectrum-Based Learner (SBL) [63], which uses memory-based learning and builds a Gaussian Process Regression model for each unknown testing sample using its optimal spectral neighbors.

Unlike the proposed framework, none of the above algorithms have previously been examined using multiple spectral sources, because this integration is not an easy task for them due to the considerable expansion of the feature space. This is presented here for the first time and is a cornerstone of the proposed MKL framework. Therefore, the comparisons were made using as predictors the single spectral sources, without and with the use of the auxiliary predictors (i.e. the textural information).

4.5. Performance metrics

The models were validated on the independent test set using the following metrics: (i) the Root Mean Squared Error (RMSE), (ii) the coefficient of determination R², and (iii) the ratio of performance to interquartile range (RPIQ) [64].

R² quantifies the degree of any linear correlation between the observed and the model-predicted output; it usually ranges from 0 to 1 (higher is better) and is calculated as:

R²(y, ŷ) = 1 − Σ_{i=1}^N (y_i − ŷ_i)² / Σ_{i=1}^N (y_i − ȳ)²   (32)

with ŷ_i being the prediction for the ith pattern.

RPIQ, on the other hand, takes both the prediction error and the variation of the observed values into account, without making assumptions

The algorithm was implemented in the Python programming language (Python 2.7) and all the experiments were executed on a machine with 48 cores (AMD Opteron, 2.1 GHz) and 32 GB of RAM. In order to speed up the execution time, we took effort to parallelize large sections of the code, mainly those involving parameter selection through grid search and cross-validation. The source combination part, which was formulated as an lp-norm MKL problem, was implemented through the use of the SHOGUN machine learning toolbox [65].

5. Experimental results

5.1. Accuracy results of the proposed approach

The results of the proposed MKL framework for the prediction of logSOC across the mineral soil datasets of the LUCAS SSL are presented in Table 1. The effect of the different spectral pre-treatments is evident, as the more accurate models were developed from the spectral derivatives. Additionally, the positive effect of the combination of the spectral sources, as well as of the incorporation of the additional predictors, can be identified. This positive influence is illustrated in Fig. 3, where the absolute differences in R² for the Mineral dataset are visualized. For example, the MKL-MS model, which combines the different spectral sources, outperforms all MKL-S models utilizing its constituent kernels. What is more, all models benefit from the usage of the additional predictors. The least impact is found in spectral sources which attained significant results using the spectral information alone, but even in those cases the relative increase in R² is about 14%. Overall, as expected, the best results are derived from the MKL-MSa model, which takes advantage of all possible spectral sources and additional predictors. The model attained a performance of RPIQ 5.17, a notable result.

5.2. Model interpretation and discussion

This section will examine the interpretation capacities of the proposed approach across the two last levels of integration, and specifically: (i) its feature selection capabilities within each spectral source, and (ii) the importance of each constituent kernel in the developed combined models.

5.2.1. Feature importance

The sparse feature weight vectors d^(s) depict the contribution
tions about the distribution of the observed values. It is defined as of each feature kernel towards the source kernel, and may there-
the interquartile range of the observed values divided by the RMSE upon be used to identify the relative importance of the selected
of prediction, i.e. RPIQ=IQR/RMSE. few wavelengths. These weights are illustrated per each spectral
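The three validation metrics of Section 4.5 (RMSE, R² as in Eq. (32), and RPIQ = IQR/RMSE) can be computed directly from the observed and predicted values; a minimal NumPy sketch (the function name is illustrative, not from the authors' code):

```python
import numpy as np

def validation_metrics(y_true, y_pred):
    """Return RMSE, R^2 (Eq. 32), and RPIQ = IQR / RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Root Mean Squared Error of the predictions
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # R^2: one minus the residual sum of squares over the total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # RPIQ: interquartile range of the observed values divided by the RMSE
    q1, q3 = np.percentile(y_true, [25, 75])
    rpiq = (q3 - q1) / rmse
    return rmse, r2, rpiq
```

Because RPIQ normalizes the error by the spread of the observed values rather than their mean, it remains meaningful for skewed soil-property distributions such as SOC.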
36 N.L. Tsakiridis, C.G. Chadoulos and J.B. Theocharis et al. / Neurocomputing 389 (2020) 27–41
Table 1
Performance metrics (R², RMSE, RPIQ) on all four datasets using the proposed framework; per the text, the first column group corresponds to the Grassland dataset and the last to the Mineral dataset.

MKL-S (single spectral sources)
               R²   RMSE  RPIQ   R²   RMSE  RPIQ   R²   RMSE  RPIQ   R²   RMSE  RPIQ
R             0.57  0.20  1.99  0.58  0.23  1.99  0.54  0.16  1.69  0.57  0.24  1.79
CR            0.56  0.20  1.96  0.63  0.22  2.14  0.50  0.17  1.62  0.66  0.20  2.17
Abs-SG0       0.60  0.19  2.07  0.63  0.22  2.14  0.55  0.16  1.72  0.62  0.23  1.83
Abs-SG0-SNV   0.70  0.17  2.37  0.66  0.21  2.22  0.67  0.14  2.01  0.78  0.16  2.69
Abs-SG1       0.75  0.15  2.59  0.76  0.18  2.62  0.73  0.13  2.22  0.80  0.15  2.83
Abs-SG1-SNV   0.71  0.17  2.43  0.73  0.19  2.48  0.67  0.14  2.01  0.79  0.16  2.72
MKL-MS        0.82  0.13  3.06  0.81  0.16  2.97  0.79  0.11  2.49  0.86  0.13  3.39

MKL-Sa (single spectral sources + auxiliary predictors)
R             0.76  0.15  2.64  0.76  0.18  2.65  0.64  0.14  1.92  0.67  0.19  2.20
CR            0.86  0.12  3.45  0.86  0.14  3.41  0.80  0.11  2.55  0.87  0.13  3.42
Abs-SG0       0.78  0.15  2.76  0.77  0.17  2.70  0.64  0.14  1.92  0.69  0.19  2.26
Abs-SG0-SNV   0.86  0.11  3.52  0.79  0.16  2.84  0.82  0.10  2.70  0.88  0.12  3.65
Abs-SG1       0.91  0.09  4.34  0.89  0.12  4.00  0.85  0.09  3.01  0.91  0.10  4.20
Abs-SG1-SNV   0.90  0.10  4.16  0.89  0.12  3.98  0.85  0.09  3.01  0.92  0.10  4.46
MKL-MSa       0.93  0.08  5.04  0.92  0.10  4.59  0.89  0.08  3.47  0.94  0.08  5.17
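The multi-source gains quoted in the Discussion (RPIQ ≈ 15% and R² ≈ 8% over the single best source) can be reproduced from Table 1 by comparing MKL-MS against the best single-source model per dataset; a short NumPy sketch (the best single source is read off Table 1 as Abs-SG1 throughout, though the authors' exact averaging may differ):

```python
import numpy as np

# Table 1, spectral-only block: best single source (Abs-SG1) versus the
# multi-source MKL-MS model, per dataset.
r2_best_single   = np.array([0.75, 0.76, 0.73, 0.80])
r2_mkl_ms        = np.array([0.82, 0.81, 0.79, 0.86])
rpiq_best_single = np.array([2.59, 2.62, 2.22, 2.83])
rpiq_mkl_ms      = np.array([3.06, 2.97, 2.49, 3.39])

# Mean relative increase across the four data subsets.
r2_gain   = np.mean(r2_mkl_ms / r2_best_single - 1)      # ~0.08
rpiq_gain = np.mean(rpiq_mkl_ms / rpiq_best_single - 1)  # ~0.16 (reported as ~15%)
```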
Fig. 5. Relative importance between spectral and textural kernels for the MKL-Sa models and the Mineral dataset.
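A relative-importance comparison of this kind follows directly from the learned kernel weights by normalizing them to fractions of the total; a minimal NumPy sketch (the weight values below are illustrative, not taken from the fitted models):

```python
import numpy as np

# Hypothetical weights learned for the constituent kernels of a combined
# model: the six spectral-source kernels plus the textural kernel.
kernel_names = ["R", "CR", "Abs-SG0", "Abs-SG0-SNV",
                "Abs-SG1", "Abs-SG1-SNV", "Texture"]
weights = np.array([0.3, 0.9, 0.4, 1.1, 1.6, 1.2, 0.8])  # illustrative

# Relative importance: each kernel's weight as a fraction of the total,
# which is what a chart such as Fig. 5 visualizes.
importance = weights / weights.sum()
ranking = sorted(zip(kernel_names, importance), key=lambda t: -t[1])
for name, share in ranking:
    print(f"{name:12s} {share:.1%}")
```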
Table 2
Comparison among the competing methodologies using as predictors (a) the best single spectral source, and (b) the best single spectral source + the auxiliary predictors (particle size distribution); the best spectral source is identified by each methodology as the one attaining the maximum accuracy. For each methodology the table reports the selected Source and the resulting R², RMSE, and RPIQ.
Table 3
Time efficiency (mean runtime, in hh:mm:ss) of the proposed approach compared to the competing methodologies. For multiple-source models the time reported refers to the development of all spectral sources plus their subsequent combination.
outperform the best models from Cubist and SBL alike across all datasets, albeit by incurring a higher computational cost. For example, in the Grassland dataset, the MKL-MS approach achieved a performance of RPIQ 3.06 and R² 0.82, compared to the second-best model (SBL), whose accuracy was RPIQ 2.92 and R² 0.80; the MKL-MSa approach in the same dataset attained an accuracy of RPIQ 5.04 and R² 0.93, whilst the second-best (Cubist) was at RPIQ 4.51 and R² 0.91.

At the same time, it should be noted that neither Cubist nor SBL is as interpretable as the MKL approach. Although Cubist can potentially perform feature selection and produce simple linear regression models in the form of rules, and thus provide interpretable results, in reality it uses all available spectral features in the consequent part. Moreover, its good performance is mostly due to the use of committees (i.e. ensemble models) and of the error-correction mechanism, modules that further jeopardize its interpretability. As far as the SBL is concerned, it performs well due to its local nature; for each testing pattern a separate model is constructed using its spectral neighbors. Naturally, this does not allow for any interpretation of the models.

6. Discussion

The novel three-level MKL approach presented herein has demonstrated its capacity to be successfully applied in soil spectroscopy, as evidenced by its application to the LUCAS SSL and the comparison with other state-of-the-art algorithms. This proposed framework produced the most accurate predictions, whilst at the same time maintaining a fair degree of interpretability, mainly in the form of performing sparse wavelength selection at the source level, and by shedding light on which of the heterogeneous sources are more important than the others.

Because this approach performs feature selection, using approximately 30% of the available features, it understandably performs slightly worse than the next best models (namely Cubist and SBL), which use all available wavelengths, when only single spectral sources are used. However, the strength of the proposed approach lies in the combination of spectral sources, which the other models cannot effectively achieve. Whereas other approaches neglect the complementary information contained within different spectral sources and only use the single best spectral source, this framework can effortlessly integrate it and thus predict the target property more accurately by taking advantage of all sources. As demonstrated, when all spectral sources were accounted for, the accuracy across the four data subsets compared to the single best source was increased by RPIQ ≈ 15% and R² ≈ 8%. In this case, the MKL approach outperforms its counterparts.

At the same time, due to its kernel combination ability, the integration of auxiliary predictors and other heterogeneous sources is also straightforward. We tested this ability by using the textural information of the soil samples, which indubitably enhanced the accuracy of prediction. Across the single spectral sources, an increase of RPIQ ≈ 44% and R² ≈ 25% was observed, proving that the use of heterogeneous sources was particularly beneficial to the model.

Altogether, the best results were attained when all different sources, i.e. all spectral sources and the textural information, were combined; the model then outperformed all other competing methodologies and attained noteworthy results. In particular,
in the largest Mineral dataset (containing all mineral soil samples irrespective of their land use) the performance was RPIQ 5.17 and R² 0.94.

Compared to the current state-of-the-art, the MKL approaches are considerably more time-consuming. This is due to the calculation of multiple kernel matrices, whose complexity is O(N²M), and to solving multiple quadratic optimization problems, whose complexity is O(N³). Therefore, SVR-based models do not scale well with the number of training patterns. However, the task at hand is not time-critical and the model construction phase happens offline. Moreover, the LUCAS SSL is the largest one to date, and developing such libraries is a laborious multi-year effort that is costly in time and resources. Consequently, the large training time is not a limiting factor, as the interest lies in developing the most accurate model possible.

It should further be noted that the same novel framework described herein may be applied to combine other heterogeneous sources in a similar fashion. For example, spectra from the VNIR–SWIR and the mid-infrared (MIR) range, where some of the fundamental vibrations take place, as detailed in [6], may be appropriately combined to yield better performance than the one attained when the individual sources are used. Other pools of spectral pre-treatments may also be used, which can potentially include additional complementary information.

7. Conclusions

The proposed three-level MKL framework successively uses kernel combinations at three different levels to efficiently combine the information from the constituent kernels. It is able to perform sparse feature selection, which can aid in the interpretation of the underlying processes. Moreover, it is the first model presented that can readily combine, at the learning stage, the information present within different spectral sources originating from different spectral pre-treatments. Finally, the use of heterogeneous sources is supported by incorporating auxiliary predictors to enhance the performance. The proposed framework may thus be appropriately used in soil spectroscopy to derive more interpretable and accurate models than the current state-of-the-art.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Nikolaos L. Tsakiridis: Methodology, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Christos G. Chadoulos: Methodology, Software, Investigation, Writing - original draft. John B. Theocharis: Conceptualization, Methodology, Resources, Supervision. Eyal Ben-Dor: Validation, Investigation. George C. Zalidis: Project administration.

Acknowledgment

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project code: T1EDK-02296).

References

[1] FAO, ITPS, Status of the World's Soil Resources (SWSR) - Main Report, 2015. http://www.fao.org/documents/card/en/c/c6814873-efc3-41db-b7d3-2081a10ede50/.
[2] J.P. Scharlemann, E.V. Tanner, R. Hiederer, V. Kapos, Global soil carbon: understanding and managing the largest terrestrial carbon pool, Carbon Manag. 5 (1) (2014) 81–91, doi:10.4155/cmt.13.77.
[3] H. Blanco-Canqui, R. Lal, Mechanisms of carbon sequestration in soil aggregates, Critical Rev. Plant Sci. 23 (6) (2004) 481–504, doi:10.1080/07352680490886842.
[4] J. Baldock, J. Skjemstad, Role of the soil matrix and minerals in protecting natural organic materials against biological attack, Organic Geochem. 31 (7-8) (2000) 697–710, doi:10.1016/S0146-6380(00)00049-8.
[5] B. Stenberg, R.A. Viscarra Rossel, A.M. Mouazen, J. Wetterlind, Visible and near infrared spectroscopy in soil science, Adv. Agron. 107 (10) (2010) 163–215, doi:10.1016/S0065-2113(10)07005-7.
[6] J.M. Soriano-Disla, L.J. Janik, R.A. Viscarra Rossel, L.M. Macdonald, M.J. McLaughlin, The performance of visible, near-, and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical, and biological properties, Appl. Spectrosc. Rev. 49 (2) (2014) 139–186, doi:10.1080/05704928.2013.811081.
[7] M. Nocita, A. Stevens, B. van Wesemael, M. Aitkenhead, M. Bachmann, B. Barthès, E. Ben Dor, D.J. Brown, M. Clairotte, A. Csorba, P. Dardenne, J.A. Demattê, V. Genot, C. Guerrero, M. Knadel, L. Montanarella, C. Noon, L. Ramirez-Lopez, J. Robertson, H. Sakai, J.M. Soriano-Disla, K.D. Shepherd, B. Stenberg, E.K. Towett, R. Vargas, J. Wetterlind, Soil spectroscopy: an alternative to wet chemistry for soil monitoring, Adv. Agron. 132 (2015) 139–159, doi:10.1016/bs.agron.2015.02.002.
[8] E. Ben-Dor, Quantitative remote sensing of soil properties, Adv. Agron. 75 (2002) 173–243, doi:10.1016/S0065-2113(02)75005-0.
[9] R.A. Viscarra Rossel, T. Behrens, Using data mining to model and interpret soil diffuse reflectance spectra, Geoderma 158 (1-2) (2010) 46–54, doi:10.1016/j.geoderma.2009.12.025.
[10] Z. Shi, W. Ji, R.A. Viscarra Rossel, S. Chen, Y. Zhou, Prediction of soil organic matter using a spatially constrained local partial least squares regression and the Chinese vis-NIR spectral library, Eur. J. Soil Sci. 66 (4) (2015) 679–687, doi:10.1111/ejss.12272.
[11] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, An evolutionary fuzzy rule-based system applied to real-world Big Data - the GEO-CRADLE and LUCAS soil spectral libraries, in: Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, 2018, pp. 1–8, doi:10.1109/FUZZ-IEEE.2018.8491489.
[12] R. Viscarra Rossel, T. Behrens, E. Ben-Dor, D. Brown, J. Demattê, K. Shepherd, Z. Shi, B. Stenberg, A. Stevens, V. Adamchuk, H. Aïchi, B. Barthès, H. Bartholomeus, A. Bayer, M. Bernoux, K. Böttcher, L. Brodský, C. Du, A. Chappell, Y. Fouad, V. Genot, C. Gomez, S. Grunwald, A. Gubler, C. Guerrero, C. Hedley, M. Knadel, H. Morrás, M. Nocita, L. Ramirez-Lopez, P. Roudier, E.R. Campos, P. Sanborn, V. Sellitto, K. Sudduth, B. Rawlins, C. Walter, L. Winowiecki, S. Hong, W. Ji, A global spectral library to characterize the world's soil, Earth-Sci. Rev. 155 (2016) 198–230, doi:10.1016/j.earscirev.2016.01.012.
[13] A. Orgiazzi, C. Ballabio, P. Panagos, A. Jones, O. Fernández-Ugalde, LUCAS Soil, the largest expandable soil dataset for Europe: a review, Eur. J. Soil Sci. 69 (1) (2018) 140–153, doi:10.1111/ejss.12499.
[14] M. Nocita, A. Stevens, G. Toth, P. Panagos, B. van Wesemael, L. Montanarella, Prediction of soil organic carbon content by diffuse reflectance spectroscopy using a local partial least square regression approach, Soil Biol. Biochem. 68 (2014) 337–347, doi:10.1016/j.soilbio.2013.10.022.
[15] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, A fuzzy rule-based system utilizing differential evolution with an application in vis-NIR soil spectroscopy, in: Proceedings of the IEEE International Conference on Fuzzy Systems, 2017, doi:10.1109/FUZZ-IEEE.2017.8015563.
[16] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, DECO3RUM: A Differential Evolution learning approach for generating compact Mamdani fuzzy rule-based models, Expert Syst. Appl. 83 (2017) 257–272, doi:10.1016/j.eswa.2017.04.026.
[17] N. Carmon, E. Ben-Dor, An advanced analytical approach for spectral-based modelling of soil properties, Int. J. Emerg. Technol. Adv. Eng. 7 (2017) 90–97.
[18] N.L. Tsakiridis, J.B. Theocharis, P. Panagos, G.C. Zalidis, An evolutionary fuzzy rule-based system applied to the prediction of soil organic carbon from soil spectral libraries, Appl. Soft Comput. 81 (2019) 105504, doi:10.1016/j.asoc.2019.105504.
[19] Å. Rinnan, F. van den Berg, S.B. Engelsen, Review of the most common pre-processing techniques for near-infrared spectra, TrAC Trends Anal. Chem. 28 (10) (2009) 1201–1222, doi:10.1016/j.trac.2009.07.007.
[20] N.L. Tsakiridis, N.V. Tziolas, J.B. Theocharis, G.C. Zalidis, A GA-based stacking algorithm for predicting soil organic matter from vis-NIR spectral data, Eur. J. Soil Sci. (2018), doi:10.1111/ejss.12760.
[21] N. Tziolas, N. Tsakiridis, E. Ben-Dor, J. Theocharis, G. Zalidis, A memory-based learning approach utilizing combined spectral sources and geographical proximity for improved VIS-NIR-SWIR soil properties estimation, Geoderma 340 (2019) 11–24, doi:10.1016/j.geoderma.2018.12.044.
[22] N.L. Tsakiridis, J.B. Theocharis, E. Ben-Dor, G.C. Zalidis, Using interpretable fuzzy rule-based models for the estimation of soil organic carbon from VNIR/SWIR spectra and soil texture, Chemometr. Intell. Laborat. Syst. 189 (2019) 39–55, doi:10.1016/j.chemolab.2019.03.011.
[23] A. Gholizadeh, N. Carmon, A. Klement, E. Ben-Dor, L. Borůvka, Agricultural soil spectral response and properties assessment: effects of measurement protocol and data mining technique, Remote Sens. 9 (10) (2017) 1078, doi:10.3390/rs9101078.
[24] A. Gholizadeh, M. Saberioon, N. Carmon, L. Borůvka, E. Ben-Dor, Examining the performance of PARACUDA-II data-mining engine versus selected techniques to model soil carbon from reflectance spectra, Remote Sens. 10 (8) (2018) 1172, doi:10.3390/rs10081172.
[25] D.J. Brown, K.D. Shepherd, M.G. Walsh, M. Dewayne Mays, T.G. Reinsch, Global soil characterization with VNIR diffuse reflectance spectroscopy, Geoderma 132 (3-4) (2006) 273–290, doi:10.1016/j.geoderma.2005.04.025.
[26] V. Vapnik, Principles of risk minimization for learning theory, Adv. Neural Inf. Process. Syst. (1992) 831–838.
[27] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, in: Proceedings of the 9th International Conference on Neural Information Processing Systems (NIPS'96), MIT Press, Cambridge, MA, USA, 1996, pp. 155–161.
[28] C. Williams, C.E. Rasmussen, Gaussian processes for regression, Adv. Neural Inf. Process. Syst. 8 (1996).
[29] F.R. Bach, G.R.G. Lanckriet, M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in: Proceedings of the Twenty-first International Conference on Machine Learning - ICML '04, ACM Press, New York, NY, USA, 2004, p. 6, doi:10.1145/1015330.1015424.
[30] M. Gönen, E. Alpaydin, Multiple kernel learning algorithms, J. Mach. Learn. Res. 12 (2011) 2211–2268.
[31] A. Jain, S.V.N. Vishwanathan, M. Varma, SPG-GMKL: generalized multiple kernel learning with a million kernels, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012, pp. 750–758, doi:10.1145/2339530.2339648.
[32] M. Gönen, E. Alpaydin, Localized algorithms for multiple kernel learning, Pattern Recogn. 46 (3) (2013) 795–807, doi:10.1016/j.patcog.2012.09.002.
[33] J. Moeller, S. Swaminathan, S. Venkatasubramanian, A unified view of localized kernel learning, in: Proceedings of the 2016 SIAM International Conference on Data Mining, 2016, pp. 252–260, doi:10.1137/1.9781611974348.29.
[34] J. Kandola, J. Shawe-Taylor, N. Cristianini, Optimizing kernel alignment over combination of kernels, Adv. Neural Inf. Process. Syst. (NIPS) (2002).
[35] T. Wang, D. Zhao, S. Tian, An overview of kernel alignment and its applications, Artif. Intell. Rev. 43 (2) (2012) 179–192, doi:10.1007/s10462-012-9369-4.
[36] J. Bao, Y. Chen, L. Yu, C. Chen, A multi-scale kernel learning method and its application in image classification, Neurocomputing 257 (2017) 16–23, doi:10.1016/j.neucom.2016.11.069.
[37] Y. Gu, Q. Wang, X. Jia, J.A. Benediktsson, A novel MKL model of integrating LiDAR data and MSI for urban area classification, IEEE Trans. Geosci. Remote Sens. 53 (10) (2015) 5312–5326, doi:10.1109/TGRS.2015.2421051.
[38] X. Zhang, L. Hu, A nonlinear subspace multiple kernel learning for financial distress prediction of Chinese listed companies, Neurocomputing 177 (2016) 636–642, doi:10.1016/j.neucom.2015.11.078.
[39] Z. Zheng, H. Sun, G. Zhang, Multiple kernel locality-constrained collaborative representation-based discriminant projection for face recognition, Neurocomputing 318 (2018) 65–74, doi:10.1016/j.neucom.2018.08.032.
[40] Y.-R. Yeh, T.-C. Lin, Y.-Y. Chung, Y.-C.F. Wang, A novel multiple kernel learning framework for heterogeneous feature fusion and variable selection, IEEE Trans. Multimed. 14 (3) (2012) 563–574, doi:10.1109/TMM.2012.2188783.
[41] Y. Ding, J. Tang, F. Guo, Identification of drug-side effect association via multiple information integration with centered kernel alignment, Neurocomputing 325 (2019) 211–224, doi:10.1016/j.neucom.2018.10.028.
[42] Y. Wang, X. Liu, Y. Dou, Q. Lv, Y. Lu, Multiple kernel learning with hybrid kernel alignment maximization, Pattern Recogn. 70 (2017) 104–111, doi:10.1016/j.patcog.2017.05.005.
[43] R. Tomioka, T. Suzuki, Sparsity-accuracy trade-off in MKL, 2010, pp. 3–10.
[44] M. Kloft, U. Brefeld, S. Sonnenburg, A. Zien, lp-norm multiple kernel learning, J. Mach. Learn. Res. 12 (2011) 953–997.
[45] G. Tóth, A. Jones, L. Montanarella, LUCAS Topsoil Survey: Methodology, Data, and Results, EU Publications, 2013, doi:10.2788/97922.
[46] V.N. Vapnik, Statistical Learning Theory, 1st ed., Wiley, New York, NY, USA, 1998.
[47] B. Schölkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, USA, 2002.
[48] C.S. Ong, A. Smola, R.C. Williamson, Learning the kernel with hyperkernels, J. Mach. Learn. Res. 6 (2005) 1043–1071.
[49] J. Aflalo, A. Ben-Tal, C. Bhattacharyya, J.S. Nath, S. Raman, Variable sparsity kernel learning, J. Mach. Learn. Res. 12 (2011) 565–592.
[50] N. Cristianini, J. Kandola, A. Elisseeff, J. Shawe-Taylor, On kernel-target alignment, Adv. Neural Inf. Process. Syst. 14 (2002) 367–373.
[51] J. Kandola, J. Shawe-Taylor, N. Cristianini, On the Extensions of Kernel Alignment, Technical Report, 2002.
[52] C. Cortes, M. Mohri, A. Rostamizadeh, Algorithms for learning kernels based on centered alignment, J. Mach. Learn. Res. 13 (2012) 795–828.
[53] T. Joachims, Making large-scale SVM learning practical, in: Advances in Kernel Methods, MIT Press, Cambridge, MA, USA, 1999, pp. 169–184.
[54] S. Sonnenburg, G. Rätsch, C. Schäfer, B. Schölkopf, Large scale multiple kernel learning, J. Mach. Learn. Res. 7 (2006) 1531–1565.
[55] A. Karatzoglou, A. Smola, K. Hornik, A. Zeileis, kernlab - an S4 package for kernel methods in R, J. Stat. Softw. 11 (9) (2004) 1–20.
[56] IUSS Working Group WRB, World Reference Base for Soil Resources 2014. International soil classification system for naming soils and creating legends for soil maps, 2014, doi:10.1017/S0014479706394902.
[57] B. Minasny, A.B. McBratney, A conditioned Latin hypercube method for sampling in the presence of ancillary information, Comput. Geosci. 32 (9) (2006) 1378–1388, doi:10.1016/j.cageo.2005.12.009.
[58] J.C. Bezdek, R. Ehrlich, W. Full, FCM: the fuzzy c-means clustering algorithm, Comput. Geosci. 10 (2-3) (1984) 191–203, doi:10.1016/0098-3004(84)90020-7.
[59] V. Cherkassky, Y. Ma, Practical selection of SVM parameters and noise estimation for SVM regression, Neural Netw. 17 (1) (2004) 113–126, doi:10.1016/S0893-6080(03)00169-2.
[60] S. Wold, H. Martens, H. Wold, The multivariate calibration problem in chemistry solved by the PLS method, in: Matrix Pencils, 1983, pp. 286–293, doi:10.1007/BFb0062108.
[61] J.R. Quinlan, Learning with continuous classes, Mach. Learn. 92 (1992) 343–348.
[62] J.R. Quinlan, Combining instance-based and model-based learning, Mach. Learn. 76 (1993) 236–243.
[63] L. Ramirez-Lopez, T. Behrens, K. Schmidt, A. Stevens, J.A.M. Demattê, T. Scholten, The spectrum-based learner: a new local approach for modeling soil vis-NIR spectra of complex datasets, Geoderma 195-196 (2013) 268–279, doi:10.1016/j.geoderma.2012.12.014.
[64] V. Bellon-Maurel, E. Fernandez-Ahumada, B. Palagos, J.-M. Roger, A. McBratney, Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy, TrAC Trends Anal. Chem. 29 (9) (2010) 1073–1081, doi:10.1016/j.trac.2010.05.006.
[65] S. Sonnenburg, H. Strathmann, S. Lisitsyn, V. Gal, F.J.I. García, W. Lin, S. De, C. Zhang, et al., shogun-toolbox/shogun: Shogun 6.1.0, 2017, doi:10.5281/zenodo.1067840.
[66] Z. Shi, Q.L. Wang, J. Peng, W.J. Ji, H.J. Liu, X. Li, R.A. Viscarra Rossel, Development of a national VNIR soil-spectral library for soil classification and prediction of organic matter concentrations, Sci. China Earth Sci. 57 (7) (2014) 1671–1680, doi:10.1007/s11430-013-4808-x.
[67] E. Ben-Dor, Y. Inbar, Y. Chen, The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400-2500 nm) during a controlled decomposition process, Remote Sens. Environ. 61 (1) (1997) 1–15, doi:10.1016/S0034-4257(96)00120-4.
[68] A. Stevens, M. Nocita, G. Tóth, L. Montanarella, B. van Wesemael, Prediction of soil organic carbon at the European scale by visible and near infrared reflectance spectroscopy, PLoS ONE 8 (6) (2013) e66409, doi:10.1371/journal.pone.0066409.
[69] E.T. Elliott, Aggregate structure and carbon, nitrogen, and phosphorus in native and cultivated soils, Soil Sci. Soc. Am. J. 50 (3) (1986) 627, doi:10.2136/sssaj1986.03615995005000030017x.

Nikolaos L. Tsakiridis received the B.S. and M.S. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2014. He is currently pursuing the Ph.D. degree at the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research interests include fuzzy systems, evolutionary algorithms, soil spectroscopy, remote sensing, and big data analysis.

Christos G. Chadoulos received the B.S. and M.S. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2017. He is currently pursuing the Ph.D. degree at the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research interests include computer vision, image analysis, and machine learning.

John B. Theocharis (M'90) received the degree in electrical engineering and the Ph.D. degree from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1980 and 1985, respectively. He is currently a Professor in the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research activities include fuzzy systems, neural networks, evolutionary algorithms, pattern recognition, and image analysis. He has published numerous papers in several application areas such as neuro-fuzzy modeling, power demand and wind speed prediction, and land cover classification and segmentation from remotely sensed images. Recently his research has focused on addressing challenges in soil spectroscopy and medical imaging using machine learning and deep learning techniques.
Eyal Ben-Dor received the M.Sc. and Ph.D. degrees in Soil Science from the Faculty of Agriculture, the Hebrew University of Jerusalem, in 1986 and 1992, respectively. Currently he is serving as the chair of the Geography Department of Tel Aviv University and the head of the Remote Sensing Laboratory (RSL) at this department. His research is focused on monitoring the Earth from space and air, as well as on developing innovative tools to monitor soils and minerals from all domains. He was a pioneer scientist who, in the last decade of the 20th century, opened up the field of soil proximal sensing using spectral information in the reflective spectral domain. He has more than 27 years' experience in remote sensing of the Earth, with a special emphasis on hyperspectral remote sensing (HSR), soil spectroscopy (passive and active), and environmental issues. He has developed many quantitative applications for monitoring soils from reflectance information and is the owner of 4 patents in this field.

George C. Zalidis received the B.S. degree in agriculture from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1980, and the Ph.D. degree in soil physics from Michigan State University, East Lansing, MI, USA, in 1987. Currently, he is a Professor of Soil Pollution and Degradation with the Laboratory of Remote Sensing, Spectroscopy, and Geographic Information Systems, in the Faculty of Agronomy of the Aristotle University of Thessaloniki. His research interests include soil quality and sustainability, bio-remediation of degraded areas, restoration and rehabilitation of wetland ecosystems, wetland inventory, and mapping.