
Neurocomputing 389 (2020) 27–41


A three-level Multiple-Kernel Learning approach for soil spectral analysis

Nikolaos L. Tsakiridis a,*, Christos G. Chadoulos a, John B. Theocharis a, Eyal Ben-Dor b, George C. Zalidis c

a Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
b Department of Geography, School of Earth Science, Tel Aviv University, Israel
c Faculty of Agriculture, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece

* Corresponding author. E-mail addresses: tsakirin@ece.auth.gr (N.L. Tsakiridis), christgc@ece.auth.gr (C.G. Chadoulos), theochar@eng.auth.gr (J.B. Theocharis), bendor@post.tau.ac.il (E. Ben-Dor), zalidis@agro.auth.gr (G.C. Zalidis).

Article history: Received 13 March 2019; Revised 22 November 2019; Accepted 7 January 2020; Available online 11 January 2020. Communicated by Dr. Ivor Tsang.

Keywords: Multiple Kernel Learning (MKL); Kernel alignment; Heterogeneous source combination; Soil organic carbon; Soil texture; VNIR–SWIR spectroscopy.

Abstract

To ensure the sustainability of the soil ecosystem, which is the basis for food production, efficient large-scale baseline predictions and trend assessments of key soil properties are necessary. In that regard, visible, near-infrared, and shortwave infrared (VNIR–SWIR) spectroscopy can provide an alternative for the expensive wet chemistry. In this paper, we examined the application of the Multiple-Kernel Learning (MKL) approach to soil spectroscopy by integrating the information from heterogeneous features. In particular, the proposed three-level MKL framework acts in the following way: at the first level, it uses multiple kernels at each spectral feature (wavelength) to maximize the information of each band. At the second level, it performs implicit feature selection at the spectral source level, enabling it to provide interpretable results. Finally, at the third level of integration it combines the complementary information contained within a pool of spectral sources, each derived from its own set of pre-processing techniques. Additionally, at this stage, the proposed approach is also capable of fusing heterogeneous sources of information, such as auxiliary predictors, which can assist the spectral predictions. The experimental analysis was conducted using the pan-European LUCAS (Land Use/Cover Area frame statistical Survey) topsoil database, with a goal to predict from the VNIR–SWIR spectra the concentration of soil organic carbon (SOC), a key indicator for agricultural productivity and environmental resilience. The particle size distribution which describes the soil texture was selected as the set of auxiliary predictors. The proposed MKL framework was compared with other state-of-the-art approaches, and the results indicated that it attains the best performance in terms of accuracy, whilst at the same time producing interpretable results.

© 2020 Elsevier B.V. All rights reserved.

1. Introduction

One of the most vulnerable resources on the planet is the soil ecosystem, which is imperilled by climate change, land degradation, and biodiversity loss [1]. Soils constitute the largest terrestrial carbon pool [2], with more carbon residing in soil than in the atmosphere and all plant life combined. Soil organic matter (SOM) comprises a complex mixture ranging from partially decomposed organic substances from plant litter to faunal and microbial biomass. It is a key element of soil health, considering that it regulates a cornucopia of soil functions and ecosystem services including: carbon storage as soil organic carbon (SOC), supporting food production by being a key driver of soil fertility, buffering against climate change, regulating water availability, and more [3]. SOC is the measurable component of SOM, which has an important role in the physical, chemical and biological function of agricultural soils by contributing to nutrient retention and turnover, soil structure, carbon sequestration and soil resilience. It can exist in a variety of pools of differing decomposition rates and turnover times, ranging from freshly deposited plant residues to organic carbon contained within complex and stable molecular structures or bound in soil aggregates, which may reside undisturbed for hundreds of years [4].

The protection and sustainable management of the soil resources on the planet requires accurate large-scale baseline and trend assessments to determine and monitor the SOC quality and quantity all over the world. To this end, visible, near-infrared, and shortwave-infrared (350–2500 nm) spectroscopy (abbreviated as VIS–NIR–SWIR or VNIR–SWIR) has demonstrated its capacity to


produce accurate estimates for key physical and chemical soil properties [5–7]. Compared to traditional wet chemistry, it is a faster and more cost-effective solution. Soil is a significantly complex material, extremely variable in physical and chemical composition, comprised of all three phases of matter: solid (a mixture of inorganic and organic matter in a concoction of primary and secondary minerals, organic components, and salts), liquid (water and dissolved anions), and gas. Due to this inherent complexity, the reflectance spectrum of a given soil sample is influenced by a number of chromophores, which are parameters or substances (chemical or physical) affecting the shape and nature of the spectrum. In addition, the spectral signals related to one chromophore often overlap with those of other chromophores, making the association of the VNIR–SWIR bands with concentrations of soil properties a challenging task [8]. This also renders the task of wavelength assignment, i.e. of identifying whether the wavelengths used by any prediction model are ascribed to a chromophore or are just spectral noise, very important.

In recent years, much work has concentrated on the development of large soil spectral libraries (SSLs), where soil samples are collected using a sampling strategy; the VNIR–SWIR spectrum of each sample and its key soil properties (using wet chemistry) are then recorded. In that regard, many datasets around the world have been developed, ranging from national [9,10] to continental scales [11], whilst a recent effort focused on assimilating local SSLs into a global one [12]. In the European Union, the European Statistical Office (EUROSTAT) organizes regular, harmonised surveys across all member states to gather information on land cover and land use, as detailed in [13] and abbreviated as LUCAS (Land Use/Cover Area frame statistical Survey).

At the same time, novel machine learning methods are being developed to better identify and interpret the relation between VNIR–SWIR spectra and soil properties [14–18]. As a first step, most approaches use spectral pre-processing techniques (also called pre-treatments) to enhance the absorption peaks or perform scatter correction, and thus assist the models to attain enhanced accuracy [19]. However, these approaches usually disregard the complementary information contained in these spectral sources, since they retain only the best (in terms of accuracy) spectral source. In [20] a simple approach was proposed to use model stacking and implicitly use this information by combining the predictions of different models (developed using various machine learning models and pre-processing techniques). Another effort focused on combining the information of two spectral sources using a memory-based learning technique [21]. The focus in [22] was placed on deriving interpretable results to identify the relationship between spectrum and output property. The PARACUDA II® data mining engine [23,24] is another example, which creates a fused spectral source by picking individually for each wavelength the pre-processing technique exhibiting the highest correlation with the output. However, no concrete effort has been made hitherto to explicitly identify and use the information contained within a pool of pre-processed spectra during the model building step in a more structured approach. A different way to enhance predictions is to use auxiliary variables such as the geographical coordinates, or other easily and inexpensively measurable properties such as the pH or the soil particle size distribution [25]. The latter describes the physical texture by measuring the distribution of three soil separates according to their relative size: sand (0.063–2 mm), silt (0.002–0.063 mm), and clay (≤ 0.002 mm).

Thus, the present study is driven by the need for more accurate and interpretable global models. Kernel methods such as the support vector machines (SVM) [26,27] and Gaussian Processes [28] have been shown to be more robust and less sensitive to the increase of dimensionality compared to other techniques. The Multiple Kernel Learning (MKL) approach [29,30] has gained significant attention lately, due to its ability to represent or discriminate between data using multiple base kernels in a more efficient way. MKL can address the aforementioned shortcomings by: (i) performing feature selection by combining and selecting the most appropriate feature kernels, and (ii) combining heterogeneous sources of information for learning the decision function by constructing optimal base kernels for each source. The two most important aspects of most MKL methods are the selection of the kernels and how they contribute towards the final kernel. With respect to the first, many diverse approaches for optimizing the kernel mixture coefficients have emerged over the years, such as gradient descent methods [31], localized methods [32,33], and kernel alignment methods [34,35]. Their relative contribution is usually addressed using a linear or convex combination of the base kernels. Models may also be developed in either one or two stages: in the former, the optimal kernel combination parameters and the structural parameters of the classifier/regressor are learned simultaneously, whilst the latter decouples these processes. MKL has been applied in a plethora of domains, such as image classification [36], remote sensing [37], financial distress prediction [38], face recognition [39], multimedia [40], and the identification of drug side effects [41].

Recent advances in MKL have proposed a hybrid kernel alignment method, introducing a combination of the traditional global alignment and a local one [42]. In essence, the incorporation of the local information (i.e. from the nearest neighbors of each sample) when computing the kernel alignment can lead to better performance. Another important point is whether a sparse or a non-sparse kernel weight mixture should be preferred; sparse MKL is helpful in interpreting the results (i.e. it performs implicit feature selection), whereas non-sparse mixtures can lead to better performance (the sparsity-accuracy trade-off, [43]). In fact, the l1-norm MKL which promotes sparsity is rarely observed to outperform the trivial uniform weight mixture. To that end, a more efficient solution has been proposed, involving the use of arbitrary norms (i.e. lp-norms with p ≥ 1) which attain better accuracy but are non-sparse [44].

In this paper we present a framework for multi-source combination and feature selection for VNIR–SWIR soil spectroscopy, which can be extended with the use of additional predictors. The goal of this work is to demonstrate that:

1. Combining multiple spectral sources yields better results than using only the best source, considering that there exists sufficient complementary information which must be appropriately combined;
2. The inclusion of auxiliary predictors is straightforward and can assist the VNIR–SWIR predictions;
3. The use of multiple spectral sources and additional predictors can happen simultaneously, which can lead to the best results compared to the current state-of-the-art.

The presented framework entails the following three levels of MKL integration:

MKL at the feature level: different kernels are defined at each wavelength in order to maximize the use of its information;
MKL at the source level: the kernels associated with some of the individual features are combined to form the kernel of each spectral source, thereby performing implicit feature selection;
MKL at the source combination level: the heterogeneous sources, namely the different spectral sources originating from different spectral pre-treatments and the auxiliary information in the form of the textural information, are combined to yield enhanced predictions.

Thus, the specific contribution of the present work is the adaptation of the MKL framework to the domain of soil spectroscopy, and in particular: (i) the integration/fusion of multiple spectral sources in a structured approach, where the complementary information contained within is effectively combined, and (ii) the integration of heterogeneous sources, whereby the spectral information of each sample is combined with the textural one.

The rest of the paper is organized as follows: Section 2 describes the LUCAS topsoil database (Section 2.1) and provides an overview of the Support Vector Regression (SVR) algorithm (Section 2.2) and of past MKL approaches (Section 2.3). Section 3 presents the proposed MKL framework for soil spectroscopy and details the application as well as the adaptations made to the past MKL approaches. The experimental set-up is given in Section 4, whereas the results are presented in Section 5. A discussion of the positive attributes of the proposed framework is made in Section 6, whereas Section 7 presents the conclusions and future directions of this work.

2. Materials and methods

2.1. The LUCAS spectral library

The LUCAS survey is an effort to build a large and consistent spatial database of the topsoils across the European Union (EU), based on a single sampling protocol and analysis (spectral and chemical) carried out in a single laboratory. Approximately 20,000 topsoil samples were collected to assess the state of soil across the continent in 2009–2012, with more samples collected in 2015 and 2018 [13]. Herein we used the LUCAS 2009 topsoil database, as it is the one currently available to the public. In this period, the following properties were recorded for each sample: particle size distribution, pH in H2O, pH in CaCl2, organic carbon, carbonates, nitrogen, phosphorus, potassium, cation exchange capacity (CEC) and the VNIR–SWIR diffuse reflectance spectrum. The chemical analyses were performed using traditional soil analysis methods (the exact description for each property may be found in the LUCAS documentation). The spectra were measured using a spectrometer operating in the 400–2500 nm wavelength range with 2 nm spectral resolution, resulting in a total of 1050 spectral bands for each sample [45]. The whole database is freely available for download for non-commercial purposes.¹

¹ http://eusoils.jrc.ec.europa.eu/projects/Lucas/data.html

2.2. Support vector regression

The support vector machine is a state-of-the-art learning machine employed in a wide variety of modeling applications. Introduced in [46] within the context of structural risk minimization theory, it aims to maximize its generalization ability through solving a quadratic optimization problem [26].

Given a training sample of N data points D = {(x_1, y_1), ..., (x_N, y_N)} ⊂ X × R, where (x_i, y_i) denote the pairs of input variables and real output, and X denotes the space of input patterns (e.g. X = R^M, with M being the dimensionality of the input patterns x_i), our goal is to find a hypothesis f ∈ H, where H is a high-dimensional feature space, that generalizes well on new and unseen data. Regularized risk minimization returns a minimizer f*,

$$ f^* \in \arg\min_{f \in \mathcal{H}} \left[ R_{\mathrm{emp}}(f) + \lambda \Omega(f) \right] \tag{1} $$

where R_emp(f) = (1/N) Σ_{i=1}^{N} l(f(x_i), y_i) is the empirical risk of the hypothesis f with respect to a loss function l : R × Y → R, Ω : H → R is a regularizer, and λ > 0 is a constant parameter controlling the trade-off between accuracy and smoothness. We typically consider linear models of the form

$$ f(x) = \langle w, \phi(x) \rangle + b \tag{2} $$

where ⟨·,·⟩ is the inner product operator, φ : X → H denotes a possibly nonlinear mapping from the original input space to a Hilbert space H, and b is a bias term. The regularization function assumes the form Ω(f) = (1/2)‖w‖₂², which constrains the decision function f to be smooth.

Plugging Vapnik's ε-insensitive loss function l_ε = |y − f(x)|_ε = max{0, |y − f(x)| − ε} into (1), where ε codifies our tolerance to the prediction errors, the minimizer f is obtained through the solution of the following optimization problem

$$ \min_{w} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} |y_i - f(x_i)|_\varepsilon \tag{3} $$

where C corresponds to the regularization parameter of (1), performing the same trade-off between accuracy and smoothness. Using the method of Lagrange multipliers, the above unconstrained optimization problem can be reformulated as a constrained one in the dual:

$$ \max_{\alpha} \ -\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle \phi(x_i), \phi(x_j) \rangle - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*) $$
$$ \text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C] \tag{4} $$

The resulting decision function assumes the following form:

$$ f(x) = \sum_{i=1}^{n_{SV}} (\alpha_i^* - \alpha_i) \langle \phi(x_i), \phi(x) \rangle + b \tag{5} $$

where n_SV is the number of support vectors.

Through the use of the kernel trick [47], the inner product ⟨φ(x_i), φ(x)⟩ can be computed without explicit knowledge of the non-linear mapping φ(·), through an appropriate kernel function:

$$ K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle, \qquad K : \mathcal{X} \times \mathcal{X} \to \mathbb{R} \tag{6} $$

The kernel function implicitly maps the input patterns into a high-dimensional feature space, with the implicit assumption that for a sufficiently high dimensionality, the learning machine will be able to identify a linear optimizer. The structure of this feature space, and subsequently the performance of the kernel learning machine, is heavily influenced by the choice of the kernel function and its corresponding parameters.

2.3. Multiple Kernel Learning

Selecting the kernel function and its corresponding parameters is one of the main concerns in the learning process. Generally, this is achieved by searching for the optimal parameters in the parameter space, a process that becomes prohibitive if there are other structural parameters to be optimized. Multiple Kernel Learning (MKL) [30] offers an elegant alternative, allowing us to make use of multiple kernel functions instead of having to choose a particular one along with its parameters.

In this paper we solely focus on Multiple Kernel Learning algorithms that implement a linear combination of P base kernels

$$ K_\mu(x_i, x_j) = \sum_{m=1}^{P} \mu_m K_m(x_i, x_j) \tag{7} $$

where μ = [μ_1, μ_2, ..., μ_P]^T, μ ≥ 0, is the kernel mixture weight vector. An additional constraint is usually imposed on the norm of μ,

$$ \|\mu\|_p = \left( \sum_{m=1}^{P} |\mu_m|^p \right)^{1/p}, \qquad p \geq 1 \tag{8} $$

controlling its structure; for p = 1, the sparsity-inducing l1-norm is implemented, while for p > 1, the corresponding lp-norm gives rise to increasingly denser mixture weights, approaching the uniform kernel combination as p → ∞. While sparsity is an appealing property in many statistical modeling applications, as it always leads to more interpretable models and can additionally be used for implicit feature selection, it has been observed that sparse kernel combinations often yield worse performance than their non-sparse counterparts. On the other hand, non-sparse kernel combinations are found to be lacking in model interpretability. This conflicting behaviour is known as the sparsity-accuracy trade-off [48,49], and the ultimate choice of p depends on the particular aim of the corresponding application.
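To make the linear combination of Eq. (7) concrete, the following minimal Python sketch builds a few Gaussian base kernel matrices and mixes them with non-negative weights. It is only an illustration under assumed widths; the helper names (gaussian_kernel, combine_kernels) are ours and not part of the authors' implementation.

    import numpy as np

    def gaussian_kernel(X, sigma):
        """Compute the N x N Gaussian (RBF) kernel matrix for the rows of X."""
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def combine_kernels(kernels, mu):
        """K_mu = sum_m mu_m * K_m with mu_m >= 0, as in Eq. (7)."""
        mu = np.asarray(mu, dtype=float)
        assert np.all(mu >= 0)
        return sum(m * K for m, K in zip(mu, kernels))

    # Example: three base kernels on toy data, uniform mixture weights.
    X = np.random.rand(100, 210)                 # e.g. 100 samples, 210 spectral bands
    Ks = [gaussian_kernel(X, s) for s in (0.5, 1.0, 2.0)]
    K_mu = combine_kernels(Ks, [1/3, 1/3, 1/3])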

2.3.1. Kernel alignment

Kernel alignment is a measure of similarity between two kernel functions, which captures the degree of agreement between a kernel and a given learning task. In the context of MKL, kernel alignment can serve as the objective function to be maximized by an optimization process, providing the coefficients of the kernel combination. The definition of kernel alignment, as given by [50,51], is:

Definition 2.1 (Kernel Alignment). The (empirical) alignment of a kernel k1 with a kernel k2 is:

$$ A(k_1, k_2) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \, \langle K_2, K_2 \rangle_F}} \tag{9} $$

where K_i is the kernel matrix derived from the kernel function k_i.

If we consider K_2 = y y^T, where y is the vector of the target values, we can view K_2 as the ideal kernel representing the target information in the context of alignment maximization.

Cortes et al. [52] suggested a slight modification which has been proven to yield better generalization performance. They observed that in certain situations kernel alignment does not correlate well with performance, and suggested an improvement by introducing the notion of centered kernel alignment.

Definition 2.2 (Centered Kernel Alignment). Let K ∈ R^{N×N} and K′ ∈ R^{N×N} be two kernel matrices with non-zero Frobenius norms, i.e. ‖K_c‖_F, ‖K′_c‖_F ≠ 0. The centered alignment between them is defined by:

$$ \rho = \frac{\langle K_c, K'_c \rangle_F}{\|K_c\|_F \, \|K'_c\|_F} \tag{10} $$

where K_c and K′_c are the centered matrices computed by the formula

$$ K_c = \left( I - \frac{\mathbf{1}\mathbf{1}^T}{N} \right) K \left( I - \frac{\mathbf{1}\mathbf{1}^T}{N} \right) \tag{11} $$

with I denoting the identity matrix and 1 ∈ R^{N×1} the vector with all entries equal to one.

2.3.2. Hybrid kernel alignment maximization

Most kernel alignment maximization algorithms obtain the combined kernel through the solution of the following optimization problem:

$$ \max_{\mu \in \Delta_1} \ \rho\left( \sum_{m=1}^{P} \mu_m K_m, \ K_Y \right), \qquad \Delta_1 = \{ \mu : \|\mu\|_1 = 1, \ \mu \geq 0 \} \tag{12} $$

where K_Y = y y^T is the target kernel and ‖·‖_1 the sparsity-inducing l1-norm. Alignment maximization lends itself to a computationally efficient two-stage approach; the optimal combined kernel is learned in the first stage by (12), and is applied, in the second stage, in a standard kernel machine such as the SVR.

The definition of alignment (Definitions 2.1 and 2.2) can be regarded as global, in that it involves all the available samples of the training data in its computation. As such, it may fail to account for any local structure in the data, while also forcing sample pairs to be equally aligned to their corresponding target similarities, regardless of their own distance or dissimilarity. To overcome these inefficacies of the global alignment, Wang et al. [42] introduced the notion of local alignment, defined on local kernels, and employed it in conjunction with the corresponding global one to achieve greater performance.

Definitions and algorithm. For each sample i = 1, 2, ..., N and each base kernel {K_m}_{m=1}^{P}, a local kernel is defined through the use of the sample's k nearest neighbors by the following formula:

$$ K_m^{(i)} = [K_m(j, l)]_{k \times k}, \qquad x_j, x_l \in N_k(x_i) \tag{13} $$

where N_k(x_i) denotes the neighborhood of the ith sample. Having obtained the local kernels, the local kernel alignment between any two kernels can then be computed using a modification of the definition of its global counterpart:

$$ \rho^l = \frac{1}{N} \sum_{i=1}^{N} \frac{\langle K_c^{(i)}, K_c'^{(i)} \rangle_F}{\|K_c^{(i)}\|_F \, \|K_c'^{(i)}\|_F} \tag{14} $$

This approach to alignment maximization consists of defining a new quantity, called hybrid alignment, by combining both the local and the global definitions. Letting ρ^g denote the global alignment as defined in (10), the hybrid alignment between two kernels can then be obtained by the following convex combination:

$$ \rho = (1 - \lambda)\,\rho^g + \lambda\,\rho^l \tag{15} $$

Here, λ ∈ [0, 1] is a regularization parameter controlling the trade-off between the local and global information incorporated in the computation of the hybrid alignment.

The alignment maximization problem (12) can be rewritten in a way that emphasizes the incorporation of both local and global alignment information, in the following manner:

$$ \max_{\mu \in \Delta_1} \ (1-\lambda)\, \frac{\left\langle \sum_m \mu_m K_m, \ K_Y \right\rangle_F}{\left\| \sum_m \mu_m K_m \right\|_F} + \frac{\lambda}{N} \sum_i \frac{\left\langle \sum_m \mu_m K_m^{(i)}, \ K_Y^{(i)} \right\rangle_F}{\left\| \sum_m \mu_m K_m^{(i)} \right\|_F} \tag{16} $$

Defining the auxiliary variable τ_i below,

$$ \tau_i = \frac{\left\| \sum_{m=1}^{P} \mu_m K_m \right\|_F}{\left\| \sum_{m=1}^{P} \mu_m K_m^{(i)} \right\|_F} \tag{17} $$

we can rewrite (16) as:

$$ \max_{\mu} \ \frac{(1-\lambda) \left\langle \sum_m \mu_m K_m, \ K_Y \right\rangle_F + \frac{\lambda}{N} \sum_i \tau_i \left\langle \sum_m \mu_m K_m^{(i)}, \ K_Y^{(i)} \right\rangle_F}{\left\| \sum_m \mu_m K_m \right\|_F} \tag{18} $$

Defining M_{mq} = ⟨K_m, K_q⟩_F, a_m = ⟨K_m, K_Y⟩_F, b_m = (1/N) Σ_i τ_i ⟨K_m^{(i)}, K_Y^{(i)}⟩_F, and setting c = (1 − λ)a + λb, the optimal μ is obtained through solving the following optimization problem, formulated by [52]:

$$ \mu^* = \arg\max_{\mu \in \Delta_2} \ \frac{\mu^T c c^T \mu}{\mu^T M \mu} \tag{19} $$

where Δ_2 = {μ : ‖μ‖_2 = 1, μ ≥ 0}, which can be reduced to a simple QP problem by the following proposition:

Proposition 2.1 (Cortes et al. [52]). Let v* be the solution to the following quadratic program (QP):

$$ \min_{v \geq 0} \ v^T M v - 2 v^T c \tag{20} $$

Then the solution μ* of the alignment maximization problem is given by μ* = v*/‖v*‖.

To maximize the proposed hybrid alignment, Algorithm 1 is utilized.

Algorithm 1: Hybrid Kernel Alignment Maximization (HKAM).
  Input: base kernels {K_m}_{m=1}^{P}, target Y, tolerance ε_0, neighborhood size k, trade-off λ
  Output: μ
  1  Initialize τ^(0) and set t = 0
  2  Obtain the neighbor list of each sample
  3  Calculate the local kernels K_m^(i) and K_Y^(i) for each sample
  4  while (obj^(t) − obj^(t−1)) / obj^(t) ≥ ε_0 do
  5      Update v^(t+1) with fixed τ^(t) by (20)
  6      μ^(t+1) = v^(t+1) / ‖v^(t+1)‖
  7      Update τ^(t+1) by (17) using the fixed μ^(t+1)
  8      Compute obj^(t) by (12) and (15)
  9      t = t + 1

In each iteration, the variable τ is updated according to (17) and its value is then used to compute the parameters needed to solve the quadratic optimization problem (20). After each solution is obtained, the objective function is updated. If the normalized difference between two successive values of the objective function is smaller than a threshold ε_0, the algorithm terminates. An important thing to note here is that the neighbor list is computed once and kept unchanged throughout the iterative process. To choose the neighbors of each sample, the average kernel of all base kernels is first computed. Then, for the ith sample, the ith column of the average kernel is sorted in descending order. The neighbor list for this sample then consists of the first k elements of the sorted list. Algorithm 1 is guaranteed to be monotonically increasing and usually converges in the first few iterations.
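A small Python sketch of the centered kernel alignment of Definition 2.2 (Eqs. (10) and (11)) may clarify the computation; the function names are illustrative, and the toy kernel at the end is only a demonstration, not data from the paper.

    import numpy as np

    def center_kernel(K):
        """Kc = (I - 11^T/N) K (I - 11^T/N), Eq. (11)."""
        N = K.shape[0]
        H = np.eye(N) - np.ones((N, N)) / N
        return H @ K @ H

    def centered_alignment(K1, K2):
        """rho = <K1c, K2c>_F / (||K1c||_F ||K2c||_F), Eq. (10)."""
        K1c, K2c = center_kernel(K1), center_kernel(K2)
        num = np.sum(K1c * K2c)                  # Frobenius inner product
        return num / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

    # Alignment of a toy kernel with the ideal target kernel K_Y = y y^T:
    y = np.random.rand(50) - 0.5
    K = np.exp(-np.abs(y[:, None] - y[None, :]))  # a toy (Laplacian) kernel
    rho_g = centered_alignment(K, np.outer(y, y))

The hybrid alignment of Eq. (15) is then simply (1 − λ) times this global value plus λ times the average of the same quantity over the k-nearest-neighbor submatrices of Eq. (13).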

2.3.3. lp-norm Multiple Kernel Learning

The MKL algorithms studied in the previous sections, involving the kernel alignment measure, solve optimization problems leading to sparse kernel combinations. To allow for more robust kernel mixtures that generalize well, Kloft et al. [44] proposed a unifying framework for MKL, able to incorporate arbitrary cost functions and norm penalties. In the context of linear kernel combinations with lp-norm regularization, where p > 1 for non-sparse solutions, the following optimization problem is formulated:

$$ \min_{w, \mu \geq 0} \ C \sum_{i=1}^{N} l\left( \sum_{m=1}^{P} \langle w_m, \phi_m(x_i) \rangle_{H_m} + b, \ y_i \right) + \frac{1}{2} \sum_{m=1}^{P} \frac{\|w_m\|_{H_m}^2}{\mu_m} \tag{21a} $$

$$ \text{s.t.} \quad \|\mu\|_p^2 \leq 1 \tag{21b} $$

Optimization. Unlike the two-stage HKAM problem, lp-MKL assumes a one-stage approach; the structural parameters of the base learner and the kernel mixture weights are optimized simultaneously. However, an efficient and straightforward optimization scheme is not available for (21), especially for large-scale problems. Thus, a two-layer optimization procedure is proposed: a master problem, parametrized only by μ, is solved to obtain the kernel mixture weights, and a slave problem, nested inside the master one, is solved repeatedly using a standard SVR on the learned combined kernel. The overall solution is achieved by alternating between optimizing with respect to (w.r.t.) the weights μ and w.r.t. the remaining variables (the structural parameters of the SVR).

The basic idea of this approach is that for a given, fixed set of primal variables (w, b), the optimal μ can be calculated analytically by the following formula:

$$ \mu_m = \frac{\|w_m\|_{H_m}^{2/(p+1)}}{\left( \sum_{m'=1}^{P} \|w_{m'}\|_{H_{m'}}^{2p/(p+1)} \right)^{1/p}}, \qquad \forall m = 1, \ldots, P \tag{22} $$

The next step is to devise a way to solve the optimization problem (21) w.r.t. the variables (w, b) given fixed kernel mixture coefficients μ. Omitting the detailed derivation, the resulting dual optimization problem is:

$$ \max_{\alpha : \, \alpha^T \mathbf{1} = 0} \ -C \sum_{i=1}^{N} l^*\left( \frac{\alpha_i}{C}, \ y_i \right) - \frac{1}{2} \sum_{m=1}^{P} \mu_m \, \alpha^T K_m \alpha \tag{23} $$

Using the KKT conditions at the optimal point, we can derive the following formula for w_m:

$$ \|w_m\|^2 = \mu_m^2 \, \alpha^T K_m \alpha, \qquad \forall m = 1, \ldots, P \tag{24} $$

A simple wrapper algorithm for lp-MKL training may then be defined, as shown in Algorithm 2.

Algorithm 2: Simple lp>1-norm MKL wrapper-based training algorithm. The analytical updates of μ and the SVR computations are optimized alternatingly.
  Input: feasible α and μ
  1  while optimality conditions are not met do
  2      Compute α according to (23) (e.g., with an SVR)
  3      Compute ‖w_m‖² ∀m = 1, ..., P using (24)
  4      Update μ according to (22)

A major disadvantage of the wrapper approach is the heavy memory usage as well as the time wastage induced: the kernel matrices either need to be precomputed and stored in memory or, if this is impossible, be computed again and again between iterations. An efficient alternative involves the use of a chunking approach in order to break the optimization into a series of subproblems. First introduced in [53], it was subsequently incorporated into a large-scale sparse MKL framework [54]. Finally, it was extended in [44] to include non-sparse kernel mixtures, with the resulting algorithm presented in Algorithm 3. The optimality conditions observed for the solution can be the same as the ones in Algorithm 2.
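As an illustration of the wrapper scheme of Algorithm 2, the following Python sketch alternates a kernel SVR solve on the combined kernel with the analytical μ update of Eq. (22). It is a minimal sketch under the assumption that scikit-learn's SVR with a precomputed kernel can stand in for the dual solver of Eq. (23); the function name lp_mkl_wrapper is illustrative and a fixed iteration count replaces the optimality check.

    import numpy as np
    from sklearn.svm import SVR

    def lp_mkl_wrapper(kernels, y, p=2.0, C=1.0, eps=0.01, n_iter=20):
        P = len(kernels)
        mu = np.full(P, (1.0 / P) ** (1.0 / p))      # feasible start, ||mu||_p = 1
        for _ in range(n_iter):
            K = sum(m * Km for m, Km in zip(mu, kernels))
            svr = SVR(kernel='precomputed', C=C, epsilon=eps).fit(K, y)
            alpha = np.zeros(len(y))
            alpha[svr.support_] = svr.dual_coef_.ravel()
            # ||w_m||^2 = mu_m^2 * alpha^T K_m alpha, Eq. (24)
            w2 = np.array([mu[m] ** 2 * alpha @ kernels[m] @ alpha for m in range(P)])
            w2 = np.maximum(w2, 1e-12)               # numerical guard (assumption)
            # analytical update of Eq. (22)
            mu = w2 ** (1.0 / (p + 1)) / np.sum(w2 ** (p / (p + 1))) ** (1.0 / p)
        return mu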

Algorithm 3: lp-norm MKL chunking-based algorithm via analytical update. The kernel weighting μ and the SVR α are optimized interleavingly.
  Input: subproblem size Q, accuracy ε
  1  Initialize: g_{m,i} = ĝ_i = α_i = 0 ∀m, i; L = S = −∞; μ_m = (1/P)^{1/p} ∀m = 1, ..., P
  2  while optimality conditions are not met do
  3      Select Q variables α_{i_1}, ..., α_{i_Q} based on the gradient ĝ of (23) w.r.t. α
  4      Store α^{old} and then update α according to (23) with respect to the selected variables
  5      Update the gradients g_{m,i} ← g_{m,i} + Σ_{q=1}^{Q} (α_{i_q} − α_{i_q}^{old}) k_m(x_{i_q}, x_i), ∀m = 1, ..., P, i = 1, ..., N
  6      Compute the quadratic terms S_m = (1/2) Σ_i g_{m,i} α_i and q_m = 2 μ_m² S_m, ∀m = 1, ..., P
  7      L^{old} = L, L = Σ_i y_i α_i, S^{old} = S, S = Σ_m μ_m S_m
  8      if |1 − (L − S)/(L^{old} − S^{old})| ≥ ε then
  9          μ_m = q_m^{1/(p+1)} / ( Σ_{m'=1}^{P} q_{m'}^{p/(p+1)} )^{1/p}, ∀m = 1, ..., P
  10     else
  11         break
  12     ĝ_i = Σ_m μ_m g_{m,i}, ∀i = 1, ..., N

3. Application of the multi-kernel framework for soil spectroscopy

Let the recorded SSL be comprised of N soil samples, with the initially recorded reflectance (or absorbance) spectra described by M bands. Assume moreover that a set of Q pre-processing techniques is defined, acting upon the initially recorded spectra and generating an equal number of spectral sources. Then, the ith soil sample is represented in the sth spectral source by the vector x^(s)(i) = [x_1^(s)(i), ..., x_m^(s)(i), ..., x_M^(s)(i)] ∈ R^M. By denoting the predictor space of the sth spectral source as X^(s) = {x^(s)(i), i = 1, ..., N}, the respective dataset is given by D^(s) = {X^(s), Y}.

The MKL framework may then be applied in three successive levels of integration, as described below.

3.1. MKL at the feature level

Let x̃_m^(s) = {x_m^(s)(i), i = 1, ..., N} ∈ R^N represent, for each separate feature m, the set of spectral values across all N samples. We employ the independent alignment maximization algorithm to construct a kernel mixture of L base Gaussian kernels K(x, x_i) = exp(−‖x − x_i‖²/(2σ²)). The widths σ of the constituent base kernels are selected following the proposition placed in [55], where the authors showed that the optimal values of σ lie within the 0.1 and 0.9 quantiles of the pairwise distances between the corresponding samples. Thus, the base kernels are initialized by sampling L values from within that range, so that a mixture weight vector is computed in order to produce the feature kernel as a convex combination of the L Gaussians. Each kernel is centered, and a small positive constant is added to its diagonal elements for numerical stability reasons. Finally, the weight vector is obtained as the solution to the independent alignment maximization problem using the global definition of [52], i.e. with λ = 0 in (15). This choice was made due to its computational simplicity, considering that this procedure is going to be repeated for all M features. The whole process is summarized in Algorithm 4, which produces the final composite feature kernel K_m^(s) using a kernel function of the form:

$$ K_m^{(s)}(x_m^{(s)}(i), x_m^{(s)}(j)) = \sum_{\ell=1}^{L} h_{m,\ell}^{(s)} \cdot K_{m,\ell}^{(s)}(x_m^{(s)}(i), x_m^{(s)}(j)) \tag{25} $$

Algorithm 4: Per-feature kernel construction.
  Data: spectral bands and SOC content: x̃_m^(s) ∈ R^N, m = 1, ..., M; y ∈ R^N
  Result: kernel mixture weights per spectral band: h_m^(s) ∈ R^L, m = 1, ..., M
  1  for m = 1, ..., M do
  2      Sample L values σ_m^(s) = [σ_{m,1}^(s), ..., σ_{m,L}^(s)] within the 0.1 and 0.9 quantiles of ‖x_m^(s)(i) − x_m^(s)(j)‖, i, j = 1, ..., N, i ≠ j
  3      for ℓ = 1, ..., L do
  4          Compute the base kernel K_{m,ℓ}^(s) = exp( −‖x_m^(s)(i) − x_m^(s)(j)‖² / σ_{m,ℓ}^(s)² )
  5          Center the kernel according to Definition 2.2 and add a regularization parameter to its diagonal elements
  6          Compute the corresponding kernel weight by h_{m,ℓ}^(s) = ρ(K_{m,ℓ}^(s), K_Y)
  7      Normalize the kernel mixture weights so that ‖h_m^(s)‖ = 1

3.2. MKL at the single spectral source level

The next step involves the formation of the composite source kernel, comprised of the individual feature kernels, by calculating a feature weight vector d^(s) ∈ R^M.

3.2.1. Feature selection

By promoting sparse solutions for d^(s) it is possible to effectively perform feature selection, that is, to retain only the most relevant features with respect to y. This is achieved through the use of the HKAM (Algorithm 1), so that the composite kernel is formed through the following function:

$$ K^{(s)}(x^{(s)}(i), x^{(s)}(j)) = \sum_{m=1}^{M} d_m^{(s)} K_m^{(s)}(x_m^{(s)}(i), x_m^{(s)}(j)) \tag{26} $$

To avoid noisy additions to the source kernel and obtain a more well-defined feature ranking, a thresholding scheme is applied to dispose of the values very close to zero, i.e. all d_m^(s) < 10^{-3} are set equal to zero.

3.2.2. Augmenting the source kernel

To boost the statistical model's performance, an additional Gaussian kernel K̃^(s) is defined on the subset of selected features, using the following kernel function:

$$ \widetilde{K}^{(s)}(x^{(s)}(i), x^{(s)}(j)) = \exp\left( -\frac{\|\widetilde{x}^{(s)}(i) - \widetilde{x}^{(s)}(j)\|^2}{\sigma^{(s)2}} \right) \tag{27} $$

with x̃^(s)(i) = {x_m^(s)(i) : ∀m ∈ [1, ..., M], d_m^(s) > 0}.

Thus, the final source kernel is formed as:

$$ K^{(s)} = \sum_{m=1}^{M} d_m^{(s)} K_m^{(s)} + \widetilde{K}^{(s)} \tag{28} $$

The optimal value of σ^(s) is obtained by sampling L values from within the 0.1 and 0.9 quantiles of the samples' pairwise distances ‖x̃^(s)(i) − x̃^(s)(j)‖². A Gaussian kernel is defined for each separate value, and its centered alignment to the target kernel is computed; the one with the greatest alignment is picked and the rest are discarded. Algorithm 5 summarizes the procedure.

3.3. MKL at the source combination level

The third level of integration entails the implicit combination of the complementary information contained in the different spectral sources, which are calculated from the original spectra using different pre-treatments. The goal is to establish a more robust and accurate statistical model.

3.3.1. Combining spectral sources

As detailed above at the single source level (Eq. (28)), for each source s = 1, ..., Q a separate source kernel K^(s) is defined. These source kernels can be combined into an overall spectral kernel K_comb^spc = Σ_{s=1}^{Q} μ_s K^(s). The kernel mixture weights μ ∈ R^Q may be interpreted as the significance of each spectral source towards the final kernel, i.e. they can act as a source ranking method.
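The per-feature construction of Algorithm 4 can be sketched in a few lines of Python. The code below reuses the center_kernel and centered_alignment helpers from the earlier sketch; the l1 normalization of the alignment weights is an assumption, since the algorithm does not specify which norm is used, and the function name feature_kernel is illustrative.

    import numpy as np
    # center_kernel and centered_alignment as defined in the sketch of Section 2.3.2

    def feature_kernel(x_band, y, L=10, ridge=1e-8):
        """Composite kernel of Eq. (25) for one spectral band."""
        d = np.abs(x_band[:, None] - x_band[None, :])    # pairwise distances of the band
        lo, hi = np.quantile(d[d > 0], [0.1, 0.9])       # 0.1-0.9 quantile range of [55]
        K_Y = np.outer(y, y)
        kernels, h = [], []
        for s in np.linspace(lo, hi, L):
            K = np.exp(-(d ** 2) / (s ** 2))
            K = center_kernel(K) + ridge * np.eye(len(x_band))   # center + stabilize
            kernels.append(K)
            h.append(centered_alignment(K, K_Y))         # weight = alignment with K_Y
        h = np.maximum(np.array(h), 0.0)
        h /= np.sum(h)                                   # normalize (l1, an assumption)
        return sum(w * K for w, K in zip(h, kernels))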

Algorithm 5: Augmenting the source kernel.
  Data: subset of selected features and SOC content: X̃^(s), Y
  Result: the σ^(s) hyperparameter for the additional kernel
  1  Initialize ρ_ℓ = 0
  2  Sample L values {σ_1^(s), ..., σ_L^(s)} within the 0.1 and 0.9 quantiles of ‖x̃^(s)(i) − x̃^(s)(j)‖²
  3  Compute the target kernel K_Y = y y^T
  4  for ℓ = 1, ..., L do
  5      K̃_ℓ^(s) = exp( −‖x̃^(s)(i) − x̃^(s)(j)‖² / σ_ℓ^(s)² )
  6      ρ_ℓ = ⟨K̃_{ℓ,c}^(s), K_Y⟩_F / ( ‖K̃_{ℓ,c}^(s)‖_F ‖K_Y‖_F )
  7  Choose the σ_ℓ^(s) corresponding to max_ℓ ρ_ℓ

Fig. 1. Overview of the three-level MKL approach for spectral source combination.

In the single source level, the kernel combination K^(s) was achieved through the use of two-stage MKL algorithms, while the kernel mixture weights were regularized using the sparsity-inducing l1-norm. This was motivated by the following: (i) the computational advantages of two-stage methods, and (ii) the feature selection properties of the l1-norm, which produces simpler and more interpretable models. These two choices, however, have their own shortcomings; it has been found that two-stage MKL methods are outclassed in terms of performance by their one-stage counterparts, and that the l1-norm regularization can inadvertently lead to the elimination of some useful features. In order to avoid both of these issues, the source (kernel) combination is performed by the lp-MKL algorithm, a one-stage MKL method that employs non-sparse regularization through the lp>1-norm (Algorithm 3). This enables the model to achieve greater performance than the two-stage method, as well as to obtain a source ranking vector where no coefficient is zero. The last part guarantees that the final statistical model will incorporate information from all spectral sources, and no accidental elimination of such information will take place. Fig. 1 depicts the entire process.

3.3.2. Combining spectral sources & additional predictors

In addition to the use of the spectral information, it is possible to incorporate additional predictors to enhance the performance. This use of heterogeneous sources of information can be easily integrated into the proposed framework. We consider herein the use of the textural information of each sample in the form of the particle size distribution, and specifically the Sand and Clay content. The same principle could be applied with other auxiliary predictors (e.g. pH, Cation Exchange Capacity, Electrical Conductivity, etc.).

The additional predictors are thus defined in the X^(a) ∈ R² space as follows:

$$ x^{(a)} = [x^{(a)}_{\mathrm{sand}}, \ x^{(a)}_{\mathrm{clay}}]^T \tag{29} $$

Textural kernel construction. The additional information is not subject to spectral pre-processing, and hence the constructed kernel is common across all spectral sources. For both the Sand and Clay contents we construct a feature kernel using Algorithm 4. The additional textural kernel is then constructed via Algorithm 1, defined as the sum of two composite kernels:

$$ K^{a} = d^{(a)}_1 K^{(a)}_{\mathrm{sand}} + d^{(a)}_2 K^{(a)}_{\mathrm{clay}} \tag{30} $$

where the weighting vector d^(a) = [d_1^(a), d_2^(a)] encodes the relative importance of the two additional predictors.

Single source stage + additional predictors. First, we consider the incorporation of the additional kernel at the single source stage. For each spectral kernel K^(s) a combined spectral-textural kernel K_comb^{(s)+a} is calculated through Algorithm 3, with the weighting vector μ^(s) encoding the relative importance of the spectral and textural information.

Multiple source stage + additional predictors. In the final step we incorporate the additional textural information into the multiple source combination kernel to enhance the performance of the resulting model. Here, all spectral source kernels K^(s), s = 1, ..., Q, and the textural kernel K^a are treated as base kernels, and used as inputs to Algorithm 3. The result is an augmented multiple-source statistical model with a kernel K_comb^{spc+a} and a weighting vector μ′, encoding the relative importance of the individual spectral sources and the single textural source.

4. Experimental set-up

4.1. Data preparation

In this work, the focus was placed on predicting the logarithm of Soil Organic Carbon (i.e. log10(SOC + 1)), hereafter abbreviated as logSOC. The proposed framework could also be applied to predict other soil properties as well.

The following spectral pre-processing techniques were examined and applied to the original Absorbance (Abs) spectra of the LUCAS topsoil database: (i) transformation into reflectance (R = 1/10^Abs), (ii) continuum removal (CR), (iii) a zero-order Savitzky-Golay filter with a window of 50 points (Abs-SG0), (iv) the previous step and additionally its standard normal variate (Abs-SG0-SNV), (v) a first-order derivative implemented with a Savitzky-Golay filter of width 50 (Abs-SG1), and (vi) its standard normal variate (Abs-SG1-SNV).

In order to account for some large differences in the spectral response, the whole dataset was divided into mineral and organic soil materials according to the FAO definition for organic soils [56]. Subsequently, the 17,938 mineral soil samples were further split into cropland (8393), grassland (4096) and woodland (4623) soils according to the land cover classes of the LUCAS database. For each of the four subsets (namely Cropland, Grassland, Woodland, and Mineral), we end up with six data representations (henceforth called sources), each corresponding to a different combination of pre-processing steps.
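The pre-treatments listed above can be reproduced approximately with SciPy; the sketch below covers the reflectance transform, the Savitzky-Golay filters and the SNV, while the continuum removal is omitted. The polynomial order (2) is an assumption not stated in the text, and the 50-point window is rounded to 51 because SciPy requires an odd window length.

    import numpy as np
    from scipy.signal import savgol_filter

    def reflectance(abs_spectra):
        return 1.0 / (10.0 ** abs_spectra)               # R = 1 / 10^Abs

    def snv(spectra):
        """Standard normal variate: center and scale each spectrum."""
        mu = spectra.mean(axis=1, keepdims=True)
        sd = spectra.std(axis=1, keepdims=True)
        return (spectra - mu) / sd

    A = np.random.rand(5, 1050)                          # toy absorbance spectra
    sources = {
        'R':       reflectance(A),
        'Abs-SG0': savgol_filter(A, 51, polyorder=2, deriv=0, axis=1),
        'Abs-SG1': savgol_filter(A, 51, polyorder=2, deriv=1, axis=1),
    }
    sources['Abs-SG0-SNV'] = snv(sources['Abs-SG0'])
    sources['Abs-SG1-SNV'] = snv(sources['Abs-SG1'])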

Fig. 2. The different spectral sources considered for the Mineral dataset – depicted are the 5th, 16th, 50th, 84th and 95th percentiles.

The different spectral sources for the Mineral dataset are depicted in Fig. 2.

In order to limit the large dimensionality of the dataset, rather than working with all 1050 spectral bands of each sample, we chose to downsample the available spectra in intervals of 10 nm, thus keeping 210 spectral bands. Finally, for each subset, the Conditioned Latin Hypercube Sampling [57] algorithm was implemented in order to create training (66.6%) and testing (33.3%) sets, with the training set of each subset being further split into five folds through the use of the Fuzzy c-means algorithm [58].

4.2. MKL model calibration - hyperparameter estimation

The MKL models described above rely on the optimization of their respective hyperparameters.

4.2.1. Optimizing HKAM

The HKAM method (Algorithm 1) depends on two hyperparameters, namely the number of nearest neighbors k used in the computation of the local kernel alignment, and the regularization parameter λ controlling the trade-off between the local and global information incorporated in the model. In addition, because the SVR optimization step is calculated using the final source kernel (Eq. (28)), two more hyperparameters need to be optimized: the regularization parameter C and the ε-tube width ε. To alleviate the computational cost associated with the simultaneous optimization of all four parameters, we solve two optimization sub-problems, each concerned with a pair of parameters whilst the other two are set. To avoid selecting arbitrary values for both C and ε in the first step, we estimate them by a process described in [59], where a good estimate for both is given by:

$$ C = \max\left( |\bar{y} + 3\sigma_y|, \ |\bar{y} - 3\sigma_y| \right), \qquad \varepsilon \propto \sigma_n \tag{31} $$

where ȳ is the mean of the target variable y, σ_y is the corresponding standard deviation and σ_n is the standard deviation of the input noise level. In order to save the computational time that would be needed to estimate the input noise levels, we skip the ε estimation and simply set ε = 0.01. With these two values set, a grid search and 5-fold CV for the optimal values of k and λ is performed. At each grid point (k, λ), exploiting the two-stage nature of the HKAM algorithm, we construct the composite (source) kernel by maximizing the hybrid alignment (Algorithm 1) of the feature kernels K_m^(s) with the target kernel K_Y, and proceed to evaluate the constructed kernel through a 5-fold CV where in each fold an SVR problem is solved with the estimated hyperparameters (C, ε). After obtaining the optimal values of k, λ and the corresponding source kernel K^(s), a second grid search and 5-fold CV is performed, this time to identify the optimal values of C and ε and to obtain a final evaluation for our single source model. The whole process is detailed in Algorithm 6.

Algorithm 6: Source kernel construction - single source model evaluation.
  Data: feature (base) kernels and SOC content: K_m^(s), y ∈ R^N
  Result: source kernel and feature ranking vector: K^(s), d^(s)
  1  Create a 2-D grid of hyperparameters (k, λ)
  2  Set C = max(|ȳ + 3σ_y|, |ȳ − 3σ_y|), ε = 0.01
  3  for each pair (k, λ) in the grid do
  4      Calculate the source kernel mixture weights d^(s) by Algorithm 1 with the feature kernels as inputs
  5      Calculate the additional kernel by Algorithm 5
  6      Calculate the source kernel by (28)
  7      Run 5-fold CV to evaluate the pair (k, λ)
  8  Save the K^(s), d^(s) corresponding to the (k, λ) with the best performance
  9  Create a 2-D grid of hyperparameters (C, ε)
  10 for each pair (C, ε) in the grid do
  11     Run 5-fold CV to evaluate the pair (C, ε)
  12 Evaluate the single source model with the optimal hyperparameters

4.2.2. Optimizing lp-norm MKL

The performance of the algorithm is controlled by three hyperparameters: the ε-tube width ε, the regularization parameter C, and the norm p, all three to be optimized through a grid search. As in the previous stage, we follow a decomposition scheme whereby, in the first step, we optimize C and ε while fixing p to a certain value, and in the second step, using the optimal values of C and ε, we search for the optimal p.
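For reference, a minimal Python sketch of the heuristic of Eq. (31) used to fix C before the (k, λ) grid search; ε is simply set to 0.01 as described above, and the function name estimate_C is illustrative.

    import numpy as np

    def estimate_C(y):
        """C = max(|y_bar + 3*sigma_y|, |y_bar - 3*sigma_y|), Eq. (31)."""
        y_bar, sigma_y = np.mean(y), np.std(y)
        return max(abs(y_bar + 3 * sigma_y), abs(y_bar - 3 * sigma_y))

    # Example on toy logSOC-like targets:
    C, eps = estimate_C(np.log10(1 + np.random.rand(100))), 0.01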

4.3. Nomenclature

To differentiate among the different approaches, the following naming system is used herein:

1. MKL-S: multi-kernel SVR models developed using as predictors only a single spectral source.
2. MKL-MS: multi-kernel SVR models developed using as predictors all available spectral sources.
3. MKL-Sa and MKL-MSa: models using the predictors described in the above points as well as the additional set of auxiliary predictors (here, the textural information).

4.4. Competing methodologies


Fig. 3. Comparison between the R² performance across the different models proposed for the Mineral dataset.
The efficacy of the proposed methodology was compared with the following algorithms, which are the state-of-the-art in soil spectroscopy:

1. The simple SVR algorithm for regression using a Gaussian kernel.
2. The Partial Least Squares (PLS) regression algorithm [60], which performs regression in a transformed input space, formed by successively selecting orthogonal factors (latent variables) maximizing the covariance between the predictors and the response variable.
3. The Cubist algorithm [61,62], a rule-based model constructed as a tree, whose branches formulate the premise part, while the leaves contain linear regression models; a boosting-like scheme (termed committees) and an error correction mechanism are further employed to enhance the accuracy of predictions.
4. The Spectrum-Based Learner (SBL) [63], which uses memory-based learning and builds a Gaussian Process Regression model for each unknown testing sample using its optimal spectral neighbors.

Unlike the proposed framework, none of the above algorithms have been previously examined using multiple spectral sources, because this integration is not an easy task for them due to the considerable expansion of the feature space. This is presented here for the first time and is the cornerstone of the proposed MKL framework. Therefore, the comparisons were made using as predictors the single spectral sources, without and with the use of the auxiliary predictors (i.e. the textural information).

4.5. Performance metrics

The models were validated on the independent test set using the following metrics: (i) the Root Mean Squared Error (RMSE), (ii) the coefficient of determination R², and (iii) the ratio of performance to interquartile range (RPIQ) [64].

R² quantifies the degree of any linear correlation between the observed and the model-predicted output; it usually ranges from 0 to 1 (higher is better) and is calculated thusly:

$$ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{32} $$

with ŷ_i being the prediction for the ith pattern.

RPIQ, on the other hand, takes both the prediction error and the variation of the observed values into account, without making assumptions about the distribution of the observed values. It is defined as the interquartile range of the observed values divided by the RMSE of prediction, i.e. RPIQ = IQR/RMSE.
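The three validation metrics can be computed directly; a minimal sketch follows, with RPIQ defined as the interquartile range of the observed values over the RMSE of prediction, as in the text.

    import numpy as np

    def rmse(y, y_hat):
        return np.sqrt(np.mean((y - y_hat) ** 2))

    def r2(y, y_hat):
        """Coefficient of determination, Eq. (32)."""
        return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

    def rpiq(y, y_hat):
        """RPIQ = IQR of the observed values / RMSE of prediction."""
        q1, q3 = np.percentile(y, [25, 75])
        return (q3 - q1) / rmse(y, y_hat)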

Table 1
Performance metrics on all four datasets using the proposed framework.

                 Grassland            Woodland             Cropland             Mineral
  Model          R²    RMSE  RPIQ     R²    RMSE  RPIQ     R²    RMSE  RPIQ     R²    RMSE  RPIQ

  MKL-S
   R             0.57  0.20  1.99     0.58  0.23  1.99     0.54  0.16  1.69     0.57  0.24  1.79
   CR            0.56  0.20  1.96     0.63  0.22  2.14     0.50  0.17  1.62     0.66  0.20  2.17
   Abs-SG0       0.60  0.19  2.07     0.63  0.22  2.14     0.55  0.16  1.72     0.62  0.23  1.83
   Abs-SG0-SNV   0.70  0.17  2.37     0.66  0.21  2.22     0.67  0.14  2.01     0.78  0.16  2.69
   Abs-SG1       0.75  0.15  2.59     0.76  0.18  2.62     0.73  0.13  2.22     0.80  0.15  2.83
   Abs-SG1-SNV   0.71  0.17  2.43     0.73  0.19  2.48     0.67  0.14  2.01     0.79  0.16  2.72
  MKL-MS         0.82  0.13  3.06     0.81  0.16  2.97     0.79  0.11  2.49     0.86  0.13  3.39

  MKL-Sa
   R             0.76  0.15  2.64     0.76  0.18  2.65     0.64  0.14  1.92     0.67  0.19  2.20
   CR            0.86  0.12  3.45     0.86  0.14  3.41     0.80  0.11  2.55     0.87  0.13  3.42
   Abs-SG0       0.78  0.15  2.76     0.77  0.17  2.70     0.64  0.14  1.92     0.69  0.19  2.26
   Abs-SG0-SNV   0.86  0.11  3.52     0.79  0.16  2.84     0.82  0.10  2.70     0.88  0.12  3.65
   Abs-SG1       0.91  0.09  4.34     0.89  0.12  4.00     0.85  0.09  3.01     0.91  0.10  4.20
   Abs-SG1-SNV   0.90  0.10  4.16     0.89  0.12  3.98     0.85  0.09  3.01     0.92  0.10  4.46
  MKL-MSa        0.93  0.08  5.04     0.92  0.10  4.59     0.89  0.08  3.47     0.94  0.08  5.17

4.6. Implementation details

The algorithm was implemented in the Python programming language (Python 2.7) and all the experiments were executed on a machine with 48 cores (AMD Opteron, 2.1 GHz) and 32 GB of RAM. In order to speed up the execution time, we took effort to parallelize large sections of the code, mainly those involving parameter selection through grid search and cross-validation. The source combination part, which was formulated as an lp-norm MKL problem, was implemented through the use of the SHOGUN machine learning toolbox [65].

5. Experimental results

5.1. Accuracy results of the proposed approach

The results of the proposed MKL framework for the prediction of logSOC across the mineral soil datasets of the LUCAS SSL are presented in Table 1. The effect of the different spectral pre-treatments is evident, as the more accurate models were developed from the spectral derivatives. Additionally, the positive effect of the combination of the spectral sources as well as of the incorporation of the additional predictors can be identified. This positive influence is illustrated in Fig. 3, where the absolute differences in R² for the Mineral dataset are visualized. For example, the MKL-MS model, which combines the different spectral sources, outperforms all MKL-S models utilizing its constituent kernels. What is more, all models benefit from the usage of the additional predictors. The least impact is found in the spectral sources which attained significant results using the spectral information alone, but even in those cases the relative increase in R² is about 14%. Overall, as expected, the best results are derived from the MKL-MSa model, which takes advantage of all possible spectral sources and additional predictors. The model attained a performance of RPIQ 5.17, a notable result.

5.2. Model interpretation and discussion

This section will examine the interpretation capacities of the proposed approach across the two last levels of integration, and specifically: (i) its feature selection capabilities within each spectral source, and (ii) the importance of each constituent kernel in the developed combined models.

5.2.1. Feature importance

The sparse feature weight vectors d^(s) depict the contribution of each feature kernel towards the source kernel, and may thereupon be used to identify the relative importance of the selected few wavelengths. These weights are illustrated per each spectral source in Fig. 4, and some important bands associated with chromophores are shown above them (information adapted from [9]). It becomes evident that the more complex pre-processed sources utilize more features than the simple ones (i.e. the Reflectance and the Abs-SG0), which enabled them to attain more accurate results. This is yet another indication that spectral pre-treatments can have a profound effect on the model, by enabling it to readily identify important spectral regions.

Fig. 4. The relative wavelength importance per each spectral source for the Mineral dataset, depicted on the y-axis of each figure.

Because SOM is comprised of various organic compounds, and given its contribution to soil structure and nutrient retention, it can be either directly or indirectly correlated with many physical and chemical chromophores, which are in turn associated with bands across the entire VNIR–SWIR range. This justifies the use of wavelengths across the whole range by the three most accurate models. The total number of significant features (i.e. with a relative weight > 0.01) selected by these models is on average ≈38 (20%), whilst all selected features are on average ≈59 (30%), meaning that they performed effective and sparse feature selection.

Some important bands and their potential interpretation identified in the source which produced the best results, namely Abs-SG1, are given below (interpretations may be derived for the other spectral sources in similar fashion):

• In the visible range: (i) the 500 nm band, associated with iron oxides, which like SOM influence the soil colour; their presence is controlled by SOM and provides significant evidence vis-à-vis the underlying soil-weathering process which formed the soil; (ii) the 580 nm band, which has been shown to be strongly correlated with SOC [66]; (iii) the 680 nm band, linked with hydroxyl, found e.g. within the humus carboxyl but also with chlorophyll pigments, i.e. the plant residues contributing to SOM [67];
• In the near-infrared range: (i) the 1150 nm band, where weak absorption bands of water and aromatic organic compounds may be found; (ii) a broad area centered around 1400 nm, considered important for SOC, associated with both water (bending and stretching bonds of O–H) and clay minerals;
• In the shortwave-infrared range: (i) the 1720 nm band, associated with organic compounds (an overtone of the aliphatic C–H stretch); (ii) a series of peaks above 2100 nm (e.g. at 2260 and 2320 nm), which can be ascribed to the presence of clay minerals (kaolinites, smectites, and illites) as well as organic compounds.

It should be noted that the soil samples were air-dried prior to the spectral measurements. Thus, bands associated with water are due to the presence of (i) hygroscopic (also termed adsorbed) water that is adsorbed on the surface areas of both organic matter (and in particular humus) and clay minerals (especially smectite), and (ii) structural water which is incorporated into the mineral lattice. In this way, water can be indirectly correlated with SOC.

5.2.2. Relative source importance at the source combination level

Use of additional predictors. The relative weighting vector μ^(s), regulating the contribution of the spectral kernel K^(s) and the textural kernel K^a towards the combined spectral-textural kernel K_comb^{(s)+a}, is visualized for each spectral source in Fig. 5. The spectral kernel always contributes more strongly towards the combined kernel, but the textural kernel, despite being comprised of only two additional features, has a relative weight of more than 20%.

Although in principle the particle size distribution (soil texture), being a physical parameter, is uncorrelated with SOC as it describes the inorganic mineral part, it nevertheless has a profound effect on the spectral response by affecting the soil aggregate distribution and ipso facto the albedo. A coarser texture generally increases the scattering (reduces reflection), and the apparent absorbance increases as the path length increases [5]. The knowledge of the particle size distribution therefore allows the model to account for changes in the spectrum which would otherwise have been
N.L. Tsakiridis, C.G. Chadoulos and J.B. Theocharis et al. / Neurocomputing 389 (2020) 27–41 37

Fig. 5. Relative importance between spectral and textural kernels for the MKL-Sa
models and the Mineral dataset.
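To make this combination concrete, the sketch below builds such a weighted spectral–textural kernel; the RBF kernel choice, the toy data, and the 80/20 weights (which merely echo the relative magnitudes reported in Fig. 5) are illustrative assumptions rather than the framework's learned values:

    import numpy as np

    def rbf_kernel(X, gamma):
        """Gaussian (RBF) Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * np.maximum(d2, 0.0))

    # Toy data: 100 samples with one pre-processed spectrum (200 bands)
    # and two auxiliary textural predictors (the particle size fractions).
    rng = np.random.default_rng(1)
    X_spec = rng.normal(size=(100, 200))
    X_text = rng.normal(size=(100, 2))

    K_spec = rbf_kernel(X_spec, gamma=1e-3)  # spectral kernel K^(s)
    K_text = rbf_kernel(X_text, gamma=1.0)   # textural kernel K^a

    # Convex combination regulated by the weighting vector mu^(s); the
    # 80/20 split below merely echoes the relative weights seen in Fig. 5.
    mu = np.array([0.8, 0.2])
    K_comb = mu[0] * K_spec + mu[1] * K_text  # combined kernel K_comb^((s)+a)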
Although in principle the particle size distribution (soil texture), being a physical parameter, is uncorrelated with SOC as it describes the inorganic mineral part, it nevertheless has a profound effect on the spectral response by affecting the soil aggregate distribution and ipso facto the albedo. A coarser texture generally increases the scattering (reduces reflection), and the apparent absorbance increases as path length increases [5]. The knowledge of the particle size distribution therefore allows the model to account for changes in the spectrum which would otherwise have been attributed to changes due to SOC content.

Three explanations may be offered to identify why the relative importance and the effect on the accuracy results are so significant. First, the effect on the albedo due to the soil aggregate distribution may be confounded with the one caused by the presence of organic matter [68]; it is a well-known observation that soils become darker with increasing organic matter and particle aggregation. Second, the presence of water may be better accounted for. For example, almost all source kernels (and particularly the kernel developed from the SG0-SNV spectra) identify bands associated with the presence of water. The water molecules in the air-dried samples are found both in the mineral lattice and on the particle surface; the latter, termed hygroscopic water, is adsorbed both by clay minerals and the organic matter. Ergo, the knowledge of the clay content and mainly its species (smectite has a large surface area of 800 m2/g compared to kaolinite with around 40 m2/g, and thus higher electrochemical activity) enables the model to better associate the water-related absorption peaks with the presence of organic compounds (humus has a surface area of around 1000 m2/g). Indeed, the textural kernel is very important in the SG0-SNV kernel (30%), which is known to particularly emphasize the water absorption bands (as seen in Fig. 4), and the above described mechanism may explain why. And third, smaller particle sizes (i.e. clay) may be bound with SOM and other microaggregates and form larger and more stable aggregates which protect and assist the stability of SOM in the long-term [69].
Use of multiple spectral sources. In both cases of the MKL-MS and MKL-MSa models, where the lp-norm MKL algorithm is employed to combine the spectral sources, the resulting kernel mixture weights μ (one vector per model) may be used to identify to what extent each spectral kernel affects the final kernel. Fig. 6 illustrates these weights, which show the relative importance of each spectral source in the combined kernel. Clearly, the effect of the lp-norm is visible, given that no individual source has a weight of zero. The two kernels which lead to the best performing single-source models, namely Abs-SG1 and Abs-SG1-SNV, have the most significant contribution. The other sources however, despite having a lower performance, are also utilized by the combined kernel, albeit with less significance. This demonstrates the significance of the kernel combination which uses multiple spectral sources, each performing a different task and highlighting different parts of the spectral information, thus combining the complementary information they contain. In particular, the information regarding the albedo (baseline) of the spectrum, which is useful as a first-order approximation of the presence of organic compounds, is not retained in the sources utilizing the first derivative (i.e. both Abs-SG1 and Abs-SG1-SNV) or in the CR, explaining why the Abs-SG0 and R sources together have a relative weight of approximately 20%. In fact, the effect of the spectral source combination in MKL-MS is more evident if we observe that in MKL-MSa the additional predictors' kernel has a weight of around 10%, which is significantly diminished if we compare it with the respective values in the single-source cases. In other words, the use of multiple sources can assist the model in explaining some of the spectral variability which may be attributed (directly or indirectly) to the particle size distribution, as explained previously. The above notwithstanding, MKL-MSa is the most accurate model, exhibiting that the use of multiple spectral sources and the particle size distribution may be appropriately combined.

Fig. 6. Relative importance between the different sources for the MKL-MS and MKL-MSa models and the Mineral dataset.
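For intuition on why no source weight collapses to zero, a minimal sketch of the analytic mixture-weight update of lp-norm MKL (cf. [44]) follows; the per-source norms are toy values (two strong sources, echoing Abs-SG1 and Abs-SG1-SNV), not fitted quantities:

    import numpy as np

    def lp_norm_mkl_weights(w_norms, p):
        """Analytic mixture-weight update of lp-norm MKL (cf. [44]):
        mu_m is proportional to ||w_m||^(2/(p+1)), rescaled to ||mu||_p = 1."""
        mu = np.asarray(w_norms, dtype=float) ** (2.0 / (p + 1.0))
        return mu / np.linalg.norm(mu, ord=p)

    # Toy per-source regressor norms: two strong sources followed by
    # four weaker ones.
    w_norms = [1.0, 0.9, 0.4, 0.3, 0.25, 0.2]

    print(np.round(lp_norm_mkl_weights(w_norms, p=2.0), 3))
    # With p > 1 every source retains a non-zero weight, matching the
    # behaviour observed in Fig. 6; p -> 1 recovers sparse selection.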
5.2.3. Comparison with competing methodologies

The accuracy comparison between MKL-S (MKL-Sa) and the rest of the competing methodologies without (with) the use of additional predictors is presented in Table 2. It should be noted again that in this Table the comparison is not made with the versions that use multiple spectral sources (namely MKL-MS and MKL-MSa), which constitute one of the cornerstones and novelties of the proposed three-level MKL framework, because the competing methodologies cannot effortlessly integrate and use the information from multiple spectral sources. Table 3 compares the runtimes of the proposed models and their counterparts.

Table 2
Comparison among the competing methodologies using as predictors (a) the best single spectral source, and (b) the best single spectral source + the auxiliary predictors (particle size distribution); the best spectral source is identified by each methodology as the one attaining the maximum accuracy.

Model    Grassland                        Woodland                         Cropland                         Mineral
         Source        R2    RMSE  RPIQ   Source        R2    RMSE  RPIQ   Source        R2    RMSE  RPIQ   Source        R2    RMSE  RPIQ

(a) Best single spectral source
PLS      Abs           0.71  0.16  2.44   Abs-SG0       0.71  0.19  2.39   Abs-SG1       0.67  0.14  1.97   Abs           0.76  0.17  2.59
SVR      Abs-SG1       0.78  0.15  2.78   Abs-SG1       0.75  0.17  2.48   Abs-SG1       0.73  0.12  2.19   Abs-SG1       0.84  0.13  3.20
Cubist   Abs-SG0-SNV   0.75  0.14  2.63   Abs-SG1-SNV   0.78  0.16  2.74   Abs-SG1-SNV   0.75  0.12  2.25   Abs-SG1-SNV   0.85  0.13  3.28
SBL      Abs-SG1       0.80  0.14  2.92   Abs-SG0-SNV   0.79  0.16  2.84   Abs-SG1       0.79  0.11  2.48   Abs-SG1       0.86  0.13  3.37
MKL-S    Abs-SG1       0.75  0.15  2.59   Abs-SG1       0.76  0.17  2.62   Abs-SG1       0.73  0.12  2.22   Abs-SG1       0.80  0.15  2.83

(b) Best single spectral source + auxiliary predictors
PLS      Abs           0.87  0.11  3.70   Abs           0.86  0.13  3.61   Abs-SG0       0.80  0.11  2.68   Abs           0.87  0.12  3.71
SVR      Abs-SG1-SNV   0.89  0.10  4.08   Abs-SG1-SNV   0.88  0.12  3.87   Abs-SG1-SNV   0.85  0.09  3.07   Abs-SG1-SNV   0.92  0.10  4.71
Cubist   Abs-SG1       0.91  0.09  4.51   Abs-SG1-SNV   0.91  0.11  4.39   Abs-SG1-SNV   0.87  0.09  3.26   Abs-SG1-SNV   0.92  0.09  4.85
SBL      Abs-SG1       0.90  0.10  4.11   Abs-SG0       0.90  0.11  4.07   Abs-SG1       0.85  0.09  3.00   Abs-SG1       0.92  0.10  4.46
MKL-Sa   Abs-SG1       0.91  0.09  4.34   Abs-SG1       0.89  0.12  4.00   Abs-SG1       0.85  0.09  3.01   Abs-SG1-SNV   0.92  0.10  4.46
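For reference, the three accuracy metrics reported in Table 2 (and quoted throughout the text) can be computed as in the sketch below, where RPIQ is taken as the interquartile range of the observed values divided by the RMSE, following [64]; the data are synthetic:

    import numpy as np

    def regression_metrics(y_true, y_pred):
        """R2, RMSE, and RPIQ, with RPIQ = IQR(y_true) / RMSE (cf. [64])."""
        res = y_true - y_pred
        rmse = float(np.sqrt(np.mean(res ** 2)))
        r2 = 1.0 - np.sum(res ** 2) / np.sum((y_true - y_true.mean()) ** 2)
        q1, q3 = np.percentile(y_true, [25, 75])
        return r2, rmse, (q3 - q1) / rmse

    # Synthetic usage: noisy predictions around a toy SOC-like target.
    rng = np.random.default_rng(2)
    y = rng.uniform(0.5, 2.5, size=500)
    y_hat = y + rng.normal(scale=0.12, size=500)
    print("R2 = %.2f, RMSE = %.2f, RPIQ = %.2f" % regression_metrics(y, y_hat))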
Table 3
Time efficiency (mean runtime) of the proposed approach compared to the competing methodologies; times are given as hh:mm:ss.ss. For multiple-source models the time reported refers to the development of all spectral sources plus their subsequent combination.

Model    Grassland                    Woodland                     Cropland                     Mineral
         Training     Testing         Training     Testing         Training     Testing         Training      Testing

(a) Best single spectral source
PLS      00:00:05.28  00:00:00.01     00:00:05.57  00:00:00.01     00:00:07.31  00:00:00.01     00:00:10.47   00:00:00.02
SVR      00:01:01.33  00:00:00.17     00:01:15.52  00:00:00.15     00:03:51.29  00:00:00.42     01:05:46.56   00:00:01.55
Cubist   00:01:32.77  00:00:01.27     00:01:43.38  00:00:01.31     00:03:41.66  00:00:03.08     00:10:05.95   00:00:13.01
SBL      00:02:10.75  00:00:08.67     00:02:34.46  00:00:10.15     00:05:22.81  00:00:19.50     00:16:00.99   00:00:51.73
MKL-S    03:19:05.21  00:00:00.21     03:42:38.51  00:00:00.23     15:29:13.10  00:00:00.59     77:12:41.34   00:00:01.15
MKL-MS   06:52:16.32  00:00:00.35     07:32:41.91  00:00:00.41     31:25:18.01  00:00:00.62     143:12:21.18  00:00:01.73

(b) Best single spectral source + auxiliary predictors
PLS      00:00:05.39  00:00:00.01     00:00:06.01  00:00:00.01     00:00:07.45  00:00:00.01     00:00:10.75   00:00:00.02
SVR      00:01:02.33  00:00:00.18     00:01:16.40  00:00:00.15     00:03:59.12  00:00:00.44     01:08:01.86   00:00:02.12
Cubist   00:01:26.77  00:00:01.15     00:01:32.15  00:00:01.24     00:03:21.29  00:00:02.95     00:09:45.95   00:00:11.02
SBL      00:02:12.34  00:00:08.75     00:02:39.06  00:00:10.22     00:05:23.47  00:00:19.92     00:16:12.41   00:00:52.45
MKL-Sa   03:23:26.51  00:00:00.21     03:57:55.21  00:00:00.24     15:45:01.76  00:00:02.50     77:47:23.40   00:00:01.19
MKL-MSa  07:02:49.19  00:00:00.37     07:48:10.18  00:00:00.45     31:35:20.71  00:00:00.80     143:36:15.12  00:00:01.81
The PLS algorithm has the lowest training time, producing the simplest and consequently the least accurate models; it has attained the lowest accuracy across all experiments. Compared to the simple SVR algorithm, which does not perform feature selection but rather uses all available features, the MKL-S and MKL-Sa approaches achieve similar performance, being better than SVR in 4 cases and worse in the other 4. This attests to the importance of using multiple kernels at the first level of integration, namely at each band. When using single spectral sources, the Cubist and SBL algorithms are faster and more accurate than the MKL approach; SBL achieves the best results when only the spectral information is used, while Cubist produces the best results when both spectral and textural information are available to the model. Notwithstanding the above, it must be underscored that when the proposed MKL approach is used in full and the information from different spectral sources is taken into account simultaneously (i.e. the MKL-MS and MKL-MSa approaches), its results outperform the best models from Cubist and SBL alike across all datasets, albeit by incurring a higher computational cost. For example, in the Grassland dataset, the MKL-MS approach achieved a performance of RPIQ 3.06 and R2 0.82 compared to the second-best model (SBL) whose accuracy was RPIQ 2.92 and R2 0.80; the MKL-MSa approach in the same dataset attained an accuracy of RPIQ 5.04 and R2 0.93, whilst the second-best (Cubist) was at RPIQ 4.51 and R2 0.91.

At the same time, it should be noted that both Cubist and SBL are not as interpretable as the MKL approach. Although Cubist can potentially perform feature selection and produce simple linear regression models in the form of rules, and thus provide interpretable results, in reality it uses all available spectral features in the consequent part. Moreover, its good performance is mostly due to the use of committees (i.e. ensemble models) and of the error-correction mechanism, modules that further jeopardize its interpretability. As far as SBL is concerned, it performs well due to its local nature; for each testing pattern a separate model is constructed using its spectral neighbors. Naturally, this does not allow for any interpretation of the models.

6. Discussion

The novel three-level MKL approach presented herein has demonstrated its capacity to be successfully applied in soil spectroscopy, as evidenced by its application to the LUCAS SSL and the comparison with other state-of-the-art algorithms. The proposed framework produced the most accurate predictions, whilst at the same time maintaining a fair degree of interpretability, mainly in the form of performing sparse wavelength selection at the source level, and by shedding light on which of the heterogeneous sources are more important than the others.

Because this approach performs feature selection, and uses approximately 30% of the available features, it understandably performs slightly worse than the next best models (namely Cubist and SBL) that use all available wavelengths, when only single spectral sources are used. However, the strength of the proposed approach lies in the combination of spectral sources, which the other models cannot effectively do. Whereas other approaches neglect the complementary information contained within different spectral sources, and only use the single best spectral source, this framework can effortlessly integrate it and thus predict the target property more accurately by taking advantage of all sources. As demonstrated, when all spectral sources were accounted for, the accuracy across the four data subsets compared to the single best source was increased by ≈15% in RPIQ and ≈8% in R2. In this case, the MKL approach outperforms its counterparts.

At the same time, due to its kernel combination ability, the integration of auxiliary predictors and other heterogeneous sources is also straightforward. We tested this ability by using the textural information of the soil samples, which indubitably enhanced the accuracy of prediction. Across the single spectral sources, an increase of ≈44% in RPIQ and ≈25% in R2 was observed, proving that the use of heterogeneous sources was particularly beneficial to the model.
Altogether, the best results were attained when all the different sources, i.e. all spectral sources and the textural information, were combined; the model then outperformed all other competing methodologies and attained noteworthy results. In particular, in the largest Mineral dataset (containing all mineral soil samples irrespective of their land use) the performance was RPIQ 5.17 and R2 0.94.

Compared to the current state-of-the-art, the MKL approaches are considerably more time-consuming. This is due to the calculation of multiple kernel matrices, whose complexity is O(N^2 M), and to the solution of multiple quadratic optimization problems, whose complexity is O(N^3). Therefore, SVR-based models do not scale well with the number of training patterns. However, the task at hand is not time-critical, and the model construction phase happens offline. Moreover, the LUCAS SSL is the largest one to date, and developing such libraries is a laborious multi-year effort that is costly in time and resources. Consequently, the large training time is not a limiting factor, as the interest lies in developing the most accurate model possible.
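To make the scaling concrete, a back-of-the-envelope count is sketched below; the sample and kernel counts are purely hypothetical, not the actual LUCAS dimensions:

    # Back-of-the-envelope scaling of the two dominant costs (toy sizes):
    # building one N x N Gram matrix over M kernels is O(N^2 M), while
    # solving the quadratic program is O(N^3).
    N, M = 10_000, 200               # hypothetical sample and kernel counts
    gram_ops = N ** 2 * M            # ~2e10 multiply-adds for the kernel pool
    qp_ops = N ** 3                  # ~1e12 operations for the QP solve

    # Doubling N quadruples the Gram cost and multiplies the QP cost by
    # eight, which is why SVR-based models scale poorly with training size.
    print(f"Gram: {gram_ops:.1e} ops, QP: {qp_ops:.1e} ops")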
It should further be noted that the same novel framework described herein may be applied to combine other heterogeneous sources in a similar fashion. For example, spectra from the VNIR–SWIR and the mid-infrared (MIR) range, where some of the fundamental vibrations take place, as detailed in [6], may be appropriately combined to yield better performance than the one attained when the individual sources are used. Other pools of spectral pre-treatments may also be used, which can potentially include additional complementary information.

7. Conclusions

The proposed three-level MKL framework successively uses kernel combinations at three different levels to efficiently combine the information from the constituent kernels. It is able to perform sparse feature selection, which can aid in the interpretation of the underlying processes. Moreover, it is the first model presented that can readily combine, at the learning stage, the information present within different spectral sources originating from different spectral pre-treatments. Finally, the use of heterogeneous sources is supported by incorporating auxiliary predictors to enhance the performance. The proposed framework may thus be appropriately used in soil spectroscopy to derive more interpretable and accurate models than the current state-of-the-art.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Nikolaos L. Tsakiridis: Methodology, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Christos G. Chadoulos: Methodology, Software, Investigation, Writing - original draft. John B. Theocharis: Conceptualization, Methodology, Resources, Supervision. Eyal Ben-Dor: Validation, Investigation. George C. Zalidis: Project administration.

Acknowledgment

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project code: T1EDK-02296).

References

[1] FAO, ITPS, Status of the World's Soil Resources (SWSR) - Main Report, 2015. http://www.fao.org/documents/card/en/c/c6814873-efc3-41db-b7d3-2081a10ede50/.
[2] J.P. Scharlemann, E.V. Tanner, R. Hiederer, V. Kapos, Global soil carbon: understanding and managing the largest terrestrial carbon pool, Carbon Manag. 5 (1) (2014) 81–91, doi:10.4155/cmt.13.77.
[3] H. Blanco-Canqui, R. Lal, Mechanisms of carbon sequestration in soil aggregates, Critical Rev. Plant Sci. 23 (6) (2004) 481–504, doi:10.1080/07352680490886842.
[4] J. Baldock, J. Skjemstad, Role of the soil matrix and minerals in protecting natural organic materials against biological attack, Organic Geochem. 31 (7-8) (2000) 697–710, doi:10.1016/S0146-6380(00)00049-8.
[5] B. Stenberg, R.A. Viscarra Rossel, A.M. Mouazen, J. Wetterlind, Visible and near infrared spectroscopy in soil science, Adv. Agron. 107 (10) (2010) 163–215, doi:10.1016/S0065-2113(10)07005-7.
[6] J.M. Soriano-Disla, L.J. Janik, R.A. Viscarra Rossel, L.M. Macdonald, M.J. McLaughlin, The performance of visible, near-, and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical, and biological properties, Appl. Spectrosc. Rev. 49 (2) (2014) 139–186, doi:10.1080/05704928.2013.811081.
[7] M. Nocita, A. Stevens, B. van Wesemael, M. Aitkenhead, M. Bachmann, B. Barthès, E. Ben Dor, D.J. Brown, M. Clairotte, A. Csorba, P. Dardenne, J.A. Demattê, V. Genot, C. Guerrero, M. Knadel, L. Montanarella, C. Noon, L. Ramirez-Lopez, J. Robertson, H. Sakai, J.M. Soriano-Disla, K.D. Shepherd, B. Stenberg, E.K. Towett, R. Vargas, J. Wetterlind, Soil spectroscopy: an alternative to wet chemistry for soil monitoring, Adv. Agron. 132 (2015) 139–159, doi:10.1016/bs.agron.2015.02.002.
[8] E. Ben-Dor, Quantitative remote sensing of soil properties, Adv. Agron. 75 (2002) 173–243, doi:10.1016/S0065-2113(02)75005-0.
[9] R.A. Viscarra Rossel, T. Behrens, Using data mining to model and interpret soil diffuse reflectance spectra, Geoderma 158 (1-2) (2010) 46–54, doi:10.1016/j.geoderma.2009.12.025.
[10] Z. Shi, W. Ji, R.A. Viscarra Rossel, S. Chen, Y. Zhou, Prediction of soil organic matter using a spatially constrained local partial least squares regression and the Chinese vis-NIR spectral library, Eur. J. Soil Sci. 66 (4) (2015) 679–687, doi:10.1111/ejss.12272.
[11] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, An evolutionary fuzzy rule-based system applied to real-world Big Data - the GEO-CRADLE and LUCAS soil spectral libraries, in: Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, 2018, pp. 1–8, doi:10.1109/FUZZ-IEEE.2018.8491489.
[12] R. Viscarra Rossel, T. Behrens, E. Ben-Dor, D. Brown, J. Demattê, K. Shepherd, Z. Shi, B. Stenberg, A. Stevens, V. Adamchuk, H. Aïchi, B. Barthès, H. Bartholomeus, A. Bayer, M. Bernoux, K. Böttcher, L. Brodský, C. Du, A. Chappell, Y. Fouad, V. Genot, C. Gomez, S. Grunwald, A. Gubler, C. Guerrero, C. Hedley, M. Knadel, H. Morrás, M. Nocita, L. Ramirez-Lopez, P. Roudier, E.R. Campos, P. Sanborn, V. Sellitto, K. Sudduth, B. Rawlins, C. Walter, L. Winowiecki, S. Hong, W. Ji, A global spectral library to characterize the world's soil, Earth-Sci. Rev. 155 (2016) 198–230, doi:10.1016/j.earscirev.2016.01.012.
[13] A. Orgiazzi, C. Ballabio, P. Panagos, A. Jones, O. Fernández-Ugalde, LUCAS Soil, the largest expandable soil dataset for Europe: a review, Eur. J. Soil Sci. 69 (1) (2018) 140–153, doi:10.1111/ejss.12499.
[14] M. Nocita, A. Stevens, G. Toth, P. Panagos, B. van Wesemael, L. Montanarella, Prediction of soil organic carbon content by diffuse reflectance spectroscopy using a local partial least square regression approach, Soil Biol. Biochem. 68 (2014) 337–347, doi:10.1016/j.soilbio.2013.10.022.
[15] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, A fuzzy rule-based system utilizing differential evolution with an application in vis-NIR soil spectroscopy, in: Proceedings of the IEEE International Conference on Fuzzy Systems, 2017, doi:10.1109/FUZZ-IEEE.2017.8015563.
[16] N.L. Tsakiridis, J.B. Theocharis, G.C. Zalidis, DECO3RUM: a Differential Evolution learning approach for generating compact Mamdani fuzzy rule-based models, Expert Syst. Appl. 83 (2017) 257–272, doi:10.1016/j.eswa.2017.04.026.
[17] N. Carmon, E. Ben-Dor, An advanced analytical approach for spectral-based modelling of soil properties, Int. J. Emerg. Technol. Adv. Eng. 7 (2017) 90–97.
[18] N.L. Tsakiridis, J.B. Theocharis, P. Panagos, G.C. Zalidis, An evolutionary fuzzy rule-based system applied to the prediction of soil organic carbon from soil spectral libraries, Appl. Soft Comput. 81 (2019) 105504, doi:10.1016/j.asoc.2019.105504.
[19] Å. Rinnan, F. van den Berg, S.B. Engelsen, Review of the most common pre-processing techniques for near-infrared spectra, TrAC Trends Anal. Chem. 28 (10) (2009) 1201–1222, doi:10.1016/j.trac.2009.07.007.
[20] N.L. Tsakiridis, N.V. Tziolas, J.B. Theocharis, G.C. Zalidis, A GA-based stacking algorithm for predicting soil organic matter from vis-NIR spectral data, Eur. J. Soil Sci. (2018), doi:10.1111/ejss.12760.
[21] N. Tziolas, N. Tsakiridis, E. Ben-Dor, J. Theocharis, G. Zalidis, A memory-based learning approach utilizing combined spectral sources and geographical proximity for improved VIS-NIR-SWIR soil properties estimation, Geoderma 340 (2019) 11–24, doi:10.1016/j.geoderma.2018.12.044.
[22] N.L. Tsakiridis, J.B. Theocharis, E. Ben-Dor, G.C. Zalidis, Using interpretable fuzzy rule-based models for the estimation of soil organic carbon from VNIR/SWIR spectra and soil texture, Chemometr. Intell. Laborat. Syst. 189 (2019) 39–55, doi:10.1016/j.chemolab.2019.03.011.
[23] A. Gholizadeh, N. Carmon, A. Klement, E. Ben-Dor, L. Borůvka, Agricultural soil spectral response and properties assessment: effects of measurement protocol and data mining technique, Remote Sens. 9 (10) (2017) 1078, doi:10.3390/rs9101078.
[24] A. Gholizadeh, M. Saberioon, N. Carmon, L. Boruvka, E. Ben-Dor, Examining the performance of PARACUDA-II data-mining engine versus selected techniques to model soil carbon from reflectance spectra, Remote Sens. 10 (8) (2018) 1172, doi:10.3390/rs10081172.
[25] D.J. Brown, K.D. Shepherd, M.G. Walsh, M. Dewayne Mays, T.G. Reinsch, Global soil characterization with VNIR diffuse reflectance spectroscopy, Geoderma 132 (3-4) (2006) 273–290, doi:10.1016/j.geoderma.2005.04.025.
[26] V. Vapnik, Principles of risk minimization for learning theory, Adv. Neural Inf. Process. Syst. (1992) 831–838.
[27] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, in: Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS'96, MIT Press, Cambridge, MA, USA, 1996, pp. 155–161.
[28] C. Williams, C.E. Rasmussen, Gaussian processes for regression, Adv. Neural Inf. Process. Syst. 8 (1996).
[29] F.R. Bach, G.R.G. Lanckriet, M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in: Proceedings of the Twenty-First International Conference on Machine Learning - ICML '04, ACM Press, New York, NY, USA, 2004, p. 6, doi:10.1145/1015330.1015424.
[30] M. Gönen, E. Alpaydin, Multiple kernel learning algorithms, J. Mach. Learn. Res. 12 (2011) 2211–2268.
[31] A. Jain, S.V.N. Vishwanathan, M. Varma, SPG-GMKL: generalized multiple kernel learning with a million kernels, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012, pp. 750–758, doi:10.1145/2339530.2339648.
[32] M. Gönen, E. Alpaydin, Localized algorithms for multiple kernel learning, Pattern Recogn. 46 (3) (2013) 795–807, doi:10.1016/j.patcog.2012.09.002.
[33] J. Moeller, S. Swaminathan, S. Venkatasubramanian, A unified view of localized kernel learning, in: Proceedings of the 2016 SIAM International Conference on Data Mining, 2016, pp. 252–260, doi:10.1137/1.9781611974348.29.
[34] J. Kandola, J. Shawe-Taylor, N. Cristianini, Optimizing kernel alignment over combination of kernels, Adv. Neural Inf. Process. Syst. (NIPS) (2002).
[35] T. Wang, D. Zhao, S. Tian, An overview of kernel alignment and its applications, Artif. Intell. Rev. 43 (2) (2012) 179–192, doi:10.1007/s10462-012-9369-4.
[36] J. Bao, Y. Chen, L. Yu, C. Chen, A multi-scale kernel learning method and its application in image classification, Neurocomputing 257 (2017) 16–23, doi:10.1016/j.neucom.2016.11.069.
[37] Y. Gu, Q. Wang, X. Jia, J.A. Benediktsson, A novel MKL model of integrating LiDAR data and MSI for urban area classification, IEEE Trans. Geosci. Remote Sens. 53 (10) (2015) 5312–5326, doi:10.1109/TGRS.2015.2421051.
[38] X. Zhang, L. Hu, A nonlinear subspace multiple kernel learning for financial distress prediction of Chinese listed companies, Neurocomputing 177 (2016) 636–642, doi:10.1016/j.neucom.2015.11.078.
[39] Z. Zheng, H. Sun, G. Zhang, Multiple kernel locality-constrained collaborative representation-based discriminant projection for face recognition, Neurocomputing 318 (2018) 65–74, doi:10.1016/j.neucom.2018.08.032.
[40] Y.-R. Yeh, T.-C. Lin, Y.-Y. Chung, Y.-C.F. Wang, A novel multiple kernel learning framework for heterogeneous feature fusion and variable selection, IEEE Trans. Multimed. 14 (3) (2012) 563–574, doi:10.1109/TMM.2012.2188783.
[41] Y. Ding, J. Tang, F. Guo, Identification of drug-side effect association via multiple information integration with centered kernel alignment, Neurocomputing 325 (2019) 211–224, doi:10.1016/j.neucom.2018.10.028.
[42] Y. Wang, X. Liu, Y. Dou, Q. Lv, Y. Lu, Multiple kernel learning with hybrid kernel alignment maximization, Pattern Recogn. 70 (2017) 104–111, doi:10.1016/j.patcog.2017.05.005.
[43] R. Tomioka, T. Suzuki, Sparsity-accuracy trade-off in MKL, (2010) 3–10.
[44] M. Kloft, U. Brefeld, S. Sonnenburg, A. Zien, lp-norm multiple kernel learning, J. Mach. Learn. Res. 12 (2011) 953–997.
[45] G. Tóth, A. Jones, L. Montanarella, LUCAS Topsoil Survey: Methodology, Data, and Results, EU Publications, 2013, doi:10.2788/97922.
[46] V.N. Vapnik, Statistical Learning Theory, 1st ed., Wiley, New York, NY, USA, 1998.
[47] B. Schölkopf, Learning with kernels, J. Electrochem. Soc. 129 (2002) 2865, doi:10.1198/jasa.2003.s269.
[48] C.S. Ong, A. Smola, B. Williamson, Learning the kernel with hyperkernels, J. Mach. Learn. Res. 6 (2005) 1043–1071.
[49] J. Aflalo, A. Ben-Tal, C. Bhattacharyya, J.S. Nath, S. Raman, Variable sparsity kernel learning, J. Mach. Learn. Res. 12 (2011) 565–592.
[50] N. Cristianini, J. Kandola, A. Elisseeff, J. Shawe-Taylor, On kernel-target alignment, Adv. Neural Inf. Process. Syst. 14 (2002) 367–373.
[51] J. Kandola, J. Shawe-Taylor, N. Cristianini, On the Extensions of Kernel Alignment, Technical Report, 2002.
[52] C. Cortes, M. Mohri, A. Rostamizadeh, Algorithms for learning kernels based on centered alignment, J. Mach. Learn. Res. 13 (2012) 795–828.
[53] T. Joachims, Advances in Kernel Methods, MIT Press, Cambridge, MA, USA, 1999, pp. 169–184.
[54] S. Sonnenburg, G. Rätsch, C. Schäfer, B. Schölkopf, Large scale multiple kernel learning, J. Mach. Learn. Res. 7 (2006) 1531–1565.
[55] A. Karatzoglou, A. Smola, K. Hornik, A. Zeileis, kernlab – an S4 package for kernel methods in R, J. Stat. Softw. 11 (9) (2004) 1–20.
[56] IUSS Working Group WRB, World Reference Base for Soil Resources 2014: International soil classification system for naming soils and creating legends for soil maps, 2014, doi:10.1017/S0014479706394902.
[57] B. Minasny, A.B. McBratney, A conditioned Latin hypercube method for sampling in the presence of ancillary information, Comput. Geosci. 32 (9) (2006) 1378–1388, doi:10.1016/j.cageo.2005.12.009.
[58] J.C. Bezdek, R. Ehrlich, W. Full, FCM: the fuzzy c-means clustering algorithm, Comput. Geosci. 10 (2-3) (1984) 191–203, doi:10.1016/0098-3004(84)90020-7.
[59] V. Cherkassky, Y. Ma, Practical selection of SVM parameters and noise estimation for SVM regression, Neural Netw. 17 (1) (2004) 113–126, doi:10.1016/S0893-6080(03)00169-2.
[60] S. Wold, H. Martens, H. Wold, The multivariate calibration problem in chemistry solved by the PLS method, Matrix Pencils (1983) 286–293, doi:10.1007/BFb0062108.
[61] J.R. Quinlan, Learning with continuous classes, Mach. Learn. 92 (1992) 343–348.
[62] J.R. Quinlan, Combining instance-based and model-based learning, Mach. Learn. 76 (1993) 236–243.
[63] L. Ramirez-Lopez, T. Behrens, K. Schmidt, A. Stevens, J.A.M. Demattê, T. Scholten, The spectrum-based learner: a new local approach for modeling soil vis-NIR spectra of complex datasets, Geoderma 195-196 (2013) 268–279, doi:10.1016/j.geoderma.2012.12.014.
[64] V. Bellon-Maurel, E. Fernandez-Ahumada, B. Palagos, J.-M. Roger, A. McBratney, Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy, TrAC Trends Anal. Chem. 29 (9) (2010) 1073–1081, doi:10.1016/j.trac.2010.05.006.
[65] S. Sonnenburg, H. Strathmann, S. Lisitsyn, V. Gal, F.J.I. García, W. Lin, S. De, C. Zhang, Frx, Tklein23, E. Andreev, JonasBehr, Sploving, P. Mazumdar, C. Widmer, P.D. Zora, G.D. Toni, S. Mahindre, A. Kislay, K. Hughes, R. Votyakov, Khalednasr, S. Sharma, A. Novik, A. Panda, E. Anagnostopoulos, L. Pang, A. Binder, Serialhex, B. Esser, shogun-toolbox/shogun: Shogun 6.1.0, 2017, doi:10.5281/zenodo.1067840.
[66] Z. Shi, Q.L. Wang, J. Peng, W.J. Ji, H.J. Liu, X. Li, R.A. Viscarra Rossel, Development of a national VNIR soil-spectral library for soil classification and prediction of organic matter concentrations, Sci. China Earth Sci. 57 (7) (2014) 1671–1680, doi:10.1007/s11430-013-4808-x.
[67] E. Ben-Dor, Y. Inbar, Y. Chen, The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400-2500 nm) during a controlled decomposition process, Remote Sens. Environ. 61 (1) (1997) 1–15, doi:10.1016/S0034-4257(96)00120-4.
[68] A. Stevens, M. Nocita, G. Tóth, L. Montanarella, B. van Wesemael, Prediction of soil organic carbon at the European scale by visible and near infrared reflectance spectroscopy, PLoS ONE 8 (6) (2013) e66409, doi:10.1371/journal.pone.0066409.
[69] E.T. Elliott, Aggregate structure and carbon, nitrogen, and phosphorus in native and cultivated soils, Soil Sci. Soc. Am. J. 50 (3) (1986) 627, doi:10.2136/sssaj1986.03615995005000030017x.

Nikolaos L. Tsakiridis received the B.S. and M.S. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2014. He is currently pursuing the Ph.D. degree at the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research interests include fuzzy systems, evolutionary algorithms, soil spectroscopy, remote sensing, and big data analysis.

Christos G. Chadoulos received the B.S. and M.S. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2017. He is currently pursuing the Ph.D. degree at the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research interests include computer vision, image analysis, and machine learning.

John B. Theocharis (M'90) received the degree in electrical engineering and the Ph.D. degree from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1980 and 1985, respectively. He is currently a Professor in the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. His research activities include fuzzy systems, neural networks, evolutionary algorithms, pattern recognition and image analysis. He has published numerous papers in several application areas such as neuro-fuzzy modeling, power demand and wind speed prediction, and land cover classification and segmentation from remotely sensed images. Recently his research has focused on addressing challenges in soil spectroscopy and medical imaging using machine learning and deep learning techniques.
Eyal Ben-Dor received the M.Sc. and Ph.D. degrees in Soil Science from the Faculty of Agriculture, the Hebrew University of Jerusalem, in 1986 and 1992, respectively. Currently he serves as the chair of the Geography Department of Tel Aviv University and the head of the Remote Sensing Laboratory (RSL) at this department. His research is focused on monitoring the Earth from space and air, as well as on developing innovative tools to monitor soils and minerals from all domains. He was a pioneering scientist who, in the last decade of the 20th century, opened up the field of soil proximal sensing using spectral information in the reflective spectral domain. He has more than 27 years' experience in remote sensing of the Earth with a special emphasis on hyperspectral remote sensing (HSR), soil spectroscopy (passive and active) and environmental issues. He has developed many quantitative applications for monitoring soils from reflectance information and is the owner of 4 patents in this field.

George C. Zalidis received the B.S. degree in agriculture from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1980, and the Ph.D. degree in soil physics from Michigan State University, East Lansing, MI, USA, in 1987. Currently, he is a Professor of Soil Pollution and Degradation with the Laboratory of Remote Sensing, Spectroscopy, and Geographic Information Systems, in the Faculty of Agronomy, of the Aristotle University of Thessaloniki. His research interests include soil quality and sustainability, bio-remediation of degraded areas, restoration and rehabilitation of wetland ecosystems, wetland inventory, and mapping.