Soil & Tillage Research: Yong Liu, Huifeng Wang, Hong Zhang, Karsten Liber

Soil & Tillage Research 155 (2016) 19–26

A comprehensive support vector machine-based classification model for soil quality assessment

Yong Liu a,*, Huifeng Wang a, Hong Zhang a, Karsten Liber a,b
a Institute of Loess Plateau, Shanxi University, Taiyuan, Shanxi, China
b Toxicology Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

Article history: Received 27 April 2015; Received in revised form 11 July 2015; Accepted 14 July 2015

Keywords: Soil quality assessment; Support vector machine; Comprehensive classification model; Heavy metal contamination; Soil fertility

Soil quality is defined here as the capacity of soil to have biological function, to sustain plant and animal production, to maintain or enhance water and air quality and to support human health and habitation. There are different soil quality assessment models based on diverse methods and data, but none of the models can fully meet all purposes. The selection of an appropriate soil classification model therefore becomes an important aspect in soil quality assessment. This paper presents a new comprehensive support vector machine-based classification model for classification of urban soil quality and then uses that model to assess the soil quality of Taiyuan relative to Chinese environmental quality standards and regional background values. The results indicated that the support vector machine-based soil quality model combined soil heavy metal contamination and soil fertility data satisfactorily, with an accuracy of 98.3333%. The soil quality of Taiyuan was subsequently divided into five classes (IA, IB, IC, IIA and IIB). Fifty percent of all samples were classified as class IB, indicating that soil quality within the study area was good. This paper shows that a comprehensive support vector machine-based classification model is feasible and reliable for soil quality assessment. Furthermore, the assessment presented could provide references for related ecological problems.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

With the rapid industrialization and urbanization that has occurred in most cities of the world in recent decades, there has been increasing concern for and attention on soil quality. The main threats to soils are identified as a decline in organic matter, increased soil erosion, compaction and salinization and an increased probability of floods, landslides, contamination, acidification and sealing (Montanarella and Rusco, 2008). Urban soil is an important component of urban ecosystems, and it provides a medium and nutrients to urban landscape plants and crops. It is also a sink and source of urban contamination and thus can affect urban eco-environmental quality and human health (Cachada et al., 2012; Praveena et al., 2014). Urban soil quality is influenced by both natural and anthropogenic factors, but humans play the most influential role in altering the performance characteristics of soils by mining, industry, agriculture, waste treatment and transportation (Karim et al., 2014; Yu et al., 2014).

The assessment of soil quality is generally based on a combination of soil environmental quality, soil productivity and soil sustainability. Urban soil environmental quality has often been impacted by a substantial amount of contaminants, including heavy metals, originating from different sources. Heavy metals do not degrade by biological or biochemical processes, although their speciation may change, and they may become bioaccumulated and, in a few cases, biomagnified along the food chain (Liao et al., 2006). Hence, heavy metals in soils are widely considered to potentially have unfavorable environmental impacts, as well as to affect human health and prosperity (e.g., economic losses, safety of agricultural products and endangerment of human health, especially children) (Guo et al., 2012; Praveena et al., 2014). In many cases, urban soil productivity has been impacted by anthropogenic factors such as fertilization and wastewater irrigation, which increase soil fertility (Bhaduri and Purakayastha, 2014). Soil fertility in turn can affect vegetation composition, plant community biomass and plant functional group biomass. Many researchers and practicing farmers have also observed that fertile soil with high soil organic matter content and diversity of soil microbiota enhances vegetation health through various processes (Zheng et al., 2015). Urban soil also provides nutrients for both crops and ornamental plants. Thus, soil fertility is another important parameter to assess to determine urban soil productivity and sustainability.

* Corresponding author.
E-mail address: (Y. Liu).
0167-1987/© 2015 Elsevier B.V. All rights reserved.
20 Y. Liu et al. / Soil & Tillage Research 155 (2016) 19–26

The risk assessment methods used to evaluate soil heavy metal contamination and soil fertility are diverse (Bhuiyan et al., 2010; Li et al., 2014a; Saby et al., 2009). Traditional methods of soil heavy metal contamination assessment include the single-factor index and the integrated pollution index (Xu and Hu, 2014), the analytic hierarchy process (AHP) (Chen et al., 2012), the fuzzy comprehensive evaluation method (Fu et al., 2014; Hui, 2013), and the geo-accumulation index method (Akbulut et al., 2013). The geo-accumulation index method is recommended by the U.S. Environmental Protection Agency and widely used by scholars. The traditional methods of soil fertility assessment include the analytic hierarchy process (AHP) (Zhou et al., 2009), the fuzzy comprehensive evaluation method (Fang, 2012) and the gray correlation analysis method (Zhang et al., 2009). While many researchers are concerned with the assessment of soil heavy metal contamination or soil fertility (Taylor et al., 2010; Teng et al., 2014), it is rare that research combines the two approaches to assess soil quality. This is possibly because there is a complex nonlinear relationship between soil heavy metal content and soil fertility and because traditional methods do not perform well in addressing this complex nonlinearity. Additionally, because the weights in the assessment indices are artificially set in soil assessments, the results obtained by traditional methods often lack reliability, objectivity and currency (Jiang et al., 2014).

Machine learning algorithms, such as artificial neural networks (ANNs), k-means algorithms, genetic algorithms, decision trees, support vector machines (SVMs) and multiple linear regression methods, have been used to advance classification and forecasting in recent years. Among these methods, ANNs have the most complex mathematical structure and can simulate human learning and pattern recognition (Were et al., 2015). However, due to the lack of theoretical results from a statistical perspective, as well as the low interpretability of this class of black-box models, some alternative strategies have been considered (Poggi and Portier).

As a more robust statistical method, SVMs are now widely used for environmental assessment with satisfactory performance (Aryafar et al., 2012; Jiang et al., 2014). The SVM method is an artificial intelligence machine learning theory introduced in the early 1990s as a non-linear solution for regression and classification tasks (Behzad et al., 2009; Vapnik, 1995). It relies on statistical learning theory, or VC theory, which enables learning machines to find important support vector information using a very small number of parameters (Bishop, 2006; Kovačević et al., 2010; Wang, 2005). Many classification algorithms are based on an independence assumption and are thus greatly influenced by the correlation among characteristics, but SVMs are not sensitive to this. SVMs obey the structural risk minimization principle (SRMP), which has been shown to be superior to many other modeling techniques obeying the traditional empirical risk minimization principle (ERMP) (Araghinejad, 2014). This technique has been proven to have superior performance in addressing various problems due to its generalization abilities, robustness against noise and interferences (Steinwart and Christmann, 2008) and its computational efficiency compared with several other methods, such as neural networks and fuzzy networks (Wang, 2005; Were et al., 2015). The published literature has shown that although the SVM method has been used to address many environmental problems (Jiang et al., 2014; Aryafar et al., 2012), it has rarely been used in research on soil quality assessment.

The aims of this study were (1) to develop a comprehensive method to assess soil quality; (2) to explore the application of the SVM method in the classification of heavy metal levels in soils; (3) to assess soil quality by using the SVM method in combination with levels of soil heavy metal contamination and soil fertility; (4) to identify the distribution of heavy metals and the pattern in soil quality in Taiyuan city; and (5) to describe correlation among metals and other parameters in soils.

Therefore, in this paper, we present a comprehensive soil quality assessment model based on SVM methodology for Taiyuan city, which could be applied in other areas when evaluating soil heavy metal contamination and soil fertility, etc. Furthermore, this paper provides useful information regarding soil quality and soil management in Taiyuan.

2. Materials and methods

2.1. Area description

Fig. 1. Location of research area.

Taiyuan city is located on the east edge of the Loess plateau and at the center of Shanxi Province (Fig. 1). It is heavily industrialized with structurally complex factories concentrated in specific areas. Large volumes of waste products and emissions are discharged to the local environment and have resulted in the accumulation of heavy metals in soils, especially soils in the suburbs. Previous studies have shown that the heavy metals Hg, Cd and Pb have been enriched in the surface soil of Taiyuan (Li et al., 2004). In addition, soil contamination in the
southern region of the city is more serious than in the northern region (Wang et al., 2008).

The topography of Taiyuan resembles a dustpan. The central and south regions of the area are comprised of the alluvial plain of the Fen River. The south side is higher than the north and the Fen River runs through the whole city from north to south. The total area of the city is 6988 km². The highest elevation is 2670 m and the lowest elevation is 760 m. Annual rainfall in the area varies from 317 to 395 mm.

2.2. Sampling procedures

The study was conducted in the suburban area of Taiyuan city, except in the west and east of Taiyuan. The west, east and parts of north Taiyuan are highly mountainous; therefore, these areas could not be sampled. Based on the above consideration, the study area was divided into 1 × 1 km grids and sampled following a stochastic strategy. Soil samples of approximately 1 kg were obtained using a stainless-steel spade from the upper 15 cm of the soil profile in April 2013, and they included 140 soil samples distributed throughout the city (Fig. 2). Soil samples were stored in polyethylene plastic bags for transportation and storage, subsequently air-dried at room temperature and then sieved through a 2-mm mesh sieve. Then, each soil sample was further sieved through a 0.150-mm plastic mesh sieve to focus on metals that are highly reactive with other molecules (Kelepertzis, 2014).

Fig. 2. Sampling setting.

2.3. Chemical determinations

Analytical procedures were adapted from the standard operating procedures specified in the State Environmental Protection Administration of China methodology guidelines. For analysis of As and Hg, soil digestion was completed using a mixture of HNO3 and HCl (aqua regia). For analysis of Cr, Cd, Cu, Zn, Ni and Pb, soil samples were digested by adding HNO3, HCl, HF and HClO4 in that order. Atomic fluorescence spectrometry was used to quantify Hg (GB/T 22105.1-2008) and As (GB/T 22105.2-2008) in the soil, with method detection limits of 0.002 mg/kg and 0.01 mg/kg, respectively. Determination of Cd was accomplished using graphite furnace atomic absorption spectroscopy (GB/T 17141-1997), with a method detection limit of 0.09 mg/kg. Flame atomic absorption spectroscopy was used to determine the Pb, Cr, Cu, Zn and Ni content of the soil (GB/T 17140-1997, HJ 491-2009); their method detection limits were 0.2 mg/kg, 5 mg/kg, 1 mg/kg, 0.5 mg/kg and 5 mg/kg, respectively.

Several additional soil parameters were also quantified in all soil samples. Total nitrogen (TN) was determined by the Kjeldahl method (NY/T 53-1987), available nitrogen (AN) was determined by the alkaline hydrolysis diffusion method, available phosphorus (AP) was determined by the Olsen method, available potassium (AK) was determined by the ammonium acetate method (NY/T 889-2004), and organic matter (OM) content was determined by heating potassium dichromate in an oil bath (NY/T 1121.6-2006). Soil pH was measured using a pH meter (NY/T 1121.2-2006).

To ensure the accuracy and precision of the analysis, a quality assurance (QA) protocol was followed through the use of reagent blanks, analytical duplicates and standard reference materials (GBW07424, GBW07429) in the digestion and determination. The recovery rates for metals in the standard reference materials were approximately 96–102%.

2.4. Assessment methods

2.4.1. Assessment standard

Following the Chinese environmental quality standards for soil (GB15618-1995), the heavy metal contents were divided into three grades: I, II and III (Table 1). Grade I represents the average natural background metal levels for uncontaminated soils in China, and Grade II represents the metal levels that are hazardous to agricultural production and human health. Grade III represents the maximum allowable concentrations for agricultural production and plant growth.

According to the classification of type regions and fertility of cultivated land in China (NY/T 309-1996), the parameters accounting for soil nutrition were divided into three levels: A, B and C (Table 1). The conditions of soil fertility in level A are considered to be the most fertile soils. The conditions in level B are
considered to be moderately fertile soils, and the conditions in level C are considered to be the most barren soils.

Because the enrichment and mobility of heavy metals in soils may cause serious harm to humans (Franco-Uría et al., 2009), we assign priority to the heavy metal content of soil when assessing overall soil quality. In addition, we consider that the higher the nutrient content in soil is, the better the soil quality is. Based on these assumptions, the quality of Taiyuan city soil was recorded as one of IA, IB, IC, IIA, IIB, IIC, IIIA, IIIB and IIIC. IA indicates the best soil quality, whereas IIIC refers to the worst soil quality.

Table 1
Chinese environmental quality standard for soils.

Heavy metal items   Grade I      Grade II         Grade III
Cu (mg/kg)          (0,35]       (35,100]         (100,400]
Zn (mg/kg)          (0,100]      (100,300]        (300,500]
Ni (mg/kg)          (0,40]       (40,100]         (100,200]
Cr (mg/kg)          (0,90]       (90,350]         (350,400]
Pb (mg/kg)          (0,35]       (35,80]          (80,500]
Cd (mg/kg)          (0,0.200]    (0.200,0.800]    (0.800,1]
Hg (mg/kg)          (0,0.150]    (0.150,1]        (1,1.500]
As (mg/kg)          (0,15]       (15,25]          (25,40]

Fertility items     Level A      Level B          Level C
TN (%)              (0.150,+∞)   (0.075,0.150]    (0,0.075]
AN (mg/kg)          (120,+∞)     (60,120]         (0,60]
AP (mg/kg)          (20,+∞)      (5,20]           (0,5]
AK (mg/kg)          (150,+∞)     (50,150]         (0,50]
OM (%)              (3,+∞)       (1,3]            (0,1]

2.4.2. SVM method

As one of the best binary classifier methods, SVM is applied in a wide variety of fields (Hastie et al., 2004; Li et al., 2014b). It was first introduced by Vapnik (1995); it is based on the structural risk minimization principle and can overcome overfitting problems. The basic idea of SVM can be summarized as follows.

Suppose that the experimental dataset $\{x_i, y_i\}$, $i = 1, \dots, l$, is composed of an instance space $x \in \mathbb{R}^n$ and the label set $y \in \{-1, 1\}$ (Lapin et al., 2014). SVM solves the binary classification problem by finding a hyperplane $w^T \varphi(x) + b = 0$, which implements the idea of simultaneously minimizing the empirical classification error and maximizing the geometric margin (Li et al., 2014b). The hyperplane is determined by solving the following convex quadratic programming:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.}\quad y_i\,(w^T\varphi(x_i) + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, l$$

where $\varphi(\cdot)$ is a map from the input space to a feature space, and $w$, $b$ and $\xi_i$ are the parameters that should be optimized during training.

The previous optimizing problem can be solved in the Lagrangian form:

$$\max_{\alpha}\ -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) + \sum_{j=1}^{l}\alpha_j$$

$$\text{s.t.}\quad \sum_{i=1}^{l} y_i \alpha_i = 0,\quad 0 \le \alpha_i \le C,\quad i = 1, \dots, l$$

where $\alpha_i$ are the Lagrange multipliers and $K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ is the kernel function.

The optimal boundary is then determined by the support vector expansion:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i^{*} y_i K(x_i, x) + b^{*}\right)$$

where $\alpha_i^{*}$ are the optimal multipliers (nonzero only for the support vectors) and $b^{*}$ is the bias, which can be calculated from the Karush–Kuhn–Tucker conditions.
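As a minimal illustration of the support vector expansion, the following Python sketch evaluates f(x) with a Gaussian kernel. The function names, the two toy support vectors and the multiplier values are hypothetical and chosen only for illustration; in practice the multipliers and the bias would come from solving the dual problem.

```python
import math

def gaussian_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2), with g the kernel parameter."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, support_vectors, labels, alphas, b, g):
    """Evaluate f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * gaussian_kernel(sv, x, g)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1

# Hypothetical toy model: one support vector per class (not fitted to soil data).
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]

print(decision((1.8, 2.1), svs, ys, alphas, b=0.0, g=0.5))   # point near the +1 vector
print(decision((0.1, -0.2), svs, ys, alphas, b=0.0, g=0.5))  # point near the -1 vector
```

Each query point takes the label of the support vector it lies closest to, because the Gaussian kernel decays with squared distance.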

Fig. 3. The technical flowchart of soil quality comprehensive assessment model. The number in parentheses () refers to the number of samples.
[Flowchart summary: for soil heavy metals (SHM; Chinese standard grades I, II, III) and for soil fertility (SF; Chinese standard levels A, B, C), 600 experimental samples are obtained by interpolation and split into training samples (480) and test samples (120) for optimization and verification of an SVM model. The soil samples of Taiyuan (140) are input to the trained SVM for SHM classification (SHM-SVM) to obtain a soil grade; each soil grade is then input to the trained SVM for SF classification (SF-SVM) to obtain the soil class.]
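The data preparation shown in Fig. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say how the standard is interpolated, so the sketch assumes uniform sampling inside each grade interval of Table 1 with 200 samples per grade. All function names are hypothetical, and the SVM training step is omitted.

```python
import random

# Upper bounds of Grades I, II and III from Table 1 (mg/kg); Grade I starts at 0.
GRADE_BOUNDS = {
    "Cu": (35, 100, 400), "Zn": (100, 300, 500), "Ni": (40, 100, 200),
    "Cr": (90, 350, 400), "Pb": (35, 80, 500), "Cd": (0.2, 0.8, 1.0),
    "Hg": (0.15, 1.0, 1.5), "As": (15, 25, 40),
}

def interpolate_standard(n_per_grade, rng):
    """Assumed scheme: draw every metal uniformly inside the same grade interval."""
    data = []
    for grade in (1, 2, 3):
        for _ in range(n_per_grade):
            x = []
            for bounds in GRADE_BOUNDS.values():
                lo = 0.0 if grade == 1 else bounds[grade - 2]
                hi = bounds[grade - 1]
                x.append(rng.uniform(lo, hi))
            data.append((x, grade))
    return data

def split(data, train_percent, rng):
    """Shuffle a copy of the data and cut it into training and test sets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

def soil_class(grade, level):
    """Combine a heavy metal grade (1-3) and a fertility level (A-C), e.g. 'IB'."""
    return ("I", "II", "III")[grade - 1] + level

rng = random.Random(0)
dataset = interpolate_standard(200, rng)   # 3 grades x 200 = 600 samples
train, test = split(dataset, 80, rng)      # 480 training and 120 test samples
print(len(train), len(test), soil_class(1, "B"))
```

The same construction would apply to the fertility standard, with the level intervals of Table 1 in place of the grade intervals; the open-ended level A intervals would need an assumed upper limit.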

2.4.3. Algorithm and technical flowchart

The algorithm of the soil quality comprehensive assessment model: the comprehensive soil quality classification methodology using heavy metal and fertility data based on SVM (SQ-SVM).

Input: Chinese standard of soil heavy metals (SHM–Chinese-standard), Chinese standard of soil fertility (SF-Chinese-standard) and the detected concentrations of heavy metals and soil fertilities of 140 soil samples in Taiyuan.

Output: Soil quality comprehensive assessment/classification, nine classes (IA, IB, IC, IIA, IIB, IIC, IIIA, IIIB and IIIC).

Step 1: Construct the SHM (soil heavy metal) experimental dataset $\{(x_i, y_i)\}_{i=1}^{p}$, where $x_i \in \mathbb{R}^n$, $y_i \in \{1, 2, 3\}$ and $p = 600$ is the number of experimental data, by interpolating the SHM–Chinese-standard.

Step 2: Randomly select 80% of the SHM data as the training set; the remainder is used as the test set.

Step 3: Train the SVM using the training dataset and verify the validity of the model using the test dataset. The result is a trained SVM model for SHM classification (SHM–SVM).

Step 4: Input the measured heavy metal concentrations in the 140 samples of Taiyuan soil into the previously trained SHM–SVM model and obtain the classification results for the 140 soil samples (grade I, grade II and grade III).

Step 5: Construct the SF (soil fertility) experimental dataset $\{(x_i, y_i)\}_{i=1}^{q}$, where $x_i \in \mathbb{R}^m$, $y_i \in \{A, B, C\}$ and $q = 600$ is the number of experimental data, by interpolating the SF-Chinese-standard.

Step 6: Randomly select 80% of the SF data as the training set; the remainder is used as the test set.

Step 7: Train and verify the SVM using the SF experimental data. The outcome is a trained SVM model for SF classification (SF-SVM).

Step 8: Input the measured soil fertility data for the soil samples of each grade into the trained SF-SVM model, which produces the final classification results (i.e., IA, IB, IC, IIA, IIB, IIC, IIIA, IIIB and IIIC).

The technical flowchart of the algorithm is presented in Fig. 3. The algorithm was implemented in Matlab 2011b.

3. Results

3.1. Descriptive statistical analyses

Descriptive statistical analyses were performed with R version 3.0.2. The descriptive statistics for the 140 soil samples are listed in Table 2. The coefficient of variation (C.V) value for soil pH was 0.022, which indicated very low variation. Soil pH has often been found to be less variable than other soil properties (Bai and Wang, 2011; Liu et al., 2013). The C.V values for soil Hg and AP were 1.085 and 1.124, respectively, indicating high variation (Nielsen and Bouma, 1985). High variation in AP may result from various planting patterns and the amount of fertilizer used in different areas. The high variation in Hg concentrations may be the result of different sources of contamination. Comparing the heavy metal concentrations of the Taiyuan soils with the Chinese standards (GB15618-1995) and the background values for Taiyuan city, the mean concentration of each heavy metal was below CS I (Chinese standard I), except for Cd, which indicates only a light level of metal pollution in the soil environment within the research area.

3.2. Correlation analyses

Correlation analysis was performed using R 3.0.2. The correlation matrix of soil variables based on Pearson correlation coefficients is presented in Fig. 4. The significance levels were 0.01 (**) and 0.05 (*). The correlations clearly showed the relevance of all of the measured soil parameters. The pie charts with blue and positive rotation expressed positive correlations, whereas the pie charts with red and reverse rotation expressed negative correlations. Furthermore, the deeper the color is, the stronger the correlation. Correlation coefficients can be found in the block diagram in the lower left corner of the figure.

From Fig. 4, it can be seen that soil pH was significantly negatively correlated with most elements in the soils. On the other hand, soil organic matter (OM) was significantly positively correlated with all nutrients and all heavy metals, except for Ni and As. Organic matter not only determines soil productivity but also influences heavy metal adsorption and mobility in soil (Li, 2004). The concentrations of all metals were positively correlated except for As, which was positively or negatively correlated with the other metals. In addition, As was significantly negatively correlated with all nutrients and Hg. Ni was significantly negatively correlated with OM and TN. Cd and Pb were significantly positively correlated with all of the nutrients. Because there are important correlations between soil heavy metal contamination and soil nutrient status, comprehensive assessments of soil quality should consider both of these assessment categories.

3.3. SVM-based classification model for heavy metal contamination assessment

According to the SHM–SVM algorithm and technical flowchart, the analysis of soil heavy metal content was performed in Matlab. The kernel function chosen for the SHM–SVM was the Gaussian kernel function, based on the research of Dibike et al. (2001) and

Table 2
Descriptive statistics of soil elements.

Items Total Mean SD C.V Maximum Minimum CS I CS II BGV

pH 140 7.754 0.169 0.022 8.180 7.080 – – –
Total N(%) 140 0.121 0.058 0.479 0.445 0.026 – – –
O.M.(%) 140 2.199 1.959 0.891 13.420 0.140 – – –
AN(mg/kg) 140 97.749 35.686 0.365 238.480 32.310 – – –
AP(mg/kg) 140 19.236 21.630 1.124 166.710 2.530 – – –
AK(mg/kg) 140 172.314 69.751 0.405 430.000 55.000 – – –
Cu(mg/kg) 140 29.512 9.703 0.329 74.820 15.280 35 100 22.900
Zn(mg/kg) 140 89.187 24.163 0.271 179.060 52.870 100 300 63.500
Ni(mg/kg) 140 29.864 4.635 0.155 47.420 16.840 40 100 29.900
Cr(mg/kg) 140 74.457 20.528 0.276 202.000 39.980 90 250 55.300
Pb(mg/kg) 140 27.310 8.469 0.310 68.430 16.560 35 80 14.700
Cd(mg/kg) 140 0.244 0.138 0.566 0.924 0.040 0.2 0.8 0.102
Hg(mg/kg) 140 0.094 0.102 1.085 0.801 0.012 0.15 1.5 0.023
As(mg/kg) 140 10.725 2.332 0.217 18.080 4.760 15 25 9.100

SD = Standard deviation; C.V = Coefficient of variation; CS I = The average background value of soil heavy metals in China (GB15618-1995); CS II = The value of soil heavy metals
in China for protecting agricultural production and human health (GB15618-1995); BGV = Background value of soil heavy metals in Taiyuan.
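The C.V column of Table 2 is the standard deviation divided by the mean. The short Python check below, with values copied from Table 2 and a hypothetical helper name, reproduces the three C.V values discussed in Section 3.1.

```python
# (mean, SD) pairs copied from Table 2.
rows = {"pH": (7.754, 0.169), "AP": (19.236, 21.630), "Hg": (0.094, 0.102)}

def coefficient_of_variation(mean, sd):
    """C.V = SD / mean (dimensionless)."""
    return sd / mean

for name, (mean, sd) in rows.items():
    print(name, round(coefficient_of_variation(mean, sd), 3))
```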

Han and Cluckie (2004), who indicated that the Gaussian radial basis function has superior efficiency compared to other kernel functions. A cross validation method was used to optimize the model by searching for better parameters. Fig. 5 shows the parameter option contour map from the SHM–SVM. The best cost parameter (C) was 0.00097656, and the optimal parameter of the kernel function (g) was 2. The SHM–SVM had an accuracy of 98.5417% in classifying the degree of soil heavy metal contamination in Taiyuan. The results also showed that the samples were divided unevenly into the two grades, with 115 in grade I and 25 in grade II. Fig. 6 shows that 82.14% of the soil samples were classified as grade I, and 17.86% were classified as grade II. This suggests that soil heavy metal contamination was not serious in Taiyuan.

Fig. 4. The correlation matrix of soil variables. *P < 0.05, **P < 0.01. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

Fig. 5. Parameters option contour map of SHM–SVM.

Fig. 6. Percent of soil samples in each soil heavy metal grade.

Fig. 7. The spatial distribution of heavy metal grades in Taiyuan.

Fig. 8. Parameter option contour map from SQ-SVM.
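The cross-validated parameter search behind the contour maps in Figs. 5 and 8 can be sketched in Python as below. The reported values are powers of two (0.00097656 ≈ 2^-10, 0.70711 ≈ 2^-0.5, g = 2 = 2^1), which is consistent with a search over a base-2 logarithmic grid, although the paper does not state the grid. Everything in the sketch is an assumption made for illustration: the grid range, the number of folds, the toy data, and the trainer, which is a simple bias-free kernel SVM fitted by dual coordinate ascent rather than the solver used in the authors' Matlab implementation.

```python
import math
import random

def rbf(u, v, g):
    """Gaussian kernel K(u, v) = exp(-g * ||u - v||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))

def train(X, y, C, g, epochs=20):
    """Fit dual multipliers by coordinate ascent on the bias-free SVM dual."""
    n = len(X)
    K = [[rbf(X[i], X[j], g) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Newton step on alpha_i, clipped to the box 0 <= alpha_i <= C.
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - y[i] * f_i) / K[i][i]))
    return alpha

def predict(x, X, y, alpha, g):
    """Sign of the kernel expansion (no bias term in this simplified model)."""
    s = sum(a * yi * rbf(xi, x, g) for xi, yi, a in zip(X, y, alpha))
    return 1 if s >= 0 else -1

def cv_accuracy(X, y, C, g, k, seed):
    """k-fold cross-validated accuracy for one (C, g) pair."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        held_out = idx[fold::k]
        held = set(held_out)
        kept = [i for i in idx if i not in held]
        Xtr, ytr = [X[i] for i in kept], [y[i] for i in kept]
        alpha = train(Xtr, ytr, C, g)
        correct += sum(predict(X[i], Xtr, ytr, alpha, g) == y[i] for i in held_out)
    return correct / len(X)

def grid_search(X, y, exponents, k=4, seed=0):
    """Return (best accuracy, C, g) over the grid C, g in {2**e}."""
    best = (-1.0, None, None)
    for ec in exponents:
        for eg in exponents:
            acc = cv_accuracy(X, y, 2.0 ** ec, 2.0 ** eg, k, seed)
            if acc > best[0]:
                best = (acc, 2.0 ** ec, 2.0 ** eg)
    return best

# Toy two-class data (hypothetical, not the soil dataset).
rng = random.Random(1)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(12)] + \
    [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(12)]
y = [-1] * 12 + [1] * 12
best_acc, best_C, best_g = grid_search(X, y, exponents=range(-4, 5, 2))
print(best_acc, best_C, best_g)
```

The three-grade problem in the paper would additionally need a multi-class scheme such as one-vs-one voting over binary classifiers, which this binary sketch leaves out.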

The spatial distribution of soil heavy metal grades in Taiyuan is shown in Fig. 7. As shown, the Grade I soils were distributed in all Taiyuan districts. Grade II soils, which indicate a relatively high degree of contamination, were mainly distributed in the southern part of Taiyuan, which corresponds to the sewage irrigation zone.

3.4. SVM-based classification model for soil quality assessment

Based on the classification of soil heavy metal content, the assessment of soil quality was performed using the SQ-SVM model. Fig. 8 shows the parameter option contour map from SQ-SVM. The best cost parameter (C) was 0.70711, and the optimal parameter of the kernel function (g) was 2. The accuracy of the SQ-SVM was 98.3333%. The results of the soil quality classification for Taiyuan based on SQ-SVM are shown in Fig. 9.

From the assessment of soil heavy metal content described above, 115 samples were classified as grade I. Combined with the soil fertility levels, these 115 samples were further divided into three classes, IA, IB and IC, representing 35, 70 and 10 sites, respectively. For the 25 samples classified as grade II, the SQ-SVM assessment showed that the samples were divided into two classes, IIA and IIB, with 14 and 11 sites, respectively. Overall, 25% of soil samples were classified as class IA. No sample was classified as grade III. Most soil samples were classified as class IB, accounting for 50% of the total samples.

Fig. 9. Percent of soil samples in each soil quality classification.

The spatial distribution of soil quality classes in Taiyuan is shown in Fig. 10. Generally the soil quality in the north of Taiyuan is better than that in the south. Most soil samples were classified as class IB, except in the middle east of Taiyuan. The worst soil quality in Taiyuan (IIB) occurs in the southeastern part of the city.

Fig. 10. The spatial distribution of soil quality classes in Taiyuan.

4. Discussion and conclusions

This paper presented an innovative SVM-based classification model that linked the soil characteristics and our understanding of their meaning to quantify soil quality in a relatively scientific way. The SHM–SVM model, when applied to 140 soil samples collected from across Taiyuan city, with an accuracy of 98.5417%, showed that 115 samples belonged to grade I, accounting for 82.14% of the total number of samples; the remaining 25 samples were classified as grade II. Overall, these results suggest that heavy metal contamination in Taiyuan soils is not serious, but it is higher than the background values found in many locations, mainly in the southern part of Taiyuan, which corresponds to the sewage irrigation zone. The sewage irrigation system may be the main reason leading to the relatively serious soil contamination of farmland. Therefore, the authorities should immediately pay more attention to the sewage treatment technology and the improvement of the irrigation system.

The SQ-SVM model, with an accuracy of 98.3333%, showed that 35 samples belonged to soil standard IA, accounting for 25% of the total number of samples. No sample corresponded with standard IIIC. Most soil samples were classified as class IB, accounting for 50% of all samples. In general, the soil quality in the north of Taiyuan is better than that in the south. Numerous heavy industrial factories are located in the southwestern section of the city, which may account for the increase in soil heavy metal contamination and decrease in soil fertility, thus making the soil quality relatively poorer in this area.

The overall goal of this paper was to develop an SVM-based classification model that combined soil heavy metal contamination and soil fertility data together in one model to better assess urban soil quality. The general classification is based on the Chinese soil quality standards for heavy metals and the published soil background concentrations of the research area. Undoubtedly, there are various factors that influence soil quality, including heavy metal and organic chemical contamination, soil fertility, soil texture and soil biodiversity. Thus, an SVM-based classification model could be improved by considering the various soil influence factors mentioned above.

This study has highlighted several advantages of using comprehensive SVM-based classification models, including the following three examples. (1) The SVM-based model structure is based on statistical learning theory and VC theory. A model grounded in VC theory can generalize to unseen data; thus, it can learn well with only a very small number of parameters. (2) Many classification algorithms are based on the independence assumption and are thus greatly influenced by correlation among the characteristics, but SVMs are not sensitive to feature correlation. (3) Its computational efficiency is higher than that of several other methods.

Of course, the SVM-based model also has some aspects to carefully consider, as follows. (1) There is no universal solution for nonlinear problems; therefore, the kernel function must be carefully chosen, as the choice may affect the true assessment of soil quality. (2) Different assessment models group soils based on diverse national standards and native background values. It is therefore necessary to select the soil classification standards cautiously because this operation establishes a link between soil characteristics and our understanding of their meaning in a wider context (Montanarella and Rusco, 2008). (3) The experimental datasets are interpolated by soil standards, and the datasets do not

include those extreme outliers. One of the main drawbacks of SVMs is their sensitivity to outliers or noise in the training sample due to overfitting. Thus, further studies regarding the application of fuzzy SVMs are proposed to address this issue.

The presented SVM-based classification model is not only valid for Taiyuan but can also be applied to other areas in China. A key factor in the application of the model is to understand the relationship among soil environmental quality, soil productivity and soil sustainability. Furthermore, the comprehensive SVM-based classification framework and algorithm is expected to have wide utilization potential, not only in soil quality assessment but also in other environmental fields when well designed.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China under Grant 41271513 and by the Important Specialized Science and Technology Item of Shanxi

Hastie, T., Rosset, S., Tibshirani, R., Zhu, J., 2004. The entire regularization path for the support vector machine. J. Mach. Learn. Res. 5, 1391–1415.
Hui, X., 2013. Analysis of heavy metal pollution in soil by fuzzy comprehensive evaluation method. Comput. Eng. Appl.
Jiang, X., Lu, W.X., Yang, Q.C., Zhao, H.Q., 2014. Application of support vector machine in soil environmental quality assessment. China Environ. Sci. 34 (5), 1229–1235 (in Chinese).
Karim, Z., Qureshi, B.A., Mumtaz, M., Qureshi, S., 2014. Heavy metal content in urban soils as an indicator of anthropogenic and natural influences on landscape of Karachi—a multivariate spatio-temporal analysis. Ecol. Indic. 42, 20–31.
Kelepertzis, E., 2014. Accumulation of heavy metals in agricultural soils of Mediterranean: insights from Argolida basin, Peloponnese, Greece. Geoderma 221, 82–90.
Kovačević, M., Bajat, B., Gajić, B., 2010. Soil type classification and estimation of soil properties using support vector machines. Geoderma 154 (3), 340–347.
Lapin, M., Hein, M., Schiele, B., 2014. Learning using privileged information: SVM+ and weighted SVM. Neural Netw. 53, 95–108.
Li, D.S., Yang, Z.F., Jin, Z.B., 2004. Geochemical characters of trace element of soil from the Taiyuan basin. Geol. Prospect. 40 (3), 86–89 (in Chinese).
Li, Z., Ma, Z., van der Kuijp, T.J., Yuan, Z., Huang, L., 2014a. A review of soil heavy metal pollution from mines in China: pollution and health risk assessment. Sci. Total Environ. 468, 843–853.
Li, Z., Zhou, M., Xu, L., Lin, H., Pu, H., 2014b. Training sparse SVM on the core sets of
fitting-planes. Neurocomputing 130, 20–27.
Province, China under Grant 20121101011. Li, Y., 2004. Effect of pH and organic matter on the bioavailability Cd and Zn in soil. J.
Yunnan Agric. Univ. 20 (4), 539–543 (in Chinese).
References Liao, Y.C., Chien, S.C., Wang, M.C., Shen, Y., Hung, P.L., Das, B., 2006. Effect of
transpiration on Pb uptake by lettuce and on water soluble low molecular
weight organic acids in rhizosphere. Chemosphere 65 (2), 343–351.
Akbulut, S., Grieken, R., Kılıc, M.A., Cevik, U., Rotondo, G.G., 2013. Identification of
Liu, Z.P., Shao, M.A., Wang, Y.Q., 2013. Large-scale spatial interpolation of soil pH
heavy metal origins related to chemical and morphological soil properties using
across the Loess Plateau, China. Environ. Earth Sci. 69 (8), 2731–2741.
several non-destructive X-ray analytical methods. Environ. Monit. Assess. 185
Montanarella, L., Rusco, E., 2008. Threats to soil quality in Europe.
(3), 2377–2394.
Nielsen, D.R., Bouma, J., 1985. Soil spatial variability.
Araghinejad, S., 2014. Support Vector Machines. Data-Driven Modeling: Using
Poggi, J.M., Portier, B., 2011. PM10 forecasting using clusterwise regression. Atmos.
MATLAB03 in Water Resources and Environmental Engineering. Springer,
Environ. 45 (38), 7005–7014.
Praveena, S.M., Yuswir, N.S., Aris, A.Z., Hashim, Z., 2014. Contamination assessment
Aryafar, A., Gholami, R., Rooki, R., Ardejani, F.D., 2012. Heavy metal pollution
and potential human health risks of heavy metals in klang urban soils: a
assessment using support vector machine in the Shur River, Sarcheshmeh
preliminary study. Environ. Earth Sci..
copper mine, Iran. Environ. Earth Sci. 67 (4), 1191–1199.
Saby, N.P.A., Thioulouse, J., Jolivet, C.C., Ratié, C., Boulonne, L., Bispo, A., Arrouays, D.,
Bai, Y., Wang, Y., 2011. Spatial variability of soil chemical properties in a jujube slope
2009. Multivariate analysis of the spatial patterns of 8 trace elements using the
on the Loess Plateau of China. Soil Sci. 176 (10), 550–558.
French soil monitoring network data. Sci. Total Environ. 407 (21), 5644–5652.
Behzad, M., Asghari, K., Eazi, M., Palhang, M., 2009. Generalization performance of
Steinwart, I., Christmann, A., 2008. Support Vector Machines. Springer Science &
support vector machines and neural networks in runoff modeling. Expert Syst.
Business Media.
Appl. 36 (4), 7624–7629.
Taylor, M.D., Kim, N.D., Hill, R.B., Chapman, R., 2010. A review of soil quality
Bhaduri, D., Purakayastha, T.J., 2014. Long-term tillage, water and nutrient
indicators and five key issues after 12 yr soil quality monitoring in the Waikato
management in rice-wheat cropping system: assessment and response of soil
region. Soil Use Manage. 26 (3), 212–224.
quality. Soil Tillage Res. 144, 83–95.
Teng, Y., Wu, J., Lu, S., Wang, Y., Jiao, X., Song, L., 2014. Soil and soil environmental
Bhuiyan, M.A., Parvez, L., Islam, M.A., Dampare, S.B., Suzuki, S., 2010. Heavy metal
quality monitoring in China: a review. Environ. Int. 69, 177–199.
pollution of coal mine-affected agricultural soils in the northern part of
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory Berlin.
Bangladesh. J. Hazard. Mater. 173 (1), 384–392.
Wang, X.J., Lai, J.Q., Lu, Y.H., Li, D.S., Zhou, J.H., Wang, J.W., 2008. Main source of soil
Bishop, C.M., 2006. Pattern Recognition and Machine Learning, vol. 4. Springer, New
heavy metal pollution based on factor analysis in Taiyuan. Ecol. Environ. 17 (2),
York No. 4, p. 12.
671–676 (in Chinese).
Cachada, A., Pereira, M.E., Ferreira, d.S.E., Duarte, A.C., 2012. Sources of potentially
Wang, L., 2005. Support Vector Machines: Theory and Applications. Nanyang
toxic elements and organic pollutants in an urban area subjected to an
Technological University, School of Electrical & Electronic Engineering.
industrial impact. Environ. Monit. Assess. 184 (1), 15–32.
Were, K., Bui, D.T., Dick, Øystein B., Singh, B.R., 2015. A comparative assessment of
Chen, F., Jiang, X., Tang, F., Bian, Y., 2012. Application of AHP and GIS in evaluation of
support vector regression, artificial neural networks, and random forests for
agricultural soil heavy metals pollution. Environ. Pollut. Control 7 004.
predicting and mapping soil organic carbon stocks across an afromontane
Dibike, Y.B., Velickov, S., Solomatine, D., Abbott, M.B., 2001. Model induction with
landscape. Ecol. Indic. 52, 394–403.
support vector machines: introduction and applications. J. Comput. Civil Eng. 15
Xu, F.Y., Hu, Y.Y., 2014. Distribution and pollution assessment on heavy metals in
(3), 208–216.
urban soils of different functional areas in Chongqing, Chinese. J. Soil Sci. 45 (1),
Fang, R.H., 2012. Fuzzy synthetical evaluation of soil fertility in the area of
227–231 (in Chinese).
Guanzhong Plain and the Loess Plateau—a case study in Chang’an District of
Yu, H., Ni, S.J., He, Z.W., Zhang, C.J., Nan, X., Kong, B., et al., 2014. Analysis of the
Xi’an. Agric. Res. Arid Areas 1, 006 (in Chinese).
spatial relationship between heavy metals in soil and human activities based on
Franco-Uría, A., López-Mateo, C., Roca, E., Fernández-Marcos, M.L., 2009. Source
landscape geochemical interpretation. J. Geochem. Explor. 146, 136–148.
identification of heavy metals in pastureland by multivariate analysis in NW
Zhang, L., Fang-Li, S.U., Guo, C.J., Hong, Y.L., Song, Y.L., 2009. Application of grey
Spain. J. Hazard. Mater. 165 (1), 1008–1015.
correlation analysis in different models of ecological restoration in soil quality
Fu, S.M., Xiao, F., Su, W.J., Qiu, J.Q., Wang, D.F., Chang, X.Y., 2014. The evaluation of
evaluation. J. Shenyang Agric. Univ. (in Chinese).
heavy metals pollution in soils of the lower reaches of the Hengshi River within
Zheng, L., Wu, W., Wei, Y., Hu, K., 2015. Effects of straw return and regional factors on
the Dabaoshan mining area based on fuzzy mathematics. Geol. Bull. China 33
spatio-temporal variability of soil organic matter in a high-yielding area of
(8), 1140–1146.
northern china. Soil Tillage Res. 145, 78–86.
Guo, G., Fengchang, Wu, Xie, F., Zhang, R., 2012. Spatial distribution and pollution
Zhou, X., An, Y.L., Xu, W.C., Deng, M.L., 2009. Fuzzy evaluation on soil fertility of
assessment of heavy metals in urban soils from southwest china. J. Environ. Sci.
cultivated land based on GIS and improved AHP—a Case of Puan County in
(China) 24 (3), 410–418.
Guizhou Province [J]. Chin. J. Soil Sci. 1, 020.
Han, D.A.W.E.I., Cluckie, I., 2004. Support vector machines identification for runoff
modeling. In Proceedings of the sixth international conference on
hydroinformatics 21–24.
