Computers and Electronics in Agriculture 197 (2022) 106923


Detection of heavy metals in vegetable soil based on THz spectroscopy

Wei Lu a, Hui Luo a, *, Linxuan He a, Wenxuan Duan a, Yilin Tao a, Xinyi Wang a, Shuaishuai Li b
a College of Artificial Intelligence, Nanjing Agricultural University, Nanjing City, Jiangsu Province 210031, China
b College of Engineering, Nanjing Agricultural University, Nanjing City, Jiangsu Province 210031, China


Heavy metal pollution in soil endangers food safety and human health. Thus, it is important to study accurate and rapid detection methods. Here, an efficient nondestructive detection method for mercury (Hg), cadmium (Cd) and copper (Cu) in soils was studied by terahertz (THz) spectroscopy. First, regression equations were established between heavy metal contents and absorption coefficients at the selected frequency points. Then, the pollution type and pollution level of the soils containing three heavy metals were detected at the same time. Reference blank soil was also tested. Probabilistic neural network (PNN) and random forest (RF) models verified the effects of qualitative detection. Next, the contents of the three heavy metals in soils were predicted simultaneously by a backpropagation neural network (BPNN) and an extreme learning machine (ELM). The results showed that the absorption coefficients increased regularly in the THz spectral range from 0.05 THz to 0.7 THz. The average detection result of the PNN model was better than that of the RF model. The average detection accuracies for heavy metal pollution level and type were all higher than 95%. In addition, the prediction results for heavy metal content showed that the BPNN model has better prediction performance. The optimal decision coefficients (DC) of the BPNN model for soils containing the three heavy metals were 0.95, 0.99 and 0.98, respectively, and the corresponding root mean square errors (RMSE) were 0.37, 0.02 and 2.62, respectively. The results proved that THz spectroscopy has good qualitative and quantitative detection ability for soils contaminated with Hg, Cd and Cu, which could bring new opportunities for detection of heavy metal pollutants in soil.

Keywords: Heavy metals; Soil pollution; THz spectroscopy; Qualitative detection; Quantitative prediction

* Corresponding author.
E-mail address: lh821005@njau.edu.cn (H. Luo).

Received 18 October 2021; Received in revised form 29 March 2022; Accepted 29 March 2022
Available online 9 April 2022
0168-1699/© 2022 Elsevier B.V. All rights reserved.

1. Introduction

Soil is a substance closely related to human life. However, heavy metal pollution has become a major threat to the human living environment due to overuse of chemical fertilizers and pesticides (Ebenezer et al., 2019; Clairotte et al., 2016). In addition, heavy metal contaminants enter the soil due to gravity or irrigation. Cadmium (Cd), chromium (Cr), lead (Pb), arsenic (As), zinc (Zn) and copper (Cu) contaminants in soil can migrate by following the soil-crop food chain, and they pose a considerable threat to human health and the environment (Mamat et al., 2020; Tóth et al., 2016; Xie et al., 2022). With the acceleration of industrialization, the types of heavy metal pollutants have increased, and heavy metal pollution in the soil has also increased (Rajendran et al., 2022; Yu et al., 2018). Therefore, it is particularly important to find an accurate and rapid method for the detection of soil heavy metal pollution.

Accordingly, direct detection methods using chemical and physical tests were used to monitor the composition of heavy metals in the soil. Direct detection methods are accurate and sensitive. However, they are time consuming and laborious when testing large numbers of samples (Zhang et al., 2013; Borrego et al., 2002; Li et al., 2021a, 2021b, 2021c). Recently, spectroscopic technology has been used increasingly for analyses of metal elements in soil due to its nondestructive detection characteristics. X-ray fluorescence spectroscopy based on a statistical approach was used to study the functions of heavy metals in soils (Chandrasekaran et al., 2015). Visible near infrared reflectance spectroscopy was applied to predict heavy metal concentrations in soil (Luce et al., 2017; Pyo et al., 2020). A magnetic field enhanced detection method using laser-induced breakdown spectroscopy was applied to detect heavy metals in soil (Akhtar et al., 2018). Moreover, photoacoustic spectroscopy and fluorescence spectroscopy have also been adopted to monitor heavy metal ions in soil (Liu et al., 2020; Shen et al., 2019). With the advantages of economy, speed and accuracy, spectroscopic technology brings new opportunities for the detection of soil heavy metal pollution.

The terahertz (THz) spectrum is located between the microwave and infrared regions of the electromagnetic spectrum (Ferguson et al., 2002). THz radiation exhibits low energy and high frequency, and it penetrates solids well (Pfleger et al., 2014). Since many polar molecules have unique spectral fingerprints in the THz range (0.1–10 THz), THz spectroscopy can directly measure biomolecule features at the picosecond level (Arik et al., 2014). The THz spectrum has been proven to be effective in analyzing the compositions of liquids (Flanders et al., 1996), solids (Li et al., 2020; Roggenbuck et al., 2010) and gases (Suzuki et al., 2018; Mittleman et al., 1998).

Although few research results are currently available, THz spectroscopy has also been applied to soil detection. THz transmission spectroscopy has proved to be effective in evaluating soil minerals in the sedimentation profile and archaeological recordings (Café et al., 2020). Four heavy metal ions with different pollution levels in soils have been identified through THz time-domain spectroscopy (Li et al., 2011). THz spectroscopy was applied to identify organic matter content and moisture content in soil (Dworak et al., 2011). The least squares support vector machine model and THz spectral data were used to quickly detect microplastic pollution levels in soils (Li et al., 2021a, 2021b, 2021c). THz technology with path analysis was used to observe and identify lead pollution in soil at different pH levels (Li et al., 2021a, 2021b, 2021c). Since soil is rather opaque to THz frequencies, the interaction of THz radiation with soils has been studied around four THz phenomena (Lewis et al., 2017). Prior studies showed that, since the THz time-domain spectrum carries useful information on heavy metals in soil, it is feasible to detect heavy metals in soil. However, existing studies on detecting heavy metals in soil by THz spectrum technology are still not comprehensive and systematic. Due to the complex composition of soil and the opacity of soil to the THz frequency, the validity and feasibility of detecting heavy metals in soil by THz spectrum need to be proved by in-depth studies.

In this paper, nondestructive methods for detecting mercury (Hg), cadmium (Cd) and copper (Cu) in vegetable soils were studied using THz transmission spectroscopy. Most of the existing research methods use the THz spectrum to predict the concentrations of heavy metals in soil. The purpose of our study is to determine if THz technology can be used to detect three heavy metals in soil qualitatively and quantitatively. In addition, regression models were established to investigate whether there was a linear relationship between the characteristic frequencies of the THz spectrum and the concentrations of heavy metals. Three experiments were designed to validate the proposed methods. The first experiment obtained the absorption parameters of the THz spectrum and established regression models. The second experiment studied two qualitative detection models for detecting heavy metal pollution types and pollution levels in soil. The experimental results from the probabilistic neural network (PNN) and the random forest (RF) model were compared and analyzed. Last, two quantitative prediction models, a backpropagation neural network (BPNN) and an extreme learning machine (ELM), were applied to predict the concentrations of the three heavy metals in soil.

Fig. 1. THz time-domain spectral test system, (a) Schematic diagram of THz spectral system, (b) Optical path diagram of THz time-domain spectral system, (c) Test sample platform and photoconductance antenna.

2. Materials and methods

2.1. THz time-domain spectral test system

The THz time-domain spectrometer, model 1008, was purchased from Batop Company in Germany. The ultrafast femtosecond laser, model LF7808A, was purchased from Shanghai Langyan Optoelectronic Technology Co., Ltd., China. The laser had two output bands of 780 nm and 1560 nm, and the pulse width was less than 100 fs (Luo et al., 2019). Fig. 1 shows the THz time-domain spectral test system. The THz radiation was generated by the ultrafast femtosecond laser, and then the light source was divided into two paths by the spectroscope. One path was the pump light, and the other path was the probe light. The probe light was directed into the tested sample through the photoconductive antenna shown in Fig. 1 (c), and then the THz time-domain waveform of the tested sample was obtained at different times by THz spectrum acquisition software.

Table 1
The contents of heavy metals in vegetable soils.
Heavy metal   Content of heavy metal   Actual heavy metal content in soils (mg⋅kg−1)
type          solution (mg⋅kg−1)       1        2        3        4
Hg            0.5                      0.674    0.449    0.385    0.567
Hg            1.0                      1.74     1.25     1.76     1.62
Hg            2.0                      4.28     3.23     3.25     3.07
Hg            4.0                      5.76     3.94     6.84     5.63
Cd            0.5                      0.86     0.56     0.76     0.34
Cd            2.0                      1.94     1.59     1.93     0.93
Cd            4.0                      3.43     3.63     3.17     1.86
Cd            8.0                      3.89     4.01     4.17     2.12
Cu            25                       52       62       74       61
Cu            50                       59       73       89       69
Cu            200                      153      174      180      190
Cu            400                      320      327      355      352


Fig. 2. Four light paths of THz radiation in soil.

2.2. Sample preparation

Nutritive soil with a composition analysis report was purchased from the local horticultural shop, and the standard solution of Hg and Cd ions with concentrations of 1000 µg⋅mL−1 and the pure copper sulfate pentahydrate were purchased from Nanjing Wanqing Chemical Glass Ware & Instrument Co., Ltd., China.

The standard solution of Hg and Cd ions was diluted to 100 µg⋅mL−1 with ultrapure water. The copper sulfate pentahydrate powder was dissolved into ultrapure water to establish a solution concentration of 100 mg⋅mL−1. Four portions of soil were baked to a constant weight. Then, the average water content of the soil was controlled at approximately 42.16%.

Table 1 lists the contents of heavy metals in vegetable soils. The second column of Table 1 shows the contents of the three heavy metals added to the vegetable soil. To reach these contents, 10 mL, 20 mL, 40 mL and 80 mL of Hg ion solution, 10 mL, 40 mL, 80 mL and 160 mL of Cd ion solution, and 5 mL, 10 mL, 40 mL and 80 mL of Cu solution were added to the planting soils. Additionally, the reference soil without heavy metals was prepared. A total of 13 groups of soils containing different heavy metals were prepared to grow vegetables. The soils were poured into planting pots and sealed with plastic wrap for two weeks to allow the heavy metals to mix well with the soils.

Two weeks later, cabbage seeds were evenly planted in the soils. When the cabbages were ripe, the root soil of the cabbage was collected every 7 days. Soil samples were collected four times in total. After mixing well and drying, the collected soil samples were divided into two parts. One part of the soil samples was sent to the testing agency to detect the actual heavy metal content as comparison results. The average actual contents of the three heavy metals in soils are shown in Table 1. The other part of the soil samples was evenly mixed with polyethylene powder in a mass ratio of 1 to 1, and then pressed into thin soil tablets at 25 MPa pressure in a tablet machine. These thin soil tablets were eventually tested with the THz time-domain spectrometer. Table 1 shows that the actual heavy metal contents in vegetable soil were not regular for the four groups of soil samples. This is because the soil itself contained trace amounts of heavy metals, and the cabbages absorbed them differently as they grew.

2.3. Pretreatment of the soil THz spectrum

To improve the analytical capability of the soil THz spectral data, the time-domain THz spectral data were pretreated by dimension reduction and denoising. The wavelet transform method (WT), standard normal variate transformation method (SNV) and discrete cosine transform method (DCT) were used to denoise the soil THz spectral data. The principal component analysis method (PCA) and multidimensional scaling transform method (MDS) were adopted to reduce the dimensionality of the THz spectral data. Denoising removes the THz spectral noise caused by the environment, machine, and manual operations. Dimension reduction maps the high-dimensional THz spectrum into the low-dimensional space, thus removing redundant information and retaining valid information. In our experiments, the dimension of the THz spectrum was reduced so that the retained components reached a cumulative contribution of 99%.

Fig. 3. THz time-domain spectra of soils.

2.4. Collection of the soil THz spectrum

In the experiments, the soil was pressed into thin soil tablets. When THz radiation strikes soil, little THz radiation is transmitted, much is scattered, and some is reflected and absorbed. Four light paths are shown in Fig. 2 (Lewis et al., 2017). Absorption of THz frequencies is usually associated with excitation of particular vibrational modes in a material. When soil contains heavy metals, its absorption of THz radiation changes. The characteristic absorption frequencies may be used to identify the particular species. In this way, soils have been investigated to determine their compositions (Lewis et al., 2017). Although Fig. 2 shows the light paths of THz wave irradiation in soil, how heavy metals affect or change the THz spectral curve needs further study.

A uniform experimental environment can reduce the variations of the collected THz spectral data. In addition, water has strong absorption of THz waves. Higher humidity in the air can reduce the intensity of the THz peak (Luo et al., 2019). The environment temperature remained at approximately 26 ℃ and the indoor relative humidity was maintained at less than 40% during the experiments. The spectroscopic reference was the spectrum of air. The sampling step of the THz spectrometer was 0.02 ps, and the integration interval was 85 ps to 115 ps.

Thirteen groups of soils, including reference soils, were analyzed. During cabbage growth, spectral data were collected 4 times for each group of soil samples. At each collection, 12 air reference spectra and 40 soil THz spectra were recorded for each group of soil, giving a total of 624 air THz spectra and 2080 soil THz spectra (13 groups × 4 collections). Fig. 3 and Fig. 4 show all THz spectra of soils in the time domain and frequency domain, respectively. In the time domain, the peak of the soil THz spectra was slightly higher and later than that of the air reference. The frequency-domain spectra of the soils show three obvious peaks between 0.25 THz and 1 THz. However, according to the regression analysis, these frequencies were not directly related to the three heavy metals added to vegetable soils. A possible explanation is that, due to the complex composition of soil, these frequencies may indicate the properties of other substances in the soils.

Fig. 4. THz spectra of soils in frequency domain, (a) THz frequency-domain spectra of reference air, (b) THz frequency-domain spectra of soil.

Table 2
The pollution levels of soil.
Heavy metals   Safe soil    Mildly polluted soil   Heavily polluted soil
in soils       (mg⋅kg−1)    (mg⋅kg−1)              (mg⋅kg−1)
Hg             ⩽ 2.4        2.4 ~ 4.0              ⩾ 4.0
Cd             ⩽ 0.3        0.3 ~ 3.0              ⩾ 3.0
Cu             ⩽ 100        100 ~ 300              ⩾ 300

Table 3
The number of soil THz spectra.
Pollution level of soil   Number of soil THz spectra
                          Hg     Cd     Cu
Safe soil                 384    128    384
Mildly polluted soil      128    320    128
Heavily polluted soil     128    192    128

2.5. Absorption parameters for the soil THz spectrum

The absorption coefficient effectively represents the optical properties of materials. Duvillaret and Dorney constructed an efficient physical
model for optical parameters in the THz band (Duvillaret et al., 2002; Dorney et al., 2001). In the experiment, the THz spectrum of soil was collected in the transmission mode of the THz spectrometer. The absorbance rate in transmission mode is calculated as follows.

$$\text{Absorbance rate} = -\lg\left|\frac{E_{sam}(\omega)}{E_{ref}(\omega)}\right|^{2} \tag{1}$$

where $E_{ref}(\omega)$ is the THz frequency-domain spectrum of the reference air and $E_{sam}(\omega)$ is the THz frequency-domain spectrum of the soil. The $E_{ref}(\omega)$ and $E_{sam}(\omega)$ used in subsequent calculations were the average spectra.

The THz s-polarized wave was obtained in the experiment, and its incident direction was perpendicular to the THz pulse direction. Therefore, the refractive coefficient $n(\omega)$ and extinction coefficient $k(\omega)$ are written as follows.

$$n(\omega) = \frac{\varphi(\omega)\,c}{\omega d} + 1 \tag{2}$$

$$k(\omega) = \frac{c}{\omega d}\,\ln\left\{\frac{4n(\omega)}{\rho(\omega)\,[n(\omega)+1]^{2}}\right\} \tag{3}$$

where $\rho(\omega)$ is the amplitude ratio of the soil spectrum to the reference air spectrum, and $\varphi(\omega)$ is the phase difference between them. Moreover, c is the density of the substance, which is equivalent to the density of the tablet, and d is the thickness of the tablet. In the subsequent experiments, a soil tablet is 1 cm in diameter, 1.2 mm in thickness and 0.36 g in weight, so c is 3.822 g/cm³ and d is 1.2 mm.

According to Lambert's theorem, the absorption coefficient $\alpha(\omega)$ is calculated as follows.

$$\alpha(\omega) = \frac{2\omega k(\omega)}{c} = \frac{2}{d}\,\ln\left\{\frac{4n(\omega)}{\rho(\omega)\,[n(\omega)+1]^{2}}\right\} \tag{4}$$

2.6. Sample and model for qualitative detection of heavy metal pollution in soils

To identify the levels of heavy metal pollutants in soils, the soil samples were divided into three levels based on the actual content of heavy metals and the soil pollution control standards of the local government for agricultural land (file no. GB 15618–2018). They were identified as safe soil, mildly polluted soil, and heavily polluted soil. The pollution levels for soils containing Hg, Cd and Cu are shown in Table 2. Table 3 lists the sample number of THz spectra for the three groups of soil. In addition, the types of heavy metal soils were also detected. There are four kinds of soil: Hg-contaminated soil, Cd-contaminated soil, Cu-contaminated soil, and reference soil. The number of THz spectra for the reference soil was 160, and the number of THz spectra for each contaminated soil was 640. The total number of THz spectra of the tested soils is 2080.

Two machine learning-based models were used to identify heavy metal pollution in soils. PNN is a radial basis neural network that integrates density function theory and Bayesian decision theory. A PNN model is composed of an input layer, a pattern layer, a summation layer and an output layer (Specht, 1990). Appropriate smoothing factors can improve the classification performance of PNN models. According to the optimal fitting results of the PNN model, the smoothing factors used in the experiment were 0.17 and 0.18. RF is a highly flexible machine learning algorithm combining the bagging method and decision trees. It is a strong classifier that votes on multiple weak classification results (Breiman, 2001). The performance of the RF model is related to the number of decision trees. In order to achieve better detection results, an RF model containing 70 decision trees was used to detect heavy metal pollution in soils.

2.7. Sample and model for quantitative prediction of heavy metal content in soils

The three heavy metal contents of the polluted soils were predicted
at the same time. The number of THz spectra for each contaminated soil was 640, so the total number of THz spectra used for prediction is 1920. The content of heavy metals in soil was quantitatively predicted with BPNN and ELM. BPNN is a feedforward neural network trained by error backpropagation, and ELM is a single hidden layer feedforward neural network (Zhang and Zhang, 2015). The hidden layer structure and activation function are important for the BPNN model. The BPNN used in the prediction experiment had a single hidden layer. The activation function between the input layer and hidden layer was the sigmoid function, and the linear function was used as the activation function between the hidden layer and the output layer. Moreover, the number of network iterations was 1000, the target mean square deviation was 0.0004, and the number of neurons in the hidden layer was 9. The activation function and the number of neurons in the hidden layer are also important for the ELM model. In the experiment, the activation function was the sigmoid function, and the number of neurons in the hidden layer was 100. The prediction results for the two models were compared with the actual heavy metal contents detected by the testing facility. The criteria for evaluating the prediction model were root mean square error (RMSE) and decision coefficient (DC).

3. Results and discussion

3.1. Analysis of absorption parameters

The absorption parameters of soils with different contents of Hg, Cd and Cu were calculated. It was found that when the time-domain spectra of soils with different heavy metal contents were denoised by the DCT method, their absorption parameters had obvious regularity in the range from 0.1 THz to 0.7 THz for certain concentrations of heavy metals. Fig. 5 shows the absorption parameter curves for soils with five concentrations of heavy metals. The concentrations of heavy metals marked in Fig. 5 are actual detected values. Peak frequency points are also marked with arrows in Fig. 5. The curves for the absorption coefficients and absorption rates showed rising and falling trends, respectively.

Fig. 5. Absorption parameters of THz spectra of soils, (a) Absorption parameters of soil containing Hg, (b) Absorption parameters of soil containing Cd, (c) Absorption parameters of soil containing Cu.

Table 4
Results of linear modeling of optical parameters of soil THz spectra.
Heavy metal type   Frequency point (THz)   Regression equation    DC
Hg                 0.20                    y = 0.49⋅x − 2.64      0.89
Hg                 0.27                    y = 0.89⋅x − 4.64      0.92
Hg                 0.33                    y = 0.74⋅x − 5.29      0.86
Hg                 0.40                    y = 1.03⋅x − 7.46      0.86
Cd                 0.20                    y = 0.84⋅x − 3.29      0.91
Cd                 0.27                    y = 1.19⋅x − 4.95      0.97
Cd                 0.33                    y = 1.52⋅x − 6.42      0.97
Cd                 0.40                    y = 1.98⋅x − 9.04      0.99
Cu                 0.17                    y = 0.01⋅x − 4.58      0.31
Cu                 0.23                    y = 0.01⋅x − 3.96      0.41

Then, linear models for the absorption coefficients were established at the selected peak frequency points. The obtained regression equations and DC values are shown in Table 4. Since the linear models for the heavy metal Cu had low DC values, the two selected frequency points were evidently not the characteristic frequency points of the heavy metal Cu in soil. The DC values for linear modeling of Hg and Cd were both higher. However, these frequency points cannot uniquely represent the linear relationship between them and the concentrations of heavy metals. Therefore, it is impossible to determine whether these frequency points are characteristic frequency points of Hg and Cd in soil. The reasons may be that the THz spectra of single molecular components of heavy metals cannot be obtained due to the complex composition of soil, or that the concentrations of heavy metals chosen for the experiment were not suitable for showing the THz spectrum characteristics of the heavy metals, or that the instrument used was not precise enough. The THz
spectrum of soil carries the comprehensive spectral information produced when THz radiation irradiates the soil. How the THz spectral response of heavy metals in soil is generated remains an open question that requires in-depth research.

3.2. Qualitative detection results of heavy metal pollution in soils

Two aspects of heavy metal pollution were identified from the THz spectra of the soil. One experiment tested the types of heavy metals in soils. The second experiment tested the levels of heavy metal pollutants in soils. First, the THz time-domain spectra for soil were preprocessed by different spectral processing methods. Then, the data set was randomly divided into training set and testing set in a ratio of 7 to 3. Finally, the average accuracy obtained after running the test model 30 times was calculated to evaluate the detection model. The detection results for the training set and testing set are listed in Table 5 and Table 6.

Table 5 lists the accuracy for identifying heavy metal pollution types in soils. Four kinds of soil samples (including the blank reference soil) were detected at the same time. As shown in Table 5, the average results for the PNN and RF training models were 99.31% and 98.96%, respectively, which indicates that the established detection models were effective. The detection results for the testing set were 97.77% and 96.55%, respectively. The average result for the PNN model was higher than that of the RF model, although the superiority of the PNN model in detection performance is not significant. Four kinds of soil samples can be accurately detected with the two models. These experimental results showed that THz time-domain spectroscopy can be used effectively to detect the types of heavy metals in soils containing Hg, Cd and Cu at the same time.

Table 5
Detection accuracy of soil heavy metal pollution types. Accuracy columns are in %; the first pair is the PNN model and the second pair is the RF model, as given by the average results quoted in the text.
Heavy metal type   Denoising method   Dimension reduction method   PNN training set   PNN testing set   RF training set   RF testing set
Reference soil     WT-SNV             None                         100                99.17             98.63             95.37
Reference soil     WT-SNV             PCA                          99.46              99.58             99.58             98.43
Reference soil     WT-SNV             MDS                          99.29              98.75             99.61             98.75
Hg                 WT-SNV             None                         99.70              99.31             97.89             93.84
Hg                 WT-SNV             PCA                          99.11              99.31             99.31             97.82
Hg                 WT-SNV             MDS                          99.40              95.83             99.40             97.96
Cd                 WT-SNV             None                         99.64              97.92             99.27             97.17
Cd                 WT-SNV             PCA                          99.11              97.50             99.32             97.89
Cd                 WT-SNV             MDS                          99.82              98.33             99.39             97.54
Cu                 WT-SNV             None                         99.11              97.92             97.43             91.78
Cu                 WT-SNV             PCA                          98.81              94.44             98.99             95.90
Cu                 WT-SNV             MDS                          98.21              95.14             98.73             96.09
Average result                                                     99.31              97.77             98.96             96.55

The accuracy for detection of heavy metal pollution levels in soils is shown in Table 6. There were three pollution levels for each heavy metal-contaminated soil. They were also analyzed with the PNN and RF models. The average modeling results for the training data set were 98.37% and 97.22%, respectively. The final detection results for the testing set were 96.85% and 96.55%. Both the calibration model and validation model exhibited satisfactory detection results. Therefore, it is concluded that the PNN model and RF model based on the THz time-domain spectrum accurately identified the heavy metal pollution levels in soils.

Table 6
Detection accuracy of soil heavy metal pollution levels. Accuracy columns are in %; the first pair is the PNN model and the second pair is the RF model, as given by the average results quoted in the text.
Heavy metal type   Denoising method   Dimension reduction method   PNN training set   PNN testing set   RF training set   RF testing set
Hg                 WT-SNV             None                         95.78              92.25             95.46             91.77
Hg                 WT-SNV             PCA                          95.69              92.91             92.30             92.21
Hg                 WT-SNV             MDS                          95.63              92.35             92.01             91.94
Cd                 WT-SNV             None                         99.82              99.36             99.64             98.48
Cd                 WT-SNV             PCA                          99.68              99.02             99.02             98.85
Cd                 WT-SNV             MDS                          99.73              99.11             99.14             98.94
Cu                 WT-SNV             None                         99.67              98.94             99.67             98.94
Cu                 WT-SNV             PCA                          99.64              98.84             98.81             99.00
Cu                 WT-SNV             MDS                          99.67              98.83             98.97             98.85
Average result                                                     98.37              96.85             97.22             96.55

Before the spectrum was tested, the THz time-domain spectral data were preprocessed with denoising and dimensionality reduction. Taking the detection for Hg pollution level as an example, the optimal result of the RF model combined with the WT-SNV-MDS pretreatment was 97.96%. However, the lowest accuracy was 93.84% for the RF model using the WT-SNV spectrum processing method. Reducing the dimension of the THz spectrum did not necessarily improve detection results. This is because dimensionality reduction may lose useful information about the spectrum data. The results indicate that the accuracy of the detection model can be affected by the spectral pretreatment method.

3.3. Quantitative prediction results of heavy metal content in soils

In the quantitative prediction experiment, four pretreatment methods including WT-PCA, DCT-PCA, WT and DCT were used to preprocess the soil THz time-domain spectrum. The dataset was also randomly divided into training set and testing set at a ratio of 7 to 3. ELM and BPNN models were established with the THz refraction coefficients, which were calculated by Formula 2. The models predicted three heavy metals simultaneously. Table 7 lists the predicted results of the training set and testing set. The listed RMSE values and DC values were obtained after running the prediction models 30 times.
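The refraction coefficients used as model inputs follow from Formula 2, and the absorption coefficient follows from Formulas 3 and 4. As a minimal sketch, assuming the amplitude ratio ρ(ω) and phase difference φ(ω) have already been obtained from the sample and reference spectra, the calculation at a single frequency point could look as follows. The function names and the round-trip test values are hypothetical, and the code assumes that c is the vacuum speed of light, which is the convention in the thin-slab formulas of Duvillaret et al. (2002) and Dorney et al. (2001); none of this is taken from the authors' own software.

```python
import math

C_LIGHT = 2.998e8  # m/s. ASSUMPTION: c is taken as the vacuum speed of light.


def absorbance(rho):
    """Formula (1): -lg|E_sam/E_ref|^2, with rho = |E_sam/E_ref|."""
    return -math.log10(rho ** 2)


def optical_parameters(rho, phi, freq_hz, d_m):
    """Return (n, k, alpha) at one frequency point.

    rho     amplitude ratio of sample to reference spectrum
    phi     phase difference between them, in radians
    freq_hz frequency in Hz
    d_m     tablet thickness in metres
    """
    omega = 2.0 * math.pi * freq_hz
    n = phi * C_LIGHT / (omega * d_m) + 1.0               # Formula (2)
    log_term = math.log(4.0 * n / (rho * (n + 1.0) ** 2))
    k = C_LIGHT / (omega * d_m) * log_term                # Formula (3)
    alpha = 2.0 * omega * k / C_LIGHT                     # Formula (4), in 1/m
    return n, k, alpha


# Hypothetical round-trip check: build rho and phi for a slab with a known
# n and alpha, then recover them. The 1.2 mm thickness is the tablet
# thickness given in the text; the other values are made up.
d = 1.2e-3
f = 0.4e12
n_true, alpha_true = 1.5, 500.0                           # alpha in 1/m
phi = (n_true - 1.0) * 2.0 * math.pi * f * d / C_LIGHT
rho = 4.0 * n_true / (n_true + 1.0) ** 2 * math.exp(-alpha_true * d / 2.0)

n, k, alpha = optical_parameters(rho, phi, f, d)
print(n, k, alpha / 100.0)  # alpha converted from 1/m to 1/cm
```

Evaluating the function over every frequency bin of the averaged spectra would give the n(ω) and α(ω) curves; in practice the phase would first need unwrapping, which this sketch leaves out.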

Table 7
Prediction results of heavy metal content in soils. The first four result columns are the BPNN model and the last four are the ELM model, as given by the optimal and average values quoted in the text.
Heavy metal   Pretreatment   BPNN training set   BPNN testing set   ELM training set   ELM testing set
type          method         RMSE     DC         RMSE     DC        RMSE     DC        RMSE     DC
Hg            WT-PCA         0.44     0.95       1.42     0.82      2.18     0.75      3.41     0.62
Hg            WT             0.22     0.97       0.79     0.90      0.48     0.94      2.24     0.90
Hg            DCT-PCA        0.17     0.98       0.53     0.93      1.01     0.88      1.64     0.80
Hg            DCT            0.10     0.99       0.37     0.95      0.41     0.95      1.26     0.94
Cd            WT-PCA         0.01     0.99       0.04     0.97      0.06     0.96      0.11     0.96
Cd            WT             0.01     1.00       0.02     0.99      0.03     0.97      0.06     0.92
Cd            DCT-PCA        0.02     0.99       0.06     0.95      0.06     0.95      0.13     0.98
Cd            DCT            0.02     0.99       0.05     0.96      0.04     0.97      0.12     0.91
Cu            WT-PCA         2.03     0.83       1.31     0.89      4.29     0.97      0.11     0.92
Cu            WT             8.38     0.93       8.39     0.93      2.54     0.96      2.02     0.94
Cu            DCT-PCA        6.03     0.95       3.51     0.97      7.94     0.98      3.15     0.89
Cu            DCT            3.90     0.97       2.62     0.98      4.70     0.99      8.24     0.98
Average result               –        0.96       –        0.94      –        0.94      –        0.90
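The RMSE and DC values in Table 7 are averages over 30 runs with random 7 to 3 splits. A minimal sketch of such an evaluation protocol is given below, assuming that DC denotes the usual coefficient of determination (1 − SS_res/SS_tot). A one-variable least-squares line stands in for the BPNN and ELM regressors purely to keep the example self-contained; the function names and the synthetic data are hypothetical and do not reproduce the experiment.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def dc(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed meaning of DC)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares line y = a*x + b (stand-in model, not BPNN/ELM)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def repeated_split_eval(x, y, runs=30, train_frac=0.7, seed=0):
    """Average test RMSE and DC over `runs` random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_train = round(train_frac * len(x))
    rmse_sum = dc_sum = 0.0
    for _ in range(runs):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        truth = [y[i] for i in test]
        pred = [a * x[i] + b for i in test]
        rmse_sum += rmse(truth, pred)
        dc_sum += dc(truth, pred)
    return rmse_sum / runs, dc_sum / runs


# Synthetic, made-up data: a noisy linear relation between one spectral
# feature and a heavy metal content.
data_rng = random.Random(1)
feature = [0.1 * i for i in range(100)]
content = [2.0 * f + 1.0 + data_rng.gauss(0.0, 0.2) for f in feature]
mean_rmse, mean_dc = repeated_split_eval(feature, content)
print(f"mean test RMSE = {mean_rmse:.3f}, mean test DC = {mean_dc:.3f}")
```

Averaging over repeated splits reduces the dependence of the reported scores on any single random partition, which matters for small spectral data sets.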


Fig. 6. Prediction results of BPNN model, (a) Prediction results of soils containing Hg, (b) Prediction results of soils containing Cd, (c) Prediction results of soils
containing Cu.

The average DC values of 0.96 (BPNN) and 0.94 (ELM) for the training set in Table 7 indicated the good performance of the established prediction models. The average predicted DC values for the testing set were 0.94 and 0.90, respectively. This shows that the predicted values are in good agreement with the observed values. As an example of the BPNN model, the fitting curves are shown in Fig. 6. In addition, except for the RMSE for soils containing Cu, the RMSE values between the predicted results and actual results were very small. The reason for the larger prediction error with soils containing Cu may be that the applied spectral pretreatment methods were not suitable for THz spectral data for soils containing Cu. The DC values predicted by the BPNN model were higher than those predicted by the ELM model. The optimal DC values of the BPNN model were 0.95, 0.99 and 0.98 for the three heavy metals, and their corresponding RMSE values were 0.37, 0.02 and 2.62, respectively. Similar to the previous detection results, the spectral data pretreatment directly affected the performance of the quantitative prediction models. For example, the prediction performance for Hg content with the ELM model was poor when combined with WT-PCA pretreatment: the DC values for the training set and testing set were only 0.75 and 0.62. How to improve the preprocessing performance of THz spectral data is a common problem in both qualitative and quantitative soil detection technology.

4. Conclusion

Qualitative and quantitative methods for detection of Hg, Cd and Cu in vegetable soil with THz spectroscopy were studied in this paper. First, the absorption parameters for THz time-domain spectra of soils increased regularly in the range from 0.05 THz to 0.7 THz. However, according to the results of the regression model, the characteristic frequency points for the three heavy metals in soils were not found, which

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 32071896, 31960487, 61401215) and the Jiangsu Natural Science Foundation (No. BK20181315). The authors are grateful to the editors and anonymous reviewers for their constructive comments.

References

Akhtar, M., Jabbar, A., Mehmood, S., Ahmed, N., Ahmed, R., Baig, M.A., 2018. Magnetic field enhanced detection of heavy metals in soil using laser induced breakdown spectroscopy. Spectroc. Acta Pt. B-Atom. Spectr. 148, 143–151.
Arik, E., Altan, H., Esenturk, O., 2014. Dielectric properties of diesel and gasoline by terahertz spectroscopy. J. Infrared Millim. THz Waves 35 (9), 759–769.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Borrego, J., Morales, J.A., de la Torre, M.L., Grande, J.A., 2002. Geochemical characteristics of heavy metal pollution in surface sediments of the Tinto and Odiel river estuary (southwestern Spain). Environ. Geol. 41, 785–796.
Café, A.I., Bacaoco, M., Tugado, C., Reyes, A.D.L., Faustino, M.A., Lopez, L., Hernandez, V., Mabanag, M., Lipardo, I., Tesoro, G.B., Estacio, E.S., 2020. Terahertz transmission spectroscopy of soil minerals for geoarchaeological evaluation of sediments excavated from Pinagbayanan Batangas Philippines. Infrared Phys. Technol. 111, 103568.
Chandrasekaran, A., Ravisankar, R., 2015. Spatial distribution of physico-chemical properties and function of heavy metals in soils of Yelagiri hills, Tamilnadu by energy dispersive X-ray florescence spectroscopy (EDXRF) with statistical approach. Spectrochim. Acta, Part A 150, 586–601.
Clairotte, M., Grinand, C., Kouakoua, E., Thébault, A., Saby, N.P.A., Bernoux, M., Barthès, B.G., 2016. National calibration of soil organic carbon concentration using diffuse infrared reflectance spectroscopy. Geoderma 276, 41–52.
Dorney, T.D., Baraniuk, R.G., Mittleman, D.M., 2001. Material parameter estimation with terahertz time-domain spectroscopy. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 18, 1562–1571.
Duvillaret, L., Garet, F., Coutaz, J.L., 2002. A reliable method for extraction of material parameters in terahertz time-domain spectroscopy. IEEE J. Sel. Top. Quantum Electron. 2, 739–746.
Dworak, V., Augustin, S., Gebbers, R., 2011. Application of terahertz radiation to soil
may be due to the complex composition of the soil or the insufficient measurements: initial reults. Sensors 11 (10), 9973–9988.
precision of the instrument. Then, two detection models and two pre­ Ebenezer, G., Emmanuel, K.A.A., Kwaku, A.A., 2019. Potential heavy metal pollution of
soil and water resources from artisanal mining in Kokoteasua. Ghana. Groundw.
dictive models were used for qualitative and quantitative detection of Sustain. Dev. 8, 450–456.
heavy metals in soils. The results for the training set and the testing set Ferguson, B., Zhang, X.-C., 2002. Materials for terahertz science and technology. Nat.
verified the validity of the applied models. The obtained results indi­ Mater. 1 (1), 26–33.
Flanders, B.N., Cheville, R.A., Grischkowsky, D., Scherer, N.F., 1996. Pulsed terahertz
cated that, although the spectral pretreatment method had a great in­ transmission spectroscopy of liquid chcl3, ccl4, and their mixtures. J. Phys. Chem.
fluence on the detection results, the THz spectra of soils combined with 100 (29), 11824–11835.
the machine learning models effectively detect and predict heavy metals Lewis, R.A., 2017. Invited review terahertz transmission, scattering, reflection, and
absorption —the interaction of THz radiation with soils. J. Infrared Milli Terahz
in soils. However, the detection mechanism for the soil heavy metal
Waves 38 (7), 799–807.
pollutant with THz spectroscopy has not been studied in this paper. We Li, B., Wang, M.H., Wei, C., Zhang, Z.W., 2011. Research on heavy metal ions detection
will focus on this subject in future studies. Although the application of in soil with terahertz time-domain spectroscopy. Proc. SPIE - The Int. Soc. Optical
Eng. 8195–81951V.
THz spectroscopy in soil detection needs further research, the results of
Li, B., Zhao, X., Zhang, Y., Zhang, S., Luo, B., 2020. Prediction and monitoring of leaf
this study indicate that THz technology can provide new ideas and op­ water content in soybean plants using terahertz time-domain spectroscopy. Comput.
portunities for soil detection. Electron. Agric. 170.
Li, F., Xu, L., You, T., Lu, A., 2021a. Measurement of potentially toxic elements in the soil
through NIR, MIR, and XRF spectral data fusion. Comput. Electron. Agric. 187.
Declaration of Competing Interest Li, Y., Yao, J., Nie, P., Feng, X., Liu, J., 2021b. An effective method for the rapid
detection of microplastics in soil. Chemosphere 276.
Li, B., Li, C., Dong, C., Li, P., Ma, J., Ye, D., 2021c. Mechanism of lead pollution detection
The authors declare that they have no known competing financial in soil using terahertz spectrum. Int. J. Environ. Sci. Technol. https://doi.org/
interests or personal relationships that could have appeared to influence 10.1007/s13762-021-03588-5.
the work reported in this paper.
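
As a minimal sketch of how the DC and RMSE figures quoted in the results discussion can be computed, the following Python snippet evaluates both metrics for a pair of observed and predicted content vectors. It assumes that DC denotes the coefficient of determination in its usual form, R^2 = 1 - SS_res / SS_tot; this definition, the function names and the sample values are illustrative assumptions and are not taken from this study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def determination_coefficient(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    This is the assumed meaning of "DC"; the paper may define it differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical heavy metal contents, not measurements from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("RMSE:", round(rmse(observed, predicted), 4))
print("DC:  ", round(determination_coefficient(observed, predicted), 4))
```

A DC close to 1 together with a small RMSE indicates that the predictions track the observed contents closely. RMSE carries the units of the measured content, so RMSE values for different metals are directly comparable only when their concentration ranges are similar.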

W. Lu et al. Computers and Electronics in Agriculture 197 (2022) 106923

Liu, L., Huan, H., Zhang, L., Zhao, B., Shao, X., 2020. Determination of heavy metal soil Roggenbuck, A., Schmitz, H., Deninger, A., Mayorga, I.C.C., Grüninger, M., 2010.
contaminants based on photoacoustic spectroscopy. Int. J. Thermophys. 41, 1–10. Coherent broadband continuous-wave terahertz spectroscopy on solid- state
Luce, M.S., Ziadi, N., Gagnon, B., Karam, A., 2017. Visible near infrared reflflectance samples. New J. Phys. 12.
spectroscopy prediction of soil heavy metal concentrations in paper mill biosolid- Shen, Q., Xia, K., Zhang, S., Kong, C., Hu, Q., Yang, S., 2019. Hyperspectral indirect
and liming by-product- amended agricultural soils. Geoderma. 288, 23–36. inversion of heavy-metal copper in reclaimed soil of iron ore area. Spectrochim.
Luo, H., Zhu, J.P., Xu, W.N., Cui, M.J., 2019. Identification of soybean varieties by Acta. A. Mol. Biomol. Spectrosc. 222.
terahertz spectroscopy and integrated learning method. Optik 184, 177–184. Specht, D.F., 1990. Probabilistic neural networks. Neural Netw. 3, 109–118.
Mamat, A., Zhang, Z., Mamat, Z., Zhang, F., Yinguang, C., 2020. Pollution assessment Suzuki, T., Katagiri, T., Matsuura, Y., 2018. Time-domain terahertz gas spectroscopy
and health risk evaluation of eight (metalloid) heavy metals in farmland soil of 146 using hollow-optical-fiber gas cell. Opt. Eng. 57.
cities in China. Environ. Geochem. Health. 42 (11), 3949–3963. Tóth, G., Hermann, T., Da Silva, M.R., Montanarella, L., 2016. Heavy metals in
Mittleman, D.M., Jacobsen, R.H., Neelamani, R., Baraniuk, R.G., Nuss, M.C., 1998. Gas agricultural soils of the European Union with implications for food safety. Environ.
sensing using terahertz time-domain spectroscopy. Appl. Phys. B. 67 (3), 379–390. Int. 88, 299–309.
Pfleger, M., Roitner, H., Pühringer, H., Wiesauer, K., Grün, H., Katletz, S., 2014. Xie, N., Kang, C., Ren, D., Zhang, L., 2022. Assessment of the variation of heavy metal
Advanced birefringence measurements in standard terahertz time-domain pollutants in soil and crop plants through field and laboratory tests. Sci. Total
spectroscopy. Appl. Opt. 53, 3183–3190. Environ. 811.
Pyo, J.C., Hong, S.M., Kwon, Y.S., Kim, M.S., Cho, K.H., 2020. Estimation of heavy metals Yu, K., Geel, M.V., Ceulemans, T., Geerts, W., Ramos, M.M., Serafim, C., Sousa, N.,
using deep neural network with visible and infrared spectroscopy of soil. Sci. Total Castro, P.M.L., Kastendeuch, P., Najjar, G., Ameglio, T., Ngao, J., Sauderau, M.,
Environ. 741. Honnay, O., Somers, B., 2018. Vegetation reflectance spectroscopy for biomonitoring
Rajendran, S., Priya, T.A.K., Khoo, K.S., Hoang, T.K.A., Ng, H.S., Munawaroh, H.S.H., of heavy metal pollution in urban soils. Environ. Pollut. 243, 1912–1922.
Karaman, C., Orooji, Y., Show, P.L., 2022. A critical review on various remediation Zhang, C., Appel, E., Qiao, Q., 2013. Heavy metal pollution in farmland irrigated with
approaches for heavy metal contaminants removal from contaminated soils. river water near a steel plant-magnetic and geochemical signature. Geophys. J. Int.
Chemosphere 287. 192, 963–974.
Zhang, L., Zhang, D., 2015. Domain adaptation extreme learning machines for drift
compensation in E-nose systems. IEEE Trans. Instrum. Meas. 64, 1790–1801.

You might also like