
LWT - Food Science and Technology 158 (2022) 113173

Contents lists available at ScienceDirect

LWT
journal homepage: www.elsevier.com/locate/lwt

Vision transformer for quality identification of sesame oil with stereoscopic fluorescence spectrum image

Zhilei Zhao, Xijun Wu *, Hailong Liu
Measurement Technology & Instrumentation Key Laboratory of Hebei Province, Institute of Electrical Engineering, Yanshan University, Qinhuangdao, 066004, China

ARTICLE INFO

Keywords:
Sesame oil
Identification
Fluorescence spectroscopy
Vision transformer
Few sample learning

ABSTRACT

Sesame oil (SO), as a high-priced edible oil, is often counterfeited and adulterated. A new method for SO quality identification using Vision Transformer (ViT) network based on stereoscopic images of Excitation-emission matrix fluorescence (EEMF) and Total synchronous fluorescence (TSyF) spectroscopy was proposed. The basic samples including pure, counterfeit and adulterated SOs were characterized by fluorescence spectroscopy. A data augmentation strategy including linear interpolation, shift and noise injection was selected for few sample learning. All fluorescence spectral data were visualized as stereoscopic images with rich spectral characteristics. The ViT network architecture based on attention mechanism was designed and trained to establish four SO quality identification models. The macro averages of precision, recall and F1-score on the validation set were greater than 0.99. The values of these indicators on the test samples were equal to one. In conclusion, deep learning based on ViT using stereoscopic fluorescence spectrum image provided a new method for sesame oil identification.

* Corresponding author.
E-mail addresses: zzl18833827996@163.com (Z. Zhao), wuxijun@ysu.edu.cn (X. Wu), liuhl@ysu.edu.cn (H. Liu).

https://doi.org/10.1016/j.lwt.2022.113173
Received 23 June 2021; Received in revised form 27 January 2022; Accepted 28 January 2022
Available online 2 February 2022
0023-6438/© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Worldwide, reports on the quality of edible oil are the most numerous among food-related reports, accounting for 24% of the total (Moore, Spink, & Lipp, 2012). Sesame oil (SO), an edible oil with high economic value and rich nutrition, is usually used for seasoning because of its unique flavor. Unfortunately, it often becomes the target of economically motivated adulteration and counterfeiting (Yuan et al., 2020). Regardless of relevant regulations, some illegal manufacturers add cheap vegetable oils to SO, or even directly add the chemically synthesized food additive sesame oil essence (SOE) to cheap vegetable oils. The former dilutes the content of nutrients such as vitamin E (also known as tocopherol), sesamin, sesamolin and sesamol in SO. The latter leads to excessive absorption of SOE by the human body, which may cause headaches and nausea and may even affect liver and kidney function (Ni, Zhang, & Kokot, 2005). Therefore, the quality of SO should be guaranteed, both for the normal operation of the market and for the health of consumers. However, identifying vegetable oil is not simple because the composition of inferior oil is similar to that of authentic oil (Filoda et al., 2019).

Among the many vegetable oil quality control methods developed by researchers, spectroscopy is known for its fast detection speed, non-destructiveness and simple operation. Raman, infrared and fluorescence spectroscopy, as common spectral analysis methods, are widely used in green chemistry research. Some reports have shown that Raman spectroscopy has good applications in the qualitative and quantitative analysis of vegetable oil (Pan et al., 2014; Qiu, Hou, Huyen, Yang, & Chen, 2019). However, the signal intensity of traditional Raman spectroscopy is much lower than that of the fluorescence signal. Infrared spectroscopy is also widely used in research on vegetable oil quality (Chu, Wang, Li, Zhao, & Jiang, 2018; da Costa, Fernandes, Gomes, de Almeida, & Veras, 2016; Elzey, Pollard, & Fakayode, 2016). The shortcomings of infrared spectroscopy are difficult spectral analysis and poor detection sensitivity. Compared with these two methods, the main advantages of fluorescence spectroscopy are high sensitivity and good selectivity. The presence of fluorophores in vegetable oil makes it theoretically feasible to characterize vegetable oil by fluorescence spectroscopy, and some studies have confirmed this feasibility (Sikorska, Górecki, Khmelinskii, Sikorski, & Kozioł, 2005; Wang et al., 2019; Xu, Liu, & Wang, 2016). Compared with conventional low-dimensional spectroscopy, high-dimensional fluorescence spectroscopy has more obvious advantages (Liu, Yao, Xia, Gao, & Gong, 2021). Excitation-emission matrix fluorescence (EEMF) and Total synchronous fluorescence (TSyF)

spectroscopy can simultaneously display all fluorophores in a three-dimensional map with good selectivity, which is beyond the reach of conventional fluorescence analysis methods. However, it should be pointed out that with both methods the attribution of spectral peaks is inevitably unclear, which increases the difficulty of spectral analysis.

Different from other chemometrics methods, deep neural networks can learn key patterns from spectra hierarchically and automatically in complex systems, which reduces human factors in feature selection (Dong, Zhang, Zuo, & Wang, 2021; Ju, Lyu, Hao, Shen, & Cui, 2019; Yang et al., 2019). In the field of food safety, fluorescence spectroscopy combined with deep learning has been applied to tea classification and citrus maturity prediction (Itakura, Saito, Suzuki, Kondo, & Hosoi, 2019; Lin et al., 2019). In addition, the combination of these two technologies has been tried successfully in other fields, such as smart medicine and industrial production (Gan et al., 2019; Hu et al., 2019; Ju et al., 2019). Convolutional neural network (CNN) and recurrent neural network (RNN) are the two mainstream networks in deep learning. However, a CNN filter typically has a local receptive field, and an RNN processes one element of the input sequence at a time (Bazi, Bashmal, Rahhal, Dayil, & Ajlan, 2021). The emerging Transformer based on attention mechanism avoids both problems (Han et al., 2020). Transformer was first applied in the natural language processing (NLP) domain, and Vision Transformer (ViT) is the first work to show how Transformer can completely replace standard convolution in the field of computer vision (CV) (Khan et al., 2021). As far as we know, there is no report about the application of ViT in fluorescence spectrum modeling. In practice, the number of samples is still the biggest handicap for applying a ViT network to the deep modeling of vegetable oil fluorescence spectra.

The main objective of this paper is to demonstrate the benefit of applying ViT to fluorescence data. In order to address the problems of difficult detection, the limitations of traditional deep networks in spectral analysis, and the small number of samples, the fluorescence stereoscopic image and ViT were studied to complete few sample learning for SO quality identification. Different SO samples were scanned by EEMF and TSyF spectroscopy. Three data augmentation generators including linear interpolation, shift and noise injection were designed to expand the number of basic fluorescence data. The augmented data were visualized as stereoscopic images with richer fluorescence information. Four SO quality identification models were established by training well-designed ViT networks, and these models were evaluated with several indicators. Visualization, repeated training and comparison experiments were performed to highlight the interpretability, stability and accuracy of the proposed method, respectively.

2. Materials and methods

2.1. Sample preparation

Table 1 lists the 15 raw samples prepared in the laboratory. These samples include pure corn oil (CO), rapeseed oil (RSO), soybean oil (SBO), SO, SOE and sunflower seed oil (SSO), and oils of the same category were chosen to be as different as possible. The raw materials and mixture ratios of 27 counterfeit sesame oil (CSO) and six adulterated sesame oil (ASO) samples are shown in Table 2. Counterfeit sesame oils with CO, RSO and SBO are named CSOCO, CSORSO and CSOSBO, respectively. For the counterfeit sesame oil identification experiment, the basic samples consist of all counterfeit sesame oil and SO samples. For the adulterated sesame oil identification experiment, the basic samples consist of all adulterated sesame oil samples and the SO1 ~ SO6 samples.

Table 1
The sample code, brand and geographical origin of the raw sample.

| Order | Classification | Sample code | Brand | Geographical origin |
|---|---|---|---|---|
| 1 | corn oil | CO1 | Wannian | Suihua |
| 2 | rapeseed oil | RSO1 | JinDing | Shanghai |
| 3 | soybean oil | SBO1 | Fulinmen | Tianjin (Transgenosis, Brazil) |
| 4 | sesame oil | SO1 | Fulaiwei | Shanghai |
| 5 | | SO2 | Jinlongyu | Tianjin |
| 6 | | SO3 | Luhua | Laiyang |
| 7 | | SO4 | Damingyongzhen | Tianjin (a) |
| 8 | | SO5 | Fulinmen | Tianjin (b) |
| 9 | | SO6 | Damingyongzhen | Tianjin (a) |
| 10 | | SO7 | Fulinmen | Tianjin (b) |
| 11 | | SO8 | Lihong | Qinhuangdao |
| 12 | | SO9 | Huarui | Heze |
| 13 | sesame oil essence | SOE1 | Shangwei | Qingdao |
| 14 | sunflower seed oil | SSO1 | Fulinmen | Tianjin |
| 15 | | SSO2 | Jinlongyu | Qinhuangdao |

(a) Samples of the same brand from different production dates.
(b) The same sample was scanned at different dates.

Table 2
The sample code, raw materials and mixture ratio of the blend sample.

| Order | Classification | Sample code | Raw materials and mixture ratio |
|---|---|---|---|
| 1 | counterfeit sesame oil based on corn oil | CSOCO1 | CO1: SOE1 = 9:1 |
| 2 | | CSOCO2 | CO1: SOE1 = 8:2 |
| 3 | | CSOCO3 | CO1: SOE1 = 7:3 |
| 4 | | CSOCO4 | CO1: SOE1 = 6:4 |
| 5 | | CSOCO5 | CO1: SOE1 = 5:5 |
| 6 | | CSOCO6 | CO1: SOE1 = 4:6 |
| 7 | | CSOCO7 | CO1: SOE1 = 3:7 |
| 8 | | CSOCO8 | CO1: SOE1 = 2:8 |
| 9 | | CSOCO9 | CO1: SOE1 = 1:9 |
| 10 | counterfeit sesame oil based on rapeseed oil | CSORSO1 | RSO1: SOE1 = 9:1 |
| 11 | | CSORSO2 | RSO1: SOE1 = 8:2 |
| 12 | | CSORSO3 | RSO1: SOE1 = 7:3 |
| 13 | | CSORSO4 | RSO1: SOE1 = 6:4 |
| 14 | | CSORSO5 | RSO1: SOE1 = 5:5 |
| 15 | | CSORSO6 | RSO1: SOE1 = 4:6 |
| 16 | | CSORSO7 | RSO1: SOE1 = 3:7 |
| 17 | | CSORSO8 | RSO1: SOE1 = 2:8 |
| 18 | | CSORSO9 | RSO1: SOE1 = 1:9 |
| 19 | counterfeit sesame oil based on soybean oil | CSOSBO1 | SBO1: SOE1 = 9:1 |
| 20 | | CSOSBO2 | SBO1: SOE1 = 8:2 |
| 21 | | CSOSBO3 | SBO1: SOE1 = 7:3 |
| 22 | | CSOSBO4 | SBO1: SOE1 = 6:4 |
| 23 | | CSOSBO5 | SBO1: SOE1 = 5:5 |
| 24 | | CSOSBO6 | SBO1: SOE1 = 4:6 |
| 25 | | CSOSBO7 | SBO1: SOE1 = 3:7 |
| 26 | | CSOSBO8 | SBO1: SOE1 = 2:8 |
| 27 | | CSOSBO9 | SBO1: SOE1 = 1:9 |
| 28 | adulterated sesame oil | ASO1 | SO2: SSO1 = 1:2 |
| 29 | | ASO2 | SO2: SSO1 = 1:1 |
| 30 | | ASO3 | SO2: SSO1 = 2:1 |
| 31 | | ASO4 | SO1: SSO2 = 1:2 |
| 32 | | ASO5 | SO1: SSO2 = 1:1 |
| 33 | | ASO6 | SO1: SSO2 = 2:1 |

2.2. Apparatus and fluorescence spectral acquisition

An FS920 steady-state fluorescence spectrometer was used to measure the fluorescence spectra of the oil samples. This spectrometer has a wide wavelength response range (200–900 nm), good signal-to-noise ratio (6000:1), ultra-high spectral resolution (0.1 nm) and excellent stray light suppression ability. The undiluted samples were scanned by the spectrometer in order to identify the quality of SO quickly. The slit width of the FS920 spectrometer was 1.11 mm and the dwell time was 0.1 s. The spectrometer is controlled through the F900 software. This


software and the FS920 spectrometer are provided by the Edinburgh Instruments Company in the UK.

The EEMF spectrum consists of emission spectra corresponding to different excitation wavelengths. The excitation wavelength range was 250–550 nm (10 nm step size), and the emission wavelength range was 260–750 nm (2 nm step size). In addition, the emission wavelength lagged the excitation wavelength by 10 nm to eliminate the influence of Rayleigh scattering. The TSyF spectra expressed fluorescence intensity as a function of excitation wavelength and wavelength interval. When the TSyF spectra were scanned, the excitation wavelength range was 250–750 nm (5 nm step size), and the wavelength interval range was 10–120 nm (10 nm step size). One of the main advantages of TSyF spectroscopy is that Rayleigh scattering can be effectively avoided, while the advantage of EEMF over TSyF is that it is more intuitive.

The analysis after obtaining the spectral data was done in the Spyder integrated development environment. The fluorescence data was visualized with the Matplotlib library, and data augmentation was done with Pandas and NumPy. The Keras application programming interface was used to build the deep neural networks. More information about these tools can be found on their official websites listed in Table S1 (Supplementary Material).

2.3. Data augmentation and data visualization

Data augmentation has been widely used in many machine learning fields, such as video processing, biometrics and text analysis, but it has very few applications in chemometrics and food science (Georgouli, Osorio, Del Rincon, & Koidis, 2018). It can be seen as injecting prior knowledge about data invariant attributes into training samples to generate additional training data. Data augmentation can address the problem of insufficient data in few sample learning. Combining the common data augmentation methods in deep learning and the characteristics of vegetable oil spectra, three data augmentation generators including linear interpolation, shift and noise injection were proposed. These data augmentation generators can be used alone or in combination. The label of a virtual spectrum obtained by data augmentation is the same as that of the basic data.

Linear interpolation can simulate more ratios of mixed samples, which is inspired by mixup (Zhang, Cisse, Dauphin, & Lopez-Paz, 2017). The virtual spectrum is calculated by the following formula:

$$I_{interpolation} = \lambda I_a + (1 - \lambda) I_b \tag{1}$$

where $I_{interpolation}$ represents the augmented virtual data, $I_a$ and $I_b$ are two basic data with the same class label, and $\lambda$ is the mixing proportion, a random number obeying the uniform distribution in the range of 0–1.

The data augmentation generator of shift simulates the influence of the typical instrument correction error on the spectral data. The process of shift can be expressed by the following formulas:

$$I^{shift}_{m'n'} = I^{original}_{mn} \tag{2}$$

$$m' = m + \mathrm{round}(L(a_m, b_m)) \tag{3}$$

$$n' = n + \mathrm{round}(L(a_n, b_n)) \tag{4}$$

where $I^{original}_{mn}$ represents the original fluorescence intensity at the $m$-th row and $n$-th column of the spectral matrix. $I^{shift}_{m'n'}$ is the fluorescence intensity of the $m'$-th row and $n'$-th column after shift. $L$ is the Laplacian function with position parameter $a$ and scale parameter $b$.

Noise injection is the third data augmentation generator, which simulates the jitter caused by the random noise of the spectrometer. The formula for multiplicative Gaussian noise is expressed as follows:

$$I^{noise}_{mn} = (1 + \mathrm{norm}(\mu, \sigma)) I^{original}_{mn} \tag{5}$$

where $I^{original}_{mn}$ and $I^{noise}_{mn}$ denote the original data and data with noise, respectively. norm is a normal distribution function with expectation $\mu$ and standard deviation $\sigma$.

Both training data after data augmentation and test data need to be visualized as images because CV is based on image pixels. In previous reports, the three-dimensional fluorescence spectrum was described as a stereogram or a contour map. In this study, the two descriptions were unified into one image through perspective, which provided richer fluorescence signals for CV tasks. Specifically, in order to balance the computational burden and the representativeness of the spectral image, the fluorescence data was visualized as a 144 × 144 × 3 pixel stereoscopic fluorescence spectrum image including 40 contour lines and a 70% perspective stereogram.

2.4. Vision Transformer

The Transformer proposed by Ashish Vaswani (Vaswani et al., 2017) has great potential in artificial intelligence applications. Different from RNN using recurrent units, Transformer with the encoder-decoder architecture uses attention mechanism to capture the long-range relationships of sequential data in parallel. Recently, the Transformer has been carried over from the NLP to the CV domain by some researchers. For image data that is higher dimensional, noisier and more redundant than sequential data, the Transformer in the ViT proposed by Alexey Dosovitskiy (Dosovitskiy et al., 2020) is directly applied to image classification for the very first time. The attention mechanism applied by ViT can focus on different regions of the image and integrate the information of the whole image, which is superior to a CNN filter with a local receptive field.

The overview of ViT for stereoscopic fluorescence spectrum image is exhibited in Fig. 1 (a). It follows the original architecture of Transformer as much as possible. In order to process the two-dimensional image, image $x \in \mathbb{R}^{H \times W \times C}$ is divided into a series of small patches $x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$. $(H, W)$ represents the pixel size of the original image, $C$ is the number of image channels, and $(P, P)$ means the pixel size of each small patch. $N = HW/P^2$ is the number of small image patches generated by dividing and the input sequence length for ViT.

The sequence of flattened patches is linearly projected to a vector of the model dimension $D$ by a learnable embedding matrix $E$. A learnable embedding $x_{class}$ is prepended to the patch embeddings, which is required for the classification task. In order to retain position information, the position embedding $E_{pos}$ is added to the patch embeddings. These can be expressed by the following formula:

$$z_0 = [x_{class};\; x_p^1 E;\; x_p^2 E;\; \cdots;\; x_p^N E] + E_{pos}, \quad E \in \mathbb{R}^{(P^2 \cdot C) \times D},\; E_{pos} \in \mathbb{R}^{(N+1) \times D} \tag{6}$$

The sequence of embedded patches $z_0$ is the input of the Transformer encoder. As displayed in Fig. 1 (b), the encoder consists of $L$ alternating layers. A multiheaded self-attention (MSA) block and a multilayer perceptron (MLP) block constitute the main structure of each layer. The MLP block includes two fully connected layers with GELU non-linearity activation functions. Each block of the encoder uses a residual connection and a layer normalization (LN). The calculation of the encoder is as follows:

$$z'_\ell = \mathrm{MSA}(\mathrm{LN}(z_{\ell-1})) + z_{\ell-1}, \quad \ell = 1 \ldots L \tag{7}$$

$$z_\ell = \mathrm{MLP}(\mathrm{LN}(z'_\ell)) + z'_\ell, \quad \ell = 1 \ldots L \tag{8}$$

Different from the literature (Dosovitskiy et al., 2020), all the outputs $z_\ell$ of Transformer encoder were used as the expression of the image for subsequent classification. The $y$ represents the prediction category label:

$$y = \mathrm{LN}(z_\ell) \tag{9}$$
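The patch splitting and embedding of Eq. (6) can be sketched in NumPy as follows. This is only an illustrative sketch under stated assumptions, not the authors' Keras implementation: the function names `image_to_patches` and `embed_patches`, the random test image and the random initial values of the learnable parameters are hypothetical. The sizes (a 144 × 144 × 3 image, patch size P = 12, model dimension D = 32) are the ones reported in this paper.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into N = H*W / P^2 flattened patches of length P*P*C."""
    H, W, C = image.shape
    P = patch_size
    if H % P or W % P:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C): move the two patch-grid axes to the front
    grid = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, P * P * C)


def embed_patches(patches, E, x_class, E_pos):
    """Eq. (6): project patches with E, prepend x_class, add the position embedding."""
    tokens = patches @ E                          # (N, D)
    z0 = np.vstack([x_class[None, :], tokens])    # (N + 1, D)
    return z0 + E_pos


# Hypothetical stand-ins: a random image and randomly initialised parameters.
rng = np.random.default_rng(0)
H, W, C, P, D = 144, 144, 3, 12, 32
image = rng.random((H, W, C))

patches = image_to_patches(image, P)
N = patches.shape[0]

E = rng.normal(scale=0.02, size=(P * P * C, D))   # embedding matrix
x_class = rng.normal(scale=0.02, size=D)          # class embedding
E_pos = rng.normal(scale=0.02, size=(N + 1, D))   # position embedding

z0 = embed_patches(patches, E, x_class, E_pos)
print(patches.shape, z0.shape)
```

With these sizes, N = 144 · 144 / 12² = 144 patches, each flattened to P² · C = 432 values, so the patch matrix has shape (144, 432) and z0 has shape (145, 32). In a trained network E, x_class and E_pos would be learned rather than drawn at random.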
Fig. 1. The overview of Vision Transformer for stereoscopic fluorescence spectrum image (a); The Transformer encoder (b); Multiheaded Self-Attention (c); Scaled Dot-Product Attention (d). MLP: multilayer perceptron; L: linear; Q: query; K: key; V: value.

The MSA block that discovers the relative importance between patch embeddings in the sequence is the core of Transformer. It is an extension of self-attention (SA) because it runs $h$ SA operations in parallel and concatenates their outputs. Its structure is shown in Fig. 1 (c), and its formula for the input sequence $z \in \mathbb{R}^{N \times D}$ is as follows:

$$\mathrm{MSA}(z) = \mathrm{Concat}(\mathrm{SA}_1(z);\; \mathrm{SA}_2(z);\; \ldots;\; \mathrm{SA}_h(z))\, W, \quad W \in \mathbb{R}^{h \cdot D_k \times D} \tag{10}$$

where $D_k$ is usually equal to $D/h$.

The particular attention in SA is called "Scaled Dot-Product Attention" as shown in Fig. 1 (d). For the input sequence $z$, the weighted sum of all values $V$ is calculated. The attention weight $A$ is obtained from the query $Q$, the key $K$ and a softmax function. The three matrices $Q$, $K$ and $V$ are generated by multiplying the input sequence $z$ by the learned $U_{QKV}$. These operations are calculated as follows:

$$\mathrm{SA}(z) = AV \tag{11}$$

$$A = \mathrm{softmax}\left(QK^{T} / \sqrt{D_k}\right), \quad A \in \mathbb{R}^{N \times N} \tag{12}$$

$$[Q, K, V] = z U_{QKV}, \quad U_{QKV} \in \mathbb{R}^{D \times 3D_k} \tag{13}$$

2.5. Evaluation of performance

The loss value, category prediction probability, Accuracy, Precision, Recall, and F1-score were used to evaluate the performance of all the deep models. The value of the Categorical_crossentropy loss function is a direct indicator of the inconsistency between the network output and the true label, and it is defined as:

$$L(y, y_{real}) = -\frac{1}{M} \sum_{i=1}^{M} y_{real,i} \times \log(\mathrm{softmax}(y_i)) \tag{14}$$

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{M} e^{y_j}} \tag{15}$$

where $y$ and $y_{real}$ are the network output and the true label supplied to the deep neural network, respectively. $M$ is the number of categories. The output of softmax can be regarded as the category prediction probability, and the prediction label is the category with the highest probability.

Before defining other indicators, the true positive (TP), the false negative (FN), the false positive (FP) and the true negative (TN) are exhibited taking two categories as an example in Table S2 (Supplementary Material). The overall performance of the identification model is evaluated by Accuracy:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{16}$$

The Precision indicates how many of the predicted positive samples are really positive and it is calculated by the following formula:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{17}$$

The Recall represents how many positive samples in the dataset are correctly predicted:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{18}$$

The F1-score combines Precision and Recall, and it is calculated as follows:

$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{19}$$

The macro average is the arithmetic average of Precision, Recall, and F1-score over all categories.

Partial least squares discriminant analysis (PLS-DA) was used to compare the performance of the proposed method with that of common chemometrics. PLS-DA is based on partial least squares regression (PLSR), and it combines the properties of PLSR with the discriminant ability of classification techniques. Unlike in PLSR, the class vector in PLS-DA is transformed into a binary Y matrix with n rows (number of samples) and m columns (the class information).

3. Results and discussion

3.1. Analysis of fluorescence spectra

The EEMF and TSyF spectra of all oil samples prepared in the


laboratory were scanned. The nutrient composition is not the same for different vegetable oils (Xinyan Wu, Zhao, et al., 2021). Therefore, before analyzing the fluorescence spectra of counterfeit and adulterated sesame oils, the fluorescence characteristics of pure vegetable oils are shown in Fig. 2 (a), Fig. 2 (b) and Fig. S1 (Supplementary Material). The standard fluorescence peak of vitamin E, which is abundant in SO, appears in the emission band centered at 325 nm (Zandomeneghi, Carbonaro, & Caffarata, 2005). However, the characteristic peak of vitamin E appears in the emission band centered at 540 nm as shown in Fig. 2 (a), which is caused by the inner filter effect. The positions of the maximum fluorescence intensity in Fig. 2 (a) and Fig. 2 (b) are consistent, which indicates that both EEMF and TSyF spectroscopy can characterize oil samples well. The emission spectrum between 400 and 500 nm is derived from the oxidation products of fatty acids (Kongbonga et al., 2011; Milanez et al., 2017), which can be seen for CO (Fig. S1 (a)), SBO (Fig. S1 (c)) and SSO (Fig. S1 (d)). The spectrum of RSO shown in Fig. S1 (b) is obviously different from that of other vegetable oils because it is rich in chlorophyll, which corresponds to the 650–730 nm emission band (Kyriakidis & Skarkalis, 2000).

Fig. 2. The stereoscopic Excitation-emission matrix fluorescence (a) and Total synchronous fluorescence (b) images of SO1 sample; The stereoscopic Excitation-emission matrix fluorescence images of the CSOCO5 sample (c), the CSORSO5 sample (d), the CSOSBO5 sample (e) and the ASO2 sample (f).

The different counterfeit sesame oil samples are shown in Fig. 2. Comparing Fig. 2 (a), (c), (d) and (e), it is found that the EEMF spectra of pure SO and counterfeit sesame oil are quite different. As can be seen from the characteristic peak at 470 nm excitation and 540 nm emission, the vitamin E content of counterfeit sesame oil is lower than that of pure SO. This phenomenon supports the judgment of whether an unknown SO sample is counterfeit. Subgraphs (c), (d) and (e) illustrate that the counterfeit sesame oil samples are more or less different from each other because the raw materials are reflected in these blend samples. The difference between them makes it possible to trace the origin of counterfeit sesame oil, but slight differences such as that between Fig. 2 (c) and Fig. 2 (e) increase the difficulty of traceability.

It is difficult to identify adulterated sesame oil because SO is the base oil of adulterated samples. Fig. 2 (f) displays the EEMF spectrum of an adulterated sesame oil sample with the 50% adulteration level. The spectral shape of this EEMF is wider than that of pure SO because it is enhanced by the fluorescence of SSO. As the level of adulteration increases, the EEMF spectrum of adulterated sesame oil will show more fluorescence characteristics of adulterated sesame oil. On the contrary, the EEMF spectrum of adulterated sesame oil with a low adulteration level is more similar to that of pure SO, which makes identification challenging.

Stereoscopic fluorescence spectra can be viewed from multiple directions, which is different from low-dimensional spectral visualization forms. As can be seen in Fig. S2 (Supplementary Material), contour maps appear when stereoscopic fluorescence spectra are viewed from overhead (full perspective for stereogram). Compared with the spectra visualized in Fig. 2, the contour maps in Fig. S2 help to emphasize spectral position information, but the height of the characteristic peak is not intuitive.

Overall, the differences between the various oil samples are well demonstrated in the EEMF and TSyF spectra. However, if identification by fluorescence spectroscopy is to be further improved, intelligent analysis methods need to be introduced.

3.2. Effect of data augmentation

The samples in the identification models based on EEMF and TSyF spectroscopy were all expanded by data augmentation. EEMF spectroscopy was used as an example to illustrate the effect. For the experiment of identifying counterfeit sesame oil, basic samples with odd tail numbers were selected to train the model, and samples with even tail numbers were used to test the model. For the experiment of identifying adulterated sesame oil, the samples used to train models were composed of ASO1 ~ ASO3 and SO1 ~ SO3 samples. The remaining samples were employed for testing.

The three data augmentation generators were applied in turn to the training samples. Based on the spectra shown in Fig. S3 (a) (Supplementary Material) and Fig. S3 (b), the virtual adulterated sesame oil spectrum at the 50% adulteration level obtained by linear interpolation (λ = 0.5) is shown in Fig. S3 (c). The two fluorescence peaks in this virtual spectrum (Fig. S3 (c)) correspond to vitamin E in Fig. S3 (a) and fatty acids in Fig. S3 (b), which shows that the fluorophores in the basic samples are well displayed. The position parameter a and the scale parameter b of shift were 0 and 0.2, respectively. The virtual spectrum shown in Fig. S3 (d) was obtained after the basic spectrum shown in Fig. S3 (b) was


shifted by two shift steps along the excitation and emission wavelengths. After the loss value of the validation set no longer changes, the early stop
The redshift phenomenon of the virtual spectrum relative to the basic technique of training the model for another 5 epochs is adopted to avoid
spectrum proves that the shift data augmentation generator simulates the overfitting.
the correction error well and retains the effective fluorescence infor­
mation. The expectation μ and standard deviation σ of noise injection are 3.4. Identification models
0 and 0.02, respectively. Compared with Fig. S3 (b), the EEMF spectrum
shown in Fig. S3 (e) has more noise disturbances. It is clear that the noise 3.4.1. Identification models for counterfeit sesame oil
injection data augmentation generator simulates the noise of the fluo­ The models that use EEMF and TSyF spectroscopy coupled with the
rometer well and does not destroy the effective spectral characteristics. ViT network to identify adulterated sesame oil samples were called
For model training for adulterated sesame oil identification, there Model 1 and Model 2, respectively. The subgraphs a and b in Fig. 3
were a total of 860 data after data augmentation, including 20 original demonstrate that the accuracy and loss of the two models on the training
data, 40 data of linear interpolation, 400 data of linear interpolation & set and validation set are close to 100% and 0 respectively after iterative
shift, and 400 data of linear interpolation & shift & noise injection. For training. It is clear that there is no overfitting in the modeling, which can
model training for adulterated sesame oil identification, there were a be attributed to the data augmentation. It has been found from Table S3
total of 132 data after data augmentation, including six original data, six (Supplementary Material) that the accuracies of test set samples are
data of linear interpolation, 60 data of linear interpolation & shift, and 100%. All of these bold prediction probabilities have exceeded 0.97,
60 data of linear interpolation & shift & noise injection. The 70% of the which is very beneficial for models to output category labels. Table 4
data after data augmentation constituted the training set, and the displays the values of Precision, Recall and F1-score on the validation set
remaining 30% constituted the validation set. and test set. These indicators demonstrate that these two models both
misjudged individual CSOCO and CSOSBO samples on the validation set
because the spectra of the two types of samples are similar as shown in
3.3. Designed Vision Transformer network

Considering the fluorescence data of the oil samples and the ViT principle, a complete ViT network architecture was designed, as shown in Table 3. The first layer of the ViT network is an input layer of size 144 × 144 × 3. The second layer divides the input image into patches of size 12 × 12 × 3. The third layer encodes each patch into a vector of length 32 and adds the position embedding. The Transformer encoder, the core of the ViT architecture, comprises the fourth to twelfth layers. The MSA uses 4 attention heads. The two fully connected layers in the MLP block of the encoder have dimensions 64 and 32, respectively. The thirteenth layer normalizes the output of the Transformer encoder. The fourteenth layer flattens it. The fifteenth to eighteenth layers form an MLP block whose two fully connected layers have dimensions 512 and 256, respectively. The nineteenth layer is the classification output layer with a softmax activation function. The number of output categories is 4 for counterfeit sesame oil identification and 2 for adulterated sesame oil identification.
The optimizer used for network training is AdamW with a learning rate of 0.0001 and a weight decay of 0.0001. The batch size is set to 32.

Fig. 2. The macro averages of these indicators denote that the pure and counterfeit samples are well identified.
The interpretability of deep learning methods has always been a concern. The output of an intermediate layer of the deep network was visualized by truncated singular value decomposition (Truncated SVD) dimensionality reduction to help interpret the proposed model. As shown in the two subgraphs of Fig. 4, although the SO and CSORSO samples are well clustered, the CSOCO and CSOSBO samples cannot be well distinguished at this layer, which requires deeper network layers to learn more. As can be seen from the indicators in Table 4, the deeper network layers do make an important contribution to the final distinction between CSOCO and CSOSBO samples.
The stability of the deep model should not be ignored in practice, because results differ between training runs under the same hyperparameters. Taking Model 1 as an example, it was trained 10 times to evaluate the stability of the ViT network in oil spectral modeling. As can be seen from Fig. S4 (Supplementary Material), both the loss and accuracy curves become stable after iterative training, although their convergence processes vary from run to run. Therefore, the stability of the deep models in this study can meet practical requirements.
The comparative analysis with common approaches can highlight the advantages and disadvantages of the proposed method. In the comparative experiment, the optimal emission spectrum in the EEMF was manually extracted and fed to PLS-DA. The accuracy on the test set is 93.75%, and its confusion matrix is shown in Fig. S5 (Supplementary Material). The accuracy is 87.50% for TSyF spectroscopy, which has been reported in a previous study (Xijun Wu, Zhao, et al., 2021). The hyperparameters of the two studies were determined by the same method. There are misjudgments between CSOCO and CSOSBO samples in the two comparison experiments, which do not occur in the approach proposed in this work. However, compared with common chemometrics, deep learning has the disadvantage of relying on computing resources. As for the comparison with other deep learning approaches, the authors of ViT have reported that ViT attained excellent results on popular image classification benchmarks compared with SOTA (state-of-the-art) CNN models (Dosovitskiy et al., 2020).

3.4.2. Identification models for adulterated sesame oil

The models of EEMF and TSyF spectroscopy to identify adulterated sesame oil samples were named Model 3 and Model 4, respectively. Fig. 3 (c) and (d) show the training of these two models. All the accuracies are 100% and all the loss values tend to be stable after training, which indicates that the models are convergent. There are two reasons that may explain why the curves of adulterated sesame oil identification

Table 3
Layers and their connections of the Vision Transformer network for fluorescence spectroscopy.
Order  Layer                   Connected to
1      input_1                 –
2      patches                 input_1
3      patch_encoder           patches
4      layer_normalization     patch_encoder
5      multi_head_attention    layer_normalization, layer_normalization
6      add                     multi_head_attention, patch_encoder
7      layer_normalization_1   add
8      dense_1                 layer_normalization_1
9      dropout                 dense_1
10     dense_2                 dropout
11     dropout_1               dense_2
12     add_1                   dropout_1, add
13     layer_normalization_2   add_1
14     flatten                 layer_normalization_2
15     dense_3                 flatten
16     dropout_2               dense_3
17     dense_4                 dropout_2
18     dropout_3               dense_4
19     dense_5                 dropout_3
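
The patch-splitting and patch-encoding steps described in Section 3.3 (layers 2 and 3 of Table 3) can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `extract_patches` and `encode_patches` are hypothetical, the projection and position-embedding weights are random and untrained, and adding the position embedding element-wise is an assumption based on the usual ViT design. Only the sizes (144 × 144 × 3 input, 12 × 12 × 3 patches, encoding length 32) come from the text.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = image.reshape(rows, patch_size, cols, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, patch_size * patch_size * c)

def encode_patches(patches, projection, position_embedding):
    """Project each patch linearly and add a per-position embedding."""
    return patches @ projection + position_embedding

rng = np.random.default_rng(0)
image = rng.random((144, 144, 3))      # stand-in for one spectrum image
patches = extract_patches(image, 12)   # 12 x 12 grid of patches

# Random, untrained weights (assumption: a trained model would learn these).
projection = rng.normal(scale=0.02, size=(12 * 12 * 3, 32))
position_embedding = rng.normal(scale=0.02, size=(patches.shape[0], 32))

tokens = encode_patches(patches, projection, position_embedding)
print(patches.shape, tokens.shape)
```

With these sizes the image yields 144 patches of 432 values each, and the encoder receives a sequence of 144 tokens of length 32.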


Fig. 3. Loss and accuracy curves for Model 1 (a), Model 2 (b), Model 3 (c) and Model 4 (d) after the iterative training of the deep neural networks. The models are convergent, with the loss curves close to 0 and the accuracy curves approaching or equal to 100%. Model 1 and Model 2 identify counterfeit sesame oil, and Model 3 and Model 4 identify adulterated sesame oil. Training loss: red line with squares; Validation loss: red line with circles; Training accuracy: blue line with diamonds; Validation accuracy: blue line with asterisks. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
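
The training behind these curves uses AdamW with a learning rate of 0.0001 and a weight decay of 0.0001 (Section 3.3). The sketch below shows what a single AdamW update could look like under those settings. It is an illustration only: the paper does not state which implementation was used, the values of `beta1`, `beta2` and `eps` are common defaults assumed here, and scaling the decay term by the learning rate follows one widespread convention that other libraries do not share. The toy loss is hypothetical.

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-4, weight_decay=1e-4,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update with decoupled weight decay (decay scaled by lr)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * weight_decay * param   # decay applied to the weights
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (assumption): minimize sum(w ** 2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 101):
    w, m, v = adamw_step(w, 2 * w, m, v, t)
print(w)
```

Because the decay is applied to the weights directly rather than added to the gradient, it is not rescaled by the adaptive denominator, which is the distinction between AdamW and Adam with L2 regularization.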

Table 4
Precision, Recall and F1-score on the validation set and test set.
                            Validation set                  Test set
Model        Class          Precision  Recall  F1-score     Precision  Recall  F1-score
Model 1 [a]  CSOCO          1.0000     0.9846  0.9922       1.0000     1.0000  1.0000
             CSORSO         1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             CSOSBO         0.9848     1.0000  0.9924       1.0000     1.0000  1.0000
             SO             1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             Macro avg [e]  0.9962     0.9962  0.9962       1.0000     1.0000  1.0000
Model 2 [b]  CSOCO          1.0000     0.9692  0.9844       1.0000     1.0000  1.0000
             CSORSO         1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             CSOSBO         0.9701     1.0000  0.9848       1.0000     1.0000  1.0000
             SO             1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             Macro avg [e]  0.9925     0.9923  0.9923       1.0000     1.0000  1.0000
Model 3 [c]  ASO            1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             SO             1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             Macro avg [e]  1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
Model 4 [d]  ASO            1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             SO             1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
             Macro avg [e]  1.0000     1.0000  1.0000       1.0000     1.0000  1.0000
[a] Counterfeit sesame oil (CSO) was identified by Excitation-emission matrix fluorescence (EEMF) spectroscopy.
[b] CSO was identified by Total synchronous fluorescence (TSyF) spectroscopy.
[c] Adulterated sesame oil (ASO) was identified by EEMF spectroscopy.
[d] ASO was identified by TSyF spectroscopy.
[e] Macro average.
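
The macro averages in Table 4 can be checked by hand, since a macro average is the unweighted mean of the per-class scores. The sketch below does this for the validation-set scores of Model 1, with the per-class values copied from the table. The helper name `macro_average` is hypothetical, and because the table entries are rounded to four decimals the recomputed means are only expected to agree with the reported ones to within about 0.0001.

```python
# Per-class validation scores for Model 1, copied from Table 4.
model_1_validation = {
    "CSOCO":  {"precision": 1.0000, "recall": 0.9846, "f1": 0.9922},
    "CSORSO": {"precision": 1.0000, "recall": 1.0000, "f1": 1.0000},
    "CSOSBO": {"precision": 0.9848, "recall": 1.0000, "f1": 0.9924},
    "SO":     {"precision": 1.0000, "recall": 1.0000, "f1": 1.0000},
}

# Macro averages reported in Table 4 for the same model and data set.
reported_macro = {"precision": 0.9962, "recall": 0.9962, "f1": 0.9962}

def macro_average(per_class, metric):
    """Unweighted mean of one metric over all classes."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)

for metric, reported in reported_macro.items():
    computed = macro_average(model_1_validation, metric)
    print(f"{metric}: computed {computed:.5f}, reported {reported:.4f}")
```

Every class counts equally in this average regardless of how many samples it contains, so a single weak class lowers the macro score noticeably.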

Fig. 4. Visualization, by truncated singular value decomposition, of the features learned at the flatten layer of Model 1 (a) and Model 2 (b) on the validation set. The ellipse represents the confidence interval at the 99.7% level. Model 1 is based on excitation-emission matrix fluorescence spectroscopy, and Model 2 on total synchronous fluorescence spectroscopy. CSOCO (circle), CSORSO (square) and CSOSBO (diamond) represent counterfeit sesame oil based on corn oil, rapeseed oil and soybean oil, respectively. SO (pentagon) represents pure sesame oil.
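
The projection behind Fig. 4 can be sketched as follows: flatten-layer features are reduced to two components by truncated SVD and could then be plotted per class. The features here are synthetic stand-ins, not the model's real activations, and the function name `truncated_svd_project` is hypothetical. The feature length 144 × 32 = 4608 is an assumption derived from the patch count and encoding length in Section 3.3. As in ordinary truncated SVD, the data are not centered first.

```python
import numpy as np

def truncated_svd_project(features, n_components=2):
    """Project the rows of `features` onto the top singular directions."""
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    # U * S equals features @ V for the retained components.
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)

# Synthetic stand-in: 4 classes with 50 samples each, scattered around
# class-specific centers in a 4608-dimensional feature space.
n_classes, n_per_class, n_features = 4, 50, 144 * 32
centers = rng.normal(size=(n_classes, n_features))
features = np.vstack([
    center + 0.1 * rng.normal(size=(n_per_class, n_features))
    for center in centers
])
labels = np.repeat(np.arange(n_classes), n_per_class)

embedding = truncated_svd_project(features, n_components=2)
print(embedding.shape)
```

Each row of `embedding` gives the two coordinates of one sample, so a scatter plot colored by `labels` shows how well the classes separate at that layer.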

models have more intense jitter than those of counterfeit sesame oil identification models. One reason is the increased difficulty of adulterated sesame oil identification, and the other is the relatively small number of samples. The accuracies of the test set samples are 100%, which can be easily inferred from the prediction categories and probabilities shown in Table S4 (Supplementary Material). For the adulterated sesame oil samples, the bold prediction probabilities decrease as the proportion of SSO decreases, because the less adulterated a sample is,


the more difficult it is to identify. As can be seen from the values of Precision, Recall and F1-score for the two models in Table 4, the models for adulterated sesame oil identification are feasible.
In summary, the performance of the counterfeit and adulterated sesame oil identification models shows that the designed ViT network is suitable for EEMF and TSyF spectroscopy of vegetable oils.

4. Conclusion

In this report, ViT network models based on stereoscopic fluorescence spectrum images completed the identification of counterfeit and adulterated SOs. Both EEMF and TSyF spectra characterized the SO samples well, which not only showed the advantages of easy operation and non-destructiveness, but also demonstrated the good selectivity of high-dimensional spectra. The three data augmentation generators designed in this study made deep learning possible in oil identification. The abundant fluorescence information in the stereoscopic fluorescence spectrum images provided more visual input for the ViT network. The superior performance of the ViT network in oil identification showed that the Transformer combined with fluorescence spectroscopy is feasible and scalable. These results confirm that this novel method can provide technical support for food supervision departments. The potential of this new vegetable oil quality identification method is considered to be great because it is green, fast and intelligent. It is worth noting, however, that the characterization of vegetable oils was realized only by non-enhanced fluorescence spectroscopy in this paper. As people have higher requirements for food safety, enhanced spectroscopy based on nanotechnology can be tried in future research to realize the detection of trace substances in vegetable oil. Moreover, multivariate calibration regression based on the current research will be modeled, which is significant work for spectral analysis.

CRediT authorship contribution statement

Zhilei Zhao: Conceptualization, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing, Validation, Formal analysis. Xijun Wu: Funding acquisition, Project administration, Data curation, Conceptualization, Methodology, Writing – review & editing. Hailong Liu: Supervision.

Declaration of competing interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC 11674275), the Natural Science Foundation of Hebei Province (F2020203110; F2021203052), and the Science and Technology Research Project of Hebei Higher Education Institutions (QN2018071).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.lwt.2022.113173.

References

Bazi, Y., Bashmal, L., Rahhal, M. M. A., Dayil, R. A., & Ajlan, N. A. (2021). Vision transformers for remote sensing image classification. Remote Sensing, 13(3), 516. https://doi.org/10.3390/rs13030516
Chu, X., Wang, W., Li, C., Zhao, X., & Jiang, H. (2018). Identifying camellia oil adulteration with selected vegetable oils by characteristic near-infrared spectral regions. Journal of Innovative Optical Health Sciences, 11(2), 1850006. https://doi.org/10.1142/S1793545818500062
da Costa, G. B., Fernandes, D. D. S., Gomes, A. A., de Almeida, V. E., & Veras, G. (2016). Using near infrared spectroscopy to classify soybean oil according to expiration date. Food Chemistry, 196, 539–543. https://doi.org/10.1016/j.foodchem.2015.09.076
Dong, J.-E., Zhang, J., Zuo, Z.-T., & Wang, Y.-Z. (2021). Deep learning for species identification of bolete mushrooms with two-dimensional correlation spectral (2DCOS) images. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 249, 119211. https://doi.org/10.1016/j.saa.2020.119211
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Elzey, B., Pollard, D., & Fakayode, S. O. (2016). Determination of adulterated neem and flaxseed oil compositions by FTIR spectroscopy and multivariate regression analysis. Food Control, 68, 303–309. https://doi.org/10.1016/j.foodcont.2016.04.008
Filoda, P. F., Fetter, L. F., Fornasier, F., de Souza Schneider, R.d. C., Helfer, G. A., Tischer, B., et al. (2019). Fast methodology for identification of olive oil adulterated with a mix of different vegetable oils. Food Analytical Methods, 12(1), 293–304. https://doi.org/10.1007/s12161-018-1360-5
Gan, J., Zhou, L., Cui, J., Man, B., Jia, X., Shi, S., et al. (2019). Classification of blood species using fluorescence spectroscopy combined with deep learning method. Journal of Applied Mathematics and Physics, 7(10), 2324–2332. https://doi.org/10.4236/jamp.2019.710158
Georgouli, K., Osorio, M. T., Del Rincon, J. M., & Koidis, A. (2018). Data augmentation in food science: Synthesising spectroscopic data of vegetable oils for performance enhancement. Journal of Chemometrics, 32(6), Article e3004. https://doi.org/10.1002/cem.3004
Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., et al. (2020). A survey on visual transformer. arXiv preprint arXiv:2012.12556.
Hu, F., Zhou, M., Yan, P., Li, D., Lai, W., Bian, K., et al. (2019). Identification of mine water inrush using laser-induced fluorescence spectroscopy combined with one-dimensional convolutional neural network. RSC Advances, 9(14), 7673–7679. https://doi.org/10.1039/C9RA00805E
Itakura, K., Saito, Y., Suzuki, T., Kondo, N., & Hosoi, F. (2019). Estimation of citrus maturity with fluorescence spectroscopy using deep learning. Horticulturae, 5(1), 2. https://doi.org/10.3390/horticulturae5010002
Ju, L., Lyu, A., Hao, H., Shen, W., & Cui, H. (2019). Deep learning-assisted three-dimensional fluorescence difference spectroscopy for rapid identification and semi-quantification of illicit drugs in bio-fluids. Analytical Chemistry, 91(15). https://doi.org/10.1021/acs.analchem.9b01315
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2021). Transformers in vision: A survey. arXiv preprint arXiv:2101.01169.
Kongbonga, Y. G. M., Ghalila, H., Onana, M. B., Majdi, Y., Lakhdar, Z. B., Mezlini, H., et al. (2011). Characterization of vegetable oils by fluorescence spectroscopy. Food and Nutrition Sciences, 2(7), 692–699. https://doi.org/10.4236/fns.2011.27095
Kyriakidis, N. B., & Skarkalis, P. (2000). Fluorescence spectra measurement of olive oil and other vegetable oils. Journal of AOAC International, 83(6), 1435–1439. https://doi.org/10.1093/jaoac/83.6.1435
Lin, H., Li, Z., Lu, H., Sun, S., Chen, F., Wei, K., et al. (2019). Robust classification of tea based on multi-channel LED-induced fluorescence and a convolutional neural network. Sensors, 19(21), 4687. https://doi.org/10.3390/s19214687
Liu, Y., Yao, L., Xia, Z., Gao, Y., & Gong, Z. (2021). Geographical discrimination and adulteration analysis for edible oils using two-dimensional correlation spectroscopy and convolutional neural networks (CNNs). Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 246, 118973. https://doi.org/10.1016/j.saa.2020.118973
Milanez, K. D. T. M., Nóbrega, T. C. A., Nascimento, D. S., Insausti, M., Band, B. S. F., & Pontes, M. J. C. (2017). Multivariate modeling for detecting adulteration of extra virgin olive oil with soybean oil using fluorescence and UV–vis spectroscopies: A preliminary approach. LWT-Food Science and Technology, 85, 9–15. https://doi.org/10.1016/j.lwt.2017.06.060
Moore, J. C., Spink, J., & Lipp, M. (2012). Development and application of a database of food ingredient fraud and economically motivated adulteration from 1980 to 2010. Journal of Food Science, 77(4), R118–R126. https://doi.org/10.1111/j.1750-3841.2012.02657.x
Ni, Y., Zhang, G., & Kokot, S. (2005). Simultaneous spectrophotometric determination of maltol, ethyl maltol, vanillin and ethyl vanillin in foods by multivariate calibration and artificial neural networks. Food Chemistry, 89(3), 465–473. https://doi.org/10.1016/j.foodchem.2004.05.037
Pan, Y., Lai, K., Fan, Y., Li, C., Pei, L., Rasco, B. A., et al. (2014). Determination of tert-butylhydroquinone in vegetable oils using surface-enhanced Raman spectroscopy. Journal of Food Science, 79(6), T1225–T1230. https://doi.org/10.1111/1750-3841.12482
Qiu, J., Hou, H.-Y., Huyen, N. T., Yang, I.-S., & Chen, X.-B. (2019). Raman spectroscopy and 2DCOS analysis of unsaturated fatty acid in edible vegetable oils. Applied Sciences, 9(14), 2807. https://doi.org/10.3390/app9142807
Sikorska, E., Górecki, T., Khmelinskii, I. V., Sikorski, M., & Kozioł, J. (2005). Classification of edible oils using synchronous scanning fluorescence spectroscopy. Food Chemistry, 89(2), 217–225. https://doi.org/10.1016/j.foodchem.2004.02.028
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Wang, T., Wu, H.-L., Long, W.-J., Hu, Y., Cheng, L., Chen, A.-Q., et al. (2019). Rapid identification and quantification of cheaper vegetable oil adulteration in camellia oil by using excitation-emission matrix fluorescence spectroscopy combined with chemometrics. Food Chemistry, 293, 348–357. https://doi.org/10.1016/j.foodchem.2019.04.109
Wu, X., Bian, X., Lin, E., Wang, H., Guo, Y., & Tan, X. (2021). Weighted multiscale support vector regression for fast quantification of vegetable oils in edible blend oil by ultraviolet-visible spectroscopy. Food Chemistry, 342, 128245. https://doi.org/10.1016/j.foodchem.2020.128245
Wu, X., Zhao, Z., Tian, R., Niu, Y., Gao, S., & Liu, H. (2021). Total synchronous fluorescence spectroscopy coupled with deep learning to rapidly identify the authenticity of sesame oil. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 244, 118841. https://doi.org/10.1016/j.saa.2020.118841
Xu, J., Liu, X.-F., & Wang, Y.-T. (2016). A detection method of vegetable oils in edible blended oil based on three-dimensional fluorescence spectroscopy technique. Food Chemistry, 212, 72–77. https://doi.org/10.1016/j.foodchem.2016.05.158
Yang, J., Xu, J., Zhang, X., Wu, C., Lin, T., & Ying, Y. (2019). Deep learning for vibrational spectral analysis: Recent progress and a practical guide. Analytica Chimica Acta, 1081, 6–17. https://doi.org/10.1016/j.aca.2019.06.012
Yuan, Y.-Y., Wang, S.-T., Wang, J.-Z., Cheng, Q., Wu, X.-J., & Kong, D.-M. (2020). Rapid detection of the authenticity and adulteration of sesame oil using excitation-emission matrix fluorescence and chemometric methods. Food Control, 112, 107145. https://doi.org/10.1016/j.foodcont.2020.107145
Zandomeneghi, M., Carbonaro, L., & Caffarata, C. (2005). Fluorescence of vegetable oils: Olive oils. Journal of Agricultural and Food Chemistry, 53(3), 759–766. https://doi.org/10.1021/jf048742p
Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
