Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 219 (2019) 8–14
Contents lists available at ScienceDirect
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
journal homepage: www.elsevier.com/locate/saa

Fourier transform infrared spectroscopy and chemometrics for the discrimination of paper relic types
Jingjing Xia, Jixiong Zhang, Yueting Zhao, Yangming Huang, Yanmei Xiong ⁎, Shungeng Min ⁎
College of Science, China Agricultural University, Beijing 100193, PR China
Article info

Article history:
Received 30 May 2018
Received in revised form 27 August 2018
Accepted 30 September 2018
Available online 2 October 2018

Keywords:
Pattern recognition
ATR-FTIR
Paper relics
LS-SVM
PLS-LDA

Abstract

The identification of paper relics is an unresolved issue in the field of cultural heritage. As is well known, heritage paper is of significant importance in archaeological research. Many current research methodologies focus on the analysis of inks for dating documents, while the analysis of the paper itself has attracted little attention. This work explores the non-destructive application of the ATR-FTIR technique to the discrimination of paper relics. Spectra of 15 types of paper were collected by ATR-FTIR over the wavenumber range from 4000 to 650 cm−1. Moving average smoothing and normalization were used for pretreatment. Five classification algorithms, namely principal component analysis-linear discriminant analysis (PCA-LDA), partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), least squares-support vector machine (LS-SVM) and partial least squares-linear discriminant analysis (PLS-LDA), were selected to classify the types of paper. PLS-LDA and LS-SVM were effective techniques, with 100% classification accuracy. PCA-LDA, PLS-DA and SIMCA gave accuracies of 98.67%, 97.33% and 95.56%, respectively. The present experiment suggests that ATR-FTIR combined with chemometrics will be highly useful for paper identification in cultural heritage.

© 2018 Published by Elsevier B.V.
⁎ Corresponding authors.
E-mail addresses: xiongym@cau.edu.cn (Y. Xiong), minsg@263.net (S. Min).
https://doi.org/10.1016/j.saa.2018.09.059
1386-1425/© 2018 Published by Elsevier B.V.

1. Introduction

Paper relics are an important part of cultural heritage, containing considerable precious cultural and historical materials and records [1]. The identification of paper relics is an indispensable part of the identification of cultural relics. However, identifying a paper relic still depends on the experience of researchers [2], who distinguish papers physically by their appearance using modern microscopy [3]. Modern microscopy brings great convenience to cultural heritage researchers. However, this method of analysis is usually destructive to heritage remains and requires operation by professionals.

Analytical techniques such as mass spectrometry [4–9], elemental analysis [8,10], X-rays coupled with different detectors [11–13], ToF-SIMS imaging [14], UV-VIS [15] and IR [3,16–18] have been used by various groups to discriminate and characterize paper samples. LA-ICP-MS has been employed for paper characterization; it retains the advantages of ICP-MS and the laser makes it a quasi-non-destructive technique, but some pretreatment is unavoidable. X-ray techniques have been used to determine the elemental composition of paper [12]. UVA irradiation was used to investigate the fluorescence and photochemical properties of Xuan paper [19]. ToF-SIMS imaging has been used to investigate the distribution of paper-making chemicals on the surface of various uncoated and coated papers [14]. Diffuse reflectance UV–VIS–NIR spectra were used to differentiate 20 office paper samples which could not be indisputably discriminated by a preliminary visual examination under several different lighting sources [20]. X-ray diffraction (XRD) was used to identify different crystalline phases of rice paper, and FT-IR allowed good identification of the substances present in pigments and inkpads and differentiated each era of rice paper [21]. Matías Calcerrada reviewed the published forensic analyses of questioned documents, covering the analysis of paper and questioned documents and the study of intersecting lines [22]. ATR IR spectroscopy and DRIFT spectroscopy have been used to study paper samples, but the discrimination of these samples was only 67.86% [16]. J. Andrasko used FTIR to distinguish black printing inks, paper, plastics, photocopy toners and transfer letters [18]. FT-IR spectra of six papers were classified by SIMCA in the study of Samantha Stewart, the aim of which was to determine the discrimination power and data pre-processing techniques [17]. Raj Kumar used ATR-FTIR to discriminate paper samples. Although the discriminating power (DP) reached 99.64% [3], all samples were divided into pairs, which greatly increased the data processing workload and did not allow three or more samples to be compared.

All the literature above analyzes modern paper or the ink on the paper. Only a few studies analyze the raw materials of paper relics [23,24]. For classifying paper artifacts by ATR-FTIR, no literature was found for this method. The aim of the present work is to explore the non-destructive application of the ATR-FTIR technique for the characterization and discrimination of paper relics.

2. Materials and Methods

2.1. Paper Material

Fifteen types of paper (Qian'an painting paper, red star special net paper, Zhejiang bamboo paper, powder paper, the base paper of powder, Hunan knot incense paper, Hubei bamboo paper, Guangdong bamboo paper, Guizhou chu-pi paper, Jiajiang painting and calligraphy paper, Dai nationality copy paper, Naxi dongba paper, Tengchong calligraphy and painting paper, Xinjiang mulberry paper and Tibetan wolf poison paper) were kept in hermetic bags until the spectra were collected. In this report, the 15 types of paper are labelled A to O. The serial letters and paper names are shown in Table 1.

Table 1
The number of samples in the training set and test set.

Serial letter  Paper name                                 Samples  Training set  Test set
A              Qian'an painting paper                     45       30            15
B              Red star special net paper                 45       30            15
C              Zhejiang bamboo paper                      45       30            15
D              Powder paper                               45       30            15
E              The base paper of powder                   45       30            15
F              Hunan knot incense paper                   45       30            15
G              Hubei bamboo paper                         45       30            15
H              Guangdong bamboo paper                     45       30            15
I              Guizhou chu-pi paper                       45       30            15
J              Jiajiang painting and calligraphy paper    45       30            15
K              Dai nationality copy paper                 45       30            15
L              Naxi dongba paper                          45       30            15
M              Tengchong calligraphy and painting paper   45       30            15
N              Xinjiang mulberry paper                    45       30            15
O              Tibetan wolf poison paper                  45       30            15
Total                                                     675      450           225

2.2. Attenuated Total Reflection (ATR) and Sample Preparation

ATR-FTIR spectra of both background and paper were recorded at a resolution of 2 cm−1 with 16 scans in the range of 4000–650 cm−1. The recording was done with a Nicolet iS5 ATR spectrometer (Thermo Scientific, USA). The samples were pressed against the diamond crystal of the ATR device until a torque knob ensured that the pressure applied was the same for all measurements [25]. Forty-five spectra were collected for each type of paper (both sides of the paper were measured), giving 675 spectra in total. The crystal was cleaned with anhydrous ethanol between successive measurements to avoid minute contamination. The cleaned crystal was checked by running a background spectrum.

2.3. Software

The free classification_toolbox_4.0 was used with MATLAB R2014a to develop the PCA, SIMCA, PLS-DA, PCA-LDA and PLS-LDA models. The free LS-SVM toolbox (LS-SVM v 1.8, K. De Brabanter, P. Karsmakers) was used with MATLAB R2014a to develop the LS-SVM models. The spectra were collected with OMNIC 9.7.43.

2.4. Analytical Method

2.4.1. Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is the most widely used multivariate analysis method; it provides an unsupervised interpretation without prior assumptions about the identity of the different samples [26]. PCA aims to reduce the dimensionality of the original data space by using a smaller and more efficient abstract space of latent variables, where the data can be displayed and the information of the original space is essentially kept [27].

2.4.2. Soft Independent Modeling of Class Analogy (SIMCA)
Soft Independent Modeling of Class Analogy (SIMCA) is a supervised pattern recognition method that identifies different classes of samples based on classification rules defined by the values of distinct measurements provided for samples belonging to known, but different, classes in the training set [28]. Unknown samples are then compared to the class models and assigned to classes according to their distance to the training set [29]. The optimal number of PCs should be chosen for each model separately, according to a suitable validation procedure.
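The SIMCA idea described above (one independent PCA model per class, with an unknown sample assigned by its distance to each class model) can be illustrated with a minimal sketch. This is not the classification toolbox used in the study. It assumes a single principal component per class, obtained by power iteration, and assigns a sample to the class with the smallest residual distance. A full SIMCA implementation would also compare that distance with a statistical limit. All function names and the toy data are hypothetical.

```python
import math


def fit_class_model(rows, iters=200):
    """Fit a one-component PCA model (class mean and leading loading)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - mean[j] for j in range(p)] for r in rows]
    # Power iteration on X^T X; the start vector is an arbitrary assumption.
    start = [float(j + 1) for j in range(p)]
    start_norm = math.sqrt(sum(x * x for x in start))
    v = [x / start_norm for x in start]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in centred]
        w = [sum(s * r[j] for s, r in zip(scores, centred)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def residual_distance(model, x):
    """Distance from x to the line spanned by the class loading."""
    mean, v = model
    d = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(a * b for a, b in zip(d, v))  # score on the class component
    return math.sqrt(sum((di - t * vi) ** 2 for di, vi in zip(d, v)))


def simca_predict(models, x):
    """Assign x to the class whose model leaves the smallest residual."""
    return min(sorted(models), key=lambda c: residual_distance(models[c], x))


# Hypothetical toy data: two classes varying along different directions.
training = {
    "A": [[0, 0, 0], [1, 0.1, 0], [2, -0.1, 0], [3, 0, 0.1]],
    "B": [[5, 0, 5], [5.1, 1, 5], [4.9, 2, 5], [5, 3, 5.1]],
}
models = {label: fit_class_model(rows) for label, rows in training.items()}
```

Calling `simca_predict(models, [1.5, 0, 0])` compares the sample against both class models. In practice the number of components per class would be chosen by validation, as the text notes.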
Fig. 1. (a) Original ATR absorbance spectra; (b) ATR absorbance spectra after moving average smoothing and normalization.
Fig. 2. (a) The spectra of 15 types of paper. (b) The loading of the first three PCs.
2.4.3. Partial Least Squares-discriminant Analysis (PLS-DA)
Partial Least Squares-Discriminant Analysis (PLS-DA) is a classification technique built on PLS, a multivariate technique widely used in quantitative analysis. It is a linear classification method based on partial least squares regression, which finds latent variables with maximum covariance with the chemical information [30]. In this study, PLS-DA was applied to the different types of paper. The number of principal components (PCs) is an important parameter and was decided by cross-validation.

2.4.4. Least Squares Support Vector Machine (LS-SVM)
Least Squares Support Vector Machine (LS-SVM) is a reformulation of the standard Support Vector Machine (SVM) [31,32], which was first proposed by Suykens as a regression machine. LS-SVM adopts equality constraints and attempts to minimize the least squares error on the training samples while simultaneously maximizing the margin between two classes [33]. In model development using LS-SVM with a radial basis function (RBF) kernel, the optimal combination of the gam (γ) and sig2 (σ²) parameters was selected as the one giving the smaller root mean square error of cross validation (RMSECV) [34].

Fig. 3. (a) Two-dimensional map of paper samples; (b) three-dimensional map of paper samples.

Fig. 4. The number of PCs for SIMCA with each type of paper. (Bar chart; x-axis: the type of paper, A to O; y-axis: the number of PCs, 0 to 9.)

2.4.5. Principal Component Analysis-linear Discriminant Analysis (PCA-LDA)
Linear Discriminant Analysis (LDA) aims to find ideal projections and performs classification in the projected subspace. The projections maximize the projected distances between classes while maintaining a minimum projected distance between subjects in the same class [35]. The resulting projection may be used as a linear classifier [36,37].

2.4.6. Partial Least Squares-linear Discriminant Analysis (PLS-LDA)
Partial least squares-linear discriminant analysis (PLS-LDA) is a new linear discriminant analysis algorithm, improved by Prof. Liang Yizeng of Central South University, that combines PLS and LDA. PLS is first used to extract the principal components, and linear discriminant analysis is then performed on them. In this way, PLS-LDA can extract useful information and reduce the amount of calculation.

2.4.7. Data Pretreatment
The ATR spectra were pretreated in order to eliminate noise and optical path differences. Moving average smoothing (window size 5) was used because it can eliminate noise to some extent. In addition, an optical path difference is inevitable in every ATR measurement. Normalization was therefore carried out prior to statistical analysis; it overcomes non-uniformity in the paper surfaces and helps to reduce the optical path difference [3]. In summary, moving average smoothing with a window size of 5 and normalization were used for data pretreatment. The original spectrum is shown in Fig. 1(a) and the spectrum after moving average smoothing and normalization is shown in Fig. 1(b).

2.4.8. Selection of Training Set and Test Set
As is the case with all classification methods, there is a training set and a test set. The data set was split into a training set and a test set by systematic sampling with an interval of three, so that the test set is one-third of the samples. The training set implies that one has identified enough samples as members of each class to be able to build a reliable model.
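The pretreatment of Section 2.4.7 and the systematic split described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB code. The paper does not state how spectrum edges are smoothed, which normalization was applied, or which sample in each group of three goes to the test set, so the shrinking edge window, the unit-length (vector) normalization and the choice of every third sample are assumptions. The function names are hypothetical.

```python
import math


def moving_average(spectrum, window=5):
    """Centred moving average; the window shrinks at the edges (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)


def systematic_split(samples, interval=3):
    """Send every `interval`-th sample to the test set, the rest to training."""
    train, test = [], []
    for i, sample in enumerate(samples):
        (test if i % interval == interval - 1 else train).append(sample)
    return train, test


def pretreat(spectrum):
    """Smoothing with a window of 5, then normalization."""
    return vector_normalize(moving_average(spectrum, window=5))
```

Applied to the 45 spectra of one paper type, `systematic_split` gives 30 training and 15 test spectra, which matches the per-class counts in Table 1.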
Fig. 5. Performance of the training set with different numbers of PCs by cross validation: (a) PLS-DA; (b) PCA-LDA.
Fig. 6. The result of LS-SVM. (Bar chart; y-axis: accuracy (%), 60 to 100, for the training set and test set; x-axis: the gridsearch and simplex optimization methods, each combined with MOC, OneVsOne, ECOC and OneVsAll coding.)
The training set also requires enough measured variables to describe the samples accurately. The test set uses significance tests to classify new samples. We used the training set to build the models with ten-fold cross validation. Meanwhile, Monte-Carlo sampling was also employed to further confirm the results obtained by systematic sampling, with the prediction errors on different test sets generated by random sampling [38]. There are 675 samples, with 450 training set samples and 225 test set samples. Table 1 summarizes the training set and test set.

2.4.9. Model Efficiency Estimation
To characterize the prediction ability (efficiency) of the created classification models, the error rate and accuracy are used.

$$\text{Error rate} = 1 - \frac{\sum_{g=1}^{G} \frac{n_{gg}}{n_g}}{G} \quad (1)$$

where G is the number of classes, n_g is the total number of samples belonging to the g-th class and n_gg is the number of samples belonging to class g and correctly assigned to class g.

Accuracy is the ratio of correctly assigned samples:

$$\text{Accuracy} = \frac{\sum_{g=1}^{G} n_{gg}}{n} \quad (2)$$

where n is the total number of samples. Samples that are not assigned are not considered in the accuracy calculation.

3. Result and Discussion

To conduct the follow-up analytical methods, we collected spectra of the paper samples from 4000 cm−1 to 650 cm−1. Characterization is made using qualitative features as well as chemometric analysis, i.e. by matching peaks of paper samples with standard reference spectra obtained either by scanning standard samples in FTIR, as in the present case, or from the available literature. The spectra of the paper samples are shown in Fig. 2(a). Because the spectra overlap severely, no conspicuous difference is found between samples. Most of the samples show prominent peaks in the fingerprint region: the peak around 1020 cm−1 is the stretching vibration of C–O and the peak at 1620 cm−1 is the stretching vibration of C=O. Other peaks at 2800–2900 cm−1 and 3000–3500 cm−1 are the stretching vibrations of C–H and O–H, respectively.

Because the spectra overlap severely, the paper types cannot be distinguished by eye, so spectra are often combined with chemometrics to distinguish the types of paper. PCA is a classic algorithm for dimensionality reduction in chemometrics. The components extracted from the complete data are in the form of the first three principal components (PCs). By comparing the main peak areas in Fig. 2(a) and (b), we can see that the first PC mainly extracts C–O information, the second PC mainly contains O–H information, and the third PC mainly includes C=O and C–H information. The first three PCs include almost all the chemical information we need. In summary, using PCA for dimensionality reduction is effective and retains most of the chemical information. Thus, we can use the first three PCs to classify the paper samples.

We first used only the first two PCs to classify the paper samples. The result is shown in Fig. 3(a), where the abscissa is the first PC, which contains 46% of the information, and the ordinate is the second PC, which contains 26% of the information. Only paper types G and L can be classified from the two-dimensional map. Adding the third PC, which contains 10% of the information, gives the three-dimensional map in Fig. 3(b). The three-dimensional map gives a similar result and merely classifies one more paper type, O. Hence, dimension reduction alone is not enough to classify the paper types. Next, we used pattern recognition methods to classify all the types of paper accurately.

Fig. 7. Classification result of the five algorithms by systematic sampling. (Bar chart; y-axis: accuracy (%), 80 to 100, for the training set and test set; x-axis: PCA-LDA, LS-SVM, PLS-DA, PLS-LDA and SIMCA.)

Fig. 8. The confusion matrix of test set: (a) is the result of SIMCA; (b) is the result of PCA-LDA; (c) is the result of PLS-DA; (d) is the result of PLS-LDA/LS-SVM.

3.1. SIMCA

PCA models for each class were constructed independently for the SIMCA calculation. The distance from an unknown sample to each model is then calculated, and the type of the unknown sample is predicted from that distance. The training set of every class was used to determine the optimal number of PCs by 10-fold cross-validation, and Fig. 4 shows the optimal number of PCs that should be employed in SIMCA.

3.2. PLS-DA & PLS-LDA

PLS-DA is based on PLS, with the classification category used as the chemical value to build the PLS model. The chemical values of the 15 types of paper were assigned from 1 to 15, respectively, in this study. PLS-LDA combines PLS and LDA. Both PLS-DA and PLS-LDA determine the optimal number of PCs by building a PLS model. Determining the optimal number of PCs is a very important step. If the number of PCs is too large, the model will overfit; conversely, if it is too small, the data will not be well represented. Fig. 5(a) shows the optimal number of PCs that should be employed in the PLS-DA and PLS-LDA models, obtained by 10-fold cross validation on the training set. The optimal number of PCs for PLS-DA is 15.

3.3. PCA-LDA

LDA looks for a projection in which samples from the same class are projected very close to each other while, at the same time, samples from different classes are projected as far apart as possible. PCA-LDA reduces the dimensions by PCA before LDA. Fig. 5(b) shows the optimal number of PCs that should be employed in the PCA-LDA model, obtained by 10-fold cross validation on the whole training set. The abscissa is the number of PCs and the ordinate is the error rate. The optimal number of PCs for PCA-LDA is 8.
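Choosing the number of PCs by 10-fold cross validation on the error rate, as described for the PLS-DA, PLS-LDA and PCA-LDA models, can be sketched as follows. This is only an illustration under stated assumptions, not the toolbox routine used in the study. The columns of `scores` are assumed to be component scores ordered by component, a nearest-centroid rule stands in for the LDA step, folds are formed by interleaving samples, and the error rate is one minus the mean per-class fraction of correctly assigned samples (Section 2.4.9). All names are hypothetical.

```python
def error_rate(y_true, y_pred):
    """One minus the mean per-class fraction of correctly assigned samples."""
    classes = sorted(set(y_true))
    fractions = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        fractions.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return 1.0 - sum(fractions) / len(classes)


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (assumption): the closest class centroid wins."""
    best_label, best_dist = None, None
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_error(scores, labels, n_components, k=10):
    """k-fold cross-validated error rate using the first n_components columns."""
    X = [row[:n_components] for row in scores]
    predictions = [None] * len(X)
    for fold in range(k):
        # Interleaved folds: sample i belongs to fold i % k (assumption).
        train_idx = [i for i in range(len(X)) if i % k != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in range(fold, len(X), k):
            predictions[i] = nearest_centroid_predict(train_X, train_y, X[i])
    return error_rate(labels, predictions)


def select_n_components(scores, labels, max_components, k=10):
    """Return the smallest component count with the lowest CV error rate."""
    errors = {n: cv_error(scores, labels, n, k)
              for n in range(1, max_components + 1)}
    best = min(errors, key=lambda n: (errors[n], n))
    return best, errors
```

Ties are broken in favour of fewer components, which is one common way to limit the overfitting that the text warns about when the number of PCs is too large.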
Table 2
The result of systematic sampling and Monte-Carlo sampling.

Algorithm  Training set              Test set
           Systematic  Monte-Carlo   Systematic  Monte-Carlo
PCA-LDA    3/450       4/450         2/225       3/225
LS-SVM     0/450       0/450         0/225       0/225
PLS-DA     8/450       7/450         7/225       6/225
PLS-LDA    0/450       0/450         0/225       0/225
SIMCA      3/450       0/450         10/225      10/225

3.4. LS-SVM

SVM separates the classes with a decision surface that maximizes the margin between the classes; thus SVM is suited to two-category problems. LS-SVM functions as a multiclass classifier by combining several binary SVM classifiers. ENVI Classic's implementation of SVM uses the pairwise classification strategy for multiclass classification. The coding schemes compared here were one versus one coding (OneVsOne), minimum output coding (MOC), error correcting output code (ECOC) and one versus all coding (OneVsAll). The optimization methods were simplex and gridsearch. The results are shown in Fig. 6. Many combinations give 100% accuracy for both the training set and the test set.

Fig. 7 shows the summary results of the five algorithms by systematic sampling. Although the training set of SIMCA achieves the best classification performance, 100.0%, its test set result is 95.56%. The training sets of PCA-LDA and PLS-DA give classification performances of 99.33% and 98.22%, and the test set results are 99.11% and 96.89%, respectively. For both the training set and the test set, LS-SVM and PLS-LDA reach 100%.

Usually a single pair of training set and test set cannot guarantee a convincing result. Thus, we used the Monte-Carlo sampling technique to obtain a new pair of training set and test set, and rebuilt the models to verify the results achieved by systematic sampling. The confusion matrices of the test set are used to show the results of the five algorithms (Fig. 8). The results of PLS-LDA and LS-SVM are the same, so they are shown in one panel. The results of systematic sampling and Monte-Carlo sampling are listed in Table 2. Entries in Table 2 have the form S/N, where S is the number of misclassified samples and N is the number of samples in the training set or test set. Table 2 shows similar results for the two sampling schemes, which confirms the results achieved by systematic sampling. In addition, LS-SVM, PLS-LDA, PCA-LDA and PLS-DA present more stable accuracy than SIMCA, which indicates that these algorithms have better adaptability in dealing with the paper relics. The results of LS-SVM and PLS-LDA were better than those of PLS-DA and PCA-LDA; therefore, we chose LS-SVM and PLS-LDA as our final models.

Comparing the results of the five algorithms, PLS-LDA and LS-SVM stand out as the suitable algorithms. To some extent, this result can be explained theoretically. PLS-LDA is a linear classifier, whereas LS-SVM is a non-linear classifier. Since the linear classifier also succeeds, we can conclude that the types of paper are linearly separable. PLS-LDA first reduces the dimension and then classifies. Compared with PCA-LDA, PLS-LDA uses PLS to reduce the dimension. PLS combines chemical information and variable information to reduce the dimension, whereas the directions of the PCA projection only carry the largest variable information. The direction with the largest variable information is sometimes not good for classification, so PLS is more reasonable than PCA. LS-SVM uses a mapping to put samples into a higher-dimensional space and finds a hyperplane to classify the samples. A few support vectors decide the hyperplane, and changes to unrelated samples do not affect the hyperplane selection. Thus, LS-SVM has better robustness. Moreover, LS-SVM is well suited to small sample sizes; there are 45 samples for each type in this study. Most classifiers are based on statistics, and 45 samples cannot be called a large sample for such classifiers.

4. Conclusion

This study has shown that LS-SVM and PLS-LDA are both effective methods for distinguishing the types of paper, with comparatively satisfactory classification accuracy. Paper can be identified rapidly by using ATR-FTIR spectroscopy and chemometrics. We used five chemometric methods (PCA-LDA, PLS-DA, PLS-LDA, SIMCA, LS-SVM) in this study. Among all the methods, the LS-SVM and PLS-LDA classification methods showed good discrimination between the 15 types of paper. However, there are only 15 types of paper. In practice, the database should be expanded with more kinds of paper and regularly updated. Once a broad and full-scale database is established, ATR-FTIR combined with chemometrics can not only be useful for paper cultural relics, but also shed light on the barriers to identifying historical relics.

References

[1] Q. Li, S. Xi, X. Zhang, Conservation of paper relics by electrospun PVDF fiber membranes, J. Cult. Herit. 15 (4) (2014) 359–364.
[2] S. Hui, Application of modern microscope technology on identification and recovery of paper cultural relics, Sci. Conserv. Archaeol. 27 (2) (2015) 52–57.
[3] R. Kumar, V. Kumar, V. Sharma, Fourier transform infrared spectroscopy and chemometrics for the characterization and discrimination of writing/photocopier paper types: application in forensic document examinations, Spectrochim. Acta A Mol. Biomol. Spectrosc. 170 (2017) 19–28.
[4] J. Adams, Analysis of printing and writing papers by using direct analysis in real time mass spectrometry, Int. J. Mass Spectrom. 301 (1) (2011) 109–126.
[5] L.S. Eberlin, et al., Instantaneous chemical profiles of banknotes by ambient mass spectrometry, Analyst 135 (10) (2010) 2533.
[6] P.M. Lalli, et al., Fingerprinting and aging of ink by easy ambient sonic-spray ionization mass spectrometry, Analyst 135 (4) (2010) 745–750.
[7] E.A. Mcgaw, D.W. Szymanski, R.W. Smith, Determination of trace elemental concentrations in document papers for forensic comparison using inductively coupled plasma-mass spectrometry, J. Forensic Sci. 54 (5) (2009) 1163–1170.
[8] L.D. Spence, A.T. Baker, J.P. Byrne, Characterization of document paper using elemental compositions determined by inductively coupled plasma mass spectrometry, J. Anal. At. Spectrom. 15 (7) (2000) 813–819.
[9] T. Trejos, A. Flores, J.R. Almirall, Micro-spectrochemical analysis of document paper and gel inks by laser ablation inductively coupled plasma mass spectrometry and laser induced breakdown spectroscopy, Spectrochim. Acta, Part B 65 (11) (2010) 884–895.
[10] L.D. Spence, R.B. Francis, U. Tinggi, Comparison of the elemental composition of office document paper: evidence in a homicide case, J. Forensic Sci. 47 (3) (2002) 648–651.
[11] J.L. Enyeart, et al., Non-destructive elemental analysis of photographic paper and emulsions by X-ray fluorescence spectroscopy, Hist. Photogr. 7 (2) (1983) 99–113.
[12] A.V. Es, J.D. Koeijer, G.V.D. Peijl, Discrimination of document paper by XRF, LA–ICP–MS and IRMS using multivariate statistical techniques, Sci. Justice 49 (2) (2009) 120–126.
[13] M. Rožić, M.R. Mačefat, V. Oreščanin, Elemental analysis of ashes of office papers by EDXRF spectrometry, Nucl. Inst. Methods Phys. Res. B 229 (1) (2005) 117–122.
[14] P. Fardim, B. Holmbom, ToF-SIMS imaging: a valuable chemical microscopy technique for paper and paper coatings, Appl. Surf. Sci. 249 (1) (2005) 393–407.
[15] R. Kumar, V. Kumar, V. Sharma, Discrimination of various paper types using diffuse reflectance ultraviolet-visible near-infrared (UV-vis-NIR) spectroscopy: forensic application to questioned documents, Appl. Spectrosc. 69 (6) (2015) 714.
[16] A. Kher, et al., Classification of document papers by infrared spectroscopy and multivariate statistical techniques, Appl. Spectrosc. 55 (9) (2001) 1192–1198.
[17] A. Kehr, S. Stewart, M. Mulholland, Forensic classification of paper with infrared spectroscopy and principal components analysis, J. Near Infrared Spectrosc. 13 (1) (2005) 225.
[18] J. Andrasko, Microreflectance FTIR techniques applied to materials encountered in forensic examination of documents, J. Forensic Sci. 41 (5) (1996) 812–823.
[19] Y. Tang, G.J. Smith, Fluorescence and photodegradation of Xuan paper: the photostability of traditional Chinese handmade paper, J. Cult. Herit. 14 (6) (2013) 464–470.
[20] V. Causin, et al., The discrimination potential of diffuse-reflectance ultraviolet-visible-near infrared spectrophotometry for the forensic analysis of paper, Forensic Sci. Int. 216 (1–3) (2012) 163–167.
[21] N. Na, et al., Non-destructive and in situ identification of rice paper, seals and pigments by FT-IR and XRD spectroscopy, Talanta 64 (4) (2004) 1000–1008.
[22] M. Calcerrada, C. Garcia-Ruiz, Analysis of questioned documents: a review, Anal. Chim. Acta 853 (2015) 143–166.
[23] Xi-yun Luo, Y.-p. D., Mei-hua Shen, Wen-qing Zhang, Xin-guang Zhou, Shu-ying Fang, Xuan Zhang, Investigation of fibrous cultural materials by infrared spectroscopy, Spectrosc. Spectr. Anal. 35 (1) (2015) 60–64.
[24] Xue-yun Liu, Z.-x. X., Mu-yi Hu, Identification of Pulps by Near Infrared Spectroscopy Based on SIMCA Model, 7, China Pulp & Paper, 2011 20–24.
[25] G.D. Anda, et al., Feasibility study for the detection of Trichinella spiralis in a murine model using mid-Fourier transform infrared spectroscopy (MID-FTIR) with attenuated total reflectance (ATR) and soft independent modelling of class analogies (SIMCA), Vet. Parasitol. 190 (3–4) (2012) 496–503.
[26] Y.J. Liu, et al., Estimating the number of components and detecting outliers using Angle Distribution of Loading Subspaces (ADLS) in PCA analysis, Anal. Chim. Acta 1020 (2018) 17.
[27] C. Mees, et al., Identification of coffee leaves using FT-NIR spectroscopy and SIMCA, Talanta 177 (2018) 4.
[28] D.L. Flumignan, et al., Screening Brazilian C gasoline quality: application of the SIMCA chemometric method to gas chromatographic data, Anal. Chim. Acta 595 (1) (2007) 128–135.
[29] Quan-sheng Chen, J.-W. Z., Hai-dong Zhang, Mu-hua Liu, et al., Food Sci. 27 (4) (2006) 186–189.
[30] D. Ballabio, V. Consonni, Classification tools in chemistry. Part 1: linear models. PLS-DA, Anal. Methods 5 (16) (2013) 3790–3798.
[31] T.V. Gestel, et al., Benchmarking least squares support vector machine classifiers, Neural. Process. Lett. 9 (3) (1999) 293–300.
[32] J.A.K. Suykens, J. Vandewalle, Least Squares Support Vector Machine Classifiers, Kluwer Academic Publishers, 1999 293–300.
[33] B. Yang, et al., A study on regularized weighted least square support vector classifier, Pattern Recogn. Lett. 108 (2018) 48–55.
[34] Y. Shao, et al., Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics, Spectrochim. Acta A Mol. Biomol. Spectrosc. 140 (2015) 431–436.
[35] L.H. Chen, C.R. Jiang, Sensible functional linear discriminant analysis, Comput. Stat. Data Anal. 126 (2018).
[36] S.A. Awais, Face recognition using principle component analysis (PCA) and linear discriminant analysis (LDA), Int. J. Electr. Comput. Sci. 22 (2012) 140–146.
[37] S.A. Patil, et al., Principle Component Analysis (PCA) and Linear Discriminant Analysis (LDA) Based Face Recognition, 2014.
[38] H.D. Li, et al., Model-population analysis and its applications in chemical and biological modeling, TrAC Trends Anal. Chem. 38 (9) (2012) 154–162.
