Exploratory study of the use of infrared spectroscopy for the detection of eating disorders in

gingival crevicular fluid



Vibrational spectroscopic techniques can be applied in different fields owing to their versatility,

simplicity and low cost per analysis. Within these techniques, mid-infrared (MIR) spectroscopy

is one of the most explored because it is non-destructive, rapid and environmentally friendly,

determines several parameters in a single analysis, requires no sample preparation, has

a low cost per analysis and can be applied in situ [1]. This technique was first applied in the

agro-food sector but is now also employed in the health sector because of the

above-mentioned characteristics. In fact, the number of applications within the health sector is

rapidly increasing [2, 3]. MIR spectroscopy measures the fundamental vibrations within 4000

to 400 cm-1 of several covalent bonds. It is considered a fingerprint technique and for this

reason it is applied in the identification and characterization of different types of samples [1, 3,

4]. The final spectrum is complex, reflecting several absorption bands that can be weak and

overlapping, and therefore it is essential to apply chemometric tools to extract useful

information. The need for chemometric tools is regarded as the major drawback of the technique.

This technique has already been applied to the analysis of gingival crevicular fluid (GCF), but

only two studies are described in the literature [5, 6]. In 2010, Xiang et al. [6] collected GCF
from several patients to assess whether this technique was able to discriminate between healthy

patients and patients with periodontitis. The GCF samples were dried and then analysed. The

results obtained, 93% of correct predictions for the validation set through linear discriminant

analysis, demonstrated that this technique was very effective in the discrimination of patients

with and without periodontitis. Moreover, the authors suggested that the molecular

components responsible for this discrimination were lipids, proteins and DNA. However, the

results could have been better had the authors optimized the spectral region and the

pre-processing technique. A few years later, in 2013, Xiang et al. [5] investigated the capacity of

infrared spectroscopy to discriminate between patients with and without diabetes mellitus.

Several GCF samples collected from different sites in each patient were analysed. The

classification of the samples was accomplished through linear discriminant analysis and an

overall accuracy of 87% of correct classifications for the validation set was obtained. An

algorithm was applied for the selection of the best spectral regions which included the

molecular vibrations of proteins, glycogen, oligosaccharides and glycolipids. Therefore, the

authors attributed the discrimination of patients with and without diabetes mellitus to the

differences in the content of these compounds in the GCF. Again, the authors did not test

different pre-processing techniques, which could have further improved the results obtained. Both

these works revealed the potential of infrared spectroscopy to detect, through a non-invasive

and rapid approach, chemical differences in the composition of GCF.

Therefore, the present manuscript explores the use of mid-infrared spectroscopy to detect

eating disorders in GCF. Moreover, this work also explores the potential of this technique to

discriminate individuals, sampling sites, types of eating disorder and vomiting induction. To the

best of our knowledge, this is the first time mid-infrared spectroscopy has been used for this purpose.

Material and methods


Gingival crevicular fluid collection procedure

(discuss dp, dv, mv, mp)

For each individual, four strip perio-papers were collected.

(Perhaps it would be better to include a table with the patients' information.)

Mid-infrared spectral acquisition

The mid-infrared spectra of the strip perio-papers were acquired on a PerkinElmer

Spectrum BX FTIR System spectrophotometer (Waltham, USA) with a DTGS detector and a PIKE

Technologies Gladi ATR accessory. The spectra were acquired in diffuse reflectance mode from

4000 to 600 cm-1, with a resolution of 4 cm-1 and 32 co-added scans. Each strip perio-paper

was analysed on both sides at the bottom and compressed with a pressure of 150 N cm-2. Thus,

a total of 224 spectra (28 individuals x 8 spectra each) were obtained. The ATR crystal was cleaned

and a background was acquired between patients.

Multivariate data analysis

MIR spectra were modelled with chemometric tools, namely principal component

analysis (PCA) to assist outlier detection and partial least squares discriminant analysis (PLSDA)
to develop discrimination models [7, 8]. The spectra were mean centred before any

data analysis. For the PLSDA, the spectral data were randomly divided into two data sets, one for

calibration (70%) and the other for validation (30%). This division was performed ensuring the

same proportion of patient classes in both sets, aiming to avoid unbalanced classes [9]. The

optimization of the PLSDA models was performed through the selection of the optimal number

of latent variables, best spectral region and best pre-processing technique (using only the

calibration set). The optimal number of latent variables (LV) was estimated through a leave-one-

sample-out cross-validation procedure. The assessment of the best spectral region involved

dividing the MIR spectra into five regions and testing all these regions individually and in

combination. The regions were the following: from 3982 to 2652 cm-1 (region 1), from

2650 to 1862 cm-1 (region 2), from 1860 to 1182 cm-1 (region 3), from 1180 to 922 cm-1 (region

4) and from 920 to 620 cm-1 (region 5). The selection of the best pre-processing technique was

achieved by testing different techniques individually and in combination, namely standard

normal variate (SNV) and Savitzky-Golay filter (with different filter widths, polynomial orders

and first and second derivatives). After model optimization, the validation set was used to test

the accuracy of the optimized models. This was performed through the projection of the

validation set and the results were arranged in the form of confusion matrices. The confusion

matrices express the percentages of correct predictions for each patient class and the total

percentage of correct predictions was obtained by adding the diagonal elements of the

confusion matrix [9]. The regression coefficient vectors of the PLSDA models for eating

disorders were analysed to understand which specific wavenumbers were most important and

to relate them to possible compounds present in the gingival crevicular fluid.
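
The class-balanced 70/30 division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the random seed and the example labels are hypothetical and are not taken from the study, which was carried out in Matlab with PLS Toolbox.

```python
import random
from collections import defaultdict


def stratified_split(labels, calibration_fraction=0.7, seed=0):
    """Return (calibration_idx, validation_idx) keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    calibration, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Split each class separately so both sets keep the same class ratio.
        n_cal = round(calibration_fraction * len(indices))
        calibration.extend(indices[:n_cal])
        validation.extend(indices[n_cal:])
    return sorted(calibration), sorted(validation)


# Hypothetical labels (not the study data): 20 samples with ED, 10 without.
labels = ["ED"] * 20 + ["no ED"] * 10
cal_idx, val_idx = stratified_split(labels)
print(len(cal_idx), len(val_idx))
```

Splitting within each class, rather than over the pooled samples, is what keeps the proportion of each patient class the same in the calibration and validation sets.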

Matlab version 8.6 (MathWorks, Natick, USA) and PLS Toolbox version 8.2.1 (Eigenvector

Research Inc., Wenatchee, USA) were used to perform all chemometric analyses.

Results and discussion

The raw spectra of the gingival crevicular fluid are shown in Figure 1. As mentioned above, a PCA

was performed to assist outlier detection and no outliers were identified. After this, several

PLSDA models were developed to verify whether GCF spectra contain information related to each

individual, sampling site and presence of eating disorders. For all these discrimination models,

different pre-processing techniques and different spectral regions were tested individually and

in combination to improve the models' predictive capacity. Therefore, for each discrimination

model, more than 150 PLSDA models (31 spectral region combinations x 5 pre-processing

technique combinations) were developed. Moreover, testing different spectral

regions made it possible to understand which were most appropriate for the respective

discrimination and thereby to ascertain the compounds that may be responsible.
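
The count of candidate models can be reproduced with a short enumeration. The region limits are those given in the multivariate data analysis section; the pre-processing names are placeholders, since the text states that five combinations were tested but does not list them.

```python
from itertools import combinations, product

# Spectral regions as defined in the methods (limits in cm-1).
regions = {
    "R1": (3982, 2652),
    "R2": (2650, 1862),
    "R3": (1860, 1182),
    "R4": (1180, 922),
    "R5": (920, 620),
}

# Every non-empty subset of the five regions: 2**5 - 1 = 31 combinations.
region_sets = [
    subset
    for size in range(1, len(regions) + 1)
    for subset in combinations(regions, size)
]

# Placeholder names: only the number of options (5) comes from the text.
preprocessing_options = [f"preprocessing_{i}" for i in range(1, 6)]

candidates = list(product(region_sets, preprocessing_options))
print(len(region_sets), len(candidates))  # 31 155
```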

The PLSDA models were built considering the mean of all the spectra collected from each

individual; thus, a total of 28 spectra were used. This was performed for each

discrimination model (e.g. patient/subject, sampling site and presence of eating disorders).
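
The averaging step can be sketched with NumPy as below. The data are synthetic, the spectra are assumed to be stored consecutively by individual, and the number of points per spectrum (1701, which would correspond to a 2 cm-1 spacing between 4000 and 600 cm-1) is an assumption made only for this illustration.

```python
import numpy as np

n_individuals = 28
spectra_per_individual = 8   # four strips, both sides
n_wavenumbers = 1701         # assumed number of points per spectrum

rng = np.random.default_rng(0)
# Synthetic stand-in for the 224 measured spectra, ordered by individual.
spectra = rng.normal(size=(n_individuals * spectra_per_individual, n_wavenumbers))

# Group the rows by individual and average within each group.
mean_spectra = spectra.reshape(
    n_individuals, spectra_per_individual, n_wavenumbers
).mean(axis=1)

print(spectra.shape, mean_spectra.shape)  # (224, 1701) (28, 1701)
```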

Individual discrimination

In the first place, GCF spectra were modelled through PLSDA to verify whether MIR spectra contained

specific information related to each individual. The results obtained (a total of correct

predictions below 20% for all the strategies) indicated that it was not possible to discriminate

these patients through GCF. Moreover, no best spectral region or pre-processing technique

could be identified. We believe that the GCF composition of each patient varies (perhaps add a

reference), but possibly these variations are so marginal that MIR spectroscopy is not

sensitive enough to detect them.

Sampling site

The GCF spectra were also modelled against the sampling site. It was important to

understand whether GCF is influenced by the sampling site (it would be better to elaborate here

on whether variation in GCF composition between sites is to be expected). Again, the PLSDA

results revealed that it was not possible to discriminate the sampling site (a total of correct

predictions below 30% for all the strategies, except for the mean of all spectra collected for each

patient, which was not performed) or to identify the best spectral region(s) and pre-

processing technique. (try to justify these results and, if possible, add references)

Eating disorders

The primary objective of this manuscript was to assess whether it is possible to discriminate patients

with and without eating disorders (ED) through the GCF spectra. Again, several PLSDA models

were tested aiming to find the best spectral region and pre-processing technique. The best

results were obtained when selecting the spectral region from 3982 to 2652 cm-1 (R1) and

applying a Savitzky-Golay filter (15-point filter width, second-order polynomial

and second derivative). The means of the raw and pre-processed spectra in the

spectral region from 3982 to 2652 cm-1 are depicted in Figure 2. In Fig. 2a, the mean

spectra of the two groups of patients are very similar, but the patients with ED show more intense

peaks. Fig. 2b, after spectral pre-processing, reveals marked differences around 3700, within
3400 to 3200 and within 3000 to 2800 cm-1.

Figure 2 – Mean of raw (a) and pre-processed (b) spectra using the best spectral region for the

discrimination of eating disorders.
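
For readers unfamiliar with the pre-processing techniques named in the methods, the sketch below shows a standard normal variate (SNV) transform and a Savitzky-Golay second derivative with the settings reported above (15 points, second-order polynomial). It is a plain NumPy illustration, not the PLS Toolbox implementation used in this work: the function names are hypothetical, unit point spacing is assumed and the edge points are simply dropped.

```python
from math import factorial

import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def savgol_derivative(spectrum, window=15, polyorder=2, deriv=2):
    """Savitzky-Golay derivative per point index; edge points are dropped."""
    half = window // 2
    positions = np.arange(-half, half + 1, dtype=float)
    # Columns are positions**0, positions**1, ..., positions**polyorder.
    design = np.vander(positions, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse estimates the polynomial coefficient
    # a_deriv by least squares; the derivative at the centre is deriv! * a_deriv.
    weights = factorial(deriv) * np.linalg.pinv(design)[deriv]
    # Reversing the weights turns np.convolve into a sliding dot product.
    return np.convolve(spectrum, weights[::-1], mode="valid")


# Check on a parabola, whose second derivative is 2 everywhere.
x = np.arange(50, dtype=float)
parabola = x ** 2
second_derivative = savgol_derivative(parabola)
print(second_derivative.shape)
```

The second derivative removes constant and linear baseline contributions, which is one reason it can reveal differences that are hard to see in the raw spectra.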

Through PLSDA, a total of 84.1% of correct predictions was obtained using 6 LV for the

discrimination between patients with and without ED. Table 1 shows the confusion matrix, and

it can be seen that the worst predictions involved the group of individuals without ED. In fact,

approximately 30% of this group was incorrectly classified as having ED.

(pseudo-explanation… at least try)

Table 1. Confusion matrix for the discrimination of patients/subjects with and without ED

based on the GCF spectra through strategy three (84.1% of correct predictions and 6 LV).

Strategy three

                             Subjects group (real), %
Subjects group (predicted)   With ED   Without ED   Total
With ED                      52.8      13.9         66.7
Without ED                    2.0      31.3         33.3
Total                        54.8      45.2         100
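
The two figures quoted in the text can be checked directly from the entries of Table 1. The short calculation below uses only the table values; the variable names are illustrative.

```python
# Confusion matrix from Table 1, in % of all validation samples.
# Outer keys: predicted group; inner keys: real group.
confusion = {
    "With ED": {"With ED": 52.8, "Without ED": 13.9},
    "Without ED": {"With ED": 2.0, "Without ED": 31.3},
}

# Total correct predictions: sum of the diagonal elements.
total_correct = sum(confusion[group][group] for group in confusion)

# Share of the real "Without ED" group that was predicted as "With ED".
real_without_ed = sum(confusion[pred]["Without ED"] for pred in confusion)
misclassified_without_ed = 100 * confusion["With ED"]["Without ED"] / real_without_ed

print(round(total_correct, 1))             # 84.1
print(round(misclassified_without_ed, 1))  # 30.8
```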

Thus, it is important to explore the regression coefficient vectors to understand which spectral

absorptions showed the highest contribution and thereby attempt to establish a relation with

the compounds present in the GCF. Figure 3 shows the regression coefficient vectors of the

PLSDA model; the wavenumbers that showed the highest contribution were located within

3720 to 3620, 3320 to 3270 and 2970 to 2880 cm-1. These wavenumbers are

consistent with the differences in the pre-processed spectra of patients with and without ED

(Figure 2). The wavenumbers within 3720 to 3620 cm-1 can be attributed to N-H stretching of

amines and O-H stretching of alcohols (REF book Clara); the wavenumbers within 3320 to 3270

cm-1 can be associated with amines and carboxylic acids due to N-H stretching and O-H

stretching, respectively; and the wavenumbers within 2970 to 2880 cm-1 can also be attributed to

O-H stretching of carboxylic acids. (explore the compounds that could be changing in

GCF and relate them with the possible absorptions)

Figure 3 – Regression coefficient vectors of the PLSDA model obtained adopting strategy three

within 3982 to 2652 cm-1 (region 1) and through the application of a Savitzky-Golay filter (15-point

filter width, second-order polynomial and second derivative). Labelled bands: 3720-3620,

3320-3270 and 2970-2880 cm-1.


Conclusions

The principal objective of this manuscript was accomplished and the results obtained allowed

several conclusions to be drawn:

- Averaging the spectra is important to smooth the slight random variations present in the spectra

of strip perio-papers (João, do you think I can say this?), and the results obtained with the PLSDA
models for ED showed that averaging the spectra increases the efficacy of the PLSDA models

and reduces the number of LV.

- No differences in GCF between sampling sites could be detected.

The results obtained demonstrated that MIR spectroscopy is capable of discriminating

between individuals with and without ED (around 80% of correct predictions).

Overall, this exploratory work revealed the potential of MIR spectroscopy to detect chemical

variations in the composition of GCF in a non-invasive, rapid and inexpensive way. Further studies

including a larger number of samples are needed to confirm the robustness of this technique,

but the results obtained are promising.


The authors are grateful for financial support from the European Union (FEDER funds through

COMPETE POCI-01-0145-FEDER-016735) and National Funds (FCT, Fundação para a Ciência e

Tecnologia) through project PTDC/AGR-PRO/6817/2014. Ricardo N.M.J. Páscoa thanks FCT

(Fundação para a Ciência e Tecnologia) and POPH (Programa Operacional Potencial Humano)

for Post-Doc Grant SFRH/BPD/81384/2011.


Model: eating behaviour disorder
Main regions (cm-1): 862, 836, 684, 654, 642 and 635

Absorptions active in these regions    Range (cm-1)
C-Br stretch                           700-600
O-H alcohol                            720-590
C-H alkyne bend                        680-610
C-H aromatic bend                      900-670
C-O-O-C peroxides stretch              890-820
P-O-C aromatic phosphates              995-850
C-S thioethers stretch                 660-630
C-S disulfides                         705-570
C-S aryl thioethers                    715-670
Sulfate ion                            680-610
Nitrate ion                            840-815
Carbonate ion                          880-860
C-C skeletal vibrations                1300-700

Model: smoking habits
Main regions (cm-1): 1608, 1340, 1330, 1316, 1300, 1224 and 1208

Absorptions active in these regions    Range (cm-1)
C-H methyne bend                       1350-1330
C=C conjugated                         1600
C-H vinylidene bend                    1310-1290
C=C-C aromatic ring stretch            1615-1580
O-H primary or secondary               1350-1260
O-H phenol                             1410-1310
C-N tertiary amine stretch             1210-1150
C-N aromatic primary amine             1340-1250
C-N aromatic secondary amine           1350-1280
C-N aromatic tertiary amine            1360-1310
Carboxylate                            1420-1300
Quinone or conjugated ketone           1650-1600
Aromatic nitro compounds               1355-1320
Organic phosphates (P=O)               1350-1250
Aromatic phosphates (P-O-C)            1240-1190
Dialkyl/aryl sulfones                  1335-1300
-N=N- open chain azo                   1630-1575

(Coates, MIR bibliography)

