Predicting postmortem interval based on microbial community sequences and machine learning algorithms

Environmental Microbiology (2020) 00(00), 00–00 doi:10.1111/1462-2920.15000
Ruina Liu,1† Yuexi Gu,2† Mingwang Shen,3† Huan Li,4 Kai Zhang,1 Qi Wang,5 Xin Wei,1 Haohui Zhang,1 Di Wu,1 Kai Yu,1 Wumin Cai,1 Gongji Wang,1 Siruo Zhang,4 Qinru Sun,1* Ping Huang6* and Zhenyuan Wang1*

1 College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, 710061, China.
2 School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, 710061, China.
3 Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, Shaanxi, 710061, China.
4 Department of Microbiology and Immunology, School of Basic Medical Sciences, Xi’an Jiaotong University, Xi’an, China.
5 Chongqing Medical University, College of Basic Medicine, Department of Forensic Medicine, Chongqing, 400016, China.
6 Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Science, Ministry of Justice, Shanghai, 200063, China.

Received 22 October, 2019; revised 18 March, 2020; accepted 22 March, 2020. *For correspondence. E-mail qinrusun@mail.xjtu.edu.cn; huangp@ssfjd.cn; wzy218@xjtu.edu.cn. Tel. +86-029-8265-5472; Fax +86-029-8265-5472. †These authors contributed equally to this work.

© 2020 Society for Applied Microbiology and John Wiley & Sons Ltd.

Summary

Microbes play an essential role in the decomposition process, but their succession and behaviour remain poorly understood. Previous research has shown that microbes behave predictably from the time of death and throughout the decomposition process. Studying this behaviour improves the understanding of decomposition and aids estimation of the postmortem interval (PMI) in forensic investigations, which is critical but faces multiple challenges. In this study, we combined microbial community characterization, microbiome sequencing from different organs (i.e. brain, heart and cecum) and machine learning algorithms [random forest (RF), support vector machine (SVM) and artificial neural network (ANN)] to investigate microbial succession patterns during corpse decomposition and to estimate PMI in a mouse corpse system. Microbial communities differed significantly between the death point and the advanced decay stages. Enterococcus faecalis, Anaerosalibacter bizertensis, Lactobacillus reuteri and others were identified as the most informative species in the decomposition process. Furthermore, the ANN model combined with the postmortem microbial data set from the cecum, which was the best combination among all candidates, yielded a mean absolute error of 1.5 ± 0.8 h within 24-h decomposition and 14.5 ± 4.4 h within 15-day decomposition. This integrated model can serve as a reliable and accurate technology in PMI estimation.

Introduction

In forensic investigations, accurate estimation of the time since death (postmortem interval, PMI) is critical but complicated. Postmortem phenomena, such as algor mortis, rigor mortis, livor mortis, discolouration of carrion, putrefactive networks and digestion of gastric contents, are commonly employed to estimate PMI (Kaatsch et al., 1993; Henßge and Madea, 1996). However, these types of physical evidence are sensitive to various environmental conditions and rely heavily on individual states and investigators’ empirical judgments. To address the limitations of traditional methods, scientists have been employing various new techniques in PMI estimation, such as molecular biology (Young et al., 2013; Li et al., 2014; Feng et al., 2015; Ferreira et al., 2018), entomology (Wells et al., 2015), spectroscopy (Li et al., 2017; Zhang et al., 2017; Wang et al., 2017a; Wang et al., 2017b) and thanatochemistry (Poloz and O’Day, 2009; Kikuchi et al., 2010; Sara et al., 2014) analysis. Each method provides significant information in investigations; however, their feasibility and accuracy are limited by the conditions and timeframe (e.g. days, weeks and months) of PMI in a given case (Metcalf, 2019). For example, forensic entomology is utilized to estimate PMI with an error ranging from days to months because the time and distribution of insects laying eggs on the
corpses are uncertain (Metcalf et al., 2013; Guo et al., 2016). Because of these limitations, developing a quick and accurate method that forensic pathologists can use to estimate PMI is essential.

Recently, studies found that postmortem microbial communities change with PMI in a predictable manner (Hyde et al., 2013; Metcalf et al., 2013; Pechal et al., 2014). This discovery has spurred a large body of research (Javan et al., 2016; Metcalf et al., 2016) on human and animal remains to study microbial community succession after death (Finley et al., 2015; Hyde et al., 2017; Pechal et al., 2018). Metcalf and colleagues (2013, 2016) demonstrated that the relative abundances of anaerobic gut bacteria (such as Lactobacillaceae and Bacteroidaceae) increased at the bloat stage, and aerobic bacteria and facultative anaerobic bacteria (such as Enterobacteriaceae) were dominant after rupture. Pechal and colleagues (2014) found that Proteobacteria was the most abundant phylum in the early stages of decomposition, while Firmicutes dominated in the late stage. Damann and colleagues (2015) demonstrated that 99.2% of the bacterial genomic sequences of decomposing skeletons were from six phyla, Bacteroidetes, Firmicutes, Proteobacteria, Actinobacteria, Acidobacteria, and Chloroflexi, which were similar to the gut bacteria. In these studies, high-throughput sequencing was used to reliably quantify the relative abundance of postmortem microbial communities and led to significant gains in the knowledge of bacterial communities in mammal decomposition.

However, massively parallel high-throughput sequencing technologies, such as Illumina MiSeq and the Ion S5™ XL system, are based on sequence-by-synthesis (Benbow et al., 2015) and semiconductor sequencing technology (Wang et al., 2018), which generate large amounts of data. Thus, analysing massive amounts of genomic data with machine learning methods (the core technology of artificial intelligence) to build a PMI estimation model is imperative. Machine learning algorithms (Esteva et al., 2017; Kermany et al., 2018) are powerful methods for handling high-dimensional data sets and are commonly applied to microbiome data sets containing very large numbers of DNA sequences (Belk et al., 2018). With their considerable advantages in computing ability and large-scale data processing, machine learning methods have been employed in the PMI estimation field by several independent research groups (Metcalf et al., 2016; Johnson et al., 2016; Pechal et al., 2018).

Our study aims to develop an accurate PMI estimation model in conjunction with a postmortem microbial sequencing data set of internal organs in murine remains during 15-day decomposition. To accomplish this goal, informative features, suitable models and data sets were filtered, combined and evaluated. The feature extraction process aids the discovery of microbial markers in PMI estimation. As the application of machine learning techniques in forensic science has not been well explored, this work is a novel addition to previous studies, and the data sets and multiple comparisons reported here may improve current PMI estimation techniques.

Results

Postmortem bacterial communities in internal organs

A total of 240 organ samples, including the brains, hearts and ceca, were collected from the remains of 80 mice over 10 time points that spanned 15 days. After quality control by agarose gel electrophoresis, 176 samples remained (all except the brain and heart samples collected before day 2) and were labelled by PMI and body site. A total of 144 samples (D2Brain, D4Brain, D7Brain, D10Brain, D13Brain, D15Brain, D2Heart, D4Heart, D7Heart, D10Heart, D13Heart, D15Heart, D2Cecum, D4Cecum, D7Cecum, D10Cecum, D13Cecum and D15Cecum) were used for the comparison of microbial communities between different organs. Visual decomposition stages were estimated according to the body score estimation method in Table 1 (the results are given as Key_head and Key_torso in Additional file 2).

Table 1. A visual body score estimate of the decomposition stage according to Megyesi et al. (2005).

Points  Head and neck                                                 Torso
Fresh
0       No discoloration                                              No discoloration
1       Excoriation and trichomadesis                                 Excoriation and trichomadesis
Active decay
2       Discoloration                                                 Discoloration
3       Purging of decomposition fluids out of eyes, nose, or mouth   Bloating of abdominal cavity
4       Bloating of neck and/or face                                  Precedes rupture and/or purging of fluids
Advanced decay
5       Sagging of flesh                                              Sagging of flesh
6       Sinking of flesh                                              Sinking of flesh
7       Caving in of flesh                                            Caving in of flesh
8       Mummification                                                 Mummification

Fig. 1. The MAE and R² of prediction values of the ANN model, RF model and SVM model on the testing data in the cecum, brain and heart. (A) and (B) show the mean MAE and mean R² on the testing data, respectively. The number of input variables of the ANN model in the intestine, brain and heart was 891, 160 and 116 respectively. Each model was repeated 15 times, and the values of MAE and R² in this figure are the mean values of the corresponding 15 repetitions.

Bacterial analysis among internal organs at multiple levels. The relative abundance of dominant bacteria differed significantly between ceca and hearts/brains at multiple taxonomic levels. At the phylum level, throughout decomposition, the dominant phyla were Firmicutes, Proteobacteria, and Bacteroidetes in brain and heart samples, while Firmicutes, Proteobacteria,
Bacteroidetes, and Actinobacteria had a relative abundance of more than 1% of the total abundance in cecum samples (Fig. 2A). At the family level, Enterobacteriaceae and Peptostreptococcaceae were dominant in the brain samples. Enterobacteriaceae and Lactobacillaceae were abundant in the heart samples. By contrast, Lactobacillaceae, Enterococcaceae and Erysipelotrichaceae were the main components in the cecum samples (Fig. 2A). At the genus level, Lactobacillus, Enterococcus and Dubosiella were important genera in the cecum samples. Morganella and Proteus were dominant in brain and heart samples (Fig. 2A). At the species level, Lactobacillus reuteri, Enterococcus faecalis and Firmicutes bacterium M10-2 were the most abundant species of bacteria detected in cecum samples. Clostridium novyi, Proteus vulgaris, Anaerosalibacter bizertensis, and Clostridium butyricum were dominant in heart and brain samples (Fig. 2A). Enterococcus faecalis, Clostridium cochlearium and A. bizertensis were predominant in brains, hearts and ceca in the advanced decomposition of corrupted tissues (Fig. 2A). Shared and unique operational taxonomic units (OTUs) among different organs are exhibited in Venn plots (see Additional file 1, Fig. S1).

Comparison of alpha and beta diversity among different organs. The number of OTUs, observed species, the estimators of community richness (ACE and Chao 1) and the diversity (Shannon and Simpson) indexes of groups are shown in Table S1 (see Additional file 1). The Shannon index was compared between organs by analysis of variance, which indicated that the diversity of microbial communities was significantly higher in cecum groups than in brain and heart groups before day 15 (Fig. 2B). Other alpha diversity index comparisons between brain, heart and cecum samples also showed significant differences (see Additional file 1, Fig. S2A). In contrast, there were no significant differences between brain and heart samples at the same time points during decomposition (Fig. 2B, see Additional file 1, Fig. S2A).

To visualize the similarity and dissimilarity in postmortem bacterial composition among different organs, principal coordinate analysis (PCoA) was used to demonstrate the decomposing pattern in two-dimensional space based on the unweighted and weighted UniFrac distances. The principal coordinate 1 (PCo1) and PCo2 axes (40.96% and 24.67% of variance explained, respectively) showed that the microbial communities of brains and ceca were separated before day 10 and had a tendency to cluster after day 10 (Fig. 2B). The separation between samples across body sites before day 10 and the similarity at day 10–15 were more notable on the Bray–Curtis based non-metric multidimensional scaling (NMDS) plot (see Additional file 1, Fig. S2C). The change trend was similar between hearts and ceca (Fig. 2B, see Additional file 1, Fig. S2C). Surprisingly, the bacterial communities of brains and hearts clustered at the same postmortem time point (see Additional file 1, Fig. S2B, S2C). The unweighted pair group method with arithmetic mean (UPGMA) (Li and Xu, 2007) plot (Fig. 2C) indicated that groups clustered by organ based on weighted UniFrac distance.

Fig. 2. The significant differences in postmortem bacterial community composition of different organs.
A. Taxonomic profiles of postmortem microbial communities in internal organs during decomposition (community composition was based on the top 10 phyla, families, genera and species with a classification of Other; Others were those found in some but not all samples). B. Comparison of the Shannon index (ANOVA, *P < 0.05, **P < 0.01, ***P < 0.001) in different organs and PCoA plot based on weighted UniFrac distance, with points coloured by PMI and shaped by organ type. Other alpha and beta diversity comparison plots are shown in Fig. S2 (see Additional file 1). C. UPGMA analysis showed groups clustered according to different organs. D. LDA plots of bacteria at different levels among different organs.

A t-test was performed to identify specific microorganisms at the genus level with significant differences (P ≤ 0.05) in different organs at the same PMI. The comparison of bacterial communities between hearts and brains revealed no significant differences (P > 0.05), whereas the comparison between brain and cecum samples showed significant differences at multiple taxonomic levels (see Additional file 1, Table S2). We also employed linear discriminant analysis effect size (LEfSe) analyses to detect significant biomarkers among the sample sites during decomposition at different levels (Fig. 2D). Morganella and C. butyricum were detected in D2Heart. Clostridium novyi was considered a significant biomarker in D4Brain. Firmicutes bacterium M10-2 was identified in D4Cecum. Proteus vulgaris and Clostridioides mangenotii were significant species in D7Brain. Clostridium sporogenes and Proteus mirabilis were significant
species in D10Brain. Enterococcus faecalis and A. bizertensis were regarded as representative in D10Cecum. Faecalibacterium was a significant genus in D13Brain. Clostridium cochlearium, Clostridium tetani and Clostridiales bacterium mt7 were significant species in D13Heart. Lactobacillus reuteri was considered a significant species in D15Cecum. Bacteria at multiple levels with significant differences are also displayed in the MetaStat analysis figures (see Additional file 1, Fig. S3). The similarity percentage (Simper) is a decomposition of the Bray–Curtis distance index, which can quantify the contribution of each species to the difference between two groups. Figure S4 (see Additional file 1) shows that C. butyricum, L. reuteri and E. faecalis contributed much more than other species to the differences between brains and ceca, and the results were similar in the comparison of hearts and ceca.

Fig. 3. Predicted functional profiles of postmortem bacteria of internal organs over the postmortem interval.
A. The relative abundance of KEGG pathways at level 2 in different organs is shown in the heat map. Gene copy numbers of samples within the same sample group were pooled. The value of each functional gene (row) was log10 transformed. The significance test of the gene distribution between groups was performed using the bootstrap Mann–Whitney U test with a cut-off of P < 0.01, FDR < 0.1 and mean counts >10. B. The comparison histograms of the top two different pathways on metabolism (LSD-t, *P < 0.05, **P < 0.01, ***P < 0.001). C. PCoA plot of the third level of metabolic pathway expression differences in internal organs after death.

Significant metabolic differences among different organs. The decomposition process was associated with certain signalling pathways. To explore the connection between postmortem microbiota and metabolic changes within decomposition, Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) analysis was performed. In total, 4927 Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologue groups (KOs) were defined from our data and annotated to three levels of KEGG pathways. The second level of KEGG pathways is shown in the clustering heatmap (Fig. 3A). The top four abundant pathways in all
groups for metabolism were membrane transport, carbohydrate metabolism, amino acid metabolism, and replication and repair. Groups were clustered according to different sample sites based on the relative abundance of KEGG pathways. Brain and heart samples were clustered at different time points and separated from cecum samples. Certain pathways, including metabolism of cofactors and vitamins, cellular processes and signalling and membrane transport, were much lower in ceca than in the brains and hearts (P < 0.05, Fig. 3B, see Additional file 1, Fig. S5). In contrast, carbohydrate metabolism, nucleotide metabolism, replication and repair, and translation were lower in the brain and heart samples than in the cecum samples (P < 0.05, Fig. 3B, see Additional file 1, Fig. S5). The comparison of the relative abundance of KEGG pathways between different sample sites was analysed using SPSS software (version 18.0, Chicago, IL, USA), and the statistically significant results are shown in Fig. 3B.

Postmortem bacterial community change in cecum samples

After comparing the microbial community differences among sample positions, we traced the changes in the microbial communities along the PMI by high-throughput sequencing. Using high-throughput sequencing data of
cecum samples as an example, successive bacterial community changes are described below.

Overview of postmortem bacterial composition in ceca. A total of 18 858 845 raw sequences were generated in the current study. After sequence trimming, quality filtering and removal of chimeras, 17 696 903 high-quality sequences remained, with an average of 73 771 ± 9912 (SD) reads per sample. The rarefaction curves of the Shannon diversity index indicated that species representation in each sample approached the plateau phase, and it was unlikely that more bacteria would be detected with additional sequencing efforts (see Additional file 1, Fig. S6A). These high-quality sequences were clustered into 5099 OTUs by the VSEARCH pipeline using a threshold of 97% identity. The results showed bacteria belonging to 29 phyla, 67 classes, 160 orders, 250 families, 473 genera and 835 species. Among the 29 phyla, three phyla, Firmicutes, Bacteroidetes, and Proteobacteria, were dominant in all samples. Epsilonbacteraeota, Deferribacteres, Actinobacteria and Patescibacteria had a relative abundance of more than 1% of the total abundance (Fig. 4A). The 10 most abundant genera included Lactobacillus, Lachnospiraceae, Prevotellaceae, Muribaculaceae, Helicobacter, Lachnoclostridium, Ruminococcaceae, Dubosiella, Bacteroides and Ruminiclostridium, which accounted for more than 50% of the total sequences (Fig. 4A). At the species level, the most abundant species were L. reuteri, E. faecalis, F. bacterium M10-2, A. bizertensis, Clostridium tetani, Vagococcus lutrae, Lachnospiraceae, Lactobacillus johnsonii, Escherichia shigella and Helicobacter (Fig. 4A).

The cecum microbiome change flow during decomposition. Intestinal microorganisms provided robust community abundance profiles (Fig. 4A, Table 2). Using community analysis at the genus level as an example, several genera exhibited regular changes during decomposition. Lactobacillus was abundant throughout the decomposition period (hour 0 to day 15). Enterococcus, Anaerosalibacter and Clostridium sensu stricto 15 followed a similar trend, increasing to a peak abundance at the advanced decay stage (1 week after death) and decreasing until day 15. Anaerosalibacter, Clostridium sensu stricto 15 and C. cochlearium appeared at day 4. However, the Lachnospiraceae NK4A136 group, Muribaculaceae, Lachnospiraceae, Ruminococcaceae, Prevotellaceae_UCG-001, Lachnospiraceae_UCG-006 and Helicobacter decreased from hour 0 to day 15, and almost disappeared at day 15. At the species level, the relative abundance of L. reuteri, Clostridium tetani E88, L. johnsonii and E. faecalis showed notable variation during decomposition, which may aid in PMI estimation. Firmicutes bacterium M10-2 appeared at day 2 and showed an immediate increase from day 2 to day 4. Enterococcus faecalis appeared at day 2 and showed an increase until day 10. Clostridium tetani E88 appeared at day 4 and showed an increase and then a decrease from then on.

The heat map also showed differences in microbial community abundance during decomposition (Fig. 4B), indicating that there might be discrepancies in the relative abundance of genera between different groups, shown with a colour gradient from deep blue (low abundance) to deep red (high abundance). The diversity and similarity of microbial communities in each group were based on the alpha and beta values at the OTU level. To explore alpha diversity among groups, we chose the ACE, Chao 1, Shannon diversity and Simpson diversity indexes, with the last two indexes considering the richness and evenness of the microbial community (Fig. 4C, see Additional file 1, Fig. S6B). As postmortem degradation progressed, the Shannon index gradually decreased, and other alpha diversity indexes also exhibited similar trends (Fig. 4C, see Additional file 1, Fig. S6B). Only the following genera remained at day 15 postmortem: Gordonibacter, Bifidobacterium, Enterorhabdus, Lactococcus, Clostridium sensu stricto 18, Clostridium sensu stricto 15, Anaerosalibacter, Enterococcus, Dubosiella and Lactobacillus. UniFrac distance is designed to describe the difference between the two groups. The score shown above is represented as a divergence indicator. PCoA based on UniFrac distance revealed the decomposition pattern in two-dimensional space for Bray–Curtis distances (Fig. 4C). Groups clustered based on PMI points shown on the principal coordinate 1 axis (PC1, 16.90% of variance explained). The LEfSe (Segata et al., 2011) analysis of core bacteria at different levels revealed features that most likely explained differences between groups associated with PMI. Nine OTUs were identified as representative of the following five time points: 0 h, 8 h, 2 days, 7 days and 15 days (Fig. 4E). In conclusion, among OTUs from lower taxonomic levels, Ruminococcaceae UCG 014 and Prevotellaceae UCG 001 (for 0 h samples); Lachnospiraceae NK4A136 group (for 8 h samples); Allobaculum (for 2 days samples); F. bacterium M10-2 (for 4 days samples); Clostridium sensu stricto 18 and Clostridium tetani E88 (for 7 days samples); E. faecalis (for 10 days samples), and Firmicutes (for 15 days samples) were significant in explaining the differences in decomposition (Fig. 4E).

The shifts in the probable functions of the gut flora of mice during decomposition were analysed by predicting functions from the 16S rRNA gene sequences using PICRUSt (Fig. 5A). Similarities in the variation trends among different level 2 KEGG pathways during decomposition were observed (Fig. 5C).

Fig. 4. Bacterial community composition in cecum samples changed significantly and consistently over the course of decomposition.
A. The relative abundance of microbial communities at multiple levels from postmortem cecum samples over PMI (the relative abundance of other <0.01). B. The mean Shannon diversity index value decreased over time in cecum samples and PCoA analysis based on Bray–Curtis distance (Student's t-test, *P < 0.05, **P < 0.01, ***P < 0.001). C. Heat map of the microbial 16S rRNA gene sequences (OTUs) at genus-level taxonomic resolution. D. LDA plots of bacteria at different levels as a result of all time points.
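As a concrete reading of the diversity measures used in this section, the following minimal Python sketch computes the Shannon index and the Bray–Curtis dissimilarity from abundance vectors. It is an illustration, not the pipeline used in this study: the function and variable names are hypothetical, the natural logarithm is an assumption (the logarithm base is not stated here), and the example vectors contain only the ten genus-level percentages listed in Table 2 for hour 0 and day 15, so they are partial profiles rather than full communities.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero abundance.

    Assumption: natural logarithm; other tools may use log base 2.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Genus-level relative abundances (%) from Table 2, in table order
# (Lactobacillus ... Helicobacter). Partial profiles, for illustration only.
H0_GENERA = [21.36, 2.55, 0.11, 13.22, 0.01, 4.25, 8.54, 0.00, 3.12, 3.20]
D15_GENERA = [42.77, 25.04, 17.23, 0.03, 5.10, 0.07, 0.04, 3.83, 0.16, 0.00]

if __name__ == "__main__":
    print("Shannon, hour 0:", shannon_index(H0_GENERA))
    print("Shannon, day 15:", shannon_index(D15_GENERA))
    print("Bray-Curtis, hour 0 vs day 15:", bray_curtis(H0_GENERA, D15_GENERA))
```

Because `shannon_index` renormalises its input, it accepts either raw read counts or percentages. UniFrac distances additionally require a phylogenetic tree and are not covered by this sketch.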
The top 10 relative abundances of level 2 KEGG pathways showed that increases occurred at hour 8 and day 7, and decreases occurred at day 4 and day 13, which was similar to the heat map results (Fig. 5C). The expression levels of genes involved in the endocrine system, amino acid metabolism and carbohydrate metabolism pathways showed obvious reductions during decomposition in cecum samples (Fig. 5B).

Table 2. Relative abundance proportion of bacteria of cecum samples during decomposition.

Relative content percentage (%)
Bacteria                            h0     h8    h12     d1     d2     d4     d7    d10    d13    d15
Genera
Lactobacillus                    21.36  22.62  29.08  26.24  26.64  23.92  50.20  24.67  35.04  42.77
Dubosiella                        2.55   0.11   0.02   0.07  17.56  28.95   0.68   5.32  14.83  25.04
Enterococcus                      0.11   0.06   0.94   2.63   4.01  11.82  14.59  26.96  13.04  17.23
Lachnospiraceae NK4A136 group    13.22  17.68  16.57   7.24   1.29   0.23   0.17   0.06   0.12   0.03
Anaerosalibacter                  0.01   0.00   0.00   0.06   0.00   0.14   0.46  12.72  18.27   5.10
Muribaculaceae                    4.25   3.97   3.93   3.44   7.35   4.05   1.69   0.30   0.29   0.07
Lachnospiraceae                   8.54   8.29   4.33   5.07   1.10   0.20   0.45   0.10   0.18   0.04
Clostridium sensu stricto 15      0.00   0.00   0.00   0.00   0.02   1.33   9.42   8.33   4.56   3.83
Ruminococcaceae                   3.12   4.55   4.33   2.85   2.34   1.61   0.58   0.26   0.51   0.16
Helicobacter                      3.20   1.87   3.75   3.26   2.55   0.22   0.14   0.06   0.04   0.00
Species
Lactobacillus reuteri             4.46   4.22   8.32   7.05   7.28   7.49  25.10  11.67   7.38  30.56
Enterococcus faecalis             0.01   0.03   0.05   0.17   0.83   3.10   3.68  15.99   3.40  10.07
Firmicutes bacterium M10-2        0.02   0.10   0.02   0.01  10.16  23.99   0.07   0.15   0.02   0.98
Lactobacillus johnsonii           2.83   5.11   6.65   3.33   2.28   1.33   2.85   0.24   5.41   0.67
Clostridium tetani E88            0.00   0.00   0.00   0.00   0.02   1.30   9.33   8.05   4.46   3.69
Anaerosalibacter massiliensis     0.00   0.00   0.00   0.02   0.00   0.03   0.12   2.74   4.94   1.24
Proteus mirabilis                 0.00   0.07   0.00   0.00   0.29   2.61   0.77   0.36   0.29   0.01

Computing results and PMI estimation model

Feature selection

In the feature ranking list, features with smaller rank values are more important. We selected the top 20, 30, 40, 45 and 50 features to generate five different biomarker sets. To identify the most stable biomarker set, we established five different ANN models, one for each biomarker set. Similarly, each model was run 15 times, and the mean values of the mean absolute error (MAE) and goodness-of-fit (R²) over these runs were calculated and compared (Table 3). The biomarker set with 45 features had the lowest MAE and an R² matching the highest value among the set sizes tested.

PMI prediction model

The biomarker set for predicting PMI consisted of 45 different taxa in cecum samples. Figure 6 shows the prediction results of the ANN on the testing data over 15 runs. The ANN model was applied to two data sets: the cecum data set with 891 different taxa, and the cecum data set with the 45 taxon biomarkers selected as previously described. The evaluation indexes of the model are the MAE and R², both computed on the testing data. As the training and testing of the ANN model are affected by the partition of the data sets, the model for each data set was repeated 15 times, and the evaluation indexes were calculated as the mean value of these 15 experiments. The mean MAEs (SD) of the ANN models in the data set with all 891 features and the data set with 45 features were 14.483 (4.443) and 18.684 (3.566) respectively. The mean R² (SD) of the models in these two data sets was 0.951 (0.032) and 0.948 (0.021) respectively. Furthermore, we specifically considered the prediction of testing samples that decomposed within 24 h. The mean MAEs (SD) of the original data set and the biomarker data set were 1.528 (0.814) and 7.425 (1.949) respectively. That is, for mouse remains that decomposed in less than 24 h, the biomarker data set can predict PMI with a mean error of 7.43 h. Because the original set provided the ANN model with more features than the biomarker set, the MAE of the biomarker set was larger than that of the original set.

Discussion

Forensic microbiology is a developing tool for estimating PMI in forensic investigations (Metcalf, 2019). Currently, there remains a paucity of knowledge and motivation for an in-depth interpretation concerning postmortem microbial communities and their transmigration within internal body sites. This study used a 16S rDNA sequencing approach to trace changes in bacterial community structure with PMI in internal organs of murine remains. In
addition, very large microbial sequencing data sets were analysed with machine learning methods to extract informative microbial species and establish a PMI estimation model.

Fig. 5. Predicted function in the metabolic field.
A. Relative abundance of metabolic pathways at multiple levels. B. Heat map of the top 50 abundant pathways at level 2. C. Line charts exhibit an obvious shift of the level 2 KEGG pathways.

This study differs from previous studies, which examined microbial communities in the skin, bone, cadaver-associated soils and abdomen of mice, swine, and human remains (Hyde et al., 2013; Metcalf et al., 2013; Pechal et al., 2014; Damann et al., 2015); the microbial data in those studies fluctuated with the surrounding biological and environmental conditions (e.g. weather, insect accessibility and cadaver position).
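The MAE and R² values reported for the PMI models are means over repeated random partitions of the data into training and testing sets. The sketch below illustrates that evaluation protocol in plain Python. It rests on several assumptions: R² is taken to be the coefficient of determination, the 20% test fraction and all function names are hypothetical, the toy data are invented for illustration, and a one-feature least-squares line stands in for the ANN, which is not reproduced here. Only the 15 repetitions and the ten sampling times come from the text.

```python
import random
import statistics


def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Placeholder model: least-squares line on one feature (NOT the ANN)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature is constant in the training data")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: slope * value + intercept


def repeated_evaluation(x, y, repeats=15, test_fraction=0.2, seed=0):
    """Repeat a random train/test split; return (mean, SD) of MAE and of R2."""
    rng = random.Random(seed)
    n_test = max(2, round(len(x) * test_fraction))
    maes, r2s = [], []
    for _ in range(repeats):
        order = list(range(len(x)))
        rng.shuffle(order)
        test, train = order[:n_test], order[n_test:]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        actual = [y[i] for i in test]
        predicted = [model(x[i]) for i in test]
        maes.append(mean_absolute_error(actual, predicted))
        r2s.append(r_squared(actual, predicted))
    return ((statistics.mean(maes), statistics.stdev(maes)),
            (statistics.mean(r2s), statistics.stdev(r2s)))


if __name__ == "__main__":
    # Toy data: PMI in hours at the ten sampling times (h0 ... d15), three
    # replicates each, with an invented noisy feature that grows with PMI.
    rng = random.Random(1)
    pmi_hours = [0, 8, 12, 24, 48, 96, 168, 240, 312, 360] * 3
    feature = [0.1 * h + rng.gauss(0, 2) for h in pmi_hours]
    (mae_mean, mae_sd), (r2_mean, r2_sd) = repeated_evaluation(feature, pmi_hours)
    print(f"MAE = {mae_mean:.2f} ({mae_sd:.2f}) h, R2 = {r2_mean:.3f} ({r2_sd:.3f})")
```

Reporting the mean and SD over the repetitions, rather than a single split, reflects the dependence of the result on how the data are partitioned.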
10 R. Liu et al.
Table 3. Evaluation test of biomarker set size with MAE and R2.

Size of biomarker set    Mean MAE ± SD     Mean R2 ± SD
20                       23.575 ± 3.073    0.922 ± 0.022
30                       20.535 ± 4.337    0.946 ± 0.033
40                       18.844 ± 3.301    0.948 ± 0.023
45                       18.683 ± 3.566    0.948 ± 0.021
50                       20.822 ± 2.811    0.940 ± 0.018

Considering the influence of environmental factors, internal organs (brains, hearts and ceca) were sampled in this study, and microorganisms of ceca served as a contrast to the postmortem microorganisms in brain and heart samples. Furthermore, the decomposition of these samples can represent the decomposition of cranial, thoracic and abdominal cavities. Referring to previous work (Burcham et al., 2019; Can, Javan, Pozhitkov, & Noble, 2014), most internal organs such as brains or hearts are devoid of microorganisms in a healthy condition in the mammalian body, which is in accordance with the fact that the quantity of microorganisms in brain and heart samples was not sufficient for high-throughput sequencing process until 2 days after death in this study. Thereafter, brains and hearts were invaded by Morganella at an earlier PMI (day 2) and Clostridium at later decomposition as organs begin to emulsify at day 4. Paeniclostridium and C. novyi were detected in brain samples at day 4, and A. bizertensis appeared in brain samples at day 13 (Fig. 2A), which may contribute to PMI estimation as specific biomarkers. Clostridium novyi is a pathogenic (Watanabe et al., 2019) anaerobic bacterium. It is found in soil and faeces and becomes abundant with the consumption of oxygen. Anaerosalibacter bizertensis is a kind of salt-tolerant microorganism (Rezgui et al., 2012). These dominant postmortem microorganisms detected in brains and hearts are known to be pathogenic microorganisms. We infer that the immune system breaks down after death (Alan & Sarah, 2012), followed by proliferation of microorganisms from the digestive system to the whole body through capillaries of the vascular and
Fig. 6. The prediction results of the ANN model based on 45 biomarkers in the intestine.
In detail, the main picture shows the prediction time of all testing data, and the subgraph in the upper left specifically shows the prediction
corresponding to mouse remains that decomposed within 24 h. The diagonal value is the actual PMI of the remains. The closer the predicted value
is to the diagonal, the more accurate this prediction is.
lymphatic system (Paczkowski & Schutz, 2011) and the mucus membranes in the respiratory system (Gill et al., 1976).

A significant difference was observed between the microbial communities of the brain, heart and cecum during 7-day decomposition, as reported previously on human cadavers (Burcham et al., 2019). Interestingly, E. faecalis, C. cochlearium and A. bizertensis all appeared in the three organs in the advanced decomposition of corrupted tissues (Fig. 2A). This demonstrates that the decomposers of different organs were similar at advanced decay stages. Previous studies (Brenner et al., 2001) also showed that C. cochlearium, which was discovered in soil, human oral cavity, faeces, wounds and septic infections, is a crucial contributor to the decomposition of remains. In addition, E. faecalis caused the majority of human enterococcal infections and contributed to the decomposition of corpses in previous reports (Metcalf et al., 2013). These three species may help describe the characteristics of different decay stages in the carrion decomposition process.

To discover more significant microbial features of internal organs during the decomposition period, cecum samples were taken as an example, as they had a more abundant and robust profile of microorganisms than brain and heart samples did. Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria were predominant in all PMI groups at the phylum level, which coincided with several reports based on the human corpse system (Finley et al., 2015). Firmicutes were found in the oral cavity during prebloat, while Proteobacteria were found during postbloat (Hyde et al., 2013). Proteobacteria are commonly associated with the spoiling of meat and have been found on the hides of slaughtered animals (Gill and Newton, 1978). Actinobacteria play a vital role in the recycling of refractory biological components through decomposition and humus formation (Gill and Newton, 1978), and a group of Actinobacteria with filamentous form have been shown to be important decomposers of polysaccharides, proteins and lipids through the generation of extracellular enzymes, and have been found in grave soil and bones at an advanced decay stage in dead bodies (Cobaugh et al., 2015; Damann et al., 2015). We found that the relative abundance of Epsilonbacteraeota decreased immediately after death and disappeared at day 7, which is consistent with previous reports that many Epsilonbacteraeota species colonize the digestive tracts of animals and serve as symbionts or pathogens. In addition, carbon, nitrogen and sulphur cycling have been reported as the primary drivers of functional divergence in environmental Epsilonbacteraeota (Waite et al., 2017). At the family level, the relative abundances of Ruminococcaceae and Lachnospiraceae decreased, while those of Enterococcaceae and Clostridiaceae increased during the 15-day decomposition. The former two are inherent to the intestinal tract, and the latter two have been reported as important contributors to decomposition. Additionally, at the genus level, facultative anaerobes such as Enterococcus, Dubosiella and Anaerosalibacter (Fig. 4A), which are widely recognized as opportunistic pathogens and are associated with sewage and animal matter, became abundant after rupture. Anaerosalibacter is a halotolerant bacterium isolated from sludge (Rezgui et al., 2012) and stool (Dione et al., 2016). The Lachnospiraceae NK4A136 group and Bacteroidetes, which occur in the human and mammal gut microbiota (Debruyn and Hauther, 2017), became less abundant after rupture at day 4. Several species also showed indicative changes in relative abundance during PMI in this study. For instance, Anaerosalibacter massiliensis, Proteus mirabilis, Clostridium tetani E88, and L. reuteri increased, whereas L. johnsonii decreased with PMI. Enterococcus faecalis and F. bacterium M10-2 increased, following a decrease during the decomposition procedure. These species may provide important information in the PMI estimation model. In conclusion, based on the trends in bacterial composition changes in this study, we found that the diversity and evenness of the cecum microbiome gradually decreased within 15 days of decomposition. Also, in the early stage of death, microbial participants depend on bacteria inherent in ceca, which were similar to those in the stool of healthy humans, as reported by the Human Microbiome Project (Huttenhower et al., 2012). These shifts in bacterial taxa were related to the shifts in functional gene abundances, according to the functional prediction made with PICRUSt. We detected marked predicted increases in genes associated with energy metabolism, including carbohydrate metabolism and amino acid metabolism, on day 7 and day 15, whereas these had decreased on day 4 (Fig. 5C). These results suggested that the level of amino acid and carbohydrate degradation inside murine corpses decreased after the body rupture on day 3 because nutrient-rich fluids permeated the surrounding environment of the corpses. Consistently, the functional gene abundances associated with amino acid metabolism were significantly increased in grave soil samples around human corpses (Metcalf et al., 2016). Based on the postmortem microbiome correlation between this mouse decomposing system and the human decomposing system, the microbes found related to decomposition in this work could provide fruitful information for projecting to humans in forensics investigations.

Next, we focused on establishing a mathematical model to predict PMI more accurately. Some previous research teams utilized class- or phylum-level taxonomy data sets to predict PMI with
machine learning methods (Johnson et al., 2016; Belk et al., 2018). Johnson and colleagues (2016) developed a k-nearest neighbour regressor, combined with the microbial data set from all nasal and ear samples at the phylum level, which predicted the PMI with an average error of 55 accumulated degree days, approximately 3 days. Metcalf and colleagues (2013) established an RF model combined with a cadaver-associated skin data set for PMI prediction, which allowed them to estimate PMI within 3.30 ± 2.52 days (MAE ± SD) during 34-day decomposition. To the best of our knowledge, this is the first study to use OTU-level data sets combined with an ANN model to estimate PMI. In this study, we removed OTUs with low relative abundance and low relative abundance variance during decomposition, which were considered noise and low-contribution features, and finally 891, 160 and 116 features were identified in the cadaver-associated cecum, brain and heart bacterial data sets respectively. Figure 1 indicates that the cecum data set predicted PMI through three models with lower MAE and higher R2 than the brain and heart data sets. The results of this data mining indicate that the most robust model for PMI estimation is the cadaver-associated cecum data set combined with the ANN model and 16S rRNA gene markers summarized at OTU-level taxonomies. Among the 891 features ranked by the support vector machine recursive feature elimination (SVM-RFE) and RF models, the top 45 biomarker set performed better than the other sets tested, with the minimum MAE and maximum R2 (Table 3). Regarding species annotation, Clostridium tetani E88, A. massiliensis, Vagococcus fluvialis, Candidatus Arthromitus sp. SFB mouse Japan and Lactobacillus animalis were included in the ANN model for PMI estimation. Because microbial succession is more rapid and diverse before 24 h than in later stages of decomposition (Metcalf et al., 2013; Cobaugh et al., 2015; Metcalf et al., 2016), we utilized the ANN model to predict PMI specifically within 24 h. Our ANN model predicted PMI with an MAE of 14.483 (4.443) h during 15-day decomposition and 1.528 (0.814) h within 24-h decomposition.

This result demonstrated that the ANN model with predictable postmortem microbiome data sets is more accurate for PMI estimation during 15-day decomposition. Currently, the greatest methodological and technological gap (Metcalf, 2019) in developing generalizable microbial models for predicting PMI is the limited size of cadaver-associated data sets and low sampling frequency. Our research team is currently trying to overcome these barriers and establish a larger data set (240 cecum samples) to train, test and generate robust models. However, we have a limited sampling time frame, and a longer sampling span will allow for a more thorough understanding of microbial behaviour in the decomposition process. Therefore, a longer time frame to study microbial community changes and the effect of temperature and humidity on microbial succession need to be established in future work. In addition, the sample positions in this study only reflected the internal postmortem microbial succession. The influence of the outside environment and the complicated interactions between the microbiome and its host should be the focus of future studies. This present work formed a preliminary basis for future investigations with human remains. However, how to leverage the results of this study in practical forensic investigations with human corpses needs further study.

In conclusion, the present study evaluated the postmortem microbiome community structure during the 15-day decomposition. Consistent with previous works, the microbial decomposers of the brain, heart and cecum became more similar as PMI progressed. We utilized machine learning algorithms to evaluate the effectiveness of different data sets, and the results indicate that the most robust model of PMI estimation is postmortem bacteria data in cecum samples associated with the ANN model at the OTU level. We also detected several species (Clostridium tetani E88, A. massiliensis, L. reuteri, E. faecalis, F. bacterium M10-2) that may facilitate the establishment of PMI prediction models. Finally, we developed an ANN model that predicted PMI with an MAE of 14.5 ± 4.4 h during 15-day decomposition and 1.5 ± 0.8 h within 24-h decomposition. This study provides directions for further research on leveraging machine learning models with microbial-based data sets for estimating PMI.

Experimental procedures

Sample collection and experimental set-up

In this experiment, 240 mice (strain C57BL/6, 18–25 g, males, 6–10 weeks) were acquired from the Experimental Animal Centre of Xi’an Jiaotong University and cohoused and given a 2-week normal diet. After sacrifice by cervical dislocation, the mice were placed on sterile plates with ambient temperature (Ta, 25 ± 1.5 °C) and moderate relative humidity (RH, 50 ± 7%). This work used only dead animals that had been sacrificed as a by-product of other research and was deemed not to be animal research by the Institutional Animal Use and Care Committee of Xi’an Jiaotong University. Mice were euthanized humanely under approval No. 2017-288. We recorded a visual body score estimate for the head and torso according to Megyesi and colleagues (2005) (Table 1). Ten time points (0 h, 8 h, 12 h, 1 day, 2 days, 4 days, 7 days, 10 days, 13 days and 15 days) were set spanning 15 days, and at each time point, brains, hearts and ceca were removed from 24 mice under strict aseptic

operation and stored at −80 °C until further processing. The organs could not be removed from the remains because of advanced corruption occurring from 15 days after death. In total, 144 samples (D2Brain, D4Brain, D7Brain, D10Brain, D13Brain, D15Brain, D2Heart, D4Heart, D7Heart, D10Heart, D13Heart, D15Heart, D2Cecum, D4Cecum, D7Cecum, D10Cecum, D13Cecum and D15Cecum) were used to compare the differences in microbial composition between different organs. A total of 240 cecum samples (H0Cecum, H8Cecum, H12Cecum, D1Cecum, D2Cecum, D4Cecum, D7Cecum, D10Cecum, D13Cecum and D15Cecum) were used in deep learning algorithms to estimate PMI.

DNA extraction, PCR amplification and high-throughput sequencing

Total genomic DNA was extracted from 25 mg of organ samples (brains, hearts and ceca) using QIAamp® DNA Mini Kit (Qiagen, Germany) following the protocols recorded in the handbook. Final purified DNA was eluted in a final volume of 100 μl. The extracted bacterial genomic DNA was estimated by measuring the absorbance at 260 nm (A260) using a NanoDrop spectrophotometer (Thermo Scientific, Inc., Waltham, MA, USA) before downstream processing. Only samples harbouring at least 1 ng/μl DNA and yielding a clean spectrogram with a peak at 260 nm were included in subsequent amplification steps.

For IonS5™XL sequencing, total genomic DNA was subjected to PCR amplification targeting an informative portion of the 16S rRNA variable region 3–4 (V3, V4) using the bacterial primers 341F (5′-CCTAYGGGRBGCASCAG-3′) and 806R (5′-GGACTACNNGGGTATCTAAT-3′). Negative controls included no template controls for DNA extraction and PCR amplification. PCR was performed using a T100™ Thermal Cycler (Bio-Rad, Hercules, USA) with the following cycling parameters: 95 °C for 5 min, followed by 34 cycles of 94 °C for 1 min, 57 °C for 45 s, 72 °C for 1 min, and a final elongation step of 72 °C for 10 min and, finally, 16 °C for 5 min. PCR products were detected by agarose gel electrophoresis. The target bands were removed under a UV lamp and recovered by a GeneJET Gel Extraction Kit (Thermo Scientific). Using the Ion Plus Fragment Library Kit 48 rxns (Thermo Scientific), we finished library construction. After Qubit quantification and library testing, each amplicon pool was sequenced using one lane of the Ion S5™XL plate.

High-throughput sequencing reads were merged and quality filtered using Cutadapt (Langille et al., 2013a) (V1.9.1, Langille, 2013b). After removing low-quality sequences, reads were divided into groups according to the barcode. When the barcode and primer sequences were removed, only the raw data remained. To obtain clean data, chimaeras were filtered and trimmed according to the species annotation database in the VSEARCH pipeline. High-throughput sequencing reads were classified into OTUs in the UPARSE pipeline (v7.0.1001) within a 0.03 difference (equivalent to 97% similarity) (Haas et al., 2011).

The Mothur method and SSUrRNA database in SILVA132 were utilized for taxonomy assignment, which was performed at various levels according to representative sequences of each OTU. The representative sequences of all OTUs were then aligned to the Greengenes reference alignment using MUSCLE (Quast et al., 2013) (Version 3.8.31, http://www.drive5.com/muscle/), and this alignment was used to construct a phylogenetic tree. Data from each sample were homogenized, with the standard of the least amount of data in the samples. Alpha diversity was measured by the Shannon, Chao1, Simpson and ACE indexes in the bacterial communities in Mothur software (Tai et al., 2016).

All statistical analyses were conducted in the R environment (v2.15.3; http://www.r-project.org/). To assess the microbial diversity and abundance, the alpha (α) diversities of OTU richness and various indexes (Shannon index, Chao 1, ACE, Simpson) were calculated, while the microbial beta diversity was estimated according to the UniFrac distance between the samples. To measure the sequencing depth, the coverage index was used. The PD_whole_tree index described the phylogenetic diversity of the microbiome in the samples. Beta diversity was calculated to compare differences between microbial community profiles, and the data are shown as the results of PCoA. Principal component analysis, PCoA and NMDS coordination analysis were performed to visualize the similarities or dissimilarities of variables that best represented the pairwise distances between sample groups, which were displayed by the WGCNA package, stat packages and ggplot2 package in R software (v2.15.3).

Statistical analyses were performed to find significant differences in microbial community structure between groups. To identify microbial markers in specific organs or at specific time points, Student's t-test, MetaStat, Simper analyses and LEfSe analysis were implemented. UPGMA (Li and Xu, 2007) is a hierarchical clustering method that follows a three-step procedure. First, UPGMA begins with the two closest OTUs and combines them to form a new OTU. Second, the arithmetic mean distances between the new OTU and all other OTUs were recalculated. Finally, the process was repeated, and a complete phylogenetic tree was obtained. Analysis of molecular variance was utilized to test discrepancy among groups based on weighted or non-weighted UniFrac distance matrix in Mothur software. The
PICRUSt (Langille et al., 2013a) together with the KEGG database as a reference was utilized to predict metagenomic functional content based on the 16S rRNA sequences (Kanehisa et al., 2014).

Computational methods

Data materials. The data for this project were produced by collecting postmortem organic samples of mouse cadavers during decomposition in controlled environmental conditions. DNA collected from each sample was analysed using 16S rDNA high-throughput sequencing, which allowed for species detection and relative abundance calculation in all samples. Therefore, the raw data included the relative abundance of bacterial sequences in postmortem organs (brains, hearts and ceca) and PMI labels. The sample sizes of the brain, heart and cecum data sets were 48, 48 and 240 respectively. Brain and heart data included six PMI points (including 48, 96, 168, 240, 312 and 360 h after death). Four more time points (including 0, 8, 12 and 24 h) were obtained in the cecum data set.

Data preprocessing. Each sample in the brain data set and heart data set contained the relative abundance values of 3414 different taxa. Moreover, the individual sample in the cecum data set contained 4099 different taxa. However, the relative abundance values of many taxa were mostly zero in these three data sets, and data preprocessing was required. This preprocessing process consisted of three steps. First, the taxon whose relative abundance was zero in most samples in each data set was removed. Specifically, samples from each data set were grouped according to the time point of death. Hence, there are 10 groups, six groups and six groups of samples in the cecum data set, brain data set and heart data set respectively. For each data set, if the relative abundance of a taxon was zero in more than 60% of samples in all groups, then this taxon was deleted in this data set. Second, taxa with low relative abundances were removed. Briefly, if the mean relative abundance of a taxon was lower than 3 in all groups of a data set, then it was removed from this data set. Third, the taxon with low relative abundance variance was removed. For each taxon, we calculated the variance of the mean relative abundance in each group. Then, the taxa corresponding to the first 80% of the variance in each data set were regarded as informative features and reserved. All removed features are considered as noise and low-contribution features in this study. Finally, samples in the cecum data set, brain data set and heart data set contained 891, 160 and 116 OTUs respectively.

Data set selection. RF, SVM and ANN models were applied to these three data sets to predict the PMI of the mice postmortem model. Each data set was divided into two parts: 70% was the training data and the remaining 30% was the testing data. The accuracy of the models was measured by the MAE, which calculated the deviation of the predicted from the observed values and represented the average prediction error in the same unit as the original data. The efficiency of the models was evaluated by the goodness-of-fit test (R-squared, R2) (Rights and Sterba, 2019). In particular, each model was run and evaluated 15 times in each data set, and the mean values of the MAE and R2 are shown in Fig. 1.

Significant microbial markers selection. In this study, the identification of strain biomarkers was built on an SVM-RFE model and RF regression model. Again, each model was run 15 times, and their feature importance gave the ranking of features. Finally, we summed up all rankings of corresponding features in each experiment of the two models. Then, features were reordered to generate the final feature ranking list.

Deep learning algorithms applied in PMI estimation. To determine the most robust model, we applied SVM-RFE model, RF regression model and ANN model to the brain, heart, cecum postmortem microbial data sets and evaluated the model performance with the MAE and R2 of PMI prediction.

RF (Breiman, 2001; Mamoshina et al., 2018) is a classification and regression model. This model assembles many tree-structured predictors and aggregates their results (majority vote for classification and averaging for regression). Tree-structured predictions are generated based on the bootstrap samples of the training data, and the candidate feature subset of this tree is randomly selected (Svetnik et al., 2003). Each tree grows to its maximum depth without pruning. In addition, the RF model also assesses the importance of different features (Breiman, 2001; Kamińska, 2019). In this study, an RF model was established with regression trees. For every regression tree, training samples are chosen with replacement from the original training data set. A candidate feature subset is randomly selected from the feature set, and each node of this tree will choose one feature from this subset to split. The splitting criterion of a regression problem minimizes the mean squared error at the node. During the process of generating a forest, the size of the candidate feature subset is fixed. For the regression problem, the output of this forest is formed by averaging all regression trees. The importance of features (Archer and Kimes, 2008; Genuer et al., 2010) was determined using out-of-bag observations. Specifically, the importance of a variable was measured by calculating the increase in the prediction error (mean square error) on the condition that only that variable is permuted.
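
The permutation-based importance described for the RF model (the increase in mean square error when only one variable is permuted) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the toy predictor that stands in for a trained forest, the toy data that stand in for out-of-bag observations, and the choice of 15 repeats. It is not the authors' implementation.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=15, seed=0):
    """Mean increase in MSE when a single feature column is shuffled.

    predict: any fitted model exposed as a function of one sample (a list).
    X, y: held-out samples and labels (a stand-in for out-of-bag data).
    n_repeats: hypothetical number of shuffles averaged per feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # permute only feature j
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in for a trained forest: uses feature 0 only."""
    return 10.0 * row[0] + 0.0 * row[1]


# Toy data: feature 0 determines the label, feature 1 is irrelevant.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_predict(row) for row in X]
imp = permutation_importance(toy_predict, X, y)
```

In this toy case the irrelevant feature receives an importance of exactly zero, while shuffling the informative feature raises the error, which is the behaviour the ranking step relies on.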
An SVM based on recursive feature elimination (Guyon et al., 2002; Sahran et al., 2018) is a popular feature selection method. In this study, we used the linear SVM-RFE to perform feature selection. The steps involved in the SVM-RFE algorithm can be briefly described as follows: (i) train a linear SVM based on the training data set; (ii) compute the ranking criteria of features according to their weights in the SVM model. Specifically, the ranking criterion $c_i$ of a feature is $c_i = (w_i)^2$, where $w_i$ is the corresponding weight of this feature; (iii) rank the features by their importance and eliminate the feature with the smallest ranking criterion; (iv) update the feature rank list; and (v) repeat these processes. Moreover, a 10-fold cross-validation algorithm is combined with SVM-RFE to determine the number of features adaptively.

An ANN model (LeCun et al., 2015; Sahran et al., 2018) was used in this study with four neuron layers (including two hidden layers). The ANN model tends to make every neuron have a similar importance, so it is not suitable for feature selection. The input layer has 45 neurons that correspond to the biomarkers in cecum as described previously. Two hidden layers have 23 and 12 neurons respectively. The output layer has only one neuron to yield the predicted value of PMI. In the last three layers, the rectified linear unit activation function was used to enhance model non-linearity. The error function in this model is the mean square error. Moreover, to minimize this error function, the backpropagation algorithm is used to adjust the weights of connections in the network. The performance of the model is evaluated based on the MAE and goodness-of-fit of the testing data.

Evaluation of the regression model. We calculate the MAE and R2 on the testing data to assess the regression model performance. To simplify, we assume that the observed value and prediction value of N samples are $y_i$ and $\hat{y}_i$ $(i = 1, 2, \ldots, N)$. The MAE of all these samples is calculated as

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| . \]

This evaluation criterion can intuitively measure the error of time prediction. R2 describes how well the model fits the observations (Lewis-Beck and Skalaban, 1990). In detail,

\[ R^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} , \]

where $\hat{y}_i$, $y_i$ and $\bar{y}$ are the prediction value, observed value and the mean value of observations respectively.

Abbreviations

AI    artificial intelligence
ANN   artificial neural network
MAE   mean absolute error
PMI   postmortem interval
RF    random forest
SVM   support vector machine

ACKNOWLEDGEMENTS

The authors thank Prof. Jiru Xu for scientific design and methodology direction. The authors also thank Shanghai Majorbio Bio-pharm Technology and Novogene for insightful assistance with the DNA sequencing and metagenomics analysis. This study was funded by the Council of the National Natural Science Foundation of China (Nos. 81730056, 81722027, 81273339, 11801435 (MS)). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTIONS

Ruina Liu, Qi Wang and Zhenyuan Wang contributed to the conception and design of the work. Ruina Liu, Haohui Zhang, Di Wu, Kai Yu, Wumin Cai and Gongji Wang contributed to the acquisition of data. Ruina Liu, Yuexi Gu and Mingwang Shen contributed to the analysis and interpretation of data. All co-authors contributed to the critical revision of the manuscript. All co-authors contributed to the final approval of the manuscript. Ruina Liu and Yuexi Gu contributed to the drafting of the article. All the authors fulfil the ICMJE criteria for authorship.

DATA AVAILABILITY STATEMENT

The data sets used and analysed during the current study are available from the corresponding author on reasonable request.

References

Alan, G., and Sarah, J.P.J.T.B. (2012) Microbes as forensic indicators. 29: 311.
Archer, K.J., and Kimes, R.V. (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52: 2249–2260.
Belk, A., Xu, Z.Z., Carter, D.O., Lynne, A., Bucheli, S., Knight, R., and Metcalf, L.J. (2018) Microbiome data accurately predicts the postmortem interval using random forest regression models. Genes 9: 104.
Benbow, M.E., Pechal, J.L., Lang, J.M., Erb, R., and Wallace, J.R. (2015) The potential of high-throughput metagenomic sequencing of aquatic bacterial communities to estimate the postmortem submersion interval. J Forensic Sci 60: 1500–1510.
Breiman, L.J.M.L. (2001) Random Forests. Mach Learn 45: 5–32.
Brenner, D.J., Krieg, N.R., and Staley, J.T. (2001) Bergey’s Manual® of Systematic Bacteriology, Vol. 38. New York: Springer-Verlag, pp. 443–491.
Burcham, Z.M., Pechal, J.L., Schmidt, C.J., Bose, J.L., Rosch, J.W., Benbow, M.E., and Jordan, H.R. (2019) Bacterial community succession, transmigration, and
differential gene transcription in a controlled vertebrate decomposition model. Front Microbiol 10: 745.
Can, I., Javan, G.T., Pozhitkov, A.E., and Noble, P.A. (2014) Distinctive thanatomicrobiome signatures found in the blood and internal organs of humans. J Microbiol Methods 106: 1–7.
Cobaugh, K.L., Schaeffer, S.M., and DeBruyn, J.M. (2015) Functional and structural succession of soil microbial communities below decomposing human cadavers. PLoS One 10: e0130201.
Damann, F.E., Williams, D.E., and Layton, A.C. (2015) Potential use of bacterial community succession in decaying human bone for estimating postmortem interval. J Forensic Sci 60: 844–850.
Debruyn, J.M., and Hauther, K.A.J.P. (2017) Postmortem succession of gut microbial communities in deceased human subjects. PeerJ 5: e3437.
Dione, N., Sankar, S.A., Lagier, J.C., Khelaifia, S., Michele, C., Armstrong, N., et al. (2016) Genome sequence and description of Anaerosalibacter massiliensis sp. nov. New Microbes New Infect 10: 66–76.
Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., and Thrun, S.J.N. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542: 115–118.
Feng, H., Zhang, X., and Zhang, C.J.N.C. (2015) mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data. Nat Commun 6: 7816.
Ferreira, P.G., Muñozaguirre, M., Reverter, F., Godinho, C.P.S., Sousa, A., Amadoz, A., et al. (2018) The effects of death and post-mortem cold ischemia on human tissue transcriptomes. Nat Commun 9: 490.
Finley, S.J., Benbow, M.E., and Javan, G.T. (2015) Microbial communities associated with human decomposition and their potential use as postmortem clocks. Int J Legal Med 129: 623–632.
Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010) Variable selection using random forests. Pattern Recognit Lett 31: 2225–2236.
Gill, C.O., Penney, N., and Nottingham, P.M. (1976) Effect of delayed evisceration on the microbial quality of meat. Appl Environ Microbiol 31: 465–468.
Gill, C.O., and Newton, K.G. (1978) The ecology of bacterial spoilage of fresh meat at chill temperatures. Meat Sci 2: 207–217.
Guo, J., Fu, X., Liao, H., Hu, Z., Long, L., Yan, W., et al. (2016) Potential use of bacterial community succession for estimating post-mortem interval as revealed by high-throughput sequencing. Sci Rep 6: 24197.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V.J.M.L. (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46: 389–422.
Haas, B.J., Gevers, D., Earl, A.M., Feldgarden, M., Ward, D.V., Giannoukos, G., et al. (2011) Chimeric 16S rRNA sequence formation and detection in sanger and 454-pyrosequenced PCR amplicons. Genome Res 21: 494–504.
Henßge, C., and Madea, B. (1996) Estimation of the time since death in the early post-mortem period. Forensic Sci Int 144: 167–175.
Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., and Badger, J.H. (2012) Structure, function and diversity of the healthy human microbiome. Nature 486: 207–214.
Hyde, E.R., Haarmann, D.P., Lynne, A.M., Bucheli, S.R., and Petrosino, J.F.J.P.O. (2013) The living dead: bacterial community structure of a cadaver at the onset and end of the bloat stage of decomposition. PLoS One 8: e77733.
Hyde, E.R., Metcalf, J.L., Bucheli, S.R., Lynne, A.M., and Knight, R. (2017) Microbial Communities Associated with Decomposing Corpses. In Forensic Microbiology. Carter, D.O., Tomberlin, J.K., Benbow, M.E. and Metcalf, J.L. (eds). Hoboken, NJ: Wiley.
Javan, G.T., Finley, S.J., Can, I., Wilkinson, J.E., Hanson, J.D., and Tarone, A.M. (2016) Human Thanatomicrobiome Succession and Time Since Death. Sci Rep 6: 29598.
Johnson, H.R., Trinidad, D.D., Guzman, S., Khan, Z., Parziale, J.V., DeBruyn, J.M., and Lents, N.H. (2016) A machine learning approach for using the postmortem skin microbiome to estimate the postmortem interval. PLoS One 11: e0167370.
Kaatsch, H.J., Stadler, M., and Nietert, M. (1993) Photometric measurement of color changes in livor mortis as a function of pressure and time. Int J Legal Med 106: 91–97.
Kamińska, J.A. (2019) A random forest partition model for predicting NO2 concentrations from traffic flow and meteorological conditions. Sci Total Environ 651: 475–483.
Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2014) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42: D199–D205.
Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L., et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172: 1122–1131.e1129.
Kikuchi, K., Kawahara, K.I., Biswas, K.K., Ito, T., Tancharoen, S., Shiomi, N., et al. (2010) HMGB1: a new marker for estimation of the postmortem interval. Exp Therap Med 1: 109.
Langille, M.G., Zaneveld, J., Caporaso, J.G., McDonald, D., Knights, D., Reyes, J.A., et al. (2013a) Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31: 814–821.
Langille, M.G.I., Jesse, Z., J Gregory, C., Daniel, M.D., Dan, K., Reyes, J.A. et al. (2013b) Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. 31: 814.
LeCun, Y., Bengio, Y., and Hinton, G. (2015) Deep learning. Nature 521: 436–444.
Lewis-Beck, M.S., and Skalaban, A. (1990) The R-squared: some straight talk. Political Anal 2: 153–171.
Li, C., Ma, D., Deng, K., Chen, Y., Huang, P., and Wang, Z. (2017) Application of MALDI-TOF MS for estimating the postmortem interval in rat muscle samples. J Forensic Sci 62: 1345–1350.
Li, W.C., Ma, K.J., Lv, Y.H., Zhang, P., Pan, H., Zhang, H., et al. (2014) Postmortem interval determination using 18S-rRNA and microRNA. Sci Justice 54: 307–310.
Li, Y., and Xu, L. (2007) Improvement for unweighted pair group method with arithmetic mean and its application. J Beijing Univ Technol 33: 1333–1339.
Mamoshina, P., Volosnikova, M., Ozerov, I.V., Putin, E., Skibina, E., Cortese, F., and Zhavoronkov, A. (2018)
Machine learning on human muscle transcriptomic data Tai, N., Peng, J., Liu, F., Gulden, E., Hu, Y., Zhang, X., et al.
for biomarker discovery and tissue-specific drug target (2016) Microbial antigen mimics activate diabetogenic
identification. Front Genet 9: 242. CD8 T cells in NOD mice. J Exp Med 213: 2129–2146.
Megyesi, M.S., Nawrocki, S.P., and Haskell, N.H. (2005) Waite, D.W., Vanwonterghem, I., Rinke, C., Parks, D.H.,
Using accumulated degree-days to estimate the postmor- Zhang, Y., Takai, K., et al. (2017) Comparative genomic
tem interval from decomposed human remains. J Forensic analysis of the class epsilonproteobacteria and proposed
Sci 50: 618–626. reclassification to epsilonbacteraeota (phyl. nov.). Front
Metcalf, J.L. (2019) Estimating the postmortem interval using Microbiol 8: 682.
microbes: knowledge gaps and a path to technology adop- Wang, L., Chen, M., Wu, B., Liu, Y.C., Zhang, G.F.,
tion. Forensic Sci Int Genet 38: 211–218. Jiang, L., et al. (2018) Massively parallel sequencing of
Metcalf, J.L., Parfrey, L.W., Gonzalez, A., Lauber, C.L., forensic STRs using the ion chef™ and the ion S5™ XL
Dan, K., Ackermann, G., et al. (2013) A microbial clock systems. J Forensic Sci 63: 1692–1703.
provides an accurate estimate of the postmortem interval Wang, Q., He, H., Li, B., Lin, H., Zhang, Y., Zhang, J., and
in a mouse model system. Elife 2: e01104. Wang, Z.J.P.O. (2017a) UV-Vis and ATR-FTIR spectro-
Metcalf, J.L., Xu, Z.Z., Weiss, S., Lax, S., Van Treuren, W., scopic investigations of postmortem interval based on the
Hyde, E.R., et al. (2016) Microbial community assembly changes in rabbit plasma. PLoS One 12: e0182161.
and metabolic function during mammalian corpse decom- Wang, Q., Zhang, Y., Lin, H., Zha, S., Fang, R., Wei, X.,
position. Science 351: 158–162. et al. (2017b) Estimation of the late postmortem interval
Paczkowski, S., and Schutz, S. (2011) Post-mortem volatiles using FTIR spectroscopy and chemometrics in human
of vertebrate tissue. Appl Microbiol Biotechnol 91: skeletal remains. Forensic Sci Int 281: 113–120.
917–935. Watanabe, N., Kobayashi, K., Hashikita, G., Taji, Y., Ishibashi,
Pechal, J.L., Crippen, T.L., Benbow, M.E., Tarone, A.M., N., Sakuramoto, S. et al. (2019) Hepatic gas gangrene cau-
Dowd, S., and Tomberlin, J.K. (2014) The potential use of sed by Clostridium novyi. Anaerobe 57: 90–92.
bacterial community succession in forensics as described Wells, J.D., Lecheta, M.C., Moura, M.O., and Lamotte, L.R.
by high throughput metagenomic sequencing. Int J Legal (2015) An evaluation of sampling methods used to pro-
Med 128: 193–205. duce insect growth models for postmortem interval estima-
Pechal, J.L., Schmidt, C.J., Jordan, H.R., and Benbow, M.E. tion. Int J Legal Med 129: 405–410.
J.S.R. (2018) A large-scale survey of the postmortem Young, S.T., Wells, J.D., Hobbs, G.R., and Bishop, C.P.
human microbiome, and its potential to provide insight into (2013) Estimating postmortem interval using RNA degra-
the living health condition. Sci Rep 8: 5724. dation and morphological changes in tooth pulp. Forensic
Poloz, Y.O., and O’Day, D.H. (2009) Determining time of Sci Int 229: 163.e161–163.e166.
death: temperature-dependent postmortem changes in Zhang, J., Li, B., Wang, Q., Wei, X., Feng, W., Chen, Y.,
calcineurin A, MARCKS, CaMKII, and protein phospha- et al. (2017) Application of Fourier transform infrared
tase 2A in mouse. Int J Legal Med 123: 305–314. spectroscopy with chemometrics on postmortem interval
APPENDIX

Mathematics

The derivation of the support vector regression model

1. Prediction function

$$f(x_i) = w^T x_i + b.$$

2. Original problem

This is a support vector regression problem with an $\varepsilon$-insensitive tube, and the objective function is

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \ell_\varepsilon\big(f(x_i), y_i\big)$$

The loss function of this problem is defined as

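$$
\ell_\varepsilon\big(f(x_i), y_i\big) =
\begin{cases}
0, & \text{if } |f(x_i) - y_i| \le \varepsilon \\
|f(x_i) - y_i| - \varepsilon, & \text{otherwise}
\end{cases}
$$

The quadratic programming problem is obtained as

$$\min_{w,\,b,\,\check{\xi},\,\hat{\xi}}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\check{\xi}_i + \hat{\xi}_i\big)$$

$$
\begin{aligned}
\text{s.t.}\quad & f(x_i) - y_i \le \varepsilon + \check{\xi}_i \\
& y_i - f(x_i) \le \varepsilon + \hat{\xi}_i \\
& \check{\xi}_i \ge 0 \\
& \hat{\xi}_i \ge 0
\end{aligned}
$$

By introducing Lagrange multipliers, the Lagrangian function is constructed as

$$
\begin{aligned}
L\big(w, b, \check{\xi}_i, \hat{\xi}_i, \check{\alpha}_i, \hat{\alpha}_i, \check{\beta}_i, \hat{\beta}_i\big)
&= \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\check{\xi}_i + \hat{\xi}_i\big)
 + \sum_{i=1}^{n}\check{\alpha}_i\big(w^T x_i + b - y_i - \varepsilon - \check{\xi}_i\big) \\
&\quad + \sum_{i=1}^{n}\hat{\alpha}_i\big(y_i - w^T x_i - b - \varepsilon - \hat{\xi}_i\big)
 - \sum_{i=1}^{n}\check{\beta}_i\check{\xi}_i - \sum_{i=1}^{n}\hat{\beta}_i\hat{\xi}_i
\end{aligned}
$$

Then, the dual problem of the original problem is $\max_{\check{\alpha}_i, \hat{\alpha}_i, \check{\beta}_i, \hat{\beta}_i}\ \min_{w, b, \check{\xi}_i, \hat{\xi}_i} L$. Taking the partial derivatives of $L$ with respect to the variables $w$, $b$, $\check{\xi}_i$ and $\hat{\xi}_i$, we obtain

$$
\begin{aligned}
\frac{\partial L}{\partial w} &= w + \sum_{i=1}^{n}\check{\alpha}_i x_i - \sum_{i=1}^{n}\hat{\alpha}_i x_i = 0 \\
\frac{\partial L}{\partial b} &= \sum_{i=1}^{n}\check{\alpha}_i - \sum_{i=1}^{n}\hat{\alpha}_i = 0 \\
\frac{\partial L}{\partial \check{\xi}_i} &= C - \check{\alpha}_i - \check{\beta}_i = 0 \\
\frac{\partial L}{\partial \hat{\xi}_i} &= C - \hat{\alpha}_i - \hat{\beta}_i = 0
\end{aligned}
$$

Thus, it is deduced that

$$
\begin{aligned}
w &= \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) x_i \\
\sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) &= 0 \\
C &= \check{\alpha}_i + \check{\beta}_i \\
C &= \hat{\alpha}_i + \hat{\beta}_i
\end{aligned}
$$

Substituting these into $L$ gives

$$
\begin{aligned}
L\big(w, b, \check{\xi}_i, \hat{\xi}_i, \check{\alpha}_i, \hat{\alpha}_i, \check{\beta}_i, \hat{\beta}_i\big)
&= \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) y_i - \varepsilon\sum_{i=1}^{n}\big(\hat{\alpha}_i + \check{\alpha}_i\big) \\
&\quad - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big)\big(\hat{\alpha}_j - \check{\alpha}_j\big) x_i^T x_j
\end{aligned}
$$

Then, the dual problem of the original problem is

$$
\begin{aligned}
\max_{\check{\alpha}_i,\,\hat{\alpha}_i}\quad & \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) y_i - \varepsilon\sum_{i=1}^{n}\big(\hat{\alpha}_i + \check{\alpha}_i\big)
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big)\big(\hat{\alpha}_j - \check{\alpha}_j\big) x_i^T x_j \\
\text{s.t.}\quad & \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) = 0, \qquad 0 \le \check{\alpha}_i \le C, \qquad 0 \le \hat{\alpha}_i \le C
\end{aligned}
$$

The KKT conditions to be satisfied are

$$
\begin{cases}
f(x_i) - y_i \le \varepsilon + \check{\xi}_i \\
y_i - f(x_i) \le \varepsilon + \hat{\xi}_i \\
\check{\xi}_i,\ \hat{\xi}_i \ge 0 \\
\check{\alpha}_i,\ \hat{\alpha}_i \ge 0 \\
\check{\beta}_i,\ \hat{\beta}_i \ge 0 \\
\check{\alpha}_i\big(w^T x_i + b - y_i - \varepsilon - \check{\xi}_i\big) = 0 \\
\hat{\alpha}_i\big(y_i - w^T x_i - b - \varepsilon - \hat{\xi}_i\big) = 0 \\
w = \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) x_i \\
\sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) = 0 \\
C = \check{\alpha}_i + \check{\beta}_i \\
C = \hat{\alpha}_i + \hat{\beta}_i \\
\check{\alpha}_i\,\hat{\alpha}_i = 0 \\
\check{\beta}_i\,\check{\xi}_i = 0, \quad \hat{\beta}_i\,\hat{\xi}_i = 0
\end{cases}
$$

3. Prediction function

Finally, the form of the prediction function is

$$
\begin{aligned}
f(x) &= w^T x + b \\
&= \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) x_i^T x + b \\
&= \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big)\,\kappa(x_i, x) + b
\end{aligned}
$$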
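As a rough illustration of the prediction function and the $\varepsilon$-insensitive loss in this appendix, the following Python sketch evaluates both for a toy case. It is not the authors' code: the function names, the toy support vectors, the dual coefficients (each standing for $\hat{\alpha}_i - \check{\alpha}_i$) and the intercept are hypothetical values chosen only so that the coefficients sum to zero, as the dual constraint requires.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Linear kernel: the inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def eps_insensitive_loss(prediction: float, target: float, eps: float) -> float:
    """Zero inside the epsilon tube, |prediction - target| - eps outside it."""
    return max(0.0, abs(prediction - target) - eps)


def weight_vector(support_x: Sequence[Vector], dual_coef: Sequence[float]) -> List[float]:
    """w = sum_i dual_coef[i] * x_i, where dual_coef[i] plays the role of
    (alpha_hat_i - alpha_check_i). Only meaningful for the linear kernel."""
    dim = len(support_x[0])
    return [sum(c * x[k] for c, x in zip(dual_coef, support_x)) for k in range(dim)]


def predict(
    x: Vector,
    support_x: Sequence[Vector],
    dual_coef: Sequence[float],
    b: float,
    kernel: Callable[[Vector, Vector], float] = linear_kernel,
) -> float:
    """Dual form: f(x) = sum_i dual_coef[i] * kernel(x_i, x) + b."""
    return sum(c * kernel(xi, x) for c, xi in zip(dual_coef, support_x)) + b


# Hypothetical toy values (not taken from the study).
support_x = [[1.0, 2.0], [3.0, 0.0]]
dual_coef = [0.5, -0.5]  # sums to zero, matching the dual constraint
b = 1.0
x_new = [2.0, 1.0]

w = weight_vector(support_x, dual_coef)
f_dual = predict(x_new, support_x, dual_coef, b)
f_primal = linear_kernel(w, x_new) + b
```

With a linear kernel the dual form and the primal form $w^T x + b$ give the same value, which is a quick consistency check on a fitted model; with any other kernel only the dual form applies.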
In this study, the kernel function is taken as the linear kernel function $\kappa(x_i, x) = x_i^T x$. Hence, the prediction function is $f(x) = \sum_{i=1}^{n}\big(\hat{\alpha}_i - \check{\alpha}_i\big) x_i^T x + b$, and each vector $x_i$ satisfying $\hat{\alpha}_i - \check{\alpha}_i \ne 0$ is a support vector.

Supporting Information

Additional Supporting Information may be found in the online version of this article at the publisher's web-site:

Appendix S1: Supporting Information
