Machine Learning and Metagenomics Enhance Surveillance of Antimicrobial Resistance in Chicken Production in China

Machine learning and metagenomics enhance surveillance of antimicrobial resistance in chicken production in China
Michelle Baker
University of Nottingham https://orcid.org/0000-0002-8926-7689
Xibin Zhang
Shandong New Hope Liuhe Group Co., Ltd., and Qingdao Key Laboratory of Animal Feed Safety
Alexandre Maciel Guerra
University of Nottingham
Yinping Dong
NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk
Assessment
Wei Wang
NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk
Assessment
Yujie Hu
NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk
Assessment
David Renney
Nimrod Veterinary Products Limited
Yue Hu
University of Nottingham
Longhai Liu
Shandong Kaijia Food Co. LTD
Hui Li
Luoyang Center for Disease Control and Prevention
Zhiqin Tong
Luoyang Center for Disease Control and Prevention
Meimei Zhang
Liaoning Provincial Center for Disease Control and Prevention
Yingzhi Geng
Liaoning Provincial Center for Disease Control and Prevention
Li Zhao
Qingdao Agricultural University
Zhihui Hao
China Agricultural University
Nicola Senin
University of Perugia
Junshi Chen
Peking University
Zixin Peng
NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk
Assessment
Fengqin Li
China National Center for Food Safety Risk Assessment
Tania Dottorini (tania.dottorini@nottingham.ac.uk)
University of Nottingham

Article

Keywords:

Posted Date: January 13th, 2023

DOI: https://doi.org/10.21203/rs.3.rs-2458989/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract
The use of antimicrobials in livestock production is associated with the rise of antimicrobial resistance
(AMR). China is the largest consumer of antimicrobials and improving AMR surveillance methods may
help inform intervention. Here, we report the surveillance of ten large-scale chicken farms and four
connected abattoirs from three Chinese provinces, over 2.5 years. By using a bespoke data-mining
approach based on machine learning, we analysed microbiomes and resistomes from birds, carcasses
and environments. We found that a core subset of the chicken gut resistome and microbiome, featuring
clinically relevant bacteria and antibiotic resistance genes (ARGs), correlates with AMR profiles of Escherichia coli
colonizing the gut. This core is itself influenced by environmental temperature and humidity, contains
clinically relevant mobile ARGs shared by chickens and environments, and correlates with antimicrobial
usage. Our findings indicate a viable route to optimize AMR surveillance in livestock production.

Introduction
The reduction of extensive antibiotic usage in agriculture, particularly in low-to-middle income countries
(LMICs), has been highlighted as a key intervention to tackle the global threat of antimicrobial resistance
(AMR)1,2. The increased demand for animal protein has pushed several countries towards intensive
livestock farming3, leading to increased antimicrobial usage for prophylaxis4,5. For example,
antimicrobial use in poultry production in China, where our study has focussed, is approximately five
times the international average (measured in mg/PCU)6. High usage of antibiotics can provide selective
pressure for the emergence of antibiotic resistant bacteria (ARB) and antibiotic resistance genes (ARGs)
in livestock7,8. Surveillance of AMR in food production environments is particularly relevant in LMICs,
where contact with livestock, looser regulations on veterinary drugs and the higher influx of livestock
faecal matter and waste entering the environment (soil, water) pose increased risks for humans,
compared to higher income countries8–12. Co-targeted sampling of environment and livestock is
recommended to identify potential sources and spread13, particularly as AMR in the food chain, crops,
animals and environment can be mobilized into bacterial pathogens of humans and animals through
horizontal gene transfer14. Despite the risk of transfer, AMR surveillance in non-healthcare domains has
not yet been sufficiently addressed.

Key to understanding the role of food production systems in selecting and disseminating ARBs and ARGs
is the development of solutions for better surveillance. With the proliferation of collectable information,
research has been gradually moving towards the adoption of the latest technologies in machine learning
(ML) and big data mining to implement precision poultry farming15,16. Recently, we have developed
computational frameworks that combine ML, genome-scale metabolic models, whole-genome
sequencing (WGS), and phenotyping assays to investigate infection and AMR in humans, animals, food
and the environment17–19. Our methods have led to the identification of novel variants of methicillin-
resistant Staphylococcus aureus and ARGs shared between food of animal origin and humans in
China17,20. More recently, using ML on WGS data within a One Health approach focussing on one broiler
farm in China and considering highly resistant pathogenic and non-pathogenic Escherichia coli, we
uncovered new ARGs and hotspots of AMR shared by animals, farmers, households and farm
workplaces19. However, surveillance approaches focusing solely on WGS of individual pathogens may
not capture the diversity of the microbiomes and resistomes within livestock production; thus, important
information involving ARGs may be missed21.

The importance of studying the gut microbiome in connection to AMR in farming has been
acknowledged, recognising that antibiotic usage, even at low levels, alters and expands the gut resistome
in livestock22. We have recently proposed a new method to investigate AMR in livestock farming, where
the gut microbiome of livestock, workers and their households is analysed using machine learning,
metagenomics and sensing technologies23. By analysing two consecutive chicken life cycles in one
Chinese farm, we uncovered preliminary evidence linking the AMR profile of E. coli isolates from chicken
faeces to the bird gut resistome composition23.

In this work, we report the results of a large study involving a sample collection campaign performed over
2.5 years in ten large-scale Chinese commercial poultry farms and connected abattoirs. With respect to
our previous studies, in addition to testing our surveillance approach on a much larger dataset, here we
extend our search for correlations involving AMR of E. coli in chicken faeces, to include the analysis of
microbial communities, mobile ARGs and antimicrobial usage. We report the identification of a core
subset of the microbiome and resistome of the bird gut that strongly correlates with resistance in E.
coli, as well as with environmental variables such as temperature and humidity. This core subset includes
clinically relevant mobile ARGs. We also uncovered evidence linking the use of a given antimicrobial to the
presence, in this core subset, of ARGs specific to that antimicrobial, together with ARGs specific to
other antimicrobial classes.

Results
Sample collection was performed on farms located in the three Chinese provinces of Shandong, Henan
and Liaoning (Fig. 1 and Supplementary Tables 1 and 2, further information in the Methods section).
Sample collection resulted in a total of 461 viable biological samples, covering two time points in the bird
life cycle within the farm (t1 and t2) and one time point in the slaughterhouse (t3). Biological samples
consisted of bird faeces (n = 223; 116 at t1, 107 at t2), feathers (n = 36; 17 at t1, 19 at t2), barn floors (n =
23; 10 at t1, 13 at t2), carcasses (n = 94 at t3), abattoir wastewater (n = 21 at t3), abattoir processing lines
(n = 12 at t3), and outdoor soil (n = 52; 25 at t1, 27 at t2).
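
As a quick consistency check, the per-source counts reported above can be tallied by source and by time point. The sketch below is illustrative only: the dictionary layout and function names are assumptions and are not taken from the study's analysis code.

```python
# Sample counts per source and time point, as reported in the text.
# The data structure and names are illustrative assumptions.
SAMPLES = {
    "faeces": {"t1": 116, "t2": 107},
    "feathers": {"t1": 17, "t2": 19},
    "barn floor": {"t1": 10, "t2": 13},
    "carcasses": {"t3": 94},
    "abattoir wastewater": {"t3": 21},
    "abattoir processing line": {"t3": 12},
    "outdoor soil": {"t1": 25, "t2": 27},
}


def totals_by_source(samples):
    """Sum the counts over time points for each source."""
    return {source: sum(counts.values()) for source, counts in samples.items()}


def totals_by_time_point(samples):
    """Sum the counts over sources for each time point."""
    totals = {}
    for counts in samples.values():
        for time_point, n in counts.items():
            totals[time_point] = totals.get(time_point, 0) + n
    return totals


by_source = totals_by_source(SAMPLES)
by_time = totals_by_time_point(SAMPLES)
grand_total = sum(by_source.values())
print(by_source, by_time, grand_total)
```

The per-source sums reproduce the reported subtotals (for example 223 faeces samples) and the overall total of 461.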

Microbial Communities And Resistomes Are Differentiated Across Farm Sources And Between Farm And Abattoir
Taxonomic profiling of the metagenomic samples revealed 19 bacterial phyla, 2 archaea phyla and 3
eukaryote phyla (Supplementary Fig. 1A and Supplementary Table 3). The microbial communities formed
two clusters (Supplementary Fig. 1B) (PERMANOVA, p < 0.001), clearly separating farm sources (faeces,
feathers, barn floor and outdoor soil) from abattoir sources (carcasses, processing line and wastewater).
Pairwise testing indicated that abattoir sources were not significantly separated from each other.
However, farm sources were consistently separated from abattoir sources and from each other (adjusted
p values < 0.05). Comparison of individual phyla abundances (Wilcoxon Rank-sum test) highlighted nine
phyla with differences across sources (Supplementary Fig. 2 and Supplementary Table 4). As expected,
typical soil phyla, Actinobacteria and Planctomycetes, were more abundant in outdoor soil samples, and
commensal gut phyla (Firmicutes, Proteobacteria and Bacteroidetes) were found in high abundances in
most sample sources, indicative of potentially contaminated environments. Chlamydiae, a phylum
endemic in birds, was found in high abundance in abattoir samples.

In the resistome of all samples, after rarefying to the minimum read depth, 336 ARGs were found,
representative of 14 antibiotic classes. The ARG presence/absence patterns differed in all farm and
abattoir sources (PERMANOVA, p < 0.05) except between wastewater and abattoir processing line
(Supplementary Fig. 3). The analysis of pattern differentiation for ARGs belonging to each specific class
(Wilcoxon Rank-sum test, Supplementary Fig. 4 and Supplementary Table 5) revealed that 13 of the 14
classes had significant differentiation at least in one pairwise comparison across sources, consistent
across farms. Specifically, barn floor (t1 and t2) samples carried a greater number of aminoglycoside,
amphenicol, MLSB and tetracycline genes compared to processing line, wastewater, outdoor soil and (for
t1 only) carcasses (adjusted p values < 0.05). Chicken faeces carried a greater number of aminoglycoside,
MLSB and amphenicol genes compared to outdoor soil, wastewater, processing line and carcasses
(adjusted p values < 0.05), and beta lactam genes compared to outdoor soil, wastewater and carcasses
(adjusted p values < 0.05). Additionally, chicken faeces collected at t1 carried a greater number of MDR
and fosfomycin genes compared to outdoor soil, wastewater and carcasses (adjusted p values < 0.05).
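
The PERMANOVA comparisons of ARG presence/absence patterns can be illustrated with a minimal sketch. It assumes Jaccard distances between samples and the standard pseudo-F statistic with permutation of group labels; the distance metric, the function names and the toy ARG sets (placeholder letters, not real genes) are assumptions for illustration and are not taken from the study's pipeline.

```python
import random


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of ARG identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def pseudo_f(dist, groups):
    """Pseudo-F statistic computed from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy presence/absence profiles: four "farm" and four "abattoir" samples.
samples = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"},
    {"x", "y", "z"}, {"x", "y", "w"}, {"x", "z", "w"}, {"y", "z", "w"},
]
groups = ["farm"] * 4 + ["abattoir"] * 4
dist = [[jaccard_distance(s, t) for t in samples] for s in samples]
f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

In practice a pairwise version of this test, with multiple-testing adjustment, would be run for each pair of sources.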

Analysis Of Mobile ARGs Reveals Numerous Clinically Relevant Ones, Shared Between Birds And Environments
As gene mobility may influence ARG presence across sources, and because of the potential importance
of mobile genetic elements (MGEs) in the development of effective surveillance systems11, we considered
ARGs in proximity to MGEs. In total, 695 different MGE-ARG combinations (mobile ARGs) were found
(Supplementary Table 6), featuring 202 unique ARGs. Eighty ARGs (40%) were found in only one MGE-ARG
combination, whilst the remaining 122 (60%) were found in multiple combinations (2 to 22; Fig. 2A). Over
half (57%) of the 695 mobile ARGs were present in more than one source (Fig. 2B), with three MGE-ARG
combinations (IS1216-poxtA, IS15-APH(3’)-Ia and ISCfr1-AAC(3)-IId) present in every source. Chicken
faeces had the highest number of mobile ARGs, but also the greatest variance (Fig. 2C). Feathers and barn
floor also carried many mobile ARGs, the mean number not being statistically different from faeces
(Dunn's test, adjusted p values > 0.05). Outdoor soil, carcasses, processing line and wastewater generally
had lower numbers of mobile ARG patterns per sample, with these numbers differing significantly (Dunn's
test, adjusted p values < 0.01) from faeces and feathers, but not from each other. In total, 157 different
MGE-ARG combinations were found in bird and environmental sources on the same farm, with some of
these appearing on multiple farms. Of these, 47 contained clinically relevant ARGs24 (Fig. 2D). Notably, we
found blaNDM-5, known to be present in the IncX3 plasmid, which can be disseminated among humans,
animals, food and environment25, and qnrS1, a plasmid-mediated quinolone resistance gene known to be
present in the chicken supply chain and capable of being transferred between different bacteria26.

The antimicrobial resistance profile of E. coli residing in the gut is correlated to the resistome and
microbiome of the gut itself

We further investigated if there was a correlation between the chicken gut resistome and microbiome and
the antimicrobial resistance profiles of E. coli taken from the same samples as the metagenome data.

We cultured E. coli isolates from 170 chicken faeces samples (a subset of the samples that had been
used for metagenomics) and characterized their AMR profiles against a panel of 26 antibiotics. The
proportion of isolates resistant to each antibiotic ranged from 1% to 98% (Supplementary Table 7). All
isolates were resistant to at least one antibiotic, with 169 resistant to at least three.

To investigate correlations between resistance in E. coli and gut microbiome/resistome, we developed a
bespoke data mining method based on machine learning (Fig. 3). The method consists of building an
ML-powered “predictive function”, whose output is the resistance of E. coli to a specific antibiotic (true or
false) from antimicrobial susceptibility testing (AST), and whose input is the aggregation of information
from the gut microbiome (relative abundances of microbial species) and gut resistome
(presence/absence of ARGs). The predictive function is trained using experimental data (supervised
learning) and swapping different underlying ML technologies, until optimal prediction performance is
achieved. A set of the most informative features, also referred to as “predictors”, is extracted from the ML
models. The set is then refined by analysis of correlation with temperature and humidity (see later
section).

Out of the 26 antibiotics, only 17 had sufficient data (resistance and susceptibility cases) to allow proper
ML training: amikacin, amoxicillin/clavulanic acid, aztreonam, cefepime, cefoxitin, cefotaxime/clavulanic
acid, ceftazidime, ceftazidime/clavulanic acid, chloramphenicol, cefotaxime, gentamicin, kanamycin,
minocycline, nalidixic acid, streptomycin, sulphafurazole, and trimethoprim/sulfamethoxazole. For all 17, the
extra-tree classifier (ML technology) gave the best prediction performance (Nemenyi test,
Supplementary Table 8 and Supplementary Fig. 5). Prediction performance indicators using extra-tree are
reported in Fig. 4A and Supplementary Fig. 6. A total of 11 predictive models (amikacin, aztreonam,
cefoxitin, chloramphenicol, cefotaxime, kanamycin, minocycline, nalidixic acid, streptomycin, and
sulphafurazole) achieved the best performance according to AUC (> 0.90).
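
The AUC used to rank the models can be read as the probability that a randomly chosen resistant isolate receives a higher predicted score than a randomly chosen susceptible one. A minimal sketch of that computation follows; the function name, labels and scores are hypothetical and are not data from this study. The explicit check for missing classes mirrors the requirement that both resistance and susceptibility cases be available for an antibiotic.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly.

    labels: True for resistant, False for susceptible.
    scores: predicted scores, higher meaning "more likely resistant".
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one resistant and one susceptible case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six isolates and one antibiotic.
labels = [True, True, True, False, False, False]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Here eight of the nine resistant/susceptible pairs are ranked correctly, so the toy AUC is 8/9.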

Data mining showed the existence of a core subset of the chicken gut resistome and microbiome that
exhibited a strong predictive power on the resistance of E. coli. This core consisted of 495 features (191
microbial species and 304 ARGs), acting as strong predictors of E. coli resistance/susceptibility to 11
antibiotics (Fig. 4B, Fig. 4C and Supplementary Table 9). The 304 ARGs from the top 11 antibiotic models
belonged to beta lactams (26% of the ARGs), aminoglycosides (16%), and MLSB (14%), with other
antibiotic classes accounting for less than 10% each. Of these 304 ARGs, based on the correlation of ARG
read depth with species abundance (see Methods)27, 70 were found to be present in contigs identified as
originating from E. coli with 69 of these also present in other bacteria. A further 100 ARGs (of the 304)
were present only in contigs identified as other bacterial species (i.e. did not originate from E. coli). To
further explore the relationships between core gut features and antibiotic resistances, the 495 features
and the 11 antibiotic resistances were visualised as nodes of a graph, with edges only connecting
predictors to predicted resistances (Fig. 4C). This analysis highlighted a core of 117 ARGs (68 clinically
relevant, including blaCTX-M-64, blaCTX-M-123, fosB3 and dfrA5) acting as predictors of more than three
antibiotic resistances. Six ARGs (aadA16, aadA22, PER-1, AAC(6')-Ia, tet(39) and SHV-110) were found to
be predictors of eight antibiotic resistances. The same analysis revealed 29 microbial species in the gut,
acting as predictors of five antibiotic resistances (aztreonam, chloramphenicol, cefotaxime, kanamycin
and nalidixic acid). These 29 species included the bacterial genera Arcobacter, Acinetobacter and
Sphingobacterium in addition to other commensal bacteria.
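
The graph analysis can be sketched as a bipartite mapping from predictor features to the antibiotic resistances they predict, from which features linked to more than a given number of antibiotics are selected. The edges below use placeholder feature names and are not results from this study; the function names are likewise illustrative.

```python
from collections import defaultdict


def predicted_resistances(edges):
    """Map each predictor feature to the set of antibiotics it is linked to."""
    graph = defaultdict(set)
    for feature, antibiotic in edges:
        graph[feature].add(antibiotic)
    return graph


def multi_resistance_predictors(edges, more_than):
    """Features that predict resistance to more than `more_than` antibiotics."""
    graph = predicted_resistances(edges)
    return {
        feature: sorted(antibiotics)
        for feature, antibiotics in graph.items()
        if len(antibiotics) > more_than
    }


# Placeholder edges (feature, antibiotic); not taken from Supplementary Table 9.
EDGES = [
    ("ARG_a", "amikacin"), ("ARG_a", "aztreonam"),
    ("ARG_a", "cefoxitin"), ("ARG_a", "kanamycin"),
    ("ARG_b", "amikacin"), ("ARG_b", "kanamycin"),
    ("species_x", "aztreonam"), ("species_x", "chloramphenicol"),
    ("species_x", "cefotaxime"), ("species_x", "kanamycin"),
    ("species_x", "nalidixic acid"),
]

shared = multi_resistance_predictors(EDGES, more_than=3)
print(shared)
```

Because edges only connect predictors to predicted resistances, the degree of a feature node is simply the number of antibiotic models in which it appears.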

Gut resistome and microbiome features recognised as predictors of E. coli resistance are also correlated
with temperature and humidity

For the top 11 antibiotic models, we developed bespoke regression models using individual gut features
as independent variables (one model per variable), and temperature or humidity as dependent variables,
to see if model fitting would highlight a correlation, Fig. 3 – phase III (see also Materials and Methods for
details). Amongst the original 495 features, 183 ARGs and 48 microbial species correlated well with
humidity, whilst 60 ARGs and 23 microbial species correlated with temperature (Supplementary Fig. 7 and
Supplementary Table 10). Correlation with humidity was on average stronger (higher R² values in the
regression analysis, Supplementary Fig. 7). Of the 183 ARGs correlated with humidity, 20% were beta
lactams, 16% MLSB, 15% aminoglycosides and 11% tetracyclines. Of the 60 ARGs correlated to
temperature, 22% were MLSB, 20% beta lactams, 18% aminoglycosides and 12% glycopeptides. Forty-
eight ARGs correlated with both temperature and humidity, four of them clinically relevant (AAC(6')-Ib4,
ANT(2'')-Ia, NDM-1 and VEB-1). Four microbial species from the phyla Proteobacteria (Helicobacter
pullorum and Alcaligenes faecalis), Firmicutes (Bacillus cereus group) and Bacteroidetes (Bacteroides
stercoris) correlated with both temperature and humidity. One species from Tenericutes (Mycoplasma
yeatsii) correlated with temperature only; while other species from Proteobacteria, Firmicutes,
Bacteroidetes and Actinobacteria correlated either with temperature or humidity (Supplementary
Table 10).
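
A minimal sketch of the one-model-per-feature screening follows, assuming ordinary least-squares regression with a single predictor, for which R² equals the squared Pearson correlation. The regression models used in the study may differ; the feature names and values below are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a simple least-squares fit of y on x (with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant feature or constant response carries no correlation signal.
        return 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def rank_features(features, environment):
    """Fit one model per feature and sort features by decreasing R^2.

    features: dict mapping feature name to per-sample values
              (relative abundance, or 0/1 for ARG presence/absence).
    environment: per-sample humidity or temperature (the dependent variable).
    """
    scores = {name: r_squared(values, environment) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for four samples.
humidity = [40.0, 50.0, 60.0, 70.0]
features = {
    "species_abundance": [0.1, 0.2, 0.3, 0.4],  # rises linearly with humidity
    "arg_presence": [1, 0, 0, 1],               # unrelated to humidity
}
ranked = rank_features(features, humidity)
print(ranked)
```

A threshold on R² would then decide which features are reported as correlated with each environmental variable.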

We tested for the possibility that some ARGs found correlated with temperature or humidity might belong
to microbial species also correlated to temperature and humidity. The analysis highlighted four distinct
subgraphs correlated with humidity (Fig. 5A) and one correlated with temperature (Fig. 5B). Notably, one
subgraph correlated with humidity contained Klebsiella pneumoniae and 5 related ARGs (KpnE, KpnF,
KpnG, OmpK37 and acrA). The subgraph containing A. faecalis and ARGs ErmY, FosD, dfrA16, vgaC and
vgaE, was found in both analyses (i.e., correlated with both temperature and humidity).

We then investigated if the gut ARG features identified as predictors of resistance in E. coli, and further
identified as correlated to humidity or temperature, were in close proximity to MGEs. Nine ARGs were
found located in close proximity to MGEs (MLSB: lsaB, mphF, cfrC and ermX; beta-lactams: NDM-1, PER-1
and DHA-1; amphenicol: catB2; and trimethoprim: dfrA16). Four of the nine ARGs were found associated with
only one MGE (lsaB with ISSau9; cfrC with ISEc9; DHA-1 with IS15; and dfrA16 with IS6100), whilst the
other five were found associated with two to six different MGEs. All the MGE-ARG pairs were investigated for
conserved structure across farms or sources. For example, the clinically important NDM-1 was found in
close proximity to IS15 in four samples (three chicken faeces samples from LN1 and one barn floor sample from
LN3; Supplementary Fig. 8). In 19 chicken faeces and feather samples from LN1, LN3, SD2
and SD4, NDM-1 was found in proximity to the MGE ISAba125, and located next to another ARG, ble, which is
a known association for plasmid-borne NDM-1 in Enterobacteriaceae species from Asian regions28. Despite
having found the same NDM-1-ISAba125 pattern in several farms (LN1, LN3, SD2 and SD4), there was
no evidence of transmission between farms (Fig. 6). Instead, evolutionary analysis of the contigs
suggested recent branching of isolates within individual farms (most recent common ancestor (MRCA)
less than two years on most branches) and much earlier MRCAs between different farms (greater than 20
years), indicating the likelihood of this mobile ARG widely circulating in livestock throughout China.

Gut microbiome and resistome features, recognised as predictors of E. coli resistance, are correlated with
drug use

We investigated if the core chicken gut microbiome and resistome, previously identified as predictors of
resistance in E. coli, may in turn be influenced by antibiotics usage (Supplementary Table 11). We found
statistically significant differences in the relative abundance of ARGs per antibiotic class (Supplementary
Fig. 9), presence/absence of ARGs (Supplementary Fig. 10) and relative abundances of microbial species
(Supplementary Fig. 11), between farms using and not using antibiotics (Supplementary Table 12). On
the farms that received tetracycline antibiotics, the tetracycline class and tetracycline ARGs (tet(39),
tet(B), tet(G), tet(Y), tetS and tetX) were found to be significantly increased (adjusted p values < 0.05). In
addition, in these farms there was also a significantly increased presence of genes from the classes:
aminoglycoside, beta lactam, MLSB, MDR, phenicol, sulfonamide, trimethoprim, fosfomycin, glycopeptide
and nucleoside (adjusted p values < 0.05). All these, except fosfomycin, glycopeptide and nucleoside, had
a greater than expected co-presence on contigs with tetracycline genes (Chi-square test, Holm correction
adjusted p values < 0.0001). Similarly, on farms that received lincosamide antibiotics there was an
increased presence of MLSB genes. There was also an increased presence of genes in aminoglycoside,
beta lactam, tetracycline, rifamycin, fosfomycin, phenicol and glycopeptide classes. All except
glycopeptide had a greater than expected co-presence on contigs with MLSB genes (Chi-square test, Holm
correction adjusted p values < 0.0001).
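
The co-presence tests can be sketched for the simplest case of a 2×2 contingency table (contigs with or without a gene of the administered class, with or without a gene of another class), followed by Holm's step-down correction across classes. The table layout, the absence of a continuity correction and all counts below are assumptions for illustration, not values from this study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p value). With 1 degree of freedom the upper tail
    probability is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one contig")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))


def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical contig counts: rows = tetracycline gene present/absent,
# columns = gene of another class present/absent.
stat, p = chi_square_2x2(20, 5, 5, 20)
adjusted = holm_adjust([0.01, 0.04, 0.03])
print(stat, p, adjusted)
```

Each antibiotic class would contribute one raw p value, and the adjusted values are then compared with the chosen significance level.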

All the ARGs for which presence/absence was found significantly correlated to antibiotic usage (indicated
in Supplementary Fig. 10) were also found correlated with humidity. Additionally, seven of these genes
(ANT(2’’)-Ia, catB2, dfrA16, DHA-1, ErmX, FosC2 and spd) were also correlated with
temperature. Of the 16 microbial species that were found to have a statistically significant difference in
relative abundances in relation to antibiotic usage, seven were correlated to changes in humidity
(Bacillus cereus group, Jeotgalibaca sp PTS2502, Klebsiella pneumoniae, Klebsiella variicola,
Lysinibacillus sp BF 4, Proteus hauseri and Proteus mirabilis) whilst three were correlated to changes in
temperature (Alistipes sp An66, Bacillus cereus group and Enorma massiliensis).

Discussion
AMR is a major global challenge, and efforts to tackle this problem typically rely heavily on large-scale
surveillance programs1. Recent studies have found that conventional surveillance lacks the ability to
accurately assess bacterial and resistome diversity both within and between farms29. There is a growing
interest within the farming industry in digitalisation and the adoption of precision farming technologies
that are hoped to improve the identification and understanding of disease outbreaks leading to improved
animal welfare and food safety30. Metagenomic analysis has been suggested as having the potential to
be used in this way11 and there is a growing body of literature using ML and statistical approaches to
dissect genomic data to draw out correlations between bacterial genomes and AMR23,31−33. This study
presents large-scale metagenomic sampling and E. coli isolation, undertaking a comprehensive analysis
using statistical and ML methods to draw out complex correlations showing AMR trends and patterns.
Our results suggest a series of main elements of discussion.

E. coli has an established role as reference indicator of AMR23. In our study 68 clinically relevant ARGs
correlated with resistance to multiple antibiotics, and some of these antibiotics had no previously known
association with these ARGs. In particular, the six ARGs (aadA16, aadA22, PER-1, AAC(6')-Ia, tet(39) and
SHV-110) that we found associated with resistance to the highest number of antibiotics had been
found in earlier studies on poultry in China34–36, supporting our method. However, we found a
cluster of further gut bacteria that correlated well with E. coli resistance to five different antibiotics. These
included Arcobacter (an emerging waterborne and foodborne zoonotic pathogen, responsible for
gastroenteritis in humans37), Acinetobacter (commensal in the poultry gut, but capable of causing
extraintestinal diseases in both humans and poultry38) and Sphingobacterium (clinically relevant in
humans and animals39). This result suggests that, in agreement with previous studies11,14,40−44, focusing
exclusively on E. coli within the farm for surveillance purposes may not be as effective as monitoring a
larger number of pathogens.

Our study has shown that the usage of tetracyclines and lincosamides was positively correlated to the
presence of ARGs from a wide range of classes, beyond those specific to the selected antibiotics. This
appears consistent with previous findings45,46, but in contrast with a recent study from the USA47. It is
possible that in our farms co-localisation of AMR genes is playing an important role in AMR selection. Co-
localisation in food animals has previously been observed and recognised as a food safety concern in
China48 as well as elsewhere49,50.

Our results indicate that core features of the gut microbiome and resistome, found correlated with
resistance in E. coli, are also correlated with temperature and humidity. This confirms and expands results
of previous studies23,51−54. Of note, the relative abundance of A. faecalis and the ARGs ErmY, FosD,
dfrA16, vgaC and vgaE originating from this species, were found to be correlated to changes in both
temperature and humidity. Greater abundance of A. faecalis and more severe clinical symptoms in higher
humidity conditions have been observed previously in a case-controlled study of turkeys kept at different
humidity levels and inoculated with A. faecalis55. This bacterium is commonly found in birds56 and
would not typically be monitored by conventional surveillance. However, it is considered an emerging
pathogen, has been associated with infections in humans, and is considered difficult to treat due to its
capacity to become extensively drug resistant57. Similarly, the important opportunistic pathogen
Klebsiella pneumoniae and 5 ARGs (KpnE, KpnF, KpnG, OmpK37 and acrA) originating from this
bacterium and important for K. pneumoniae resistance58, were found correlated to changes in humidity.
Again, this bacterium, which can be transmitted via airborne contamination, has previously been
found to have increased survival in indoor high-humidity conditions, highlighting the importance of studying
this bacterium in indoor environments59.

Nine mobile ARGs in the gut resistome were found to correlate with E. coli resistance, and with
temperature and humidity. The gene cfrC, conferring resistance to linezolid and phenicol antibiotics, was
associated only with the transposase ISEc9. This is a new finding, as typically ISEc9 is found associated
with CTX-M genes60,61 (as we also found). The association of dfrA16 with the transposase IS6100 has
previously been reported only in a single study with an association with Corynebacterium diphtheriae, the
causative agent of cutaneous diphtheria62. These novel associations potentially indicate environment-
specific evolution of these MGEs, as has been hinted at in previous work on pig farms which showed the
importance of MGEs to AMR to differ by season63.

Even though our analysis relied on a large set of samples from many heterogeneous sources and
geographical and seasonal differences, our scope was limited to E. coli, and did not consider human
samples. It would be interesting to extend our analysis to other indicator species such as Enterococcus64.
It would also be informative to understand how the spatial and temporal variations in the
farm/slaughterhouse microbiomes and resistomes are mirrored in human faecal samples. Our previous
work and that of others has found such variations23,65, but whether these observations are
generalizable and globally true is currently unknown.

Despite the increasing availability of low-cost precision farming technology and metagenomics11,66, there
is a continued need for innovation and methodological development to further enable the advancement
of surveillance solutions capable of monitoring AMR dynamics11,14. Drug resistance arises from the
complex interaction across ARBs, microbial communities, geographical niches and environments,
evolutionary forces, climate and human practices. The biggest opportunities for impact from novel
methodological surveillance approaches will come through consideration of all relevant and
interconnected AMR data sets in a 360-degree approach.

Methods
Ethics statement
This study was performed in accordance with protocols approved by the Ethics Committee of the State
Key Laboratory of the China National Centre for Food Safety Risk Assessment (CFSA). The ethical
approval number of CFSA is 2018018. Ethical approval was also obtained from the Research Ethics
Committee in the School of Veterinary Medicine and Science at the University of Nottingham, application
ID: 2340 180613.

Collection Of Biological Samples And Environmental Sensor Data


For the study we selected ten large-scale commercial poultry farms belonging to three different provinces
in China (Shandong, Henan and Liaoning), covering an area of 472,500 km², each farm feeding into one
of four regional abattoirs (two in Henan, one in Liaoning and one in Shandong). Each farm features
multiple barns, each barn containing between 12,000 and 32,800 birds, leading to a total production
capacity of 110,730 to 380,000 birds per breeding cycle (depending on farm). Broiler production is based
on self-breeding with broilers bred on the farm and moved to the barns in same-aged batches. Of the ten
selected farms, three (two in Liaoning and one in Shandong) use net housing systems, whilst the other
six use cage housing systems. During collection, the number of birds per barn did not significantly differ
between the two housing systems (t-test, p value = 0.07).

Biological samples consisted of pooled faeces and feathers (not necessarily from the same animals)
from live birds in the barns, collected at mid-life (t1: week 3) and at the end of life (t2: week 6) of the
animals, as well as barn floor samples collected at the same time points. In the abattoirs, samples were
collected on slaughtering day (t3: 1–5 days after week 6) from carcasses, meat processing surfaces
(referred to as: processing line) and wastewater. Soil samples were taken from outside areas surrounding
the farms at t1 and t2. Sampling followed the same pooled birds over one breeding cycle, except for a
farm in the Shandong province, which was sampled over two cycles, to perform a pilot study in order to
fine-tune the collection campaign and data analysis protocols23.

Details on the collection methods are as follows. For faecal samples, each sample consisted of
approximately 10 g of fresh mixed chicken faeces (2–3 chickens), collected from the bottom of
the chicken cage/net using a sterilized spoon. Feather samples were collected from the birds and
swabbed using cotton tipped swabs. Pooled carcass samples were collected in the abattoirs using a
sponge swab (SS100NB, Hygiena International, Watford, UK) on the surface of the carcass. In addition,
samples from four types of environmental sources (barn floor, soil outside the barn, wastewater and
processing line in the abattoir) were also collected. Barn floor samples were taken using a sterilized
spoon. Wastewater samples of no less than 20 mL were collected from the water pipe or by using pipettes.
Abattoir processing line samples were collected from multiple surfaces, e.g., the cutting table and transfer
belt of the cutting and deboning house. Soil samples consisted of about 10 g of soil, collected outdoors at
a depth of 1–3 cm, 5 m from the external barn walls, to ensure sufficient separation from areas of human
use. All biological samples were collected using aseptic techniques, and then stored in secure containers
at 4℃ during transportation to the laboratory and extracted within 24 h.

All the farms involved in this study are equipped with heating / air conditioning systems. Environmental
sensor data (temperature and humidity) were collected at 5-minute intervals, using automated sensors
and data-loggers available in most farms (HN1, HN2, HN3, SD2, SD3, SD4). Three farms (SD1, LN2 and
LN3) were not equipped with automated solutions, and manual measurements were performed using
SMART SENSOR AS837 temperature/humidity devices either daily or every six hours. Farm LN1 had
technical issues with the sensor and did not acquire any measurements. In all cases, temperature and
humidity data were averaged over three measurements taken at different locations within the barn.

Antibiotic susceptibility testing of E. coli isolates

For each sample, E. coli strains were cultured as indicator organisms. A 1 g sample of faeces or outdoor
soil was vortexed with 9 mL of sterile buffered peptone water (BPW; Luqiao Inc., Beijing, China) for 1
min. Broiler carcass sponge samples were homogenised with 10 mL BPW for 1 min in a stomacher bag.
Approximately 1 mL was added to 9 mL E. coli (EC) broth (Luqiao Inc.) and incubated at 37℃ for 16–20
h. A loopful of each culture was then streaked onto eosin-methylene blue (EMB) agar and
MacConkey (MAC) agar (Luqiao Inc.) and incubated at 37℃ for 18–24 h. Typical E. coli colonies were
counted and subsequently characterized by Bruker MALDI Biotyper (Germany).

The antimicrobial susceptibility testing was carried out on the cultured E. coli isolates. Antimicrobial
susceptibility to a panel of agents was determined by broth microdilution and interpreted according to
the Clinical & Laboratory Standards Institute (CLSI) interpretive criteria (CLSI 2009). The
minimum inhibitory concentrations (MIC) of 28 antimicrobial compounds were measured for the E. coli
isolates: ampicillin (AMP), ampicillin/sulbactam (AMS), tetracycline (TET), chloramphenicol (CHL),
trimethoprim/sulfamethoxazole (SXT), cephazolin (CFZ), cefotaxime (CTX), ceftazidime (CAZ), cefoxitin
(CFX), gentamicin (GEN), imipenem (IMI), nalidixic acid (NAL), sulfisoxazole (SUL), ciprofloxacin (CIP),
amoxicillin/clavulanic acid (AMC), cefotaxime/clavulanic acid (CTX-C), ceftazidime/clavulanic acid
(CAZ-C), polymyxin E (CT), polymyxin B (PB), minocycline (MIN), amikacin (AMI), aztreonam (AZM),
cefepime (FEP), meropenem (MEM), levofloxacin (LEV), doxycycline (DOX), kanamycin (KAN),
streptomycin (STR). The resistance/susceptibility profiles for each isolate were calculated (summarised
in Supplementary Table 1). E. coli ATCC™25922 was used as a control bacterium for these experiments.

DNA library construction and sequencing


For faeces, barn floor and outdoor soil samples, DNA extraction was performed using a Magnetic bead
genomic DNA extraction kit (DOP336-T3). For carcass samples, the CTAB (cetyl trimethylammonium
bromide) method⁶⁷ was used. Samples with DNA contents above 1 µg were used to construct the DNA
library. The DNA concentration was measured using the Qubit® dsDNA Assay Kit in a Qubit® 2.0 Fluorometer
(Life Technologies, CA, USA) and the integrity was assessed using 1% agarose gel electrophoresis. A total
amount of 1 µg DNA per sample was used as input material for the DNA sample preparations.
Sequencing libraries were generated using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA).
The DNA sample was fragmented to 350 bp, then DNA fragments were end-polished, A-tailed, and ligated
with the full-length adaptor for Illumina sequencing with further PCR. Finally, PCR products were purified
(AMPure XP system) and libraries were analysed for size distribution using an Agilent 2100 Bioanalyzer
and quantified using real-time PCR. After cluster generation, the library preparations were sequenced on
an Illumina NovaSeq 6000 platform and 150 bp paired-end reads were produced.

Bioinformatics Analysis
The raw sequence reads, obtained from the Illumina HiSeq sequencing platform, were pre-processed and
filtered using Readfq (V8, https://github.com/cjfields/readfq) to acquire high quality data for subsequent
analysis. Host DNA was identified by aligning reads to the host reference genome (accession:
GCF_000002315.6) with Bowtie2 v2.3.4.1⁶⁸ and removed using SAMtools v1.9⁶⁹. To construct the microbiome
of the samples, assembly of metagenome sequencing data was performed separately for the different sample
sources (chicken faeces, chicken feather, chicken carcass, barn floor, outdoor soil, wastewater and processing
line) using binning and dereplication pipelines as described in previous work²³,⁷⁰. MEGAHIT software
v1.1.2⁷¹ was used to assemble the sequences. Single sample assemblies for all samples were generated
with MEGAHIT default parameters. Co-assemblies were generated for each sample source group (chicken
faeces, chicken feather, chicken carcass, barn floor, outdoor soil, wastewater and processing line), each
with MEGAHIT setting parameters “--continue --kmin-1pass --min-contig-len 1000” as previously used on
co-assemblies⁷⁰. Filtered contigs (> 2000 bp) were mapped to single assemblies and co-assemblies using
BWA MEM⁷² and SAMtools⁶⁹ to produce the BAM files. MetaBAT2⁷³ was used to obtain the depth of
coverage. Taxonomic classification and composition (relative species abundances) of the metagenome
reads was performed in MetaPhlAn 3.0⁷⁴ with Bowtie2⁶⁸ using default settings together with the options
--bowtie2out and --input_type fastq. NMDS of the relative species abundance was performed in R using the
vegan⁷⁵ package with Bray-Curtis dissimilarity. Analysis of variance was done in R using PERMANOVA from
the vegan package⁷⁵, with pairwise testing using the pairwise adonis function⁷⁶ with Holm correction for
multiple comparisons. Relative abundances were visually analysed by combining violin plots and categorical
scatter plots, and differences were assessed via Wilcoxon rank-sum test with Holm correction (adjusted
p value threshold of 0.05).
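
As an illustration of the pairwise testing described above, the following minimal sketch applies a Wilcoxon rank-sum test to every pair of sample sources and then adjusts the p values with the Holm step-down procedure. The abundance values, the group names and the helper `holm_adjust` are hypothetical and are not taken from the study data or code; SciPy is assumed to be installed.

```python
from scipy.stats import ranksums


def holm_adjust(p_values):
    """Return Holm step-down adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical relative abundances of one species in three sample sources.
groups = {
    "faeces": [0.30, 0.28, 0.35, 0.31, 0.33, 0.29],
    "barn_floor": [0.12, 0.15, 0.10, 0.14, 0.11, 0.13],
    "soil": [0.02, 0.01, 0.03, 0.02, 0.04, 0.01],
}
names = list(groups)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
raw_p = [ranksums(groups[a], groups[b]).pvalue for a, b in pairs]
adj_p = holm_adjust(raw_p)
significant = [pair for pair, p in zip(pairs, adj_p) if p < 0.05]
```

Adjusting after collecting all pairwise p values keeps the family-wise error rate controlled across the whole set of comparisons rather than per test.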

As sequencing depth can affect the observed diversity in genomic sequencing, rarefaction is widely used
to normalise samples before analysis across different sample types77. However, the use of rarefaction is
controversial as the subsampling leads to the loss of information available in the non-rarefied sample78.
Hence in this study we have used rarefied data only where necessary (to compare different sample types)
and used non-rarefied data where only a single sample type is being considered. Host removed reads
were rarefied using the minimum sample depth using seqtk (https://github.com/lh3/seqtk), with the
random seed fixed for each pair of reads.
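
The reason for fixing the random seed per read pair can be sketched as follows: if both files of a pair are subsampled with the same seed, the same read positions are retained in each file and the mates stay matched. This is an in-memory illustration only, with a hypothetical function name and toy reads; it does not reproduce the sampling algorithm used internally by seqtk.

```python
import random


def rarefy_pairs(reads_1, reads_2, depth, seed=100):
    """Subsample `depth` read pairs, keeping forward and reverse mates matched."""
    if len(reads_1) != len(reads_2):
        raise ValueError("paired files must contain the same number of reads")
    if depth > len(reads_1):
        raise ValueError("depth exceeds the number of available read pairs")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    keep = sorted(rng.sample(range(len(reads_1)), depth))
    return [reads_1[i] for i in keep], [reads_2[i] for i in keep]


# Toy read identifiers standing in for FASTQ records.
forward = [f"read{i}/1" for i in range(10)]
reverse = [f"read{i}/2" for i in range(10)]
sub_forward, sub_reverse = rarefy_pairs(forward, reverse, depth=4, seed=42)
```

In the study, the depth would correspond to the minimum sample depth across the samples being compared.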

Analysis Of Resistome And MGEs


Assembled genomes were searched for sequence similarity to annotated antibiotic resistance genes
(ARGs) present in the CARD database⁵⁸ using BLASTn⁷⁹ with an identity threshold of 80% and a coverage
threshold of 70%, as in our previous work²³. NMDS analysis was performed on the resulting gene
presence/absence matrix using the R vegan⁷⁵ package with Bray-Curtis dissimilarity. Comparisons were made
using: i) the number of ARGs present per sample; ii) the actual presence/absence pattern of the individual
ARGs, expressed as a string of binary entries (encoding: 1 = presence, 0 = absence); and iii) relative ARG
abundance per antibiotic class according to CARD (the number of ARGs present in the sample divided by
the total number of ARGs in that class). These three measures were visually analysed by combining
violin plots and categorical scatter plots, and differences were assessed via Wilcoxon rank-sum test with
Holm correction (adjusted p value threshold of 0.05).
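
The per-class relative ARG abundance and the Bray-Curtis dissimilarity between two presence/absence patterns can be sketched as below. The gene-to-class mapping is a toy catalogue assumed for illustration, not an extract of CARD, and both function names are hypothetical.

```python
from collections import Counter


def relative_arg_abundance(present_args, arg_to_class):
    """Fraction of each class's catalogued ARGs that are present in one sample."""
    totals = Counter(arg_to_class.values())
    present = Counter(arg_to_class[a] for a in present_args if a in arg_to_class)
    return {cls: present.get(cls, 0) / totals[cls] for cls in totals}


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


# Toy catalogue (assumption): three tetracycline and two sulfonamide genes.
catalogue = {
    "tet(A)": "tetracycline",
    "tet(M)": "tetracycline",
    "tet(X)": "tetracycline",
    "sul1": "sulfonamide",
    "sul2": "sulfonamide",
}
sample_args = {"tet(A)", "tet(M)", "sul1"}
abundance = relative_arg_abundance(sample_args, catalogue)

# Binary presence/absence strings for two hypothetical samples.
pattern_a = [1, 1, 0, 1, 0]
pattern_b = [1, 0, 0, 1, 1]
dissimilarity = bray_curtis(pattern_a, pattern_b)
```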

To identify the source bacteria from which the ARGs originated, as previously done by others²⁷, reads
from each metagenome sample were mapped to their single assemblies. The average depths were
assigned to the ARG-carrying contigs and ARGs. The coverage of ARGs was then correlated with
species abundances using Spearman correlation tests. The ARG-species pairs were considered significantly
correlated if the p value was < 0.05 and the coefficient was ≥ 0.8.
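
A minimal sketch of this filtering step, assuming SciPy is available: each ARG coverage profile is tested against each species abundance profile, and only pairs meeting both thresholds are retained. The coverage and abundance values, the species labels and the function name are hypothetical.

```python
from scipy.stats import spearmanr


def correlated_pairs(arg_coverage, species_abundance, rho_min=0.8, alpha=0.05):
    """Return (ARG, species, rho) for pairs with p < alpha and rho >= rho_min."""
    pairs = []
    for arg, coverage in arg_coverage.items():
        for species, abundance in species_abundance.items():
            rho, p_value = spearmanr(coverage, abundance)
            if p_value < alpha and rho >= rho_min:
                pairs.append((arg, species, rho))
    return pairs


# Hypothetical per-sample values (eight samples).
arg_coverage = {"tet(M)": [1, 2, 3, 4, 5, 6, 7, 8]}
species_abundance = {
    "species_A": [2, 4, 5, 8, 10, 14, 13, 20],  # nearly monotonic with coverage
    "species_B": [5, 1, 7, 2, 8, 3, 6, 4],      # no clear relationship
}
linked = correlated_pairs(arg_coverage, species_abundance)
```

Requiring a high coefficient in addition to a small p value restricts the candidate host assignments to strong, positive monotonic relationships.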

To look for the presence of shared mobile ARG content across different sources, ARGs carried by both
environment and chicken were considered. Filtered contigs (> 500 bp) in each assembly were searched for
ARGs and MGEs using a BLASTn search against the CARD and ISfinder databases with high identity
(90%) and coverage (90%) thresholds to prevent false positives and variant uncertainty⁸⁰. The distance
between the ARG and MGE was calculated based on the position of the ARG and MGE in the contig²³. ARG-carrying
contigs with a distance between ARG and MGE of greater than 5 kb were discarded⁶⁵,⁸¹⁻⁸³, with the
remaining contigs classed as mobile ARGs. Contigs were annotated using Prokka 1.14.6⁸⁴. Mobile ARG
patterns found in only a single sample were discounted in the analysis. ARGs were further classified as
clinically important if the ARG was included in the Risk I (clinically important ARGs dataset) according to
Zhang et al.²⁴. The structure for the mobile ARG patterns (the MGE type, ARG carried, MGE carried,
sample source, farm, number of samples carrying mobile ARG and distance) is summarised in
Supplementary Table 8. For selected mobile ARGs of interest, the gene structure was visualised using
EasyFig⁸⁵.
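
The 5 kb proximity rule can be illustrated with a short sketch. The text does not state exactly how the distance is measured, so the code below assumes it is the gap between the nearest ends of the two features on the contig (zero if they overlap); the function names and coordinates are hypothetical.

```python
def arg_mge_distance(arg_start, arg_end, mge_start, mge_end):
    """Gap in bp between an ARG and an MGE on the same contig (0 if they overlap)."""
    if arg_end < mge_start:
        return mge_start - arg_end
    if mge_end < arg_start:
        return arg_start - mge_end
    return 0


def is_mobile_arg(arg_start, arg_end, mge_start, mge_end, max_distance=5000):
    """Keep the contig as a mobile ARG if the ARG lies within max_distance of the MGE."""
    return arg_mge_distance(arg_start, arg_end, mge_start, mge_end) <= max_distance


# Hypothetical coordinates on a single contig.
near = is_mobile_arg(1000, 2000, 6500, 7500)  # gap of 4,500 bp
far = is_mobile_arg(1000, 2000, 8000, 9000)   # gap of 6,000 bp
```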

Evolutionary phylogeny was reconstructed for contigs carrying selected mobile ARGs using
BEAST v1.10.4⁸⁶. All combinations of three clock models (strict, uncorrelated log normal, and uncorrelated
exponential) and three tree priors (constant coalescent, logistic growth and Bayesian skyline) were tested
using stepping-stone sampling on the contigs to identify the best model. The best model was found to be
a relaxed uncorrelated lognormal clock model, with a Bayesian skyline growth model. The GTR-gamma
nucleotide substitution model was used, as selected by a maximum likelihood tree analysis in IQ-TREE 2
using automated model selection⁸⁷. The analysis was run for 3 independent chains until the effective
sample size (ESS), that is, the effective number of independent draws from the posterior distribution, for
all parameters was greater than 200 per chain. This entailed each chain running for 100 million steps.
Convergence was assessed in Tracer v1.7.1⁸⁸, and chains were subsequently combined using
LogCombiner v1.10.4². The maximum clade credibility tree was selected using TreeAnnotator v1.10.4 and
then visualized in iTOL v5⁸⁹.

Investigation of correlations between faecal metagenomic features, antibacterial resistance, and temperature/humidity
E. coli strains were isolated from the same samples as the chicken gut metagenome data, cultured, and
used as an indicator species for AMR⁹⁰ for each chicken faeces sample. Initially, 191 E.
coli isolates from chicken faeces samples (191 samples) were collected from the 10 farms. However, the 21 samples
from Liaoning 1 were discarded for this analysis due to technical issues with the sensors when collecting
the temperature and humidity variables.

The antibiotic susceptibility/resistance profiles of the E. coli strains were evaluated against a panel of 26
antibiotics (Table 1), using broth microdilution, and interpreted according to the
Clinical & Laboratory Standards Institute (CLSI 2021) interpretive criteria. The overall data analysis
pipeline (see Fig. 3) consisted of three phases:

Phase I – Metagenomic features preselection: For each antibiotic, isolation of a first set of faecal
metagenome features (i.e. presence/absence of ARGs and relative abundances of microbial species)
showing correlation with the resistance/susceptibility profiles of E. coli based on a chi-square test;
Phase II - Assessment of feature predicting-power through the development of ML-powered predictive
functions: development of ML-based predictive functions of resistance/susceptibility (one predictive
function per antibiotic) that operate from the preselected features (see below for more details),
supervised training with available samples, and then inspection of the best-fit state of each
predictive function to retrieve the predictive influence of each feature, i.e., the relative weight of the
feature in driving the prediction result;
Phase III - Assessment of feature dependency on temperature/humidity through the development of
ML-powered regressors: development of ML-based regressors to identify correlations between the set
of faecal metagenome features identified in phase II, and temperature/humidity conditions.

The three phases are illustrated in detail in the following.

Phase I – Metagenomic Feature Preselection

An initial set of features was considered for each of the 26 antibiotics, comprised of all data on
presence/absence of ARGs and the abundances of microbial species in the faecal metagenome. The
following steps were applied to process and reduce such sets, using the Python package Scikit-learn91:

1. abundances were turned into relative abundances (0–1 interval) using MinMax normalization;
2. for each specific antibiotic, imbalances in sample size between resistance and susceptibility
observations were compensated with synthetically generated data, using the synthetic minority
over-sampling technique (SMOTE)⁹² with the default setting of 5 nearest neighbours;
3. features (ARGs presence/absence, relative abundances of species) with a variance equal to 0 (i.e.,
features that had the same value in all the samples) were removed as uninformative (incapable of acting
as effective predictors);
4. features that did not show strong association with the prediction result (resistance/susceptibility
profile) according to a chi-square test were removed (all the features with a p value higher than 0.01
were removed). No multiple-comparison correction was used as we were looking to assess each
feature in its own right⁹³;
5. the remaining set of features was subjected to visual inspection via a graph representation designed
to create spatial clusters that highlight correlation. The analysis was performed using the
NetworkX⁹⁴ library in Python. In the resulting graph, nodes representing features (ARGs
presence/absence or relative abundance of species) are connected to nodes representing
resistance/susceptibility to a specific antibiotic, if the existence of correlation had been
demonstrated by the chi-square test (see previous step). The nodes were spatially arranged using the
Kamada-Kawai path-length cost function⁹⁵.
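
Steps 1, 3 and 4 above can be sketched with scikit-learn as follows. This is a minimal illustration on toy data, not the study code: the function name and the data are hypothetical, the SMOTE step (step 2) is omitted because it relies on a separate package, and the graph layout of step 5 is not shown.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, chi2
from sklearn.preprocessing import MinMaxScaler


def preselect_features(abundances, arg_presence, labels, alpha=0.01):
    """Return column indices of the stacked feature matrix that pass the filters."""
    # Step 1: rescale abundances to the 0-1 interval; ARG columns are already binary.
    scaled = MinMaxScaler().fit_transform(abundances)
    features = np.hstack([scaled, arg_presence])
    # Step 3: drop features with zero variance across samples.
    varying = VarianceThreshold(threshold=0.0).fit(features).get_support()
    # Step 4: chi-square test against the resistant/susceptible label,
    # discarding features with a p value higher than alpha.
    _, p_values = chi2(features[:, varying], labels)
    return np.flatnonzero(varying)[p_values <= alpha]


# Toy data: 20 susceptible (0) and 20 resistant (1) isolates.
labels = np.array([0] * 20 + [1] * 20)
abundances = np.tile([0.2, 0.8], 20).reshape(-1, 1)  # unrelated to the label
arg_presence = np.column_stack([
    labels,                  # ARG present only in resistant isolates
    np.ones(40, dtype=int),  # ARG present everywhere (zero variance)
])
selected = preselect_features(abundances, arg_presence, labels)
```

In this toy case only column 1 (the ARG that tracks the label) is expected to survive: column 0 fails the chi-square filter and column 2 is removed for zero variance.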

Phase II - Assessment of feature predicting-power through the development of ML-powered predictive functions

Predictive functions based on multiple underlying ML technologies were developed and tested, each
trained to predict resistance/susceptibility to a specific antibiotic, using the features preselected in Phase
I as input of supervised learning. For each one of the 26 antibiotics tested, a predictive function was
trained and validated. Upon successful training and validation, inspection of the best-fit state of each
predictive function allowed us to retrieve the quantitative influence of each feature (i.e., relative weight) in
relation to predicting resistance/susceptibility to each antibiotic.

The following ML technologies were tested to implement the predictive functions: logistic regression (LR),
linear support vector machine (L-SVM), radial basis function support vector machine (RBF-SVM),
extra-tree classifier, random forest, AdaBoost and XGBoost; all implemented using the Python package
Scikit-learn⁹¹. Nested Cross-validation (NCV)⁹⁶ was used to assess the performance and select optimal
hyperparameters for each technology. NCV is an iterative procedure where different configurations of the
predictive function (i.e., different hyperparameters driving the selected technology) are repeatedly tested
for performance whilst reshuffling the training and testing sets. NCV consists of an outer loop dedicated
to randomly reallocating observations into new training and testing sets, and an inner loop where
different configurations (sets of hyperparameters) for the predictive function are tested with the current
training and testing set. In our analysis we ran an NCV with a 5-fold outer loop (five reshuffles of the
training and testing sets) and a 3-fold inner loop (three reshuffles of the training set) for each different
ML-technology. Prediction performance was measured via the receiver operating characteristic area under
the curve (ROC-AUC, referred to as simply AUC in the following), accuracy, sensitivity, specificity and
precision, all computed at each iteration of the outer loop97. Thirty iterations of the NCV assessments
were completed for each ML technology. Technologies were then compared running an F-test on the
mean quantitative results for each using the AUC metric. A minimum of twelve samples in the minority
class were required for the classification for SMOTE and NCV. Nine antibiotics (ampicillin,
ampicillin/sulbactam, cefazolin, ciprofloxacin, doxycycline, imipenem, levofloxacin, meropenem, and
tetracycline) did not have enough samples in one class to allow cross-validation and SMOTE and so were
not taken further. To avoid bias in the analysis related to choosing a specific ML technology we
compared the 7 ML architectures. Prediction performance was measured using 30 iterations of NCV, with
the final performance score defined as the mean of all runs. To verify which predictive function performed
better out of the 7 ML methods a Nemenyi test was used. The extra-tree predictive functions ranked best
according to all studied performance indicators apart from sensitivity (where all the predictive functions
were considered statistically equivalent) and were finally selected to produce the correlation results. As
the extra-tree method had been selected to power the final predictive functions, the Gini importance was
used to extract the strongest predictors from the final, trained models.
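
One run of the nested cross-validation described above, followed by extraction of Gini importances from an extra-tree model, might look like the sketch below. It assumes scikit-learn is available and uses synthetic data from `make_classification` together with a small, arbitrary hyperparameter grid; neither reflects the study data or the grid actually searched, and the 30 repetitions are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the preselected features and resistance labels.
X, y = make_classification(n_samples=120, n_features=10, n_informative=4,
                           random_state=0)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # assessment

# Arbitrary illustrative grid (assumption).
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 3]}
search = GridSearchCV(ExtraTreesClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=inner_cv)

# Each outer fold refits the whole inner search on its training split,
# so the held-out AUC is not biased by hyperparameter selection.
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

# Final model on all data; feature_importances_ holds the Gini importances.
final_model = search.fit(X, y).best_estimator_
importances = final_model.feature_importances_
ranking = importances.argsort()[::-1]  # strongest predictors first
```

Repeating the procedure with different random seeds and averaging `outer_auc` would correspond to the repeated NCV iterations mentioned in the text.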

Phase III - Assessment Of Feature Dependency On Temperature/Humidity
The last phase of the analysis consisted of the development of regression models to identify correlations
between the set of faecal metagenome features identified in Phase II (predictors), and
temperature/humidity conditions. Only the predictors extracted from ML-powered models with AUC > 0.9
were considered.

A separate regression model was created to represent the relationship of each predictor (considered as
the input/explanatory variable) and either temperature or humidity (considered as the dependent
variable). The predictor was treated either as binary if related to presence/absence of an ARG, or as
continuous if related to a relative abundance. The temperature and humidity values were collected within
each farm and averaged over the 7 days before each of the two time points t1 and t2.

Each regression model was developed using linear least-squares fitting (with the Python package
SciPy⁹⁸), using the coefficient of determination (r²) to assess goodness of fit. Metagenome features were
considered to be significantly correlated with temperature or humidity if the slope of the regression line
statistically differed from 0 (p value < 0.05 using a Wald Test with t-distribution of the test statistic). We
looked for correlations between the ARG read depth and species read depth, which would indicate
likelihood of ARGs originating from a particular species, as proposed in Tong et al27. An undirected graph
was created using NetworkX94 to visualize the interconnected ARGs and species selected by the
regression framework for humidity and temperature.
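
A single regression of the kind described in this phase can be sketched with `scipy.stats.linregress`, whose reported p value tests a zero slope using a Wald test with a t-distributed statistic. The abundance and temperature values below are hypothetical and chosen only to show the calls.

```python
from scipy.stats import linregress

# Hypothetical relative abundance of one species (predictor) and the
# 7-day mean barn temperature in degrees Celsius (dependent variable).
abundance = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
temperature = [20.1, 21.0, 21.8, 23.1, 23.9, 25.2, 25.8, 27.1]

fit = linregress(abundance, temperature)
r_squared = fit.rvalue ** 2          # coefficient of determination
is_significant = fit.pvalue < 0.05   # slope differs from 0
```

For a binary predictor (presence/absence of an ARG) the same call applies, with the predictor encoded as 0 or 1.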

Analysis Of Antibiotic Usage Bias


The observed correlations between the metagenomic data in chicken faeces and resistance profiles
observed in E. coli may be influenced by the different antibiotic protocols each farm adopted
(Supplementary Table 13). To analyse whether the differences in antibiotic treatment in each farm led to
bias in the selected metagenomic features, we calculated the relative abundances of ARGs, by first
grouping ARGs by relationship to each specific antibiotic and then dividing the number of ARGs present
in the sample by the total number of ARGs for each antibiotic, and calculated the relative
abundance of the microbial species. For these three cases we used a Wilcoxon rank-sum test to verify if
there was a difference between the samples from the farms that administered an antibiotic and the
samples from the farms that did not.

Data availability
The metagenomic sequencing data supporting the conclusions of this article are available in the NCBI
database under Bioproject accession numbers PRJNA678871 (for Shandong 1_1 and 1_2) and
PRJNA841806 (for all other farms) available on: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA678871
and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA841806.
Code availability
The code is available on GitHub: https://github.com/tan0101/Commercial_MGS2023

Declarations
Acknowledgements

This work was supported by the InnovateUK grant [104986], FARMWATCH: Fight AbR with Machine
learning and a Wide Array of sensing TeCHnologies and Ministry of Science and Technology of P. R.
China under Grant Key Project of International Scientific and Technological Innovation Cooperation
Between Governments (number 2018YFE0101500).

The authors gratefully acknowledge the support received from the University of Nottingham Research
Beacon of Excellence: Future Food.

Author Contributions

JC, FL, ZP, XZ, LL and TD designed and supervised the study; ZP, XZ, LL, FL, JC and TD planned the
methodology; MB, AMG, NS and TD wrote the draft; ZP, JC, FL, MB, AMG, NS and TD edited & reviewed the
draft and provided critical comments; ZP, WW, YD, Yujie H, HL, ZT, MZ, YG, LZ, ZH and XZ carried out the
experiments and collected the animal and environmental samples; AMG, MB and Yue H performed
the data analysis and the visualization of the analysed data with critical comments from NS and TD; ZP,
DR and TD acquired the funding. The authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no competing interests.

References
1. O'Neill, J. Tackling drug-resistant infections globally: final report and recommendations. The Review
on Antimicrobial Resistance (2016).
2. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC
Evol. Biol. 7, 214, doi:https://doi.org/10.1186/1471-2148-7-214 (2007).
3. Bruinsma, J. World agriculture: towards 2015/2030: an FAO perspective. (Earthscan, 2003).
4. Gilbert, W., Thomas, L. F., Coyne, L. & Rushton, J. Review: Mitigating the risks posed by intensification
in livestock production: the examples of antimicrobial resistance and zoonoses. Animal 15, 100123,
doi:https://doi.org/10.1016/j.animal.2020.100123 (2021).
5. FAO. Status report on antimicrobial resistance. Rome: Food and Agriculture Organization of the
United Nations (2015).
6. Wu, Z. Antibiotic use and antibiotic resistance in food-producing animals in China. OECD Food,
Agriculture and Fisheries Papers No. 134, doi:https://doi.org/10.1787/4adba8c1-en (2019).
7. Ayukekbong, J. A., Ntemgwa, M. & Atabe, A. N. The threat of antimicrobial resistance in developing
countries: causes and control strategies. Antimicrob. Resist. Infect. Control 6, 1–8,
doi:https://doi.org/10.1186/s13756-017-0208-x (2017).
8. Van Boeckel, T. P. et al. Global trends in antimicrobial resistance in animals in low- and middle-
income countries. Science 365, eaaw1944, doi:https://doi.org/10.1126/science.aaw1944 (2019).
9. Graham, D. W. et al. Complexities in understanding antimicrobial resistance across domesticated
animal, human, and environmental systems. Ann. N. Y. Acad. Sci. 1441, 17–30,
doi:https://doi.org/10.1111/nyas.14036 (2019).
10. Robinson, T. P. et al. Antibiotic resistance is the quintessential One Health issue. Trans. R. Soc. Trop.
Med. Hyg. 110, 377–380, doi:https://doi.org/10.1093/trstmh/trw048 (2016).
11. Ikhimiukor, O. O., Odih, E. E., Donado-Godoy, P. & Okeke, I. N. A bottom-up view of antimicrobial
resistance transmission in developing countries. Nat. Microbiol. 7, 757–765,
doi:https://doi.org/10.1038/s41564-022-01124-w (2022).
12. Donado-Godoy, P. et al. Prevalence, risk factors, and antimicrobial resistance profiles of Salmonella
from commercial broiler farms in two important poultry-producing regions of Colombia. J. Food Prot.
75, 874–883, doi:https://doi.org/10.4315/0362-028x.jfp-11-458 (2012).
13. Humboldt-Dachroeden, S. & Mantovani, A. Assessing Environmental Factors within the One Health
Approach. Medicina (Kaunas) 57, doi:https://doi.org/10.3390/medicina57030240 (2021).
14. Ko, K. K. K., Chng, K. R. & Nagarajan, N. Metagenomics-enabled microbial surveillance. Nat.
Microbiol. 7, 486–496, doi:https://doi.org/10.1038/s41564-022-01089-w (2022).
15. Astill, J., Dara, R. A., Fraser, E. D. G. & Sharif, S. Detecting and predicting emerging disease in poultry
with the implementation of new technologies and big data: A focus on avian influenza virus. Front.
Vet. Sci. 5, doi:https://doi.org/10.3389/fvets.2018.00263 (2018).
16. Ahmed, G. et al. An approach towards IoT-based predictive service for early detection of diseases in
poultry chickens. Sustainability 13, 13396, doi:https://doi.org/10.3390/su132313396 (2021).
17. Wang, W. et al. Whole-genome sequencing and machine learning analysis of Staphylococcus aureus
from multiple heterogeneous sources in China reveals common genetic traits of antimicrobial
resistance. mSystems 6, e01185-01120, doi:https://doi.org/10.1128/mSystems.01185-20 (2021).
18. Pearcy, N. et al. Genome-scale metabolic models and machine Learning reveal genetic determinants
of antibiotic resistance in Escherichia coli and unravel the underlying metabolic adaptation
mechanisms. mSystems 6, e00913-00920, doi:https://doi.org/10.1128/mSystems.00913-20 (2021).
19. Peng, Z. et al. Whole-genome sequencing and gene sharing network analysis powered by machine
learning identifies antibiotic resistance sharing between animals, humans and environment in
livestock farming. PLoS Comput. Biol. 18, e1010018, doi:
https://doi.org/10.1371/journal.pcbi.1010018 (2022).
20. Wang, W. et al. Novel SCCmec type XV (7A) and two pseudo-SCCmec variants in foodborne MRSA in
China. J. Antimicrob. Chemother., doi:https://doi.org/10.1093/jac/dkab500 (2022).
21. Hendriksen, R. S. et al. Using genomics to track global antimicrobial resistance. Public Health Front.
7, doi:https://doi.org/10.3389/fpubh.2019.00242 (2019).
22. Looft, T. et al. In-feed antibiotic effects on the swine intestinal microbiome. Proc. Natl. Acad. Sci. U. S.
A. 109, 1691–1696, doi:https://doi.org/10.1073/pnas.1120238109 (2012).
23. Maciel-Guerra, A. et al. Dissecting microbial communities and resistomes for interconnected humans,
soil, and livestock. The ISME Journal, doi:https://doi.org/10.1038/s41396-022-01315-7 (2022).
24. Zhang, A.-N. et al. An omics-based framework for assessing the health risk of antimicrobial
resistance genes. Nat. Commun. 12, 4765, doi:https://doi.org/10.1038/s41467-021-25096-3 (2021).
25. Tang, B. et al. Characterization of an NDM-5 carbapenemase-producing Escherichia coli ST156
isolate from a poultry farm in Zhejiang, China. BMC Microbiol. 19, 82,
doi:https://doi.org/10.1186/s12866-019-1454-2 (2019).
26. Cui, M. et al. Prevalence and Characterization of Fluoroquinolone Resistant Salmonella Isolated
From an Integrated Broiler Chicken Supply Chain. Front. Microbiol. 10,
doi:https://doi.org/10.3389/fmicb.2019.01865 (2019).
27. Tong, C. et al. Swine manure facilitates the spread of antibiotic resistome including tigecycline-
resistant tet(X) variants to farm workers and receiving environment. Sci. Total Environ. 808, 152157,
doi:https://doi.org/10.1016/j.scitotenv.2021.152157 (2022).
28. Dortet, L., Nordmann, P. & Poirel, L. Association of the Emerging Carbapenemase NDM-1 with a
Bleomycin Resistance Protein in Enterobacteriaceae and Acinetobacter baumannii. Antimicrob.
Agents Chemother. 56, 1693–1697, doi:https://doi.org/10.1128/AAC.05583-11 (2012).
29. Laird, T. J. et al. Diversity detected in commensals at host and farm level reveals implications for
national antimicrobial resistance surveillance programmes. J. Antimicrob. Chemother. 77, 400–408,
doi:https://doi.org/10.1093/jac/dkab403 (2022).
30. Neethirajan, S. & Kemp, B. Digital Livestock Farming. Sensing and Bio-Sensing Research 32, 100408,
doi:https://doi.org/10.1016/j.sbsr.2021.100408 (2021).
31. Hendriksen, R. S. et al. Global monitoring of antimicrobial resistance based on metagenomics
analyses of urban sewage. Nat Commun 10, 1124, doi:https://doi.org/10.1038/s41467-019-08853-3
(2019).
32. Marini, S. et al. AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance
from high-throughput short-read metagenomics data. Gigascience 11,
doi:https://doi.org/10.1093/gigascience/giac029 (2022).
33. Danko, D. et al. A global metagenomic map of urban microbiomes and antimicrobial resistance. Cell
184, 3376–3393.e3317, doi:https://doi.org/10.1016/j.cell.2021.05.002 (2021).
34. Zhou, W. et al. Antimicrobial resistance and genomic characterization of Escherichia coli from pigs
and chickens in Zhejiang, China. Front. Microbiol. 13,
doi:https://doi.org/10.3389/fmicb.2022.1018682 (2022).
35. He, D. et al. CTX-M-123, a novel hybrid of the CTX-M-1 and CTX-M-9 Group β-lactamases recovered
from Escherichia coli isolates in China. Antimicrob. Agents Chemother. 57, 4068–4071,
doi:https://doi.org/10.1128/aac.00541-13 (2013).
36. Wang, Y. et al. Antibiotic resistance gene reservoir in live poultry markets. J. Infect. 78, 445–453,
doi:https://doi.org/10.1016/j.jinf.2019.03.012 (2019).
37. Sciortino, S. et al. Occurrence and antimicrobial resistance of Arcobacter spp. recovered from aquatic
environments. Antibiotics 10, 288, doi:https://doi.org/10.3390/antibiotics10030288 (2021).
38. Jochum, J. M., Redweik, G. A. J., Ott, L. C. & Mellata, M. Bacteria Broadly-Resistant to Last Resort
Antibiotics Detected in Commercial Chicken Farms. Microorganisms 9,
doi:https://doi.org/10.3390/microorganisms9010141 (2021).
39. Błażejewska, A., Zalewska, M., Grudniak, A. & Popowska, M. A Comprehensive Study of the
Microbiome, Resistome, and Physical and Chemical Characteristics of Chicken Waste from Intensive
Farms. Biomolecules 12, doi:https://doi.org/10.3390/biom12081132 (2022).
40. de Mesquita Souza Saraiva, M. et al. Antimicrobial resistance in the globalized food chain: a One
Health perspective applied to the poultry industry. Braz. J. Microbiol. 53, 465–486,
doi:https://doi.org/10.1007/s42770-021-00635-8 (2022).
41. World Health Organisation. Surveillance and One Health in food production key to halting
antimicrobial resistance,
<https://www.who.int/europe/news/item/07-06-2021-surveillance-and-one-health-in-food-production-key-to-halting-antimicrobial-resistance>
(2021). Last accessed 11/12/2022.
42. Davies, N., Jørgensen, F., Willis, C., McLauchlin, J. & Chattaway, M. A. Whole genome sequencing
reveals antimicrobial resistance determinants (AMR genes) of Salmonella enterica recovered from
raw chicken and ready-to-eat leaves imported into England between 2014 and 2019. J. Appl.
Microbiol. 133, 2569–2582, doi:https://doi.org/10.1111/jam.15728 (2022).
43. Conesa, A., Garofolo, G., Di Pasquale, A. & Cammà, C. Monitoring AMR in Campylobacter jejuni from
Italy in the last 10 years (2011–2021): Microbiological and WGS data risk assessment. EFSA Journal
20, e200406, doi:https://doi.org/10.2903/j.efsa.2022.e200406 (2022).
44. Rohr, J. R. et al. Emerging human infectious diseases and the links to global food production. Nature
Sustainability 2, 445–456, doi:https://doi.org/10.1038/s41893-019-0293-3 (2019).
45. Xiong, W. et al. Antibiotic-mediated changes in the fecal microbiome of broiler chickens define the
incidence of antibiotic resistance genes. Microbiome 6, 34,
doi:https://doi.org/10.1186/s40168-018-0419-2 (2018).
46. Zhou, Y. et al. Antibiotic administration routes and oral exposure to antibiotic resistant bacteria as
key drivers for gut microbiota disruption and resistome in poultry. Front. Microbiol. 11,
doi:https://doi.org/10.3389/fmicb.2020.01319 (2020).
47. Noyes, N. R. et al. Resistome diversity in cattle and the environment decreases during beef
production. Elife 5, e13195, doi:https://doi.org/10.7554/eLife.13195 (2016).
48. Zhang, C. Z. et al. The Emergence of Chromosomally Located bla (CTX-M-55) in Salmonella From
Foodborne Animals in China. Front. Microbiol. 10, 1268,
doi:https://doi.org/10.3389/fmicb.2019.01268 (2019).
49. Storey, N. et al. Use of genomics to explore AMR persistence in an outdoor pig farm with low
antimicrobial usage. Microb Genom 8, doi:https://doi.org/10.1099/mgen.0.000782 (2022).
50. Thu, W. P. et al. Prevalence, antimicrobial resistance, virulence gene, and class 1 integrons of
Enterococcus faecium and Enterococcus faecalis from pigs, pork and humans in Thai-Laos border
provinces. Journal of Global Antimicrobial Resistance 18, 130–138,
doi:https://doi.org/10.1016/j.jgar.2019.05.032 (2019).
51. Gautam, R. et al. Modeling the effect of seasonal variation in ambient temperature on the
transmission dynamics of a pathogen with a free-living stage: example of Escherichia coli O157:H7
in a dairy herd. Prev. Vet. Med. 102, 10–21, doi:https://doi.org/10.1016/j.prevetmed.2011.06.008
(2011).
52. Oakley, B. B. et al. The cecal microbiome of commercial broiler chickens varies significantly by
season. Poult. Sci. 97, 3635–3644, doi:https://doi.org/10.3382/ps/pey214 (2018).
53. Wang, X. et al. Effects of high ambient temperature on the community structure and composition of
ileal microbiome of broilers. Poult. Sci. 97, 2153–2158, doi:https://doi.org/10.3382/ps/pey032
(2018).
54. Yang, Y., Liu, G., Ye, C. & Liu, W. Bacterial community and climate change implication affected the
diversity and abundance of antibiotic resistance genes in wetlands on the Qinghai-Tibetan Plateau.
J. Hazard. Mater. 361, 283–293, doi:https://doi.org/10.1016/j.jhazmat.2018.09.002 (2019).
55. Slavik, M. F. et al. Effect of humidity on infection of turkeys with Alcaligenes faecalis. Avian Dis. 25,
936–942, doi:https://doi.org/10.2307/1590068 (1981).
56. Filipe, M. et al. Fluoroquinolone-Resistant Alcaligenes faecalis Related to Chronic Suppurative Otitis
Media, Angola. Emerg. Infect. Dis. 23, 1740–1742, doi:https://doi.org/10.3201/eid2310.170268
(2017).
57. Huang, C. Extensively drug-resistant Alcaligenes faecalis infection. BMC Infect. Dis. 20, 833,
doi:https://doi.org/10.1186/s12879-020-05557-8 (2020).
58. Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic
resistance database. Nucleic Acids Res. 48, D517–D525, doi:https://doi.org/10.1093/nar/gkz935
(2020).
59. Barnes, N. M. & Wu, H. Mechanisms regulating the airborne survival of Klebsiella pneumoniae under
different relative humidity and temperature levels. Indoor Air 32, e12991,
doi:https://doi.org/10.1111/ina.12991 (2022).
60. Zheng, W., Yue, M., Zhang, J. & Ruan, Z. Coexistence of two bla(CTX-M-14) genes in a bla(NDM-5)-
carrying multidrug-resistant Escherichia coli strain recovered from a bloodstream infection in China.
J Glob Antimicrob Resist 26, 11–14, doi:https://doi.org/10.1016/j.jgar.2021.05.002 (2021).
61. Hernández, M. et al. First Report of an Extensively Drug-Resistant ST23 Klebsiella pneumoniae of
Capsular Serotype K1 Co-Producing CTX-M-15, OXA-48 and ArmA in Spain. Antibiotics (Basel) 10,
doi:https://doi.org/10.3390/antibiotics10020157 (2021).
62. Barraud, O., Badell, E., Denis, F., Guiso, N. & Ploy, M. C. Antimicrobial drug resistance in
Corynebacterium diphtheriae mitis. Emerg. Infect. Dis. 17, 2078–2080,
doi:https://doi.org/10.3201/eid1711.110282 (2011).
63. Song, L. et al. Bioaerosol is an important transmission route of antibiotic resistance genes in pig
farms. Environ. Int. 154, 106559, doi:https://doi.org/10.1016/j.envint.2021.106559 (2021).
64. Aarestrup, F. M. et al. Resistance to antimicrobial agents used for animal therapy in pathogenic-,
zoonotic- and indicator bacteria isolated from different food animals in Denmark: a baseline study
for the Danish Integrated Antimicrobial Resistance Monitoring Programme (DANMAP). APMIS 106,
745–770, doi:https://doi.org/10.1111/j.1699-0463.1998.tb00222.x (1998).
65. Sun, J. et al. Environmental remodeling of human gut microbiota and antibiotic resistome in
livestock farms. Nature Commun 11, 1427, doi:https://doi.org/10.1038/s41467-020-15222-y (2020).
66. Li, N., Ren, Z., Li, D. & Zeng, L. Review: Automated techniques for monitoring the behaviour and
welfare of broilers and laying hens: towards the goal of precision livestock farming. Animal 14, 617–
625, doi:https://doi.org/10.1017/S1751731119002155 (2020).
67. Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol
for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1,
2320–2325, doi:https://doi.org/10.1038/nprot.2006.384 (2006).
68. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359,
doi:https://doi.org/10.1038/nmeth.1923 (2012).
69. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079
(2009).
70. Glendinning, L., Stewart, R. D., Pallen, M. J., Watson, K. A. & Watson, M. Assembly of hundreds of
novel bacterial genomes from the chicken caecum. Genome Biol. 21, 1–16,
doi:https://doi.org/10.1186/s13059-020-1947-1 (2020).
71. Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for
large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–
1676, doi:https://doi.org/10.1093/bioinformatics/btv033 (2015).
72. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754–1760, doi:https://doi.org/10.1093/bioinformatics/btp324 (2009).
73. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 7, e7359,
doi:https://doi.org/10.7717/peerj.7359 (2019).
74. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker
genes. Nat Methods 9, 811–814, doi:https://doi.org/10.1038/nmeth.2066 (2012).
75. Dixon, P. VEGAN, A Package of R Functions for Community Ecology. J. Veg. Sci. 14, 927–930,
doi:https://doi.org/10.1111/j.1654-1103.2003.tb02228.x (2003).
76. Arbizu, P. M. pairwiseAdonis: Pairwise multilevel comparison using adonis R package version 0.4.
See https://github.com/pmartinezarbizu/pairwiseAdonis (2020).
77. Cameron, E. S., Schmidt, P. J., Tremblay, B. J. M., Emelko, M. B. & Müller, K. M. Enhancing diversity
analysis by repeatedly rarefying next generation sequencing data describing microbial communities.
Sci. Rep. 11, 22302, doi:https://doi.org/10.1038/s41598-021-01636-1 (2021).
78. Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome Datasets Are
Compositional: And This Is Not Optional. Front. Microbiol. 8,
doi:https://doi.org/10.3389/fmicb.2017.02224 (2017).
79. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J.
Mol. Biol. 215, 403–410, doi:https://doi.org/10.1016/s0022-2836(05)80360-2 (1990).
80. Schmidt, K. et al. Identification of bacterial pathogens and antimicrobial resistance directly from
clinical urines by nanopore-based metagenomic sequencing. J. Antimicrob. Chemother. 72, 104–114,
doi:https://doi.org/10.1093/jac/dkw397 (2016).
81. Che, Y. et al. Conjugative plasmids interact with insertion sequences to shape the horizontal transfer
of antimicrobial resistance genes. Proc. Natl. Acad. Sci. U. S. A. 118, e2008731118,
doi:https://doi.org/10.1073/pnas.2008731118 (2021).
82. Ellabaan, M. M. H., Munck, C., Porse, A., Imamovic, L. & Sommer, M. O. A. Forecasting the
dissemination of antibiotic resistance genes across bacterial genomes. Nat. Commun. 12, 2435,
doi:https://doi.org/10.1038/s41467-021-22757-1 (2021).
83. Hua, X. et al. BacAnt: a combination annotation server for bacterial DNA sequences to identify
antibiotic resistance genes, integrons, and transposable elements. Front. Microbiol. 12,
doi:https://doi.org/10.3389/fmicb.2021.649969 (2021).
84. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069,
doi:https://doi.org/10.1093/bioinformatics/btu153 (2014).
85. Sullivan, M. J., Petty, N. K. & Beatson, S. A. Easyfig: a genome comparison visualizer. Bioinformatics
27, 1009–1010, doi:https://doi.org/10.1093/bioinformatics/btr039 (2011).
86. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
Virus Evol 4, vey016, doi:https://doi.org/10.1093/ve/vey016 (2018).
87. Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the
genomic era. Mol. Biol. Evol. 37, 1530–1534, doi:https://doi.org/10.1093/molbev/msaa015 (2020).
88. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior Summarization in
Bayesian Phylogenetics Using Tracer 1.7. Syst. Biol. 67, 901–904,
doi:https://doi.org/10.1093/sysbio/syy032 (2018).
89. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and
annotation. Nucleic Acids Res., doi:https://doi.org/10.1093/nar/gkab301 (2021).
90. Anjum, M. F. et al. The potential of using E. coli as an indicator for the surveillance of antimicrobial
resistance (AMR) in the environment. Curr. Opin. Microbiol. 64, 152–158,
doi:https://doi.org/10.1016/j.mib.2021.09.011 (2021).
91. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830
(2011).
92. Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling
technique. J. Artif. Intell. Res. 16, 321–357, doi:https://doi.org/10.1613/jair.953 (2002).
93. Perneger, T. V. What's wrong with Bonferroni adjustments. BMJ 316, 1236–1238,
doi:https://doi.org/10.1136/bmj.316.7139.1236 (1998).
94. Hagberg, A., Swart, P. & Schult, D. Exploring network structure, dynamics, and function using
NetworkX. (Los Alamos National Lab.(LANL), Los Alamos, NM (United States), 2008).
95. Kamada, T. & Kawai, S. An algorithm for drawing general undirected graphs. Inf. Process. Lett. 31, 7–
15, doi:https://doi.org/10.1016/0020-0190(89)90102-6 (1989).
96. Cawley, G. C. & Talbot, N. L. On over-fitting in model selection and subsequent selection bias in
performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).
97. Wainer, J. & Cawley, G. Empirical evaluation of resampling procedures for optimising SVM
hyperparameters. J. Mach. Learn. Res. 18, 1–35 (2017).
98. Jones, E., Oliphant, T. & Peterson, P. SciPy: Open source scientific tools for Python. (2001).

Figures

Figure 1

Summary of the collection of biological samples and environmental sensor data: (A) The three provinces
of China (Liaoning, Shandong and Henan) where the 10 farms are located; (B) source and collection time
points for the biological samples collected within each farm. Biological samples consist of: i) farm
samples (faeces and feather from chickens and barn floor and outdoor soil samples) collected at mid-life
(t1: week 3) and at the end of life (t2: week 6) and ii) abattoir samples (chicken carcass, processing line
and wastewater) collected on slaughtering day (t3: 1–5 days after week 6); (C) season and start dates of
the collection campaign for each farm. Note the use of an underscore to distinguish the two collection
campaigns executed to cover two breeding cycles at Shandong 1 (SD1_1 and SD1_2).

Figure 2

Analysis of mobile ARGs (note that in this analysis t1 and t2 sources of the same type were aggregated
together, leading to a total of 7 sources considered). (A) Pie chart showing the proportion of ARGs (out of
the 202 found) associated with one or multiple MGEs. Colour indicates the number of MGEs associated with
an individual ARG. (B) Undirected network graph showing the association of mobile ARGs (small orange circles)
with sample sources (large green circles). Edges of the graph link the mobile ARGs to the sources in
which they were found. (C) Number of mobile ARGs per sample, per source. Each circle represents a
single sample, with circles coloured by farm. (D) 157 (out of 695) mobile ARGs were found to be present
in both chicken and environmental samples from the same farm (blue circle). 184 mobile ARGs contained
clinically relevant ARGs (pink circle). An overlap of 47 clinically relevant (ref. 24) mobile ARGs was found in
chicken and environmental sources obtained from the same farm (purple region).
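
The overlap shown in panel D can be read as a pair of set operations over (farm, source, mobile ARG) observations. The sketch below is only illustrative: the record layout, the farm labels, the MGE–ARG pairs other than ISAba125–NDM-1, and the list of clinically relevant ARGs are all hypothetical placeholders, not values from the study.

```python
from collections import defaultdict

# Hypothetical observations: (farm, source_group, MGE, ARG).
# source_group is "chicken" or "environment"; every value here is illustrative.
records = [
    ("farm_A", "chicken", "ISAba125", "NDM-1"),
    ("farm_A", "environment", "ISAba125", "NDM-1"),
    ("farm_A", "chicken", "IS26", "tetA"),
    ("farm_B", "environment", "IS26", "tetA"),
    ("farm_B", "chicken", "Tn3", "sul1"),
    ("farm_B", "environment", "Tn3", "sul1"),
]

# Hypothetical list of ARGs treated as clinically relevant.
clinically_relevant_args = {"NDM-1"}


def shared_within_farm(observations):
    """Mobile ARGs (MGE, ARG) seen in both chicken and environment on one farm."""
    groups_seen = defaultdict(set)  # (farm, MGE, ARG) -> source groups observed
    for farm, group, mge, arg in observations:
        groups_seen[(farm, mge, arg)].add(group)
    return {
        (mge, arg)
        for (farm, mge, arg), groups in groups_seen.items()
        if {"chicken", "environment"} <= groups
    }


# Mobile ARGs shared between chicken and environment on the same farm.
shared = shared_within_farm(records)

# Mobile ARGs whose ARG component is on the clinically relevant list.
clinical = {(mge, arg) for _, _, mge, arg in records
            if arg in clinically_relevant_args}

# Intersection of the two sets: shared and clinically relevant.
overlap = shared & clinical
```

Note that a mobile ARG found in chickens on one farm and in the environment of a different farm does not count as shared in this sketch, which mirrors the caption's "from the same farm" condition.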

Figure 3

Data mining pipeline to find correlations between the gut microbiome and resistome, antibiotic resistance
in E. coli, and temperature and humidity. The full data analysis workflow: Input data shown in green;
phase I – metagenome data pre-processing (in yellow) – the steps are illustrated in detail in the Materials
and Methods section; phase II – training and testing of ML-powered predictive functions to isolate
metagenomic features (i.e. presence/absence of the ARGs and relative abundances of microbial species
present in the sample) correlated to phenotypic resistance (in blue); phase III – method (discussed in the
next section) based on fitting regression models to isolate metagenomic features that better correlate
with variations of temperature and humidity (in red).

Figure 4

Machine learning performance and feature selection from correlations between the gut microbiome and
resistome, and antibiotic resistance in E. coli. (A) Performance of the ML-powered predictive functions of
E. coli resistance to specific antibiotics (ML technology: extra-tree classifier – see Materials and
Methods). Performance indicators (AUC, accuracy, and precision) were computed as the average of 30
iterations of nested cross-validation (see Materials and Methods). See Supplementary Fig. 6 for
performance indicators sensitivity, specificity and Cohen’s Kappa score. (B) Number of metagenomic
features (resistome and microbiome) found as the strongest predictors of the resistance/susceptibility
profiles to each antibiotic. (C) Undirected graph showing the strongest predictors (metagenomic features
in the chicken gut: ARGs (circles) and bacteria species (stars)) for each antibiotic model. Edges of the
graph link either the ARG or the bacteria species nodes (predictor variables) to the antibiotic model
(ellipses) they were found to be predictive in. Both the ARG and antibiotic model nodes are colour coded
according to the antibiotic class the antibiotic/ARG is known to be associated with. The machine learning
models were run for the following antibiotics: amikacin (AMI), aztreonam (AZM), cefotaxime (CTX),
cefoxitin (CFX), chloramphenicol (CHL), kanamycin (KAN), minocycline (MIN), nalidixic acid (NAL),
streptomycin (STR), sulphafurazole (SUL), and trimethoprim/sulfamethoxazole (SXT).
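
Nested cross-validation, as referred to in panel A, keeps hyperparameter selection (inner loop) separate from performance estimation (outer loop), and the reported figure is an average over repeated runs. The sketch below illustrates that structure only. It is written in plain Python with a one-dimensional k-nearest-neighbour rule standing in for the extra-tree classifier, and the synthetic data, fold counts and candidate values are assumptions chosen for illustration; only the 30 repetitions echo the caption.

```python
import random
from statistics import mean


def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly using the training points."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def split(data, fold):
    """Return (train, test) lists for one held-out fold of indices."""
    held = set(fold)
    train = [data[i] for i in range(len(data)) if i not in held]
    test = [data[i] for i in fold]
    return train, test


def inner_cv(train, k, n_folds):
    """Inner loop: mean validation accuracy of one candidate k."""
    folds = k_folds(list(range(len(train))), n_folds)
    return mean(accuracy(*split(train, fold), k) for fold in folds)


def nested_cv(data, candidates, outer_k=5, inner_k=3, seed=0):
    """One nested CV run: tune k on inner folds, score on the outer fold."""
    data = data[:]
    random.Random(seed).shuffle(data)
    scores = []
    for fold in k_folds(list(range(len(data))), outer_k):
        train, test = split(data, fold)
        # Tuning only ever sees the outer training portion.
        best_k = max(candidates, key=lambda k: inner_cv(train, k, inner_k))
        scores.append(accuracy(train, test, best_k))
    return mean(scores)


# Synthetic two-class data (illustrative only, not study data).
rng = random.Random(42)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(30)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(30)])

# Average over 30 repetitions with different shuffles.
scores = [nested_cv(data, candidates=[1, 3, 5], seed=s) for s in range(30)]
mean_accuracy = mean(scores)
```

Because the outer test fold never takes part in choosing `best_k`, the averaged outer score is not inflated by the tuning step, which is the concern raised in refs. 96 and 97.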

Figure 5

Graphs highlighting gut features (microbial species and ARGs) identified as predictors of E. coli
resistance, found correlated to humidity or temperature, and also found to be related to each other
(meaning: the ARGs are likely present in the species). (A) features correlated with humidity and (B)
features correlated with temperature. Nodes indicate ARGs or microbial species. Edges connect species to
ARGs likely present in the species. ARGs nodes are colour-coded according to the antibiotic class known
to be associated with the ARG; microbial species nodes are in grey.

Figure 6

Gene structure and evolutionary analysis of the mobile ARG pattern ISAba125-NDM-1. Bayesian
evolutionary phylogenetic tree reconstructing the phylogeny of contigs containing the clinically important
ARG NDM-1 and MGE ISAba125. The source type and location of the samples are indicated by colour
strips. The gene structure of each sample is shown to the right of the tree with MGEs coloured blue, ARGs
coloured green and other genes coloured yellow. The ARG ble is co-located with NDM-1 in all contigs.

Supplementary Files
This is a list of supplementary files associated with this preprint.

SupplementaryInformation.pdf
TableS1.xlsx
TableS2.xlsx
TableS3.xlsx
TableS4.xlsx
TableS5.xlsx
TableS6.xlsx
TableS7.xlsx
TableS8.xlsx
TableS9.csv
TableS10.xlsx
TableS11.xlsx
TableS12.xlsx
