
The Plant Journal (2019) 97, 164–181 doi: 10.1111/tpj.14170

SI GENOME TO PHENOME

Genome-wide association studies on the phyllosphere microbiome: Embracing complexity in host–microbe interactions
Kathleen Beilsmith1,†, Manus P.M. Thoen1,†, Benjamin Brachi2, Andrew D. Gloss1, Mohammad H. Khan1 and Joy Bergelson1,*
1Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL 60637, USA, and
2BIOGECO, INRA, University of Bordeaux, 33610 Cestas, France

Received 25 July 2018; revised 8 November 2018; accepted 16 November 2018; published online 22 November 2018.
*For correspondence (e-mail jbergels@uchicago.edu).

†These authors contributed equally to this study.

SUMMARY
Environmental sequencing shows that plants harbor complex communities of microbes that vary across environments. However, many approaches for mapping plant genetic variation to microbe-related traits were developed in the relatively simple context of binary host–microbe interactions under controlled conditions. Recent advances in sequencing and statistics make genome-wide association studies (GWAS) an increasingly promising approach for identifying the plant genetic variation associated with microbes in a community context. This review discusses early efforts on GWAS of the plant phyllosphere microbiome and the outlook for future studies based on human microbiome GWAS. A workflow for GWAS of the phyllosphere microbiome is then presented, with particular attention to how perspectives on the mechanisms, evolution and environmental dependence of plant–microbe interactions will influence the choice of traits to be mapped.

Keywords: genome-wide association studies, microbiome, phyllosphere, host–microbe interactions, community, environment, sequencing, phenotype, genotype, mapping.

© 2018 The Authors
The Plant Journal © 2018 John Wiley & Sons Ltd

INTRODUCTION

Plants and microbes together make up the vast majority of terrestrial biomass on Earth (Bar-On et al., 2018). It is hardly surprising, then, that plant phenotypes are inextricably shaped by their interactions with microbes. Some microbes cause disease in plants (Mansfield et al., 2012), while others offer protection against these microbial pathogens and non-microbial enemies (Innerebner et al., 2011); still others help plants utilize their abiotic environment through processes such as nutrient acquisition (Haney et al., 2015). Efforts to characterize the molecular biology, ecology and evolution of these interactions have revealed that they vary among plant species and genotypes, and thus have a genetic basis (Smith and Goodman, 1999; Corwin and Kliebenstein, 2017). This has largely been shown through studies on interactions of plants with one or a few microbial strains at a time, often in controlled and simplified environments in the laboratory or greenhouse. While these efforts have been tremendously fruitful, the advent of high-throughput environmental sequencing has led to the discovery that leaves harbor a complex microbial ecosystem. From mountain shrubs (Ruiz-Perez et al., 2016) to seagrass (Fahimipour et al., 2017), and from subarctic grass (Uroz et al., 2016) to equatorial forest canopies (Lambais et al., 2006), the photosynthesizing tissues of plants are colonized by a massive and diverse set of microbial colonists, including bacteria, fungi and phages (Lindow and Brandl, 2003; Rastogi et al., 2013; Morella et al., 2018; Sapp et al., 2018). Plant biologists and breeders are now confronted with the challenge of translating the wealth of knowledge gained from the study of individual plant–microbe interactions to an understanding of factors shaping the microbiome. This will require not only disentangling host–microbe from microbe–microbe interactions, but also integrating the role of environmental variation.

Genome-wide association studies (GWAS) are a powerful and flexible tool for tackling this challenge. GWAS refer to the application of association mapping, a technique that tests for statistical associations between genetic variation at a locus and organismal phenotypes, to thousands or millions of loci throughout the genome (Brachi et al., 2011). Because GWAS only requires genotype and phenotype data across individuals, it can in theory be employed for any property of the phyllosphere that can be quantified and is influenced by plant genotype. Promisingly, quantitative indices of community composition and diversity vary among plant genotypes grown in common environments (i.e. they are heritable; Peiffer et al., 2013; Horton et al., 2014), making them suitable phenotypes to interrogate through GWAS. A key strength of GWAS in this context is the flexibility of its underlying statistical models, which can compare and quantify how the effects of genotype on phenotype vary across environmental conditions, while also controlling for environmental noise that confounds associations in the field (Korte et al., 2012; Sasaki et al., 2015). Recent computational and statistical advances allow GWAS to gain power by modeling the relationships between traits (Hackinger and Zeggini, 2017), offering an exciting avenue for leveraging the wealth of microbial lineage-level and community-level phenotypes that can be extracted from phyllosphere DNA sequencing.

Here, we begin by reviewing how GWAS that leverage phenotypes derived from microbial DNA sequencing data have revealed new insights into the influence of genetic variation among plants on the composition of the phyllosphere. GWAS has also been used to characterize the effects of host genetics on the associated microbiomes in organisms other than plants, and we draw on a leading system for these studies, the human microbiome, to illustrate the promise and challenges awaiting studies of the plant phyllosphere. Next, along each stage of the workflow for GWAS of the phyllosphere, we synthesize emerging bioinformatic, statistical and conceptual advances that are poised to enable new breakthroughs, and we emphasize key considerations for researchers in light of these advances. This outline of the workflow is intended to help readers evaluate the feasibility and promise of conducting microbiome-related GWAS in their own plant systems. We argue that perspectives on how plant–microbe interactions evolve define the choice of phenotypes to be measured, and thus may be the most important determinant of the single nucleotide polymorphism (SNP) variants that GWAS identifies as mediators of host genotypic effects on the microbiome. In particular, the number and identity of host loci associated with the microbiome will depend upon the choice to consider pairwise or community-level interactions, on whether the interacting microbes are characterized by taxonomic grouping or by traits, and on the extent to which the environmental plasticity of interactions is exposed in the sampling scheme.

PREVIOUS WORK

Most GWAS of plant–microbial interactions have focused on how plant genotype shapes pairwise interactions with a single microbial taxon. Representative examples can be found in the 35 GWAS of direct pathogen challenge collated by Bartoli and Roux (2017), or the 13 GWAS of disease resistance in maize collated by Xiao et al. (2017). GWAS have also been used to probe the genetic basis of plant mutualisms with microbes, such as the relationships between legumes and their nodule-forming bacteria (Stanton-Geddes et al., 2013; Curtin et al., 2017).

In contrast, only a single GWAS of the plant phyllosphere community has been published. Horton et al. (2014) grew a panel of 196 Arabidopsis thaliana genotypes in the field, and characterized the leaf microbial community by sequencing taxonomic marker genes for bacteria and fungi. They found that additive genetic variation in the host influenced these leaf microbes at the community level. For the top eigenvectors produced by ordination analysis (principal component analysis; PCA), 9% and 11% of the variance in bacterial and fungal communities, respectively, were explained by plant SNPs (narrow-sense heritability; Box 1). However, this result was only obtained after rare species (those represented by a small fraction of the total sequencing reads in each community) were excluded. Without restricting the microbiome profile to the most-sequenced taxa, heritability was no longer observed for ordination outputs, and mapping SNPs to community-level traits proved impossible.

Using these restricted community datasets, Horton et al. (2014) successfully mapped both community-level traits and the presence/absence of the most abundant individual taxa in the leaf microbiome to host SNPs. Eigenvectors produced by PCA of the fungal community were associated with SNPs located in candidate genes for cell wall integrity. Species richness of the bacterial community was associated with SNPs in a set of genes enriched for trichome-related functions. However, the strongest associations in the entire study were found with the presence/absence of one or a few taxa. The SNPs in these associations were located in genes involved in defense, signal transduction, kinase activity, cell walls and cell membranes. The strength of these associations, coupled with the necessity of removing rarely sequenced taxa to observe any heritability, suggests that plant influence may be best observed in the most frequent phyllosphere colonists rather than in comprehensive community phenotypes.

Because a larger number of GWAS have been performed on the human and mouse microbiome, these models offer a glimpse of the advantages and challenges that await plant biologists as the library of microbiome-associated variants grows. GWAS has successfully identified over 200 SNPs associated with variation in human microbiome


Box 1 Heritability ≠ inheritance

Inheritance and heritability are two terms that are often confused, and the potential for vertical transmission of the microbiome makes it important to clearly distinguish the two concepts (box figure). In quantitative genetics, phenotypic variance is decomposed as follows:

$$V_p = V_g + V_e + V_{g \times e}, \qquad V_g = V_a + V_d + V_i$$

where $V_p$ is the total phenotypic variance (here, for a microbiome-related trait), $V_g$ is the variance among genotypes explained by genetics, $V_e$ is the variance due to environmental variation, and $V_{g \times e}$ is the variance due to genotype-by-environment interactions. $V_g$ can be further decomposed into variance due to additive ($V_a$), dominance ($V_d$) and epistatic ($V_i$) effects. Broad-sense heritability is formally defined as the proportion of phenotypic variance explained by the effects of genes (host genes, in the case of the microbiota), $H^2 = V_g/V_p$ (Lynch and Walsh, 1998). Narrow-sense heritability, on the other hand, is defined as the proportion of the total phenotypic variance due to additive effects, $h^2 = V_a/V_p$, and is related to the phenotypic response to selection (Falconer, 1981). Following these definitions, a trait can be genetically determined, and therefore inherited, while not displaying heritability for three main reasons: (i) there is no variation at the loci underlying the trait in the collection of individuals considered; (ii) the variance due to the environment is much larger than the variance due to genetics; or (iii) strong gene × environment interactions obscure the signal.
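A minimal Python sketch of the definitions above. The variance components are invented for illustration, not estimates from any study; holding the genetic components fixed while inflating $V_e$ and $V_{g \times e}$ shows how reason (ii) alone can push heritability toward zero even though the trait is genetically determined.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Variance components of one microbiome-related trait (arbitrary units)."""

    va: float   # additive genetic variance, V_a
    vd: float   # dominance variance, V_d
    vi: float   # epistatic variance, V_i
    ve: float   # environmental variance, V_e
    vge: float  # genotype-by-environment variance, V_{g x e}

    @property
    def vg(self) -> float:
        """Total genetic variance: V_g = V_a + V_d + V_i."""
        return self.va + self.vd + self.vi

    @property
    def vp(self) -> float:
        """Total phenotypic variance: V_p = V_g + V_e + V_{g x e}."""
        return self.vg + self.ve + self.vge

    def broad_sense(self) -> float:
        """Broad-sense heritability, H^2 = V_g / V_p."""
        return self.vg / self.vp

    def narrow_sense(self) -> float:
        """Narrow-sense heritability, h^2 = V_a / V_p."""
        return self.va / self.vp


# Hypothetical numbers: identical genetic variance, scored in a controlled
# environment and in a noisy field environment.
controlled = VarianceComponents(va=2.0, vd=0.5, vi=0.5, ve=1.0, vge=0.0)
field = VarianceComponents(va=2.0, vd=0.5, vi=0.5, ve=12.0, vge=3.0)

for label, vc in (("controlled", controlled), ("field", field)):
    print(f"{label}: H2 = {vc.broad_sense():.2f}, h2 = {vc.narrow_sense():.2f}")
```

With these made-up numbers, $H^2$ falls from 0.75 in the controlled setting to about 0.17 in the field, although $V_g$ is the same in both.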

In practice, the method used to estimate broad-sense heritability will depend on the model system. The simplest case concerns clonal or selfing species, such as A. thaliana, where multiple replicates of the exact same genotype can be included in the same fully randomized experimental design. In this situation, broad-sense heritability is estimated using mixed models as the ratio of phenotypic variance among genotypes over the total phenotypic variance, and can be termed clonal repeatability. While this situation is powerful for estimating broad-sense heritability, great care should be taken to prevent confounding by non-genetic maternal effects. This is particularly important when the trait of interest is the microbiome, due for example to potential vertical transmission through seeds. Control of maternal effects is usually achieved by producing the seeds in homogeneous conditions, preferably growing the plants in a randomized design. However, if vertical transmission of the microbiome through seeds is important for the future microbiome, parental plants used to generate experimental material should be grown in the environmental conditions in which the experiment will be performed, possibly over several generations.
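A minimal Python sketch of clonal repeatability, assuming a balanced design in which every genotype has the same number of replicate plants. In place of a general mixed-model fit it uses the classical one-way ANOVA (method-of-moments) estimator of the among-genotype variance, which for balanced data should agree with REML whenever the estimate is positive. The genotype IDs and trait values are hypothetical.

```python
from statistics import mean


def clonal_repeatability(reps_by_genotype: dict[str, list[float]]) -> float:
    """Broad-sense heritability (clonal repeatability) for a balanced design.

    reps_by_genotype maps a genotype ID to the trait values measured on its
    replicate plants. Every genotype must have the same number r >= 2 of
    replicates, and there must be at least two genotypes.
    """
    groups = list(reps_by_genotype.values())
    g = len(groups)
    r = len(groups[0]) if groups else 0
    if g < 2 or r < 2 or any(len(x) != r for x in groups):
        raise ValueError("need >= 2 genotypes, each with the same number (>= 2) of replicates")

    grand_mean = mean(v for x in groups for v in x)
    group_means = [mean(x) for x in groups]

    # One-way ANOVA mean squares with genotype as the grouping factor.
    ms_between = r * sum((m - grand_mean) ** 2 for m in group_means) / (g - 1)
    ms_within = sum((v - m) ** 2 for x, m in zip(groups, group_means) for v in x) / (g * (r - 1))

    # Moment estimates of the variance components; a negative among-genotype
    # estimate (sampling noise) is truncated at zero.
    var_among = max((ms_between - ms_within) / r, 0.0)
    var_within = ms_within
    total = var_among + var_within
    if total == 0:
        raise ValueError("trait shows no variance")
    return var_among / total


# Hypothetical log relative abundance of one bacterial genus, 3 replicates per genotype.
data = {
    "G1": [2.1, 2.4, 2.2],
    "G2": [3.0, 3.3, 2.9],
    "G3": [1.2, 1.0, 1.5],
}
print(round(clonal_repeatability(data), 2))
```

Unbalanced designs, blocking factors or maternal-environment covariates would call for a proper mixed model rather than this shortcut.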
To compute heritability for non-selfing, non-clonal species, where replication of the exact same genotypes is not possible, one must either compute trait correlations between parents and offspring, assume genetic similarity based on pedigrees, or compare monozygotic and dizygotic twin pairs, assuming that dizygotic twins have a genetic similarity of 0.5 and monozygotic twins are genetically identical. All these cases suffer from the challenge that it is often impossible to control for environmental variance in fully randomized experimental designs and/or to control for maternal effects, making heritability estimates less precise and increasing the need for very large sample sizes. A recent solution is the computation of SNP-based or pseudo-heritability, which became possible following the rise of high-throughput genotyping and sequencing. The idea is to estimate the proportion of phenotypic variance explained by the genome-wide genetic similarity of individuals using mixed models (Yang et al., 2011). While the relationship of SNP-based heritability with broad-sense and narrow-sense heritability is unclear and depends on the experimental design, its estimates are directly relevant to GWAS because they capture the amount of among-genotype variance explained by true genetic variation in the sample considered (and not assumed from pedigrees or family structure; Lee and Chow, 2014).

Heritability estimates represent a target for the amount of variance explained by underlying loci and potentially detectable in GWAS. However, high heritability values do not guarantee the successful identification of significant trait–marker associations, as this will depend on the genetic architecture of the trait (see ‘Association Mapping’).
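The "genome-wide genetic similarity" in SNP-based heritability models is usually encoded as a genomic relationship matrix (GRM). A minimal stdlib-Python sketch of one simple construction: each SNP is standardized to mean 0 and variance 1 across individuals, and the GRM is the average cross-product over SNPs. The estimator of Yang et al. (2011) instead scales by the expected allele-frequency variance and treats the diagonal differently, so this is a simplified variant, not that method. The genotype table is hypothetical, with inbred accessions coded 0 or 2.

```python
from math import sqrt


def genomic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Genomic relationship matrix from an individuals x SNPs table of allele counts.

    Each SNP column (0, 1 or 2 copies of one allele) is centred and scaled to
    unit variance across individuals; A[i][k] is then the average product of
    the standardized genotypes of individuals i and k over all SNPs.
    Monomorphic SNPs carry no information and are skipped.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    standardized = []
    for j in range(n_snps):
        col = [row[j] for row in genotypes]
        mu = sum(col) / n
        sd = sqrt(sum((x - mu) ** 2 for x in col) / n)
        if sd > 0:
            standardized.append([(x - mu) / sd for x in col])
    m = len(standardized)
    if m == 0:
        raise ValueError("no polymorphic SNPs")
    return [
        [sum(z[i] * z[k] for z in standardized) / m for k in range(n)]
        for i in range(n)
    ]


# Hypothetical panel: 4 inbred accessions (homozygous, coded 0 or 2) x 5 SNPs.
panel = [
    [0, 2, 2, 0, 0],
    [0, 2, 2, 0, 2],  # differs from the first accession at one SNP
    [2, 0, 0, 2, 2],
    [2, 0, 2, 2, 0],
]
for row in genomic_relationship_matrix(panel):
    print(" ".join(f"{a:+.2f}" for a in row))
```

With this scaling the diagonal averages exactly 1 and each row sums to 0; closely related accessions get large positive off-diagonal entries. Matrices of this general kind also serve as the genotype-similarity term in mixed-model association tests.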

In human microbiome studies to date, host genetics account for only a small portion of inter-individual variation in human microbiome composition. Microbiomes are only slightly more similar between identical twins than between non-identical twins (Goodrich et al., 2014), and a recent re-analysis of these studies estimated SNP-based heritability of the microbiome to be between 2 and 8% (Rothschild et al., 2018). Environmental variation, including diet, lifestyle and antibiotic use, collectively has a larger effect than host genetics (David et al., 2014; Rothschild et al., 2018). The possibility of replicated experiments profiling plants raised in one or more shared natural environments, which can increase the heritability of the microbiome by minimizing environmental variation among individuals, is therefore a major advantage for GWAS of the plant phyllosphere. It is also important to note that heritability of the human microbiome depends on the methods used to characterize the microbiome, so heritability might be underestimated by the taxonomic metrics used to date if host genotype shapes microbiome function more strongly than it shapes taxonomic composition. The heritable components of the microbiome may nonetheless have outsized effects on human phenotypes. For example, the most heritable microbial family in one study (Christensenellaceae) co-varied in abundance with many other microbes; these same microbes altered microbiome composition and reduced weight gain when transplanted into mice (Goodrich et al., 2014). When many traits are available, as is the case with high-dimensional phyllosphere datasets, those that are heritable can be prioritized for further analysis. This is sensible because GWAS is unlikely to be informative for traits with little to no heritability in a given study, and retaining such traits for GWAS increases the experiment-wide multiple testing burden.
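To illustrate the arithmetic of this last point, a minimal Python sketch assuming a Bonferroni correction across all trait–SNP tests (one common, conservative choice that ignores LD between SNPs; the box does not prescribe a method). The heritability estimates, the 0.05 cutoff for calling a trait heritable and the number of SNPs are all hypothetical.

```python
def bonferroni_threshold(alpha: float, n_traits: int, n_snps: int) -> float:
    """Per-test p-value cutoff that keeps the experiment-wide error rate at alpha."""
    return alpha / (n_traits * n_snps)


def heritable_traits(h2_by_trait: dict[str, float], min_h2: float) -> list[str]:
    """Traits whose heritability estimate reaches min_h2, in input order."""
    return [trait for trait, h2 in h2_by_trait.items() if h2 >= min_h2]


# Hypothetical SNP-based heritability estimates for five microbiome traits.
h2_estimates = {
    "taxon_A": 0.31,
    "taxon_B": 0.02,
    "taxon_C": 0.18,
    "taxon_D": 0.00,
    "richness": 0.09,
}
n_snps = 1_000_000  # hypothetical number of SNPs tested per trait

kept = heritable_traits(h2_estimates, min_h2=0.05)
print(kept)  # ['taxon_A', 'taxon_C', 'richness']
print(f"all traits:       p < {bonferroni_threshold(0.05, len(h2_estimates), n_snps):.1e}")
print(f"heritable traits: p < {bonferroni_threshold(0.05, len(kept), n_snps):.1e}")
```

Dropping the two traits with negligible heritability relaxes the per-test cutoff by a factor of 5/3; with hundreds of taxa-level traits the gain is correspondingly larger.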

composition (Knights et al., 2014; Blekhman et al., 2015; Davenport et al., 2015; Bonder et al., 2016; Goodrich et al., 2016; Turpin et al., 2016; Wang et al., 2016; Demmitt et al., 2017; Igartua et al., 2017; Kolde et al., 2018; Rothschild et al., 2018), and a few genomic regions associated with microbiome composition in mice (Org et al., 2015). The majority of these associations are with the relative abundances of individual bacterial families or genera, although some studies report associations with overall community composition or species richness (Wang et al., 2016). Genes involved in metabolism and immunity are heavily represented in regions harboring significant associations.

The majority of associations found in human and mouse GWAS to date are unique to single studies (Wang et al., 2016). The failure to consistently detect the same associations across studies may reflect differences in the genetic ancestry of the mapping populations used in each study, the environments inhabited by those populations (which have an outsized effect on microbiome composition; Box 1), or the methods used to sequence and quantify microbial communities. Alternatively, the paucity of shared associations may arise because their effects on microbiome composition are too small to permit consistent detection with the sample sizes employed, which ranged roughly from 100 to 1000 individuals. Some of these associations may also fail to replicate because they are false positives. Thus, even in humans, our understanding of the relationship between the host genome and the microbiome remains limited.

Microbiome GWAS results will not always paint a similar picture of how host genomes influence microbial colonists, as demonstrated by the challenges of synthesizing results across mammalian studies of this kind. In order to understand which parts of a pipeline must be standardized to allow consistent results to be revealed, and in order to confidently identify promising candidates for hypothesis testing and application, plant biologists need to understand how each step of a microbiome GWAS workflow can be tuned to sample adequate variation, generate meaningful phenotypes, and generate sufficient power to detect associations. Here we provide a workflow for investigating the genetic bases of leaf microbial community variation among plant genotypes or cultivars (Figure 1). We discuss the choices facing investigators at each step, as well as their tradeoffs.

WORKFLOW

Defining the question

Genome-wide association studies of the leaf microbiome are exploratory experiments, and results can be difficult to interpret. We propose that researchers interested in exploring the relationships between host genetic variation and the leaf microbial community should consider three main points. First, GWAS provide an avenue to interrogate how a large catalog of naturally occurring alleles across the host genome impacts the leaf microbiota. However, this relationship is complex due to both the genetics of the host (i.e. millions of markers, many of them with rare alleles) and the diversity of the microbial communities (hundreds to thousands of members, many if not all with zero-inflated distributions). Focusing on a more geographically restricted or less genetically diverse collection of plant genotypes could lighten the statistical burden and reduce confounding due to population stratification. Second, concerning the microbial community itself, one might be


Figure 1. How to use genome-wide association studies (GWAS) with your plant leaf microbiome.
1. THE QUESTION: Understanding the leaf microbiome from an evolutionary, mechanistic plant physiological or agricultural perspective requires different downstream considerations.
2. PLANT GENOTYPING: Selecting the plant panel is an essential step in non-model systems. A plant panel with the appropriate diversity and population structure must be selected.
3. DESIGN: One can sample microbiomes from plant leaves in their natural habitat, in a common garden experiment, or in sterile environments like well plates.
4. SAMPLING: Differences in sampling methods often concern surface cleaning treatments and the time during which leaf samples are stored before DNA is isolated.
5. PHENOTYPING: Microbiota can be quantified with amplicon or metagenomic sequencing, and converted into relevant phenotypic traits for GWAS.
6. GWAS: Several statistical models exist to perform GWAS on the leaf microbiome of plants and estimate the significance of associations with microbiome traits at loci across the genome, as depicted in Manhattan plots.
7. VALIDATION: Genes underlying candidate loci can be validated through post-GWAS approaches, such as bioinformatic methods that use transcriptome datasets, and through reverse-genetic and gene-editing methods.

interested in finding host genes that influence microbial community composition, or genes that enable hosts to foster specific microbial functions. This will have profound consequences for the collection, sequencing and analysis of leaf samples. Finally, if the ultimate goal is to gain a mechanistic understanding of how the plant influences the microbiome, it will be necessary to understand causality between host factors and microbial community members/functions. Furthermore, determining how functional validation will be achieved requires planning. For example, allelic effects estimated in field experiments may be difficult to validate under laboratory conditions or in replicated field experiments due to strong environmental effects. We develop these points below.

Plant genotyping

Plant mapping panels benefit from having ample natural variation, although the advantages of including more diversity are limited by the density of SNP markers in the genome. GWAS draw their ability to identify associated loci from historical recombination events (Mitchell-Olds and Schmitt, 2006; Bergelson and Roux, 2010), which create new combinations of alleles at a frequency that decreases with proximity, or linkage, on the chromosome. Because nucleotides in close proximity are less likely to be separated by a recombination event, causal variants can be identified by the association of a nearby SNP marker with a phenotype. The number of SNP markers needed to find associations thus depends on the distance over which linkage disequilibrium (LD) extends in that portion of the genome. Patterns of LD differ substantially between plant species, and even within single genomes as a function of chromosomal location, insertions, repetitive regions and methylation. In the model plant A. thaliana, LD typically decays over a distance of 5–10 kb. In an outcrossing species like maize, intragenic LD in diverse germplasm decays to below 0.25 within only 100–200 bp (Tenaillon et al., 2001). Thus, some plants will require denser SNP data than others to effectively map microbiome associations. The amount of LD will also depend on how recently the members of a mapping panel shared a common ancestor: genomes in a local population will have experienced fewer historical recombination events than those in a worldwide panel. As more diverse genomes are included in a panel and LD decreases, a denser set of SNPs will be necessary to ensure that all the relevant variants in the genome fall within the LD block of a marker. High-density SNP panels already exist for a variety of plant species (Bayer et al., 2017; Shirasawa et al., 2017; Eltaher et al., 2018).

Another challenge is the ability to find associations between the microbiome and structural variation in plant genomes. Even in A. thaliana, with its notably small genome, genome size varies by 10% over a fraction of the distribution range of the species (Long et al., 2013). When SNPs and small structural variants are called by alignment to a reference, much of this larger structural variation in the genome will be lost. De novo alignments of genomes in plant panels can help to recapture this variation. For species with repetitive sequences and large rearrangements, increased sequencing coverage and longer sequencing reads should help to resolve the issues with generating such alignments, and allow microbiome traits to be mapped to both SNP and structural variants.

Selection of a GWAS panel also involves a tradeoff between capturing phenotypic variation of interest and confounding the analysis with population structure and allelic heterogeneity (Pritchard et al., 2000; Zhao et al., 2007). A good illustration is the case of FRIGIDA, a major flowering time gene in A. thaliana. This gene displays extensive allelic heterogeneity, even when only considering French A. thaliana populations (Le Corre et al., 2002). Associations at FRIGIDA proved difficult to map using a worldwide panel of accessions (Atwell et al., 2010); however, FRIGIDA displayed a significant association in a modest panel of 60 genotypes, all from one local population presenting flowering time variation (Brachi et al., 2013). By using a local population, the authors sampled individuals with a more recent common ancestor, allowing less time for multiple causal alleles/genes to arise in the population. As a result, fewer alleles were present at the causal loci, permitting associations to be more readily revealed (see Wright et al., 1999 for a similar reflection on finding the genetics underlying rare diseases in humans). The cost, depending on the species, is that sampling a restricted population usually causes LD to decay over longer distances, reducing the resolution of association analysis. This can limit the ability of GWAS to pinpoint specific genes in species with relatively long LD decay distances. Furthermore,
phenotypic variation may be limited for some traits of composition in a single environment, average effects that
interest at the local scale. Because both the extent to which are consistent across environments, or changes in effects
the leaf microbiome is heritable and the spatial scale at across environments (i.e. genotype-by-environment, or
which those heritable components vary among popula- G 9 E, interactions), investigators can design studies to
tions are still open questions, the appropriate scale for better control environmental effects in the lab or better
mapping microbiome traits in particular is unclear. quantify them in the greenhouse or field.
When mapping populations collected from across a Environmental variation can mask observable relation-
large geographical range are favored, it is essential to cor- ships between host genes and features of the phyllo-
rect for population structure in order to control the occur- sphere microbiome by creating differences in host gene
rence of false positives. A common way to perform this expression and microbial growth across replicates of the
correction, which has been successful in A. thaliana, is to same host genotype. Associations that might be
include a matrix of genotype similarity in a mixed linear quenched by such variation could be recovered in studies
model for association testing (Kang et al., 2010). A draw- that use synthetic plant microbial communities to gather
back to such corrections is that they can erase associations phenotypes on a diverse panel of host plants in tightly
with SNPs that are strongly correlated with population controlled environmental conditions (Bodenhausen et al.,
structure. The Quantitative Trait Cluster Association Test 2014; Bai et al., 2015). A study in the model plant A. thali-
(QTCAT) also has the potential to circumvent false posi- ana demonstrated that host genetic effects on the leaf
tives due to demographic history. The QTCAT enables microbiome can be measured in synthetic communities.
simultaneous multi-marker associations while considering Differences in community composition were observed
correlations between markers, thus avoiding the need to among four A. thaliana accessions 2 weeks after gnotobi-
correct for population structure (Klasen et al., 2016). This otic leaves were sprayed with a synthetic community
approach could prove useful given how little we know composed of seven microbes from the most abundant
about the extent to which the microbiome is correlated phyla sampled in natural leaf communities (Bodenhausen
with population structure. et al., 2014).
One concerning limitation of GWAS, especially for the When it occurs across sites in the field, environmental
microbiome, is a lack of power to detect small effects variation can cause leaf microbial community phenotypes
(Nordborg and Weigel, 2008). It is easiest to detect associa- to map differently to plant genotypes. This principle was
tions genome-wide for traits coded by a small number of demonstrated in a recent field study of a perennial mus-
loci with large effects. It is therefore not surprising that GWAS of plant traits have been especially successful in the identification of resistance genes, which are highly heritable (genetically determined), of strong phenotypic effect, and which act relatively autonomously (Bergelson and Roux, 2010). In addition, evolutionary mechanisms tend to maintain variation at resistance genes, sometimes across the host range (Stahl et al., 1999; Karasov et al., 2014), reducing the confounding of associations by population structure. Variation in microbial community composition is unlikely to be influenced by host traits with such a simple genetic basis, because so many aspects of plant and cell biology, including the cell wall, cell membrane, signal transduction, meiosis, metabolic pathways and trichomes (Horton et al., 2014), appear to be involved. Given these limitations in power, it is particularly important to select the correct scale of phenotypic variation, genotype sufficient SNPs, and find appropriate corrections for population structure depending on the specific microbiome traits of interest.

Design

The extent to which features of the phyllosphere microbiome are shaped by host genotype or the environment is still an open question. Depending on whether they are interested in the effects of plant genotype on microbiome

tard (Boechera stricta) grown in a variety of environments differing in elevation, moisture, temperature, soil nutrients and plant diversity. In leaves of 48 B. stricta lineages, both the overall community diversity and the relative abundance of specific taxonomic groups were better predicted by site-specific rather than site-averaged host genetic effects (Wagner et al., 2016). Thus, environmental variation is a key variable that must be considered in order to fully understand the plasticity of the map between host plant genotype and leaf microbiome phenotype. Environmental variables have been manipulated in GWAS of other environmentally sensitive plant traits (Li et al., 2010; Wu et al., 2018). For example, one GWAS of flowering time in A. thaliana was replicated in growth chambers simulating four distinct natural climates. Some polymorphisms associated with flowering time overlapped between climates while others were climate-specific (Li et al., 2010). Studies like this one may be similarly useful in establishing which features of the environment most influence the host variants affecting leaf communities. Studies that treat the environment as a variable can also identify polymorphisms that have robust effects on the microbiome across environments. Further, a GWEIS (gene-by-environment genome-wide interaction study) following on such a design can be used to find the polymorphisms that cause plasticity (Hamza et al., 2011).
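The interaction test that a GWEIS repeats at every SNP can be sketched for a single SNP as an ordinary least-squares fit of phenotype ~ genotype + environment + genotype × environment. It asks whether the interaction coefficient differs from zero. This is a minimal sketch under stated assumptions, not the method of Hamza et al. (2011). It assumes inbred accessions with genotypes coded 0/1, a binary environment (e.g. two field sites) and simulated data. The functions `solve`, `gxe_test` and `simulate` are hypothetical names. A real GWEIS would also include a kinship term or another correction for population structure, and would normally use a t rather than a normal reference distribution.

```python
import math
import random


def solve(a, b):
    """Solve a x = b for a small dense system by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def gxe_test(genotype, environment, phenotype):
    """Fit y ~ 1 + G + E + G*E by OLS; return (beta_GxE, z, two-sided p) for G*E."""
    rows = [[1.0, g, e, g * e] for g, e in zip(genotype, environment)]
    n, k = len(rows), 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, phenotype)]
    sigma2 = sum(res * res for res in residuals) / (n - k)
    # Last diagonal element of (X'X)^-1, obtained by solving (X'X) v = e_4.
    var_gxe = sigma2 * solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]
    z = beta[3] / math.sqrt(var_gxe)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation to the t test
    return beta[3], z, p


def simulate(n=800, gxe_effect=1.5, seed=1):
    """Hypothetical data: G and E are each 0/1; the SNP acts only in environment 1."""
    rng = random.Random(seed)
    g = [rng.randint(0, 1) for _ in range(n)]
    e = [rng.randint(0, 1) for _ in range(n)]
    y = [0.5 * ei + gxe_effect * gi * ei + rng.gauss(0.0, 1.0) for gi, ei in zip(g, e)]
    return g, e, y


if __name__ == "__main__":
    beta, z, p = gxe_test(*simulate())
    print(f"G x E effect = {beta:.2f}, z = {z:.1f}, p = {p:.2e}")
```

A significant interaction flags a SNP whose effect on the microbiome trait depends on the environment. A SNP with a strong main effect and no interaction is the kind of environmentally robust variant described above.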


Unlike flowering time, plant microbiomes vary not only with geography but also with abiotic conditions within sampling sites. Abiotic conditions should be measured across samples in case they affect the chosen phenotype for microbiome GWAS. For example, a study of soybean and wheat root microbiomes across many different field sites found that pH and nitrate content correlated with community diversity (Rascovan et al., 2016).

Sampling

The microbes of the phyllosphere must be sampled, identified and enumerated in order to generate phenotypes for use in GWAS. While imaging of hybridized fluorescent probes or colony isolation and identification with Sanger sequencing can produce these phenotypes for bacteria, next-generation sequencing approaches are the most high-throughput and comprehensive options for characterizing leaf microbiomes. The most common form of microbiome data consists of sequences of variable regions of a conserved marker gene, such as bacterial 16S rRNA or fungal ITS. After DNA is extracted from lysed leaf tissue, a portion of the marker gene is amplified from the sample of environmental DNA by polymerase chain reaction (PCR) and sequenced on a next-generation platform such as the Illumina MiSeq or HiSeq. An alternative to marker gene sequencing is the collection of metagenomic data. In this approach, a library of environmental DNA is prepared from leaf lysate and shotgun-sequenced without targeting specific types of microbes or particular genes. Many methodological considerations for these approaches are not specific to phyllosphere sampling of microbial communities and have been reviewed elsewhere (Knight et al., 2018). Below we elaborate on phyllosphere-specific sampling concerns.

A leaf is a heterogeneous environment composed of microhabitats defined by abiotic conditions and proximity to physical features like stomata. Fluorescence imaging indicates that populations of some phyllosphere bacterial species grow to different densities in these microhabitats (Burch et al., 2013; Peredo and Simmons, 2018). These findings support the idea that leaf microhabitats could harbor different microbial communities in nature (Edwards et al., 2015). Because the leaf is heterogeneous, sampling designs should ensure that within-leaf variation will not confound the relationship between SNPs and microbial communities. Microbiome samples should therefore be collected either from consistent positions on leaves or from leaf tissue homogenized before extraction, so that within-leaf variation is either held constant or well sampled and does not confound associations between plant genotype and microbial community phenotype.

Leaf tissue can be homogenized by grinding with a mortar and pestle or by bead beating. However, it is first useful to separate epiphytic and endophytic microbes, because transcriptome data suggest that host plants and microbial colonists have different interactions at these locations. For example, the phyllosphere bacterium Pseudomonas syringae shows distinct gene expression patterns on leaf surfaces, where genes involved in motility are significantly more transcribed, and in the apoplast, where genes related to iron and nitrogen uptake are more transcribed (Yu et al., 2013). To keep the inner and outer communities separate, vortex washes, sonication in buffer and surface sterilization have each been used to collect ‘epiphytes’ first, although the steps employed are highly variable between studies, and their efficacy at removing tightly adhered surface microbes is not well characterized.

One fundamental limitation of phyllosphere sampling is its destructive nature. Many microbiome samples from human hosts can be collected repeatedly from the same individuals via stool samples or cheek swabs. The leaf microbiome, in contrast, has a lower bacterial load and requires the harvest of whole tissue, especially in small species or young seedlings. Even when large plants are sampled repeatedly, the possible effect of tissue wounds on microbiome assembly must also be considered. Tissue wounds can trigger immune responses by releasing damage-associated molecules in the plant and can also induce phytohormone synthesis (Savatin et al., 2014). Arabidopsis thaliana mutants lacking phytohormone synthesis or signaling-related genes have distinct root communities in natural soils (Lebeis et al., 2015), suggesting that altered levels of phytohormones could affect community assembly in nature. In addition, the age and maturity of plants and leaves at the time of sampling can influence the leaf microbial community (Meaden et al., 2016; Wagner et al., 2016).

In metagenome and marker gene datasets, DNA extraction and sequencing methods can bias the measured relative abundances of microbial taxa. This bias has been quantified by comparing relative abundances from sequencing data with those expected from samples containing a defined mix of bacterial cells from diverse taxa (Morgan et al., 2010; Fouhy et al., 2016). In marker gene data, the bias created by the choice of target sequences for amplification is an even more important concern. Primer choice affects how much chloroplast DNA is amplified, the number of reads that can be classified, and the relative abundances of different taxa in phyllosphere sequence samples. If amplification and sequencing methods severely skew the abundances of taxa influenced by host genotype in the community profile, then differences at these steps could limit our ability to synthesize GWAS findings across work in the model system A. thaliana and across plant systems in general.

Phenotyping

The most important aspect of any GWAS pipeline is defining which traits will be mapped to the host genome. This


in turn determines the type of raw phenotype data to collect. To help determine the type of microbiome sequencing to perform and the way in which data will be analyzed, we suggest focusing on three main questions.

1. Is the research question better addressed by associating host genes with microbial taxa or with the biological functions encoded in microbial genomes?
2. If the research focuses on taxa, what mode of host–microbe interaction is more likely: diffuse interactions between hosts and microbial communities, or targeted interactions between hosts and specific microbes in the community?
3. How important is the environment in shaping host–microbe interactions, and do the hypotheses under consideration require explicit investigation of the impact of host genotype-by-environment interactions on microbial communities?

In the following three sections, we discuss how the answers to these three questions impact the selection of phenotypes for microbiome GWAS, as well as how diverse approaches can be used to build a richer picture of how host plants shape the microbial communities of their leaves.

Phenotyping Microbial Taxa or Microbial Functions. Instead of focusing on how host genotype influences a single microbial taxon, microbiome GWAS focuses on how host genotype affects the overall community. The most intuitive way of characterizing microbial communities is to identify microbial species or strains and to quantify their abundances. In practice, the most common strategy is to sequence PCR-amplified marker genes bearing taxonomically relevant information, and then group these sequence reads into operational taxonomic units (OTUs) or amplicon sequence variants (ASVs). OTUs are generated by clustering sequence reads based on similarity thresholds (Westcott and Schloss, 2015), while ASVs are generated by modeling and correcting sequencing errors so that even single-nucleotide differences can be used to distinguish biological variants (Callahan et al., 2016).

In order for the abundances of OTUs or ASVs to be meaningful community phenotypes, two conditions must be fulfilled. First, the genes targeted must provide sufficient resolution of taxa. Although regions of 16S ribosomal RNA (rRNA) have been used most often, they afford poor resolution at the level of genera or finer clades. This limitation can be circumvented by using faster-evolving genes to gain deeper taxonomic resolution (Bartoli and Roux, 2017). For example, gyrB (DNA gyrase subunit B) sequences allow pathogenic and commensal OTUs within a microbial genus to be distinguished (Barret et al., 2015). The tradeoff in using less conserved genes is that multiple or degenerate primers may be necessary to sample all genera of interest. Second, OTU- or ASV-based taxonomic units must harbor phenotypic differences relevant to plant–microbe interactions. In particular, GWAS based on these phenotypes is likely to identify associations driven by microbial motifs conserved within families or orders.

Obtaining sufficient taxonomic resolution with universal primers is a key limitation of marker gene microbiome characterization. Fortunately, untargeted whole metagenome sequencing has become a viable alternative. Untargeted data from whole communities can be used to find known genes or taxa in the sample, or can give sufficient coverage to reconstruct whole genomes of community members. Untargeted data can offer finer taxonomic resolution by covering more of each microbe’s genome. These read-based and assembly-based approaches are described by Knight et al. (2018), and reviewed by McIntyre et al. (2017) and Vollmers et al. (2017), respectively. Subsequently, genomes assembled de novo from metagenomic reads, or obtained from a collection of publicly available genomes for plant-associated microbes (Levy et al., 2018), can be used as references to delineate and quantify the abundance of up to hundreds of unique strains per bacterial species in each phyllosphere DNA sample (Albanese and Donati, 2017; Smillie et al., 2018). A challenge for these techniques, however, is separating host and microbe DNA: because host genomes are typically much larger, they limit the coverage of small microbial genomes when mixtures of host and microbe DNA are sequenced. In addition, the completeness of reference databases limits the ability to assign reads to genes or to assemble fragmented microbial genomes by comparative methods.

Irrespective of whether amplified gene fragments or untargeted shotgun sequencing is used to identify taxonomic units, these methods yield only the relative abundances of members of the microbial community. However, absolute abundances provide greater information content for understanding the interactions between hosts and microbes (Vandeputte et al., 2017). One way to overcome this limitation is to estimate microbial cell density in samples, for example using flow cytometry (Props et al., 2017). Taxa counts can then be converted to absolute abundances and used to compare the population size of a particular microbe between plant genotypes.

A more inherent limitation arises from the possibility that lateral gene transfer among microbes may obliterate the taxonomic signal associated with ecologically important microbial genes. Comparative studies have revealed that bacterial genomes contain both a shared core and a divergent accessory component. Core genes can encode essential cell functions or clade-specific adaptive traits (Lassalle et al., 2011). Accessory genes are acquired through lateral transmission by conjugation, transformation and transduction (Soucy et al., 2015), and can include genomic islands with virulence factors (Araki et al., 2006).
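Converting taxa counts to absolute abundances amounts to scaling each taxon’s share of reads by an independent estimate of total microbial load. This is a minimal sketch under assumptions. It takes a per-sample total cell density (for example from flow cytometry) in cells per gram of leaf. It ignores complications such as host or chloroplast reads and differences in marker-gene copy number among taxa. The function names and all numbers are hypothetical. The example shows how a taxon can double in relative abundance between two plant genotypes while its absolute abundance does not change.

```python
def relative_abundances(counts):
    """Convert per-taxon read counts for one sample into proportions."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def absolute_abundances(counts, cells_per_gram):
    """Scale read proportions by a measured total cell density for the same sample."""
    return {taxon: cells_per_gram * share
            for taxon, share in relative_abundances(counts).items()}


# Hypothetical samples from two plant genotypes.
genotype_1 = {"taxon_A": 200, "taxon_B": 800}
genotype_2 = {"taxon_A": 400, "taxon_B": 600}

rel_1 = relative_abundances(genotype_1)["taxon_A"]
rel_2 = relative_abundances(genotype_2)["taxon_A"]
# Assumed flow-cytometry densities: genotype 2 carries half the total load.
abs_1 = absolute_abundances(genotype_1, cells_per_gram=1e7)["taxon_A"]
abs_2 = absolute_abundances(genotype_2, cells_per_gram=5e6)["taxon_A"]
print(rel_1, rel_2, abs_1, abs_2)
```

A GWAS run on the relative values would treat taxon_A as twice as abundant on genotype 2. The absolute values show that only the rest of the community differs.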

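The contrast between similarity-threshold OTUs and single-nucleotide-resolution variants, described earlier in this section, can be illustrated with a toy greedy clustering. This is a simplified sketch and does not reproduce the algorithm of any particular pipeline. It assumes reads are already aligned and of equal length, so identity is a simple per-position match fraction. The 97% threshold is also an assumption. ASVs are approximated here by collapsing identical reads, whereas real ASV methods such as that of Callahan et al. (2016) also model and correct sequencing errors. The function names and sequences are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy, abundance-ordered clustering: each unique read joins the first
    centroid it matches at >= threshold identity, otherwise it founds a new OTU."""
    centroids, sizes = [], []
    for seq, n in Counter(reads).most_common():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                sizes[i] += n
                break
        else:
            centroids.append(seq)
            sizes.append(n)
    return dict(zip(centroids, sizes))


def exact_variants(reads):
    """Collapse identical reads only (a crude stand-in for ASVs; no error model)."""
    return dict(Counter(reads))


base = "ACGT" * 10              # hypothetical 40-nt marker fragment
one_off = "T" + base[1:]        # 1 mismatch: 97.5% identity to base
three_off = "TTT" + base[3:]    # 3 mismatches: 92.5% identity to base
reads = [base] * 5 + [one_off] * 3 + [three_off] * 2

print(len(greedy_otus(reads)), "OTUs at 97%;", len(exact_variants(reads)), "exact variants")
```

At 97% identity the single-mismatch variant is merged into the abundant sequence’s OTU. Resolution at the exact-sequence level keeps it separate, and that separation is what allows ASVs to distinguish closely related strains.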

While the frequency of these transfers is difficult to quantify, over evolutionary time they can contribute thousands of genes to microbial genomes that are not conserved among closely related taxa (Philippe and Douady, 2003). The movement of these genes across microbial phylogenies creates incongruities between the history of the transferred genes and that of the rest of the genome. Because laterally transferred genes are phylogenetically incongruent, taxonomic groupings based on the similarity of a conserved marker gene will not correspond to the presence or absence of these genes. As a result, microbiome GWAS may be less effective than a disease case-control GWAS for detecting resistance genes that target effector proteins, which often show a highly variable presence/absence pattern within species groups indicative of horizontal transfer.

An alternative approach to phenotyping a microbial community is to focus on functions present in the microbial community rather than taxa (Bonder et al., 2016; Levy et al., 2018). This approach could identify associations driven by microbial traits irrespective of whether they are found in the core or accessory genome, and regardless of whether they are phylogenetically congruent within lineages. Whole-metagenome sequencing of leaf microbial communities will be a superior phenotyping method if metabolic, competitive and virulence traits are more important than phylogenetic identity in defining the effect of a microbe within a leaf community. Furthermore, if bacterial genomes are well annotated, associating genetic variants with specific microbial genes rather than with taxonomic assignments would give more insight into the possible biological mechanisms underlying the associations.

If microbial functions are to be used as a phenotype, they can be inferred either from the microbial genomes or from plant responses to the microbial community. In the first approach, unassembled reads from whole-metagenome sequencing are annotated (Keegan et al., 2016). The annotated genes are grouped into functional systems, such as biosynthetic or signaling pathways, based on databases like KEGG (Kanehisa et al., 2017). Then, the frequencies of these systems can be compared across plant genotypes. This approach was used to find microbial functions that were more frequent in the rhizosphere of a fungus-resistant bean cultivar than in the rhizosphere of a related, susceptible cultivar (Mendez et al., 2018). In particular, samples from the resistant cultivar had a higher abundance of microbial sequences associated with the production of rhamnolipids and phenazines. In the second approach, plant physiological responses that are linked to microbial community function are measured in the presence of defined synthetic communities. An example of a plant phenotype linked to microbial community function appears in Paredes et al. (2018). In this study, bacterial strains were grouped into broad functional categories based on how they affected the response of A. thaliana to phosphate starvation (as measured by phosphate accumulation in shoots). Groups of eight or nine strains from each functional category were then combined into synthetic communities that were used to inoculate host plants, and plant phosphate accumulation was measured. Community membership explained over half of the variation in plant phosphate accumulation, suggesting that this phenotype is closely tied to microbial functions in the plant. If applied to GWAS, plant phenotypes tied to community functions in planta could help identify host genes responsible for structuring the leaf community to favor or limit specific functions instead of taxa.

Targeted Versus Diffuse Plant–Microbe Interactions. When investigating the host genes shaping the relative abundance of microbial taxonomic units (OTUs or ASVs), the associations detected will depend upon which aspects of the microbial community are mapped. This choice will be driven by one’s perspective on whether plants primarily influence their microbiomes through targeted interactions with key taxonomic groups or through diffuse interactions with entire communities. Two emerging paradigms for plant relationships with leaf microbial communities can be used to illustrate the difference between these perspectives: hub taxa versus ecosystems on a leash (Foster et al., 2017).

Hubs are identified as highly connected nodes in networks constructed to represent interactions between taxa in the microbiome. Because the interactions between microbes cannot be directly observed in nature, they are inferred from relationships among the relative abundances of taxa, as measured by sequencing reads. The edges of the network represent these relationships in the data, and the nodes they connect each represent a clade of bacteria grouped by a threshold of sequence similarity. The degree of a node, defined as the number of edges connecting to it, and measures of its centrality in the network can be used to classify a subset of microbial taxa as hubs; these hubs are believed to be involved in some of the most important putative microbe–microbe interactions in the plant community. Inferring networks from amplicon sequencing data is an active field of research, the main challenge being to control for compositional effects (i.e. biases induced by the finite number of reads per sample). Current methods rely on computing sparse correlation or partial correlation matrices among transformed OTU relative abundances (Friedman and Alm, 2012; Kurtz et al., 2015).

Agler et al. (2016) sequenced bacterial, fungal and oomycete marker genes in leaves of wild A. thaliana from five sites, as well as the leaves of three A. thaliana accessions in common garden plots. They constructed a network of the community members based on correlations among relative abundances, and then used measures of


closeness centrality, betweenness centrality, and degree to select nodes as putative hubs. One isolate of an oomycete hub taxon, Albugo, was used to infect plants and test the effects of hub presence on other phyllosphere biota, as measured by changes in community diversity and variability relative to communities on control plants. Communities containing the potential hub taxon were found to have lower richness and to be less variable among replicates than control communities. The authors hypothesized that Albugo, and hub taxa in general, stabilize plant microbial communities through strong negative interactions with other microbes. In this framework, plant interactions targeting a hub taxon can produce propagating effects that influence the entire microbial community.

In order to find genes responsible for plant interactions with microbes of ecological importance, investigators may choose to apply GWAS to the relative or absolute abundance of candidate hubs identified through network analysis of whole-community data. If plants shape their microbiome through pairwise interactions with hub species that in turn shape the community through microbe–microbe interactions, phenotypes focused on these hub taxa may be better explained by host genotype and facilitate detection of the underlying genetic variants through GWAS.

An alternative scenario is that leaf microbial communities are shaped by a large number of diffuse interactions. Just as the cumulative action of weakly repressive miRNAs can stabilize the transcriptome and canalize gene expression (Ma et al., 2018), small plant–microbe effects may in aggregate have a significant effect on microbial load and diversity in the phyllosphere. Plants may be selected to act diffusely to cultivate and stabilize microbial communities if these communities provide a benefit, either directly or indirectly by preventing colonization by pathogens. This idea has been described as the plant holding the microbial ecosystem on a leash (Foster et al., 2017). While it is not yet clear which measure(s) of a microbiome make it most resistant to invasion or provide robust direct benefits, it is clear that comprehensive community phenotypes, rather than individual taxon abundances, would be the most appropriate phenotypes for characterizing the genetic architecture of the ‘leash’ through GWAS. Measures of community diversity or evenness could capture some of these effects, just as species richness captured the many microbe–microbe interactions of the Albugo hub within the leaf community. Temporal sampling schemes or the construction of interaction networks can also allow inference of community stability in terms of species turnover or persistence.

The goal of many microbiome studies is to identify manipulations of the microbiota that may enhance plant traits like yield or pathogen resistance (Chaparro et al., 2012). GWAS can be used to identify plant alleles that support the assembly of ‘good’ microbiomes that provide these benefits to plants. Selecting appropriate phenotypes for these GWAS will depend on whether targeted taxa or diffusely selected community properties are thought to be the means by which microbiomes provide benefits to plants. For example, one way in which specific microbes can improve pathogen resistance is by producing elicitors of defense responses in plants. Members of the bacterial genus Bacillus elicit plant responses that reduce the severity of subsequent pathogen infections (Kloepper et al., 2004). There is even evidence that Bacillus subtilis can be chemoattracted to root exudates, suggesting that the plant recruits bacteria with protective effects (Rudrappa et al., 2008). For cases like this, when targeted recruitment of specific taxa for plant benefits seems likely, GWAS phenotypes based on the abundances of just a few taxa would best capture the plant variation relevant to the interaction. However, there is a long record of ecological studies linking increased species diversity to decreased temporal variability of biomass and species composition (McCann, 2000; Allesina and Tang, 2012). Community evenness, or a lack of dominance by a small number of taxa, has been linked to lower invasibility (Hillebrand et al., 2008). For cases where a diverse and even community is thought to prevent invasion or limit the growth of pathogenic microbes, phenotypes like Faith’s phylogenetic diversity or the Shannon evenness index would be a better choice to capture relevant host variation in a GWAS.

Plant–Microbe Interactions in Natural Environments. Although it is well established that the environment affects genotype–phenotype mapping in plants, it is unclear how this effect influences the associations found with microbiome traits. Environmental effects may be mediated by modifiers of the causal loci. In these cases, genome-wide tests of interactions that establish the contributions of both a SNP and its interaction with an environmental variable could help to identify the modifier loci (Hamza et al., 2011). Another possibility is that overlapping sets of genes map to microbiome phenotypes in different environments. While some of these genes will only show associations in some environments, others will show robust signal across many environments. A comparison of significant associations identified in independent GWAS performed in multiple environments will allow the identification of environment-independent and environment-dependent host effects on leaf communities.

The scenario of both environmentally sensitive and environmentally robust variation in host influence originally appeared to apply to the trait of flowering time in A. thaliana (Li et al., 2010). However, further study (Brachi et al., 2010) revealed that very different environments could produce association results with very little overlap for this trait.
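The community-level phenotypes discussed above, such as the Shannon evenness index, are simple functions of a sample’s taxon counts. This is a minimal sketch. It computes Shannon entropy H (natural log) and the common normalized evenness J = H / ln S (Pielou’s form), where S is the number of observed taxa. We assume this definition of “Shannon evenness”, and the counts are hypothetical. Faith’s phylogenetic diversity is omitted because it also requires a phylogeny of the community members. In a GWAS, one such value per plant (or per accession mean) would serve as the phenotype.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy H (natural log) of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def shannon_evenness(counts):
    """Pielou's evenness J = H / ln(S), where S = number of taxa with nonzero counts."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return float("nan")  # evenness is undefined for a single-taxon community
    return shannon_entropy(counts) / math.log(richness)


even_leaf = [25, 25, 25, 25]      # hypothetical: four equally abundant taxa
dominated_leaf = [97, 1, 1, 1]    # hypothetical: one taxon dominates
print(round(shannon_evenness(even_leaf), 3), round(shannon_evenness(dominated_leaf), 3))
```

Both hypothetical leaves have the same richness (four taxa). Only the evenness phenotype separates the diverse, even community from the one dominated by a single taxon.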

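The hub-identification step described earlier in this section ranks network nodes by degree and centrality. Once a network has been inferred, that step can be sketched as follows. The example assumes an undirected, unweighted network supplied as a hypothetical edge list. In practice the edges would come from an inference method such as the sparse-correlation approaches cited above. The sketch computes degree and closeness centrality. As an illustrative rule, not the criterion used by Agler et al. (2016), it calls a node a hub if it ranks in the top `top_k` by both measures. Betweenness centrality, which was also used in that study, is omitted for brevity. All names are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map from (taxon, taxon) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness(graph, source):
    """(reachable nodes - 1) / sum of shortest-path lengths, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def hubs(edges, top_k=1):
    """Return nodes in the top_k by both degree and closeness, plus both scores."""
    graph = adjacency(edges)
    degree = {n: len(nbrs) for n, nbrs in graph.items()}
    close = {n: closeness(graph, n) for n in graph}
    # Sort by descending score, then by name, so ties are broken deterministically.
    top_degree = sorted(graph, key=lambda n: (-degree[n], n))[:top_k]
    top_close = sorted(graph, key=lambda n: (-close[n], n))[:top_k]
    return sorted(set(top_degree) & set(top_close)), degree, close


# Hypothetical co-abundance network among six taxa.
edges = [("T1", "T2"), ("T1", "T3"), ("T1", "T4"), ("T1", "T5"),
         ("T2", "T3"), ("T5", "T6")]
hub_list, degree, close = hubs(edges)
print(hub_list, degree["T1"], round(close["T1"], 3))
```

Once candidate hubs are chosen, their relative or absolute abundances would become the GWAS phenotypes. The network itself is not mapped.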

Brachi et al. (2010) measured flowering time over 2 years in the field for tens of thousands of plants from natural lines, recombinant inbred lines (RILs) and near-isogenic lines (NILs) in order to perform QTL mapping and GWAS. The phenotypes gathered from plants in the field mapped to different loci than those gathered from greenhouse plants, with limited overlap in the genes where candidate SNPs fell. The GWAS on field samples detected many candidate genes that were not associated with flowering time variation among accessions in the lab, including a number of circadian-clock-related genes. The unpredictable and variable environment in the field may cause certain regulatory variation to be more important in complex traits than is revealed in the relatively stable conditions of a greenhouse or incubator. If this is the case for microbiome-related host variation, then phenotyping in the field will be of particular importance for finding candidate alleles with ecological relevance or utility under field conditions.

Association mapping

Linear mixed models (LMMs) are used to estimate the effects of SNPs on quantitative phenotypes, incorporating covariates and a matrix of relatedness to correct for population structure (Kang et al., 2008, 2010; Widmer et al., 2014). This approach has worked remarkably well when applied to traits with a simple genetic architecture that were measured in a controlled greenhouse setting. For example, GWAS of the hypersensitive response triggered by high loads of P. syringae has pinpointed known R-genes in A. thaliana (Figure 2a; Methods S1). In contrast, association signals from LMMs applied to abundance measurements of microbial taxa, which are likely to be influenced by many more loci, are often much weaker. For illustrative purposes, we show GWAS results for the titer of P. syringae in syringe-inoculated, greenhouse-grown plants (Figure 2b), and the relative abundance of an OTU

[Figure 2. Panels: (a) HR (greenhouse), with a peak labeled RPS2; (b) Titer (greenhouse); (c) Relative abundance (field). Left of each panel: Manhattan plot, –log10(P) by chromosome (1–5). Right of each panel: quantile–quantile plot, observed vs. expected –log10(P).]

Figure 2. Comparison of genome-wide association studies (GWAS) results for the Arabidopsis thaliana–Pseudomonas syringae interaction underscores the challenges in GWAS of the phyllosphere.
Manhattan plots of the strength of phenotypic associations at over 200 000 single nucleotide polymorphisms (SNPs; left) and quantile–quantile plots comparing observed P-values with those expected under a null model of no associations (right) are shown for three GWAS analyses in A. thaliana.
(a) Hypersensitive response to P. syringae DC3000::AvrRpt2 following syringe inoculation into leaves of greenhouse-grown plants (Atwell et al., 2010).
(b) Titer of P. syringae DC3000 4 days after syringe inoculation of greenhouse-grown plants (Atwell et al., 2010).
(c) Relative abundance of OTU24, which encompasses a diversity of P. syringae strains with near-identical 16S sequences, in field-grown plants (Horton et al., 2014).
To generate the figure, phenotype datasets from previously published studies were pared to 70 A. thaliana accessions shared among all studies, and GWAS was conducted using a linear mixed model (LMM) for each trait individually (see Methods S1). The red dotted line in the Manhattan plots indicates the genome-wide significance threshold following Bonferroni correction for the total number of SNPs (α = 0.05). A peak of significant association with hypersensitive response falls at the gene RPS2. While titer in the greenhouse was highly heritable (pseudo-h² = 0.88), no significant associations were recovered. The relative abundance of OTU24 in the field was not heritable (pseudo-h² = 0), and also lacked significant associations.


encompassing P. syringae in naturally colonized, field-grown plants (Figure 2c). Unlike GWAS for the hypersensitive response, these GWAS lack significant associations after applying a stringent genome-wide correction for multiple testing (i.e. Bonferroni correction for the family-wise error rate). A major challenge for GWAS of the microbiome is extracting robust effects of plant genotype from associations that can be relatively weak at the level of individual bacterial taxa.

Meta-analyses that aggregate statistical signals across multiple analyses offer one route for strengthening signals in microbiome GWAS. Horton et al. (2014), for example, applied meta-analyses to GWAS on both community-level traits and the relative abundances of individual taxa in the community. This approach revealed that genes affecting cell wall integrity are strongly enriched in regions harboring the top associations across multiple traits, even though associations at these genes with the abundance of individual OTUs often failed to reach genome-wide significance in individual GWAS. A large number of meta-analysis methods have been developed specifically for GWAS (Zhu et al., 2018).

Multi-trait GWAS methods offer another avenue for strengthening signals in microbiome GWAS. These methods, which jointly model a SNP’s association with many traits rather than with each trait individually, have not yet been applied to GWAS of the phyllosphere. The promise of multi-trait methods for microbiome GWAS arises from a key feature of sequence-based phyllosphere profiling: it produces measurements of hundreds to thousands of phenotypes per individual (e.g. the abundance of different bacterial taxa, genes or predicted functions). A given SNP can exert shared or opposing effects on different traits, or it can affect some traits but not others. Multi-trait GWAS methods leverage relationships among traits as additional sources of information, and in doing so can increase mapping power (Korte et al., 2012; Zhou and Stephens, 2014; Porter and O’Reilly, 2017).

Multi-trait GWAS methods typically test the hypothesis that a given SNP has a non-zero effect size on one or more traits against the null hypothesis that the SNP has no effect on any trait (Zhou and Stephens, 2014). The performance of these models depends on the genetic architecture of the traits under investigation, and on the pattern of phenotypic and genetic correlation among them. The nuances of model performance across a range of plausible biological scenarios remain an active area of investigation (Porter and O’Reilly, 2017). However, a few general patterns have

associated with the plant hypersensitive response to P. syringae that varied depending on effector genes present in the infecting strain. Second, multi-trait GWAS offers improved power when applied to traits that are genetically correlated but somewhat inaccurately measured, analogous to the improvement in GWAS performance obtained when phenotypes are measured across multiple replicate individuals of each accession (Korol et al., 2001). This advantage may be particularly relevant to GWAS of the phyllosphere, where heritability of the in planta abundance of individual taxa is low, and the estimated effects of each plant genotype on the abundance of each taxon are therefore likely to be noisy when only a handful of individuals are phenotyped per accession.

The largest barrier to implementing multi-trait GWAS methods for high-dimensional phyllosphere phenotypes is computational. Multi-trait GWAS analyses, particularly those that fit a full mixed model to all traits rather than combining summary statistics from univariate GWAS, can take tens to hundreds of hours to run for tens of traits. Because runtime scales exponentially with the number of traits analyzed, multi-trait GWAS quickly becomes intractable for larger numbers of traits (Korte et al., 2012). Fortunately, high dimensionality is not a unique property of microbiome phenotypic datasets. Human brain imaging, for example, generates thousands to millions of measurements per individual, which consist of both independent and highly redundant phenotypes (Medland et al., 2014). Motivated by these types of scenarios, approaches to synthesize high-dimensional data into more meaningful phenotypes and to perform GWAS on extremely large numbers of traits individually are active areas of research (Medland et al., 2014; Ganjgahi et al., 2017) that hold promise to increase the feasibility of multi-trait GWAS of the microbiome.

Beyond increasing mapping power, multi-trait GWAS methods can provide a more comprehensive view of the phenotypic consequences and evolutionary implications of a given polymorphism. The multi-trait mixed model of Korte et al. (2012), for example, estimates how the effect of a given SNP is shared or differs across pairs of traits and/or across environments. Whether alternative alleles at a locus affect one trait or many, and whether these effects are positively or negatively correlated among traits, has important consequences for how that locus will evolve in natural populations that are distributed across spatially and temporally heterogeneous environments (Frachon et al., 2017).
emerged. First, under some models, the largest gains in After SNP effects are determined, two main fine-map-
power occur for SNPs whose effects across traits deviate ping methods are used to prioritize candidate causal vari-
from the pattern of phenotypic correlations among those ants from a GWAS: (i) selection based on P-values
traits. This is borne out in an empirical dataset for A. corrected for multiple testing, and LD with the lead SNP;
thaliana–microbe interactions, where Wang et al. (2017) and (ii) Bayesian methods that assign posterior probabili-
used a bivariate GWAS model to discover a novel locus ties of causality to each SNP (Spain and Barrett, 2015).
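The second fine-mapping strategy above, Bayesian assignment of posterior probabilities of causality, can be sketched with Wakefield's approximate Bayes factor. This is one common choice, not a method the review prescribes. The sketch assumes exactly one causal variant in the region, an equal prior probability of causality for every SNP, and an arbitrary prior standard deviation for true effects (`prior_sd=0.2`, a tuning assumption); the effect sizes in the example are hypothetical.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.2):
    """Log Wakefield approximate Bayes factor (association vs. null) per SNP.

    beta, se : effect estimates and standard errors for SNPs in one region.
    prior_sd : assumed prior standard deviation of a true effect.
    """
    beta = np.asarray(beta, dtype=float)
    v = np.asarray(se, dtype=float) ** 2   # sampling variance of each estimate
    w = prior_sd ** 2                      # prior variance of the true effect
    r = w / (v + w)                        # shrinkage factor
    z2 = beta ** 2 / v                     # squared Wald z-score
    return 0.5 * np.log(1.0 - r) + 0.5 * z2 * r


def posterior_causal(beta, se, prior_sd=0.2):
    """Posterior probability that each SNP is the causal variant, assuming
    a single causal SNP in the region and a uniform prior across SNPs."""
    labf = log_abf(beta, se, prior_sd)
    labf -= labf.max()                     # guard against overflow in exp
    weights = np.exp(labf)
    return weights / weights.sum()


def credible_set(pp, level=0.95):
    """Smallest set of SNP indices whose posterior probabilities sum to >= level."""
    pp = np.asarray(pp, dtype=float)
    order = np.argsort(pp)[::-1]           # most probable SNPs first
    cumulative = np.cumsum(pp[order])
    n_keep = int(np.searchsorted(cumulative, level)) + 1
    return order[:n_keep]


if __name__ == "__main__":
    beta = [0.50, 0.45, 0.10]              # hypothetical lead SNP and two neighbours
    se = [0.10, 0.10, 0.10]
    pp = posterior_causal(beta, se)
    print(np.round(pp, 3), credible_set(pp))
```

In this toy region the lead SNP and its strongly associated neighbour together form the 95% credible set, while the weakly associated SNP is excluded; LD between SNPs is implicitly reflected only through their similar marginal estimates.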

© 2018 The Authors


The Plant Journal © 2018 John Wiley & Sons Ltd, The Plant Journal, (2019), 97, 164–181
GWAS of the phyllosphere microbiome 177
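Returning to the multi-trait methods discussed above, the joint null hypothesis that a SNP has no effect on any trait can be illustrated with a simple summary-statistic omnibus test. This is a minimal sketch, not the full multi-trait mixed model of Korte et al. (2012). It assumes per-trait z-scores from univariate GWAS and assumes that, under the null, their correlation matrix R is approximated by the phenotypic correlation among traits measured on the same individuals; the statistic zᵀR⁻¹z then follows a chi-squared distribution with k degrees of freedom for k traits.

```python
import numpy as np
from scipy.stats import chi2


def omnibus_test(z, R):
    """Joint test of H0: the SNP has no effect on any of the k traits.

    z : per-trait z-scores for one SNP from univariate GWAS (length k).
    R : k x k correlation of those z-scores under the null (assumed here to
        be approximated by the phenotypic correlation matrix).
    Returns the chi-squared statistic z' R^-1 z and its p-value (df = k).
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    stat = float(z @ np.linalg.solve(R, z))  # solve avoids forming R^-1 explicitly
    return stat, float(chi2.sf(stat, df=z.size))


if __name__ == "__main__":
    R = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # two strongly correlated abundance traits
    for z in ([2.0, 2.0], [2.0, -2.0]):
        stat, p = omnibus_test(z, R)
        print(z, round(stat, 2), p)
```

With a phenotypic correlation of 0.8, z-scores of 2 and 2 (concordant with the correlation) give a statistic of about 4.4, whereas z-scores of 2 and -2 (opposing it) give 40. This mirrors the observation above that power gains are largest for SNPs whose effects deviate from the phenotypic correlation pattern.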

Conditional analyses can determine whether multiple weakly linked or unlinked causal variants contribute to the association of the same locus with a microbiome trait, as opposed to a single signal at that locus. Such analyses have been performed for GWAS on obesity in humans (Wu et al., 2014), and nested analyses were used for broad-spectrum black rot resistance in A. thaliana (Huard-Chauveau et al., 2013). We can then derive biological meaning from candidate causal variants through post-GWAS bioinformatics and empirical approaches (Gallagher and Chen-Plotkin, 2018).

Validation

Post-GWAS bioinformatics approaches for plants currently entail surveying gene ontology databases, transcriptomics and epigenomic prioritization. Recent advances in genomic annotation have allowed plant biologists to make better-informed decisions about which genes to investigate experimentally. Arabidopsis thaliana remains the model plant with the most genomic resources for this purpose, including more than a thousand completely sequenced genotypes (1001 Genomes Consortium, 2016), transcriptomes (Klepikova et al., 2016) and a map of species-wide methylation patterns (Schmitz et al., 2013). Similar consortium efforts in crop species have produced expansive genomic resources for linking phenotype to genotype in tomato (Aflitos et al., 2014), maize (Chia et al., 2012) and rice (Huang et al., 2012). Transcriptome analyses have also been used to confirm GWAS results on harvest index in Brassica napus (Lu et al., 2016), resistance to the oomycete Phytophthora infestans in potato (Muktar et al., 2015), and a large set of biotic and abiotic stresses in A. thaliana (Thoen et al., 2016).

Even after candidate causal variants have been prioritized, many potential causal variants often remain for a given association. Manipulation of host genetics is the most robust validation of any candidate causal variant from a GWAS. One way in which candidate genes from association studies of plant traits have been validated is by testing single knockout lines, such as the T-DNA insertion lines that are readily available in A. thaliana (Thoen et al., 2016; Vetter et al., 2016). Caution is warranted with T-DNA insertion lines, however, because associated genomic rearrangements can mislead the interpretation of results (Tax and Vernon, 2001). Phenotyping traits of interest in single knockout lines can also lead to false negatives when allelic differences are responsible for the association, or when the wild-type does not carry the active form of the protein. Allele swapping can ameliorate this problem (Huard-Chauveau et al., 2013; Karasov et al., 2014). For gene editing, CRISPR-Cas9 has become widely used in transformable plants, mainly to introduce mutations through non-homologous end-joining of the double-stranded breaks created by CRISPR-Cas9 activity. Recently developed Cas9 variants allow precision gene editing in plants, even in transformation-recalcitrant species, through new delivery methods reviewed elsewhere (Yin et al., 2017). A combination of gene-disruption platforms consisting of Tnt1 retrotransposons, hairpin RNA-interference constructs and CRISPR/Cas9 nucleases successfully validated a GWA peak that contributed to trait variation in the symbiosis between legumes and rhizobia in Medicago truncatula (Curtin et al., 2017). While such techniques could also be applied to test multiple candidate SNPs in combination, such validation approaches cannot be applied to the most complex polygenic traits associated with microbiome composition.

OUTLOOK

There are precious few studies mapping the effects of host genetic variants on the plant microbiome. From work on the human microbiome, however, we know that there are few consistent association peaks across studies. At this point, the extent to which this lack of consistency between studies is biological versus methodological is unclear. Inconsistencies could arise from biological differences in the diversity of host genotypes included in the mapping panel, or from interactions of uncontrolled environmental factors with host and microbe genotypes. They could also arise from methodological differences in how microbiome sequence data are reduced to phenotypes, or from differences in statistical power due to the sample sizes and statistical approaches used. If broad patterns in host–microbe interactions are to be discovered, care must be taken in designing future plant GWAS to ensure that the detected associations can be meaningfully compared across studies. We highlight below a few areas of basic research that we expect will help inform successful GWAS design for leaf microbiome traits.

Elucidating the genetic bases of leaf microbial community variation will require that we first establish how much of the variation in taxon abundances and microbial community properties is heritable. Without this knowledge, it will be difficult to focus phenotyping on community traits or members that are likely to be influenced by host genetic variation. We must also understand the environmental scales at which microbiomes show consistent patterns in composition in order to select appropriate mapping populations. Efforts characterizing environmental variability in the phyllosphere microbiome will prove invaluable in improving the power and success of mapping efforts.

That said, the most important choice that an investigator must make is the trait to be mapped. We still have only limited information on which aspects of microbial communities influence their function. Are community measures such as diversity or richness meaningful predictors of the behavior of communities, or should one focus on particular genes, functional systems, strains or genera?

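The heritability estimates called for above could, for a replicated common-garden experiment, start from a simple variance-components calculation. The sketch below is one textbook approach (a balanced one-way ANOVA), not a method prescribed by the review. It assumes inbred accessions such as A. thaliana lines, a single environment, the same number of replicate plants per accession, and a trait that has already been transformed (for example, a log-transformed relative abundance). It also reports the heritability of accession means, which shows how residual noise in those means shrinks as more plants are phenotyped per accession.

```python
import numpy as np


def broad_sense_h2(y):
    """Broad-sense heritability from a balanced one-way ANOVA.

    y : array of shape (n_accessions, n_replicates), e.g. a transformed
        abundance of one taxon measured on replicate plants of inbred accessions.
    Returns (H2 on a single-plant basis, H2 of the accession mean).
    """
    y = np.asarray(y, dtype=float)
    a, n = y.shape
    acc_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = n * ((acc_means - grand_mean) ** 2).sum() / (a - 1)
    ms_within = ((y - acc_means[:, None]) ** 2).sum() / (a * (n - 1))
    var_g = max((ms_between - ms_within) / n, 0.0)  # negative estimates truncated to 0
    var_e = ms_within
    h2_plant = var_g / (var_g + var_e)
    h2_mean = var_g / (var_g + var_e / n)           # residual variance divided by n replicates
    return h2_plant, h2_mean


if __name__ == "__main__":
    y = [[1.0, 3.0],
         [5.0, 7.0]]               # toy data: 2 accessions x 2 replicates
    print(broad_sense_h2(y))       # approximately (0.778, 0.875)
```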

Establishing whether the plant fitness benefits conferred by microbes arise from community-level properties or from specific taxa will be particularly important for determining which phenotypes can be used in GWAS to identify loci for use in breeding or engineering for properties like yield or pathogen resistance.

If it is found that strain-specific interactions provide most of the selective pressure on plants and their interactions with microbial partners, techniques that move beyond traditional GWAS to target these specific interactions will likely prove most fruitful. For example, Wang et al. (2018) used a two-way mixed effects model to simultaneously map both the host and the pathogen genomic variation shaping their interaction. They found non-overlapping regions of plant genomes conferring resistance to Xanthomonas arboricola or Xanthomonas campestris. In addition, the regions conferring resistance to the latter had different levels of specificity for strains within the pathogen species. The same approaches could be applied to provide an initial picture of how genes shape the interaction between host plants and their beneficial or commensal microbial inhabitants.

ACKNOWLEDGEMENTS

The authors would like to thank all members of the E&E Microbiome Journal club at the University of Chicago for lively discussions, and Suzi Colpa for her artistic touch on the illustrations. Funding was provided by the Dropkin Foundation and the University of Chicago Biological Sciences Division.

NOTE FROM THE AUTHORS

After this review was written, a GWAS of the maize bacterial phyllosphere was published. Wallace et al. (2018) sequenced 16S transcripts from 300 maize genotypes grown in a common field environment, and used these data to estimate dozens of community diversity indices, the relative abundance of hundreds of OTUs, and the representation of thousands of predicted metabolic functions in community metagenomes for each leaf sample. A few percent of the metrics in each of these three categories were heritable, many of which appeared to be driven largely by variation in the abundance of Methylobacteria. Intriguingly, functions related to the metabolism of short-chain carbon molecules were overrepresented, suggesting that these bacterial metabolic traits may be an important part of the causal link between plant genotype and phyllosphere community composition. However, GWAS identified few significant associations with these traits, and consequently did not reveal which plant genes or traits most strongly influence phyllosphere composition in maize. This study illustrates both the wealth of insight that can be gleaned from GWAS of the phyllosphere, particularly when moving beyond bacterial taxonomy by focusing on metagenomic features, and the difficulty in uncovering the links between plant genes and phyllosphere community composition in field environments. We eagerly await future studies that extend the approaches of Horton et al. (2014) and Wallace et al. (2018) to larger collections of plant genotypes and additional species, and that apply more powerful tools for quantifying the taxonomic and functional composition of microbial communities and more powerful multi-trait GWAS methods, in hopes of overcoming this challenge.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article.
Method S1. Methods used to generate Figure 2 are reported in Methods S1.

REFERENCES

Aflitos, S., Schijlen, E., De Jong, H. et al. (2014) Exploring genetic variation in the tomato (Solanum section lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136–148.
Agler, M.T., Ruhe, J., Kroll, S., Morhenn, C., Kim, S.-T., Weigel, D. and Kemen, E.M. (2016) Microbial hub taxa link host and abiotic factors to plant microbiome variation (MK Waldor, Ed.). PLoS Biol. 14, e1002352.
Albanese, D. and Donati, C. (2017) Strain profiling and epidemiology of bacterial species from metagenomic sequencing. Nat. Commun. 8, 2260.
Allesina, S. and Tang, S. (2012) Stability criteria for complex ecosystems. Nature 483, 205–208.
Araki, H., Tian, D., Goss, E.M., Jakob, K., Halldorsdottir, S.S., Kreitman, M. and Bergelson, J. (2006) Presence/absence polymorphism for alternative pathogenicity islands in Pseudomonas viridiflava, a pathogen of Arabidopsis. Proc. Natl Acad. Sci. USA 103, 5887–5892.
Atwell, S., Huang, Y.S., Vilhjalmsson, B.J. et al. (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627–631.
Bai, Y., Müller, D.B., Srinivas, G. et al. (2015) Functional overlap of the Arabidopsis leaf and root microbiota. Nature 528, 364.
Bar-On, Y.M., Phillips, R. and Milo, R. (2018) The biomass distribution on earth. Proc. Natl Acad. Sci. USA 115, 6506–6511.
Barret, M., Briand, M., Bonneau, S., Préveaux, A., Valière, S., Bouchez, O., Hunault, G., Simoneau, P. and Jacques, M.-A. (2015) Emergence shapes the structure of the seed microbiota (HL Drake, Ed.). Appl. Environ. Microbiol. 81, 1257–1266.
Bartoli, C. and Roux, F. (2017) Genome-wide association studies in plant pathosystems: toward an ecological genomics approach. Front. Plant Sci. 8, 763.
Bayer, M.M., Rapazote-Flores, P., Ganal, M. et al. (2017) Development and evaluation of a barley 50k iSelect SNP array. Front. Plant Sci. 8, 1792.
Bergelson, J. and Roux, F. (2010) Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat. Rev. Genet. 11, 867–879.
Blekhman, R., Goodrich, J.K., Huang, K. et al. (2015) Host genetic variation impacts microbiome composition across human body sites. Genome Biol. 16, 191.
Bodenhausen, N., Bortfeld-Miller, M., Ackermann, M. and Vorholt, J.A. (2014) A synthetic community approach reveals plant genotypes affecting the phyllosphere microbiota. PLoS Genet. 10, e1004283.
Bonder, M.J., Kurilshikov, A., Tigchelaar, E.F. et al. (2016) The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412.
Brachi, B., Faure, N., Horton, M., Flahauw, E., Vazquez, A., Nordborg, M., Bergelson, J., Cuguen, J. and Roux, F. (2010) Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet. 6, e1000940.


Brachi, B., Morris, G.P. and Borevitz, J.O. (2011) Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol. 12, 232.
Brachi, B., Villoutreix, R., Faure, N., Hautekeete, N., Piquot, Y., Pauwels, M., Roby, D., Cuguen, J., Bergelson, J. and Roux, F. (2013) Investigation of the geographical scale of adaptive phenological variation and its underlying genetics in Arabidopsis thaliana. Mol. Ecol. 22, 4222–4240.
Burch, A.Y., Finkel, O.M., Cho, J.K., Belkin, S. and Lindow, S.E. (2013) Diverse microhabitats experienced by Halomonas variabilis on salt-secreting leaves. Appl. Environ. Microbiol. 79, 845–852.
Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A. and Holmes, S.P. (2016) DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583.
Chaparro, J.M., Sheflin, A.M., Manter, D.K. and Vivanco, J.M. (2012) Manipulating the soil microbiome to increase soil health and plant fertility. Biol. Fertil. Soils 48, 489–499.
Chia, J.-M., Song, C., Bradbury, P.J. et al. (2012) Maize HapMap2 identifies extant variation from a genome in flux. Nat. Genet. 44, 803.
Corwin, J.A. and Kliebenstein, D.J. (2017) Quantitative resistance: more than just perception of a pathogen. Plant Cell 29, 655–665.
Curtin, S.J., Tiffin, P., Guhlin, J. et al. (2017) Validating genome-wide association candidates controlling quantitative variation in nodulation. Plant Physiol. 173, 921–931.
Davenport, E.R., Cusanovich, D.A., Michelini, K., Barreiro, L.B., Ober, C. and Gilad, Y. (2015) Genome-wide association studies of the human gut microbiota. PLoS ONE 10, e0140301.
David, L.A., Maurice, C.F., Carmody, R.N. et al. (2014) Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559–563.
Demmitt, B.A., Corley, R.P., Huibregtse, B.M., Keller, M.C., Hewitt, J.K., McQueen, M.B., Knight, R., McDermott, I. and Krauter, K.S. (2017) Genetic influences on the human oral microbiome. BMC Genom. 18, 659.
Edwards, J., Johnson, C., Santos-Medellín, C., Lurie, E., Podishetty, N.K., Bhatnagar, S., Eisen, J.A. and Sundaresan, V. (2015) Structure, variation, and assembly of the root-associated microbiomes of rice. Proc. Natl Acad. Sci. USA 112, E911–E920.
Eltaher, S., Sallam, A., Belamkar, V., Emara, H.A., Nower, A.A., Salem, K.F.M., Poland, J. and Baenziger, P.S. (2018) Genetic diversity and population structure of F3:6 Nebraska winter wheat genotypes using genotyping-by-sequencing. Front. Genet. 9, 76.
Fahimipour, A.K., Kardish, M.R., Lang, J.M., Green, J.L., Eisen, J.A. and Stachowicz, J.J. (2017) Global-scale structure of the Eelgrass microbiome. Appl. Environ. Microbiol. 83, e03391-16.
Falconer, D.S. (1981) Introduction to Quantitative Genetics. New York: Longmans, Greens.
Foster, K.R., Schluter, J., Coyte, K.Z. and Rakoff-Nahoum, S. (2017) The evolution of the host microbiome as an ecosystem on a leash. Nature 548, 43–51.
Fouhy, F., Clooney, A.G., Stanton, C., Claesson, M.J. and Cotter, P.D. (2016) 16S rRNA gene sequencing of mock microbial populations – impact of DNA extraction method, primer choice and sequencing platform. BMC Microbiol. 16, 123.
Frachon, L., Libourel, C., Villoutreix, R. et al. (2017) Intermediate degrees of synergistic pleiotropy drive adaptive evolution in ecological time. Nat. Ecol. Evol. 1, 1551–1561.
Friedman, J. and Alm, E.J. (2012) Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687.
Gallagher, M.D. and Chen-Plotkin, A.S. (2018) The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730.
Ganjgahi, H., Winkler, A.M., Glahn, D.C., Blangero, J., Donohue, B., Kochunov, P. and Nichols, T.E. (2017) Fast and powerful genome wide association analysis of dense genetic data with high dimensional imaging phenotypes. Nat. Commun. 9, 3254.
1001 Genomes Consortium (2016) 1,135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166, 481–491.
Goodrich, J.K., Waters, J.L., Poole, A.C. et al. (2014) Human genetics shape the gut microbiome. Cell 159, 789–799.
Goodrich, J.K., Davenport, E.R., Beaumont, M., Jackson, M.A., Knight, R., Ober, C., Spector, T.D., Bell, J.T., Clark, A.G. and Ley, R.E. (2016) Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743.
Hackinger, S. and Zeggini, E. (2017) Statistical methods to detect pleiotropy in human complex traits. Open Biol. 7, 170125.
Hamza, T.H., Chen, H., Hill-Burns, E.M. et al. (2011) Genome-wide gene-environment study identifies glutamate receptor gene GRIN2A as a Parkinson’s disease modifier gene via interaction with coffee. PLoS Genet. 7, e1002237.
Haney, C.H., Samuel, B.S., Bush, J. and Ausubel, F.M. (2015) Associations with rhizosphere bacteria can confer an adaptive advantage to plants. Nat. Plants 1, 15051.
Hillebrand, H., Bennett, D.M. and Cadotte, M.W. (2008) Consequences of dominance: a review of evenness effects on local and regional ecosystem processes. Ecology 89, 1510–1520.
Horton, M.W., Bodenhausen, N., Beilsmith, K. et al. (2014) Genome-wide association study of Arabidopsis thaliana leaf microbial community. Nat. Commun. 5, 5320.
Huang, X., Kurata, N., Wei, X. et al. (2012) A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497.
Huard-Chauveau, C., Perchepied, L., Debieu, M., Rivas, S., Kroj, T., Kars, I., Bergelson, J., Roux, F. and Roby, D. (2013) An atypical kinase under balancing selection confers broad-spectrum disease resistance in Arabidopsis. PLoS Genet. 9, e1003766.
Igartua, C., Davenport, E.R., Gilad, Y., Nicolae, D.L., Pinto, J. and Ober, C. (2017) Host genetic variation in mucosal immunity pathways influences the upper airway microbiome. Microbiome 5, 16.
Innerebner, G., Knief, C. and Vorholt, J.A. (2011) Protection of Arabidopsis thaliana against leaf-pathogenic Pseudomonas syringae by Sphingomonas strains in a controlled model system. Appl. Environ. Microbiol. 77, 3202–3210.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. and Morishima, K. (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361.
Kang, H.M., Zaitlen, N.A., Wade, C.M., Kirby, A., Heckerman, D., Daly, M.J. and Eskin, E. (2008) Efficient control of population structure in model organism association mapping. Genetics 178, 1709–1723.
Kang, H.M., Sul, J.H., Service, S.K., Zaitlen, N.A., Kong, S.-Y., Freimer, N.B., Sabatti, C. and Eskin, E. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354.
Karasov, T.L., Kniskern, J.M., Gao, L. et al. (2014) The long-term maintenance of a resistance polymorphism through diffuse interactions. Nature 512, 436–440.
Keegan, K.P., Glass, E.M. and Meyer, F. (2016) MG-RAST, a metagenomics service for analysis of microbial community structure and function. Methods Mol. Biol. 1399, 207–233.
Klasen, J.R., Barbez, E., Meier, L., Meinshausen, N., Bühlmann, P., Koornneef, M., Busch, W. and Schneeberger, K. (2016) A multi-marker association method for genome-wide association studies without the need for population structure correction. Nat. Commun. 7, 13299.
Klepikova, A.V., Kasianov, A.S., Gerasimov, E.S., Logacheva, M.D. and Penin, A.D. (2016) A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling. Plant J. 88, 1058–1070.
Kloepper, J.W., Ryu, C.-M. and Zhang, S. (2004) Induced systemic resistance and promotion of plant growth by Bacillus spp. Phytopathology 94, 1259–1266.
Knight, R., Vrbanac, A., Taylor, B.C. et al. (2018) Best practices for analysing microbiomes. Nat. Rev. Microbiol. 16, 410–422.
Knights, D., Silverberg, M.S., Weersma, R.K. et al. (2014) Complex host genetics influence the microbiome in inflammatory bowel disease. Genome Med. 6, 107.
Kolde, R., Franzosa, E.A., Rahnavard, G., Hall, A.B., Vlamakis, H., Stevens, C., Daly, M.J., Xavier, R.J. and Huttenhower, C. (2018) Host genetic variation and its microbiome interactions within the Human Microbiome Project. Genome Med. 10, 6.
Korol, A.B., Ronin, Y.I., Itskovich, A.M., Peng, J. and Nevo, E. (2001) Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics 157, 1789–1803.
Korte, A., Vilhjalmsson, B.J., Segura, V., Platt, A., Long, Q. and Nordborg, M. (2012) A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat. Genet. 44, 1066–1071.
Kurtz, Z.D., Müller, C.L., Miraldi, E.R., Littman, D.R., Blaser, M.J. and Bonneau, R.A. (2015) Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol. 11, e1004226.


Lambais, M.R., Crowley, D.E., Cury, J.C., Büll, R.C. and Rodrigues, R.R. (2006) Bacterial diversity in tree canopies of the Atlantic forest. Science 312, 1917.
Lassalle, F., Campillo, T., Vial, L. et al. (2011) Genomic species are ecological species as revealed by comparative genomics in Agrobacterium tumefaciens. Genome Biol. Evol. 3, 762–781.
Lebeis, S.L., Paredes, S.H., Lundberg, D.S. et al. (2015) Salicylic acid modulates colonization of the root microbiome by specific bacterial taxa. Science 349, 860–864.
Le Corre, V., Roux, F. and Reboud, X. (2002) DNA polymorphism at the FRIGIDA gene in Arabidopsis thaliana: extensive nonsynonymous variation is consistent with local selection for flowering time. Mol. Biol. Evol. 19, 1261–1271.
Lee, J.J. and Chow, C.C. (2014) Conditions for the validity of SNP-based heritability estimation. Hum. Genet. 133, 1011–1022.
Levy, A., Conway, J.M., Dangl, J.L. and Woyke, T. (2018) Elucidating bacterial gene functions in the plant microbiome. Cell Host Microbe 24, 475–485.
Li, Y., Huang, Y., Bergelson, J., Nordborg, M. and Borevitz, J.O. (2010) Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 107, 21199–21204.
Lindow, S.E. and Brandl, M.T. (2003) Microbiology of the phyllosphere. Appl. Environ. Microbiol. 69, 1875–1883.
Long, Q., Rabanal, F.A., Meng, D. et al. (2013) Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat. Genet. 45, 884–890.
Lu, K., Xiao, Z., Jian, H. et al. (2016) A combination of genome-wide association and transcriptome analysis reveals candidate genes controlling harvest index-related traits in Brassica napus. Sci. Rep. 6, 36452.
Lynch, M. and Walsh, B. (1998) Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates.
Ma, F., Lin, P., Chen, Q., Lu, X., Zhang, Y.E. and Wu, C.-I. (2018) Direct measurement of pervasive weak repression by microRNAs and their role at the network level. BMC Genom. 19, 362.
Mansfield, J., Genin, S., Magori, S. et al. (2012) Top 10 plant pathogenic bacteria in molecular plant pathology. Mol. Plant Pathol. 13, 614–629.
McCann, K.S. (2000) The diversity-stability debate. Nature 405, 228–233.
McIntyre, A.B.R., Ounit, R., Afshinnekoo, E. et al. (2017) Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 18, 182.
Meaden, S., Metcalf, C.J.E. and Koskella, B. (2016) The effects of host age and spatial location on bacterial community composition in the English Oak tree (Quercus robur). Environ. Microbiol. Rep. 8, 649–658.
Medland, S.E., Jahanshad, N., Neale, B.M. and Thompson, P.M. (2014) Whole-genome analyses of whole-brain data: working within an expanded search space. Nat. Neurosci. 17, 791–800.
Mendez, L.W., Raaijmakers, J.M., de Hollander, M., Mendes, R. and Tsai, S.M. (2018) Influence of resistance breeding in common bean on rhizosphere microbiome composition and function. ISME J. 12, 212–224.
Mitchell-Olds, T. and Schmitt, J. (2006) Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature 441, 947–952.
Morella, N.M., Gomez, A.L., Wang, G., Leung, M.S. and Koskella, B. (2018) The impact of bacteriophages on phyllosphere bacterial abundance and composition. Mol. Ecol. 27, 2025–2038.
Morgan, J.L., Darling, A.E. and Eisen, J.A. (2010) Metagenomic sequencing of an in vitro-simulated microbial community. PLoS ONE 5, e10209.
Muktar, M., Lübeck, J., Strahwald, J. and Gebhardt, C. (2015) Selection and validation of potato candidate genes for maturity corrected resistance to Phytophthora infestans based on differential expression combined with SNP association and linkage mapping. Front. Genet. 6, 294.
Nordborg, M. and Weigel, D. (2008) Next-generation genetics in plants. Nature 456, 720.
Org, E., Parks, B.W., Joo, J.W.J. et al. (2015) Genetic and environmental control of host-gut microbiota interactions. Genome Res. 25, 1558–1569.
Paredes, S.H., Gao, T., Law, T.F. et al. (2018) Design of synthetic bacterial communities for predictable plant phenotypes. PLoS Biol. 16, e2003962.
Peiffer, J.A., Spor, A., Koren, O., Jin, Z., Tringe, S.G., Dangl, J.L., Buckler, E.S. and Ley, R.E. (2013) Diversity and heritability of the maize rhizosphere microbiome under field conditions. Proc. Natl Acad. Sci. USA 110, 6548–6553.
Peredo, E.L. and Simmons, S.L. (2018) Leaf-FISH: microscale imaging of bacterial taxa on phyllosphere. Front. Microbiol. 8, 2669.
Philippe, H. and Douady, C.J. (2003) Horizontal gene transfer and phylogenetics. Curr. Opin. Microbiol. 6, 498–505.
Porter, H.F. and O’Reilly, P.F. (2017) Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci. Rep. 7, 38837.
Pritchard, J.K., Stephens, M., Rosenberg, N.A. and Donnelly, P. (2000) Association mapping in structured populations. Am. J. Hum. Genet. 67, 170–181.
Props, R., Kerckhof, F.-M., Rubbens, P., De Vrieze, J., Hernandez Sanabria, E., Waegeman, W., Monsieurs, P., Hammes, F. and Boon, N. (2017) Absolute quantification of microbial taxon abundances. ISME J. 11, 584–587.
Rascovan, N., Carbonetto, B., Perrig, D., Díaz, M., Canciani, W., Abalo, M., Alloati, J., Gonzalez-Anta, G. and Vazquez, M.P. (2016) Integrated analysis of root microbiomes of soybean and wheat from agricultural fields. Sci. Rep. 6, 28084.
Rastogi, G., Coaker, G.L. and Leveau, J.H.J. (2013) New insights into the structure and function of phyllosphere microbiota through high-throughput molecular approaches. FEMS Microbiol. Lett. 348, 1–10.
Rothschild, D., Weissbrod, O., Barkan, E. et al. (2018) Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215.
Rudrappa, T., Czymmek, K.J., Pare, P.W. and Bais, H.P. (2008) Root-secreted malic acid recruits beneficial soil bacteria. Plant Physiol. 148, 1547–1556.
Ruiz-Perez, C.A., Restrepo, S. and Zambrano, M.M. (2016) Microbial and functional diversity within the phyllosphere of Espeletia species in an Andean high-mountain ecosystem. Appl. Environ. Microbiol. 82, 1807–1817.
Sapp, M., Ploch, S., Fiore-Donno, A.M., Bonkowski, M. and Ros, L.E. (2018) Protists are an integral part of the Arabidopsis thaliana microbiome. Environ. Microbiol. 20, 30–43.
Sasaki, E., Zhang, P., Atwell, S., Meng, D. and Nordborg, M. (2015) “Missing” G x E variation controls flowering time in Arabidopsis thaliana. PLoS Genet. 11, e1005597.
Savatin, D.V., Gramegna, G., Modesti, V. and Cervone, F. (2014) Wounding in the plant tissue: the defense of a dangerous passage. Front. Plant Sci. 5, 470.
Schmitz, R.J., Schultz, M.D., Urich, M.A. et al. (2013) Patterns of population epigenomic diversity. Nature 495, 193–198.
Shirasawa, K., Tanaka, M., Takahata, Y. et al. (2017) A high-density SNP genetic map consisting of a complete set of homologous groups in autohexaploid sweet potato (Ipomoea batatas). Sci. Rep. 7, 44207.
Smillie, C.S., Sauk, J., Gevers, D. et al. (2018) Strain tracking reveals the determinants of bacterial engraftment in the human gut following fecal microbiota transplantation. Cell Host Microbe 23, 229–240.
Smith, K.P. and Goodman, R.M. (1999) Host variation for interactions with beneficial plant-associated microbes. Annu. Rev. Phytopathol. 37, 473–491.
Soucy, S.M., Huang, J. and Gogarten, J.P. (2015) Horizontal gene transfer: building the web of life. Nat. Rev. Genet. 16, 472–482.
Spain, S.L. and Barrett, J.C. (2015) Strategies for fine-mapping complex traits. Hum. Mol. Genet. 24, R111–R119.
Stahl, E.A., Dwyer, G., Mauricio, R., Kreitman, M. and Bergelson, J. (1999) Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400, 667–671.
Stanton-Geddes, J., Paape, T., Epstein, B. et al. (2013) Candidate genes and genetic architecture of symbiotic and agronomic traits revealed by whole-genome, sequence-based association genetics in Medicago truncatula. PLoS ONE 8, e65688.
Tax, F.E. and Vernon, D.M. (2001) T-DNA-Associated duplication/translocations in Arabidopsis. Implications for mutant analysis and functional genomics. Plant Physiol. 126, 1527–1538.
Tenaillon, M.I., Sawkins, M.C., Long, A.D., Gaut, R.L., Doebley, J.F. and Gaut, B.S. (2001) Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays). Proc. Natl Acad. Sci. USA 98, 9161–9166.
Thoen, M.P.M., Davila Olivas, N.H., Kloth, K.J. et al. (2016) Genetic architecture of plant stress resistance: multi-trait genome-wide association mapping. New Phytol. 213, 1346–1362.
Turpin, W., Espin-Garcia, O., Xu, W. et al. (2016) Association of host genome with intestinal microbial composition in a large healthy cohort. Nat. Genet. 48, 1413–1417.
Uroz, S., Buee, M., Deveau, A., Mieszkin, S. and Martin, F. (2016) Ecology of the forest microbiome: highlights of temperate and boreal ecosystems. Soil Biol. Biochem. 103, 471–488.
Vandeputte, D., Kathagen, G., D’hoe, K. et al. (2017) Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511.
Vetter, M., Karasov, T.L. and Bergelson, J. (2016) Differentiation between MAMP triggered defenses in Arabidopsis thaliana. PLoS Genet. 12, e1006068.
Vollmers, J., Wiegand, S. and Kaster, A.K. (2017) Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective - not only size matters! PLoS ONE 12, e0169662.
Wagner, M.R., Lundberg, D.S., del Rio, T.G., Tringe, S.G., Dangl, J.L. and Mitchell-Olds, T. (2016) Host genotype and age shape the leaf and root microbiomes of a wild perennial plant. Nat. Commun. 7, 12151.
Wallace, J., Kremling, K.A., Kovar, L.L. and Buckler, E.S. (2018) Quantitative genetics of the maize leaf microbiome. Phytobiomes, in press.
Wang, J., Thingholm, L.B., Skiecevičienė, J., Rausch, P., Kummen, M., Hov, J.R., Degenhardt, F., Heinsen, F.-A., Rühlemann, M.C. and Szymczak, S. (2016) Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406.
Wang, B., Li, Z., Xu, W., Feng, X., Wan, Q., Zan, Y., Sheng, S. and Shen, X. (2017) Bivariate genomic analysis identifies a hidden locus associated with bacteria hypersensitive response in Arabidopsis thaliana. Sci. Rep. 7, 45281.
Wang, M., Roux, F., Bartoli, C., Huard-Chauveau, C., Meyer, C., Lee, H., Roby, D., McPeek, M.S. and Bergelson, J. (2018) Two-way mixed-effects methods for joint association analysis using both host and pathogen genomes. Proc. Natl Acad. Sci. USA 115, E5440–E5449.
Westcott, S.L. and Schloss, P.D. (2015) De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ 3, e1487.
Widmer, C., Lippert, C., Weissbrod, O., Fusi, N., Kadie, C., Davidson, R., Listgarten, J. and Heckerman, D. (2014) Further improvements to linear mixed models for genome-wide association studies. Sci. Rep. 4, 6874.
Wright, A.F., Carothers, A.D. and Pirastu, M. (1999) Population choice in mapping genes for complex diseases. Nat. Genet. 23, 397–404.
Wu, Y., Gao, H., Li, H. et al. (2014) A meta-analysis of genome-wide association studies for adiponectin levels in East Asians identifies a novel locus near WDR11-FGFR2. Hum. Mol. Genet. 23, 1108–1119.
Wu, S., Tohge, T., Cuadros-Inostroza, A. et al. (2018) Mapping the Arabidopsis metabolic landscape by untargeted metabolomics at different environmental conditions. Mol. Plant 11, 118–134.
Xiao, Y., Liu, H., Wu, L., Warburton, M. and Yan, J. (2017) Genome-wide association studies in maize: praise and stargaze. Mol. Plant 10, 359–374.
Yang, J., Manolio, T.A., Pasquale, L.R. et al. (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 43, 519–525.
Yin, K., Gao, C. and Qiu, J.-L. (2017) Progress and prospects in plant genome editing. Nat. Plants 3, 17107.
Yu, X., Lund, S.P., Scott, R.A., Greenwald, J.W., Records, A.H., Nettleton, D., Lindow, S.E., Gross, D.C. and Beattie, G.A. (2013) Transcriptional responses of Pseudomonas syringae to growth in epiphytic versus apoplastic leaf sites. Proc. Natl Acad. Sci. USA 110, E425–E434.
Zhao, K., Aranzana, M.J., Kim, S. et al. (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4.
Zhou, X. and Stephens, M. (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409.
Zhu, Z., Anttila, V., Smoller, J.W. and Lee, P.H. (2018) Statistical power and utility of meta-analysis methods for cross-phenotype genome-wide association studies. PLoS ONE 13, e0193256.

© 2018 The Authors


The Plant Journal © 2018 John Wiley & Sons Ltd, The Plant Journal, (2019), 97, 164–181