Professional Documents
Culture Documents
Tello-Ruiz Et Al., 2021
Tello-Ruiz Et Al., 2021
Received September 15, 2020; Editorial Decision October 09, 2020; Accepted October 09, 2020
* To whom correspondence should be addressed. Tel: +1 631 742 1762; Fax: +1 516 367 6851; Email: ware@cshl.edu
gether, they provide a solid foundation for obtaining novel cation process and the publisher’s manuscript approval and
insights into plant gene function. However, a thorough ex- handling system, and directly engages authors in the data
ploitation of the genomic data requires systems-level frame- mining process. It has processed 1436 articles to date, of
works, as well as analytic and visualization tools for com- which 456 were from The Plant Cell, and 980 were from
parative analysis of coding, noncoding and small molecule Plant Physiology. We have continued to engage our users
components across species. at scientific conferences, a news blog and social media. We
Since its inception in 2001, Gramene has strived to pro- have also conducted live webinars and recorded video tu-
vide a unique genome-scale data visualization platform torials. We have also trained plant researchers in structural
and analytical tools to support the global community of biocuration of maize genes, and rice genes and pathways by
plant genomics researchers, molecular biologists, geneti- organizing onsite and online jamborees.
cists, breeders, etc. (1,2). As the knowledgebase has evolved, All data in Gramene are freely accessible through visual
we have included the Ensembl Genome Browser, which and programmatic interfaces at http://www.gramene.org.
Table 1. New additions and updates in data, analyses, tools and infrastructure at Gramene
sorghum (from the Database of Genomic Variants [DGV] tion errors. Split gene models might result from a legiti-
(33)). mate evolutionary process, but most commonly they are
Since our last NAR update, we have integrated new varia- related to an annotation artifact in which a single gene is
tion data for three species (Supplementary Table S3): Malus annotated as two or more genes due to incomplete evi-
domestica (apple, 10.6 million SNPs; (26)), Helianthus an- dence. Therefore, at every release, we provide lists of po-
nuus (sunflower, 11 834 SNPs; (27,28), and T. turgidum (du- tential split genes (see ftp://ftp.gramene.org/pub/gramene/
rum wheat, 1.9 million SNPs; (29)). For apple, SNP variants CURRENT RELEASE/split genes) predicted using the
with minor allele frequency (MAF) 0.05 were called on Ensembl Compara pipeline.
70 varieties and lines through the FruitBreedomics project The phylogenetic trees (Figure 2) provide a standard way
(26). Sunflower SNPs were called in three sunflower pre- for the community to infer gene function across species, as
breeding collections belonging to INTA (Argentina), INRA well as insights into the fluid nature of gene gain and loss
(France), and USDA-UBC (USA and Canada) to estimate across the evolutionary spectrum, as shown in the less con-
Figure 2. Comparative genomic visualizations in Gramene. Four ways to graphically determine whether a gene is present or absent relative to other species
in Gramene. (A) Circular gene family tree for a rice repressor protein, OsDrAp1 (Os11g0544700) showing significant gene gain (red lines; e.g. Brassica
napus with 8 paralogs), and gene loss (zero paralogs, and species names grayed out; e.g., Oryza meridionalis and Triticum urartu). An apparent gene loss
could be evolutionary important for a species, but could also indicate a gap in the gene annotation. (B) Region comparison view of the syntenic segment
surrounding the OsDrAp1 gene. The gene is conserved in the monocot model Brachypodium, but absent in O. meridionalis and T. urartu. (C) Synteny map
between the model plant Brachypodium (chromosome 2) and Japonica rice (chromosome 1) in the region surrounding the rice auxin efflux carrier, OsPIN9,
and a table listing the corresponding syntelogs in this region. (D and E). Gene neighborhood views for the ZmPIN9 (blue arrow) gene tree. ZmPIN9 is the
maize ortholog of OsPIN9. The view shows (D) a high level of microsynteny in monocots (sorghum and wheat, including the new emmer wheat genome;
brachy and rice, shown in a green rectangle), and (E) microsynteny among dicots (Arabidopsis species marked with a red rectangle; and three new genomes:
tobacco, hot pepper and coffee).
Nucleic Acids Research, 2021, Vol. 49, Database issue D1457
in 28 species (Expression Atlas, August 2020) (Supple- A. thaliana can be compared with the reference rice path-
mentary Table S4). The new plant species with curated way (Figure 3C). The pathway browser also displays anato-
gene expression include Trifolium pratense, Setaria ital- mogram images and RNA-seq baseline expression data,
ica, Prunus persica, Musa acuminata, Brassica napus and fetched in real time from the Expression Atlas (4) (Figure
Beta vulgaris subsp. vulgaris. In addition, previously curated 3D). A similar level of expression data can be accessed for
datasets were reanalyzed against the updated genomes, such the 28 plant species mentioned in the Plant Gene Expression
as wheat IWGSCv1. Other analyses and visualizations in- in ATLAS section. Recently, we added the gene–gene inter-
cluded ‘enrichment’ in each differential comparison of GO action overlay feature, which allows users to select and dis-
terms, Plant Reactome pathways, and InterPro domains, as play gene–gene interaction data from the BioAnalytic Re-
well as a functionality to find genes with similar expression source (BAR) (50), the IntAct database (51), our own plant
profiles in a baseline experiment page, and another to visu- interactome, and several other sources via PSICQUIC web
alize the distribution of baseline expression across biologi- services (52). Users can also upload interaction data from
undergraduate students and plant researchers to triage sus- richment of metabolic and transport pathways created dur-
pect gene models (Figure 4), evaluate the experimental evi- ing the very early stages of Plant Reactome; (ii) depiction of
dence to support the existence of a transcript, and curate the processes associated with plant development (organ forma-
gene’s structure in the Apollo gene editor (59). Maize served tion, tissue differentiation, cell division, etc.); and (iii) inte-
as the initial prototype for these efforts (41). Gramene estab- gration of transcription networks. To engage the plant ge-
lished a network of researchers and teaching faculty to sup- nomics community in such biocuration efforts, we hosted
port and implement genome annotation projects as Course- two jamborees in the autumns of 2017 and 2018 (60). In ad-
based Undergraduate Research Experiences (CUREs). In dition, we recruited community collaborators and leveraged
addition, we have hosted several in-person and virtual jam- individuals involved in data science courses, thesis projects,
borees, trained over 50 participants, developed training ma- and experiential summer learning internships at Oregon
terials and curation tools based on the gene trees to support State University (OSU), and trained seven undergraduate
triage of suspect models, and set up Apollo instances of students in small biocuration research projects.
reference assemblies for maize and sorghum. Importantly, The availability of a large number of genomes, along with
the virtual jamborees highlighted methods for supporting a an extensive array of peer-reviewed literature and meeting
diverse collaborative environment for remote learning and abstracts reporting on gene function and phenotype, rep-
provided a proof-of-concept for future community curation resents a gold mine for training Machine Learning meth-
efforts to improve genomic annotations in maize and other ods for knowledge extraction. This is an important activity
important crops. Curated maize gene models are available because even though biocuration is of extremely high qual-
as a track on Gramene’s Maize Genome Browser (see the ity, it is time-consuming, challenging, and constrained by
table with links to their corresponding genomic location limitations on resources, experts, and funding models for
at http://www.gramene.org/curated maize v4 gene models, various plant biology databases. Moreover, the rate of re-
and Figure 4B). porting new data in the literature has substantially outpaced
In addition, we focused biocuration on rice genes and quality manual biocuration, making it practically impos-
pathways, based on evaluation of peer-reviewed published sible to capture everything manually. Therefore, in a pilot
scientific literature, analysis of expression patterns of ho- program, we implemented Domain Informational Vocabu-
mologous genes, and evidence from gene–gene interaction lary Extraction (DIVE) as a web service for the publication
data sets. Recent efforts accomplished (i) revision and en- pipeline used by the two ASPB journals, The Plant Cell and
Nucleic Acids Research, 2021, Vol. 49, Database issue D1459
Figure 4. Community biocuration. (A) Gene tree visualizer tool with a multiple (amino acid) sequence alignment highlighting potential structural annota- Downloaded from https://academic.oup.com/nar/article/49/D1/D1452/5973447 by guest on 26 September 2022
tion artifacts at the 3 -end of the auxin efflux carrier ZmPIN9 gene (Zm00001d043179). (B) Maize genome track showing a curated gene structure (shorter
internal exons) for ZmPIN9.
Plant Physiology. In March 2018, the DIVE tool was in- tially important words and phrases from manuscripts and
tegrated with The Plant Cell using ArticleExpress, a pro- provides a web-based interactive user interface for biocu-
prietary tool of Sheridan Journal Services. ArticleExpress ration by authors (61). An author is unable to submit the
is an XML-based author proofing system that allows au- manuscript proofs for composition until the data mined by
thors to make changes directly to the article’s XML, rather DIVE are approved by the authors. In April 2018, Plant
than the traditional method of marking up a PDF. The in- Physiology began requiring authors to use ArticleExpress
tegration of DIVE into ArticleExpress added a step to an with the integrated DIVE tool. Thus, DIVE enriches digital
author’s proofing process by presenting curated entities to publications by first detecting biological entities and key in-
confirm, edit, delete, or add. DIVE extracts a list of poten- formational words, and then adding additional annotations
D1460 Nucleic Acids Research, 2021, Vol. 49, Database issue
to the published article. The system implements multiple centered on molecular details of the development of spe-
strategies for biological entity detection, including the use cialized plant tissues and responses to external stimuli and
of regular expression rules, ontology, and a keyword dictio- stresses. We will continue our work to integrate DIVE with
nary. These extracted entities are then stored in a database Gramene, and to provide links back to the relevant publi-
and made accessible for curation and evaluation by authors. cations from our gene and pathway views.
The author approved edits and updates to the mined data Over the past 15 years, Gramene has evolved from a
can then be used to improve entity importance prediction genome database hosting the first crop reference assem-
for future article processing (62). For this work, we have bly, rice (69), into the preeminent plant genomics data
used gene nomenclature data from Gramene, ontologies warehouse and knowledgebase. Phylogenomics provides in-
from the Planteome database (48) as our word bank, and sights into gene evolution and the fluid nature of the plant
ontology-based knowledgegraphs, to drive entity recogni- genomes. To date, Gramene’s existing infrastructure and
tion and data mining. Other projects have performed post- data acquisition has focused on cross-species comparisons,
linkage disequilibrium assessment among international sunflower 48. Cooper,L., Meier,A., Laporte,M.-A., Elser,J.L., Mungall,C.,
breeding collections. Genes ,11, 283. Sinn,B.T., Cavaliere,D., Carbon,S., Dunn,N.A., Smith,B. et al. (2018)
29. Maccaferri,M., Harris,N.S., Twardziok,S.O., Pasam,R.K., The Planteome database: an integrated resource for reference
Gundlach,H., Spannagl,M., Ormanbekova,D., Lux,T., Prade,V.M., ontologies, plant genomics and phenomics. Nucleic Acids Res., 46,
Milner,S.G. et al. (2019) Durum wheat genome highlights past D1168–D1180.
domestication signatures and future improvement targets. Nat. 49. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A.,
Genet., 51, 885–895. Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The
30. Wilkinson,P.A., Allen,A.M., Tyrrell,S., Wingen,L.U., Bian,X., reactome pathway knowledgebase. Nucleic Acids Res., 48,
Winfield,M.O., Burridge,A., Shaw,D.S., Zaucha,J., Griffiths,S. et al. D498–D503.
(2020) CerealsDB-new tools for the analysis of the wheat genome: 50. Waese,J. and Provart,N.J. (2017) The bio-analytic resource for plant
update 2020. Database, 2020, baaa060. biology. Methods Mol. Biol., 1533, 119–148.
31. Howe,K.L., Contreras-Moreira,B., De Silva,N., Maslen,G., 51. Orchard,S., Ammari,M., Aranda,B., Breuza,L., Briganti,L.,
Akanni,W., Allen,J., Alvarez-Jarreta,J., Barba,M., Bolser,D.M., Broackes-Carter,F., Campbell,N.H., Chavali,G., Chen,C.,
Cambell,L. et al. (2020) Ensembl Genomes 2020-enabling del-Toro,N. et al. (2014) The MIntAct project–IntAct as a common
non-vertebrate genomic research. Nucleic Acids Res., 48, D689–D695. curation platform for 11 molecular interaction databases. Nucleic
https://arxiv.org/abs/1910.14623, 31 October 2019, preprint: not peer 69. Ware,D.H., Jaiswal,P., Ni,J., Yap,I.V., Pan,X., Clark,K.Y.,
reviewed. Teytelman,L., Schmidt,S.C., Zhao,W., Chang,K. et al. (2002)
68. Yates,A.D., Achuthan,P., Akanni,W., Allen,J., Allen,J., Gramene, a tool for grass genomics. Plant Physiol., 130, 1606–1613.
Alvarez-Jarreta,J., Amode,M.R., Armean,I.M., Azov,A.G., 70. Alliance of Genome Resources Consortium (2020) Alliance of
Bennett,R. et al. (2020) Ensembl 2020. Nucleic Acids Res., 48, Genome Resources Portal: Unified Model Organism Research
D682–D688. Platform. Nucleic Acids Res., 48, D650–D658.