Gkad 445

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

Published online 24 May 2023 Nucleic Acids Research, 2023, Vol.

51, Web Server issue W237–W242


https://doi.org/10.1093/nar/gkad445

GEPI: large-scale text mining, customized retrieval


and flexible filtering of gene/protein interactions
1 1 ,* 1 , 2 ,*
Erik Faessler , Udo Hahn and Sascha Schäuble
1
Jena University Language and Information Engineering (JULIE) Lab, Friedrich Schiller University Jena,
Fürstengraben 30, 07743 Jena, Germany and 2 Microbiome Dynamics, Leibniz Institute for Natural Product Research
and Infection Biology (Leibniz-HKI), 07745 Jena, Germany

Received March 18, 2023; Revised May 01, 2023; Editorial Decision May 09, 2023; Accepted May 11, 2023

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


ABSTRACT GRAPHICAL ABSTRACT
We present GEPI, a novel Web server for large-scale
text mining of molecular interactions from the scien-
tific biomedical literature. GEPI leverages natural lan-
guage processing techniques to identify genes and
related entities, interactions between those entities
and biomolecular events involving them. GEPI sup-
ports rapid retrieval of interactions based on power-
ful search options to contextualize queries targeting
(lists of) genes of interest. Contextualization is en-
abled by full-text filters constraining the search for in-
teractions to either sentences or paragraphs, with or
without pre-defined gene lists. Our knowledge graph
is updated several times a week ensuring the most
recent information to be available at all times. The
result page provides an overview of the outcome of a
search, with accompanying interaction statistics and
visualizations. A table (downloadable in Excel for- INTRODUCTION
mat) gives direct access to the retrieved interaction The molecular interactions between genes or gene prod-
pairs, together with information about the molecular ucts (e.g. exemplifying gene binding or positive regulatory
entities, the factual certainty of the interactions (as events) are the subject of a large body of scientific research
verbatim expressed by the authors), and a text snip- that is rapidly growing and typically communicated in scien-
pet from the original document that verbalizes each tific papers. Manually curated interaction databases such as
interaction. In summary, our Web application offers HPRD (1), INTACT (2) and BIOGRID (3) contain millions
free, easy-to-use, and up-to-date monitoring of gene of interactions extracted from thousands of publications.
Such databases offer structured, high-quality, and detailed
and protein interaction information, in company with
interaction data enabling continuous scientific hypothesis
flexible query formulation and filtering options. GEPI testing and discovery. However, only a small fraction of
is available at https:// gepi.coling.uni-jena.de/ . the complete set of documents in PubMed (PM) (with over
35M citations) and PubMed Central (PMC) (with more
than 5.1M full texts in its Open Access (OA) subset, as of
March 2023) is covered by such databases through manual
curation efforts. Therefore, automatic means of harvesting
such information from the biomedical literature on a larger
scale became a focus of research in the recent years.
STRING (4–6), because of its long-term development
history dating back to the beginning of this millennium
(7), is one of the most popular tools in the biomolecular

* To
whom correspondence should be addressed. Tel: +49 3641 944 320; Email: udo.hahn@uni-jena.de
Correspondence may also be addressed to Sascha Schäuble. Tel: +49 3641 532 1152; Fax: +49 3641 532 2152; Email: sascha.schaeuble@leibniz-hki.de


C The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
W238 Nucleic Acids Research, 2023, Vol. 51, Web Server issue

community for gene and protein interaction retrieval and


features text mining as one of its data input channels. Re-
searchers mainly from the field of natural language process-
ing (NLP) approached the automatic analysis of biomedical
literature in a series of challenge competitions, where tools
for the extraction of molecular events were rigorously eval-
uated against shared benchmark datasets (for a survey, see
(8)). Some of the front-runner systems were subsequently
integrated into Web servers, such as BIOCONTEXT (9) or
EVEXDB (10).
We here introduce an alternative NLP-based system,
Gene and Protein Interactions (GEPI), for the quick and
versatile retrieval of up-to-date molecular interactions from

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


the biomedical literature contained in PM and the OA
Figure 1. GEPI components and their data flow dependencies.
subset of PMC. GEPI comes as a Web application with
a graphical user interface that allows easy access also to
non-programming end users. It can be queried with one or our NLP infrastructure, execution speed, and currency of
two (unlimited) lists of gene/protein IDs and related entity the employed databases for GN. Considering the optimal
classes. Boolean full-text queries can be specified to filter for combination of these requirements, for gene recognition and
the sentence or paragraph context in which interactions oc- normalization, we equipped GEPI with GNORMPLUS, a
cur. Additional filter options for restricting search to docu- freely available tool that can be applied to arbitrary doc-
ment titles and section headings provide further options to uments, both abstracts and full texts. It was already ap-
constrain the retrieval of gene/protein associations. These plied to and evaluated on the whole of PubMed and PMC
full-text filters can also be used without the necessity to pro- in PUBTATOR CENTRAL (22), demonstrating its large-scale
vide input genes at all. An assessment of the degree of fac- applicability. The tool has been continuously updated since
tuality (11,12) for each interaction or event is derived from its first release, thus ensuring its currency. Using the same
verbatim signals in the documents where authors express assessment criteria from above, for event extraction, GEPI
their (un)certainty for an observed or stipulated interaction runs BIOSEM. It showed competitive overall performance in
(using hedging expressions such as ‘maybe’ or ‘[our data] major Shared Tasks, with very high precision values, result-
suggest that’). Such factuality tags may be used, e.g. to fil- ing in a high fraction of relevant events. Given the require-
ter out negated or low-certainty interaction statements. ment of correctly identifying both gene occurrences and
As a response to the often raised request for recency of their interactions, this ensures that GEPI generates mean-
information, GEPI includes automated procedures for up- ingful results, thus minimizing the number of false positives.
dating the indexed literature from PubMed and PMC mul-
tiple times a week. The obtained results are shown in a
dashboard which includes base statistics and visualizations, METHODS
as well as a comprehensive downloadable table summariz-
Text analytics: extracting gene/protein interactions
ing all identified interactions and their associated literature
passages. Figure 1 depicts the components of the GEPI application
ecosystem. The input to the pipeline are XML documents
from PubMed and PMC. Mentions of genes, gene products,
BACKGROUND
families, protein complexes and molecular events between
The retrieval of bio-molecular interactions from scientific those entities are extracted by an NLP pipeline (depicted
documents requires gene recognition (GR), where spans of in Supplementary Figure S1) and indexed into an Elastic-
text that correspond to names or identifiers for gene enti- Search (ES) cluster. A Neo4j graph database stores struc-
ties are identified, e.g., Arp5 as a gene in the text passage tured gene information that includes relationships express-
‘[...] the protein level of Arp5 was markedly reduced [...]’. ing orthology, family or group membership, protein com-
To uniquely identify gene mentions, database IDs are as- plexes and their subunits, and gene ontology annotations.
signed to them in the gene normalization (GN) step. A num- This information (taken from FamPlex, GO, NCBI Gene,
ber of tools handle both tasks, e.g., GNAT (13), GENO (14), etc.) is incorporated in the NLP pipeline for the resolu-
GNORMPLUS (15), the system proposed in (16) and oth- tion of ambiguous gene groups, families and protein com-
ers. After gene and protein mentions have been recognized, plexes. The events and interactions in ES are connected to
semantic relations between pairs of genes/proteins must be the entities in Neo4j via unique IDs. Together they form a
identified––the correlate of factual assertions in documents. knowledge graph that feeds the GEPI Web application. To
Again, NLP community challenges were the drivers for a ensure high result reliability, GEPI restricts molecular inter-
number of powerful relation extractors, such as TEES (17), actions to occur in single sentences. Finally, the Web appli-
JREX (18), BIOSEM (19), VERSE (20) or DEEPEVENTMINE cation leverages the gene database in Neo4j (holding stable
(21). terminological background knowledge) and the ES index
For the choice of GEPI components, we evaluated ex- (holding the continuously harvested results of relation ex-
isting NLP solutions with focus on the following criteria: traction) to serve user requests. More detailed descriptions
open source availability, performance, integratability into of the NLP pipeline and our data model can be found in
Nucleic Acids Research, 2023, Vol. 51, Web Server issue W239

Table 1. Current numbers of molecular interactions in GEPI. This in- After the retrieval process, GEPI results are presented
cludes events with one and two arguments in a dashboard. This includes aggregated overview pan-
Interactions Unique interactions els and a table of complete primary data where each re-
trieved interaction is listed individually within its document
PubMed 14 441 503 411 500 context. The aggregation panels summarize the publication
PubMed Central 52 144 387 1 032 377
PM+PMC 66 585 890 1 204 986 state with respect to the input and include frequency-sorted
pie and bar charts of interaction partners and Sankey dia-
grams revealing interaction frequencies. We distinguish two
Supplementary Material S1. An evaluation report on the types of Sankey diagrams. The first shows the most fre-
gene interaction extraction components is provided in Sup- quent interactions in the retrieval result. The second sum-
plementary Material S2. Evaluation results are shown in marizes second-degree interactions, offering information
Supplementary Tables S1 and S2. on interaction partners occurring in more complex inter-
actions than provided in the output table. Finally, the table

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


panel provides a list of direct associations between genes or
Gene/protein search: finding molecular information
proteins associated to the current query. It discloses detailed
Input to the GEPI interface is provided by a query form; information about the gene mentions in the text, the NCBI
the results of query processing are based on all interactions GENE IDs and symbols they were mapped to, and the sen-
stored in the knowledge graph (its current size is depicted tence in which interactions occurs. Where applicable, the ta-
in Table 1). ble offers links to the source databases of the molecular en-
The query form provides two input panels for lists of tity, e.g. NCBI GENE for genes or HGNC for gene groups to
genes and related entities we refer to as A-list and B-list, quickly obtain full gene names, gene descriptions and fur-
respectively. Further input fields allow the specification of ther resources.
a diverse set of filters described below. Input to the A- and The primary data table can be downloaded as an Excel
B-lists may consist of NCBI GENE (23) IDs and symbols, workbook, including query details and the resulting inter-
UNIPROT (24) IDs, FAMPLEX (25) protein family and com- action data. It also lists the occurrence counts of the gene or
plex identifiers or names, HGNC (26) gene group names, or protein symbols that take part in the interactions and how
Gene Ontology (GO) (27) terms. Family and group query often each symbol was found in interaction with another
items will include their members in the result, whereas GO symbol. Thus, the Excel workbook documents the query,
term queries will include genes that have been annotated the query result, and additional statistics enabling efficient
with the respective terms. Specifying only A items will re- downstream analysis. It can also be used for the manual cu-
sult in an open search. This mode retrieves gene and protein ration of results, e.g. further filtering or the removal of er-
interactions between the entries of A and any other interac- roneous result items.
tion partners from the interaction database. If a second set
of gene/protein identifiers is entered for the B-list, a closed
USE CASES INVOLVING GEPI
search is performed. This mode yields only interaction items
that have an element of A as one and an element of B as their We prepared two use cases that feature GEPI’s search and
other argument. filtering facilities. The first one presents a search for interac-
Adding genes to A- and B-lists in a search query is op- tion information about specific marker genes in the context
tional. If left out, the result of such a ‘full-text-only’ search of the devastating invasive pulmonary aspergillosis disease
comprises all molecular events in the interaction database caused by the fungal opportunistic pathogen Aspergillus fu-
that match the respective query filters. This query type can migatus (see Supplementary Material S3) (28). The second
be used to identify published molecular interactions associ- one describes the application of GEPI to investigate the Fer-
ated with specific filter terms, such as a condition (e.g., ‘el- ritin H chain in the context of disease tolerance in sepsis (see
derly’) or a disease (e.g., ‘obesity’) without restrictions on Supplementary Material S4) (29). A detailed walk-through
the associated genes. how to use data from these studies with GePI is provided in
The query form offers two input fields to specify full-text Supplementary Materials S5 and S6. The description of the
queries either on the sentence or paragraph level. They act second use case includes a verification step of GEPI results
as context-sensitive query filters for interactions. Sentence- leveraging STRING (see Supplementary Data S7 for the raw
level context can be used to specify filter terms occurring in protein pair confidence values given by STRING). We also
addition to an interaction in a sentence. The paragraph level specify the inputs to and outputs from GEPI in both use
widens the textual window around the interaction statement cases for reference in Supplementary Data S8.
and allows to retrieve interactions based on relevant context
keywords beyond sentence boundaries. Both filter queries
DISCUSSION
can be connected by Boolean AND or OR operators. An-
other full-text filter is sensitive to document titles and sec- Automated molecular event extraction from scientific liter-
tion headings and can be applied to restrict the gene/protein ature is an area of active research to by-pass large cover-
interaction search to, e.g., ‘Results’ sections only, since ‘In- age gaps in manually curated life science databases. STRING,
troduction’ and ‘Background’ sections commonly refer to for instance, brings together a multitude of biological in-
established (and thus less interesting) prior knowledge. Fur- frastructure resources, including curated interaction and
ther filter options may restrict the search results to specific genome sequence databases. One of STRING’s seven infor-
organisms, interaction types or factuality levels. mation channels is concerned with textual data accessed
W240 Nucleic Acids Research, 2023, Vol. 51, Web Server issue

from PM and the PMC OA subset. STRING’s text min- the 5’ adenosine monophosphate-activated protein kinase
ing facilities collect information for the interaction of pro- (AMPK) complex (31). By using full-text filter capabilities,
tein pairs in two fundamental ways. Firstly, statistical meth- we identified interactions between members of the Akt fam-
ods are leveraged to find over-represented proteins in docu- ily and pyruvate dehydrogenase kinase 2 (PDK2) in the con-
ments with respect to the user input. This is used to calculate text of cellular stress (32). We also used our NLP engine to
an aggregated measure of confidence that a unique pair of gain literature-based insight into the current state of poten-
proteins is associated in general, instead of identifying ex- tial interaction partners on kinases of the PI3K/Akt sig-
plicit association descriptions of a protein pair in individual naling network based on quantitative phosphoproteomics
documents (5). This can be witnessed in STRING’s text min- as input for our NLP service (33). We furthermore com-
ing viewer where at least two input proteins are highlighted bined results from our event extraction system with large-
in each citation but are not necessarily mentioned in any scale gene expression analysis and multiple validation ex-
interaction. Secondly, a modern NLP approach is used to periments to generate a molecular signature of the activa-
find protein pairs explicitly described to physically interact, tion of aryl hydrocarbon receptor (AHR) (34). Finally, our

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


i.e., to build protein complexes (5). For such cases, the text efforts supported the analysis of peripheral blood samples
mining viewer shows explicit verbalizations of physical in- of children to investigate childhood asthma (35).
teractions in PM abstracts. The above mentioned usage scenarios demonstrate the
Unlike STRING, GEPI is an exclusively NLP-driven flexible applicability of our NLP engines and their potential
tool designed towards the identification of interacting to complement life-science data set analysis and hypothesis
gene/protein pairs or events involving single genes/proteins generation. Besides its core functionality, the recognition of
from biomedical documents. As a safe-guard mechanism, gene and protein interactions in huge literature repositories,
GEPI supplies the text passage provided with each result GEPI excels with flexible options for expressive query for-
item. By design, GEPI cannot render interaction data po- mulation and a wide range of filtering functions to constrain
tentially (only) stored in structured tables, e.g. Supplemen- its result sets which can then be channeled to its end users by
tary data, external of the publication text. GEPI covers a simple-to-(re)use reporting devices (Excel sheets). Further-
broader range of interaction types, including binding (cor- more, GEPI operates on up-to-date textual data (based on
responding to STRING’s physical interaction), regulation short-term update cycles in the range of few days from the
and activation, thus substantially widening the scope of current date) and runs efficiently (as evidenced by its timely
high-quality interaction descriptions extracted from the lit- responses).
erature. GEPI’s additional capabilities of searching for a Despite the added value offered by GEPI, there is also
closed pair of gene lists and filtering by document context, room for improvement. On the preprocessing level, BIOSEM
make GePI a complementary tool to STRING, which (un- was used for its superior specificity and processing perfor-
like GEPI) processes several types of structured resources mance but has been equalized, in the meantime, by deep
(databases and terminologies). While STRING calculates learning-based approaches (DL) that might identify addi-
confidence scores aggregated from a diverse set of resources tional interaction events, e.g., DEEPEVENTMINE. However,
for unique protein pairs, GEPI’s focus lies on the high- these DL approaches have a high demand for computa-
quality extraction of interactions explicitly described in tional power, and frequent updates incorporating new liter-
publications together with factuality ratings extracted from ature can become prohibitive with updates consuming sig-
the textual context of interaction descriptions in the form nificant resources and taking prolonged periods of time. De-
of hedging expressions. spite the focus on high precision of GEPI’s NLP compo-
In this regard, GEPI is much closer in spirit to BIOCON- nents, false positive results cannot be ruled out. For fur-
TEXT and EVEXDB. However, BIOCONTEXT and EVEXDB ther result verification we recommend to query curated
allow only single gene queries to identify their interaction databases, e.g. BIOGRID, or comparison of GEPI results
partners. Even worse, the BIOCONTEXT Web server became with those of STRING, given the same query. Indeed, we see
offline in the meantime. EVEXDB allows for queries of sin- a clear complementary relation between GEPI and STRING:
gle genes or a pair of genes. It provides interaction informa- The interactions explicitly described in the literature found
tion including statistics and interaction members, as well as by GEPI can be corroborated with STRING’s diverse set
text snippets of the identified interactions grouped by in- of evidence channels to obtain a high-certainty list of in-
teraction type (regulation, binding, etc.). To the best of our teractions for each of which exists a directly linked liter-
knowledge, EVEXDB has not been updated since 2013 and ature support. We also recommend to investigate specific
thus features mostly outdated knowledge. Neither tool of- text portions provided by GEPI for interactions not found
fers further context filtering, nor non-programmatic means by BIOGRID or STRING. If their relevance is confirmed,
to search for more than two genes at once. these may feature interactions currently not included in the
GEPI’s value for the scientific community is also evi- other databases.
denced by the usage of prior versions of its NLP pipeline
for a variety of experimental studies. For these studies,
CONCLUSION
we developed NLP modules (30) that allow us to extract
molecular event information from the whole of PM and We introduced GEPI, a Web application for the fully au-
PMC (OA subset) at any given point of time. Previous tomated extraction of molecular events from the biomed-
instances of our NLP pipeline were already successfully ical literature. The automatically populated and updated
applied to identify potential interaction partners of pro- knowledge graph includes interactions mined from the
teins found in phosphoproteomics experiments including whole of PubMed and PubMed Central OA subset with
Nucleic Acids Research, 2023, Vol. 51, Web Server issue W241

regular updates to keep up with newest developments in 6. Franceschini,A., Szklarczyk,D., Frankild,S.P., Kuhn,M.,
the literature. We presented powerful query and filtering Simonovic,M., Roth,A., Lin,J., Minguez,P., Bork,P., von Mering,C.
et al. (2013) String v9.1 : protein-protein interaction networks, with
options for contextualized retrieval of interactions between increased coverage and integration. Nucleic Acids Res., 41,
genes, proteins, families and protein complexes. The interac- D808–D815.
tion results can be downloaded as an Excel workbook that 7. Snel,B., Lehmann,G., Bork,P. and Huynen,M.A. (2000) String : a
contains all identified relation pairs and statistics about the Web-server to retrieve and display the repeatedly occurring
frequency of the interaction partners and the interactions neighbourhood of a gene. Nucleic Acids Res., 28, 3442–3444.
8. Huang,C.-C. and Lu,Z. (2016) Community challenges in biomedical
themselves. For each result interaction, its textual source text mining over 10 years: success, failure and the future. Brief.
is provided for cross-checking. The NLP pipelines we pre- Bioinform., 17, 132–144.
sented here and their accessibility via a Web application 9. Gerner,M., Sarafraz,F., Bergman,C. and Nenadic,G. (2012)
opens our NLP service to the broad life science community BioContext: an integrated text mining system for large-scale
extraction and contextualization of biomolecular events.
to effectively foster new scientific discoveries. Bioinformatics, 28, 2154–2161.

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


10. Van Landeghem,S., Björne,J., Wei,C.-H., Hakala,K., Pyysalo,S.,
Ananiadou,S., Kao,H.-Y., Lu,Z., Salakoski,T., Van de Peer,Y. et al.
DATA AVAILABILITY (2013) Large-scale event extraction from literature with multi-level
gene normalization. PLoS One, 8, e55814.
GePI is freely available at https://gepi.coling.uni-jena.de/. 11. Hahn,U. and Engelmann,C. (2014) Grounding epistemic modality in
speakers’ judgments. In: Pham,D. and Park,S. (eds). PRICAI 2014 ––
Proceedings of the 13th Pacific Rim International Conference on
SUPPLEMENTARY DATA Artificial Intelligence. Gold Coast, Queensland, Australia, 1-5
December, 2014. Cham: Springer, pp. 654–667.
Supplementary Data are available at NAR Online. 12. Van Landeghem,S., Hakala,K., Rönnqvist,S., Salakoski,T., Van de
Peer,Y. and Ginter,F. (2012) Exploring biomolecular literature with
Evex: connecting genes through events, homology, and indirect
ACKNOWLEDGEMENTS associations. Adv. Bioinform., 2012, 582765.
13. Hakenberg,J., Plake,C., Leaman,R., Schroeder,M. and Gonzalez,G.
We thank all GlioPATH groups headed by Christiane (2008) Inter-species normalization of gene mentions with GNAT.
Opitz, Kathrin Thedieck and Saskia Trump and their teams Bioinformatics, 24, i126–i132.
14. Wermter,J., Tomanek,K. and Hahn,U. (2009) High-performance gene
for fruitful discussions. name normalization with GeNo. Bioinformatics, 25, 815–821.
15. Wei,C.-H., Kao,H.-Y. and Lu,Z. (2015) GNormPlus: an integrative
approach for tagging genes, gene families, and protein domains.
FUNDING BioMed Res. Int., 2015, 918710.
16. Zhou,H., Ning,S., Liu,Z., Lang,C., Liu,Z. and Lei,B. (2020)
S.S. acknowledges support from the BMBF e:Med initia- Knowledge-enhanced biomedical named entity recognition and
tive GlioPATH [01ZX1402]; institutional funding from the normalization: application to proteins and genes. BMC
Leibniz-HKI; E.F. and U.H. were supported by BMBF Bioinformatics, 21, 35.
17. Björne,J., Heimonen,J., Ginter,F., Airola,A., Pahikkala,T. and
within the SMITH project [01ZZ1803G]. Funding of the Salakoski,T. (2011) Extracting contextualized complex biological
open access charges was supported by IBM via three ‘Un- events with rich graph-based feature sets. Comput. Intell., 27,
structured Information Analytics Awards’ donated to U.H. 541–557.
between 2000 and 2008. 18. Buyko,E., Faessler,E., Wermter,J. and Hahn,U. (2011) Syntactic
Conflict of interest statement. None declared. simplification and semantic enrichment: trimming dependency graphs
for event extraction. Comput. Intell., 27, 610–644.
19. Bui,Q.-C., Campos,D., van Mulligen,E.M. and Kors,J.A. (2013) A
fast rule-based approach for biomedical event extraction. In: BioNLP
REFERENCES 2013 –– Proceedings of the BioNLP Shared Task 2013 Workshop @
1. Keshava Prasad,T.S., Goel,R., Kandasamy,K., Keerthikumar,S., ACL 2013. Sofia, Bulgaria, August 9, 2013. pp. 104–108.
Kumar,S., Mathivanan,S., Telikicherla,D., Raju,R., Shafreen,B., 20. Lever,J. and Jones,S.J.M. (2016) Verse: event and relation extraction
Venugopal,A. et al. (2009) Human protein reference database - 2009 in the BioNLP 2016 Shared Task. In: BioNLP 2016 –– Proceedings of
update. Nucleic Acids Res., 37, 767–772. the 4th BioNLP Shared Task Workshop @ ACL 2016. Berlin,
2. Orchard,S.E., Ammari,M., Aranda,B., Breuza,L., Briganti,L., Germany, 13 August 2016. pp. 42–49.
Broackes-Carter,F., Campbell,N.H., Chavali,G., Chen,C., del Toro,N. 21. Trieu,H.-L., Tran,T.T., Duong,K. N.A., Nguyen,A., Miwa,M. and
et al. (2014) The MIntAct project: IntAct as a common curation Ananiadou,S. (2020) DeepEventMine: end-to-end neural nested event
platform for 11 molecular interaction databases. Nucleic Acids Res., extraction from biomedical texts. Bioinformatics, 36, 4910–4917.
42, D358–D363. 22. Wei,C.H., Allot,A., Leaman,R. and Lu,Z. (2019) PubTator Central:
3. Oughtred,R., Rust,J., Chang,C., Breitkreutz,B.J., Stark,C., automated concept annotation for biomedical full text articles.
Willems,A., Boucher,L., Leung,G., Kolas,N., Zhang,F. et al. (2021) Nucleic Acids Res., 47, W587–W593.
The BioGRID database: a comprehensive biomedical resource of 23. Sayers,E.W., Bolton,E.E., Brister,J.R., Canese,K., Chan,J.,
curated protein, genetic, and chemical interactions. Protein Sci., 30, Comeau,D.C., Connor,R., Funk,K., Kelly,C., Kim,S. et al. (2022)
187–200. Database resources of the National Center for Biotechnology
4. Szklarczyk,D., Kirsch,R., Koutrouli,M., Nastou,K.C., Mehryary,F., Information. Nucleic Acids Res., 50, D20–D26.
Hachilif,R., Gable,A.L., Fang,T., Doncheva,N.T., Pyysalo,S. et al. 24. The Uniprot Consortium (2023) UniProt: the Universal Protein
(2023) The String database in 2023: protein-protein association Knowledgebase in 2023. Nucleic Acids Res., 51, 523–531.
networks and functional enrichment analyses for any sequenced 25. Bachman,J.A., Gyori,B.M. and Sorger,P.K. (2018) FamPlex: a
genome of interest. Nucleic Acids Res., 51, D638–D646. resource for entity recognition and relationship resolution of human
5. Szklarczyk,D., Gable,A.L., Nastou,K.C., Lyon,D., Kirsch,R., protein families and complexes in biomedical text mining. BMC
Pyysalo,S., Doncheva,N.T., Legeay,M., Fang,T., Bork,P. et al. (2021) Bioinformatics, 19, 248.
The String database in 2021: customizable protein-protein networks, 26. Seal,R.L., Braschi,B., Gray,K., Jones,T. E.M., Tweedie,S.,
and functional characterization of user-uploaded gene/measurement Haim-Vilmovsky,L. and Bruford,E.A. (2023) Genenames.org: the
sets. Nucleic Acids Res., 49, D605–D612. HGNC resources in 2023. Nucleic Acids Res., 51, D1003–D1009.
W242 Nucleic Acids Research, 2023, Vol. 51, Web Server issue

27. The Gene Ontology Consortium (2000) Gene Ontology: tool for the 32. Heberle,A.M., Razquin Navas,P., Langelaar-Makkinje,M.,
unification of biology. Nature Genetics, 25, 25–29. Kasack,K., Sadik,A., Faessler,E., Hahn,U., Marx-Stoelting,P.,
28. Zoran,T., Seelbinder,B., White,P., Price,J., Kraus,S., Kurzai,O., Opitz,C.A., Sers,C. et al. (2019) The PI3K and MAPK/p38 pathways
Linde,J., Häder,A., Loeffler,C. et al. (2022) Molecular profiling control stress granule assembly in a hierarchical manner. Life Sci.
reveals characteristic and decisive signatures in patients after Alliance, 2, e201800257.
allogeneic stem cell transplantation suffering from invasive 33. Reimann,L., Schwäble,A.N., Fricke,A.L., Mühlhäuser,W.W.D.,
pulmonary aspergillosis. J. Fungi, 8, 171. Leber,Y., Lohanadan,K., Puchinger,M.G., Schäuble,S., Faessler,E.
29. Weis,S., Carlos,A.R., Moita,M.R., Singh,S., Blankenhaus,B., et al. (2020) Phosphoproteomics identifies dual-site phosphorylation
Cardoso,S., Larsen,R., Rebelo,S., Schäuble,S., Del Barrio,L. et al. in an extended basophilic motif regulating FILIP1-mediated
(2017) Metabolic adaptation establishes disease tolerance to sepsis. degradation of filamin-C. Commun. Biol. [Nature], 3, 253.
Cell, 169, 1263–1275. 34. Sadik,A., Somarribas Patterson,L.F., Öztürk,S., Mohapatra,S.R.,
30. Hahn,U., Matthies,F., Faessler,E. and Hellrich,J. (2016) Panitz,V., Secker,P.F., Pfänder,P., Loth,S., Salem,H., Prentzell,M.T.
UIMA-based JCoRe 2.0 goes GitHub and Maven Central: et al. (2020) IL4|1 is a metabolic immune checkpoint that activates the
State-of-the-art software resource engineering and distribution of AHR and promotes tumor progression. Cell, 182, 1252–1270.
NLP pipelines. In: LREC 2016 –– Proceedings of the 10th 35. Thürmann,L., Klös,M., Mackowiak,S.D., Bieg,M., Bauer,T.,

Downloaded from https://academic.oup.com/nar/article/51/W1/W237/7177881 by guest on 04 February 2024


International Conference on Language Resources and Evaluation. Ishaque,N., Messingschlager,M., Herrmann,C., Röder,S., Bauer,M.
Portorož, Slovenia, 23-28 May 2016. pp. 2502–2509. et al. (2023) Global hypomethylation in childhood asthma identified
31. Dalle Pezze,P., Ruf,S., Sonntag,A.G., Langelaar-Makkinje,M., by genome-wide DNA-methylation sequencing preferentially affects
Hall,P., Heberle,A.M., Razquin Navas,P., van Eunen,K., Tölle,R.C., enhancer regions. Allergy, https://doi.org/10.1111/all.15658.
Schwarz,J.J. et al. (2016) A systems study reveals concurrent
activation of AMPK and mTOR by amino acids. Nat. Commun., 7,
13254.


C The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

You might also like