Professional Documents
Culture Documents
ELIXIR: Providing A Sustainable Infrastructure For Life Science Data at European Scale
ELIXIR: Providing A Sustainable Infrastructure For Life Science Data at European Scale
ELIXIR: Providing A Sustainable Infrastructure For Life Science Data at European Scale
doi: 10.1093/bioinformatics/btab481
Advance Access Publication Date: 27 June 2021
Letter to the Editor
Systems biology
ELIXIR: providing a sustainable infrastructure for life
science data at European scale
Received on December 17, 2020; revised on February 19, 2021; editorial decision on March 1, 2021; accepted on June 25, 2021
In 2012, the Digital Universe Study reported that ‘less than 1% of Building a stable and sustainable infrastructure
the World’s data is analyzed and less than 20% is protected’ and
recommended that management plans were needed to harness the for biological information across Europe
potential within these data streams (Gantz and Reinsel, 2012). Life ELIXIR is now a mature intergovernmental organization that has
science is a data science that is dependent on the generation, sharing grown from 6 members in 2014 to 22 members and one observer in
and integrative analysis of vast quantities of digital data. However, 2020. ELIXIR links more than 220 institutes and over 700 national
these data are complex and fragmented, creating a significant barrier experts dedicated to the development and operation of national
to their integration and reuse. Data are generated for different re- services that contribute to data access, integration, training and ana-
search purposes at thousands of facilities across the world, with di- lysis for the research community (see Fig. 1).
verse formats, annotations, methodologies and metadata standards. Fundamentally, ELIXIR is connecting national bioinformatic
Interpreting this data is difficult and often involves integrating very networks, known within ELIXIR as Nodes (https://elixir-europe.
large datasets from multiple sources. Certain data types such as sen- org/about-us/who-we-are/nodes). Some ELIXIR Nodes predate
sitive human data archived under controlled access require protec- ELIXIR, such as EMBL-EBI and the Swiss Institute of
tion, so data federation and sharing are a challenge in this context Bioinformatics. In many other countries, ELIXIR has contributed to
(Saunders et al., 2019). the formation of new national bioinformatics infrastructures, e.g.
Since its official launch in December 2013, ELIXIR (https://
de.NBI (https://www.denbi.de; ELIXIR Germany) and ELIXIR
elixir-europe.org/), the European infrastructure for life science data,
Czech Republic (https://www.elixir-czech.cz/), both launched in
has worked to address these challenges by bringing Europe’s nation-
2015. The participating countries contribute membership fees, pro-
al centers and core bioinformatics resources into a single, coordi-
portional to the countries Net National Income (NNI), provide
nated infrastructure. Many European countries have long-
ELIXIR with a technical budget that is used to fund development of
established national bioinformatics infrastructures that provide
shared services that support and link the nationally funded bioinfor-
tools, data resources and cloud services to national and international
users. ELIXIR invests in and aims to sustain these vital resources matics resources.
and ensure a level of interoperability that facilitates scientific discov- The Nodes guide the direction of ELIXIR in alignment with their
ery. Indeed, for research to thrive in a world of data abundance, all own national priorities, as has been described, e.g. by ELIXIR
components (research data, analysis tools, standards as well as com- Netherlands (Eijssen et al., 2015), ELIXIR Switzerland (Baillie
putational resources and training materials) must be FAIR—find- Gerritsen et al., 2019) and ELIXIR Norway (Tekle et al., 2018).
able, accessible, interoperable and reusable (Wilkinson et al., 2016). ELIXIR aims to ensure that all life science projects, irrespective of
ELIXIR supports European life scientists in making data FAIR, their membership status with respect to ELIXIR, have access to
ensuring that the benefits extend far beyond those actively partici- advanced, scalable and long-term sustainable infrastructure across
pating in ELIXIR, and, additionally, encourages all funders to adopt Europe. In this expanding interdisciplinary field, training of new
Open Science mandates. researchers and infrastructure providers is key for success and
Here, we describe how ELIXIR has evolved to be an established ELIXIR runs a vibrant training network. For example, between
research infrastructure, with a review of significant milestones September 2015 and March 2019, ELIXIR was able to register over
achieved, and a look into the future. By linking national networks 1200 training materials and train more than 19 000 people.
that span most of Europe’s major research centers ELIXIR is in a Capacity building of this magnitude depends upon the coordinated
unique position to drive the transformation of life science’s distrib- reuse of skills, resources and materials.
uted data resources within Europe to be sustainable, federated, Partnerships between industry and academia are increasingly im-
standards-based and cost-effective. portant if the societal impact of scientific research is to be fully
realized. For instance, access to open data and software has been Established and driven by researchers and bioinformaticians in
shown to be of long-term value to the bioeconomy (Bousfield et al., the Nodes, the ELIXIR Communities identify the needs of domain-
2016; Rothe et al., 2019). Industrial users of ELIXIR’s services or technology-specific research around a specific theme, e.g.
range from large multinationals to micro-Small and Medium Metabolomics, Microbial Biotechnology. ELIXIR now has a total of
Enterprises (SMEs), and include the pharmaceutical, healthcare, eleven Communities (see Fig. 2) which is a significant progression
food, agriculture and blue biotech sectors. ELIXIR has shown that from the four use cases (Federated Human Data, Marine
many SMEs extensively rely on life science data from public resour- Metagenomics, Plant Sciences and Rare Disease) that were sup-
ces (Garcia et al., 2018). ELIXIR aims to build new partnerships ported by the initial Horizon 2020 ELIXIR-EXCELERATE grant.
and strengthen existing collaborations (Lauer et al., 2019) through Harrow et al. (2021) describe how ELIXIR can support the research
its innovation and SME Forum, which has delivered 10 events over communities in their effective use of life science data, as exemplified
the first program, covering topics as diverse as Genomics and by the EXCELERATE use cases.
Health and Data-driven innovation in the agri-food industries. To ensure engagement between Nodes, Platforms and
Communities, ELIXIR conducts Implementation Studies (https://
elixir-europe.org/about-us/implementation-studies) that develop
Engagement through Platforms and services and bring benefit to key user communities (see e.g. Fig. 2).
Communities To date, more than 50 Implementation Studies, generally lasting be-
tween 6 months and 2 years, have been initiated. These are deliber-
ELIXIR facilitates collaboration between its member institutes and ately of short duration, timely, focused on bottlenecks highlighted
researchers with two intersecting organizational groupings, the by a Community or Platform, and their completion generally pro-
Platforms (https://elixir-europe.org/platforms) and the Communities vides analysis components, workflows or validations. They are
(https://elixir-europe.org/communities). The ELIXIR Platforms
selected based on high scientific or technical merit and broad applic-
(Data, Compute, Interoperability, Tools and Training) are sup-
ability, typically addressing different stages of the data life cycle, e.g.
ported by Technical Coordinators located at the ELIXIR
secure data deposition, distributed data annotation or advanced
Headquarters, with vision and strategy led by senior scientists in the
data reuse. They drive development of the service portfolio and gen-
Nodes. As part of joining ELIXIR, each Node contributes a set of
erate opportunities for building bundles of services bespoke for par-
services (https://elixir-europe.org/services) that are aligned with the
ticular communities.
Platforms, e.g. Human Protein Atlas (Uhlén et al., 2015) (Data
Platform—ELIXIR Sweden), CSC Cloud (https://research.csc.fi/)
(Compute Platform—ELIXIR Finland), Identifiers.org (Juty et al.,
2012) (Interoperability Platform—EMBL-EBI), CAMEO (Haas
ELIXIR enables progress within life sciences
et al., 2018) (Tools Platform—ELIXIR Switzerland) and TeSS ELIXIR Communities and Platforms illustrate how ELIXIR fosters
Training Portal (Beard et al., 2020) (Training Platform—ELIXIR long-term collaborations that connect national resources into trans-
UK). The Node-contributed services form the basis for ELIXIR de- national infrastructure. The federated data infrastructure is held to-
velopment work. gether by strong community standards, with users and data
2508 J.Harrow et al.
COMMUNITIES
Integration of
ELIXIR Biocontainers
Intrinsically
registries Metabolomics
Disordered
Proteins
Tools
Data Compute
Microbial 3D BioInfo
PLATFORMS
Biotechnology
Bioschemas
Training
Marine Proteomics
Metagenomics Learning
Common Workflow paths
Language
ELIXIR Beacons
Fig. 2. ELIXIR’s life science research Communities and Platforms are linked by Implementation Studies
providers agreeing on conventions for annotating, depositing, find- METdb provide important new reference data sources for interpretation
ing and accessing data. ELIXIR’s distributed organization with a sci- and species assignment within the wider marine metagenomics commu-
entific leadership that represents 23 national bioinformatics Nodes nity (Robertsen et al., 2017).
is a powerful vehicle to consult and build consensus on joint stand- A software ‘container’ packages code with all its dependencies,
ards and drive the implementation in national organizations and enabling the application to run quickly and reliably across a range
communities. of computing environments, facilitating reproducible research. The
For instance, the ELIXIR Plant Sciences Community has built a BioContainers Implementation Study is an example of an initiative
common technical infrastructure and associated social practices to driven by the ELIXIR Tools Platform to provide a stable infrastruc-
support plant genotype-phenotype analysis with the goal to make ture for unifying software containerized solutions within ELIXIR.
plant genotypic and phenotypic data easier to find, integrate and This infrastructure provides an access point for end-users to find,
analyze, by making them FAIR (Pommier et al., 2019). The generate, store, monitor and even benchmark software containers.
Community developed and proposed an extension of the Minimal
This has resulted in the development of a registry for containers,
Information about Plant Phenotyping Experiments (MIAPPE) v1.0
BioContainers (da Veiga Leprevost et al., 2017), which allows
specification (Krajewski et al., 2015) and is recognized as a signifi-
researchers open access to 7.9 K containerized tools.
cant player in the global plant community and is working on
The Bioschemas convention, set up in 2015 to improve FAIRness
MIAPPE with EMPHASIS (https://emphasis.plant-phenotyping.eu/),
of data, is driven by the ELIXIR Interoperability Platform as an
the ESFRI for European Plant Phenotyping, and the CGIAR (https://
www.cgiar.org/), the world’s largest global agricultural research open international community initiative. Bioschemas improves the
organization. findability of data in the life science resources by extending a
It is particularly beneficial in rapidly developing fields to bring to- Schema.org markup to specific biological data types. This markup
gether leading experts across national centers to collaboratively develop can be introduced into websites, without the need for specific tech-
infrastructure and standards. ELIXIR’s Marine Metagenomics nical expertise, such that they become readily indexable by search
Community has addressed a lack of accessible databases by providing engines and other services (Profiti et al., 2018). ELIXIR continues to
curated reference data that, together with benchmarking datasets and encourage the development of Bioschemas through its yearly
standardization of computational pipelines, provides the necessary in- Biohackathon (https://www.biohackathon-europe.org/), which
frastructure to enhance both academic and industrial research and de- brings developers together from different Platforms and
velopment. New databases such as MarRef, MarDB, MarCat and Communities to collaborate on preselected projects.
ELIXIR 2509
ELIXIR also funds strategic infrastructure projects such as AAI, funding and life cycle management strategy that secures the long-
the Authorization and Authentication Infrastructure (https://elixir- term sustainability of those resources’. Defined according to a proto-
europe.org/services/compute/aai). AAI facilitates data access that col based on 23 indicators described in the article ‘Identifying
requires specific authorization on account of its sensitive nature, e.g. ELIXIR Core Data Resources’ (Durinx et al., 2017), the ELIXIR
human genomic data. Before data access authorization and delivery Core Data Resources (https://elixir-europe.org/platforms/data/core-
can be carried out, a researcher needs to be identified and their iden- data-resources) are a set of European data resources of fundamental
tity authenticated to a sufficient level of assurance. AAI provides importance to the wider life science community and the long-term
this function. ELIXIR AAI is still in development, but already has preservation of biological data.
603 enabled identity providers and 2470 registered users organized Having defined this set, ELIXIR has conducted an analysis to
in 423 groups, with 59 production services connected to it. demonstrate scale of content and usage, open data and FAIR data
leadership, impact, integration, interdependencies and precarious
assured funding for these foundational data resources (Drysdale
repositories and support for good development practice will drive ELIXIR is coordinating the EOSC-Life project (http://www.eosc-
data reproducibility and provenance, which are of high importance life.eu/), where the 13 ESFRI life sciences Infrastructures in Europe
in both research and clinical practices. The ELIXIR Implementation (https://www.esfri.eu/) join forces to create an open collaborative
Study Deploying Reproducible Containers and Workflows Across digital space for life science in the EOSC. The project aims to pub-
Cloud Environments (https://elixir-europe.org/about-us/implementa lish data from the life sciences Infrastructure facilities and centers as
tion-studies/containers-workflow-cloud), addresses the need for a FAIR Data Resources in the cloud, link reusable Tools and
stable infrastructure for unifying software container solutions within Workflows to standardized compute services in national life science
ELIXIR. This infrastructure will provide an access point for end- clouds, connect all users across Europe to a single login AAI system,
users to find, generate, store, monitor and even benchmark software and to develop joint data policies to preserve and deepen the trust
container solutions. Such studies drive collaboration: the technical given by research participants and patients volunteering their data
experts in ELIXIR’s Compute and Tools Platforms will work with and samples.
developers and researchers in the Galaxy, Metabolomics and ELIXIR has well-established collaborations with a number of
Bousfield,D. et al. (2016) Patterns of database citation in articles and patents Lauer,K. et al. (2019) Supporting innovation in bioinformatics: how ELIXIR
indicate long-term scientific and industry value of biological data resources. connects public bioinformatics infrastructure with industry.
F1000Research, 5, 160. F1000Research, 8, 1136.
Drysdale,R. et al. (2020) The ELIXIR Core Data Resources: fundamental in- Martin,C. et al. (2019) ELIXIR’s long-term sustainability plan (2019).
frastructure for the life sciences. Bioinformatics, 36, 2636–2642. F1000Research, 8, 1642.
Dunn,M.C. and Bourne,P.E. (2017) Building the biomedical data science Phillips,M. (2018) International data-sharing norms: from the OECD to the
workforce. PLoS Biol., 15, e2003082. General Data Protection Regulation (GDPR). Hum. Genet., 137, 575–582.
Durinx,C. et al. (2017) Identifying ELIXIR Core Data Resources. Pommier,C. et al. (2019) Applying FAIR principles to plant phenotypic data
F1000Research, 5, 2422. management in GnpIS. Plant Phenomics, 2019, 1671403.
Eijssen,L. et al. (2015) The Dutch Techcentre for Life Sciences: enabling Profiti,G. et al. (2018) Using community events to increase quality and adop-
data-intensive life science research in the Netherlands. F1000Research, 4, tion of standards: the case of Bioschemas. F1000Research, 7, 1696.
33. Robertsen,E.M. et al. (2017) ELIXIR pilot action: marine metagenomics to-
Fiume,M. et al. (2019) Federated discovery and sharing of genomic data using wards a domain specific set of sustainable services. F1000Research, 6, 70.