Neuroscience Data in The Cloud: Opportunities and Challenges: Proceedings of A Workshop 1st Edition National Academies of Sciences
Neuroscience Data
in the Cloud
Opportunities and Challenges
PROCEEDINGS OF A WORKSHOP
This activity was supported by contracts between the National Academy of Sciences and the
Alzheimer’s Association; Cohen Veterans Bioscience; Department of Health and Human
Services’ Food and Drug Administration (5R13FD005362-05) and National Institutes of
Health (NIH) (75N98019F00769 [Under Master Base HHSN263201800029I]) through the
National Center for Complementary and Integrative Health, National Eye Institute, National
Institute of Mental Health, National Institute of Neurological Disorders and Stroke, National
Institute on Aging, National Institute on Alcohol Abuse and Alcoholism, National Institute on
Drug Abuse, and NIH Blueprint for Neuroscience Research; Department of Veterans Affairs
(VA240-14-C-0057); Eisai Inc.; Eli Lilly and Company; Foundation for the National Institutes
of Health; Gatsby Charitable Foundation; Janssen Research & Development, LLC; The Kavli
Foundation; Lundbeck Research USA; Merck Research Laboratories; The Michael J. Fox
Foundation for Parkinson’s Research; National Multiple Sclerosis Society; National Science
Foundation (BCS-1064270); One Mind; Sanofi; Society for Neuroscience; Takeda
Pharmaceuticals International, Inc.; The University of Rhode Island; and Wellcome Trust.
Any opinions, findings, conclusions, or recommendations expressed in this publication do
not necessarily reflect the views of any organization or agency that provided support for the
project.
Additional copies of this publication are available from the National Academies Press, 500
Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313;
http://www.nap.edu.
__________________
1 The National Academies of Sciences, Engineering, and Medicine’s planning committees
are solely responsible for organizing the workshop, identifying topics, and choosing
speakers. The responsibility for the published Proceedings of a Workshop rests with the
workshop rapporteurs and the institution.
FORUM ON NEUROSCIENCE AND NERVOUS
SYSTEM DISORDERS1
PART 1
CLOUD-BASED TECHNOLOGIES FOR NEUROSCIENCE
RESEARCH: CHALLENGES AND POTENTIAL SOLUTIONS
PART 2
DIFFERENT TYPES OF NEUROSCIENCE DATA: CHALLENGES
AND POTENTIAL OPPORTUNITIES
8 GENETIC DATA
Current Promising Practices for Managing Genetic Data in the
Cloud
Issues to Be Resolved Regarding Genetic Data in the Cloud
9 NEUROIMAGING DATA
Current Promising Practices for Neuroimaging Data in the Cloud
Issues to Be Resolved to Advance Cloud-Based Neuroimaging Data
Resources
10 REAL-WORLD DATA
Current Promising Practices for Managing Real-World Data in the
Cloud
Issues to Be Resolved to Incorporate Real-World Data into Clinical
Studies
11 FUTURE DIRECTIONS
Technology and Methods: Progress and Challenges
Training the Next Generation of Scientists
Funding: Current Commitments and Future Needs
Potential Next Steps: Working Groups to Move the Field Forward
APPENDIXES
A References
B Workshop Agenda
C Registered In-Person Attendees
1
Thirty years ago, the National Institute of Mental Health (NIMH), the
National Institute on Drug Abuse (NIDA), and the National Science
Foundation (NSF) commissioned the Institute of Medicine (IOM) to
consider the future of digital and networked neuroscience, recalled
Michael Huerta, associate director of the National Library of Medicine
(NLM). It is fitting that 30 years later, a group reconvened at the
National Academies of Sciences, Engineering, and Medicine (the
National Academies), which incorporates the former IOM, to explore the
burgeoning use of cloud computing in neuroscience, said Huerta. On
September 24, 2019, the National Academies’ Forum on Neuroscience
and Nervous System Disorders hosted a workshop on neuroscience data
in the cloud, co-chaired by Huerta and Deanna Barch, chair of the
Department of Psychological and Brain Sciences at Washington
University in St. Louis.2 Box 1-1 provides definitions for some of the
core concepts related to cloud computing discussed throughout the
workshop. The intention of the workshop, said Barch, was to focus on
maximizing the benefits that can be realized from neuroscience data.
BOX 1-1
Definition of Cloud Computing and Select
Related Concepts
Cloud computing: As defined by the National Institute of
Standards and Technology, cloud computing “is a model for enabling ubiquitous,
convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned
and released with minimal management effort or service provider
interaction” (Mell and Grance, 2011).
__________________
a From Nature Research, available at https://www.nature.com/subjects/data-integration
(accessed January 16, 2020).
b From the Princeton University Center for Data Analytics & Reporting, available at
https://cedar.princeton.edu/understanding-data/what-data-model (accessed January 17,
2020).
c From the National Institutes of Health Strategic Plan for Data Science, available at
https://datascience.nih.gov/strategicplan (accessed January 16, 2020).
WORKSHOP OBJECTIVES
The workshop brought together a broad range of stakeholders
involved in cloud-based neuroscience initiatives and research to explore
the use of cloud technology to advance neuroscience research and
share approaches to address current barriers. These stakeholders
represented academia, government, foundations, the pharmaceutical
and information technology industries, and the legal system. They were
tasked not only with identifying challenges, but also with suggesting
solutions and best practices that can help optimize the utility and
increase the efficiency of cloud-based neuroscience initiatives, support
ongoing efforts, and share information about the work of others, said
Barch.
In addition to cloud-specific issues, the workshop covered a number
of topics related to encouraging data sharing and open science, which
are integrally relevant for, but not specific to, cloud-based platforms.
Many discussions at the workshop covered issues, such as privacy
protection, that are common across many types of data, not just
neuroscience. The workshop provided a venue for members of the
neuroscience community to come together to discuss approaches for
tackling these common challenges, as well as challenges that are
specific to neuroscience data and the cloud-based platforms that are
dedicated to neuroscience or are frequently used by this community.
Box 1-2 provides the workshop Statement of Task.
ORGANIZATION OF PROCEEDINGS
These proceedings reflect the organization of the meeting. Chapter 2
summarizes talks about the landscape of cloud-based technologies for
neuroscience research. Two sets of breakout sessions are summarized
in Parts 1 and 2, which organize issues by content area and types of
data, respectively. In Part 1, Chapter 3 covers issues related to the
protection of privacy; Chapter 4 addresses data management and
interoperability issues; Chapter 5 examines issues related to assignment
of credit and data ownership; and Chapter 6 discusses platform
governance. In Part 2, challenges related to different types of
neuroscience data are examined: Chapter 7, clinical trial and research
data; Chapter 8, genetic data; Chapter 9, neuroimaging data; and
Chapter 10, real-world data. Chapter 11 concludes with a discussion of
future directions, including identifying tangible next steps and promising
areas for future action.
BOX 1-2
Statement of Task
Review the landscape of major neuroscience cloud-based
initiatives and other uses of cloud technology within
neuroscience research.
Discuss aspirational goals for maximizing benefit from data
and compute in the cloud by empowering broad and
meaningful data sharing and fostering open science.
Consider best practices and policies that would increase
efficiencies within and across cloud resources, including
aspects such as:
Consent and data use agreements
Authorization for and accessibility to a variety of data
types by a variety of users
Protection of privacy
Assignment of credit, ownership, and licensing
Technical issues
Researcher support and training
Explore potential next steps to move the field forward to
develop and deploy best practices in the service of achieving
aspirational goals.
__________________
1 The planning committee’s role was limited to planning the workshop, and the Proceedings
of a Workshop was prepared by the workshop rapporteurs as a factual summary of what
occurred at the workshop. Statements, recommendations, and opinions expressed are those of
individual presenters and participants; have not been endorsed or verified by the Health and
Medicine Division (HMD) of the National Academies of Sciences, Engineering, and Medicine; and
should not be construed as reflecting any group consensus.
2 For further information about the workshop, including slides presented by speakers, see
http://www.nas.edu/NeuroForum (accessed January 17, 2020).
2
Highlightsa
The International Neuroscience Coordinating Facility
coordinates the development and endorsement of findable,
accessible, interoperable, and reusable (FAIR) community
standards and best practices (Martone).
The cloud alone is not a replacement for depositing data into
a proper archive. To maximally impact neuroscience research,
data need to be maintained in a manner that makes them
FAIR (Martone).
To enable data sharing, data standards are needed at all levels
of datasets and become more specialized according to user
needs (Martone).
The platform OpenNeuro enables free and open sharing of
multiple data types using a community standard called the
Brain Imaging Data Structure (BIDS) (Poldrack).
BIDS apps—containerized applications of diverse
neuroimaging software packages—allow users to analyze data
easily and reproducibly, which provides users with additional
benefits and incentivizes adherence to the BIDS standards
(Poldrack).
The Science and Technology Research Infrastructure for
Discovery, Experimentation, and Sustainability (STRIDES)
Initiative is designed to make it easier for researchers to use
cloud-based services such as Google Cloud and Amazon Web
Services (Weber).
STRIDES has facilitated the building of several large and high-
value datasets managed by the National Institutes of Health
and enabled the transfer of more than 30 petabytes of data to
the cloud, where they are accessible to the research
community (Weber).
__________________
a These points were made by the individual workshop participants identified above.
They are not intended to reflect a consensus among workshop participants.
INTERNATIONAL NEUROSCIENCE
COORDINATING FACILITY
Neuroscience is never going to be served by a single platform or a
single infrastructure because there are too many different types of data
and too much technological flux, Martone said, and the cloud alone
should not be seen as a replacement for contributing data to a proper
data repository or archive. To seriously impact the field and move
neuroscience forward, efforts are needed to ensure that data are
maintained in a manner that makes them accessible to both humans
and machines, she said. INCF1 was initially established to create global
infrastructure and data standards with a goal of facilitating organization
and usability of neuroscience data. INCF has grown into a membership
organization with members from 18 countries across 4 different
continents. It aims to coordinate the development and endorsement of
open and FAIR—findable, accessible, interoperable, and reusable—
community standards and best practices that will enable data to be
shared in a maximally useful way for both humans and machines. INCF
also focuses on developing and providing training and educational
resources, and serves as an interface among international large-scale
brain projects, said Martone.
Martone noted that developing and implementing FAIR standards
requires a partnership among researchers, repositories, indexers, and
aggregators. Moreover, for any given dataset there may be dozens of
standards and best practices that need to be brought together in a way
that can be navigable by a range of users. Martone illustrated how
these various standards relate to one another using what she calls the
FAIR onion, shown in Figure 2-2. A host of organizations, societies, and
others act as convening authorities to bring experts together to
establish standards at the different layers of the onion. At the outer
layer of the onion where data are more specialized, problems can only
be solved by the neuroscientists generating those data rather than by
general organizations, said Martone. Organizations like INCF play a
critical role in bringing these researchers together at the outer layers of
the onion, she said.
FIGURE 2-1 An exciting time for global neuroscience. Global neuroscience projects such
as those pictured here have been enabled by the cloud.
SOURCE: Presented by Maryann Martone, September 24, 2019.
OPENNEURO
OpenNeuro2 is a platform that enables free and open sharing of
magnetic resonance imaging (MRI), magnetoencephalography (MEG),
electroencephalography (EEG), invasive EEG (iEEG), and
electrocorticography (ECoG) data. According to Russell Poldrack,
director of the Stanford Center for Reproducible Neuroscience,
OpenNeuro was built on an early project called OpenfMRI, a resource
that was developed to enable open sharing of data from task-based
functional MRI (fMRI) studies (Poldrack et al., 2013). In creating
OpenfMRI, Poldrack and colleagues developed a data organization
scheme that was specific to the type of data that would be submitted
and that would allow automatic analysis of these data. There was no
way to validate a dataset other than to run it through the pipeline. If
the pipeline crashed, manual curators at Stanford had to figure out
what went wrong. The process was very labor intensive, said Poldrack.
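The validation problem described above, where a malformed dataset was only discovered when the pipeline crashed, is the kind of gap that a standard naming convention such as BIDS makes checkable before any analysis runs. The following is a minimal sketch of that idea, not the official BIDS validator: the pattern covers only a simplified subset of functional MRI file names, and the function name `check_bold_names` and the example file names are hypothetical.

```python
import re

# Simplified, illustrative pattern for BIDS-style functional MRI file names,
# e.g. sub-01_task-rest_bold.nii.gz or sub-02_ses-01_task-nback_run-2_bold.nii.
# Assumption: only the sub, ses, task, and run entities are handled here;
# the full specification defines more entities and file types.
BOLD_NAME = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"(_ses-(?P<ses>[A-Za-z0-9]+))?"
    r"_task-(?P<task>[A-Za-z0-9]+)"
    r"(_run-(?P<run>[0-9]+))?"
    r"_bold\.nii(\.gz)?$"
)


def check_bold_names(filenames):
    """Return the file names that do not match the simplified pattern."""
    return [name for name in filenames if BOLD_NAME.match(name) is None]


if __name__ == "__main__":
    # Hypothetical file names used only for illustration.
    example = [
        "sub-01_task-rest_bold.nii.gz",
        "subject1_rest.nii",
        "sub-02_ses-01_task-nback_run-2_bold.nii",
    ]
    for bad in check_bold_names(example):
        print(f"does not follow the naming convention: {bad}")
```

A check of this kind can run at upload time, so that naming problems are reported to the submitter rather than surfacing later as a pipeline failure that a curator has to diagnose by hand.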
FIGURE 2-2 The FAIR onion. Standards are needed at all levels of datasets, as
illustrated by the FAIR onion. At the core, standards for basic data descriptors are
needed. As data become more complex and specialized, additional layers of standards are
required: first, standardized community vocabularies and data types; followed by domain-
specific vocabularies, minimal information models, and common data elements. At the
outer layers of the onion, specialized vocabularies and information models as well as
customized standards and formats are needed for specific applications.
NOTE: CDE = common data elements; FAIR = findable, accessible, interoperable,
reusable.
SOURCE: Presented by Maryann Martone, September 24, 2019.
STRIDES
The NIH Center for Information Technology has also made a major
commitment to adopting and developing best practices related to cloud
technologies as a means of supporting the research community, said
Nick Weber, program manager for Cloud Services at the NIH Center for
Information Technology. The Science and Technology Research
Infrastructure for Discovery, Experimentation, and Sustainability
(STRIDES) Initiative, launched in July 2018, now has partnerships with
Google Cloud and Amazon Web Services (AWS) and is working on
additional partnerships with other commercial providers, said Weber.
STRIDES aims to make it easier for researchers to use these services,
access data, and employ the latest tools and technologies while
protecting the security and privacy of data, he said. Other important
elements of the STRIDES Initiative include training for the full range
of users, including technical staff, bench researchers, data scientists,
and informaticians, and gathering data on data usage to provide insight
into sustainability and inform funding decisions, said Weber.
STRIDES has facilitated building the operational environment for
several large and high-value NIH-managed datasets such as those
generated by Common Fund programs, the Trans-Omics for Precision
Medicine (TOPMed) program sponsored by the National Heart, Lung,
and Blood Institute (NHLBI), and the Accelerating Medicines
Partnership-Parkinson’s Disease (AMP-PD) program, said Weber.
Already, STRIDES investments have provided benefits to these research
programs in terms of cost savings and improved access to professional
services and enterprise support, he said. He added that STRIDES has
enabled the transfer of more than 30 petabytes of data into the cloud,
making them more widely accessible to the research community. Ultimately,
Weber predicts that STRIDES will facilitate improved interconnections
among datasets that otherwise would not have been connected. To
achieve this, he said, STRIDES has initiated efforts to make sure
funding agencies and partners understand how to leverage STRIDES
resources, for example, by including information about STRIDES in
funding opportunity announcements.
__________________
1 For more information, see https://www.incf.org (accessed November 12, 2019).
2 For more information, see https://openneuro.org (accessed November 12, 2019).
Part 1
Highlightsa
A complicated web of laws, including the General Data
Protection Regulation in Europe and the Health Insurance
Portability and Accountability Act and the Common Rule in the
United States, regulates privacy and security in research and
complicates efforts to share data across geographical
boundaries (Rosati).
Regulations regarding whether data can be shared are in
constant flux and upcoming changes to the Common Rule are
likely to cause confusion about sharing genomic information
(Rosati).
Different frameworks and models are required to protect
patient privacy depending on the design of a study, its
governance model, and its infrastructure (Hanson, Mackay).
Federated data-sharing platforms protect data by bringing
analytical tools to the data rather than permitting the data
to be downloaded for analysis, which allows analyses of
multiple datasets without violating restrictions on the transfer
of personal information mandated by privacy laws (Rosati).
New approaches to informed consent are needed to enable
greater data sharing and collaboration (Barch, Haas, Rosati).
__________________
a These points were made by the individual workshop participants identified above.
They are not intended to reflect a consensus among workshop participants.
The UK Biobank, possibly the largest cohort study in the world with
500,000 individuals, multiple data types, and multiple institutions,
but a single principal investigator and single IRB.1
The Wellcome Center for Integrative Neuroimaging (WIN), a study
taking place at a single institution with multiple study types,
principal investigators, and IRBs.2
Dementias Platform UK (DPUK), a substudy of the Field platform
(which also includes Health Data Research UK, or HDRUK) that is
taking place at multiple institutions with multiple study types, IRBs,
consent, and principal investigators.3
The message Mackay delivered to the workshop is that one size does
not fit all. Each project has a different governance model and different
infrastructure designed to fit the types of data gathered, the study
participants, and the users of the data.
The UK Biobank, for example, is open by design, said Mackay.
Participants consent upon enrollment to having their data shared, not
only imaging data, but also a range of sensitive health and personal
information. Data access is governed through a data access committee;
researchers apply and pay an administrative fee for access, said
Mackay. She added that an important aspect of the project is that there
is no disclosure of any health information to the participants, which
limits the ability to recruit potential participants for future studies or for
participants to self-identify for future studies, since the health
information that would make them eligible for the study cannot be
disclosed.
FIGURE 3-1 Penn Med’s principles for access, use, and disclosure of patient information.
Penn Med is working to develop a framework for enabling data sharing while protecting
the privacy of patients.
NOTE: BAA = business associate agreement; HIPAA = Health Insurance Portability and
Accountability Act; IS = information services; PM = Perelman School of Medicine at the
University of Pennsylvania; SSN = Social Security number; TCPA = Telephone Consumer
Protection Act.
SOURCE: Presented by William Hanson, September 24, 2019.
__________________
1 For more information, see https://www.ukbiobank.ac.uk (accessed November 10, 2019).
2 For more information, see https://www.ndcn.ox.ac.uk/divisions/fmrib (accessed November
10, 2019).
3 For more information, see https://www.dementiasplatform.uk (accessed November 10,
2019).
4 For more information, see https://www.fda.gov/safety/fdas-sentinel-initiative (accessed
November 10, 2019).
4
Highlightsa
Developing interoperability mechanisms that enable data
platforms to talk to each other is critical, whether or not
these platforms exist in the cloud (Evans).
The National Library of Medicine is working to accelerate
the promotion and adoption of Fast Healthcare
Interoperability Resources standards to promote data
exchange across the National Institutes of Health (Huerta).
Although housing data in a single place could support more
rapid research progress, sometimes it may be more
practical to use federated models and store different levels
of data in different ways. For example, the Psychiatric
Genomics Consortium shares data on a compute cluster in
the Netherlands that, while not in the cloud, uses similar
data sharing and standardized processing approaches
(Neale).
Harmonized approaches, funding, and training are needed
to enable transforming data from a raw state to a
standardized format, which is costly and time consuming
(Huerta, Nalls, Ramoni, Snyder).
A common coordination frame would be needed to merge
different types of data in repositories and platforms
(Marcus).
__________________
a These points were made by the individual workshop participants identified above.
They are not intended to reflect a consensus among workshop participants.