XTENS - A JSON-Based Digital Repository for Biomedical Data Management

Conference Paper · April 2015


DOI: 10.1007/978-3-319-16480-9_13


XTENS - A JSON-Based Digital Repository
for Biomedical Data Management

Massimiliano Izzo1,2,*, Gabriele Arnulfo1,3, Maria Carla Piastra1, Valentina Tedone1, Luigi Varesio2, and Marco Massimo Fato1

1 Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genoa, Viale Causa 13, 16145 Genoa, Italy
2 Laboratory of Molecular Biology, Giannina Gaslini Institute, Largo Gaslini 5, 16147 Genoa, Italy
3 Neuroscience Center, P.O. Box 56, FI-00014 University of Helsinki, Finland
massimorgon@gmail.com

Abstract. Biomedical science poses unique challenges in data management. Heterogeneous information (clinical records, biological specimens, imaging and genomic data, each in different technology-associated formats) must be collected and integrated to provide a unified overview of each patient. International-scale research collaborations involve different disciplines (Medicine/Biology, Engineering/IT, Physics, ...). Extensive metadata is required to maximize information sharing among the partners. To tackle these issues, we have developed XTENS, a data repository built on a flexible and extensible JSON-based data model. The JSON data model is conceived to achieve maximal flexibility, to allow adaptive metadata management, and to treat metadata as a dynamic process of scientific communication rather than an enduring product fixed in time. XTENS is integrated with iRODS, a data grid software that allows distributed storage, metadata file annotation and advanced policies for data curation. We have adopted the platform for a multicentric functional connectomics project where heterogeneous data sources (radiological images, electroencephalography signals) must be integrated and analysed to compute connectivity maps of the brain. To this end, we have tested the repository prototype by allowing external programs to interact with XTENS through a service-oriented REST interface. We demonstrated the usefulness of XTENS by showing that we could input heterogeneous data, run the required processing tool and store the process output.

* Corresponding author.
F. Ortuño and I. Rojas (Eds.): IWBBIO 2015, Part II, LNCS 9044, pp. 123–130, 2015.
© Springer International Publishing Switzerland 2015

1 Background
Data management in biomedical science presents peculiar challenges. Research projects in the field are constantly moving towards international collaboration, with participants coming from a great variety of disciplines (medicine, biology, engineering/IT, etc.). In such a scenario, extensive metadata are required to improve the accessibility of the shared information, be it clinical records, the characteristics of biological specimens (i.e. samples), or the output of high-throughput analyses. Many data repositories have been proposed in the field. Some of them - such as SIMBioMS [1] and openBIS [2] - are more focused on molecular biology and genomics, and provide support for sample management; others - like XNAT [3], COINS [4] or LORIS [5] - are more oriented towards neuroscience and neuroimaging, and are equipped with tools for automatic metadata extraction from common radiological formats (DICOM/NIfTI) and for quality control of the uploaded images. A common trait of all these systems is that, in order to create novel data types - such as a hitherto unsupported genomic assay or a novel imaging format - the system must be reconfigured by an operator with sufficient informatics skills, able to modify SQL relations or XML schemas. As a consequence, shared data is often described with a fixed and limited set of metadata that is outside the control of the researchers themselves, which harms the quality and completeness of the shared information and, ultimately, the output of collaborative biomedical projects. Recent studies in the Neural [6] and Social Sciences [7] promote the view of metadata as a fluid, dynamic process rather than a fixed product; we therefore feel that a modern repository should comply with this requirement, providing adaptive metadata management and configuration tools to maximize information sharing and understanding in multidisciplinary, international collaborations.
Here we describe a novel implementation of XTENS, a data repository based on a JSON [8] metadata model, as a tool to address data management issues in the biomedical sciences and specifically in functional connectomics studies. XTENS was previously developed with Java technology and has been successfully used both in neuroscience [9] and in integrated biobanking management [10]. In the next sections, we describe the novel JSON-compliant XTENS architecture and its underlying data model. We then illustrate a use case scenario: a Stereo-ElectroEncephaloGraphy (SEEG) collaborative project in which our group provides support for data management and image/signal processing.

2 Results
2.1 The Repository

In the XTENS metadata model, each Data instance is characterized by its Data Type. A Data Type provides a hierarchical JSON schema that works as a ‘template’ for the Data instance. The schema is composed of (i) a header with the name, a brief description and the version of the Data Type, and (ii) a body that contains the full set of metadata properties (i.e. attributes), usually collected in metadata groups. Each property is characterized by a name, a primitive type (e.g. number, string, date, boolean), an optional ontology term, value options and units, a set of validation properties, and a sensitivity flag. Each Data instance contains a metadata field in JSON format, where all the properties defined in the Data Type schema are stored as name-value-unit triples. Subjects and Samples are treated as specialized versions of the Data instance class, containing additional fields, methods and relationships. This paradigm supports security, anonymisation of personal details, and dedicated biobanking management. The new XTENS implementation supports management of samples from different biobanks, and describes biobanks according to the MIABIS specifications of the BBMRI consortium [11]. An outline of the XTENS data model is shown in Figure 1. Figure 2 provides a visual example of the JSON schema. The XTENS source code is available on GitHub [12].

Fig. 1. XTENS data model. Each Data instance is characterized by its Data Type, and the Data
Type schema provides the structure to build up the metadata JSON object. Subject and Sample
are specialized classes of Data. Personal Details are managed in a separate class to allow easy
anonymisation and satisfy privacy requirements.
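
The role of the sensitivity flag can be illustrated with a short sketch. It is not taken from the XTENS code base: the schema below is a reduced, hypothetical Data Type, the field `patient_surname` and the field type `"Text"` are invented for the example, and the helper names `iter_fields` and `strip_sensitive` are assumptions. It only shows how a schema that marks properties as sensitive makes it possible to derive an anonymised view of a metadata object.

```python
from typing import Any, Dict, Iterator

# Hypothetical Data Type schema, reduced to the keys used in this sketch.
SCHEMA: Dict[str, Any] = {
    "header": {"schemaName": "DATA_EXAMPLE", "version": "1.0"},
    "body": [
        {
            "label": "METADATA_GROUP",
            "name": "FIRST_GROUP",
            "content": [
                {"name": "test_field1", "fieldType": "Float", "sensitive": False},
                # Invented field, flagged as sensitive for illustration only.
                {"name": "patient_surname", "fieldType": "Text", "sensitive": True},
            ],
        }
    ],
}

# Metadata of one Data instance, stored as name-value-unit triples.
METADATA: Dict[str, Dict[str, Any]] = {
    "test_field1": {"value": 50.7, "unit": "first unit"},
    "patient_surname": {"value": "Rossi"},
}


def iter_fields(schema: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every metadata field found in the groups of a schema body."""
    for group in schema["body"]:
        for field in group["content"]:
            yield field


def strip_sensitive(schema: Dict[str, Any],
                    metadata: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the metadata without the fields flagged as sensitive."""
    sensitive = {f["name"] for f in iter_fields(schema) if f.get("sensitive")}
    return {name: entry for name, entry in metadata.items() if name not in sensitive}


public_view = strip_sensitive(SCHEMA, METADATA)
print(public_view)
```

The original metadata object is left untouched, so the full record remains available to authorized users while the filtered view can be shared.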

The model has been implemented in the novel XTENS repository using a modular structure that separates the concerns of the client interface (i.e. front-end) from those of the back-end. The latter consists of a web application running on a Node.js server, a PostgreSQL database, and a distributed file system based on the iRODS data grid software [13]. The adoption of iRODS makes XTENS the first data management platform relying on distributed file storage. The web application is written in JavaScript to provide full compliance with the JSON metadata model, and exposes a RESTful interface that gives the XTENS front-end and external applications a common access specification. We have chosen PostgreSQL as the database management system because it supports JSON as a native type. The upcoming 9.4 version will introduce a binary JSON format (JSONB) that dramatically improves query speed on retrieval. iRODS is likewise equipped with a REST API (irods-rest) that allows direct and transparent upload/download from the client interface. When a new Data instance is created using the front-end web form, the user can upload one or more associated files. Uploaded files are temporarily stored in a ‘landing’ collection, and moved to their permanent location after the Data form is submitted. The association between the Data instance and the Data file location is then stored in the database. A dynamic query interface based on the Composite design pattern allows users to perform queries on every Data Type previously defined. Data Type schemas can be updated with new properties using the graphical interface; new and updated Data Types can be used immediately for create-retrieve-update-delete (CRUD) operations with no additional programming, compilation or restart steps. A user with administrative authorization is given full control to describe their data and experiments, without having to resort to an IT expert every time. New Data Types can be created with minimal effort, to improve data sharing in different scenarios and to promote the concept of metadata as a dynamic, ephemeral and fluid process.
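
A query interface built on the Composite design pattern can be sketched as follows. This is an illustration under stated assumptions, not the XTENS implementation: the class names `QueryNode`, `FieldCondition` and `Group` are hypothetical, the metadata is assumed to live in a PostgreSQL JSON column called `metadata`, and the `%s` placeholders follow the convention of DB-API drivers such as psycopg2. Leaves are single conditions on a metadata field; groups combine any mix of leaves and nested groups with AND/OR, and both expose the same `to_sql` method.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

SqlFragment = Tuple[str, List[Any]]
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


class QueryNode:
    """Common interface shared by leaves and composites."""

    def to_sql(self) -> SqlFragment:
        raise NotImplementedError


@dataclass
class FieldCondition(QueryNode):
    """Leaf: one comparison on the 'value' of a metadata field."""
    field_name: str
    operator: str
    value: Any

    def to_sql(self) -> SqlFragment:
        if self.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {self.operator}")
        # ->> extracts the JSON value as text; numbers need an explicit cast.
        expr = "(metadata->%s->>'value')"
        if isinstance(self.value, (int, float)) and not isinstance(self.value, bool):
            expr += "::float"
        return f"{expr} {self.operator} %s", [self.field_name, self.value]


@dataclass
class Group(QueryNode):
    """Composite: children joined by AND or OR; children may be groups too."""
    junction: str
    children: List[QueryNode]

    def to_sql(self) -> SqlFragment:
        if self.junction not in ("AND", "OR"):
            raise ValueError(f"unsupported junction: {self.junction}")
        if not self.children:
            raise ValueError("a group needs at least one child")
        parts: List[str] = []
        params: List[Any] = []
        for child in self.children:
            sql, child_params = child.to_sql()
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {self.junction} ".join(parts) + ")", params


query = Group("AND", [
    FieldCondition("test_field1", ">=", 10.0),
    Group("OR", [
        FieldCondition("test_field3", "=", "some text"),
        FieldCondition("test_field2", "<", 100),
    ]),
])
where_clause, parameters = query.to_sql()
print(where_clause)
print(parameters)
```

Because every node only knows how to render itself, a form that lets users nest conditions arbitrarily maps directly onto this tree, and new Data Types require no new query code.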

schema (first column)

    {
      "header": {
        "schemaName": "DATA_EXAMPLE",
        "description": "DESCRIPTION",
        "version": "1.0",
        "classTemplate": "GENERIC",
        "fileUpload": true
      },
      "body": [
        {
          "label": "METADATA_GROUP",
          "name": "FIRST_GROUP",
          "iri": "",
          "content": [
            {...},
            {...
              "name": "test_field2"
            ...},
            {...}
          ]
        }
      ]
    }

Metadata Field (second column)

    {
      "label": "METADATA FIELD",
      "fieldType": "Float",
      "name": "test_field1",
      "iri": "",
      "customValue": "",
      "required": true,
      "sensitive": false,
      "hasRange": true,
      "min": 0,
      "max": 1000.0,
      "step": 0.1,
      "isList": false,
      "possibleValues": null,
      "hasTableConnection": false,
      "tableConnection": null,
      "hasUnit": true,
      "possibleUnits": [
        "first unit",
        "second unit"
      ]
    }

metadata (third column)

    {
      "test_field1": {
        "value": 50.7,
        "unit": "first unit"
      },
      "test_field2": {
        "value": 20,
        "unit": "another unit"
      },
      "test_field3": {
        "value": "some text"
      },
      {...}
    }

Fig. 2. Outline of the JSON metadata schema used in XTENS. Each Data Type has a schema composed of a header and a body. The latter is an array of Metadata Groups, each containing one or more Metadata Fields or Loops. Metadata Loops are not shown for the sake of simplicity. A Metadata Field (shown in the second column) represents the leaf of the model and is described by a set of properties that specify its primitive type, its name, an optional Internationalized Resource Identifier (IRI) for linked data support, and a set of validation properties that determine whether it is required, whether it stores sensitive/personal information, whether (for numeric types) the value must fall within a given range, whether the value must be selected from a list of controlled terms, and whether it has a measurement unit. The "metadata" column of a Data entity stores the instances of each metadata field as a name-value-unit triple.
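
The validation properties listed in the caption can be read as a small set of checks applied to each name-value-unit triple. The sketch below is a hypothetical illustration, not the validation code used in XTENS: the function `validate_field` and its error messages are assumptions, and only the keys shown in the figure (`required`, `fieldType`, `hasRange`, `min`, `max`, `isList`, `possibleValues`, `hasUnit`, `possibleUnits`) are used.

```python
from typing import Any, Dict, List, Optional

# Metadata Field definition, using the keys shown in Fig. 2.
FIELD: Dict[str, Any] = {
    "label": "METADATA FIELD",
    "fieldType": "Float",
    "name": "test_field1",
    "required": True,
    "sensitive": False,
    "hasRange": True,
    "min": 0,
    "max": 1000.0,
    "isList": False,
    "possibleValues": None,
    "hasUnit": True,
    "possibleUnits": ["first unit", "second unit"],
}


def validate_field(field: Dict[str, Any],
                   entry: Optional[Dict[str, Any]]) -> List[str]:
    """Check one metadata entry against its field definition; return error messages."""
    name = field["name"]
    if entry is None:
        # A missing entry is only an error when the field is required.
        return [f"{name}: required value is missing"] if field.get("required") else []

    errors: List[str] = []
    value = entry.get("value")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)

    if field.get("fieldType") == "Float" and not is_number:
        return [f"{name}: expected a number, got {value!r}"]
    if field.get("hasRange") and is_number and not field["min"] <= value <= field["max"]:
        errors.append(f"{name}: {value} is outside [{field['min']}, {field['max']}]")
    if field.get("isList") and value not in (field.get("possibleValues") or []):
        errors.append(f"{name}: {value!r} is not a controlled term")
    if field.get("hasUnit") and entry.get("unit") not in (field.get("possibleUnits") or []):
        errors.append(f"{name}: unit {entry.get('unit')!r} is not allowed")
    return errors


print(validate_field(FIELD, {"value": 50.7, "unit": "first unit"}))
print(validate_field(FIELD, {"value": 2000, "unit": "third unit"}))
```

Since the checks are driven entirely by the field definition, adding a property to a Data Type schema changes what is validated without any change to the code.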

2.2 SEEG Use Case

We have set up a first XTENS 2.0 prototype to manage imaging data, namely computed tomography (CT) and magnetic resonance imaging (MRI), together with Stereo-ElectroEncephaloGraphy (SEEG) data in a collaborative project involving three centres: Niguarda Hospital in Milan (providing data), the Neuroscience Center at the University of Helsinki (developing methods), and the Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS) at the University of Genoa (data storage and management). The aim of the project is to exploit recent advancements in functional and effective connectomics to tentatively define biomarkers for focal epilepsy. Functional (and effective) connectomics studies describe how different brain regions interact with each other and how modifications of such functional (or effective) couplings are directly linked to neurological pathologies. In this context, it is crucial to have access to high-quality tools to store, analyse, and retrieve multimodal datasets, also in order to comply with the national and international laws that rule the sharing of medical information and patient details.
Details about the methods used in data preprocessing and analysis can be found elsewhere [14] [15]; here we briefly summarize the specific steps that interact with the XTENS platform. SEEG is a highly invasive technique for recording neural activity that is routinely used in clinical practice to localize seizure onset zones in patients with drug-resistant focal epilepsy undergoing presurgical evaluation [16]. Despite the sparsity of SEEG implants, we recently showed that SEEG can be successfully used in functional connectomics studies, fully exploiting its potential. We estimated that ~100 patients are required to reach an 85% coverage of all possible interactions in a 250-parcel anatomical atlas. The entire analysis can be divided into two domains: structural and functional. Each domain is characterized by different data, methods and analysis outputs. The structural domain deals with anatomical data and comprises post-implant CT (postCT) scans, which show the electrodes in their final locations, and pre-implant MRI (preMRI) scans, which contain the information about individual brain anatomy. We provided the physicians with a set of medical image processing tools specifically designed to deal with SEEG implants, in order (i) to localize each contact in both individual and common geometrical spaces and (ii) to assign to each contact its neuronal source on a probabilistic reference atlas (Destrieux). The functional domain deals with signal processing techniques aimed at quantifying the degree of synchrony between brain regions and at characterizing the so-called functional connectome. We developed several tools (i) to correctly estimate phase differences and (ii) to assess the statistical significance of the observed phase couplings.
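
To make the notion of phase coupling concrete, the sketch below computes the phase-locking value (PLV), one common measure of phase synchrony between two signals. The text does not state which estimator the project tools use, so the PLV is a stand-in chosen for illustration, and the function name and the synthetic phase series are assumptions. Given instantaneous phases of two channels, the PLV is the magnitude of the average unit vector of their phase differences: it is 1 for a constant phase lag and close to 0 when the differences are spread uniformly around the circle.

```python
import cmath
import math
from typing import Sequence


def phase_locking_value(phases_a: Sequence[float], phases_b: Sequence[float]) -> float:
    """Magnitude of the mean unit vector of the phase differences (radians)."""
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Synthetic example 1: channel B lags channel A by a constant 0.5 rad.
phases_a = [0.1 * k for k in range(200)]
phases_b = [p - 0.5 for p in phases_a]
plv_locked = phase_locking_value(phases_a, phases_b)

# Synthetic example 2: phase differences evenly spread over the circle.
spread = [2 * math.pi * k / 8 for k in range(8)]
plv_spread = phase_locking_value(spread, [0.0] * 8)

print(round(plv_locked, 6), round(plv_spread, 6))
```

Assessing statistical significance, the second task mentioned above, would require a comparison against surrogate data and is not covered by this sketch.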
XTENS successfully manages data describing both domains and provides client-side services for physicians to submit data and retrieve analysis results. We have installed XTENS on a cluster of Linux servers (Ubuntu 12.04 LTS) located at the Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), Genoa, Italy. Details of the installation are shown in Figure 3. We have currently defined the following Data Types: Patient, Preimplant_MRI, Postimplant_CT, Fiducial_List, SEEG_Implant, SEEG_Data, and Adjacency_Matrix. SEEG_Implant data instances are the output of the segmentation process run on Postimplant_CT using Fiducial_List metadata as reference. Adjacency_Matrix, on the other hand, is the data describing brain region phase couplings in individual patient geometry, estimated using SEEG_Data.
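
The derivation relationships among these Data Types can be written down as a small dependency graph. The sketch below encodes only the two derivations stated in the text; the dictionary name `DERIVED_FROM` and the helper `inputs_for` are hypothetical, and the code assumes Python 3.9 or later for the standard-library `graphlib` module. A topological ordering then gives a valid sequence in which data must become available.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each derived Data Type maps to the Data Types it is computed from.
DERIVED_FROM: Dict[str, Set[str]] = {
    "SEEG_Implant": {"Postimplant_CT", "Fiducial_List"},
    "Adjacency_Matrix": {"SEEG_Data"},
}


def inputs_for(data_type: str) -> List[str]:
    """Return the sorted list of Data Types needed to compute data_type."""
    return sorted(DERIVED_FROM.get(data_type, set()))


# static_order() yields every node after all of its predecessors.
processing_order = list(TopologicalSorter(DERIVED_FROM).static_order())

print(inputs_for("SEEG_Implant"))
print(processing_order)
```

Uploaded types such as Postimplant_CT have no inputs, so they always precede the types derived from them in the resulting order.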
Postimplant_CT, Fiducial_List, and SEEG_Data are directly uploaded by the physician to the XTENS repository. We have developed a Node.js package, called xpr-seeg, to provide a web interface to the segmentation tool running on a separate server. The user triggers the segmentation through the XTENS client interface. In turn, XTENS sends a POST request to xpr-seeg, forwarding all the required information about the two data instances. xpr-seeg executes a bash script that retrieves the required files from iRODS and runs the segmentation algorithm. Once the procedure is done, the computed SEEG implant is stored in a file. A new data instance of SEEG_Implant is composed by xpr-seeg and saved in XTENS through a POST request. In a similar way, SEEG_Data are downloaded by operators, manually inspected to reject artefactual channels (i.e., non-physiological data) and analysed to build the Adjacency_Matrix. Here, xpr-seeg provides the tool to correctly upload the analysed data to its data parent (i.e., Patient) in the XTENS repository.
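
The last step of this workflow, saving a newly computed data instance through a POST request, can be sketched from the point of view of an external program. Everything specific in the example is an assumption made for illustration: the URL, the payload keys (`type`, `parentSubject`, `files`, `metadata`), the file path and the metadata field `electrode_count` are hypothetical and do not document the actual XTENS REST interface. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint; the real route is defined by the XTENS deployment.
XTENS_DATA_URL = "http://localhost:1337/data"


def build_data_instance(data_type: str, parent_subject_id: int,
                        file_path: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the JSON body describing a new Data instance (assumed layout)."""
    return {
        "type": data_type,
        "parentSubject": parent_subject_id,
        "files": [{"uri": file_path}],
        "metadata": metadata,
    }


def build_post_request(url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_data_instance(
    data_type="SEEG_Implant",
    parent_subject_id=1,                       # hypothetical Patient identifier
    file_path="/path/to/implant_output.file",  # hypothetical storage path
    metadata={"electrode_count": {"value": 12}},
)
request = build_post_request(XTENS_DATA_URL, payload)
print(request.get_method(), request.full_url)

# Sending would be a single call, e.g. urllib.request.urlopen(request),
# once a server is listening at the configured URL.
```

Keeping payload construction separate from transport makes the composition of the data instance easy to test in isolation from the repository.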

Fig. 3. XTENS setup for the Stereo-EEG collaborative project. XTENS communicates with xpr-seeg using REST. After the analysis (e.g. segmentation) is run by xpr-seeg, the results file is stored in iRODS and a new Data instance is saved in XTENS.

3 Discussion

We have developed a novel data management and service-providing platform that is a valid alternative to more established technological solutions, because it presents a number of advantages. First, the whole system is structured around the JSON format, and does not require the lengthy and cumbersome data transformation or binding often required when dealing with the XML format or entity-attribute-value (EAV) paradigms. Second, the system is conceived to be user-friendly and easily configurable by non-IT people. This is a major advantage in biomedical science, which is hampered by the difficulties of dealing with complex informatics systems. For instance, XNAT, among the most popular data repositories for neuroscience projects, provides data schemas in XML format. Modifying a data schema expressed in XML requires a basic to advanced knowledge of the syntax itself. Furthermore, after a data schema changes, the database needs to be updated, the XNAT application redeployed, and the XNAT security setup performed. In XTENS, users can create new Data Types and set up the security and authorization levels using an intuitive graphical interface. These characteristics make XTENS suitable for projects in labs that cannot afford a system administrator. XTENS only requires the initial configuration and deployment, while new data types can be added at runtime.
The previous version of XTENS was based on Java Servlet technology (running on a Tomcat server), and was backed by a MySQL database. We moved to Node.js and PostgreSQL to provide an environment more compliant with our JSON model. The new version is suitable for testing the scaling capabilities of Node.js in conjunction with the PostgreSQL JSONB format when managing large amounts of metadata. We are planning to run stress tests on the system before adopting it in production, and to tune the database to improve performance. Moreover, the extensibility and simple management of the presented platform make it the optimal tool for connectomics studies. In this evolving and challenging scientific context, exchanging knowledge between scientific operators and adopting multimodal data approaches can appreciably increase the interpretability of the results and the applicability of the method itself to the clinical context. In focal epilepsy studies, the complexity of the pathology itself and the difficulties in the localization process can benefit from modern distributed technologies such as the one proposed in the present work.

4 Conclusions

We developed a novel data repository, with a highly configurable JSON-based data model, for research collaborations in biomedical science. We have tested the repository prototype in an ongoing SEEG project where external programs interacted with the repository using a service-oriented REST interface. We demonstrated its usefulness in computational neuroscience by showing that we could (i) input CT/MRI and SEEG data, (ii) run the required processing tool and (iii) output the phase couplings co-localized in both individual and common anatomical spaces.

References
1. Krestyaninova, M., Zarins, A., Viksna, J., Kurbatova, N., Rucevskis, P., Neogi, S.G., Gostev, M., Perheentupa, T., Knuuttila, J., Barrett, A., et al.: A System for Information Management in BioMedical Studies–SIMBioMS. Bioinformatics 25, 2768–2769 (2009)
2. Bauch, A., Adamczyk, I., Buczek, P., Elmer, F.J., Enimanev, K., Glyzewski, P., Kohler, M., Pylak, T., Quandt, A., Ramakrishnan, C., et al.: openBIS: A flexible framework for managing and analyzing complex data in biology research. BMC Bioinformatics 12, 468 (2012)
3. Herrick, R., McKay, M., Olsen, T., et al.: Data dictionary services in XNAT and the Human Connectome Project. Front. Neuroinform. 8, 65 (2014)
4. Scott, A., Courtney, W., Wood, D., et al.: COINS: An Innovative Informatics and Neuroimaging Tool Suite Built for Large Heterogeneous Datasets. Front. Neuroinform. 5, 33 (2011)
5. Das, S., Zijdenbos, A.P., Harlap, J., Vins, D., Evans, A.C.: LORIS: A web-based data management system for multi-center studies. Front. Neuroinform. 5, 37 (2011)
6. Neu, S.C., Crawford, K.L., Toga, A.W.: Practical management of heterogeneous neuroimaging metadata by global neuroimaging data repositories. Front. Neuroinform. 6, 8 (2012)
7. Edwards, P.N., Mayernik, M.S., Batcheller, A.L., Bowker, G.C., Borgman, C.L.: Science friction: data, metadata, and collaboration. Soc. Stud. Sci. 41, 667–690 (2011)
8. JSON, http://www.json.org/
9. Corradi, L., Porro, I., Schenone, A., Momeni, P., Ferrari, R., Nobili, F., Ferrara, M., Arnulfo, G., Fato, M.M.: A repository based on a dynamically extensible data model supporting multidisciplinary research in neuroscience. BMC Med. Inform. Decis. Mak. 12, 115 (2012)
10. Izzo, M., Mortola, F., Arnulfo, G., Fato, M., Varesio, L.: A digital repository with an extensible data model for biobanking and genomic analysis management. BMC Genomics 15 (2014)
11. Norlin, L., Fransson, M., Eriksson, M., Merino-Martinez, R., Anderberg, M., Kurtovic, S., Litton, J.: A Minimum Data Set for Sharing Biobank Samples, Information, and Data: MIABIS. Biopreservation and Biobanking 10 (2012)
12. XTENS source code, https://github.com/biolab-unige/xtens-app
13. iRODS, http://irods.org/
14. Arnulfo, G., Hirvonen, J., Nobili, L., Palva, S., Palva, J.M.: Phase and Amplitude correlations in resting state activity in human stereoEEG recordings. NeuroImage (submitted)
15. Arnulfo, G., Narizzano, M., Cardinale, F., Fato, M.M., Palva, J.M.: Automatic Segmentation of Deep intracerebral Electrodes in Computed Tomography Scans. BMC Bioinformatics (submitted)
16. Cardinale, F., Cossu, M., Castana, L., Casaceli, G., Schiariti, M.P., Miserocchi, A., et al.: Stereoelectroencephalography: surgical methodology, safety, and stereotactic application accuracy in 500 procedures. Neurosurgery 72(3), 353–366 (2013)
