XTENS - A JSON-Based Digital Repository For Biomedical Data Management
6 authors, including:
Marco Fato
Università degli Studi di Genova
All content following this page was uploaded by Marco Fato on 10 May 2017.
1 Background
Data management in biomedical science presents distinctive challenges. Research
projects in the field are increasingly moving towards international collaboration, with
participants coming from a wide variety of disciplines (medicine, biology,
engineering/IT, etc.). In such a scenario, extensive metadata are required to improve the
accessibility of the shared information, be it clinical records, biological specimens’
*
Corresponding author.
F. Ortuño and I. Rojas (Eds.): IWBBIO 2015, Part II, LNCS 9044, pp. 123–130, 2015.
© Springer International Publishing Switzerland 2015
124 M. Izzo et al.
2 Results
2.1 The Repository
In the XTENS metadata model, each Data instance is characterized by its Data Type. The
Data Types provide a hierarchical JSON schema that works as a ‘template’ for the
Data instance. The schema is composed of (i) a header with the name, a brief description
and the version of the Data Type, and (ii) a body that contains the full set of metadata
properties (i.e. attributes), usually collected in metadata groups. Each property is
characterized by a name, a primitive type (e.g. number, string, date, boolean), an
optional ontology term, value options and units, a set of validation properties and a
sensitivity flag. Each Data instance contains a metadata field in JSON format, where
all the properties defined in the Data Type schema are stored as name-value-unit
triples. Subjects and Samples are treated as specialized versions of the Data instance
class, containing additional fields, methods and relationships. This paradigm ensures
security, anonymisation of personal details, and dedicated biobanking management.
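To make the relationship between Data Types and Data instances concrete, the following is a minimal Python sketch, assuming a simplified in-memory model. The class and attribute names (`MetadataField`, `DataType`, `Data`, `Subject`, `set_value`, `personal_details_id`) are hypothetical illustrations and do not come from the XTENS code base, which is written in JavaScript [12]. The sketch only shows the idea that a Data instance may store just the properties declared by its Data Type, as name-value-unit entries.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical, simplified Python mirror of the XTENS data model;
# names are illustrative and not taken from the XTENS code base.


@dataclass
class MetadataField:
    name: str
    field_type: str           # primitive type, e.g. "Float" or "Text"
    iri: str = ""             # optional ontology term
    sensitive: bool = False   # marks personal/sensitive information
    has_unit: bool = False    # whether a unit is stored with the value


@dataclass
class DataType:
    name: str
    description: str
    version: str
    # metadata groups: group name -> list of field definitions
    groups: dict[str, list[MetadataField]] = field(default_factory=dict)

    def fields(self) -> dict[str, MetadataField]:
        # Flatten all groups into a name -> definition lookup
        return {f.name: f for group in self.groups.values() for f in group}


@dataclass
class Data:
    data_type: DataType
    metadata: dict[str, dict[str, Any]] = field(default_factory=dict)

    def set_value(self, name: str, value: Any, unit: Optional[str] = None) -> None:
        definition = self.data_type.fields().get(name)
        if definition is None:
            # Only properties declared in the Data Type schema can be stored
            raise KeyError(f"{name!r} is not defined in {self.data_type.name}")
        entry: dict[str, Any] = {"value": value}
        if definition.has_unit:
            entry["unit"] = unit
        self.metadata[name] = entry  # name -> {value, unit}


@dataclass
class Subject(Data):
    # Specialized Data; personal details are referenced separately to ease anonymisation
    personal_details_id: Optional[int] = None


example_type = DataType(
    "DATA_EXAMPLE", "DESCRIPTION", "1.0",
    {"FIRST_GROUP": [MetadataField("test_field1", "Float", has_unit=True),
                     MetadataField("test_field3", "Text")]},
)
item = Data(example_type)
item.set_value("test_field1", 50.7, "first unit")
item.set_value("test_field3", "some text")
print(item.metadata)
```

Because the schema lookup happens at call time, adding a field to `example_type.groups` makes it usable immediately, which loosely mirrors the runtime extensibility described for XTENS.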
The new XTENS implementation supports management of samples from different
biobanks, and describes biobanks according to the MIABIS specifications of the
Fig. 1. XTENS data model. Each Data instance is characterized by its Data Type, and the Data
Type schema provides the structure to build up the metadata JSON object. Subject and Sample
are specialized classes of Data. Personal Details are managed in a separate class to allow easy
anonymisation and satisfy privacy requirements.
The model has been implemented in the new XTENS repository using a modular
structure that separates the concerns of the client interface (i.e. front-end) and the
back-end. The latter consists of a web application running on a Node.js server, a
PostgreSQL database, and a distributed file system based on the iRODS data grid software
[13]. The adoption of iRODS makes XTENS the first data management platform relying
on distributed file storage. The web application is written in JavaScript to provide
full compliance with the JSON metadata model, and exposes a RESTful interface that
offers a common access specification to both the XTENS front-end and external
applications. We chose PostgreSQL as the database management system because it
supports JSON as a native type. The upcoming 9.4 version will introduce a binary
JSON format (JSONB) that dramatically improves query speed on retrieval.
iRODS also provides a REST API (irods-rest) that allows direct and transparent
upload/download from the client interface. When a new Data instance is
created using the front-end web form, the user can upload one or more associated
files. Uploaded files are temporarily stored in a ‘landing’ collection, and moved to their
permanent location after the Data form is submitted. The association between the
Data instance and the Data file location is then stored in the database. A dynamic query
interface based on the Composite design pattern allows users to perform queries on
every previously defined Data Type. Data Type schemas can be updated with new
properties using the graphical interface; new and updated Data Types can be used
immediately for create-retrieve-update-delete (CRUD) operations with no additional
programming, compilation or restart steps. The user (with administrative authorization)
is given full control to describe their data and experiments, without having to
turn to an IT expert every time. New Data Types can be created with minimal effort,
to improve data sharing in different scenarios and to promote the concept of metadata
as a dynamic, ephemeral and fluid process.
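The Composite-based query interface mentioned above can be pictured as a tree whose leaves compare a single metadata field and whose inner nodes combine sub-queries with AND/OR. The following Python sketch is an illustration under stated assumptions: the class names (`QueryComponent`, `FieldCondition`, `CompositeQuery`) are hypothetical, and the generated SQL assumes a PostgreSQL table with a JSON column named `metadata`, queried with the `->`/`->>` operators and `%s` placeholders in the style of DB-API drivers such as psycopg2. It is not the XTENS implementation.

```python
import operator
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any

# Composite pattern: leaves test one metadata field, composites combine children.
# Class names and the SQL shape are illustrative assumptions, not XTENS code.

_OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge}


class QueryComponent(ABC):
    @abstractmethod
    def to_sql(self) -> tuple[str, list[Any]]:
        """Return a WHERE-clause fragment and its bound parameters."""

    @abstractmethod
    def matches(self, metadata: dict) -> bool:
        """Evaluate the query in memory against one metadata object."""


@dataclass
class FieldCondition(QueryComponent):  # leaf
    field_name: str
    op: str
    value: Any

    def __post_init__(self) -> None:
        if self.op not in _OPS:  # whitelist operators before they reach SQL
            raise ValueError(f"unsupported operator {self.op!r}")

    def to_sql(self) -> tuple[str, list[Any]]:
        # ->> returns text, so numeric comparisons are cast explicitly
        cast = "::numeric" if isinstance(self.value, (int, float)) else ""
        sql = f"(metadata -> %s ->> 'value'){cast} {self.op} %s"
        return sql, [self.field_name, self.value]

    def matches(self, metadata: dict) -> bool:
        entry = metadata.get(self.field_name)
        return entry is not None and _OPS[self.op](entry["value"], self.value)


@dataclass
class CompositeQuery(QueryComponent):  # composite node
    junction: str  # "AND" or "OR"
    children: list[QueryComponent] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.junction not in ("AND", "OR") or not self.children:
            raise ValueError("junction must be AND/OR with at least one child")

    def to_sql(self) -> tuple[str, list[Any]]:
        parts, params = [], []
        for child in self.children:
            sql, child_params = child.to_sql()
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {self.junction} ".join(parts) + ")", params

    def matches(self, metadata: dict) -> bool:
        results = (child.matches(metadata) for child in self.children)
        return all(results) if self.junction == "AND" else any(results)


query = CompositeQuery("AND", [
    FieldCondition("test_field1", ">", 20),
    FieldCondition("test_field3", "=", "some text"),
])
sql, params = query.to_sql()
print(sql, params)
```

With psycopg2, the fragment could be passed as `cursor.execute("SELECT id FROM data WHERE " + sql, params)`; the `data` table name is again an assumption. Since leaves and composites share one interface, a graphical query builder can nest conditions to any depth without special cases.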
schema (first column):
{
  "header": {
    "schemaName": "DATA_EXAMPLE",
    "description": "DESCRIPTION",
    "version": "1.0",
    "classTemplate": "GENERIC",
    "fileUpload": true
  },
  "body": [
    {
      "label": "METADATA_GROUP",
      "name": "FIRST_GROUP",
      "iri": "",
      "content": [
        {...},
        {... "name": "test_field2" ...},
        {...}
      ]
    }
  ]
}

metadata field (second column):
{
  "label": "METADATA_FIELD",
  "fieldType": "Float",
  "name": "test_field1",
  "iri": "",
  "customValue": "",
  "required": true,
  "sensitive": false,
  "hasRange": true,
  "min": 0,
  "max": 1000.0,
  "step": 0.1,
  "isList": false,
  "possibleValues": null,
  "hasTableConnection": false,
  "tableConnection": null,
  "hasUnit": true,
  "possibleUnits": [
    "first unit",
    "second unit"
  ]
}

metadata (third column):
{
  "test_field1": {
    "value": 50.7,
    "unit": "first unit"
  },
  "test_field2": {
    "value": 20,
    "unit": "another unit"
  },
  "test_field3": {
    "value": "some text"
  },
  ...
}
Fig. 2. Outline of the JSON metadata schema used in XTENS. Each Data Type has a schema
composed of a header and a body. The latter is an array of Metadata Groups, each of which
contains one or more Metadata Fields or Loops. Metadata Loops are not shown for the sake of
simplicity. A Metadata Field (shown in the second column) represents the leaf of the model
and is described by a set of properties that specify its primitive type, its name, an optional
Internationalized Resource Identifier (IRI) for linked data support, and a set of validation
properties that determine whether it is required, whether it stores sensitive/personal
information, whether (for numeric types) the value must fall within a given range, whether
the value must be selected from a list of controlled terms, and whether it has a unit of
measurement. The "metadata" column of a Data entity stores the instances of each
metadata field as a name-value-unit triple.
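The validation properties listed in the caption can be applied to a stored name-value-unit entry roughly as follows. This is a simplified Python sketch, assuming the Metadata Field definition has been parsed into a dict with the keys shown in Fig. 2; the function `validate_field` and its error messages are hypothetical, and the `step` property is deliberately left unchecked.

```python
from typing import Any, Optional


def validate_field(field_def: dict[str, Any], entry: Optional[dict[str, Any]]) -> list[str]:
    """Check one name-value-unit entry against a Metadata Field definition.

    Keys follow Fig. 2; the checks are a simplified interpretation, not the
    XTENS validation code. The 'step' property is intentionally not checked.
    """
    name = field_def["name"]
    errors: list[str] = []

    # required: a missing or empty value is an error only for required fields
    if entry is None or entry.get("value") in (None, ""):
        if field_def.get("required"):
            errors.append(f"{name}: value is required")
        return errors

    value = entry["value"]

    # hasRange: numeric value must lie within [min, max]
    if field_def.get("hasRange"):
        low, high = field_def.get("min"), field_def.get("max")
        if (low is not None and value < low) or (high is not None and value > high):
            errors.append(f"{name}: {value} outside [{low}, {high}]")

    # isList: value must be one of the controlled terms
    if field_def.get("isList") and value not in (field_def.get("possibleValues") or []):
        errors.append(f"{name}: {value!r} is not an allowed value")

    # hasUnit: unit must be one of the declared units
    if field_def.get("hasUnit") and entry.get("unit") not in (field_def.get("possibleUnits") or []):
        errors.append(f"{name}: unit {entry.get('unit')!r} is not allowed")

    return errors


# Field definition from Fig. 2 (second column), trimmed to the keys used here
test_field1 = {
    "name": "test_field1", "fieldType": "Float", "required": True,
    "hasRange": True, "min": 0, "max": 1000.0, "step": 0.1,
    "isList": False, "possibleValues": None,
    "hasUnit": True, "possibleUnits": ["first unit", "second unit"],
}
print(validate_field(test_field1, {"value": 50.7, "unit": "first unit"}))  # []
```

Returning a list of messages rather than raising on the first failure lets a form report every problem with an entry at once.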
We have set up a first XTENS 2.0 prototype to manage computed tomography (CT),
magnetic resonance imaging (MRI) and stereo-electroencephalography (SEEG) data
in a collaborative project involving three
centres: Niguarda Hospital in Milan (providing data), the Neuroscience Centre at the
University of Helsinki (developing methods), and the Department of Informatics,
Bioengineering, Robotics and System Engineering (DIBRIS) at the University of
Genoa (data storage and management). The aim of the project is to exploit recent
iRODS and runs the segmentation algorithm. Once the procedure is done, the
computed SEEG Implant is stored in a file. A new SEEG_Implant Data instance is
composed by xpr-seeg and saved in XTENS through a POST request. Similarly,
SEEG_Data are downloaded by operators, manually inspected to reject artefactual
channels (i.e., non-physiological data) and analysed to build the Adjacency_matrix.
Here too, xpr-seeg provides the tool to correctly upload the analysed data to its
parent Data instance (i.e., the Patient) in the XTENS repository.
Fig. 3. XTENS setup for the Stereo-EEG collaborative project. XTENS communicates with
xpr-seeg using REST. After the analysis (e.g. segmentation) is run by xpr-seeg, the results file
is stored in iRODS and a new Data instance is saved in XTENS.
3 Discussion
We have developed a novel data management and service-providing platform that is a
valid alternative to more established technological solutions, because it offers a
number of advantages. First, the whole system is structured on the JSON format, and
avoids the lengthy and cumbersome data transformation or binding often required
when dealing with the XML format or entity-attribute-value (EAV) paradigms.
Secondly, the system is conceived to be user-friendly and easily configurable by non-IT
people. This is a major advantage in biomedical science, where research is often hampered
by the difficulty of dealing with complex informatics systems. For instance, XNAT, among
the most popular data repositories for Neuroscience projects, defines data schemas in
XML format. Modifying a data schema expressed in XML requires basic to advanced
knowledge of XML syntax. Furthermore, after data schema changes the database
needs to be updated, the XNAT application redeployed, and the XNAT security
setup must also be performed. In XTENS, users can create new Data Types and
set up the security and authorization levels using an intuitive graphical interface.
These characteristics make XTENS suitable for projects in labs that cannot afford a
system administrator. XTENS only requires initial configuration and deployment, while
new Data Types can be added at runtime.
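The first advantage claimed above, avoiding EAV-style data transformation, can be illustrated with a small Python sketch. The rows and the function `eav_to_json` are hypothetical: they show the pivot an EAV store needs in order to rebuild one record, whereas a JSON metadata column already holds each record in that shape.

```python
import json
from collections import defaultdict

# Hypothetical sample: the same metadata as EAV rows (one row per attribute)
# (entity_id, attribute, value, unit)
eav_rows = [
    (1, "test_field1", 50.7, "first unit"),
    (1, "test_field2", 20, "another unit"),
    (1, "test_field3", "some text", None),
    (2, "test_field1", 12.0, "second unit"),
]


def eav_to_json(rows):
    """Pivot EAV rows into one name-value-unit metadata object per entity."""
    docs = defaultdict(dict)
    for entity_id, attribute, value, unit in rows:
        entry = {"value": value}
        if unit is not None:
            entry["unit"] = unit
        docs[entity_id][attribute] = entry
    return dict(docs)


# With a JSON metadata column, this object is what is stored and read directly
print(json.dumps(eav_to_json(eav_rows)[1], indent=2))
```

In an EAV design this regrouping (or an equivalent set of joins) is needed on every read, which is the overhead the JSON-based model sidesteps.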
The previous version of XTENS was based on Java Servlet technology (running on
a Tomcat server), and was backed by a MySQL database. We moved to Node.js and
PostgreSQL to provide an environment better suited to our JSON model. The new
version makes it possible to test the scaling capabilities of Node.js in conjunction with
the PostgreSQL JSONB format when managing large amounts of metadata. We are planning
to run stress tests on the system before adopting it in production, and to tune
the database to improve performance. Moreover, the extensibility and simple
management of the presented platform make it particularly well suited to connectomics
studies. In this evolving and challenging scientific context, sharing knowledge
between scientific operators and combining multimodal data can appreciably increase the
interpretability of the results and the applicability of the method itself to the clinical
context. In focal epilepsy, where both the pathology itself and the localization process
are complex, studies can benefit from modern distributed technologies such
as the one presented in this work.
4 Conclusions
References
1. Krestyaninova, M., Zarins, A., Viksna, J., Kurbatova, N., Rucevskis, P., Neogi, S.G.,
Gostev, M., Perheentupa, T., Knuuttila, J., Barrett, A., et al.: A System for Information
Management in BioMedical Studies–SIMBioMS. Bioinformatics 25, 2768–2769 (2009)
2. Bauch, A., Adamczyk, I., Buczek, P., Elmer, F.J., Enimanev, K., Glyzewski, P., Kohler,
M., Pylak, T., Quandt, A., Ramakrishnan, C., et al.: openBIS: A flexible framework for
managing and analyzing complex data in biology research. BMC Bioinformatics 12, 468
(2012)
3. Herrick, R., McKay, M., Olsen, T., et al.: Data dictionary services in XNAT and the
Human Connectome Project. Front. Neuroinform. 8, 65 (2014)
4. Scott, A., Courtney, W., Wood, D., et al.: COINS: An Innovative Informatics and Neuroi-
maging Tool Suite Built for Large Heterogeneous Datasets. Front. Neuroinform. 5, 33
(2011)
5. Das, S., Zijdenbos, A.P., Harlap, J., Vins, D., Evans, A.C.: LORIS: A web-based data
management system for multi-center studies. Front. Neuroinform. 5, 37 (2011)
6. Neu, S.C., Crawford, K.L., Toga, A.W.: Practical management of heterogeneous neuroi-
maging metadata by global neuroimaging data repositories. Front. Neuroinform. 6, 8
(2012)
7. Edwards, P.N., Mayernik, M.S., Batcheller, A.L., Bowker, G.C., Borgman, C.L.: Science
friction: data, metadata, and collaboration. Soc. Stud. Sci. 41, 667–690 (2011)
8. JSON, http://www.json.org/
9. Corradi, L., Porro, I., Schenone, A., Momeni, P., Ferrari, R., Nobili, F., Ferrara, M., Arnulfo, G.,
Fato, M.M.: A repository based on a dynamically extensible data model supporting multidiscip-
linary research in neuroscience. BMC Med. Inform. Decis. Mak. 12, 115 (2012)
10. Izzo, M., Mortola, F., Arnulfo, G., Fato, M., Varesio, L.: A digital repository with an ex-
tensible data model for biobanking and genomic analysis management. BMC Genomics 15
(2014)
11. Norlin, L., Fransson, M., Eriksson, M., Merino-Martinez, R., Anderberg, M., Kurtovic, S.,
Litton, J.: A Minimum Data Set for Sharing Biobank Samples, Information, and Data:
MIABIS. Biopreservation and Biobanking 10 (2012)
12. XTENS source code, https://github.com/biolab-unige/xtens-app
13. iRODS, http://irods.org/
14. Arnulfo, G., Hirvonen, J., Nobili, L., Palva, S., Palva, J.M.: Phase and Amplitude correla-
tions in resting state activity in human stereoEEG recordings. NeuroImage (submitted)
15. Arnulfo, G., Narizzano, M., Cardinale, F., Fato, M.M., Palva, J.M.: Automatic
Segmentation of Deep Intracerebral Electrodes in Computed Tomography Scans.
BMC Bioinformatics (submitted)
16. Cardinale, F., Cossu, M., Castana, L., Casaceli, G., Schiariti, M.P., Miserocchi, A., et al.:
Stereoelectroencephalography: surgical methodology, safety, and stereotactic application
accuracy in 500 procedures. Neurosurgery 72(3), 353–366 (2013)