DARIAH Interview - VP


Hi Panos, and thanks for joining us!

Could you start off by telling us a bit about what motivated the creation of NeMO and
which communities have been involved in its creation?

One of our biggest motivations for creating NeMO came from close observation of the way researchers work today, which helped us better understand the difficulties and hurdles that they face and empowered us to try to offer a solution to their problems. To be more specific, it is clear that research today can be computationally intensive; in cases like Bioinformatics or the Digital Humanities it can require interdisciplinary collaboration; and it often relies on very large-scale data management and knowledge extraction. Currently, the tools that researchers have at their disposal to conduct their work comprise large digital research infrastructures, which aim to gather targeted research resources, tools and services, or search engines like Google Scholar, Semantic Scholar etc., which address the problem mainly by traversing author/citation graphs (i.e. they don't create graphs based on the knowledge represented inside an article, only on the article's metadata). All the available digital resources are fragmented: each is designed to cover either a particular research aspect or a specific research field. Thus, we believe it is questionable how well the offered services fit the actual research life-cycle, ensuring that the real information needs of scholars are addressed and their working practices enhanced. Since NeMO started as our attempt to provide a solution to this problem for the special case of the Digital Humanities, we worked closely with various groups of Digital Humanists, both through interviews and through workshops conducted specifically to chart their needs and workflows.

What are the main aims of NeMO?

With NeMO initially, and the Scholarly Ontology later, our main goal was to conceptualize and document the research process in a systematic and formal way, so that different aspects of scholarly research behavior can be covered and available resources from different research fields can be interconnected. More specifically, although NeMO started as a conceptual model describing research processes in DH, it quickly became apparent that what we were trying to solve was actually a larger problem covering the majority of research fields, and, very interestingly, that the modeling approach we were using in DH could actually be employed in other disciplines such as Medicine and Biology. This led to the creation of an even more general model, named Scholarly Ontology, which covers scholarly work in general and, through its modular architecture, can incorporate discipline-specific models like NeMO.

Let's also discuss the creation of the knowledge graph a little. What kinds of input did you use for it?

With NeMO and the Scholarly Ontology we offered a conceptual model that could help researchers document their work in a systematic way, common across disciplines, and detailed enough that even small pieces of information regarding research work (such as methods, goals, activities etc.) can be modeled. But since we are living in a data-centric world, such a model would be of no practical value without the actual data (i.e. the instances of the model), which not only prove the concept but actually highlight potential use cases and workflows for documenting research work. For the creation of the knowledge graph we used input from: i) the documented work of scholars that we received from various workshops; ii) various indicative papers that we used as examples of how research processes described in a paper could be transformed into instances of our model; iii) earlier relevant models of scholarly research activity; iv) empirical research using semi-structured interviews with scholars from across Europe, which focused on analyzing research practices and capturing the information requirements of research infrastructures; as well as v) existing taxonomies and models.

A subquestion to that: in order to extract and model scholarly research processes in humanities research, you created a knowledge base called Research Spotlight. It was developed to extract information from research articles and enrich it with relevant information from other Web sources. Which databases did you consult for this, and what were the biggest challenges in the process? Did you encounter issues related to open access to scholarly databases?

Since documenting research work by hand can be a rather time-demanding endeavor, which may also lead to inconsistencies between different (human) annotators, we developed a system, called Research Spotlight, that can automate and enhance this process of creating the knowledge base. This is done using various machine learning algorithms that we have developed, along with algorithms that gather data and infer information from various Web sources. Specifically, we use APIs to retrieve research articles from various publishers (Springer, Elsevier), we access DBpedia to gather information on various concepts of the Scholarly Ontology (i.e. research methods, research topics and tools) in order to train our ML models, and we connect with ORCID for author disambiguation. For sources for which we can't find an API, we develop our own web scrapers to collect information from their websites. For the purposes of our work, in order to avoid any licensing issues, we only used open access material. Of course, combining information from such different sources and in different formats, let alone transforming it into meaningful datasets that can be used for machine learning, constitutes a technical challenge in itself. Nevertheless, we haven't yet encountered any insurmountable obstacles that could irrevocably hinder our efforts.

How does NeMO support scholarly discovery, e.g. by enabling scholars to find workflows, methods and tools relevant to their research questions?

In contrast to other available vocabularies for documenting scholarly work, NeMO (and the Scholarly Ontology) is based on an ontological model. This allows us not only to create concepts with various semantic meanings (such as activities, methods, goals, tools, etc.), but also to interrelate them in a semantically meaningful way that gives them context. For example, using concepts and relations of the Scholarly Ontology such as Activity, follows(Activity1, Activity2), Method and employs(Activity, Method), we can model workflows comprising the specific sequence of activities that were conducted during an experiment or a study, as well as the methods that were employed during those activities. Moreover, since these are instances in a knowledge base, they can easily be retrieved by relevant queries such as "How has a particular experiment that uses a specific method been conducted?", "Which steps were involved?", "In which order?", etc.

How can the knowledge graph be integrated into specific research projects? Do you know of re-use cases or specific implementations of NeMO?

Researchers can use the knowledge graph (NeMO or the Scholarly Ontology) to document their work. We provide definitions for our terms and use-case scenarios that can help them achieve a consistent way of documenting. I like to think of this process as "keeping a smart diary". The difference from a "normal diary", though, is that the output in our case is a machine-readable and machine-understandable object (e.g. an RDF file). Documenting research work this way not only ensures that every single piece of information is retrievable (in the way we described in the previous question), but also that, since everyone is using the same model to produce her/his "smart diary", all these RDF files can be combined into a bigger knowledge graph where information can be matched and new knowledge can be inferred. Documenting research work with NeMO or SO can be done easily using any modeling tool such as Protégé, into which our RDF model can be imported and then used by a researcher to document her/his project. Along this line, we also developed a prototype that incorporates NeMO and provides a friendly GUI for entering instances and querying the knowledge base. Our demo was presented at a workshop at DH 2016 in Krakow, where various researchers used it to model their own work. Currently we are also working on the development of a social network tool (like ResearchGate or Academia.edu) which leverages the automatically parsed instances from various research papers (created by Research Spotlight) and provides a Twitter-like UI where researchers can browse for methods, activities, research topics, etc.
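The "smart diary" idea, that individual RDF files combine into one larger graph because everyone uses the same model, can be sketched as follows. The diaries, instance names and predicates are invented examples, not real NeMO data:

```python
# Two researchers' "smart diaries" as triple sets. All identifiers
# (alice_act1, performedBy, ...) are hypothetical illustrations.
diary_alice = {
    ("alice_act1", "type", "Activity"),
    ("alice_act1", "employs", "TopicModeling"),
    ("alice_act1", "performedBy", "Alice"),
}
diary_bob = {
    ("bob_act1", "type", "Activity"),
    ("bob_act1", "employs", "TopicModeling"),
    ("bob_act1", "performedBy", "Bob"),
}

# Because both diaries follow the same model, merging them into a bigger
# knowledge graph is a plain set union.
graph = diary_alice | diary_bob

def users_of(method):
    """Cross-diary query: who has performed an activity employing method?"""
    acts = {s for (s, p, o) in graph if p == "employs" and o == method}
    return sorted(o for (s, p, o) in graph
                  if p == "performedBy" and s in acts)
```

A query such as `users_of("TopicModeling")` now matches information across both diaries, which neither file could answer alone.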

How does NeMO achieve interoperability and connectivity to other ontologies and taxonomies such as CIDOC-CRM or TaDIRAH (esp. with v2 of TaDIRAH)?

NeMO and SO were developed using CIDOC-CRM as the upper ontology; all of their classes and relations are sub-concepts of CIDOC-CRM concepts. Hence, we achieve 100% compatibility. As for TaDIRAH, we have developed specific mappings that interrelate their vocabulary with ours under the general "Type" concept (inherited from CIDOC-CRM), which is used in NeMO and in SO exactly for such purposes (i.e. to incorporate other closed vocabularies into our model). The terms comprising TaDIRAH's research activities, for example, can be incorporated as instances of our ActivityType concept and thereby inherit all the extra properties of that class. This way, not only can works that have been documented using TaDIRAH be easily imported into NeMO, but researchers also gain access to the new modeling capabilities that our ontological framework offers (such as the various semantic relations that connect activity types with other concepts like Methods, Goals, etc.).
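The incorporation pattern described above can be sketched like this. The TaDIRAH-style term names are illustrative, and the predicate names (instanceOf, hasType) are invented for the sketch, not SO's actual properties:

```python
# Sketch: folding an external closed vocabulary (TaDIRAH-style activity
# terms) into a model as instances of an ActivityType concept.
# Predicate names here are hypothetical, not SO's real schema.
tadirah_terms = ["Capturing", "Enrichment", "Analysis"]

kb = {(term, "instanceOf", "ActivityType") for term in tadirah_terms}

# Once typed, documented activities can point at those imported terms
# and combine them with the ontology's own relations:
kb.add(("act1", "type", "Activity"))
kb.add(("act1", "hasType", "Analysis"))        # external term in use
kb.add(("act1", "employs", "NetworkAnalysis"))  # ontology-native relation

def activities_of_type(activity_type):
    """All activities documented under a given (imported) activity type."""
    return sorted(s for (s, p, o) in kb
                  if p == "hasType" and o == activity_type)
```

The point of the pattern is that the imported terms become first-class instances, so they participate in the same queries and relations as native concepts.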

How can you sustain and ensure the coherence of the ontology in the long term?

Our models are maintained by the team at the Digital Curation Unit in Athens. Any additions and expansions into "neighboring" fields, such as academic publishing or other specific disciplines, are carefully studied and mapped to existing classes so that we can ensure coherence. Once an update is ready, we publish it and release it as an RDF file along with relevant documentation.

Finally, we know that a great deal of work around NeMO has been carried out in DARIAH-GR. To finish the interview, could you also tell us a bit about the special flavours of Digital Humanities in Greece?

+1: Is multilingualism a relevant concept for NeMO? Is multilingualism reflected in it somehow, and are you planning to have the ontology available in multiple languages?

Since NeMO and SO are implemented as knowledge graphs using RDF technology, displaying the concepts and their instances in other languages is very easy to implement and can be achieved simply by using an RDF attribute such as xml:lang. This is actually one of the reasons we decided to go with this technology in the first place. Embracing the values of the open web and highlighting how different aspects (i.e. disciplines, or even languages for that matter) can be interrelated and combined so that new knowledge can be inferred is fundamentally embedded into the core of our work.
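Language-tagged labels in the spirit of RDF's xml:lang can be sketched as literals paired with a language tag. The Greek translations below are illustrative examples, not official NeMO labels:

```python
# Sketch of language-tagged labels, mimicking RDF literals with a
# language tag (as set via xml:lang in RDF/XML). Translations are
# illustrative, not official vocabulary labels.
labels = {
    ("Activity", "label", ("Activity", "en")),
    ("Activity", "label", ("Δραστηριότητα", "el")),
    ("Method", "label", ("Method", "en")),
    ("Method", "label", ("Μέθοδος", "el")),
}

def label(concept, lang):
    """Return the concept's label in the requested language, if any."""
    for (s, p, (text, tag)) in labels:
        if s == concept and tag == lang:
            return text
    return None
```

Because the tag travels with the literal rather than with the schema, adding a new language means adding labels, not changing the ontology.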
