
An Approach to Ontology Integration for Ontology Reuse

in Knowledge Based Digital Ecosystems

Enrico G. Caldarola
Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Via Claudio, 21, Napoli, Italy
Institute of Industrial Technologies and Automation, National Research Council, Via P. Lembo, 38F, Bari, Italy
enricogiacinto.caldarola@unina.it

Antonio Picariello
Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Via Claudio, 21, Napoli, Italy
antonio.picariello@unina.it

Antonio M. Rinaldi
Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Via Claudio, 21, Napoli, Italy
IKNOS-LAB Intelligent and Knowledge Systems, University of Naples Federico II, LUPT, Via Toledo, 402, Napoli, Italy
antoniomaria.rinaldi@unina.it

ABSTRACT

In the last years, the large availability of information and knowledge models formalized by ontologies has demanded effective and efficient methodologies for reusing and integrating such models in global conceptualizations of a specific knowledge or application domain. The ability to effectively and efficiently perform knowledge reuse is a crucial factor in the development of ontologies, which are a potential solution to the problem of information standardization and a viaticum towards the realization of knowledge-based digital ecosystems. In this paper, an approach to ontology reuse based on heterogeneous matching techniques will be presented; in particular, we will show how the process of ontology building can be improved and simplified by automating the selection and the reuse of existing data models to support the creation of digital ecosystems. The proposed approach has been applied to the food domain, specifically to food production.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures; I.2.4 [Computing Methodologies]: Artificial Intelligence—Knowledge Representation Formalisms and Methods (Semantic networks)

General Terms

Algorithms, Design

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MEDES'15, October 25-29, 2015, Caraguatatuba/Sao Paulo, Brazil
© 2015 ACM. ISBN 978-1-4503-3480-8/15/10…$15.00
http://dx.doi.org/10.1145/2857218.2857219

1. INTRODUCTION

The term ontology, originally introduced by Aristotle, has become today a buzzword among computer scientists, while ontologies are considered as the silver bullet for the realization of the Semantic Web vision [3]. According to Gruber [14], an ontology is a specification of a shared conceptualization of a domain, i.e., a formal definition and representation of the concepts and their relations belonging to a certain domain of interest. Once a knowledge domain or some aspects of it are formally represented using a common and shared language, they become understandable not only by humans but also by automated computer agents [2]. As a result, for example, web services or search engines can improve their performances in terms of exchange of information or accuracy in search results, exploiting the semantically enriched representation of the information they share. For these reasons, ontologies are increasingly considered also as a key factor for enabling interoperability across heterogeneous systems [9], information retrieval [1], multimedia knowledge representation [27] and complex user-based digital ecosystems [21]. Following this trend, companies of all sizes and research groups have produced a plethora of data or conceptual models for many applications such as e-commerce, government intelligence, medicine, manufacturing, etc. This, together with the progress of information and communication technologies, has made available a huge amount of disparate information, raising the problem of managing heterogeneity among various information sources and sharing complex knowledge representations [30, 7]. Furthermore, the ontologies available in the literature are becoming increasingly large in terms of number of concepts and relations, to such an extent that they can certainly be considered Big Data [33, 24, 5, 6]. Ontologies are often the result of collaborative and distributed efforts that require effective methodologies to guarantee their maintenance and evolution. In the attempt to mitigate the increasing heterogeneity and complexity of modern ontologies, several related research fields have emerged in the last years. Ontology evolution and ontology versioning aim at managing the inevitable changes

to which ontologies are subject over time; whereas ontology matching, mapping, alignment, and ontology integration and merging are the most widespread research areas which aim to overcome the heterogeneity issue. One of the most notable applications of ontology integration is ontology reuse. In the context of ontology engineering, the reuse of existing knowledge models is recommended as a key factor to develop cost-effective and high-quality ontologies, since it reduces the cost and the time required for the conceptualization of specific domains from scratch [4], [20], increasing the quality of newly implemented ontologies by reusing components that have already been validated. Finally, it avoids the confusion and the inconsistencies that may be generated from multiple representations of the same domain; thus, it strengthens the orchestration and harmonization of knowledge.

The problem becomes challenging when the knowledge is represented using detailed information; it is clear that this information can be used as a knowledge base in several digital systems, opening the way to complex virtual environments. In particular, Knowledge Representation is essential for defining and managing the complex systems that in the Internet of Things area represent a community of digital devices and the connected persons, as a whole new environment that is called a digital ecosystem.

This paper is an attempt to answer the following questions: what is the knowledge at the basis of the ecosystem in order to manage this kind of complex environment? How can we create new knowledge, reusing existing representations? How can we integrate all the knowledge extracted by a system for supporting novel ecosystems and environments? Taking into account the above considerations and the proposed questions, this paper presents a new approach for ontology integration oriented to ontology reuse for digital ecosystems. In particular, we use a combination of existing matching techniques based on background knowledge, exploiting the domain experts' knowledge and some generic linguistic oracles. The novelty of the proposed approach consists in adopting an extended linguistic analysis which, starting from the vocabularies contained in each source ontology representing a particular knowledge, and relating them according to the background knowledge, integrates the local ontologies in a global conceptualization of a specific knowledge or application domain. In other words, the ultimate goal of the presented approach is to ease the process of ontology construction by automating the selection and the reuse of existing data models, which may be considered a first step in the definition of a complex knowledge base serving digital ecosystems.

The remainder of the paper is structured as follows. After a literature review of the main ontology integration and reuse methodologies, and a clarification of the terminology related to such research areas, contained in section 2, a description of the proposed approach is provided in section 3, while section 4 applies the entire approach to a specific case study, highlighting results and strengths. Finally, section 5 draws the conclusion, summarizing the major findings and outlining future investigations.

2. RELATED WORKS

In this section, after a clarification of the terminology used in the context of ontology integration, matching and reuse, we provide the state of the art of tools and methodologies related to such disciplines. The general motivation for these research fields is that different ontologies generally use different terminology, different representation languages and different syntax to refer to the same or similar concepts. Welcoming the suggestion for a clarification of the terminology contained in [12], and according to some related works in the literature [10], [25], we define ontology matching as the process of finding relationships or correspondences between entities of different ontologies; ontology alignment as a set of correspondences between two or more ontologies; ontology mapping as the oriented, or directed, version of an alignment, i.e., it maps the entities of one ontology to at most one entity of another ontology; ontology integration and merging as the construction of a new ontology based on the information found in two or more source ontologies; and finally, ontology reuse as the process in which available ontologies are used as input to generate new ontologies. It is a common practice in the literature to consider heterogeneity resolution and the related ontology matching or mapping strategies to be an internal part of ontology merging or integration [16]. Several works in the last decade have addressed the improvement of ontology mapping and matching strategies for effective and efficient data integration. According to Choi [9], ontology mapping can be classified into three categories: 1) mapping between an integrated global ontology and local ontologies, 2) mapping between local ontologies and 3) mapping on ontology merging and alignment. In particular, the third category is used as a part of ontology merging or alignment in an ontology reuse process. Some of the most widespread tools belonging to this category are: SMART and PROMPT [23], which are semi-automatic ontology merging and alignment tools; OntoMorph [8], which provides a rule language for specifying mappings; FCA-Merge [31], a method for ontology merging based on Ganter and Wille's formal concept analysis, lattice exploration, and instances of the ontologies to be merged; and finally, CHIMAERA [19], an interactive merging tool based on the Ontolingua ontology editor. A survey of the matching systems is also provided by Shvaiko and Euzenat in [30], where, in addition to an analytical comparison of the recent tools and techniques, the authors argue on the opportunity to pursue further research in ontology matching and propose a list of promising directions for the future. A recent trend, and one of the future challenges suggested by the authors, is ontology matching using background knowledge [22]. Contrary to direct matching, which involves only the knowledge contained in the input ontologies' entities, the new methodology performs the matching by discovering a common context or background knowledge for the ontologies and uses it to extract relations between the ontologies' entities. Adding context can help to increase the recall but at the same time may also generate incorrect matches, decreasing the precision; thus, the right tradeoff must be found. As background knowledge, on the one hand, it is common to use generic knowledge sources and tools, such as WordNet, Linked Open Data like DBpedia, or the web itself; on the other hand, domain-specific ontologies, upper level ontologies, or the ontologies available on the Web can be used. The semantic matching framework S-Match [13], for example, uses WordNet as a linguistic oracle, while the work in [29] discusses the use of UMLS (Unified Medical Language System), instead of WordNet, as background knowledge in medical applications.

As mentioned in the introduction, ontology integration is mainly applied when the main concern is the reuse of ontologies. In this regard, it is worth noticing that several knowledge management methodologies consider the reuse of knowledge as an important phase of the entire knowledge management process. The CommonKADS methodology [28], for instance, makes use of a collection of ready-made model elements (a kind of building blocks) which prevents the knowledge engineer from reinventing the wheel when modeling a knowledge domain. Pinto and Martins [25] have analyzed the process of knowledge reuse from a methodological point of view, introducing an approach that comprises several phases and activities. Moreover, the European research project NeOn [32] proposed a novel methodology for building ontologies, which emphasizes the role of existing ontological and non-ontological resources for knowledge reuse. However, some open issues remain, especially concerning the difficulty of dealing with the extreme heterogeneity of formalisms of the increasing number of models available in the literature [4]. The absence of an automatic framework for the rigorous evaluation of the knowledge sources is also a severe limitation to overcome.

The research introduced in this paper tries to overcome the above difficulties by adopting a rigorous framework for knowledge reuse based on the combination of existing ontology matching and integration methodologies. It exploits the available tools in order to automate the tasks it involves, thereby reducing the human intervention as much as possible.

3. A FRAMEWORK FOR ONTOLOGY REUSE

This section describes a high-level architecture of the proposed framework for ontology reuse. As shown in figure 1, our framework presents four main functional blocks which, starting from the identification of the existing knowledge sources and models (hereafter referred to as reference models or data models), along with the reconciliation and normalization of such models, obtain a comprehensive and integrated representation of the domain under study, by an effective and efficient reuse of selected data models existing in the literature.

Figure 1: High-level view of the proposed framework

The first component of the framework is the Reference Models Retrieval function block. It is responsible for retrieving the reference models corresponding to the domain of interest. In order to search for proper data models, it is necessary to identify the knowledge domain and the related subdomains covering the specific topic under study. The contribution of domain experts is essential in this phase in order to clarify the meaning of some poorly defined concepts and to help knowledge engineers move among the existing knowledge sources over the Internet or other legacy archives. Some of the available resources for domain identification are: general purpose or content-specific encyclopedias, like Wikipedia; web directories, e.g., DMOZ (from directory.mozilla.org) and Yahoo! Directory; standard classifications and other electronic and hard-copy knowledge sources, including technical manuals, reports, etc. Once the domain of interest has been properly defined, a Corpus of selected reference models is populated by collecting them from different knowledge sources, such as: specialized portals and websites within public or private organizations; search engines (e.g., Google, Bing, etc.); specialized semantic-based engines; ontology repositories, including BioPortal, Schemapedia, Knoodl, etc.; and search engines for semantic web ontologies, like Swoogle and the Watson Semantic search engine. To make a first screening of reference models, a set of qualitative criteria can be adopted. For example, the formality level, which describes the formality of the reference model and can range from plain text to description logic-based languages; the type generality, which evaluates the model type from the viewpoint of its generality (upper ontologies like BFO and DOLCE or, on the other hand, domain-specific ontologies, like the IFC: Industry Foundation Classes). Furthermore, the type structure can be used, which evaluates the model type from the viewpoint of its structure (simple classifications or taxonomies versus semantically enriched ontologies); the model language, such as RDF/OWL, graphic-based languages and pure text; the model provenance, e.g., standards or conceptual models authored by influential scientific groups, like BIM: Building Information Model or OBO: Open Biological and Biomedical Ontologies Foundry, or other; and finally, the model license, which evaluates the availability of the reference model (open data models versus proprietary and licensed models).

The second component of the framework is the Reference Model Reconciliation and Normalization function block. It is responsible for adapting the collected reference models to a common representation format. This step is mandatory due to the heterogeneity of the languages used to represent and formalize existing reference models in the literature. Another important operation performed at this stage is the normalization of the textual representation of entity metadata (labels or comments) at a morphological and syntactic level. The linguistic normalization is performed by resolving multi-language mismatches, automatically or manually translating metadata descriptions from whatever language into English. The normalization also involves the following operations: tokenization, lemmatization and stop-word elimination. The main component of the framework is the Reference Model Matching function block. It is responsible for obtaining an alignment (A), i.e., a set of correspondences between the matched entities from the reference models (in this section and in the next one also referred to as input models) and the target model (defined in section 4). This function block is also responsible for helping domain experts to select the relevant reference models from the Corpus by performing an extended linguistic analysis. As shown in figure 1, the matcher involves three types of matching operations: string, linguistic and extended linguistic matching. It may use several sources as background knowledge, such as general background knowledge bases like WordNet and domain-specific knowledge bases. All the matching components will be detailed in section 4. The fourth component of the framework is the Reference Models Merging or Integration function block. It is responsible for integrating the selected input models into a global, richer and consistent view, abstracting the local conceptualizations of the input models themselves. According to the correspondence measures contained in the alignment set (A), the following operations will be performed over the matched entities from the input models: if two or more entities (concepts or relations) from the input models are equivalent w.r.t. a certain target entity, from a set-theoretic perspective, they will be automatically merged into the same entity in the target ontology; if an entity expresses a concept or relation subsumed by a target entity, it will be imported in the target ontology with
the consensus of the domain experts. The same approach will be used if a concept subsumes another target concept. If an entity is completely disjoint w.r.t. all the target entities, it may be irrelevant and so discarded, or it can be considered a new entity that eventually enriches the target model. Finally, the domain experts' consensus is also needed in the case in which an intersection exists between two entities.

4. THE MATCHING METHODOLOGY

The objective of the proposed matching methodology consists in creating an alignment, i.e., a set of correspondences among the entities coming from the input models and those of the target ontology. In this work, the target ontology can be envisioned as the final goal of the integration approach but, at the early phases of the proposed approach, also as a kind of proto-ontology encompassing domain keywords, term definitions, and concept meanings related to the target knowledge domain and the application requirements upon which the domain experts agree. The alignment is used as a basis for the integration of the input ontologies into a coherent and global conceptualization of the domain under study.

We define the correspondence c as the tuple:

c = (e1, e2, r, v)

e1 being an entity from the first ontology, e2 being an entity from the second ontology to be compared, r being a matching relation and v being a similarity measure between the two entities based on the particular matching relation. Each correspondence can be expressed in different formats, such as RDF/OWL, XML, or text. In this work, we use an RDF/OWL-based representation of alignments. Each entity involved in a matching operation has a unique id corresponding to the URI of the input ontology. The matching relations are: StringMatch and Wu-PalmerMatch.

In the following sub-sections each of the matching methodologies will be further detailed.

4.1 The string-based matching

The string matching operation is performed between the input ontology entities and the target entity labels. In this phase, other ontology metadata like comments will also be compared. We use an approximate string matching based on the consolidated Levenshtein distance [15] to quantify how dissimilar two strings are to one another. It counts the minimum number of edit operations (i.e., insertions, deletions or substitutions) required to transform one string into the other. For example, the Levenshtein distance between "kitten" and "sitting" is 3, since three edits are necessary to change one into the other, and there is no way to do it with fewer than three edits. We use the Levenshtein distance since it represents a good trade-off between efficiency and ease of implementation. In fact, there are plenty of well-known and tested third-party implementations of the Levenshtein algorithm in the most widespread programming languages.

4.2 The linguistic matching

The linguistic matching is responsible for a comprehensive analysis of the terms used in the input models at a semantic level, using an external linguistic database like WordNet or other domain-specific knowledge sources as background knowledge. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a different concept [11]. The synsets are interlinked by conceptual-semantic and lexical relations, thus realizing a graph-based structure where synsets are the nodes and lexical relations are the edges. Exploiting the WordNet graph-based representation, it is possible to relate concepts at a semantic level, for example, by calculating the Wu-Palmer similarity, which counts the number of edges between two concepts by taking into account their proximity to the root concept of the hierarchy. According to [18], the Wu-Palmer similarity has the advantage of being simple to calculate, in addition to its performances, while remaining as expressive as the others. Both the string and the semantic similarity measures resulting from the matching operations contribute to defining the semantic relation between the entities. The semantic relations correspond to set-theoretic relations between ontology classes: equivalence (=), disjointness (⊥), less general (≤), more general (≥) and concept correlation (∩). A thresholding method is used to establish the type of set-theoretic relation that holds between the entities.
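To make the string-based step concrete, the following minimal sketch (our own illustration, not the authors' implementation) computes the Levenshtein distance by dynamic programming and normalizes it into a [0, 1] similarity score; the normalization by the longer string's length is an assumption of ours, not prescribed by the paper:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions or substitutions
    needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def string_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a [0, 1] similarity score
    (illustrative choice: divide by the longer string's length)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# The worked example from the text: three edits separate the two words.
print(levenshtein("kitten", "sitting"))  # 3
```

A correspondence (e1, e2, StringMatch, v) can then be emitted whenever the score v exceeds a chosen threshold, mirroring the thresholding method described above.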

4.3 Extended linguistic matching

The extended linguistic matcher component defines and implements a meta-model for ontology matching, using a conceptualization as close as possible to the way in which concepts are organized and expressed in human language [26]. The matcher exploits the meta-model to improve the accuracy of the candidate reference model analysis. We define our meta-model as composed of a triple ⟨S, P, C⟩ where: S is a set of objects; P is the set of properties used to link the objects in S; C is a set of constraints on P. In this context, we consider concepts and words as objects; the properties are linguistic relations and the constraints are validity rules applied to the linguistic properties w.r.t. the considered term category. In our approach, the target knowledge is represented by the target ontology. A concept is a set of words which represent an abstract idea. In this model, every node, both concept and word, is an OWL individual. The connecting edges in the ontology are represented as ObjectProperties. These properties have some constraints that depend on the syntactic category or on the kind of property (semantic or lexical). For example, the hyponymy property can relate only nouns to nouns or verbs to verbs; on the other hand, a semantic property links concepts to concepts and a syntactic one relates word forms to word forms. Concept and word attributes are represented with DatatypeProperties, which relate individuals with a predefined data type. Each word is related to the represented concept by the ObjectProperty hasConcept, while a concept is related to the words that represent it using the ObjectProperty hasWord. These are the only properties able to relate words with concepts and vice versa; all the other properties relate words to words and concepts to concepts. Concepts, words and properties are arranged in a class hierarchy, resulting from the syntactic category for concepts and words and from the semantic or lexical type for the properties. Figures 2(a) and 2(b) show that the two main classes are Concept, in which all the objects are defined as individuals, and Word, which represents all the terms in the ontology.

Figure 2: Concept and Word — (a) Concept, (b) Word

The subclasses have been derived from the related categories. There are some union classes useful to define property domains and codomains. We define some attributes for Concept and Word respectively: Concept has Name, which represents the concept name, and Description, which gives a short description of the concept; on the other hand, Word has Name as attribute, which is the word name. All elements have an ID, either the WordNet offset number or a user-defined ID. The semantic and lexical properties are arranged in a hierarchy (see figures 3(a) and 3(b)). In table 1, some of the considered properties and their domain and range of definition are shown.

Figure 3: Linguistic properties — (a) Lexical Properties, (b) Semantic Properties

Table 1: Properties

Property   | Domain               | Range
hasWord    | Concept              | Word
hasConcept | Word                 | Concept
hypernym   | NounsAndVerbsConcept | NounsAndVerbsConcept
holonym    | NounConcept          | NounConcept
entailment | VerbWord             | VerbWord
similar    | AdjectiveConcept     | AdjectiveConcept

The use of domain and codomain reduces the property range application. For example, the hyponymy property is defined on the sets of nouns and verbs; if it is applied on the set of nouns, it has the set of nouns as range; otherwise, if it is applied to the set of verbs, it has the set of verbs as range. In table 2, some of the defined constraints are reported, and we specify on which classes they have been applied w.r.t. the considered properties; the table shows the matching range too.

Table 2: Model constraints

Constraint    | Class            | Property | Constraint range
AllValuesFrom | NounConcept      | hyponym  | NounConcept
AllValuesFrom | AdjectiveConcept | attribute| NounConcept
AllValuesFrom | NounWord         | synonym  | NounWord
AllValuesFrom | AdverbWord       | synonym  | AdverbWord
AllValuesFrom | VerbWord         | also see | VerbWord

Sometimes the existence of a property between two or more individuals entails the existence of other properties. For example, since the concept dog is a hyponym of animal, we can assert that animal is a hypernym of dog. We represent these characteristics in OWL by means of the property features shown in table 3.

Having defined the meta-model previously described, a Semantic Network (i.e., DSN) is dynamically built using a dictionary based on WordNet or other domain-specific resources. We define a semantic network as a graph consisting of nodes which represent concepts and edges which represent semantic relations between concepts. The role of domain experts is strategic in this phase because they interact with the system by providing a list of domain keywords and concept definitions feeding the proto-ontology. The DSN is built starting from such a first version of the target ontology, i.e., the domain keywords and the concept definition word sets. We then consider all the component synsets and construct a hierarchy based only on the hyponymy property; the last level of our hierarchy corresponds to the last level of the WordNet one. After this first step, we enrich our hierarchy considering all the other kinds of relationships in WordNet.

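The hyponymy-only first step of the DSN construction can be sketched as follows; the TOY_HYPERNYMS dictionary is an invented, hypothetical stand-in for WordNet's synset graph (a real implementation would query WordNet or a domain-specific resource):

```python
# Toy stand-in for the WordNet hyponymy/hypernymy graph: each synset is
# mapped to its hypernym (more general synset). The entries below are an
# invented example, not data from the paper or from WordNet itself.
TOY_HYPERNYMS = {
    "dog.n.01": "canine.n.02",
    "canine.n.02": "carnivore.n.01",
    "carnivore.n.01": "animal.n.01",
    "cat.n.01": "feline.n.01",
    "feline.n.01": "carnivore.n.01",
    "animal.n.01": None,  # root of the toy hierarchy
}

def hypernym_chain(synset: str) -> list:
    """Walk up the hierarchy from a synset to the root."""
    chain = []
    while synset is not None:
        chain.append(synset)
        synset = TOY_HYPERNYMS.get(synset)
    return chain

def build_hierarchy(keywords) -> set:
    """Union of the hypernym chains of the seed keywords: a first,
    hyponymy-only version of the semantic network (DSN)."""
    edges = set()
    for kw in keywords:
        chain = hypernym_chain(kw)
        edges.update(zip(chain, chain[1:]))  # (hyponym, hypernym) edges
    return edges

# Seed the DSN with two domain keywords provided by the experts.
dsn = build_hierarchy(["dog.n.01", "cat.n.01"])
print(sorted(dsn))
```

Enriching this graph with the remaining WordNet relations (holonymy, entailment, etc.) would then add further typed edges, as the text describes.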
Table 3: Property features

Property   | Features
hasWord    | inverse of hasConcept
hasConcept | inverse of hasWord
hyponym    | inverse of hypernym; transitivity
hypernym   | inverse of hyponym; transitivity
cause      | transitivity
verbGroup  | symmetry and transitivity

In our approach, after the DSN building step, we compare it with the lexical chains of the selected input models. The intersection between the DSN and the reference models gives us a lexical chain with the relevant terms related to the target ontology. All terms are linked by properties from the DSN. Therefore, the DSN gives us a conceptual frame useful to discriminate the pertinent reference models from the other ones. In order to evaluate the relevancy of the selected reference models, it is necessary to define a grading system that is able to assign a vote to each model based on its syntactic and semantic content. We use the approach described in [26] to calculate a Global Grade (GG) for each semantic network related to each selected reference model. The GG is given by the sum of the Syntactic-Semantic Grade (SSG) and the Semantic Grade (SG). The first contribution gives us information about the analyzed model by taking into account the polysemy of the term, i.e., the measure of ambiguity in the use of a word, thus with an accurate definition of the role of the considered term in the model. We call this measure the centrality of the term i and we define it as ϖ(i) = 1/poly(i), where poly(i) is the polysemy (number of senses) of i.

We can define the relevance of the reference model as the sum of its relevant word weights (term centralities):

SSG(ν) = Σ_{i=1}^{n} ϖ(i)    (1)

where n is the number of terms in the model ν.

The other contribution (SG) is based on a combination of the path length (l) between pairs of terms and the depth (d) of their subsumer (i.e., the first common ancestor), expressed as number of hops. Moreover, to each linguistic property, represented by arcs between the nodes of the DSN, a weight is assigned in order to express the strength of each relation. We argue that not all the properties have the same strength when they link concepts or words (this difference is related to the nature of the considered linguistic property). The weights are real numbers in the [0,1] interval; their values are set by experiments and validated, from a strength comparison point of view, by experts.

We can now introduce the definition of the Semantic Grade

models, thus giving us the most relevant input model at a linguistic level.

5. A CASE STUDY FROM THE FOOD DOMAIN

In this section, we apply the proposed approach to the food domain, specifically to the industrial production of food. Since food is an umbrella topic involving concepts related to different disciplines and applications, it represents a valid benchmark to test our approach in order to select those reference models that not only are about food in general, but whose main concern is the production of food. Each of the function blocks in figure 1 will be applied in the following. Regarding the reference model retrieval function block, we have selected, with the consensus of domain experts, the following main knowledge sources: the Google search engine, Google Scholar, the ISO International Classification of Standards and the OAEI 2007 Food test case. The harvesting of reference models has been executed mostly manually, even though some tools for automating search queries over Google Scholar have been successfully tested. The retrieval function block has provided tens of reference models, which have been collected in the reference corpus. Throughout the first function block, we have also applied the evaluation criteria discussed in section 3. In this regard, a greater weight has been given to reference models constructed in OWL or RDF, these being the final languages used in the integrated ontology, and to the model provenance and availability. Table 4(a) shows a list of the reference models sorted in order of preference. With the reconciliation and normalization function block, we have obtained a set of normalized, representation language-agnostic data models. In order to do this, firstly we have flattened the reference models' concept hierarchy, since we do not apply any structural analysis in this phase. Later on, we have applied the linguistic normalization operations listed in section 3 to the model signature, i.e., to the textual representation of the model entities (concepts or relations) and their metadata (labels, comments). This process, conveniently applied to each selected model, has resulted in a lexical chain for each model. Figure 4 shows the application of these steps to an excerpt of the National Cancer Institute Thesaurus (NCIT).

In order to apply the matching function block to the input models, it is necessary to construct a prototype of the target ontology. As already discussed in section 4.3, the prototype can be envisioned as a proto-ontology from which an integrated global ontology will result along with the se-
(SG), that extends a metric proposed in [17]: lected (reused) component ontologies. To the construction
of the proto-ontology have contributed mostly the domain
! experts by providing keywords and concept definitions re-
eβ·d(wi ,wj ) − e−β·d(wi ,wj ) lated to the food production topic. With the target ontology
SG(ν) = e−α·l(wi ,wj ) (2)
(wi ,wj )
eβ·d(wi ,wj ) + e−β·d(wi ,wj ) at hand the third block, on the one hand, helps knowledge
engineers to automatically select and evaluate the input on-
where (wi , wj ) are a pairs of word in the intersection be- tologies, while, on the other hand, provides a set of align-
tween DNS and model reference ν and α ≥ 0 and β > 0 are ments, which will be used in the integration module. It
two scaling parameters whose values have been defined by performs string and linguistic matching between the lexical
experiments. chains of the input models and that of the proto-ontology.
The final grade is the sum of the Syntactic-Semantic Grade Then it computes the Jaccard measure for each model as
and the Semantic Grade. Once we have obtained the Global the ratio between the cardinalities of the intersection of the
Grade for each semantic network, they are compared with input lexical chain (see figure 4(a)) and the target one and
a threshold value that act as a filter for the input reference their union. The intersection of the input lexical chain con-
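As a minimal sketch of the grading scheme, the two contributions of equations (1) and (2) can be computed as below. The polysemy counts, path lengths l and subsumer depths d are assumed to be precomputed (toy values here), and the hyperbolic ratio in equation (2) is written as tanh(β·d); the α and β values are illustrative, not those fixed by our experiments.

```python
import math

def centrality(polysemy: int) -> float:
    """Centrality of term i: the reciprocal of its polysemy, 1 / poly(i)."""
    return 1.0 / polysemy

def ssg(polysemies) -> float:
    """Syntactic-Semantic Grade (eq. 1): sum of the term centralities."""
    return sum(centrality(p) for p in polysemies)

def sg(pairs, alpha=0.2, beta=0.6) -> float:
    """Semantic Grade (eq. 2): each word pair contributes
    exp(-alpha * l) * tanh(beta * d), where l is the path length between
    the two words and d is the depth of their first common subsumer.
    alpha and beta are scaling parameters (illustrative values here)."""
    return sum(math.exp(-alpha * l) * math.tanh(beta * d) for (l, d) in pairs)

def global_grade(polysemies, pairs, alpha=0.2, beta=0.6) -> float:
    """Global Grade: GG = SSG + SG."""
    return ssg(polysemies) + sg(pairs, alpha, beta)

# Toy model: three terms with 1, 2 and 4 senses, plus two word pairs with
# (path length l, subsumer depth d) = (2, 3) and (4, 1).
print(round(ssg([1, 2, 4]), 3))                               # -> 1.75
print(round(global_grade([1, 2, 4], [(2, 3), (4, 1)]), 3))    # -> 2.626
```

Note that a term with a single sense (poly(i) = 1) contributes the maximum centrality of 1, so models whose vocabulary is unambiguous with respect to the DSN are graded higher.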
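The reconciliation and normalization step can likewise be sketched. The exact operation list lives in section 3 (not reproduced here), so the operations below (de-camel-casing, lowercasing, tokenization, stop-word removal) and the helper names are assumptions in the spirit of that step:

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "to"}  # minimal toy list

def normalize_entity(label: str) -> list:
    """Turn one entity label of a model signature into normalized tokens:
    split CamelCase and underscores, lowercase, drop stop words."""
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)   # FoodProduct -> Food Product
    tokens = re.split(r"[\s_\-]+", label.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]

def lexical_chain(labels) -> list:
    """Flatten a model's entity labels into a single lexical chain of terms."""
    chain, seen = [], set()
    for label in labels:
        term = " ".join(normalize_entity(label))
        if term and term not in seen:
            seen.add(term)
            chain.append(term)
    return chain

# Toy excerpt of entity labels, in the spirit of the NCIT example of figure 4(a).
print(lexical_chain(["FoodProduct", "Food_Component", "Beef", "the Beverage"]))
# -> ['food product', 'food component', 'beef', 'beverage']
```

Each resulting chain is the representation language-agnostic signature that the matching block then compares against the proto-ontology.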
(a) Qualitative criteria analysis

N.   Reference model                       Formality    Generality   Structure       Language  Provenance         License
1)   National Cancer Institute Thesaurus   Formal       Domain       Ontology        OWL       Non-stand.         Open
2)   AGROVOC                               Semi-formal  Domain       Ontology        RDF       Non-stand.         Open
3)   Linked Recipe Schema                  Semi-formal  Domain       Ontology        RDF       Other              Open
4)   BBC Food Ontology                     Semi-formal  Domain       Ontology        RDF       Other              Open
5)   LIRMM                                 Semi-formal  Domain       Ontology        RDF       Other              Open
6)   The Product Types Ontology            Semi-formal  Application  Ontology        RDF       Non-stand.         Open
7)   oregonstate.edu Food Glossary         Informal     Application  Glossary        Text      Other              Open
8)   Eurocode 2 Food Coding System         Informal     Domain       Classification  Text      Non-stand.         Open
9)   WAND Food and Beverage Taxonomy       Semi-formal  Domain       Taxonomy        Text      Private companies  Proprietary
10)  Food technology ISO Standard          Semi-formal  Domain       Taxonomy        Text      Stand. Organiz.    Proprietary
(b) Linguistic Matching Analysis

NCIT                  Rel.  AGROVOC
Food Product          ≤     Food
Food Component        =     Food Composition
Beef                  ≤     Animal Product
Beef                  ≤     Fresh Meat
Beverage              ≥     Alcoholic Beverage
Wine                  =     Wine
Drink                 ≤     Alcoholic Beverage
Fruit and Vegetables  ≥     Vegetable Product
Nutrient              ∩     Food Composition
Dairy Product         ≥     Animal Product
Table 4: Selected Reference Models Analysis
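The Jaccard measure computed by the matching function block to rank each input model against the proto-ontology can be sketched as follows; the two sets stand for the matched terms of an input lexical chain and of the target chain, and the term names below are purely illustrative:

```python
def jaccard(input_chain: set, target_chain: set) -> float:
    """Jaccard measure: |A intersection B| / |A union B| of two lexical chains."""
    if not input_chain and not target_chain:
        return 0.0
    return len(input_chain & target_chain) / len(input_chain | target_chain)

# Toy chains: matched terms of an input model vs. the proto-ontology.
input_chain = {"food product", "beef", "beverage", "wine", "nutrient"}
target_chain = {"food", "beef", "wine", "nutrient", "dairy product"}

print(round(jaccard(input_chain, target_chain), 4))  # -> 0.4286 (3 of 7 terms shared)
```

Models whose chains share more normalized terms with the proto-ontology score closer to 1 and are retained for integration.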
(a) Ontology Excerpt and Lexical Chain    (b) Extended Semantic Network Excerpt

Figure 4: NCIT Extended Linguistic Analysis
The intersection of the input lexical chain contains all terms that have string and linguistic similarities greater than certain prefixed thresholds. According to the linguistic and string analysis, the best models related to the target ontology in this case study are: 1), 2), 6) and 10). This result is consistent with the fact that the remaining ontologies or models listed in table 4(a) are about other aspects of food, relating mostly to food services or recipes (3), or about other non-relevant aspects (7, 8) of the food domain. The final step of the matching function block is the extended linguistic analysis. As described in section 4.3, this analysis converts the input and target lexical chains into semantic networks containing an extended set of concepts w.r.t. the initial set from the lexical chain. This new set encompasses hyponyms, hypernyms, meronyms, holonyms, etc., retrieved from WordNet. Furthermore, the concepts of the semantic network are linked to each other with the linguistic and semantic relations provided in the meta-model described in section 4.2. Figure 4(b) shows an excerpt of the direct semantic network (DSN) for the NCIT lexical chain. The extended linguistic analysis has substantially confirmed the choice of the previously selected models, giving more relevance to 10), which is consistent with the scope of the ISO data model. For this reason, the input models 1), 2) and 10) have been selected as the local ontologies to be integrated by the merging and integration function block. The fourth function block applies the alignments resulting from the matcher, according to section 4.2, and integrates the local ontologies into the global one. Table 4(b) shows an excerpt of the alignment resulting from matching the NCIT and the AGROVOC ontologies.

At the moment, the produced ontology has been qualitatively validated by domain experts. We are planning to design and implement automatic tools for the qualitative and quantitative analysis of the results; a more detailed analysis, however, goes beyond the scope of the present paper, in which we are mainly focused on validating the whole integration process.

6. CONCLUSIONS
This work has shown a mixed approach to ontology reuse that exploits some existing techniques in ontology integration and outlines a semi-automatic workflow, particularly suitable for digital ecosystem knowledge bases. We have carried out several experiments in a specific digital ecosystem, i.e., the food domain. We have thus effectively demonstrated that an approach based on linguistic matching can help to automate the selection of the most relevant reference models of a complex system, by properly distinguishing the models that belong to a specific interpretation of the domain under study from the others. Nonetheless, this process requires a significant amount of manual work, even when it deals with common and formal models. This requirement may be a severe limitation for the widespread adoption of knowledge reuse and may represent a relevant technological gap to be addressed by researchers in the near future. Furthermore, ambiguity, inconsistency and heterogeneity in ontology models need a practical technique able to overcome
these difficulties; the entire approach should also be tested against other knowledge domains, in order to evaluate its practicability in different contexts. Finally, new matching similarity measures will be the subject of further research, with the aim of improving the precision and recall of the alignments, comparing them with gold-standard tests and, in this way, ameliorating the entire approach.

7. REFERENCES

[1] M. Albanese, P. Capasso, A. Picariello, and A. M. Rinaldi. Information retrieval from the web: An interactive paradigm. In Advances in Multimedia Information Systems, pages 17–32. Springer, 2005.
[2] M. Albanese, P. Maresca, A. Picariello, and A. M. Rinaldi. Towards a multimedia ontology system: an approach using TAO XML. In DMS, pages 52–57, 2005.
[3] T. Berners-Lee, J. Hendler, O. Lassila, et al. The semantic web. Scientific American, 284(5):28–37, 2001.
[4] E. P. Bontas, M. Mochol, and R. Tolksdorf. Case studies on ontology reuse. Proceedings of the IKNOW05 International Conference on Knowledge Management, 74, 2005.
[5] E. G. Caldarola, A. Picariello, and D. Castelluccia. Modern enterprises in the bubble: Why big data matters. ACM SIGSOFT Software Engineering Notes, 40(1):1–4, 2015.
[6] E. G. Caldarola, M. Sacco, and W. Terkaj. Big data: The current wave front of the tsunami. ACS Applied Computer Science, 10(4):7–18, 2014.
[7] A. Cataldo, V. D. Pinto, and A. M. Rinaldi. Representing and sharing spatial knowledge using configurational ontology. International Journal of Business Intelligence and Data Mining, 10(2):123–151, 2015.
[8] H. Chalupsky. OntoMorph: A translation system for symbolic knowledge. In KR, pages 471–482, 2000.
[9] N. Choi, I.-Y. Song, and H. Han. A survey on ontology mapping. ACM Sigmod Record, 35(3):34–41, 2006.
[10] J. Euzenat, P. Shvaiko, et al. Ontology matching, volume 18. Springer, 2007.
[11] C. Fellbaum. WordNet. The Encyclopedia of Applied Linguistics, 1998.
[12] G. Flouris, D. Plexousakis, and G. Antoniou. A classification of ontology change. In SWAP, 2006.
[13] F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-Match: an algorithm and an implementation of semantic matching. In The semantic web: research and applications, pages 61–75. Springer, 2004.
[14] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220, 1993.
[15] P. A. Hall and G. R. Dowling. Approximate string matching. ACM Computing Surveys (CSUR), 12(4):381–402, 1980.
[16] J. Heflin and J. Hendler. Dynamic ontologies on the web. In AAAI/IAAI, pages 443–449, 2000.
[17] Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. Knowledge and Data Engineering, IEEE Transactions on, 15(4):871–882, 2003.
[18] D. Lin. An information-theoretic definition of similarity. ICML, 98:296–304, 1998.
[19] D. L. McGuinness, R. Fikes, J. Rice, and S. Wilder. The Chimaera ontology environment. AAAI/IAAI, 2000:1123–1124, 2000.
[20] G. Modoni, E. Caldarola, W. Terkaj, and M. Sacco. The knowledge reuse in an industrial scenario: A case study. In eKNOW 2015, The Seventh International Conference on Information, Process, and Knowledge Management, pages 66–71, 2015.
[21] V. Moscato, A. Picariello, and A. M. Rinaldi. A recommendation strategy based on user behavior in digital ecosystems. In Proceedings of the International Conference on Management of Emergent Digital EcoSystems, pages 25–32. ACM, 2010.
[22] A. Nathalie. Schema matching based on attribute values and background ontology. 12th AGILE International Conference on Geographic Information Science, 1(1):1–9, 2009.
[23] N. F. Noy and M. A. Musen. SMART: Automated support for ontology merging and alignment. Proc. of the 12th Workshop on Knowledge Acquisition, Modelling, and Management (KAW'99), Banff, Canada, 1999.
[24] A. Pease, I. Niles, and J. Li. The suggested upper merged ontology: A large ontology for the semantic web and its applications. Working notes of the AAAI-2002 workshop on ontologies and the semantic web, 28, 2002.
[25] H. S. Pinto and J. P. Martins. Ontologies: How can they be built? Knowledge and Information Systems, 6(4):441–464, 2004.
[26] A. M. Rinaldi. A content-based approach for document representation and retrieval. In Proceedings of the eighth ACM symposium on Document engineering, pages 106–109. ACM, 2008.
[27] A. M. Rinaldi. A multimedia ontology model based on linguistic properties and audio-visual features. Information Sciences, 277:234–246, 2014.
[28] G. Schreiber. Knowledge engineering and management: the CommonKADS methodology. MIT Press, 2000.
[29] J. Shamdasani, T. Hauer, P. Bloodsworth, A. Branson, M. Odeh, and R. McClatchey. Semantic matching using the UMLS. In The Semantic Web: Research and Applications, pages 203–217. Springer, 2009.
[30] P. Shvaiko and J. Euzenat. Ontology matching: state of the art and future challenges. Knowledge and Data Engineering, IEEE Transactions on, 25(1):158–176, 2013.
[31] G. Stumme and A. Maedche. FCA-Merge: Bottom-up merging of ontologies. IJCAI, 1:225–230, 2001.
[32] M. C. Suárez-Figueroa, A. Gómez-Pérez, E. Motta, and A. Gangemi. Ontology engineering in a networked world. Springer Science & Business Media, 2012.
[33] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: A large ontology from wikipedia and wordnet. Web Semantics: Science, Services and Agents on the World Wide Web, 6(3):203–217, 2008.