Professional Documents
Culture Documents
2013 TPDL Demo
2013 TPDL Demo
1 Introduction
Large amounts of cultural heritage (CH) material are now available through
online digital library portals, such as Europeana1. Europeana hosts millions of
books, paintings, films, museum objects and archival records that have been digi-
tised throughout Europe. Europeana collects contextual information or metadata
about different types of content, which the users can use for their searches.
The main strength of Europeana lays in the vast number of items it contains.
Sometimes, though, this quantity comes at the cost of a restricted amount of
metadata, with many items having very short descriptions and a lack of rich
contextual information. One of the goals of the PATHS project2 is precisely to
enrich CH items, using a selected subset of Europeana as a testbed[1].
Whithin the project, this enrichment will make possible to create a system
that acts as an interactive personalised tour guide through Europeana collec-
tions, offering suggestions about items to look at and assist in their interpre-
tation by providing relevant contextual information from related items within
Europeana and items from external sources like Wikipedia. Users of such digital
libraries may require information for purposes such as learning and seeking an-
swers to questions. This additional information supports users in fulfilling their
information need, as the evaluation of the first PATHS prototype shows [2].
In this paper we present a web service prototype which allows independent
content providers to enrich CH items. Specifically, the service enriches the items
1
http://www.europeana.eu/portal/
2
http://www.paths-project.eu
T. Aalberg et al. (Eds.): TPDL 2013, LNCS 8092, pp. 462–465, 2013.
c Springer-Verlag Berlin Heidelberg 2013
PATHSenrich: A Web Service Prototype for Automatic CH Item Enrichment 463
with two types of information. On the one hand, the item will be linked to
similar items within the collection. On the other hand, the item will be linked
to Wikipedia articles which are related to it.
There have been many attempts to automatically enrich cultural heritage
metadata. Some projects (for instance, MIMO-DB3 or MERLIN4 ) relate CH
objects with terms of an external authority or vocabulary. Some others (like
MACE5 or YUMA 6 ) adopt a collaborative annotation paradigm for metadata
enrichment. To our knowledge, PATHS is the first project using semantic NLP
processing to link CH items to similar items or external Wikipedia articles.
The current service has limited bandwidth, and provides a selected subset
of the enrichment functionality available internally in the PATHS project. The
quality of the links produce is also slightly lower, although we plan to improve it
in the short future. However, we think that the prototype is useful to demonstrate
the potential to construct a web service for automatically enriching CH items
with high quality information.
2 Demo Description
The web service takes as input one CH item represented following the Europeana
Data Model (EDM) in JSON format, as exported by the Europeana API v2.07 (a
sample record is provided in the interface). The web service returns the following:
– A list of 10 closely related items within the collection.
– A list of Wikipedia pages which are related to the target item.
Figure 1 shows a snapshot of the web service. The service is publicly accessible
following the URL http://ixa2.si.ehu.es/paths_wp2/paths_wp2.pl.
The enrichment is performed by analyzing the metadata associated with the
item, i.e., the title of the item, its description, etc. The next sections briefly
describe how this enrichment is performed.
Fig. 1. Web service interface. It consists of a text area to introduce the input item
in JSON format (top). The “Get EDM JSON example” button can be used to get an
input example. Once a JSON record is typed, click “Process” button to get the output.
The output (bottom) consists on a list of related items and background links.
References
1. Otegi, A., Agirre, E., Soroa, A., Aletras, N., Chandrinos, C., Fernando, S., Gonzalez-
Agirre, A.: Report accompanying D2.2: Processing and Representation of Content
for Second Prototype. PATHS Project Deliverable (2012),
http://www.paths-project.eu/eng/content/download/2489/18113/version/2/
file/D2.2.Content+Processing-2nd+Prototype-revised.v2.pdf
2. Griffiths, J., Goodale, P., Minelli, S., de Polo, A., Agerri, R., Soroa, A., Hall, M.,
Bergheim, S.R., Chandrinos, K., Chryssochoidis, G., Fernie, K., Usher, T.: D5.1:
Evaluation of the first PATHS prototype. PATHS Project Deliverable (2012),
http://www.paths-project.eu/eng/Resources/
D5.1-Evaluation-of-the-1st-PATHS-Prototype
3. Chang, A.X., Spitkovsky, V.I., Yeh, E., Agirre, E., Manning, C.D.: Stanford-UBC
entity linking at TAC-KBP. In: Proceedings of TAC 2010, Gaithersburg, Maryland,
USA (2010)
4. Han, X., Sun, L.: A Generative Entity-Mention Model for Linking Entities with
Knowledge Base. In: Proceedings of the ACL, Portland, Oregon, USA (2011)
5. Agirre, E., Aletras, N., Gonzalez-Agirre, A., Rigau, G., Stevenson, M.: UBC UOS-
TYPED: Regression for typed-similarity. In: Second Joint Conference on Lexical
and Computational Semantics (*SEM), Atlanta, Georgia, USA (2013)
6. Fernando, S., Hall, M., Agirre, E., Soroa, A., Clough, P., Stevenson, M.: Com-
paring Taxonomies for Organising Collections of Documents. In: Proceedings of
COLING 2012, Mumbai, India (2013)