Professional Documents
Culture Documents
CHIS A big data infrastructure to manage digital cultural items
CHIS A big data infrastructure to manage digital cultural items
highlights
• This work presents the prototype of a scalable cloud-based infrastructure for big data resource management and context-driven analysis in cultural
heritage environments.
• It leverages advanced big data management techniques and technological solutions for dealing with the information variety, velocity and volume.
• The system acquires big data from distributed and heterogeneous sources (e.g., Sensor Networks, Social Media, Digital Libraries, Multimedia Collections,
Web Data Services, etc.).
• It also provides advanced information retrieval facilities, with context-awareness in data access and analytics based on users’ preferences and on the
surrounding environment.
in a smart environment to be used by different applications, the On the other hand, regarding the challenges in the management
data must be properly processed and stored in the form of linked of big data, in recent years several projects (EUDAT, iCordi,
open data, in order to facilitate both access and semantic process- EUHIT, smartData, Projectcome, etc.) have been created to propose
ing. solutions for the effective and efficient treatment of such data,
A number of proposals, which focus on how the discussed in particular in the context of scientific data processing. More
technological solutions should be applied to the cultural domain, recent proposals have then investigated the use of big data
has been already presented for the Italian heritage. management techniques and architectures for digital cultural
For example, in the Technological District for Cultural Heritage contents, proposing applications [11].
and Activities, Lazio—DTC1 , a platform for the access to cultural Summarizing, the main research problems that can be corre-
heritage has been proposed and is focused on tourists through the lated to the management of Cultural Heritage digital contents in
realization of digital scenes, virtual reconstructions, augmented the Smart Cities context are:
reality techniques and mobile applications. • the adoption of architectural models and standards in the
Similarly, the Calabria region, in the context of the MESSIAH context of Future Internet and Big Data [12,13];
project2 , has proposed enabling methods and multi-functional • the interfacing and communication with the sensors of a
technologies to support cultural heritage identification, monitor- site [14–17];
ing and restoration. • the access, retrieval, integration and analysis of information
Indeed, the problem of evaluation and care of cultural heritage from all data sources and the correlation with spatial data
through smart city enabling technologies, is one of the most [18,19];
important issues that has to be taken into consideration, not only • the transformation of ‘‘captured’’ data in the form of knowledge
at national level, but also within the European scenario. and its management [20,21];
As a first example, Niker3 addresses the problem of the • the localization and tracking of users on a site [22,23];
protection of cultural heritage sites from the effects of seismic • the access to the knowledge based on the user profile, the
phenomena, through an integrated approach based on the use of context and the use of applications [24–26];
a Knowledge Base; in turn, H-KNOW4 proposes a similar approach • the analysis of users’ opinions from social networks [27].
for the restoration of buildings.
In this paper, we describe CHIS (Cultural Heritage Information
Other projects, based on Knowledge Management and Knowl-
System), a system to query, browse and analyze cultural digital
edge Discovery techniques aim to promote cultural heritage. Sig-
contents from a set of distributed and heterogeneous repositories.
nificant examples are: SMARTMUSEUM5 which implements a plat-
In particular, the system prototype has been developed within the
form for ‘‘knowledge exchange’’, and PAPYRUS6 , which address the
DATABENC project.10
problem of ‘‘dynamic’’ knowledge discovery from digital libraries CHIS is able to manage all the digital contents related to
and news archives. Cultural Items. More in details, in our vision each Cultural Heritage
Several projects, recently funded by the European community, environment (e.g., museums, archeological sites, old town centers,
propose methodologies and the best ways to manage and organize etc.) is grounded on a set of cultural Points of Interest (PoI), which
the knowledge related to cultural heritage. correspond to one or more cultural items (e.g., specific ruins of
ARIADNE7 aims to create an application infrastructure for the an archeological site, sculptures and/or pictures exhibited within
management of archeological data to enable archaeologists and a museum, historical buildings and famous squares in an old town
scholars of the ancient world to access the digital archives of center and so on).
European countries online and to be able to use new technologies In order to meet variety, velocity and volume of the managed
as an element of the methodology of archeological research. information, CHIS is characterized by the following technical
DC-NET (Digital Cultural heritage NETwork)8 aims to develop features that are typical of a Big Data platform [28,29]:
and strengthen coordination between public research programs
in the European countries involved in the field of digital cultural • capability to gather information from distributed and hetero-
heritage, through the development of data infrastructure and geneous data sources (e.g., Sensor Networks, Social Media Net-
services dedicated to virtual community research in digital cultural works, Digital Libraries and Archives, Multimedia Collections,
heritage. Web Data Services, etc.);
MeLa (European Museums and Libraries in/of the Age of
• advanced data management techniques and technologies;
• ability to provide useful and personalized data to users based
Migrations)9 aims to develop new strategies to organize multi-
on their preferences and context;
inter-trans-cultural conservation, presentation and transmission
• advanced information retrieval services, data analytics and
of knowledge, in ways and forms that reflect the conditions
other utilities, according to the SOA paradigm.
imposed by the migration of people and ideas in the global world.
In addition, several other projects dealing with the digitization and Regarding data sources, CHIS can manage the following kinds
use of cultural heritage have been proposed, with their emphasis of data, providing detailed information about the cultural heritage
on digital libraries and museums (EOD, Europeana 1914–1918 belonging to some specific geographic areas of Campania:
HOPE, Linked Heritage, etc.). • data coming from sensor networks deployed in a given cultural
environment;
• cultural items’ descriptions—coming from open web sources
1 http://www.futouring.it/web/filas/distretto-tecnologico. (e.g., Wikipedia) or from several digital libraries and archives;
2 www.culturaeinnovazione.it. • multimedia data – video, text, image and audio – associated
3 http://www.niker.eu/. with the various items of interest, retrieved from open (Social
4 http://www.hknow.eu/. Media Networks as Panoramio, Picasa, YouTube, and Flickr) or
5 http://www.smartmuseum.eu/. private collections;
6 http://www.ict-papyrus.eu/.
7 http://www.ariadne-infrastructure.eu/.
8 http://www.dc-net.org. 10 The High Technology District for Cultural Heritage (DATABENC) management
9 http://www.mela-project.eu. of the Campania Region, in Italy (www.databenc.it).
1136 A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145
• social data—e.g., user comments and opinions from common corresponding either to a single point or to a set of lines and more
on-line social networks like Facebook, Twitter, etc.; complex polygons of the considered environment.
• web service data—any kind of useful data gathered by means of In particular, we can adopt both ‘‘literals’’ or a set of URIs
web services. (Uniform Resource Identifiers), that allow to access the related
cultural information according to the Linked Data/Linked Open Data
Thus, CHIS provides all the necessary retrieval and presentation
(LD/LOD) paradigms, as values of the annotation attributes.
functionalities to search information of interest and present it to
the users in a suitable format and according to their needs. Definition 2.2 (Semantic Annotation). Given a set of ontologies
By means of a set of ad-hoc APIs, and exploiting the provided O , an annotation schema λO and cultural item CI, a Semantic
services, our system can support several applications: mobile Annotation of CI is a tuple λO (CI) = (a1 , . . . , an , b1 , . . . , bm ),
multimedia guides for cultural environments, web portals to where ∀i ∈ [1, n], ai ∈ dom(Ai ) and ∀j ∈ [1, m], bi ∈ dom(Bi )
promote the cultural heritage of a given organization, multimedia
recommender and storytelling systems and so on. Using various sets of ontologies/taxonomies and semantic
The paper is organized as follows. Section 2 describes the annotations, we can thus describe a cultural item from different
proposed data model. Section 3 presents the system architecture points of view supporting several applications.
and related functionalities with several implementation details. A large set of relationships can also be instantiated among
Section 4 reports a possible application of our system to support cultural items and the entire system Knowledge Base (KB) can be
a mobile multimedia guide for the archeological site of Paestum. modeled as a particular graph.
Eventually, Section 5 discusses some conclusions and the future
work. Definition 2.3 (Knowledge Base). The Knowledge Base is a graph
G = (C , R): each node c ∈ C can be a cultural item or an ontological
2. Data model attribute (concept), while each edge r ∈ R represents a relationship
derived from a semantic annotation or established between two
As previously described, the introduced data model for the cultural items.
management of cultural contents relies on the concept of ‘‘Cultural All possible relationships in the model are opportunely defined
Item’’ (CI), that is in turn related to a given ‘‘Cultural Environment’’ ‘‘a-priori’’ and the related meaning can be found in a proper
(e.g., a museum, an archeological site, an old town center, an thesaurus.
historical building etc.). Examples of CIs are specific ruins of an Fig. 1 shows how a portion of knowledge related to the Paestum
archeological site, sculptures and/or pictures exhibited within a ruins can be easily represented in our model. In particular, we can
museum, historical buildings and famous squares in an old town see some of the several cultural items a tourist in the Paestum
center and so on. ruins can be interested in: the Archeological Museum of Paestum
In the Cultural Heritage domain, a CI can be opportunely (containing evidence from the entire archeological area and related
described with respect to a variety of annotation schemata, for to a specific POI on the cultural environment map), the Temple
example the archeological view, the architectural perspective, the of Neptune, and the Tomb of the Diver fresco (the famous burial
archivist vision, the historical background, etc., that usually exploit painting belonging to the Greek period in Paestum) that is located
different harvesting sets of ‘‘metadata’’ and possibly domain within the museum. The non-ontological attributes of the temple
taxonomies or ontologies [30]. are essentially images related to planimetry (information useful
In a simplified way, we consider an ontology O = (V , E ) as a for architects) and some pictures retrieved from social media
network of concepts belonging to a given domain of interest, where networks; in turn, the fresco’s non-ontological attributes represent
a node v ∈ V represents a ‘‘concept’’ and an edge e ∈ E represents information related to Art History (images and text from digital
a relationship between two concepts11 . archives), but also comments from on-line social networks and
Thus, we define an annotation schema and a semantic annota- environmental measures collected by the sensors deployed in
tion [31] for a cultural item as follows. the museum rooms. Form the other hand, ontological attributes
are then referred to different metadata (type and period for the
Definition 2.1 (Annotation Schema). Given a set of ontologies O , an museum artifact, and type and materials of the ancient building)
Annotation Schema is a particular tuple λO = (A1 , . . . , An , B1 , . . . , whose values can be linked to the nodes of taxonomies and
Bm ), where A1 , . . . , An are attributes for which ∀i ∈ [1, n], ∃O = ontologies available for the Cultural Heritage domain. Leveraging
(V , E ) ∈ O s.t. dom(Ai ) ⊆ V (i.e., Ai assumes values corresponding different annotation schemata and available ontologies, our model
to nodes of some ontology), and B1 , . . . , Bm are attributes for which allows achieving interoperability goals [32].
∀j ∈ [1, m], ̸ ∃O = (V , E ) ∈ O s.t. dom(Bj ) ⊆ V (i.e., Bj does not The Knowledge Base content can be easily exported in the
assume values corresponding to nodes of any ontology). most used formats (e.g., XML, RDF, OWL) and according to the
most diffused harvesting standards for CH applications (e.g., EDM,
In other words, the attributes A1 , . . . , An are Ontological At- Italian ICCD, etc.). On the other hand, the LD/LOD paradigm permits
tributes (OAs) and correspond to concepts that are relevant for us to deal with several problems related to data consistency
the specific domain(s) being modeled. In turn, Non-Ontological At- and copyright constraints in an effective manner: some cultural
tributes B1 , . . . , Bm (NOAs) can contain other useful information, items descriptions are accessible only using URI, thus the data
such as measures returned by the sensors attached to cultural management issues are in charge to the related source.
items, or multimedia objects (e.g., audio, video, images, texts and
3D models, etc.) characterized by a set of low-level features and 3. System description
other metadata.
In addition, a CI may be associated with a specific ‘‘Point of Our system has to manage a large set of data characterized
Interest’’ (P oI), defined by a set of geographic coordinates, and by significant variety, volume and velocity, representing the well-
known big data features.
First of all, we have to deal with the large and heterogeneous
11 Note that a taxonomy is a particular ontology only containing ISA (con- amount of information related to the different cultural items, as an
cept–subconcept) relationships. example:
A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145 1137
• ontology based and query expansion semantic retrieval, machines available on the network. Furthermore, coping with ever
• browsing of a CI collection. increasing data volumes characterized by continuously changing
types and formats/structures, requires extremely flexible and
The data analytics layer is based on different Analytics Services
elastic storage management policies capable to follow the evolving
allowing to create personalized ‘‘dashboards’’ for a given cultural
demand and go beyond the vertical scaling limits of the traditional
environment. In addition, it provides basic data mining, graph
storage architectures. This implies the adoption of new fully
analysis and machine learning algorithms useful to infer new
distributed storage paradigms providing horizontal scalability and
knowledge.
hence virtually unlimited storage space available.
The Infrastructure as a Service (IaaS) facilities provided by mod-
3.2. Functionalities and implementation details ern cloud organizations allow multiple tenants to dynamically ob-
tain or dispose runtime/storage resources available as virtual ma-
3.2.1. Resource management chines (VMs) or containers according to a pay-per-use paradigms.
Resource management is an architectural element of paramount Depending on the cloud service providers’ commercial strategies
importance when designing big data processing solutions on a and management policies these object may be provided with fixed
wide area scale. In such a scenario, typically characterized by or variable amounts of virtual CPUs/cores, available memory and
strong runtime and storage distribution needs, the right resource disk space, so that the involved tenants can modify the number
management choice can result in significant improvements in both of VMs/containers or their available resource pool when their re-
performance and fairness. Indeed, running complex analytics ap- source demands varies over time.
plications on huge amounts of data, continuously collected from Our resource management strategy is based on a multi-tenant
hybrid and heterogeneous sources, cannot be effectively accom- cloud model, where tenants can buy multiple VMs or containers,
plished neither on a single machine nor by relying on a centralized characterized by heterogeneous resource demands (resources of
data repository. different types, such as storage space or memory), to run their
First, we need to break down processing activities into smaller applications. The fundamental resource management tasks include
tasks to be distributed and managed in parallel on multiple discovery, scheduling, allocation and performance monitoring.
A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145 1139
The discovery activity refers to the identification of a suitable definition and management activities as well as a set of slaves,
pool of resources that can be used to match the users’ demands, known as executors, that perform all the computational tasks. In
whereas the scheduling task aims at selecting the best available this scenario, RDD objects can be implemented in several different
options between the matching resources in the above pool, in ways:
order to perform provisioning to the requiring runtime objects.
• by using a file within a distributed file system, such as HDFS;
This implies allocating runtime or storage resources to specific • by partitioning them into multiple pieces to be delivered on
VMs, Containers and processes, whose usage will be continuously different nodes, through the parallelization of a collection of
monitored until their final release, after their legitimate use. elements within the driver program;
Several runtime resource management technologies exploiting • by transforming them through an operation such as Map
multiple levels of parallelism in multi-core, many-core, cluster, (passing their components through a user-provided function
grid or cloud computing architectures are available, differentiating that maps a set of element A into another set B) and
in their scalability, performance, cost, usage flexibility, robustness filter (selecting all the elements that match a user-defined
etc. [34]. Analogously, many Storage resource management predicate);
options are available for optimizing the efficiency of shared pools • by modifying the persistence characteristics of an already
of storage devices distributed throughout the network and support existing RDD, e.g., materializing a block of ephemeral data to
their flexible configuration, together with storage tiering and be used in the parallelization of some processing operations,
performance monitoring, by also forecasting future space demands or discarding it when no more needed. In detail, there are
and usage patterns. two specific operations that can be used to cope with data
In particular, YARN [35] is actually considered one of the most persistence: ‘‘save’’ that writes the dataset to HDFS after
effective and promising resource management solutions for data- evaluating it, and ‘‘cache’’ that suggest to keep a dataset in
intensive processing in distributed cloud-empowered scenarios. memory when possible.
Its Hadoop-based architecture provides the foundation enabling
big data processing applications to share computing clusters and Spark is also able to perform query processing and evaluation
storage space over the cloud while guaranteeing consistent service on big data, that can be extremely useful in optimizing huge
levels and acceptable response times. We also selected the Apache data management workflow, by also providing an high level
Spark [36] framework, developed at UC Berkley, in order to Application Programming Interface that can introduce significant
support in-memory data sharing across complex, multi-step data effects on productivity in applications development. Applications
pipelines implemented by using a direct acyclic graph abstraction, can request distributed processing operations such as map, reduce
so that multiple different applications can simultaneously work and filter by passing specific closures (i.e., functions) to the Spark
on the same datasets. Spark relies on the Hadoop Distributed File runtime framework. Like in traditional functional programming
System (HDFS) in order to provide advanced storage and access environments, such closures are typically referred to processing
functionalities within a unified and flexible big data management entities and variables that only exist within the scope in which
solution that is able to cope with a wide range of use cases and they have been instantiated, so that, when running a closure on
requirements. Essentially Spark consists in an execution engine a specific node, such entities have to be copied on the node itself.
that is able to operate both in-memory and on stable storage The overall system architecture is made by using multiple
devices. It keeps intermediate processing results in memory rather virtual machines rent by a cloud provider, each equipped with
than on semi-permanent storage, that becomes extremely useful Hadoop/YARN and Spark packages, accessing an horizontally
mainly when the same data need to be accessed multiple times scaling dynamic storage space that increases on demand, according
by different users. However, Spark is able to effectively perform to the evolving data memorization needs. It consists in a certain
number of slave virtual machines (that can be increased over time
data processing also when their aggregate volume does not fit into
in presence of changing demands) and two additional ones that
the available memory. In these cases it tries to split the data by
run as masters: one for controlling HDFS and another for resource
putting as much as possible in memory and leaving the remaining
management.
part on semi permanent storage devices. Due to its in-memory
data storage functions and near real-time processing capabilities,
it can obtain the best from the MapReduce paradigm, limiting the 3.2.2. Data gathering from big data sources
effect of expensive shuffling operations in processing activities, by One of the most important functionalities provided by our
also achieving significantly improved performances respect to the system consists in the capability of gathering the different kinds
other existing big data solutions. of data from the various Big Data sources.
Spark provides an abstraction of memorization capabilities In particular, we distinguish four classes of data sources that
based on distributed and fully reliable atomic storage elements require different wrapping techniques: Sensor & User Data, Social
known as Resilient Distributed Datasets (RDDs) that can be resident Data, Digital Repository Data and Web Data.
in memory or on stable storage, depending on their access patterns. Sensor and User Data
Several operations can be accomplished on these element by The Sensor Data generated by different devices are managed
generating new RDDs: using the ‘‘publish–subscribe’’ paradigm. For such kind of data, we
have to deal with data stream management challenges [37]; in
• through transformation operators, such as map, filter and re-
addition, event processing techniques can be exploited to reduce
duceByKey, that define the new elements without immediately
the amount of information flow [38].
processing them,
In particular, the wrapper for such kind of data is composed by a
• as values resulting from action operators like count, collect and
set of Gateways and Message Brokers together with a Coordination
save, returning their output to applications or exporting some
and Discovery module. Data generated by sensors are captured by
datasets to stable storage.
gateways and sent to Knowledge Base by message brokers; from
For efficiency purposes RDDs can be implemented according to the Knowledge Base side, a Stream Processing Engine receives
multiple caching/storage options such as, for example, MEMORY messages and convert them in a JSON format in according to our
ONLY, MEMORY and DISK and DISK ONLY. data model.
In order to support applications runtime facilities Spark creates In the current system implementation we deal only with
a master program known as driver that is in charge for RDDs Wireless Sensor Network (WSN) data sources. We leverage the PerLa
1140 A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145
15 Note that multimedia data that are managed by the system are suitably filtered
before the storing process. The number and kinds of multimedia data required by
the application are tuned by means of configuration parameters. 16 http://pro.europeana.eu/edm-documentation.
1142 A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145
environmental measures or users’ comments from an Online Social • All the basic analytics and query facilities are provided by Spark
Network to each cultural item. Similarly, the archeological aggre- based on HDFS;
gation supports the description of a cultural item by metadata that • Semantics of data can be specified by linking values of high-
are useful for the annotation work of an archaeologist. Two fur- level metadata to some available internal (managed by Sesame)
ther EDM basic concepts are considered and mapped in our model: or external ontological schemata.
Place (where the object is located corresponding to our P oI entity)
and Time Span (allowing to deal with different temporal views of a This heterogeneous Knowledge Base provides basic Restful
cultural item). APIs to read/write data and further functionalities for import-
Possibly, a certain number of web resources (e.g., images, ing/exporting data in the most common diffused Web standards.
videos, texts, etc.) can be linked to each item. Finally, metadata We chose to adopt such heterogeneous technologies in order
semantics is provided by the set of annotation schemata (in XML, to meet the specific requirements of the applications dealing with
RDF or OWL formats). For instance, the data can be represented as the huge amount of data at stake. For example, Social Networking
sequences of triples (⟨ subject, predicate, object ⟩) according to the applications typically benefit of graph database technologies
described data model. because of their focus on data relationships. In turn, more efficient
In particular, the KB is based on several technologies (see
technologies (key–value ore wide-column stores) are required
Fig. 8):
by Tourism applications to quickly and easily access the data of
• The data describing basic properties of CIs (e.g., name, short interest.
description, etc.) and basic information on users profiles are
stored into a key–value data store (i.e., Redis).
• The complete description in terms of all the metadata of CIs 3.2.4. Data processing and analytics
using the different annotation schemata are in turn saved using The search of data useful for the applications can be eased by
a wide column data store (i.e., Cassandra). We use a table for using different Information Filters that implement the right queries
each kind of CIs having a column for each ‘‘metadata family’’; to the various databases.
column values can be literals or URIs. The implementation of such filters is based on Spark, and we
• The document store technology (i.e., MongoDB) is used to deal distinguish three four kinds of query on the KB:
with JSON messages, complete user profiles and descriptions
of internal resources (multimedia data and textual documents, • query by keywords/tags: through such a query a user/application
etc.) associated with a cultural item. can search a set of CIs using keywords (as in Google search
• All the relationships among cultural items within a cultural engine) or specific tags (the query is then ‘‘expanded’’ with
environment and interactions with users (behaviors) are similar search terms leveraging the system thesauri);
managed by means of a graph database (i.e., Titan). • query by metadata: by this query a user/application can
• The entire cartography related to a cultural environment search a set of CIs using specific metadata of internal or
together with P oIs is managed by a GIS (i.e., PostGIS), which external annotation schemata (the query for OAs is semantically
provides the functionalities to filter and visualize on a map the ‘‘expanded’’ with the concepts of managed ontologies that are
geographic area around a given P oI; similar to the target one);
• Multimedia data management have been realized using the
Windsurf library [41];
• query by example: through such a query a user/application can
• We exploit an RDF store (i.e., different Allegrograph instances) search a set of multimedia contents – related to CIs – that
to memorize data views in terms of triples related to a are similar to a target one (the query processing is base on the
given cultural environment and useful for specific applications, Windsurf multimedia libraries);
providing a SPARQL endpoint for the applications; • query by user preferences: user profiles are exploited to find the
• All system configuration parameters, internal catalogs and set of cultural items that are more similar to user preferences
thesauri are stored in a relational database (i.e., PostegreSQL) using co-clustering techniques [42].
A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145 1143
The different types of query are combined by the Query Engine Doric style (the First Temple of Hera, also called Basilica, the Second
that provides a system query endpoint and support different query Temple of Hera, also known as Temple of Neptune, and the Temple of
languages. Athena), the Roman Forum with several ruins, and the Amphitheater.
Eventually, several analytics can be performed on the stored The buildings are surrounded by the remains of the city walls. In
data using Graph Analysis and Machine Learning library provided by addition, a museum near the ancient city contains many evidences
Spark. As an example, the analysis of the graph coding interactions of the Graeco-Roman life (e.g., amphorae, paintings and so on).
among users, multimedia objects and cultural items can be The ancient buildings, together with the museum and its
properly used to support recommendation or influence analysis main artifacts, constitute the set of cultural items for our case
tasks[43]. study. Tourists, both from their places and while visiting ruins,
can browse these cultural items and enjoy a useful multimedia
4. System running example guide describing them, or be recommended other nearby places,
comments of other users and other information of interest.
In this section we describe a possible application of our system As an example, suppose that some users from the nearby
to support the development of a multimedia guide for the Paestum seaside resorts would like to know more about the ruins in front of
archeological site. them, beyond their names and a very generic and basic explanation
The archeological site of Paestum is one of the major Graeco- of the main cultural items.
Roman cities in Southern Italy. Here, a set of ancient buildings is When users search a specific cultural item, as an example
the main cultural attraction for a tourist: three main temples of the Temple of Neptune, our system provide a basic description
1144 A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145
with the related multimedia objects (i.e., audio, images, video and References
texts) and detailed users’ comments. The list of proposed cultural
items with descriptions and multimedia objects depends on the [1] A. Caragliu, C. Del Bo, P. Nijkamp, Smart cities in Europe, J. Urban Technol. 18
(2) (2011) 65–82.
user’s preferences and system settings: images rather than voice, [2] H. Schaffers, N. Komninos, M. Pallot, B. Trousse, M. Nilsson, A. Oliveira, Smart
or expert-level rather than layman-level descriptions of the art cities and the future Internet: Towards cooperation frameworks for open
pieces; specific metadata and annotation schemata. In addition, innovation, in: The Future Internet Assembly, Springer, 2011, pp. 431–446.
[3] J.M. Hernández-Muñoz, J.B. Vercher, L. Muñoz, J.A. Galache, M. Presser, L.A.H.
query by example facilities can be exploited to determine other Gómez, J. Pettersson, Smart cities at the forefront of the future Internet, in: The
images that are similar to a given multimedia object. In addition, Future Internet Assembly, Springer, 2011, pp. 447–462.
a semantic based search can be performed on specific ontological [4] N. Komninos, H. Schaffers, M. Pallot, Developing a policy roadmap for
smart cities and the future Internet, in: eChallenges e-2011 Conference
attributes to find other cultural items of the same type. At the same Proceedings, IIMC International Information Management Corporation, 2011,
time, users can choose to retrieve some interesting information, IMC International Information Management Corporation.
to read comments, opinions and ratings about the visited cultural [5] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D.
Patterson, A. Rabkin, I. Stoica, et al., A view of cloud computing, Commun. ACM
items and to express their own ratings and opinions that will be 53 (4) (2010) 50–58.
posted on social networks. [6] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, M.B.
Fig. 9 shows a running example (obtained by assembling Srivastava, Participatory sensing, in: Proceedings of the Workshop on World-
Sensor-Web, WSW 2006, Boulder, Colorado, USA, Nov. 2006, pp. 6–10.
different screenshots of our system) concerning the search of CIs [7] C. Bizer, T. Heath, T. Berners-Lee, Linked data-the story so far, Semant. Serv.
related to the Paestum ruins. Users can browse the data by means Interoperability Web Appl.: Emerg. Concepts (2009) 205–227.
of an appropriate GUI; they can filter objects belonging to a given [8] T. Berners-Lee, J. Hendler, O. Lassila, et al., The semantic web, Sci. Am. 284 (5)
(2001) 28–37.
CI using different criteria: type of multimedia data, language, size, [9] S. Lohr, The Age of Big Data, New York Times, 2012.
and so on. [10] C. Bizer, P. Boncz, M.L. Brodie, O. Erling, The meaningful use of big data: four
Thus, the information can be dynamically tailored on the users perspectives–four challenges, ACM SIGMOD Rec. 40 (4) (2012) 56–60.
[11] F. Colace, M.D. Santo, V. Moscato, A. Picariello, F.A. Schreiber, L. Tanca,
depending on their preferences and needs. As an example, young Patch: A portable context-aware atlas for browsing cultural heritage, in: Data
people with high-school education and basic notions of history and Management in Pervasive Systems, 2015, pp. 345–361.
art could receive only some general concepts about the cultural [12] L. Tan, N. Wang, Future internet: The internet of things, in: 2010 3rd
International Conference on Advanced Computer Theory and Engineering, vol.
items, while CH experts could be interested in more detailed 5, (ICACTE), IEEE, 2010, pp. V5–376.
descriptions and richer multimedia data. [13] L. Atzori, A. Iera, G. Morabito, The Internet of things: A survey, Comput. Netw.
Eventually, other cultural items of interest can be automatically 54 (15) (2010) 2787–2805.
[14] J. Zhang, V. Varadharajan, Wireless sensor network key management survey
suggested using recommendation techniques that exploit sontext and taxonomy, J. Netw. Comput. Appl. 33 (2) (2010) 63–75.
information, user preferences and needs and items’ features [42]. [15] J.J. Rodrigues, P.A. Neves, A survey on ip-based wireless sensor network
All the information and functionalities (provided as API) can be solutions, Int. J. Commun. Syst. 23 (8) (2010) 963–981.
[16] N. Mohamed, J. Al-Jaroodi, A survey on service-oriented middleware for
exploited by third-part applications such as a multimedia guide for wireless sensor networks, Serv. Oriented Comput. Appl. 5 (2) (2011) 71–85.
the Paestum ruins. [17] H. Alemdar, C. Ersoy, Wireless sensor networks for healthcare: A survey,
The system prototype is currently running on a cloud-based Comput. Netw. 54 (15) (2010) 2688–2710.
[18] S. LaValle, E. Lesser, R. Shockley, M.S. Hopkins, N. Kruschwitz, Big data,
big data processing environment based on Spark. In particular, we analytics and the path from insights to value, MIT Sloan Manage. Rev. 52 (2)
exploited 5 computing nodes, each one composed by 8 cores and (2011) 21.
15 GB RAM. [19] P. Russom, et al. Big data analytics, TDWI Best Practices Report, Fourth Quarter,
2011, pp. 1–35.
[20] M. Damova, A. Ontotext, B.A. Kiryakov, M. Grinberg, M.K. Bergman, F. Giasson,
5. Conclusions and future work K. Simov, Creation and integration of reference ontologies for efficient lod
management, Semi-Automat. Ontol. Dev.: Process. Resour.: Process. Resour.
(2012) 162.
In this paper we showed an application and some examples of [21] R. Ruggles, Knowledge Management Tools, Routledge, 2009.
[22] H.M. Khoury, V.R. Kamat, Evaluation of position tracking technologies for user
the use of Big Data technologies for the Cultural Heritage domain. localization in indoor construction environments, Autom. Constr. 18 (4) (2009)
In particular, we described CHIS—a scalable prototype for the 444–457.
[23] L.M. Ni, D. Zhang, M.R. Souryal, Rfid-based localization and tracking
management and context-driven browsing of cultural environ-
technologies, IEEE Wirel. Commun. 18 (2) (2011) 45–51.
ments. The system is characterized by several features that are [24] K. Kabassi, Personalisation systems for cultural tourism, in: Multimedia
typical of the modern big data systems: (i) information gather- Services in Intelligent Environments, Springer, 2013, pp. 101–111.
ing from distributed and heterogeneous pervasive data sources [25] F. Ricci, L. Rokach, B. Shapira, Introduction to Recommender Systems
Handbook, Springer, 2011.
(e.g., Sensor Networks, Social Media Networks, Digital Libraries [26] C. Becker, F. Dürr, On location models for ubiquitous computing, Pers.
and Archives, Multimedia Collections, Web Data Services, etc.); (ii) Ubiquitous Comput. 9 (1) (2005) 20–31.
context-awareness, provisioning of useful and personalized data [27] J. Scott, Social Network Analysis, Sage, 2012.
[28] C.P. Chen, C.-Y. Zhang, Data-intensive applications, challenges, techniques and
and services, on the base of users preferences and of the surround- technologies: A survey on big data, Inform. Sci. 275 (2014) 314–347.
ing environment; (iii) advanced data management techniques and [29] F. Colace, M. De Santo, V. Moscato, A. Picariello, F.A. Schreiber, L. Tanca,
technologies for dealing with the information variety, velocity and Pervasive systems architecture and the main related technologies, in: Data
Management in Pervasive Systems, Springer, 2015, pp. 19–42.
volume; (iv) advanced information retrieval facilities, data analyt- [30] M. Doerr, S. Gradmann, S. Hennicke, A. Isaac, C. Meghini, H. van de Sompel,
ics and other utilities/services. The Europeana data model (edm), in: World Library and Information Congress:
We detailed all the design choices and characteristics of 76th IFLA General Conference and Assembly, 2010, pp. 10–15.
[31] F. Amato, A. Mazzeo, A. Penta, A. Picariello, Building rdf ontologies from semi-
the implemented big data infrastructure, providing a system structured legal documents, in: 2008 International Conference on Complex,
running example concerning the cultural data management of the Intelligent and Software Intensive Systems, 2008.
archeological site of Paestum. [32] F. Amato, A.R. Fasolino, A. Mazzeo, V. Moscato, A. Picariello, S. Romano, P.
Tramontana, Ensuring semantic interoperability for e-health applications,
Future work will be devoted to collect the huge amount of data in: 2011 International Conference on Complex, Intelligent and Software
related to all the different cultural objects of the Campania region Intensive Systems, (CISIS), IEEE, 2011, pp. 315–320.
and to experiment our system from the efficiency and effectiveness [33] X.L. Dong, D. Srivastava, Big data integration, in: 2013 IEEE 29th International
Conference on Data Engineering, (ICDE), IEEE, 2013, pp. 1245–1248.
points of view with respect to the information retrieval and [34] J.L. Reyes-Ortiz, L. Oneto, D. Anguita, Big data analytics in the cloud: Spark on
filtering tasks providing a comparison with other systems. hadoop vs mpi/openmp on beowulf, Procedia Comput. Sci. 53 (2015) 121–130.
A. Castiglione et al. / Future Generation Computer Systems 86 (2018) 1134–1145 1145