RDF4Led: An RDF engine for Lightweight Edge Devices

Anh Le-Tuan, Insight Centre for Data Analytics, National University of Ireland, Galway (anh.letuan@insight-centre.org)
Conor Hayes, Insight Centre for Data Analytics, National University of Ireland, Galway (conor.hayes@insight-centre.org)
Marcin Wylot, Technical University of Berlin (m.wylot@tu-berlin.de)
Danh Le-Phuoc, Technical University of Berlin (danh.lephuoc@tu-berlin.de)

ABSTRACT
Semantic interoperability for the Internet of Things (IoT) is being enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, our focus is on how to enable scalable and robust RDF engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at the edge enables the creation of semantic integration gateways for locally connected low-level devices. We introduce a lightweight RDF engine for lightweight edge devices, called RDF4Led, which comprises an RDF storage and a SPARQL processor. RDF4Led follows the RISC-style (Reduced Instruction Set Computer) design philosophy. The design comprises a flash-aware storage structure, an indexing scheme and a low-memory-footprint join algorithm, which improve scalability as well as robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than RDF engines such as Jena TDB and Virtuoso. On three types of ARM boards, RDF4Led requires 10-30% of the memory of its competitors to operate on datasets of up to 30 million triples; it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB.

Author Keywords
RDF Engine; Edge Device

ACM Classification Keywords
H.2.2 Physical Design: Access methods; H.2.4 Systems: Query Processing; H.3.4 Systems and Software: Performance evaluation

IOT '18, October 15-18, 2018, Santa Barbara, CA, USA. © 2018 ACM. ISBN 978-1-4503-6564-2/18/10. DOI: https://doi.org/10.1145/3277593.3277600

INTRODUCTION
Semantic interoperability for the Internet of Things (IoT) is being enabled by standards and technologies from the Semantic Web project [5]. This has led to several efforts to integrate Semantic Web technology into IoT platforms and applications [29]. Just as RDF has been used to integrate heterogeneous web data sources, RDF engines have been proposed as semantic integration gateways for IoT data [16].

Despite the existence of many centralised or cloud-based solutions, recent research suggests that IoT is better served by decentralised architectures [28, 22]. Placing computational nodes closer to source devices offers opportunities to improve performance and to reduce network overhead, but also flexibility for the continuous integration of new IoT devices and data sources.

Lightweight devices such as ARM boards that can serve as computational nodes have been getting cheaper and smaller whilst becoming more powerful. For example, a Raspberry Pi Zero [26] or a C.H.I.P computer [8] costs less than 15 euro and is smaller than a credit card. Devices such as these are powerful enough to run a fully functioning Linux distribution and can be placed on the network edge as a processing gateway for other IoT devices (e.g. sensors and actuators). For example, such a semantic integration gateway device may be used for an outdoor ad-hoc sensor network. On a street or a traffic junction, the gateway can easily be fitted in a lamp pole to share the power source with a street lamp powered by a small solar panel.

Despite their advantages in terms of power consumption, size and cost-effectiveness, lightweight edge devices are significantly underspecified with respect to the memory and CPU demands of the available RDF engines. The challenge that we address in this paper is how to build a dedicated class of RDF engine optimised for the hardware constraints of lightweight edge devices.

Lightweight edge devices differ from desktop workstations in two major ways: (i) they have a significantly smaller amount of main memory and (ii) they are equipped with lightweight flash-based storage as secondary memory. Besides the integration and processing of data, network edge devices are required to execute frequent update operations, for example, because new devices are added to the network or new data arrives from connected sensors. To manage large static RDF datasets, existing RDF engines build sophisticated indexing mechanisms that consume a large amount of main memory and are expensive to update. Directly applying the same approach on a memory-constrained device causes
system paging behaviours or out-of-memory errors that heavily penalise performance.

Furthermore, the I/O behaviours of flash-based storage, specifically the erase-before-write limitation [6], degrade the efficiency of disk-based data indexing structures and caching mechanisms [12]. For instance, flash-based storage keeps information in arrays of electric memory cells. Cells are organised into pages and pages are grouped into blocks. A page is the smallest unit that can be read from or written to flash memory. A block is the smallest structure that can be erased from such storage. On flash memory, updating a single page in a block is not possible; instead one has to erase the whole block, and only then can newly updated data be written to this block. Thus a write-in-place operation, which updates a single piece of data in a block, consists of two operations on the entire block: erase and write. Due to these flash I/O behaviours, the indexing structure commonly used in RDF triple stores [27], the B+ Tree, is not optimal for flash-based storage [14].

Inspired by the RISC design (Reduced Instruction Set Computer) of ARM computing boards, we introduce a RISC-style approach similar to [23] for building RDF engines. However, in contrast to [23], we focus on optimising robustness (minimising memory consumption) and scalability (maximising data capability), and not just processing performance.

Our approach is based on a redesign of the storage and indexing schemes in order to accommodate flash I/O behaviours. This has led to an improved join algorithm with significantly lower memory requirements. Our RDF engine, RDF4Led, has a small code footprint (4MB) and can outperform an RDF engine such as Jena TDB. Experiments show that RDF4Led requires less than 30% of the memory of competing RDF engines when operating on the same scale of data.

The remainder of the paper is structured as follows. In the second section, we present the fundamentals of RDF engines, explain why CPU-based and cloud-based approaches are not suitable for lightweight edge devices, and discuss several pieces of related work. The third section introduces an overview of our RISC-style approach. After that, in the following sections, we go into detail on our design for flash-aware storage, the indexing structure and the algorithm for dynamically computing the joins. In the evaluation section, we present and discuss the results of our empirical experiments of RDF4Led against other engines on different types of devices. Finally, the conclusion and the outlook for future work are presented in the final section.

BACKGROUND AND RELATED WORK
There has been significant attention paid to RDF data management, storage and query processing in recent years. We refer the reader to a recent survey of RDF data management techniques and research [27]. In general terms, the architecture of an RDF engine can be illustrated as in Figure 1.

Figure 1. RDF Engine architecture (Input, Query and Output handlers; Dictionary; Query Executor; Buffer Manager; Physical RDF Storage).

At the bottom layer, an RDF engine has a Physical RDF Storage as the secondary memory to store persistent data. The Physical RDF Storage is often coupled with a Buffer Manager for managing in-memory data. The Buffer Manager caches in-use data to reduce disk access when writing to the Physical Storage or when it is being read by the Query Executor. Typically, an RDF engine will use a Dictionary to translate the string-based RDF resource identifiers into encoded identifiers in the form of integers or longs. The Dictionary is often tied to the Input Handler and Query Parser to encode the RDF resources in RDF documents or SPARQL queries, and to the Output Handler to return the original form of the RDF resources. This technique reduces the storage space required for RDF triples and makes comparisons (for joins) more efficient.
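To make the dictionary step concrete, the sketch below shows a minimal in-memory dictionary that maps RDF terms to integer identifiers and back; the class and method names are illustrative assumptions, not identifiers from any of the engines discussed here.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal dictionary: maps RDF terms (IRIs/literals as strings) to int IDs and back.
// Encoded triples can then be stored and compared as plain integers.
public class TermDictionary {
    private final Map<String, Integer> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();

    // Returns the existing ID for a term, or assigns the next free ID.
    public int encode(String term) {
        Integer id = termToId.get(term);
        if (id == null) {
            id = idToTerm.size();
            termToId.put(term, id);
            idToTerm.add(term);
        }
        return id;
    }

    // Restores the original string form of an encoded identifier.
    public String decode(int id) {
        return idToTerm.get(id);
    }

    // Encodes a whole triple; joins only need to compare the resulting integers.
    public int[] encodeTriple(String s, String p, String o) {
        return new int[] { encode(s), encode(p), encode(o) };
    }
}

A persistent engine would back such a map with on-flash structures rather than a heap-resident map; the point is only that, after encoding, triples become fixed-width integer tuples that are cheap to store and compare.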
As with a conventional database system, the algorithms and techniques that build the Physical Storage, the Buffer Manager or the Query Executor of an RDF engine are optimised to cope with the nature of the data, in this case RDF, and with the particular hardware of the machine on which the engine runs [10, 25]. Existing RDF engines are optimised for CPU workstations or cloud infrastructures that are equipped with massive amounts of RAM and multiple disk drives, enabling them to store datasets of billions of triples and answer complex SPARQL queries. However, Anh et al. [18, 17] showed that such RDF engines suffer performance issues when they run on lightweight devices due to the significantly constrained hardware settings.

There have been several efforts to enable RDF data processing for lightweight devices. These approaches tended to sacrifice the features of a full-featured RDF engine rather than create a tailored version for the devices that they target. Mobile RDF [21] and µJena [19], RDF libraries for mobile applications, are notable examples of such approaches. These implementations offer a limited set of functions for manipulating RDF data; for example, both lack SPARQL query processing capabilities. On the other hand, AndroJena [2], an adaptation of Jena for the Android OS, offers the full functionality of the original Jena framework. Crucially, it fails to provide sufficient scalability: on a popular Android device such as the Nexus 7 [24], AndroJena can only store up to 200 thousand triples [17]. In earlier work, we pointed out the performance issues of directly porting PC-based implementations to small devices [17]. The RDF On-The-Go system is a native RDF storage system for Android phones that can store up to 5 million RDF triples.

Wiselib TupleStore [13] and µRDF [7] are recent RDF engines that have been tailor-built for resource-constrained IoT devices. They can store and process up to a thousand RDF triples on microcontrollers with less than 100 kB of RAM and a 100 MHz CPU. However, due to the hardware limitations of the targeted devices, these approaches are limited in terms of scalability and lack the full functionality of semantic data processing, such as support for SPARQL queries.

RISC-STYLE APPROACH OF RDF4LED
Aligned with the RISC-style design philosophy used in [23], the implementations of the features of an RDF engine are centred around data access and join operations. On top of that, processing loads and resource consumption are mainly incurred by these operations. Hence, our RDF engine
for lightweight devices, RDF4Led, is built by focusing on sophisticated designs for these operations and using simple implementations for the rest, thus reducing software size.

In general, RDF4Led has the same architecture as that of the traditional RDF engines illustrated in Figure 1. We reuse the Dictionary technique to transform RDF resources into encoded integers. The string representations of the RDF resources are kept separately on the flash memory. The key components that differentiate our approach from traditional RDF engines are the Physical RDF Storage, the Buffer Manager and the Query Executor. These are specifically built to suit the nature of the lightweight edge devices discussed above.

To reduce the storage space, we use a very compact format for storing a list of RDF triples, known as an RDF molecule [30]. To adapt to the flash I/O behaviour, the molecules are organised into block units whose size equals the flash erase block size. On top of that, we use an in-memory caching mechanism to cache the atomic data and to cluster the writes in order to improve write performance. To reduce the memory required to maintain the index of the data in flash storage, we use an alternative index structure based on the Block Range Index (BRIN) approach [20]. The basic idea of BRIN is to summarise the information about a data block of persistent storage (e.g. its location) into a small tuple. As a result, we minimise the amount of memory required to maintain the index of the data.

Managing the memory usage of the RDF engine is the vital factor for achieving robustness and scalability. The Buffer Manager is used to buffer updated data and to cache data read from the Physical Storage. Its primary role is to keep the engine safe from crashing due to unexpected out-of-memory exceptions. When needed, it flushes data to the Physical RDF Storage to reclaim free memory. Writes are prioritised by a buffer replacement policy designed to reduce the number of overwrites on the same data block as well as the number of reads from the Physical Storage.

The join operator, which computes the joins of the RDF triples that match the triple query patterns, is typically the most resource-intensive among the operators of a SPARQL query. To reduce computational cost, our approach is to avoid caching the intermediate results of joins. The Query Executor uses a nested execution model [11] for joining and processes the join in a one-tuple-at-a-time fashion [4]. In each run, the Query Executor adaptively chooses the next triple pattern to probe and to scan. The Buffer Manager is also tightly coupled with the Query Executor to provide cached data from the buffer for efficient use of memory.

STORAGE LAYOUT AND INDEXING
Following the multiple-indexing approach [27], RDF4Led stores RDF triples in three storage layouts as sorted permutations of triples: SPO (Subject - Predicate - Object), POS, and OSP. Three permutations are sufficient to cover all query patterns, e.g., the SPO layout can be used to cover the triple query patterns with a bound subject (s ? ?) and a bound subject-predicate (s p ?). Although storing all six combinations may answer complex queries more effectively, using only three consumes less storage space and decreases the cost of updates, i.e., we must update only three data structures instead of six, which is crucial for flash storage.
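As an illustration of the three-permutation scheme, the following sketch routes a triple pattern to one of the layouts based on which positions are bound; the enum and method names are our own, not part of RDF4Led.

// Chooses one of the three sorted layouts (SPO, POS, OSP) for a triple pattern.
// A pattern is given as encoded IDs, with null marking an unbound (variable) position.
public final class LayoutRouter {
    public enum Layout { SPO, POS, OSP }

    public static Layout choose(Integer s, Integer p, Integer o) {
        if (s != null) {
            // (s ? ?), (s p ?) and (s p o) are all prefixes of the SPO order.
            if (o != null && p == null) return Layout.OSP;  // (s ? o): OSP has prefix (o, s)
            return Layout.SPO;
        }
        if (o != null) {
            // (? p o) is a prefix of POS; (? ? o) is a prefix of OSP.
            return (p != null) ? Layout.POS : Layout.OSP;
        }
        // (? p ?) is a prefix of POS; (? ? ?) can scan any layout, SPO by convention.
        return (p != null) ? Layout.POS : Layout.SPO;
    }
}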
To maintain the index for the triples in each layout, we specifically introduce a Physical Layer and a Buffer Layer. The Physical Layer stores data directly on flash storage (the Physical RDF Storage), while the Buffer Layer operates in main memory (the Buffer Manager) and has the following main roles: (i) grouping and caching atomic data updates before writing a block; (ii) indexing the data stored on the Physical Layer; and (iii) caching recently used data for read performance. This allows us to group multiple updates within a block into one erase-and-write operation and to improve read performance through the cache.

Physical Layer: To achieve high compression of triples on the flash storage we leverage the molecule-based storage model. An RDF molecule [30] is a hybrid data structure: it stores a compact list of the properties and objects related to a subject, i.e., the root of the molecule. Molecule clusters are used in two ways: to logically group sets of related resources, and to physically co-locate information related to a given object. Physically, we represent a molecule as a list of co-located integers corresponding to S, P, and O (Figure 2). In this way, we avoid storing repetitive values multiple times. Moreover, we enable further data compression, e.g., by storing deltas of sorted integers instead of full values.

Figure 2. Logical and physical layout of a molecule (logical representation of subject s1 with properties p1, p2, p3 and objects o1-o6; physical layout: s1 p1 o1 o2 o3; p2 o4 o5; p3 o6).

In the Physical Layer, we store sorted molecules in contiguous pages (read units), which are grouped into blocks (erase units). Moreover, all entities within molecules are also sorted to improve search performance. Figure 3 gives an example of an SPO layout. In this example, each block in the Physical Layer contains four pages and each page stores molecules. The molecule in Figure 2 is stored in page 0 of the Physical Layer.

Figure 3. The two-layer storage model (a Buffer Layer of sorted entries such as (s1 p1 o1), (s2 p1 o5), ..., each pointing to pages 0-5 of the Physical Layer, which store the molecules; page 0 holds s1 p1 o1 o2 o3; p2 o4 o5; p3 o6).
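The physical layout in Figure 2 can be pictured as a flat integer sequence per molecule. The sketch below is a simplified, heap-based rendering of that idea; it uses an object count per predicate instead of the separators shown in Figure 2, performs no delta compression or page packing, and its names are illustrative only.

import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// A molecule groups all (p, o) pairs of one subject, mirroring the layout in Figure 2:
//   s1 | p1 o1 o2 o3 | p2 o4 o5 | p3 o6
// Keeping predicates and objects sorted preserves the SPO order inside the molecule.
public class Molecule {
    private final int subject;
    private final SortedMap<Integer, SortedSet<Integer>> propertyToObjects = new TreeMap<>();

    public Molecule(int subject) {
        this.subject = subject;
    }

    public void add(int predicate, int object) {
        propertyToObjects.computeIfAbsent(predicate, k -> new TreeSet<>()).add(object);
    }

    // Serialises the molecule into a flat integer sequence: the subject, then for each
    // predicate its ID, the number of its objects, and the sorted object IDs.
    public int[] toFlatLayout() {
        int size = 1;
        for (SortedSet<Integer> objs : propertyToObjects.values()) size += 2 + objs.size();
        int[] out = new int[size];
        int i = 0;
        out[i++] = subject;
        for (Map.Entry<Integer, SortedSet<Integer>> e : propertyToObjects.entrySet()) {
            out[i++] = e.getKey();
            out[i++] = e.getValue().size();
            for (int o : e.getValue()) out[i++] = o;
        }
        return out;
    }
}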
Buffer Layer: Similarly to the idea of BRIN, the Buffer Layer summarises the information about the data in the Physical Layer. In the Buffer Layer, we keep the information about the first triple of a molecule in a page and about the first page in a block. The Buffer Layer maps the physical addresses of pages and blocks, and thus acts as an index for the Physical Layer. We distinguish three types of entries in the Buffer Layer: tuple entries, page entries and block entries. A page entry is an entry that refers to the beginning of a page in the Physical Layer; it contains the first triple of the first molecule of the page. A block entry is a page entry with an extra field indicating that this page is the first page of a block. A tuple entry contains an atomic triple and a value indicating whether this triple has been modified.
In Figure 3, the grey columns represent block entries and the white columns represent page entries. The first grey column is the block entry that points to the molecule in page 0, the first page of a flash block. For fast lookups on the Buffer Layer, all triples are sorted. Moreover, this maintains the logical order of the triples, while the flash blocks in the Physical Layer do not need to be clustered. It also allows us to group and commit sequential pages that contain modified triples and belong to the same block within one write operation.

Index lookup: To retrieve the RDF triples that match a triple pattern, we execute a lookup on the corresponding layout of molecules. For example, a triple pattern (s, p, ?) is executed on the SPO layout, while triples that match the pattern (?, p, o) can be found with the POS layout. As triples are sorted, the matched tuples are retrieved as a sublist. For instance, the matched tuples of the triple pattern (s, p, ?) are the tuples in the SPO layout of the form (s, p, oi), and they are located between (s, p, omin) and (s, p, omax), where omin and omax are the smallest and the greatest object identifiers within the layout. In other words, the sublist is computed by finding the lower and upper bound positions of the matched tuples in the layout. As in the example, the lower bound position of the matched sublist is the position of the tuple (s, p, omin) and the upper bound position is the position of the tuple (s, p, omax). The matched triples are extracted from the layout by probing the range tuple by tuple from the lower to the upper bound position.

A position of a tuple is identified by a page, a molecule, and the exact position of the tuple within the page. To search for the lower bound position, we replace the variables of the triple pattern with Integer_min (the minimal integer value that can be used as an ID) and perform a simple search with the resulting tuple. For the upper bound, we replace the variables with Integer_max (the maximal integer value that can be used as an ID). The search first finds the page that contains the tuple by searching in the Buffer Layer. The page entries within the Buffer Layer point to the first triple of the first molecule of each page, and such entries are also sorted. Therefore, to find a page we perform a binary search over the page entries in the Buffer Layer. Then we read the page from the Physical Layer and perform a second binary search to find the exact position of the tuple within the page.
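A minimal sketch of this bound computation is given below, assuming the Buffer Layer keeps page entries as sorted encoded triples; it only reproduces the two binary searches described above, with hypothetical names and an in-memory page representation.

import java.util.List;

// Finds the range of triples matching a pattern in one sorted layout (e.g. SPO).
// Unbound positions are replaced by Integer.MIN_VALUE / MAX_VALUE to form the bound keys.
public class IndexLookup {

    // Lexicographic comparison of two encoded triples in the layout order.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    // Builds a search key from a pattern (null = variable) using the given fill value.
    static int[] key(Integer s, Integer p, Integer o, int fill) {
        return new int[] {
            s != null ? s : fill,
            p != null ? p : fill,
            o != null ? o : fill };
    }

    // First binary search, over the sorted page entries of the Buffer Layer: returns the
    // index of the last page whose first triple is <= the key, i.e. the candidate page.
    static int findPage(List<int[]> pageEntries, int[] key) {
        int lo = 0, hi = pageEntries.size() - 1, page = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(pageEntries.get(mid), key) <= 0) { page = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return page;
    }

    // Second binary search, within a decoded page (sorted triples): returns the index of
    // the first triple that is >= the key.
    static int lowerBound(int[][] page, int[] key) {
        int lo = 0, hi = page.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(page[mid], key) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}

The lower bound key is built with Integer.MIN_VALUE in the variable positions and the upper bound key with Integer.MAX_VALUE; the matched triples are then read tuple by tuple between the two resulting positions.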
WRITE MANAGER
The write from the Buffer Layer to the Physical Layer is handled by adapting to the I/O behaviours of flash memory, using the following basic principles: (i) minimise the number of physical writes to physical storage; (ii) group multiple updates within one write operation; (iii) keep a relatively high hit ratio for the data in the buffer. We use the Buffer Layer to delay write operations and to group many updates into blocks, hence mitigating the issue of single erase-before-write operations. For a high hit ratio, we keep in the buffer the data blocks with a higher chance of being accessed and modified in the future.

In order to keep track of the access frequency among blocks, we divide the buffer into two parts: cold and hot. In the hot part, we keep blocks that have been recently accessed and that we therefore do not want to move to the Physical Layer. We organise data in a flat fashion within the hot part, i.e., we keep sorted triples instead of molecules to speed up atomic updates. The cold part contains molecule blocks that can be moved to the Physical Layer. When we access a block from the cold part or from the Physical Layer, we place it in the hot part of the Buffer Layer.

Blocks are released from the hot part to the cold part, and from the cold part to the Physical Layer, according to the following prioritised criteria: (i) clean/unmodified blocks; (ii) blocks with a higher number of atomic triples in the buffer; (iii) higher-density blocks, where density is defined as the ratio between the number of triples in a block and the capacity of the block, i.e. density_A = #triples_A / capacity_A; (iv) the least recently used blocks.

Such an order allows us to keep dirty/modified blocks in the buffer as long as possible in order to delay write operations and group more updates within dirty blocks. In case we need to release memory, we always refer to the cold part and begin from the top of the priority list. Consequently, we prioritise clean/unmodified blocks, as we do not have to perform any write operation for them; we just release the memory they occupy. Then we prioritise blocks that contain many triples, i.e., high-density blocks, as they group multiple updates into one erase-write operation. A higher density also means there is less chance that the next incoming triple will fall into that block.
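To make the release order concrete, the comparator below ranks buffered blocks by the four criteria in the stated priority (clean first, then more buffered triples, then higher density, then least recently used); the BufferedBlock fields are assumptions for illustration, not the engine's actual types.

import java.util.Comparator;

// A buffered flash block as seen by the write manager (illustrative fields only).
class BufferedBlock {
    boolean dirty;          // has the block been modified since it was loaded?
    int bufferedTriples;    // atomic triples currently buffered for this block
    int triplesInBlock;     // triples already stored in the block on flash
    int capacity;           // maximum number of triples the block can hold
    long lastAccessTime;    // for the least-recently-used tie-breaker

    double density() {
        return (double) triplesInBlock / capacity;
    }
}

// Blocks that sort first are released first (hot -> cold -> Physical Layer).
class EvictionOrder implements Comparator<BufferedBlock> {
    @Override
    public int compare(BufferedBlock a, BufferedBlock b) {
        // (i) clean/unmodified blocks are released before dirty ones.
        if (a.dirty != b.dirty) return a.dirty ? 1 : -1;
        // (ii) prefer blocks that group more buffered updates into one erase-write.
        if (a.bufferedTriples != b.bufferedTriples)
            return Integer.compare(b.bufferedTriples, a.bufferedTriples);
        // (iii) prefer higher-density blocks: new triples are less likely to land there.
        if (Double.compare(a.density(), b.density()) != 0)
            return Double.compare(b.density(), a.density());
        // (iv) finally, release the least recently used block.
        return Long.compare(a.lastAccessTime, b.lastAccessTime);
    }
}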
ADAPTIVE STRATEGY FOR ITERATIVE JOIN EXECUTION
The most resource-intensive task in answering a SPARQL query is performing the graph pattern matching over the RDF dataset. The graph matching operator executes a series of join operations between the RDF triples that match the triple patterns. Join operations have the greatest impact on the overall performance of a SPARQL query engine, typically requiring a large number of comparison operations that can only be done efficiently if the records are stored in memory.

Join performance can be tuned by optimisation algorithms which plan optimal join orders and join algorithms. These approaches assume that memory is always available during the course of the execution of a chosen query plan. However, on lightweight computing devices memory is critically low and, as such, the memory available to an RDF engine is unreliable; e.g., a surge in the number of network connections to the device might drain the memory available to all other running processes. A lack of memory may block join operations that require temporary working memory, such as hash joins or sort-merge joins, and thus hurts the overall performance of the query engine or even crashes the engine.

Materialisation techniques that write intermediate join results to storage are an attractive solution to the issue of memory shortage [10]. However, on flash storage, writing is much slower when random writes occur. Furthermore, only a limited number of erase operations can be applied to a block of flash memory before it becomes unreliable.

To minimise the memory required to execute a SPARQL query, while making the best use of the indexing scheme introduced in the previous section, we adopt the one-tuple-at-a-time paradigm to compute the join. This approach reduces memory consumption as no temporary working memory is required to buffer the intermediate join results. The basic idea of the algorithm for computing the join of a graph pattern is as follows. A mapping solution (mapping for short) is continuously sent to visit each triple pattern of the graph pattern. In each visit, it searches for triples matching the triple
pattern. For each matched triple, the variables in the triple pattern and the corresponding values in the triple are added to the mapping. The mapping with the new values is then sent to visit the next triple pattern, or is returned as a query result when all triple patterns have been visited.

Algorithm 1: Join propagation
1  Function propagate(µ, P)
     input:  µ: mapping, P: set of triple query patterns
     output: µ: mapping
2    if isEmpty(P) then
3      return µ;
4    p ← findNextPattern(µ, P);
5    pkey ← createKey(µ, p);
6    T ← indexScan(pkey);
7    P′ ← P \ {p};
8    for t ∈ T do
9      µ ← bindMapping(t, p);
10     propagate(µ, P′);
11     µ ← resetMapping(t, p);

An outline of the join propagation algorithm is given in Algorithm 1. The propagate(µ, P) function is used to recursively propagate the input mapping. The function starts with an empty mapping µ and a set of unvisited triple patterns P. In each run, it checks whether there is a triple pattern left to visit and, if not, returns the input mapping as a result (lines 2-3). Based on the given input mapping, it looks for the optimal unvisited triple query pattern to visit (line 4). To search for triples compatible with pattern p, an index search key pkey is created by replacing the variables in p according to µ (line 5). For each matched triple t, the corresponding variables and values are bound into the mapping, and another propagation of the mapping to the remaining unvisited triple patterns is called (lines 8-11).

Algorithm 2: Find the next triple pattern
1  Function findNextPattern(µ, P)
     input:  µ: mapping, P: set of triple query patterns
     output: p: triple query pattern
2    pnext ← null;
3    smin ← Integer_max;
4    for p ∈ P do
5      if isShared(µ, p) then
6        pkey ← createKey(µ, p);
7        I ← indexLookUp(pkey);
8        s ← sizeOf(I);
9        if s < smin then
10         smin ← s;
11         pnext ← p;
12   return pnext;

In each run of the propagation algorithm, the function findNextPattern(µ, P) is called to find the optimal triple pattern with which to execute the propagation (see Algorithm 2). For each triple pattern p in P, the set of triple query patterns, the function checks whether p shares variables with the input mapping µ (line 5). For each shared pattern found, an index search key pattern pkey is created (line 6). An index lookup on pkey is executed to find the upper bound and lower bound positions of the set of matching triples in the index, as described in the previous section (line 7). The size of the index lookup I is defined as the range between the upper bound and lower bound positions (lines 8-10). The function returns the triple pattern that has the minimal index lookup size (line 12).

The join propagation algorithm is similar to nested iteration. A nested loop join is often argued to give poor performance as it does not attempt to prune the number of comparisons. However, equipped with an efficient index scheme, an index nested loops join can perform as well as other join algorithms [11]. With the design of our storage, the index lookup can be done mostly within the Buffer Layer; only two extra I/Os may be required. The visitor pattern, which sends a mapping to visit each triple pattern and to execute the index lookup, reduces the extra memory needed for the joins, as only a mapping is kept in main memory. This mechanism also enables adaptivity for the joins. The function findNextPattern(µ, P) decides which triple pattern the mapping should visit first. Similarly to the routing policy of stream processing engines, e.g. Eddies [4], this function defines the propagation policy used to achieve a certain optimisation goal. In our case, we attempt to minimise the number of propagations by choosing the shortest index scan in each run. Note that this is the key place to add more sophisticated optimisation algorithms, e.g. the adaptive caching algorithm to be discussed in our future work.
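For readers who prefer code to pseudocode, the following Java sketch mirrors Algorithms 1 and 2; the Mapping, TriplePattern and Index interfaces are placeholders introduced here for illustration and do not correspond to actual RDF4Led classes. Unlike line 11 of Algorithm 1, the sketch extends a copy of the mapping instead of resetting it, and it emits results through a callback.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Placeholder abstractions standing in for the engine's real types.
interface Mapping {
    Mapping bind(Triple t, TriplePattern p);      // add bindings for p's variables
    boolean sharesVariableWith(TriplePattern p);  // does µ already bind a variable of p?
}
interface TriplePattern {
    int[] searchKey(Mapping mu);                  // pkey: variables replaced per µ
}
interface Triple { }
interface Index {
    Iterable<Triple> scan(int[] key);             // triples matching the key
    long lookupSize(int[] key);                   // size of the matched index range
}

public class JoinPropagation {
    private final Index index;
    public JoinPropagation(Index index) { this.index = index; }

    // Algorithm 1: propagate a mapping through the unvisited triple patterns,
    // emitting a result whenever every pattern has been visited.
    public void propagate(Mapping mu, List<TriplePattern> patterns, Consumer<Mapping> out) {
        if (patterns.isEmpty()) { out.accept(mu); return; }        // lines 2-3
        TriplePattern p = findNextPattern(mu, patterns);           // line 4
        int[] key = p.searchKey(mu);                               // line 5
        List<TriplePattern> rest = new ArrayList<>(patterns);      // line 7: P \ {p}
        rest.remove(p);
        for (Triple t : index.scan(key)) {                         // lines 6, 8
            Mapping extended = mu.bind(t, p);                      // line 9
            propagate(extended, rest, out);                        // line 10
        }                                                          // line 11: µ itself is unchanged
    }

    // Algorithm 2: pick the shared pattern with the smallest index lookup.
    private TriplePattern findNextPattern(Mapping mu, List<TriplePattern> patterns) {
        TriplePattern next = null;
        long min = Long.MAX_VALUE;
        for (TriplePattern p : patterns) {
            if (!mu.sharesVariableWith(p)) continue;
            long size = index.lookupSize(p.searchKey(mu));
            if (size < min) { min = size; next = p; }
        }
        return next != null ? next : patterns.get(0);  // fall back if no pattern shares a variable
    }
}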
EVALUATION
We observe that the operating system for each type of device is usually specifically built to match its hardware configuration. For example, by default Raspbian is installed on the Raspberry Pi Zero, while a Galileo Gen II runs Yocto. Meanwhile, a Java virtual machine is available on most edge devices, and Java is platform independent. Hence, we chose to implement our approach in Java to take advantage of its "compile once, run anywhere" property, which makes our engine, RDF4Led, portable. We compare RDF4Led against Virtuoso [9] (open-source edition, v6.0) and Jena TDB [15] (v3.1.0). Note that we developed RDF4Led by reusing the Jena TDB code base following the RISC-style design presented in the previous sections, whereby we selectively chose the required components and modified them as needed. The size of RDF4Led is 4 MB, while the size of Jena is 13 MB and that of Virtuoso is 180 MB. The experiments are conducted on three types of lightweight computing devices: Intel Galileo Gen II (GII), Raspberry Pi Zero (Pi0), and BeagleBone Black (BBB). The configuration of each device is summarised in Table 1. We chose these devices because they are representative of the resource barriers of IoT gateways in terms of size, memory and cost.

In the following, we present the experimental setup and then report and discuss the evaluation results. Note that we also evaluated RDF4Led with other benchmarks and datasets. Due to the page limit, we only present the evaluation with the WatDiv benchmark [1]. The WatDiv benchmark provides queries of different complexity, where complexity is defined by the structure of the graph query pattern. Hence, compared to other benchmarks, WatDiv provides better insight into the performance of joins of different complexity. All experiments presented in this paper are
reproducible. The systems, guidelines for setting up, and scripts for running the experiments are publicly available on GitHub at https://github.com/anhlt18vn/iot-2018; experiments on other datasets such as LinkedSensorData can also be found in this repository.

Figure 4. Update throughput results: input throughput (triples per second) versus total number of triples for RDF4Led, Jena TDB and Virtuoso on (a) Intel Galileo Gen II, (b) Pi Zero and (c) BeagleBone Black.

Table 1. Hardware Configurations
          Pi0                       BBB                       GII
CPU       ARM 11, 1.0 GHz, 1-core   ARM A8, 1.0 GHz, 1-core   Quark, 0.4 GHz, 1-core
RAM       512 MB                    512 MB                    256 MB
Storage   Transcend MicroSD 16 GB class 10 (40 MB/s), all devices
OS        Raspbian                  Debian 7.0                Yocto

Experimental Setup:
The WatDiv benchmark provides a tool to generate sample datasets at different scales and sample SPARQL queries of different complexity. In the benchmark, the complexity of a SPARQL query is classified by the structure of its graph pattern. The queries are generated from a set of query templates which form query graph patterns with different shapes (e.g. linear (L), star (S) or snowflake (F)), numbers of triple patterns, and join vertex types. Based on this, we conducted the three following experiments:

Exp1 - Update throughput: In the first experiment, we tested how much new data the system can incrementally update with a particular underlying RDF store for each hardware configuration. We simulated the process of data growth by gradually adding more data to the system. We measured the throughput of inserting data (triples/second) until the system crashed or until the throughput fell below 10 triples/second (whichever happened first). A dataset of 30 million triples was used in this test.

Exp2 - Query evaluation: In the second experiment, we tested the query response time of each engine. On each device, we generated a dataset that all the engines can handle. For each dataset, we generated 100 queries for each query template provided by the benchmark. We used 15 query templates, 5 from each of the F, L and S sets. The total number of queries generated for each dataset was thus 1500. We recorded the maximum, the minimum and the average time that the engines took to answer each type of query.

Exp3 - Memory consumption: In the third experiment, we measured the memory consumption of the three systems while performing insertion and querying. The experimental application ran the different queries repeatedly and recorded the maximum memory heap that the operating system allocated for it. Note that the memory consumption is device-independent. To evaluate the impact of the data size on memory consumption, we conducted the test on the BBB with 15 datasets of different sizes and 15 sets of queries corresponding to each dataset. The scale ranged from 2 million triples to 30 million triples.
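As an illustration of how a peak-heap figure can be captured from inside a Java process, the sketch below polls the JVM's heap memory pools; this is our own example of one way to take such a measurement, not the instrumentation actually used in these experiments.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Reports the peak heap usage (in MB) observed by the JVM since start-up (or since the
// last resetPeakUsage call), summed over all heap memory pools.
public class PeakHeapProbe {
    public static long peakHeapMegabytes() {
        long peakBytes = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                peakBytes += pool.getPeakUsage().getUsed();
            }
        }
        return peakBytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        // ... run the insertion or query workload here ...
        System.out.println("Peak heap usage: " + peakHeapMegabytes() + " MB");
    }
}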
Result and Discussion:
Figure 4 illustrates the results of Exp1, in which we measured the update throughput of Virtuoso, Jena TDB and RDF4Led on the GII, Pi0 and BBB. The results show that, compared to Jena TDB and Virtuoso, RDF4Led can store larger datasets and achieve much higher update throughput. For instance, on the GII (see Figure 4a), RDF4Led was able to store up to 15 million triples, whereas Virtuoso could store only one-third of that size (5 million triples), and Jena TDB could store only one-seventh of that size (2 million triples). Due to their similar hardware settings, the scaling behaviours of these RDF engines on the Pi0 (see Figure 4b) and on the BBB (see Figure 4c) were similar. On both of these devices, Virtuoso and Jena TDB could store 20 million triples and 10 million triples respectively; meanwhile, RDF4Led was able to store the entire dataset of 30 million triples.

In this experiment, the update throughput of all three engines decreased as the size of their storage increased. Among the three RDF engines, RDF4Led had the highest insert throughput. It performed update operations 2-3 times faster than Virtuoso. Even when its storage size grew to nearly 30 million triples, RDF4Led's speed remained at 200-250 triples/second. Meanwhile, Jena TDB was the weakest performer. In all cases, it could only insert at a speed of less than 20 triples per second, even when its storage size was just over 1 or 2 million triples.

Exp1 has shown how the lack of memory on such devices negatively influences the scalability of PC-based RDF engines like Jena TDB and Virtuoso. In Exp1, the two engines stopped because of out-of-memory errors. RDF4Led could insert more data as it has a smaller memory footprint and requires less memory to maintain its indexes. Furthermore, compared to the other engines, RDF4Led inserted data faster because our flash-aware index structure and writing strategy are better matched to the flash I/O behaviours. Meanwhile, Jena TDB employs a B+ tree to index the RDF data in its storage. The critically low throughput of Jena TDB once more confirms the negative influence of flash I/O behaviours on the write performance of such a disk-based indexing technique.

In Exp2, we compared the query response time of RDF4Led against that of Virtuoso and Jena TDB. Due to the limitation on the data size that Jena TDB can handle, we used a dataset of 2 million triples for the query test on the GII and a dataset of 10 million triples for the tests on the BBB and Pi0. The results of these tests are shown in Figure 5. We also conducted query tests on a dataset of 20 million triples comparing only
RDF4Led and Virtuoso on the Pi0 and BBB. Due to the similar comparison results and the lack of space, we only present the test results on the BBB (see Figure 6).

Figure 5. Query response time (seconds, log scale) of Virtuoso, Jena TDB and RDF4Led for queries F1-F5, L1-L5 and S1-S5 on (a) Intel Galileo Gen II, (b) Raspberry Pi Zero and (c) BeagleBone Black.

Figure 6. Query response time (seconds, log scale) of Virtuoso and RDF4Led on the BeagleBone Black with the dataset of 20 million triples (queries F1-F5, L1-L5 and S1-S5).

On all the devices, RDF4Led answered all the queries considerably faster than Jena TDB did. Both RDF4Led and Jena TDB follow the nested execution model to compute the multiple joins between RDF triples that match the triple patterns. However, Jena TDB is implemented with the iterator pattern, while RDF4Led follows the visitor pattern. In general, both algorithms execute lookup operations and index scan operations to extract the compatible triples from the dataset. The performance of these algorithms is mainly determined by the performance of the lookup and index scan operations on the indexes. The better performance of RDF4Led against Jena TDB indicates that our lightweight index structure helps RDF4Led outperform the B+ tree implemented in Jena TDB.

With the same dataset and on the same device, RDF4Led only answered the queries generated from templates F2 and S1 faster than Virtuoso did. These queries contain graph patterns with more than 6 triple patterns that form a star shape. In the other cases, RDF4Led was slower than Virtuoso, as it does not aggressively pre-allocate a fixed amount of memory (2-3 times more) for sophisticated optimisation algorithms. We see this as an option for improving query performance in our future work. However, overall, at the current stage of the engine, RDF4Led delivers reasonably good performance for up to 30 million triples on this class of devices, e.g. 5 seconds at maximum and 1 second on average. This performance and scalability can enable devices of this kind to handle approximately 1 million sensor observations, or 6 months' worth of data from 10 weather stations, in an active RDF graph [3].

Figure 7. Memory consumption (MB) of Virtuoso, Jena TDB and RDF4Led versus total number of triples in the throughput test (a) and the query evaluation test (b), with the memory boundaries of the GII (about 210 MB) and of the Pi0 and BBB (about 350 MB) marked.

Figure 7 reports the results of Exp3. Note that, as a part of the memory is occupied by the operating system, the maximum memory available to applications is always lower than the size of the RAM. There is only 210 MB of available memory on the GII and nearly 350 MB on the Pi0 and BBB. On all three devices, the memory consumption of Jena TDB and RDF4Led gradually increased with the size of the storage, while the buffering memory of Virtuoso was statically set. In the throughput test (see Figure 7a), the memory consumption of Jena TDB rose to 210 MB and 350 MB after inserting 2 million and 10 million triples respectively. Meanwhile, Virtuoso required more than 210 MB to add more than 5 million triples, and 330 MB for 20 million triples. This explains the lower scalability of Jena TDB and Virtuoso compared to our engine. On the other hand, RDF4Led consumed at most 85 MB of memory even when the storage grew to 30 million triples. In the query evaluation tests, Jena TDB and RDF4Led used less memory than they did in the update throughput tests. Even with the dataset of 30 million triples, RDF4Led used only 80 MB. That is only half of the memory that Jena TDB used in the query test with 10 million triples, and one-third of the memory that Virtuoso constantly occupies.

CONCLUSION
This paper presented RDF4Led, a tailor-built RDF engine for lightweight edge devices. The engine is designed with the
RISC-style philosophy, which aims to close the gap between the resource limitations of the targeted devices and the scalability and robustness required of an RDF engine. Considering the distinct nature of the devices' hardware, RDF4Led comes with a flash-aware storage structure and an indexing scheme, together with a low-memory-footprint join strategy. As a result, RDF4Led is significantly smaller than generic RDF engines like Jena TDB and Virtuoso, yet can handle 2-5 times as much data as its counterparts on three types of ARM boards. Moreover, RDF4Led only needs 10-30% of the memory that Jena TDB and Virtuoso consume to operate on the same amount of data. It can handle up to 30 million triples with only approximately 85 MB of memory, and it still outperforms its competitors in update throughput and is significantly faster in answering queries than its Jena counterpart, Jena TDB.

Virtuoso can deliver faster query processing times by pre-allocating a fixed amount of memory, which is 3 times more than what RDF4Led needs, with much more complicated software packages (180 MB vs. 4 MB for RDF4Led). Therefore, in future work on improving query processing time, we will investigate adding an adaptive caching module to RDF4Led to enable dynamic caching of materialised intermediate results according to a provided memory threshold. Another attractive feature that RDF4Led can support in the next stage is disk-based reasoning, powered by our adaptive join strategy and high-write-throughput storage structure.

ACKNOWLEDGEMENTS
This publication has emanated from research supported in part by a research grant from the Irish Research Council under Grant Number GOIPG/2014/917 and by the Marie Skłodowska-Curie Programme H2020-MSCA-IF-2014 (SMARTER project) under Grant No. 661180.

REFERENCES
1. Aluç, G., Hartig, O., Özsu, M. T., and Daudjee, K. Diversified stress testing of RDF data management systems. ISWC (2014).
2. AndroJena. https://github.com/lencinhaus/androjena.
3. Atemezing, G., Corcho, O., Garijo, D., Mora, J., Poveda Villalón, M., Rozas, P., Vila Suero, D., and Villazón Terrazas, B. Transforming meteorological data into linked data. Semantic Web Journal (2012).
4. Avnur, R., and Hellerstein, J. M. Eddies: Continuously adaptive query processing. SIGMOD (2000).
5. Barnaghi, P., Wang, W., Henson, C., and Taylor, K. Semantics for the internet of things: Early progress and back to the future. Int. J. Semant. Web Inf. Syst. (2012).
6. Bouganim, L., and Bonnet, P. uFLIP: Understanding flash IO patterns. CoRR (2009).
7. Charpenay, V., Käbisch, S., and Kosch, H. µRDF store: Towards extending the semantic web to embedded devices. ESWC (2017).
8. CHIP Pro: The smarter way to build smart things. https://getchip.com/pages/chip.
9. Erling, O., and Mikhailov, I. RDF support in the Virtuoso DBMS. In Networked Knowledge - Networked Media (2009).
10. Garcia-Molina, H., Ullman, J. D., and Widom, J. Database Systems: The Complete Book, 2nd ed. 2008.
11. Graefe, G. Executing nested queries. In BTW 2003, Datenbanksysteme für Business, Technologie und Web (2003).
12. Graefe, G. The five-minute rule twenty years later, and how flash memory changes the rules. DaMoN (2007).
13. Hasemann, H., Kroller, A., and Pagel, M. The Wiselib TupleStore: A modular RDF database for the internet. CoRR (2014).
14. Ho, V., and Park, D.-J. A survey of the state-of-the-art B-tree index on flash memory. International Journal of Software Engineering and Its Applications (2016).
15. Jena TDB. https://jena.apache.org/documentation/tdb/.
16. Kiljander, J., D'Elia, A., Morandi, F., Hyttinen, P., Takalo-Mattila, J., Ylisaukko-Oja, A., Soininen, J. P., and Cinotti, T. S. Semantic interoperability architecture for pervasive computing and internet of things. IEEE Access (2014).
17. Le-Phuoc, D., Le-Tuan, A., Schiele, G., and Hauswirth, M. Querying heterogeneous personal information on the go. ISWC (2014).
18. Le-Tuan, A. Linked data processing for embedded devices. In Proceedings of the Doctoral Consortium at the 15th International Semantic Web Conference (2016).
19. µJena. http://poseidon.ws.dei.polimi.it/ca/?page_id=59.
20. BRIN indexes. https://www.postgresql.org/docs/9.5/static/brin.html.
21. Mobile RDF. http://www.hedenus.de/rdf/.
22. Munir, A., Kansakar, P., and Khan, S. U. IFCIoT: Integrated fog cloud IoT: A novel architectural paradigm for the future internet of things. IEEE Consumer Electronics Magazine (2017).
23. Neumann, T., and Weikum, G. RDF-3X: A RISC-style engine for RDF. Proceedings of the VLDB Endowment (2008).
24. Nexus 7. https://www.asus.com/Tablets/Nexus_7/.
25. Owens, A. Using Low Latency Storage to Improve RDF Store Performance. PhD thesis, University of Southampton, 2011.
26. Raspberry Pi Zero. https://www.raspberrypi.org/products/raspberry-pi-zero/.
27. Sakr, S., Wylot, M., Mutharaju, R., Phuoc, D. L., and Fundulaki, I. Linked Data - Storing, Querying, and Reasoning. Springer, 2018.
28. Satyanarayanan, M. The emergence of edge computing. Computer (2017).
29. Thuluva, A. S., Bröring, A., Medagoda, G. P., Don, H., Anicic, D., and Seeger, J. Recipes for IoT applications. IoT '17 (2017).
30. Wylot, M., Pont, J., Wisniewski, M., and Cudré-Mauroux, P. dipLODocus[RDF]: Short and long-tail RDF analytics for massive webs of data. ISWC (2014).
