
Data Stream Processing in Dynamic and Decentralized

Peer-to-Peer Networks
Timo Michelsen
University of Oldenburg, Department of Computer Science
Escherweg 2, 26129 Oldenburg, Germany
timo.michelsen@uni-oldenburg.de
supervised by H.-Jürgen Appelrath
expected graduation date: September 2015
ABSTRACT
Data stream management systems (DSMS) process data streams,
potentially infinite amounts of data sent by active data sources.
Distributed DSMS use networks of interconnected machines to en-
hance the processing power. Typically, clusters of equal, non-au-
tonomous machines are used. However, in some applications, a
cluster of computers is not available or not feasible, its acquisition
costs are too high, or it is too complex to deploy. An alterna-
tive would be to use a collection of notebooks, personal comput-
ers or smartphones, resulting in a network which only contains au-
tonomous and heterogeneous machines. This results in a dynamic
and decentralized network which has to be considered in distributed
data stream processing. In this paper, I present my PhD project for
developing and deploying a distributed DSMS that can be executed
in a Peer-to-Peer (P2P) network of autonomous and heterogeneous
peers. My approach addresses three main challenges: data source
management, continuous query distribution and distributed query
management. A prototypical implementation is already in place
and the evaluation is currently planned.
Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - Distributed databases
Keywords
Data streams; Distributed systems; Peer Computing; P2P network
1. INTRODUCTION
Active data sources like sensors continuously produce data and
send them to processing systems. The resulting data streams are
potentially infinite and cannot be stored persistently in their entirety. Con-
ventional database management systems (DBMS) do not perform well
with these types of sources. Each incoming data element has to be
processed immediately or stored temporarily for later use
(one-pass paradigm). Data stream manage-
ment systems (DSMS) use continuous queries, which are installed
once, run indefinitely, and produce streams of results [1].
In many cases, the continuous processing of data streams gen-
erates high system loads. The data frequency of one active data
source can be problematically high, and a monolithic DSMS might
not be able to process it in reasonable time. Distributed DSMS
could solve this by sharing the processing task over a network of
machines. Each machine executes some processing steps and sends
the intermediate results to the next one. One example application
is the real-time analysis of sports games.
In professional basketball, games are recorded with many
cameras placed around the playing field. After the game, a huge
amount of recorded data is processed, cut, rearranged and finally
analyzed. Coaches can replay and analyze the game and make
tactical decisions to improve the team. Larger teams with a
professional environment and a higher budget can buy and
deploy specialized software and hardware for this purpose. For
smaller teams, such as amateur or hobby teams, this solution is not
feasible since the acquisition and maintenance costs of computer
clusters are too high for their typically small budgets.
Thus, a cheaper and more feasible solution should use existing
processing power of interconnected notebooks, private computers,
or smartphones. For instance, players of a hobby team can provide
their private machines and connect them (e.g., via Wi-Fi). The de-
ployment of such a network would be easy and would make it possible
to attach additional notebooks at run-time during the game, mak-
ing the network mobile and flexible. Sensors are used to record the
positions of the players and the ball (e.g., small cameras, posi-
tion sensors or smartphones in the players' pockets). In any case,
the game data is a collection of data streams which should be processed
immediately by a distributed DSMS, since there are not enough re-
sources to process the huge amount of game data in reasonable time
afterwards. Typically, the coach decides which statistics should be
generated and shown, for instance, travel paths, shot counts, or
ball possession. This decision results in automatically gener-
ated continuous queries which are capable of processing the game
data and delivering the desired statistics in real-time. The results of
the queries are then shown immediately, making it possible for the
coach to react accordingly and give new orders to his players.
In a distributed DSMS, each participating private machine (e.g.,
notebook, personal computer or smartphone) executes one instance
of a DSMS, taking over a part of the data stream processing. How-
ever, such machines differ from machines in typical computer clus-
ters in two major aspects: (1) Each private machine is autonomous:
The user decides by himself what his machine is allowed to do in
the context of data stream processing and when the machine enters
or leaves the network. (2) The private machines are heterogeneous:
they provide different amounts of system resources like memory,
processor power, or network bandwidth.
No existing distributed DSMS can cope with the properties
stemming from such environments (see Section 4). The goal of
my PhD work is to develop and deploy a distributed DSMS which
can be executed on an affordable and easily deployable
network of such private machines. With this, it should be possi-
ble to provide real-time processing of continuous data
streams, e.g., to allow live sports analysis with smaller infrastruc-
tures. However, the distributed DSMS should be usable not only
for sports analysis, but for other applications as well.
The structure of this paper is as follows: In Section 2, the con-
straints and goals of my PhD work are described in more detail.
Section 3 outlines the approach and concepts used to fulfill these
goals. Additionally, it describes the current state of my implemen-
tation and the planned evaluation. Section 4 describes related
DSMS. Section 5 concludes this paper.
2. CONSTRAINTS AND GOALS
The underlying network architecture is a Peer-to-Peer (P2P) net-
work. It represents a network of autonomous and heterogeneous
machines, called Peers [2]. Each peer executes one instance of a
DSMS. Due to the distributed P2P approach, the following benefits
are expected:
Higher performance A P2P network of DSMS can process more
continuous queries at the same time compared to single ma-
chines. Additionally, more active data sources with higher
data rates are supported.
Higher scalability Continuous queries are decomposable. There-
fore, additional peers can participate in data stream process-
ing by taking over parts of already running queries if the overall
system load rises.
Higher reliability Failures of peers can be compensated without
interrupting the data stream processing. It is imaginable to
replicate continuous queries and execute them on different
peers. Additionally, queries could be transferred across the
network to balance the overall system load.
Hardware reuse Already available processing power provided by
the users can be utilized. No additional acquisition and main-
tenance costs are required.
Easy deployment New machines only have to install a DSMS in-
stance and have to be in the same network. During run-time,
other machines can be added on demand without interrupting
the data stream processing.
With this, a distributed DSMS with higher performance and usabil-
ity compared to a monolithic DSMS can be expected. However, us-
ing a P2P network also comes with some challenges which I have
to consider during my work:
Autonomy Peers cannot be forced to do a certain task or behave
in a specific manner.
Heterogeneity Each peer has its own hardware resources which
it is willing to provide (hardware heterogeneity). Its DSMS
could contain different component combinations, e.g., addi-
tional plugins or special processing operators (software het-
erogeneity).
Dynamic network Since all peers are autonomous, each of them
can enter or leave the network at any time.
Symmetry Each peer is server and client at the same time.
Because of the symmetry, each peer can receive new continuous
queries and participate in data stream processing. Due to autonomy,
a peer cannot make any assumptions about the behavior and the
functionality of other peers. Since a peer can fail (or leave the
network) at any time, queries must be replicated to guarantee data
stream processing without interruptions. Additionally, no peer has
an overview of the entire network and thus does not know which
peers are available. This forces each peer to solve its tasks locally
with its neighboring peers. Therefore, a decentralized solution is
preferred to avoid single points of failure. To sum up, I
want to answer the following research question:
Considering autonomy, heterogeneity, the dynamic network and
symmetry, how can a P2P network be used to achieve distributed
data stream processing with higher performance, scalability and
availability compared to a monolithic DSMS?
I assume that each participating peer is cooperative and does not
damage the data stream processing on purpose. Therefore, I do not
consider security in my work. Additionally, privacy concerns like
user management or data encryption are not included.
3. APPROACH
Based on the typical use cases of DSMS, I divided the functional-
ity of a typical DSMS into three main challenges, each containing
several steps (see Figure 1): data source management, continuous
query distribution and distributed query management. The follow-
ing sections describe each step in more detail.
Figure 1: Conceptional overview of the distributed DSMS.
3.1 Data source management
Before a continuous query can be executed, the active data
sources it uses must be announced in the network. First, the data sources
are declared at one peer by providing connection details and a de-
scription of the data streams. For example, the data format and
a description of how to connect to the sensors used during a basketball
game must be given. Then, the data source information can be dis-
tributed inside the P2P network. Due to the autonomy of the peer,
this must be triggered manually (e.g., by the user). A peer which
receives the information can decide to integrate the data source into
its own DSMS. This step must also be triggered explicitly.
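To make the declaration step more concrete, the following is a minimal sketch of what a data source description could contain before it is distributed in the P2P network. The class and field names are assumptions for illustration only and do not correspond to the actual OdysseusP2P interfaces.

```java
// Hypothetical sketch of a data source description that a peer could
// publish in the P2P network; not the actual OdysseusP2P interface.
import java.util.List;
import java.util.UUID;

public final class DataSourceDescription {

    private final UUID id = UUID.randomUUID();    // network-wide identifier
    private final String name;                    // e.g., "ball_position"
    private final String host;                    // connection details
    private final int port;
    private final String format;                  // e.g., "csv", "json"
    private final List<String> schema;            // attribute names of the stream

    public DataSourceDescription(String name, String host, int port,
                                 String format, List<String> schema) {
        this.name = name;
        this.host = host;
        this.port = port;
        this.format = format;
        this.schema = schema;
    }

    /** Serializes the description so it can be sent to other peers. */
    public String toAdvertisement() {
        return String.format("%s|%s|%s:%d|%s|%s",
                id, name, host, port, format, String.join(",", schema));
    }
}
```

A peer would publish such a description only after an explicit user action; a receiving peer could then parse it and decide whether to integrate the source into its local DSMS.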
3.2 Continuous query distribution
After the desired data sources are published, a given continuous
query can be distributed in the P2P network. Before the distribution
begins, the declared query is parsed and transformed into an op-
erator graph. In general, an operator graph is a directed, acyclic
graph of operators. Each operator represents a processing step (like
selection, aggregation, join), sending its results to the next opera-
tor [1]. I divided the distribution of such an operator graph into
three steps: partitioning, modification and allocation. Each step al-
lows multiple strategies (e.g., specified in the continuous query).
A schematic example is shown in Figure 2.
The first step is partitioning. Here, the given operator graph
is divided into several query parts. A query part describes which
operators should be executed on the same peer. This step relies on
predefined strategies since there are too many possibilities to partition
an operator graph and it is not feasible to check each of them.
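As an illustration of operator graphs and query parts, here is a minimal, hypothetical sketch (not the Odysseus data model) with a naive partitioning strategy that simply cuts an operator list, given in topological order, into parts of a fixed maximum size; real strategies would also take operator semantics and data rates into account.

```java
// Minimal, hypothetical sketch of operator graphs and query parts;
// not the actual Odysseus data model.
import java.util.*;

final class Operator {
    final String name;                              // e.g., "SELECT", "JOIN"
    final List<Operator> next = new ArrayList<>();  // edges to downstream operators
    Operator(String name) { this.name = name; }
}

final class QueryPart {
    final List<Operator> operators = new ArrayList<>(); // executed on one peer
}

final class NaivePartitioner {
    /** Cuts the operators (given in topological order) into parts of at most maxSize. */
    static List<QueryPart> partition(List<Operator> topologicalOrder, int maxSize) {
        List<QueryPart> parts = new ArrayList<>();
        QueryPart current = new QueryPart();
        for (Operator op : topologicalOrder) {
            if (current.operators.size() == maxSize) {
                parts.add(current);
                current = new QueryPart();
            }
            current.operators.add(op);
        }
        if (!current.operators.isEmpty()) parts.add(current);
        return parts;
    }
}
```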
The second step is modification, which typically comprises fragmen-
tation and replication. In replication, each query part can be dupli-
cated, and the duplicates are connected to the same inputs. A special
operator at the end of the replicas merges the duplicated result
streams into one single data stream (omitting duplicates). Using
fragmentation, a query part is also duplicated, but a fragment
operator is placed before the duplicates. Depending on the selected
fragmentation strategy, this operator splits the data stream into
several fragments. A union or join operator after the duplicated
query parts merges the fragments back into a single result stream.
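To illustrate fragmentation, the following hypothetical sketch splits a stream element-wise by hashing a key attribute (one possible way to realize the horizontal fragmentation mentioned in Section 3.4) and merges the fragment outputs with a simple union. The class names are assumptions, not actual Odysseus operators.

```java
// Hypothetical sketch of hash-based horizontal fragmentation and union merge.
import java.util.*;
import java.util.function.Function;

final class HashFragmenter<T> {
    private final int fragments;
    private final Function<T, Object> keyExtractor;

    HashFragmenter(int fragments, Function<T, Object> keyExtractor) {
        this.fragments = fragments;
        this.keyExtractor = keyExtractor;
    }

    /** Decides which duplicated query part (fragment) processes the element. */
    int fragmentOf(T element) {
        return Math.floorMod(Objects.hashCode(keyExtractor.apply(element)), fragments);
    }
}

final class UnionMerge {
    /** Merges the result streams of all fragments back into one stream. */
    static <T> List<T> merge(List<List<T>> fragmentResults) {
        List<T> merged = new ArrayList<>();
        fragmentResults.forEach(merged::addAll);
        return merged;
    }
}
```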
The third step is allocation. This step decides which part is ex-
ecuted on which peer. To consider autonomy and heterogeneity,
a contract-net based approach is used; however, an economic ap-
proach is also feasible. The distributing peer places an auction in
the network for each query part. If a peer is interested in executing
the query part, it will generate and send a bid. Currently, the bid
depends on the estimated memory, processor and network usage
in relation to the total amount of system resources provided by
that peer. If a peer already has a high system load, its bid will be
lower than the bid from a peer with more free system resources. In
most cases, the peer with the best bid will execute the correspond-
ing query part. However, it is possible that two successive query
parts are assigned to two peers which have a bad connection qual-
ity between each other (e.g., very high latencies due to the network
structure). To avoid such situations, the current latencies between
the peers must be estimated. The estimations are determined with
distributed network coordinates mentioned in [3]: each peer tries to
determine its position in a 3-dimensional latency cost space only by
communicating with its neighbors. The distance between two peers
in this cost space is the estimated latency. Finally, I use a modified
version of the operator placement strategy from [4] to decide
which query part is executed on which peer while minimizing the
latencies and maximizing the bids.
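The following sketch illustrates the bidding and the latency estimate under stated assumptions: the bid is computed from the share of free resources, which is one possible reading of the description above and not the exact formula used in OdysseusP2P, and the estimated latency is the Euclidean distance between two 3-dimensional network coordinates in the spirit of [3].

```java
// Hypothetical sketch of bid calculation and latency estimation; the exact
// formulas in OdysseusP2P may differ.
final class AllocationSketch {

    /** Bid of a peer for a query part: higher means more free resources.
     *  Each argument is the estimated usage of the query part divided by
     *  the total amount of that resource the peer is willing to provide. */
    static double bid(double memShare, double cpuShare, double netShare) {
        double estimatedLoad = (memShare + cpuShare + netShare) / 3.0;
        return Math.max(0.0, 1.0 - estimatedLoad);   // fully loaded peers bid 0
    }

    /** Estimated latency between two peers as the Euclidean distance of
     *  their 3-dimensional network coordinates (cf. [3]). */
    static double estimatedLatency(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```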
3.3 Distributed query management
Distributed query management considers the lifetime of a run-
ning continuous query. First of all, the query can be stopped, re-
started or removed at any time. Additionally, query management
covers the transfer of queries from peer to peer due to network
and load dynamics. Because of peer failures, copies and fragments
of continuous queries could be lost. Recovery ensures that these
query parts are regenerated over time. Load balancing is needed
because the system load at one peer can change over time. In that
case, a peer must be able to transfer running queries to other peers
to reduce its system load.
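Load balancing and recovery are not implemented yet (see Section 3.4). Purely for illustration, the following hypothetical sketch shows a simple threshold-based load-balancing rule; it is not the planned algorithm.

```java
// Hypothetical threshold-based load-balancing sketch; not the planned algorithm.
// Requires Java 16+ for records.
import java.util.*;

final class LoadBalancerSketch {

    /** A running query part together with its estimated share of the local load. */
    record RunningPart(String queryPartId, double estimatedLoadShare) {}

    /** If the local load exceeds the threshold, selects the smallest query
     *  parts whose transfer would bring the load back under the threshold. */
    static List<RunningPart> partsToTransfer(double currentLoad, double threshold,
                                             List<RunningPart> running) {
        List<RunningPart> candidates = new ArrayList<>(running);
        candidates.sort(Comparator.comparingDouble(RunningPart::estimatedLoadShare));
        List<RunningPart> toTransfer = new ArrayList<>();
        for (RunningPart part : candidates) {
            if (currentLoad <= threshold) break;
            toTransfer.add(part);
            currentLoad -= part.estimatedLoadShare();
        }
        return toTransfer;
    }
}
```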
3.4 Current implementation status
My approach builds upon an existing framework for construct-
ing and maintaining P2P networks: the JXTA framework [5]. If
two JXTA applications are in the same network, they will automat-
ically be connected. JXTA provides the interfaces and utilities to
facilitate the communication between peers [6]. In addition, I need
the typical components of data stream processing, such as a query
parser, a query scheduler and graphical user interfaces. There-
fore, I will extend Odysseus, an already existing framework for
developing and deploying DSMS for different applications [7]. Its
architecture consists of easily extensible bundles, each of which en-
capsulates one specific function. Odysseus provides all basic func-
tions for data stream processing and makes it easy to add new
features. Each peer will execute an instance of Odysseus containing
a different combination of bundles; the instances are interconnected
through the P2P network organized by JXTA (see Figure 3).
Figure 3: Architecture of one peer using Odysseus and JXTA.
The first goal is to develop a prototype called OdysseusP2P: ad-
ditional bundles for Odysseus which make it possible to execute
Odysseus in a P2P network as described above. Currently, the
following features are implemented in OdysseusP2P:
Data source distribution Each user can declare a data source and
export it into the P2P network. Users at other peers can
import the newly discovered data source into their local
DSMS and use it themselves.
Query distribution A user can specify that his continuous query
should be distributed inside the P2P network. Alternatively, Odysseus can
generate continuous queries automatically. The query will
be partitioned and distributed to peers which are willing to
process (a part of) the query.
Distributed query execution The user can stop, restart and re-
move the continuous query at any time. Currently, this can
only be done at the peer which received and distributed
the query in the first place (representing the user).
Data stream fragmentation Data streams can be fragmented and
each fragment can be processed in parallel on different peers.
The results are merged. Currently, only horizontal fragmen-
tation is supported. However, vertical and hybrid fragmenta-
tion are planned.
Query replication I assume that the used processing operators are
deterministic and produce the same results given the same
input data. Therefore, a query can be placed and executed on
different peers simultaneously, running with the same data
streams.
With these features, a running prototype of my concept is in place
and usable. Features which are currently not implemented are load
balancing and recovery.
3.5 Planned evaluation
The planned application is to use OdysseusP2P for the already
mentioned live sport analysis in basketball to answer the research
question.

Figure 2: Overview of the steps of continuous query distribution using replication in the modification step.

To show the performance of OdysseusP2P, latency and
data throughput are measured. Scalability can be estimated by log-
ging the performance with increasing and decreasing amounts of
peers and queries. Availability can be shown by removing and
adding notebooks in the network, logging the reactions of the peers.
Each measurement will be compared with the measurements de-
livered by the monolithic version of Odysseus to determine which
benefits and drawbacks result from using OdysseusP2P. Since the
application is in its planning phase, no evaluation results can be
shown here. However, they should show that, considering au-
tonomy, heterogeneity, the dynamic network and symmetry, a P2P
network can be used to create, deploy and execute distributed
continuous data stream processing which is generally feasible and
has higher performance, scalability and availability than a
monolithic DSMS.
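As an illustration of how latency and data throughput could be logged during the planned evaluation, here is a small hypothetical sketch; the measurement hooks and metric definitions are assumptions, not an existing part of OdysseusP2P.

```java
// Hypothetical sketch of per-element latency and throughput logging
// for the planned evaluation; not part of OdysseusP2P yet.
import java.util.concurrent.atomic.AtomicLong;

final class MetricsSketch {
    private final AtomicLong elements = new AtomicLong();
    private final AtomicLong latencySumNanos = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Called for every result element; creationNanos is the timestamp
     *  recorded when the element entered the network. */
    void onResult(long creationNanos) {
        elements.incrementAndGet();
        latencySumNanos.addAndGet(System.nanoTime() - creationNanos);
    }

    double averageLatencyMillis() {
        long n = elements.get();
        return n == 0 ? 0.0 : latencySumNanos.get() / (n * 1_000_000.0);
    }

    double throughputPerSecond() {
        double seconds = (System.nanoTime() - startNanos) / 1_000_000_000.0;
        return seconds == 0 ? 0.0 : elements.get() / seconds;
    }
}
```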
4. RELATED WORK
Tapestry [8], Alert [9], NiagaraCQ [10] and OpenCQ [11] are us-
ing conventional DBMS as the underlying processing engine (e.g.,
triggers). STREAM [12], PIPES [13] and Stream Mill [14] are
DSMS which can only be executed on a single machine, not allow-
ing a distributed execution in general. DSMS like Gigascope [15],
Global Sensor Networks [16] and SStreaMWare [17] focus on a
single application, allowing optimizations or simplifications which
are not possible in the general-purpose DSMS aimed at in my PhD work.
Borealis [18] is a distributed DSMS which combines Aurora, a
DSMS, and Medusa, a framework for network communication [19].
To the best of my knowledge, it uses clusters of equal computers, and the
entire network is always known, which is not the case in my work.
StreamGlobe [20] combines Peer Computing with Grid Comput-
ing using the Open Grid Service Architecture (OGSA). The data
stream processing is executed by so-called super peers positioned
in a grid, while thin peers represent data sources and result
sinks. Thin peers always connect to exactly one super peer. Peers are
not autonomous in StreamGlobe (at least the super peers are not).
TelegraphCQ [21] uses a client-server architecture to separate
its front end and back end. It provides no further distribution.
S4 (Simple Scalable Streaming System) [22] was developed by Ya-
hoo and focuses on high scalability and availability using the Map-
Reduce model [23]. It also uses a computer cluster of identical, non-
autonomous machines. The network is not dynamic: new machines
cannot be added or removed at run-time.
System S [24] also focuses on high scalability in huge computer
clusters of identical, non-autonomous machines. Queries are com-
piled to processing code which is then executed in a special stream
processing core (SPC) [25]. Single machines in the cluster can ex-
ecute parts of these queries or organize the general execution (like
resource management).
StreamCloud [26] uses cloud infrastructures for distribution. It
focuses on parallelizing continuous queries and is primarily used
in private clouds with computer clusters. The network is entirely
known and the machines are not autonomous or heterogeneous.
Storm [27] is a real-time stream processing engine, widely used
by Twitter and Groupon and very similar to Odysseus. Storm uses
computer clusters which have a central entity for coordinating so-
called worker nodes.
Stratosphere [28] is also geared towards the MapReduce model. It uses
clusters of non-autonomous computers in the context of cloud com-
puting. Adding and removing machines at run-time is not
considered. However, Stratosphere supports heterogeneity.
To sum up, there is no distributed DSMS which is capable of us-
ing a network of autonomous and heterogeneous private machines
for data stream processing.
5. SUMMARY
Data stream management systems (DSMS) are capable of pro-
cessing data streams, potentially infinite amounts of data sent by
active data sources. However, data stream processing can overbur-
den a monolithic DSMS. Instead, distributed DSMS use a network
of machines, mostly typical computer clusters, which are sufficient
for many applications. But in some cases, a computer cluster is
not applicable or feasible. A collection of private machines, result-
ing in a highly dynamic network which only contains autonomous
and heterogeneous participants, could be a solution. Currently, no
distributed DSMS is capable of using such a network.
The goal of my PhD work is to develop a distributed DSMS in a
P2P network which contains only autonomous and heterogeneous
peers. I have to consider autonomy, heterogeneity, the dynamic net-
work and symmetry in my concepts, which are currently divided into
three main challenges: data source management, continuous query
distribution and distributed query management. I use Odysseus as
the underlying DSMS and JXTA for the P2P network communica-
tion. I already have a first, usable prototypical implementation
(called OdysseusP2P), but no practical evaluation results yet. The
planned application in my evaluation is to use OdysseusP2P for
live sports analysis in basketball.
6. REFERENCES
[1] J. Krämer, Continuous queries over data streams - semantics
and implementation, University of Marburg, 2007.
[2] Q. H. Vu, M. Lupu, and B. C. Ooi, Peer-to-Peer Computing:
Principles and Applications, 1st ed. Springer Publishing
Company, Incorporated, 2009.
[3] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris,
Practical, distributed network coordinates, SIGCOMM
Comput. Commun. Rev., vol. 34, no. 1, Jan. 2004.
[4] P. R. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos,
M. Welsh, and M. I. Seltzer, Network-aware operator
placement for stream-processing systems, in ICDE, 2006.
[5] L. Gong, S. Oaks, and B. Traversat, JXTA in a Nutshell - A
Desktop Quick Reference. O'Reilly, 2002.
[6] Sun Microsystems, JXTA Java Standard Edition v2.5 -
Programmer's Guide, 2010.
[7] H.-J. Appelrath, D. Geesen, M. Grawunder, T. Michelsen,
and D. Nicklas, Odysseus: a highly customizable
framework for creating efficient event stream management
systems, ser. DEBS '12. ACM, 2012, pp. 367-368.
[8] D. Terry, D. Goldberg, D. Nichols, and B. Oki, Continuous
queries over append-only databases. ACM, 1992, vol. 21.
[9] U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan, Alert:
An architecture for transforming a passive DBMS into an
active DBMS, in Proceedings of the 17th International
Conference on Very Large Data Bases, ser. VLDB '91. San
Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,
1991, pp. 469-478.
[10] J. Chen, D. J. DeWitt, F. Tian, and Y. Wang, NiagaraCQ: a
scalable continuous query system for internet databases, in
Proceedings of the 2000 ACM SIGMOD international
conference on Management of data, ser. SIGMOD '00.
New York, NY, USA: ACM, 2000, pp. 379-390.
[11] L. Liu, C. Pu, and W. Tang, Continual queries for internet
scale event-driven information delivery, Knowledge and
Data Engineering, IEEE Transactions on, vol. 11, no. 4, pp.
610-628, 1999.
[12] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar,
K. Ito, R. Motwani, U. Srivastava, and J. Widom, STREAM:
The Stanford data stream management system, Stanford
InfoLab, Technical Report 2004-20, 2004.
[13] M. Cammert, C. Heinz, J. Krämer, A. Markowetz, and
B. Seeger, PIPES: A multi-threaded publish-subscribe
architecture for continuous queries over streaming data
sources, Tech. Rep., 2003.
[14] H. Thakkar, B. Mozafari, and C. Zaniolo, Designing an
inductive data stream management system: the Stream Mill
experience, in Proceedings of the 2nd international
workshop on Scalable stream processing system, ser. SSPS
'08. New York, NY, USA: ACM, 2008, pp. 79-88.
[15] C. Cranor, T. Johnson, and O. Spataschek, Gigascope: a
stream database for network applications, in SIGMOD,
2003, pp. 647-651.
[16] K. Aberer, M. Hauswirth, and A. Salehi, The global sensor
networks middleware for efficient and flexible deployment
and interconnection of sensor networks, École
Polytechnique Fédérale de Lausanne (EPFL), Tech. Rep.
LSIR-REPORT-2006-006, 2006.
[17] L. Gurgen, C. Roncancio, C. Labbé, A. Bottaro, and
V. Olive, SStreaMWare: a service oriented middleware for
heterogeneous sensor data management, in Proceedings of
the 5th international conference on Pervasive services, ser.
ICPS '08. New York, NY, USA: ACM, 2008, pp. 121-130.
[18] Y. Ahmad, B. Berg, U. Çetintemel, M. Humphrey, J.-H.
Hwang, A. Jhingran, A. Maskey, O. Papaemmanouil,
A. Rasin, N. Tatbul, W. Xing, Y. Xing, and S. Zdonik,
Distributed operation in the Borealis stream processing
engine, in Proceedings of the 2005 ACM SIGMOD
international conference on Management of data, ser.
SIGMOD '05. ACM, 2005, pp. 882-884.
[19] S. B. Zdonik, M. Stonebraker, M. Cherniack, U. Çetintemel,
M. Balazinska, and H. Balakrishnan, The Aurora and
Medusa projects, IEEE Data Eng. Bull., vol. 26, no. 1, pp.
3-10, 2003.
[20] R. Kuntschke, B. Stegmaier, A. Kemper, and A. Reiser,
StreamGlobe: processing and sharing data streams in
grid-based P2P infrastructures, in Proceedings of the 31st
international conference on Very large data bases, ser.
VLDB '05. VLDB Endowment, 2005, pp. 1259-1262.
[21] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J.
Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy,
S. R. Madden, F. Reiss, and M. A. Shah, TelegraphCQ:
continuous dataflow processing, in Proceedings of the 2003
ACM SIGMOD international conference on Management of
data, ser. SIGMOD '03. ACM, 2003, p. 668.
[22] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari, S4:
Distributed stream computing platform, in Proceedings of
the 2010 IEEE International Conference on Data Mining
Workshops, ser. ICDMW '10. Washington, DC, USA:
IEEE Computer Society, 2010, pp. 170-177.
[23] J. Dean and S. Ghemawat, MapReduce: simplified data
processing on large clusters, Commun. ACM, vol. 51, no. 1,
pp. 107-113, Jan. 2008.
[24] B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo,
SPADE: the System S declarative stream processing engine,
in Proceedings of the 2008 ACM SIGMOD international
conference on Management of data, ser. SIGMOD '08.
New York, NY, USA: ACM, 2008, pp. 1123-1134.
[25] L. Amini, H. Andrade, R. Bhagwan, F. Eskesen, R. King,
P. Selo, Y. Park, and C. Venkatramani, SPC: a distributed,
scalable platform for data mining, in Proceedings of the 4th
international workshop on Data mining standards, services
and platforms, ser. DMSSP '06. New York, NY, USA:
ACM, 2006, pp. 27-37.
[26] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, and
P. Valduriez, StreamCloud: A large scale data streaming
system, in Proceedings of the 2010 IEEE 30th International
Conference on Distributed Computing Systems, ser. ICDCS
'10. IEEE Computer Society, 2010, pp. 126-137.
[27] Storm project page (last visit: Feb 07 2014),
http://storm-project.net/.
[28] M. Leich, J. Adamek, M. Schubotz, A. Heise,
A. Rheinländer, and V. Markl, Applying Stratosphere for big
data analytics, in BTW, 2013, pp. 507-510.