Processing Distributed Internet of Things Data in Clouds
IEEE Cloud Computing, January/February 2015, p. 77
BLUE SKIES
and Lightweight Middleware for Grid Computing (gLite).

Massive Data Processing Models and Frameworks
The MapReduce paradigm has been widely used for large-scale data-intensive computing within datacenters because of its low cost, massive data parallelism, and fault-tolerant processing. The most popular implementation, the Hadoop framework, allows applications to run on large clusters and provides transparent reliability and data transfer. Other implementations target the Compute Unified Device Architecture (CUDA),5 field-programmable gate arrays (FPGAs),6 virtual machines,7 streaming runtimes,8 grids,8 and opportunistic environments.9 Apache Hadoop on Demand (HOD) provides virtual Hadoop clusters over a large physical cluster based on Torque. MyHadoop provides on-demand Hadoop instances on high-performance computing (HPC) resources via traditional schedulers.10 Other MapReduce-like projects include Twister (www.iterativemapreduce.org), Sector/Sphere (http://sector.sourceforge.net), and All-Pairs.11

Data Management Services across Datacenters
The following four storage service abstractions supported by cloud providers differ in how they store, index, and execute queries:

• Binary Large Object (Blob) stores for unstructured data, such as Amazon S3 and Azure Blob;
• key-value stores, such as HBase, MongoDB, and BigTable;
• message queuing systems, such as SQS and Apache Kafka; and
• relational database management systems, such as Oracle and MySQL, which support ACID (atomicity, consistency, isolation, durability) transactional properties.

Accordingly, several research efforts have integrated different cloud data storage services behind a transparent interface. Examples are the Simple Cloud API (http://simplecloud.com), the Simple API for Grid Applications (SAGA) with an SRM interface, and uniform services such as PDC@KTH's proxy service12 and the Open Grid Services Architecture Data Access and Integration (OGSA-DAI, www.ogsadai.org.uk) Web services. In addition, a number of third-party providers (Dropbox, Mozy, and so on) simplify online cloud storage access.

Data-Intensive Workflow Computing
Typical data-intensive scientific workflow frameworks include Pegasus, Kepler, Taverna, Triana, Swift, and Trident. Various business workflow technologies have also been applied to data-intensive workflow systems. Examples include service orchestration with the Business Process Execution Language (BPEL) and YAWL (Yet Another Workflow Language),13 service choreography with the Web Services Choreography Description Language (WS-CDL, www.w3.org/TR/ws-cdl-10), and service-oriented architectures.14

Benchmarks, Application Kernels, Standards, and Recommendations
Several benchmarks and application kernels have been developed, including Graph 500 (www.graph500.org), Hadoop Sort (http://wiki.apache.org/hadoop/Sort), the Sort Benchmark (http://sortbenchmark.org), MalStone,15 the Yahoo Cloud Serving Benchmark (http://research.yahoo.com/Web_Information_Management/YCSB), the Google cluster workload (http://code.google.com/p/googleclusterdata), the TPC-H benchmarks (www.tpc.org/tpch), BigDataBench, BigBench, HiBench, and PigMix, all fueled by the need to analyze the performance of different big data technologies. These benchmark suites model workloads for stress testing one or more categories of big data processing technologies. Among these frameworks, BigDataBench is the most comprehensive because it includes workload models for NoSQL stores, database management systems (DBMSs), stream processing engines (SPEs), and batch processing frameworks. BigDataBench primarily targets the search engine, social network, and e-commerce application domains.

However, there are few benchmarks and application kernels for heterogeneous datacenters. In fact, there is no agreed-upon performance benchmark for executing large-scale IoT applications across distributed datacenters, and the lack of inter-datacenter benchmarks and standards should be a key item on the future research agenda. Currently, the National Institute of Standards and Technology (NIST), the Open Grid Forum (OGF), the Distributed Management Task Force (DMTF) Cloud working group, the Cloud Security Alliance, and the Cloud Standards Customer Council are all working on cloud standards.

Research Issues
Big IoT data processing across multiple distributed datacenters remains challenging, mainly because of technical issues related to basic service stacks for datacenter computing infrastructures, massive data processing models, trusted data management services, data-intensive workflow computing, and benchmarks.

Service Stacks in a Multidatacenter Computing Infrastructure
Despite significant advances, public cloud computing technologies are still technically challenging for serving large-scale IoT applications across datacenters.
uploads/GITR-2014-Cisco-Chapter.pdf.
2. "Bringing Big Data to the Enterprise," IBM; http://www-01.ibm.com/software/in/data/bigdata.
3. J. Diaz et al., "FutureGrid Image Repository: A Generic Catalog and Storage System for Heterogeneous Virtual Machine Images," Proc. 3rd IEEE Int'l Conf. Cloud Computing Technology and Science (CloudCom 11), 2011, pp. 560–564.
4. P. Radzikowski, "SAN vs DAS: A Cost Analysis of Storage in the Enterprise," Capitalhead, 2008; http://capitalhead.com/articles/san-vs-das-a-cost-analysis-of-storage-in-the-enterprise.aspx.
5. B. He et al., "Mars: A MapReduce Framework on Graphics Processors," Proc. 17th Int'l Conf. Parallel Architectures and Compilation Techniques (PACT 08), 2008, pp. 260–269.
6. Y. Shan et al., "FPMR: MapReduce Framework on FPGA," Proc. 18th Ann. ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays (FPGA 10), 2010, pp. 93–102.
7. S. Ibrahim et al., "Cloudlet: Towards MapReduce Implementation on Virtual Machines," Proc. 18th ACM Int'l Symp. High Performance Distributed Computing (HPDC 09), 2009, pp. 65–66.
8. S. Pallickara, J. Ekanayake, and G. Fox, "Granules: A Lightweight, Streaming Runtime for Cloud Computing with Support for Map-Reduce," Proc. Cluster Computing and Workshops (CLUSTER 09), 2009, pp. 1–10.
9. H. Lin et al., "Moon: MapReduce on Opportunistic Environments," Proc. 19th ACM Int'l Symp. High Performance Distributed Computing (HPDC 10), 2010, pp. 95–106.
10. S. Krishnan, M. Tatineni, and C. Baru, MyHadoop—Hadoop-on-Demand on Traditional HPC Resources, tech. report, San Diego Supercomputer Center, Univ. of California, San Diego, 2011.
11. C. Moretti et al., "All-Pairs: An Abstraction for Data-Intensive Cloud Computing," Proc. IEEE Int'l Symp. Parallel and Distributed Processing (IPDPS 08), 2008, pp. 1–11.
12. I. Livenson and E. Laure, "Towards Transparent Integration of Heterogeneous Cloud Storage Platforms," Proc. 4th Int'l Workshop Data-Intensive Distributed Computing (DIDC 11), 2011, pp. 27–34.
13. C. Ouyang, M. Adams, and A.H.M. Hofstede, "Yet Another Workflow Language: Concepts, Tool Support, and Application," Handbook of Research on Business Process Modeling, IGI Global, 2009, pp. 92–121.
14. K.K. Droegemeier et al., "Service-Oriented Environments for Dynamically Interacting with Mesoscale Weather," Computing in Science and Engineering, vol. 7, no. 6, 2005, pp. 12–29.
15. C. Bennett, R. Grossman, and J. Seidman, MalStone: A Benchmark for Data Intensive Computing, tech. report TR-09-01, Open Cloud Consortium, 2009; http://malgen.googlecode.com/files/malstone-TR-09-01.pdf.
16. J. Wang, D. Crawl, and I. Altintas, "Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems," Proc. 4th Workshop Workflows in Support of Large-Scale Science (WORKS 09), 2009; doi:10.1145/1645164.1645176.
17. M. Hajjat et al., "Cloudward Bound: Planning for Beneficial Migration of Enterprise Applications to the Cloud," ACM SIGCOMM Computer Comm. Rev., vol. 40, no. 4, 2010, pp. 243–254.
18. A. Li et al., "CloudCMP: Comparing Public Cloud Providers," Proc. 10th Ann. Conf. Internet Measurement, 2010, pp. 1–14.
19. A. Ruiz-Alvarez and M. Humphrey, "An Automated Approach to Cloud Storage Service Selection," Proc. 2nd Int'l Workshop Scientific Cloud Computing (ScienceCloud 11), 2011, pp. 39–48.
20. L.M. Vaquero et al., "Dynamically Scaling Applications in the Cloud," SIGCOMM Computer Comm. Rev., vol. 41, no. 1, 2011, pp. 45–52.
21. Z. Shen et al., "CloudScale: Elastic Resource Sharing for Multi-Tenant Cloud Systems," Proc. 2011 ACM Symp. Cloud Computing, 2011.
22. K. Alhamazani et al., "Cloud Monitoring for Optimizing the QoS of Hosted Applications," Proc. 4th IEEE Int'l Conf. Cloud Computing Technology and Science, 2012, pp. 765–770.

LIZHE WANG is a professor at the Institute for Remote Sensing and Digital Earth, Chinese Academy of Sciences, and at the School of Computer at the China University of Geosciences. Wang has a PhD in computer science from Karlsruhe University. He is a senior member of IEEE. Contact him at lizhewang@icloud.com.

RAJIV RANJAN is a senior research scientist, Julius Fellow, and project leader at the Commonwealth Scientific and Industrial Research Organization. At CSIRO, he leads research projects related to cloud computing, content delivery networks, and big data analytics for Internet of Things (IoT) and multimedia applications. Ranjan has a PhD in computer science and software engineering from the University of Melbourne. Contact him at rajiv.ranjan@csiro.au.

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.